Congresso Brasileiro do Leite

Submission details


Title

Performance of Artificial Intelligence Algorithms for Detecting Adulteration in Raw Milk

Title in Portuguese

Desempenho de Algoritmos de Inteligência Artificial na Detecção de Adulteração em Leite Cru

Contribution to society (optional)

Our findings can contribute to the scientific community by showcasing the applications and limitations of artificial intelligence applied to raw milk compliance. Several dairy plants can benefit from our results, as water adulteration occurs in about 10% of individual producer samples. These models can be transformed into software, aiding in the swift and simple identification of non-conformities in the dairy supply chain without incurring additional laboratory analysis costs. Therefore, this research is contemporary, relevant, and addresses the commercial interests of dairies and the food integrity concerns of consumers.

Introduction (required)

Over the past five years, the application of big data and machine learning has shown significant potential in various areas of science and industry, including milk quality. This statistical approach is feasible because the centesimal composition of milk, which includes parameters such as protein, fat, lactose, urea, and total solids, provides a diverse data source for analysis (Kavitha and Deepa, 2021). By analyzing these variables together with the milk freezing point, it is possible to develop predictive models that identify patterns of adulteration.
Tampering with raw milk by adding water is one of the most common economically motivated food frauds. Freezing point analysis readily detects this adulteration, but in Brazil it is required daily only for pooled bulk-tank samples, so the adulteration can be masked when milk from several producers is pooled. Compositional analysis of milk from individual producers, however, is required monthly. We therefore aimed to use the compositional analysis of raw milk to predict water addition using artificial intelligence.

Materials and methods (required)

Between 2020 and 2021, 511 raw milk samples from a dairy plant were analyzed by mid-infrared spectroscopy to determine fat, protein, lactose, urea, ash, total solids, and non-fat solids. All samples were also subjected to freezing point assessment. Acidic samples were excluded, leaving 489 samples. Pearson correlation and three machine learning techniques (logistic regression, support vector machines (SVM), and the naive Bayes algorithm) were used to predict water adulteration in milk. The classifiers discriminated between samples with freezing points of -0.530°H or higher (water-added samples) and those between -0.531°H and -0.555°H (normal samples), using the compositional parameters as predictive attributes.
In logistic regression, each variable underwent univariate analysis to assess its individual contribution, followed by removal of non-significant factors (p>0.05) by stepwise backward elimination. The logistic regression was validated with the Pearson, deviance, and Hosmer-Lemeshow tests. The naive Bayes and SVM classifiers were trained on a randomly chosen 75% of the data.
To evaluate classification efficiency, sensitivity and specificity were calculated. Sensitivity was defined as the percentage of samples with a cryoscopic index ≥ -0.530°H that were classified as "watered", while specificity was the percentage of samples with a cryoscopic index between -0.531°H and -0.555°H that were classified as normal. Positive predictive value was the probability that a sample classified as "watered" had a cryoscopic index ≥ -0.530°H, and negative predictive value was the probability that a sample classified as normal had a cryoscopic index between -0.531°H and -0.555°H.
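The labeling rule and the four performance measures described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names (`label_sample`, `split_train_test`, `classification_metrics`), the random seed, and the treatment of freezing points below -0.555°H as unlabeled are all assumptions introduced here, and the freezing point values in the usage example are made up.

```python
import random

WATERED_THRESHOLD = -0.530   # °H; at or above this value a sample counts as "watered"
NORMAL_LOWER_BOUND = -0.555  # °H; lower limit of the normal range given in the text


def label_sample(freezing_point):
    """Return "watered", "normal", or None for a freezing point in °H.

    Assumes readings are reported to three decimals, as in the text.
    """
    if freezing_point >= WATERED_THRESHOLD:
        return "watered"
    if freezing_point >= NORMAL_LOWER_BOUND:
        return "normal"
    return None  # below -0.555 °H: outside both classes (assumption)


def split_train_test(samples, train_fraction=0.75, seed=0):
    """Randomly split samples into a training part and a held-out part."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def classification_metrics(actual, predicted, positive="watered"):
    """Compute sensitivity, specificity, PPV and NPV from paired labels."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # watered samples flagged as watered
        "specificity": ratio(tn, tn + fp),  # normal samples classified as normal
        "ppv": ratio(tp, tp + fp),          # flagged samples that are truly watered
        "npv": ratio(tn, tn + fn),          # samples called normal that are truly normal
    }


# Usage with made-up freezing points and made-up classifier output.
actual = [label_sample(fp) for fp in [-0.525, -0.528, -0.540, -0.545, -0.550, -0.533]]
predicted = ["watered", "normal", "normal", "normal", "watered", "normal"]
metrics = classification_metrics(actual, predicted)
train, held_out = split_train_test(range(489))
```

With 489 samples and a 75% training fraction, this split keeps 367 samples for training and 122 for evaluation. Whether the study evaluated the classifiers on the remaining 25% is not stated in the text and is assumed here.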

Results and discussion (required)

The Pearson correlation showed a significant association (p<0.05) between freezing point and the percentages of urea (r= -0.17), protein (r= 0.18), and lactose (r= -0.13). Thus, higher urea and lactose contents were associated with lower freezing points, while higher protein contents were associated with higher freezing points. Although significant, the correlation coefficients were low. This allows us to infer that only a small share of the variation in freezing point (about 3% or less for each variable, taking r²) is associated with fluctuations in urea, protein, or lactose values. The weak association between the variables in the studied sample did not allow adulteration to be predicted with the required reliability.
The logistic regression also indicated that the percentage of urea was related to the probability of adulteration (p= 0.02). However, the model was not adequate for practical application, since most "watered" and "normal" samples shared the same urea values. As a result, the model correctly identified none of the adulterated samples (0% sensitivity). The Hosmer-Lemeshow test also showed poor fit, and the Nagelkerke coefficient was extremely low (0.04), meaning that urea explained only about 4% of the variation in adulteration status. Therefore, logistic regression was not suitable for predicting potential adulteration in raw milk.
The results obtained with the support vector machine (SVM) were unsatisfactory, with 6% sensitivity and 70% specificity. Although specificity was better than sensitivity, it is still far from ideal, especially where false positives must be minimal. Specifically, the SVM algorithm incorrectly classified 30% of the normal samples as containing water. These high rates of false negatives and false positives compromise the model's practical utility.
The naive Bayes classifier performed best, with 8% sensitivity, 98% specificity, and a positive predictive value of 38%. Despite its low sensitivity, this algorithm had high specificity: about 4 out of every 10 samples it flags as adulterated are truly adulterated. This makes it useful not for screening purposes but for gradually eliminating individual cases of true water addition in raw milk. The limitations of this study were the small sample size and the low frequency (15%) of adulterated samples (>-0.530°H). Subsequent studies may obtain better predictions with more than 1,000 samples.
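As a rough consistency check, the predictive values can be recomputed from the sensitivity, the specificity, and the 15% frequency of adulterated samples reported above, using Bayes' rule. The sketch below is illustrative only: the function name `predictive_values` is hypothetical, the inputs are the rounded figures from the text, and it assumes the 15% frequency also held in the portion of the data used for evaluation, which the text does not state.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    flagged = true_pos + false_pos
    cleared = true_neg + false_neg
    ppv = true_pos / flagged if flagged else float("nan")
    npv = true_neg / cleared if cleared else float("nan")
    return ppv, npv


# Rounded sensitivity and specificity as reported in the text.
reported = {"SVM": (0.06, 0.70), "Naive Bayes": (0.08, 0.98)}
PREVALENCE = 0.15  # frequency of adulterated samples reported in the text

for name, (sens, spec) in reported.items():
    ppv, npv = predictive_values(sens, spec, PREVALENCE)
    print(f"{name}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Under these assumptions the naive Bayes figures imply a positive predictive value of roughly 0.41, in line with the reported 38% given the rounding of the inputs, while the SVM figures imply a value of only about 0.03. This shows why specificity, rather than sensitivity, drives the usefulness of a flag when adulterated samples are relatively rare.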

Conclusion (required)

The relationship between the centesimal composition of milk and its freezing point is complex, making it challenging to establish a reliable algorithm for predicting water adulteration in raw milk. The artificial intelligence tools performed better than the statistical approaches studied, detecting hidden patterns in the data, but generally yielded algorithms with high specificity and low sensitivity. The naive Bayes classifier performed best, with specificity adequate for use as a warning tool that identifies cases requiring on-site technical visits to confirm possible adulteration, with a probability of approximately 40% that the suspicion will be confirmed.

Acknowledgments (optional)

We thank CNPq for the scholarship granted.

References (optional)

KAVITHA, P. V.; DEEPA, P. V. A comparative analysis of the machine learning methods for milk adulteration detection. In: AIP Conference Proceedings. AIP Publishing, 2021.

Area

General

Authors

Rafael Fagnani, Aline Romano da Cunha, Jean Gustavo Reis Oliveira