An Approach for Assessing Quality of Labeled Data for a Machine Learning Task in Malaria Detection

Authors:

Rose Nakasi (Makerere University)
Ernest Mwebaze (Google AI Africa)
Aminah Zawedde (Makerere University)
Jeremy Tusubira Francis (Makerere University)
Gilbert Maiga (Makerere University)

DOI: https://doi.org/10.1145/3378393.3402265

Session: 3.2. Health

Abstract: While microscopy diagnosis through supervised learning for image analysis contributes notably to malaria detection, it has limitations. Among its principal challenges is the manual and tedious process of annotating data for the classification task. Manual annotation is prone to inaccuracies caused by bias, subjectivity, and unclear images, resulting in many false positives. These inaccuracies normally arise from independent personal judgements that vary between individual microscopists, which cumulatively affect the accuracy of the model. In this paper, we investigate the possibility of classifying negative examples that lie far from the positives and negative examples that lie near the positives in thick blood smear images for malaria detection. Assessing the classification performance could potentially inform us of the quality of the training dataset and guide the selection of the best training dataset for a malaria parasite detection task. We employ the Mean Squared Error (MSE) to distinguish between positive and negative images. We then investigate the performance of the VGG-16 classification model based on how close or far negative examples are from positives. Experimental results showed that negative examples far from the positives produce better results than those near them, and that the proposed method could potentially be used to reduce false positives and bias in the training data.
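
To make the MSE-based separation mentioned in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes that image patches are equally sized NumPy arrays, that the distance of a negative patch to the positives is its smallest MSE to any positive patch, and that the near/far cut-off defaults to the median distance. The function names `mse`, `distance_to_positives`, and `split_near_far` are hypothetical, and the abstract does not specify any of these choices.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two equally shaped image patches."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("patches must have the same shape")
    return float(np.mean((a - b) ** 2))


def distance_to_positives(negative, positives):
    """Smallest MSE between one negative patch and any positive patch.

    Using the minimum is an assumption made for this sketch.
    """
    return min(mse(negative, p) for p in positives)


def split_near_far(negatives, positives, threshold=None):
    """Split negative patches into 'near' and 'far' groups.

    Returns (near_indices, far_indices, distances). When no threshold
    is given, the median distance is used (an illustrative choice).
    """
    dists = np.array([distance_to_positives(n, positives) for n in negatives])
    if threshold is None:
        threshold = float(np.median(dists))
    near = [i for i, d in enumerate(dists) if d <= threshold]
    far = [i for i, d in enumerate(dists) if d > threshold]
    return near, far, dists


if __name__ == "__main__":
    # Synthetic stand-ins for image patches; real data would be smear crops.
    positives = [np.zeros((4, 4))]
    negatives = [np.full((4, 4), 1.0), np.full((4, 4), 10.0)]
    near, far, dists = split_near_far(negatives, positives)
    print("near:", near, "far:", far, "distances:", dists.tolist())
```

Under these assumptions, the "far" and "near" index lists could each be combined with the positives to form separate training sets, so that a classifier such as VGG-16 can be trained on each and the results compared.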