Red Blood Cell Classification
Sheldon Sebastian
Photo by Lucas van Oort on Unsplash
Abstract
The goal of this project is to identify, using a Multi-Layer Perceptron (MLP), whether a red blood cell is healthy or infected with malaria at one of the following stages: ring, schizont, or trophozoite. A custom MLP was created and its hyper-parameters were tuned with manual search, random grid search, and automatic tuning using Optuna.
Keywords: Computer Vision, Deep Learning, Image Classification, Multi-Layer Perceptron
Table of Contents:
- Introduction
- Dataset Description and Data Preprocessing
- Exploratory Data Analysis
- Modeling
- MLP using manual hyper-parameter tuning
- MLP using random grid search hyper-parameter tuning
- MLP using automatic hyper-parameter tuning with Optuna
- Results
- Conclusion & Future Work
- References
Introduction
Malaria is a mosquito-borne disease caused by a parasite. About 2,000 cases of malaria are diagnosed in the United States each year.[1] People with malaria often experience fever, chills, and flu-like illness. Left untreated, they may develop severe complications and die. For this project a Multi-Layer Perceptron (MLP) was used to classify blood cells into 4 different categories: red blood cell, ring, schizont, and trophozoite. An image labelled as red blood cell is healthy, and the remaining labels indicate a stage of malaria infection. A custom MLP architecture was constructed and hyper-parameter tuning was performed manually, with random grid search, and automatically using Optuna[2].
If a model can successfully identify these types of cells from images taken with a microscope, a very time-consuming testing process could be automated, leaving doctors with more time to treat the disease itself. Furthermore, early malaria detection can save lives!
Dataset Description and Data Preprocessing
The dataset was provided by Dr. Amir Jafari and is available here. The dataset contains 8,607 unique png files and their corresponding labels in txt files. The images have an original dimension of 100x100 pixels, and the label categories are red blood cell, ring, schizont, and trophozoite.
The data was preprocessed by creating a csv file containing each unique image id and its encoded categorical target. The targets were encoded as follows: 'red blood cell'=0, 'ring'=1, 'schizont'=2, 'trophozoite'=3. The original data was split with stratification into 90% train and 10% holdout, and the train portion was further split 80-20% into train and validation sets. Hence the training set contains 6,196 images, the validation set contains 1,550 images, and the holdout set contains 861 images.
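A minimal sketch of this split, assuming the labels csv has been loaded into a pandas DataFrame with hypothetical columns `img_id` and `target`:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical csv with one row per image: columns "img_id" and "target".
df = pd.read_csv("labels.csv")

# 90-10 stratified split into train and holdout sets.
train_df, holdout_df = train_test_split(
    df, test_size=0.10, stratify=df["target"], random_state=42
)

# The 90% portion is further split 80-20 into train and validation sets.
train_df, val_df = train_test_split(
    train_df, test_size=0.20, stratify=train_df["target"], random_state=42
)

print(len(train_df), len(val_df), len(holdout_df))  # roughly 6196, 1550, 861
```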
Exploratory Data Analysis
The exploratory data analysis Jupyter notebook can be found here. The distribution of the different categories can be seen in the figure below:
We see that the class distribution is severely imbalanced, so accuracy is not the best metric for evaluating our model. Instead, we use the average of the macro-averaged F1-score and Cohen’s Kappa score for model evaluation.
A few example images for healthy red blood cells are shown:
A few example images for ring infected cells are shown:
A few example images for schizont infected cells are shown:
A few example images for trophozoite infected cells are shown:
Modeling
After the data was split into train, validation, and holdout sets, an MLP model was created and hyper-parameter tuning was performed using manual search, random grid search, and automatic tuning with Optuna.
Since the data was severely imbalanced, the model evaluation metric was the average of the macro-averaged F1-score[3] and Cohen’s Kappa score[4]. To handle the class imbalance, all three approaches (manual, random grid search, Optuna) used class weights inversely proportional to each class's occurrence frequency, as sketched below.
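As an illustration, a sketch of the combined metric and of computing such class weights with scikit-learn, assuming `train_df` from the split above (the project's own metric implementation is linked in the Results section):

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, f1_score
from sklearn.utils.class_weight import compute_class_weight

def combined_metric(y_true, y_pred):
    # Average of the macro-averaged F1-score and Cohen's Kappa score.
    return (f1_score(y_true, y_pred, average="macro")
            + cohen_kappa_score(y_true, y_pred)) / 2.0

# "balanced" weights are inversely proportional to class frequencies;
# the resulting dict is passed to model.fit(..., class_weight=class_weights).
classes = np.array([0, 1, 2, 3])
weights = compute_class_weight(class_weight="balanced", classes=classes,
                               y=train_df["target"])
class_weights = dict(zip(classes, weights))
```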
A custom Keras image generator[5] was used to read the images and labels. The generator performed random brightness and flip augmentations on training images using the albumentations[6] package to prevent model overfitting. The generator read the png files using OpenCV[7], applied augmentations to the training data, normalized each image by dividing by 255, and finally converted it to a numpy array.
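A simplified sketch of such a generator (the class and argument names here are illustrative, not the project's exact code):

```python
import cv2
import numpy as np
import albumentations as A
from tensorflow.keras.utils import Sequence

# Random brightness and flip augmentations, applied to training images only.
train_aug = A.Compose([
    A.RandomBrightnessContrast(p=0.5),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
])

class CellImageGenerator(Sequence):
    """Reads png files with OpenCV, optionally augments, and scales to [0, 1]."""

    def __init__(self, image_paths, labels, batch_size=32, augment=False):
        self.image_paths = image_paths
        self.labels = labels
        self.batch_size = batch_size
        self.augment = augment

    def __len__(self):
        return int(np.ceil(len(self.image_paths) / self.batch_size))

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        images = []
        for path in self.image_paths[sl]:
            img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
            if self.augment:
                img = train_aug(image=img)["image"]
            images.append(img / 255.0)  # normalize to [0, 1]
        # Flatten each 100x100x3 image into a vector for the MLP.
        x = np.array(images).reshape(len(images), -1)
        return x, np.array(self.labels[sl])
```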
MLP using manual hyper-parameter tuning:
The code for the manual MLP can be found here and the model architecture can be found here. The model uses 3 hidden layers of 512 neurons each and contains BatchNormalization for faster convergence and Dropout for regularization. The activation function used in each hidden layer was 'relu', and the final layer used softmax to classify images into the 4 categories. The loss was sparse categorical cross entropy and the optimizer was Adam with a learning rate of 0.00001. To avoid long training times, Keras early stopping callbacks were used along with model checkpoints to save the best model based on validation performance. The model was trained for up to 5000 epochs with a batch size of 1024. A sketch of this architecture follows the summary table below.
A summary of the model and training is shown in the table below:
| Attribute | Value |
| --- | --- |
| Number of hidden layers | 3 |
| Number of neurons per layer | 512 |
| Activation Function | relu |
| Loss | sparse categorical cross entropy |
| Optimizer | Adam |
| Learning Rate | 0.00001 |
| Epochs | 5000 |
| Batch Size | 1024 |
| Early Stopping, Batch Normalization, Dropout, Model Checkpoint | Yes |
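A sketch of this architecture in Keras (the dropout rate and early-stopping patience are illustrative; see the linked code for the exact values):

```python
from tensorflow.keras import Input, Sequential
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.layers import BatchNormalization, Dense, Dropout
from tensorflow.keras.optimizers import Adam

model = Sequential()
model.add(Input(shape=(100 * 100 * 3,)))  # flattened 100x100 RGB image
for _ in range(3):
    model.add(Dense(512, activation="relu"))
    model.add(BatchNormalization())
    model.add(Dropout(0.3))  # illustrative dropout rate
model.add(Dense(4, activation="softmax"))  # 4 cell categories

model.compile(loss="sparse_categorical_crossentropy",
              optimizer=Adam(learning_rate=1e-5), metrics=["accuracy"])

callbacks = [
    EarlyStopping(monitor="val_loss", patience=50, restore_best_weights=True),
    ModelCheckpoint("best_manual_mlp.h5", monitor="val_loss", save_best_only=True),
]
# train_gen / val_gen are instances of the generator sketched earlier
# (batch_size=1024); class_weights comes from the earlier snippet.
model.fit(train_gen, validation_data=val_gen, epochs=5000,
          class_weight=class_weights, callbacks=callbacks)
```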
MLP using random grid search hyper-parameter tuning:
The code for the random grid search MLP can be found here and the model architecture can be found here. To perform a random grid search on the Keras model, it was wrapped as a scikit-learn estimator using KerasClassifier[8]. The evaluation metric used was the average of the macro F1-score and Cohen's Kappa score, and the code for it can be found here. Since we are performing random grid search, a predefined split was used to prevent data leakage across the multiple runs, as sketched below.
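A sketch of this setup, assuming a `build_model(...)` function that returns a compiled Keras MLP, pre-loaded flattened image arrays `X_train`, `y_train`, `X_val`, `y_val` (hypothetical names), and `combined_metric` from the earlier snippet:

```python
import numpy as np
from scipy.stats import reciprocal
from sklearn.metrics import make_scorer
from sklearn.model_selection import PredefinedSplit, RandomizedSearchCV
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

# Wrap the Keras model builder as a scikit-learn estimator.
keras_clf = KerasClassifier(build_fn=build_model)

# Stack train and validation data; test_fold marks which rows form the
# fixed validation fold (-1 = always train, 0 = validation), so every
# random-search iteration is scored on the same split (no leakage).
X = np.concatenate([X_train, X_val])
y = np.concatenate([y_train, y_val])
test_fold = np.concatenate([np.full(len(X_train), -1), np.zeros(len(X_val))])

param_distributions = {
    "n_hidden": [1, 2, 3, 4, 5],
    "n_neurons": np.arange(1, 2049),
    "activation": ["relu", "selu", "elu"],
    "learning_rate": reciprocal(3e-4, 3e-2).rvs(1000),
    "dropout": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    "batch_size": [32, 64, 128, 256, 512, 1024],
    "epochs": [300, 600, 1000],
}

search = RandomizedSearchCV(keras_clf, param_distributions, n_iter=25,
                            cv=PredefinedSplit(test_fold),
                            scoring=make_scorer(combined_metric))
search.fit(X, y)
print(search.best_params_)
```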
A summary of the model and training is shown in the table below:
| Attribute | Value |
| --- | --- |
| Number of hidden layers | 1, 2, 3, 4, 5 |
| Number of neurons per layer | Between 1 and 2048 |
| Activation Function | relu, selu, elu |
| Loss | sparse categorical cross entropy |
| Optimizer | Adam |
| Learning Rate | 1000 values drawn from a reciprocal (log-uniform) distribution between 3e-4 and 3e-2 |
| Dropout probability | 0.0, 0.1, 0.2, 0.3, 0.4, 0.5 |
| Batch Size | 32, 64, 128, 256, 512, 1024 |
| Epochs | 300, 600, 1000 |
| Iterations | 25 |
The best parameters found were:
| Attribute | Value |
| --- | --- |
| Number of hidden layers | 5 |
| Number of neurons per layer | 658 |
| Activation Function | elu |
| Loss | sparse categorical cross entropy |
| Optimizer | Adam |
| Learning Rate | 0.0108 |
| Dropout probability | 0.5 |
| Batch Size | 64 |
| Epochs | 1000 |
MLP using automatic hyper-parameter tuning with Optuna:
To automatically find the best parameters for the MLP, Optuna with its tensorflow-keras integration[9] was used. The training code can be found here. Optuna requires an objective function to minimize; here the objective was the validation loss. After defining the objective function, we create a study and run trials. The details of the model architecture and suggested trial values can be found here. A total of 300 trials were performed, and early stopping and pruning were used to stop trials that were not giving promising results. A sketch of such an objective function is shown below.
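A sketch of such an objective function, assuming pre-loaded arrays `X_train`, `y_train`, `X_val`, `y_val` (hypothetical names; the actual project trains from the image generator):

```python
import optuna
from optuna.integration import TFKerasPruningCallback
from tensorflow.keras import Sequential
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import BatchNormalization, Dense, Dropout
from tensorflow.keras.optimizers import Adam

def objective(trial):
    # Hyper-parameters suggested by the trial; ranges follow the table below.
    n_hidden = trial.suggest_int("n_hidden", 1, 5)
    activation = trial.suggest_categorical("activation", ["relu", "selu", "elu"])
    dropout = trial.suggest_categorical("dropout", [0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
    learning_rate = trial.suggest_float("learning_rate", 1e-6, 1e-3, log=True)
    batch_size = trial.suggest_int("batch_size", 32, 1024)

    model = Sequential()
    for i in range(n_hidden):
        n_neurons = trial.suggest_int(f"n_neurons_l{i}", 1, 2048)
        model.add(Dense(n_neurons, activation=activation))
        model.add(BatchNormalization())
        model.add(Dropout(dropout))
    model.add(Dense(4, activation="softmax"))
    model.compile(loss="sparse_categorical_crossentropy",
                  optimizer=Adam(learning_rate=learning_rate))

    history = model.fit(
        X_train, y_train, validation_data=(X_val, y_val),
        epochs=5000, batch_size=batch_size, verbose=0,
        callbacks=[EarlyStopping(monitor="val_loss", patience=20),
                   TFKerasPruningCallback(trial, "val_loss")],
    )
    # The study minimizes the best validation loss reached during the trial.
    return min(history.history["val_loss"])

study = optuna.create_study(direction="minimize",
                            pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=300)
print(study.best_params)
```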
A summary of the model and training is shown in the table below:
| Attribute | Value |
| --- | --- |
| Number of hidden layers | 1-5 |
| Number of neurons per layer | Between 1 and 2048 |
| Activation Function | relu, selu, elu |
| Loss | sparse categorical cross entropy |
| Optimizer | Adam |
| Learning Rate | Log-uniform distribution between 1e-6 and 1e-3 |
| Dropout probability | 0.0, 0.1, 0.2, 0.3, 0.4, 0.5 |
| Batch Size | 32-1024 |
| Epochs | 5000 |
| Early Stopping, Pruning Callback | Yes |
| Trials | 300 |
The best parameters found were:
| Attribute | Value |
| --- | --- |
| Number of hidden layers | 3 |
| Number of neurons per layer | Layer 1: 2003, Layer 2: 810, Layer 3: 1776 |
| Activation Function | elu |
| Loss | sparse categorical cross entropy |
| Optimizer | Adam |
| Learning Rate | 1.1785925256981173e-06 |
| Dropout probability | 0.3 |
| Batch Size | 374 |
Results
As discussed previously, since the data is severely imbalanced, the average of the macro-F1 score and Cohen's Kappa score was used. The evaluation metric code can be found here. The summary of the results is as follows:
| Approach | Validation Accuracy | Holdout Accuracy |
| --- | --- | --- |
| Manual | 0.94 | 0.92 |
| Random Grid Search | 0.97 | 0.92 |
| Optuna | 0.94 | 0.92 |
| Approach | Mean of F1 and Cohen’s Kappa Score (Validation set) | Mean of F1 and Cohen’s Kappa Score (Holdout set) |
| --- | --- | --- |
| Manual | 0.70 | 0.718 |
| Random Grid Search | 0.87 | 0.717 |
| Optuna | 0.70 | 0.716 |
The confusion matrices for the holdout set are as follows:
Conclusion and Future Work
All the models performed similarly on the holdout set, but manual search required the parameters to be tweaked iteratively, which took a significant amount of time. Random grid search suffered from overfitting on the validation data, while Optuna trained in a shorter amount of time thanks to its pruning mechanism. In conclusion, I would recommend the model created by Optuna, since its performance is similar to the other models but its training time is significantly shorter. For future work, I will explore adding convolutional layers and try transfer learning to increase model performance.
References
1. https://www.cdc.gov/parasites/malaria/index.html
2. https://optuna.org/
3. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html
4. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.cohen_kappa_score.html
5. https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly
6. https://albumentations.ai/
7. https://opencv.org/
8. https://www.tensorflow.org/api_docs/python/tf/keras/wrappers/scikit_learn/KerasClassifier
9. https://github.com/optuna/optuna/blob/master/examples/tfkeras/tfkeras_integration.py