Abstract:
An Artificial Neural Network (ANN) is a data mining algorithm used for classification and prediction of data. Previous studies on classification and prediction of data relied only on the Multi-Layer Perceptron (MLP), a type of ANN algorithm. However, the MLP suffers from low prediction accuracy, low classification accuracy and over-training. The Genetic Algorithm (GA) is well known for improving prediction and reducing over-training. This work was therefore designed to minimise the problems of low accuracy and over-training by embedding GA into a Multi-Layer Perceptron ANN (MLP-ANN).
The model, hereafter named the Neuro-Genetic Model (NEGEM), was developed using a feed-forward ANN that trains the MLP with the search optimisation ability of GA. The MLP-Delta learning algorithm was used to implement the ANN, while GA was used to avoid low accuracy and over-training of the MLP-ANN. Demographic and treatment data on HIV/AIDS patients from 2000 to 2011 were collected, where available, from selected tertiary and general hospitals, primary health care centres and non-governmental organisations in southwestern Nigeria. The data were used to create a medical database on HIV/AIDS using a two-tier architecture that allows multi-dimensional analysis. Networks with one, two and three MLP hidden layers were implemented in the C# programming language, with Microsoft Structured Query Language (SQL) Server used for the database, to predict and classify the HIV/AIDS data. Precisely 12,000 and 2,230 data records were imported into the model for training and testing respectively. Mutation and crossover operators with 2000 training epochs in GA were used as operating parameters to avoid low prediction accuracy and over-training. The ability of the model to classify and predict was compared with the Waikato Environment for Knowledge Analysis (WEKA), an existing software package that includes an MLP implementation. Classification and predictive accuracies were measured using Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). Recall and precision were used to measure the level of true positive prediction/classification and over-training.
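The idea of letting a GA search the weights of a feed-forward MLP can be illustrated with a minimal sketch. This is not the NEGEM implementation: the abstract describes GA combined with MLP-Delta learning in C#, whereas the sketch below uses GA alone, in Python, on a toy XOR data set. The function names (`mlp_forward`, `evolve`), the population size, the mutation rate, the number of generations and the one-point crossover with elitism are all assumptions chosen for illustration.

```python
import math
import random

# Toy data set (assumption): XOR, standing in for real training records.
XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]


def mlp_forward(weights, x, n_hidden):
    """Feed-forward pass of a one-hidden-layer MLP whose weights are a flat gene list."""
    n_in = len(x)
    idx = 0
    hidden = []
    for _ in range(n_hidden):
        s = weights[idx + n_in]  # bias of this hidden unit
        for i in range(n_in):
            s += weights[idx + i] * x[i]
        hidden.append(math.tanh(s))
        idx += n_in + 1
    out = weights[idx + n_hidden]  # bias of the output unit
    for j in range(n_hidden):
        out += weights[idx + j] * hidden[j]
    out = max(-60.0, min(60.0, out))  # clamp to keep exp() in range
    return 1.0 / (1.0 + math.exp(-out))


def rmse(weights, data, n_hidden):
    """Fitness of one chromosome: root mean square error over the data set."""
    err = sum((mlp_forward(weights, x, n_hidden) - y) ** 2 for x, y in data)
    return math.sqrt(err / len(data))


def evolve(data, n_in, n_hidden, pop_size=30, generations=200,
           mutation_rate=0.1, seed=0):
    """Search MLP weights with crossover, mutation and elitism (illustrative settings)."""
    rng = random.Random(seed)
    n_genes = n_hidden * (n_in + 1) + n_hidden + 1
    pop = [[rng.uniform(-1, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    history = []  # best RMSE per generation
    for _ in range(generations):
        pop.sort(key=lambda w: rmse(w, data, n_hidden))
        history.append(rmse(pop[0], data, n_hidden))
        parents = pop[: pop_size // 2]      # truncation selection
        children = [pop[0][:]]              # elitism: keep the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [g + rng.gauss(0, 0.3) if rng.random() < mutation_rate else g
                     for g in child]         # Gaussian mutation
            children.append(child)
        pop = children
    best = min(pop, key=lambda w: rmse(w, data, n_hidden))
    history.append(rmse(best, data, n_hidden))
    return best, history


if __name__ == "__main__":
    best, history = evolve(XOR, n_in=2, n_hidden=3)
    print("initial best RMSE:", history[0])
    print("final best RMSE:  ", history[-1])
```

Because the best chromosome is copied unchanged into each new generation, the best RMSE in this sketch can never get worse from one generation to the next. How far it falls depends on the seed and on the assumed settings.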
The RMSE values for NEGEM with one, two and three hidden layers were 2.9 × 10^-16, 9.87 × 10^-15 and 7.0 × 10^-8 respectively, while those for WEKA were 1.0 × 10^-2, 8.0 × 10^-3 and 1.37 × 10^-1 respectively. The MAE values for NEGEM were 9.0 × 10^-4, 0.0 × 10^-4 and 3.0 × 10^-4, while those for WEKA were 1.0 × 10^-3, 1.0 × 10^-3 and 7.0 × 10^-4 for one, two and three hidden layers respectively. These results showed a higher level of accuracy in NEGEM prediction and classification than in WEKA. The NEGEM recall values for one, two and three hidden layers were respectively 9.8 × 10^-1, 1.0 × 10^-3 and 9.8 × 10^-1, while for WEKA they were 1.0, 1.0 and 1.0. This showed that WEKA over-trained in its predictive/classification values. The precision values of NEGEM were 9.6 × 10^-1, 9.6 × 10^-1 and 9.8 × 10^-1 for one, two and three hidden layers respectively, while for WEKA they were 1.0, 1.0 and 1.0. This showed a high level of positive prediction/classification in NEGEM compared with WEKA. The precision/recall values showed that NEGEM avoids over-training whereas WEKA over-trained, because the WEKA values exceeded the standard precision/recall range for the MLP-ANN algorithm.
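For readers unfamiliar with the four measures reported above, the following minimal sketch shows their standard textbook definitions. It assumes numeric targets for RMSE and MAE and binary class labels for precision and recall. The function names and sample values are illustrative only and are not taken from the NEGEM code or data.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared difference."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean Absolute Error: mean of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up values for illustration only.
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print(mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print(precision_recall([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

Under these definitions, precision and recall both lie between 0 and 1, and lower RMSE and MAE indicate predictions closer to the targets.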
The developed model could be used to mine databases more efficiently than the Waikato Environment for Knowledge Analysis (WEKA).