[2E3] The application of multiple statistical techniques to predict factors affecting non-communicable diseases (NCDs) among the OECD countries

S Ampon-Wireko
Catholic University College of Ghana, Ghana 

The continuous rise in non-communicable diseases (NCDs) puts pressure on government budgets, health services and patients' personal finances. This has led policymakers to implement reforms and predictive models to mitigate the rise of NCDs. The purpose of this study is to identify the best model for predicting the factors that influence NCDs, using both statistical analysis and machine learning (ML) prediction methods. Two statistical analysis methods, principal component analysis (PCA) and partial least-squares regression (PLS), were used to extract the most important information from the dataset and to analyse the structure of the observations. To improve the accuracy of the statistical model, four optimisation algorithms were proposed: genetic algorithm (GA), particle swarm optimisation (PSO), differential evolution (DE) and ant colony optimisation (ACO). Two statistical criteria, root-mean-squared error (RMSE) and the coefficient of determination (R²), were used to assess the performance of these models. The results revealed that PLS performed better than PCA, with low RMSE values of 0.41809 and 0.42752 and R² values of 0.9724 and 0.9737, in both the training and the testing data. To further improve the accuracy of PLS, hybrid models (GA-PLS, DE-PLS, PSO-PLS and ACO-PLS) were developed. Evaluation of the results showed that the hybrid models predicted NCDs with higher accuracy. Among them, the GA-PLS model provided the highest performance, with RMSE values of 1.235e-3 and 4.643e-3 and R² values of 1 and 1 for the training and the testing data, respectively. The most important input parameters in predicting NCDs were identified. This study indicates how statistical analysis and machine learning modelling could help stakeholders make preliminary decisions regarding NCDs.
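
For readers unfamiliar with the two evaluation criteria named above, the following is a minimal sketch of how RMSE and R² are conventionally computed from observed and predicted values. It is not the study's code: the function names `rmse` and `r_squared` and the sample values are assumptions chosen purely for illustration, and the abstract does not state which software was used.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-squared error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative values only (hypothetical, not data from the study).
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

print("RMSE:", rmse(observed, predicted))
print("R2:", r_squared(observed, predicted))
```

A lower RMSE and an R² closer to 1 both indicate a closer fit, which is the sense in which the models above are compared on the training and testing data.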

Keywords: non-communicable diseases (NCDs), principal component analysis (PCA), partial least-squares regression (PLS), genetic algorithm (GA), particle swarm optimisation (PSO).