Recent advancements in cancer diagnosis using machine learning techniques: a systematic review of decades of research, comparisons, and problems

Avijit Kumar Chaudhuri, Prithwish Raymahapatra

Abstract


Cancer is a non-communicable disease that spreads throughout the body through uncontrolled cell growth. Malignant cells grow into a tumor, which weakens the immune system and disrupts other biological processes. Breast, lung, and cervical cancer are among the most common types of cancer. Several screening methods are available to detect the presence of cancer at various stages. In some circumstances, misdiagnosis can occur owing to human error or incorrect interpretation of data, resulting in the loss of human lives. To address these issues, this study proposes an effective machine learning-based review and diagnosis technique backed by intelligent learning models. Artificial intelligence-based feature selection and classification techniques are used to detect cancer at an earlier stage, improve prediction accuracy, and save lives. The experiments use breast, cervical, and lung cancer datasets from the University of California, Irvine (UCI) Machine Learning Repository. We used supervised machine learning approaches to train and validate models on the optimal feature subsets selected by the proposed system. Numerous features may contribute to the occurrence of cancer; although it is difficult to pinpoint the specific environmental and diagnostic features responsible, they still play a role in determining whether cancer occurs. Machine learning algorithms applied to commonly available diagnostic data therefore make it possible to estimate the probability of cancer occurrence. Cancer datasets contain a variety of patient features, but not all of them are useful for cancer prognosis. In such cases, feature selection plays a crucial role in identifying the relevant feature set. In this research, we compare the effects of feature selection approaches on the accuracy of existing machine learning algorithms.
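The feature-ranking step described above can be illustrated with a minimal Python sketch of two of the filter criteria named in this study, Information Gain and Gain Ratio. It is an illustration under stated assumptions, not the authors' implementation: every feature is assumed to be discrete already (numeric attributes would need binning first), Relief-F and One-R are not shown, and the function names and toy columns are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum over v of P(X = v) * H(Y | X = v)."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain divided by the split information H(X); 0 for a constant feature."""
    split_info = entropy(feature)
    if split_info <= 0:
        return 0.0
    return information_gain(feature, labels) / split_info


def rank_features(columns, labels, score=information_gain):
    """Return (feature name, score) pairs sorted from most to least informative."""
    scores = {name: score(values, labels) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: two discretised features and a binary diagnosis label.
    labels = ["malignant", "malignant", "benign", "benign", "benign", "malignant"]
    columns = {
        "cell_size": ["large", "large", "small", "small", "small", "large"],
        "smoker": ["yes", "no", "yes", "no", "yes", "no"],
    }
    for name, ig in rank_features(columns, labels):
        gr = gain_ratio(columns[name], labels)
        print(f"{name}: IG = {ig:.3f}, GR = {gr:.3f}")
```

On this toy data, `cell_size` separates the classes perfectly (IG = 1.000) and is ranked above `smoker` (IG of about 0.082). A filter approach like this keeps the top-ranked features and passes only those to the classifiers.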
To make this comparison, we investigated the following machine learning methods: Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), Hoeffding Tree (HT), and Multi-Layer Perceptron (MLP). Information Gain (IG), Gain Ratio (GR), Relief-F (R-F), and One-R (OR) were evaluated as feature selection strategies. The trained models were validated with 10-fold cross-validation using several performance metrics: accuracy, sensitivity, specificity, F-measure, kappa score, and area under the ROC curve (AUC). The proposed framework achieved accuracies of 100%, 100%, and 91.30% on the breast, cervical, and lung cancer datasets, respectively. Furthermore, this approach may serve as a versatile tool for extracting patterns from clinical trials involving various forms of cancer.
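The evaluation protocol described above can be sketched in plain Python: accuracy, sensitivity, specificity, F-measure, and Cohen's kappa computed from a binary confusion matrix; AUC computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties count one half); and a simple k-fold index splitter for 10-fold cross-validation. This is a hedged sketch, not the authors' code: it assumes the positive (cancer) class is labelled 1, the folds are shuffled but not stratified (the abstract does not say how folds were built), and all function names and example values are hypothetical.

```python
import random


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for a binary task, treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, F-measure and Cohen's kappa from a confusion matrix."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true-positive rate (recall)
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true-negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * sensitivity / (precision + sensitivity)
                 if precision + sensitivity else 0.0)
    # Chance agreement: P(both positive) + P(both negative) if predictions were independent.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    # Kappa is undefined when chance agreement is 1 (everything in one class).
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance < 1 else float("nan")
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f_measure": f_measure, "kappa": kappa}


def roc_auc(y_true, scores, positive=1):
    """AUC as P(score of a random positive > score of a random negative), ties counting 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for shuffled, unstratified k-fold cross-validation."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # every index lands in exactly one fold
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


if __name__ == "__main__":
    # Hypothetical labels, hard predictions and scores, for illustration only.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
    print(binary_metrics(y_true, y_pred))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

For the hypothetical predictions above, the sketch reports accuracy, sensitivity, specificity, and F-measure of 0.75 and a kappa of 0.5, and the AUC example gives 0.75. In a 10-fold setup, each model would be fitted on `train` indices and scored on `test` indices for every fold; whether the per-fold metrics are averaged or computed on pooled predictions is a design choice the abstract does not specify.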








The ADBU Journal of Engineering Technology (AJET), ISSN: 2348-7305

This journal is published under the terms of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/).
