Implementation of K-Nearest Neighbour (KNN) Algorithm and Random Forest Algorithm in Identifying Diabetes

Virly Diranisha; Agung Triayudi; Ratih Titi Komalasari

doi:10.58905/saga.v2i2.253

Authors

Virly Diranisha, Informatics Study Program, Faculty of Communication and Information Technology, Universitas Nasional, Jakarta, Indonesia
Agung Triayudi, Informatics Study Program, Faculty of Communication and Information Technology, Universitas Nasional, Jakarta, Indonesia
Ratih Titi Komalasari, Informatics Study Program, Faculty of Communication and Information Technology, Universitas Nasional, Jakarta, Indonesia

Keywords:

classification, comparison, Diabetes, K-Nearest Neighbours, Random Forest

Abstract

Diabetes, a noncommunicable disease (NCD), is currently a major health threat worldwide. Until now, diabetes has often been identified from known physical symptoms alone, without supporting factual evidence or other medical considerations. Advances in technology make it possible to apply algorithms to a wide range of problems. Machine learning, a branch of artificial intelligence (AI), concentrates on creating systems that can learn from data. This research uses the K-Nearest Neighbour (KNN) and Random Forest algorithms as the algorithms tested for identifying diabetes. Classification is based on the training data provided in the dataset. The purpose of this research is to determine which of the two algorithms, KNN or Random Forest, classifies diabetes better, and it is expected to provide a better understanding of how machine learning models are implemented. The data were divided into testing and training sets at a ratio of 20%:80%, with the data randomised 300 times. The evaluation results obtained from the confusion matrix show that the Random Forest algorithm performs best, with an accuracy of 77%, precision of 89%, recall of 78% and F1-score of 83% using 100 trees (estimators). The KNN algorithm obtained an accuracy of 73%, precision of 87%, recall of 73% and F1-score of 79% at K = 7. The comparison shows that the Random Forest algorithm achieves higher accuracy than the KNN algorithm, although the difference between the two is small.
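
The evaluation pipeline summarised in the abstract (an 80%:20% split, a KNN vote over K = 7 neighbours, and accuracy, precision, recall and F1-score read off a confusion matrix) can be illustrated with a minimal sketch. This is not the authors' code, which is not shown here. The function names `split_data`, `knn_predict`, `confusion_counts` and `scores` are hypothetical, the Euclidean distance is an assumption, and the Random Forest model is omitted for brevity (a library implementation such as scikit-learn's `RandomForestClassifier(n_estimators=100)` would be a typical choice).

```python
import math
import random
from collections import Counter


def split_data(rows, labels, test_ratio=0.2, seed=0):
    """Shuffle the indices and hold out `test_ratio` of the rows for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(rows) * test_ratio)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    train = ([rows[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([rows[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test


def knn_predict(train_x, train_y, x, k=7):
    """Return the majority label among the k nearest training rows.

    Euclidean distance is an assumption; the abstract does not name a metric.
    """
    nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn


def scores(tp, fp, fn, tn):
    """Derive accuracy, precision, recall and F1-score from the four counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

To use the sketch, call `split_data` on the feature rows and labels, predict each held-out row with `knn_predict(train_x, train_y, row, k=7)`, and pass the true and predicted labels through `confusion_counts` and `scores`. Repeating this with different `seed` values gives one way to average results over many random splits.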
