BioScience Trends. 2026;20(1):80-90. (DOI: 10.5582/bst.2025.01323)
Predicting non-alcoholic fatty liver disease (NAFLD) using machine learning algorithms: Evidence from a large-scale community cohort in Taiwan
Lin TC, Wei YJ, Liang PC, Tsai PC, Lin YH, Hsieh MH, Jang TY, Wang CW, Hsieh MY, Lin ZY, Yeh ML, Huang JF, Huang CF, Chuang WL, Yu ML, Dai CY, Shi HY
Closely associated with metabolic disorders, non-alcoholic fatty liver disease (NAFLD) substantially increases the risk of hepatocellular carcinoma. This study aimed to apply machine learning (ML) algorithms to a community-based cohort in southern Taiwan to identify key risk factors for NAFLD and to develop predictive models with clinical applicability. Data were derived from community health examinations, and eighteen clinical and demographic features were analyzed. Five ML algorithms were evaluated: logistic regression (LR), random forest (RF), K-nearest neighbors (KNN), adaptive boosting (AdaBoost), and extreme gradient boosting (XGBoost). Model performance was assessed using accuracy, precision, recall, F1 score, and area under the receiver operating characteristic curve (AUROC). A total of 7,510 participants were included (38.8% male; mean age 50.9 ± 15.0 years). The dataset was randomly divided into training (80%) and testing (20%) subsets, with no significant differences observed between groups in most independent variables. The Synthetic Minority Over-sampling Technique (SMOTE) was employed to balance NAFLD and non-NAFLD groups in the training dataset. Among all models, XGBoost achieved the highest performance, with an accuracy of 83.48%, precision of 84.31%, recall of 81.21%, F1 score of 82.72%, and AUROC of 92.85%. Feature importance analysis identified low-density lipoprotein cholesterol (LDL-C), body mass index (BMI), waist circumference, fasting plasma glucose (FPG), and triglycerides (TG) as the most influential predictors of NAFLD. ML algorithms, particularly XGBoost, demonstrated high accuracy in predicting NAFLD and effectively identified key clinical predictors. These findings may enhance early diagnosis and facilitate the development of targeted intervention strategies in the management of NAFLD.
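The evaluation metrics named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' code: the function names `classification_metrics` and `auroc`, and the toy counts and scores, are hypothetical and chosen only for illustration. The only figures taken from the abstract are the reported precision (84.31%) and recall (81.21%) for XGBoost, which are used to recompute F1 as their harmonic mean.

```python
from typing import Dict, Sequence


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        # F1 is the harmonic mean of precision and recall.
        "f1": 2 * precision * recall / (precision + recall),
    }


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy inputs for illustration only; NOT data from the study.
toy = classification_metrics(tp=40, fp=10, fn=10, tn=40)
toy_auc = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])

# Recompute F1 (in percent) from the precision and recall reported in the abstract.
reported_f1 = 2 * 84.31 * 81.21 / (84.31 + 81.21)
```

Recomputing F1 from the rounded precision and recall gives approximately 82.73%, which agrees with the reported 82.72% to within the rounding of the published inputs.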






