Advanced Machine Learning Strategies for Effective Rare Event Classification: A Comparative Study

[ X ]

Tarih

2025

Dergi Başlığı

Dergi ISSN

Cilt Başlığı

Yayıncı

Erişim Hakkı

info:eu-repo/semantics/openAccess

Özet

As data science and machine learning continue to evolve, binary event classification has become increasingly important. Logistic Regression (LR) is a standard baseline, yet it can underestimate probabilities in rare-event settings. This study combines a 16-scenario simulation (rarity 5–10%, n∈ {1000,5000}, p∈ {3,5,7,10}, 100 repeats) with a real-world application to assess Support Vector Machines (SVM), Random Forest (RF), and Gradient Boosting (GB) as alternatives to LR. Training data were balanced using the SMOTETomek hybrid method. In simulation, LR attained the highest balanced performance (G_mean) across cases, with GB the closest competitor; SVM lagged, and RF yielded the lowest G_mean despite often leading test accuracy and precision. On a wine dataset adjusted to 5% and 10% rarity, RF/GB achieved top test accuracy/recall (e.g., ACC=0.998/0.997 with REC=0.958/0.989 at 5% and ACC=0.998/0.994 with REC=0.989/0.977 at 10%), mirroring their strong aggregate accuracy. Overall, the “best” model depends on the target metric: LR/GB when balanced minority–majority performance is critical, and RF when overall accuracy/precision is prioritized.

Açıklama

Anahtar Kelimeler

Bilgisayar Bilimleri, Yapay Zeka

Kaynak

Cumhuriyet Science Journal

WoS Q Değeri

Scopus Q Değeri

Cilt

46

Sayı

4

Künye