Advanced Machine Learning Strategies for Effective Rare Event Classification: A Comparative Study

dc.contributor.author: Alpay, Olcay
dc.date.accessioned: 2026-04-25T14:11:33Z
dc.date.available: 2026-04-25T14:11:33Z
dc.date.issued: 2025
dc.department: Sinop Üniversitesi
dc.description.abstract: As data science and machine learning continue to evolve, binary event classification has become increasingly important. Logistic Regression (LR) is a standard baseline, yet it can underestimate event probabilities when the event of interest is rare. This study combines a 16-scenario simulation (rarity of 5% and 10%, n ∈ {1000, 5000}, p ∈ {3, 5, 7, 10}, 100 repetitions each) with a real-world application to assess Support Vector Machines (SVM), Random Forest (RF), and Gradient Boosting (GB) as alternatives to LR. Training data were balanced using the SMOTETomek hybrid method, which combines SMOTE oversampling with Tomek-link removal. In the simulation, LR attained the highest balanced performance (G-mean) across scenarios, with GB the closest competitor; SVM lagged behind, and RF yielded the lowest G-mean despite often leading in test accuracy and precision. On a wine dataset adjusted to 5% and 10% rarity, RF and GB achieved the top test accuracy and recall (RF/GB: ACC = 0.998/0.997 and REC = 0.958/0.989 at 5%; ACC = 0.998/0.994 and REC = 0.989/0.977 at 10%), mirroring their strong aggregate accuracy. Overall, the “best” model depends on the target metric: LR or GB when balanced minority–majority performance is critical, and RF when overall accuracy or precision is prioritized.
dc.identifier.doi: 10.17776/csj.1605507
dc.identifier.endpage: 1002
dc.identifier.issn: 2587-2680
dc.identifier.issn: 2587-246X
dc.identifier.issue: 4
dc.identifier.startpage: 990
dc.identifier.trdizinid: 1378574
dc.identifier.uri: https://doi.org/10.17776/csj.1605507
dc.identifier.uri: https://search.trdizin.gov.tr/tr/yayin/detay/1378574
dc.identifier.uri: https://hdl.handle.net/11486/7967
dc.identifier.volume: 46
dc.indekslendigikaynak: TR-Dizin
dc.institutionauthor: Alpay, Olcay
dc.language.iso: en
dc.relation.ispartof: Cumhuriyet Science Journal
dc.relation.publicationcategory: Article - National Peer-Reviewed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/openAccess
dc.snmz: KA_TR_20260420
dc.subject: Computer Science
dc.subject: Artificial Intelligence
dc.title: Advanced Machine Learning Strategies for Effective Rare Event Classification: A Comparative Study
dc.type: Article
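The simulation design and the G-mean criterion summarized in the abstract can be sketched in Python as follows. This is an illustrative sketch, not the author's code. The names `scenario_grid`, `g_mean`, `RARITY`, `SAMPLE_SIZES`, and `N_PREDICTORS` are hypothetical. G-mean is assumed to follow its common definition, the square root of sensitivity (minority-class recall) times specificity, because the record does not reproduce the paper's formula. The usage lines show why accuracy alone can mislead at 5% rarity.

```python
import itertools
import math

# Factor levels taken from the abstract: 2 rarity levels x 2 sample sizes
# x 4 predictor counts = 16 scenarios (each repeated 100 times in the study).
RARITY = (0.05, 0.10)
SAMPLE_SIZES = (1000, 5000)
N_PREDICTORS = (3, 5, 7, 10)


def scenario_grid():
    """Return every (rarity, n, p) combination of the simulation design."""
    return list(itertools.product(RARITY, SAMPLE_SIZES, N_PREDICTORS))


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity.

    Assumes the common definition G-mean = sqrt(TPR * TNR); the rare
    event is treated as the positive class.
    """
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    # Guard against empty classes so the function never divides by zero.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


# Usage: a classifier that always predicts the majority class at 5% rarity.
y_true = [1] * 5 + [0] * 95
always_majority = [0] * 100
accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
print(len(scenario_grid()))              # 16
print(accuracy)                          # 0.95
print(g_mean(y_true, always_majority))   # 0.0
```

The degenerate classifier scores 95% accuracy yet a G-mean of zero, because it never detects a single rare event. This gap is why the abstract separates models that lead on accuracy or precision from those that lead on balanced minority–majority performance.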
