Authors: Guney, Ezgi; Demir, Memnun
Dates: 2026-04-25; 2026-04-25; 2026
ISSN: 2193-567X; 2191-4281
DOI: https://doi.org/10.1007/s13369-026-11143-y
Handle: https://hdl.handle.net/11486/8235

Abstract: Accurate short-term estimation of electrical power output in biomass power plants remains challenging due to the nonlinear and dynamically coupled nature of thermochemical conversion processes, fuel heterogeneity, and pronounced thermal inertia. Conventional physics-based models, while effective for steady-state analysis, often fail to capture the high-frequency dynamics required for real-time monitoring and decision-support applications. This study proposes a data-driven framework for short-term power output estimation using high-resolution Supervisory Control and Data Acquisition (SCADA) data collected from an operational industrial biomass power plant. A large-scale SCADA dataset comprising several hundred thousand time-stamped records is used to model the relationship between seven key thermodynamic and operational variables and net electrical power output. Multi-layer perceptron (MLP), random forest (RF), gradient boosting regressor (GBR), and support vector regression (SVR) are evaluated under two distinct validation strategies: (i) a conventional random train-test split and (ii) a temporally blocked cross-validation scheme preserving causal order.
Under random sampling, RF attains the highest apparent accuracy (R² = 0.9687), whereas MLP exhibits lower performance (R² = 0.8492), highlighting sensitivity to instantaneous regression assumptions. When temporal continuity is enforced, predictive performance improves consistently across all models. In the blocked validation stage, GBR and RF achieve R² values of 0.9983 and 0.9973, respectively, while MLP demonstrates a substantial performance increase (R² = 0.9865). Time-domain analysis further reveals that ensemble-based models provide smoother tracking of short-term fluctuations, whereas temporally aligned evaluation significantly improves the physical consistency of neural network predictions.
These results demonstrate that temporally consistent validation is essential for reliable SCADA-based modeling of biomass power generation and provide a practical foundation for real-time monitoring and decision-support applications in industrial biomass power plants.

Language: en
Rights: info:eu-repo/semantics/openAccess
Keywords: Biomass; SCADA; Machine learning; Temporal information leakage; Short-term power forecasting
Title: Time-Aware Machine Learning for Biomass Power Output Estimation Using SCADA Data
Type: Article
DOI: 10.1007/s13369-026-11143-y
Scopus: 2-s2.0-105030700174 (Q1)
Web of Science: WOS:001694945300001 (Q2)
ORCID: 0000-0003-4868-0626; 0000-0002-4228-9637
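
The abstract contrasts a random train-test split with a temporally blocked scheme that preserves causal order, but it does not specify how the blocks are formed. The sketch below is one possible reading, not the authors' procedure: it assumes an expanding-window layout in which each test block comes strictly after its training data. The function names (`random_split`, `blocked_time_splits`, `adjacent_leak_fraction`) and all parameter values are hypothetical and chosen only for illustration. The last helper gives a rough indication of temporal leakage by counting test samples whose immediate time neighbor sits in the training set.

```python
import random


def random_split(n_samples, test_fraction=0.2, seed=0):
    """Random hold-out split over time-ordered sample indices (ignores time order)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    test = sorted(indices[:n_test])
    train = sorted(indices[n_test:])
    return train, test


def blocked_time_splits(n_samples, n_blocks=5):
    """Yield (train, test) index lists in an expanding-window layout.

    Assumption: the series is cut into n_blocks + 1 contiguous blocks;
    split k trains on everything before block k and tests on block k,
    so every test sample is later in time than every training sample.
    """
    block_size = n_samples // (n_blocks + 1)
    if block_size == 0:
        raise ValueError("n_samples is too small for the requested n_blocks")
    for k in range(1, n_blocks + 1):
        train_end = k * block_size
        # The final block absorbs any remainder left by integer division.
        test_end = n_samples if k == n_blocks else train_end + block_size
        yield list(range(train_end)), list(range(train_end, test_end))


def adjacent_leak_fraction(train, test):
    """Fraction of test samples with an immediate time neighbor in the training set."""
    train_set = set(train)
    hits = sum(1 for i in test if (i - 1) in train_set or (i + 1) in train_set)
    return hits / len(test)


if __name__ == "__main__":
    n = 1000  # illustrative size, far smaller than the dataset in the abstract
    train, test = random_split(n)
    print("random split leak fraction:", adjacent_leak_fraction(train, test))
    for fold, (tr, te) in enumerate(blocked_time_splits(n, n_blocks=4), start=1):
        print("blocked fold", fold, "leak fraction:", adjacent_leak_fraction(tr, te))
```

With densely sampled SCADA records, a randomly held-out sample almost always has a training sample right next to it in time, whereas in the blocked layout only the first sample of each test block does. The index lists can be used to slice any time-ordered feature matrix before fitting a model.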