Predicting Corporate ESG Scores from Financial Performance and Environmental Indicators: A Machine Learning Framework
Abstract
As investors, regulators, and the public place growing emphasis on sustainable investment amid rising climate concerns, accurate prediction of Environmental, Social, and Governance (ESG) metrics has become a crucial complement to traditional assessment methods. This study analyzes 1,000 companies across nine industries and seven regions between 2015 and 2025 to predict overall ESG scores from key financial and environmental indicators. To ensure robust predictive performance, a diverse set of machine learning algorithms was employed: Linear Regression, Random Forests, and four boosting models (AdaBoost, LightGBM, XGBoost, and CatBoost). To address potential bias in panel data, a panel-aware machine learning framework incorporating GroupKFold cross-validation was implemented. The results show that boosting algorithms consistently outperform traditional linear approaches in predicting ESG scores. Among them, CatBoost achieved the best overall performance, with the lowest RMSE (4.608), MAE (2.222), and MSE (21.234) and the highest R² (0.913), indicating strong predictive accuracy. Overall, this study presents an innovative and transferable framework for predicting ESG scores, contributing to both empirical research and quantitative modeling practice. It also advances the sustainability field by providing a machine learning-based application that enables companies to predict their ESG scores in real time.
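The panel-aware validation and the four reported metrics can be illustrated with a minimal sketch. It is not the study's implementation: it assumes observations are grouped by company (the abstract does not state the grouping key), and it reimplements the group-wise split in plain Python rather than calling scikit-learn's `GroupKFold`, so that it runs without dependencies. The names `group_k_fold`, `regression_metrics`, and `company_ids` are hypothetical.

```python
import math
from collections import defaultdict


def group_k_fold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs in which no group spans both sides."""
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    if n_splits > len(by_group):
        raise ValueError("n_splits cannot exceed the number of distinct groups")
    # Greedy balancing: place the largest groups first into the smallest fold.
    folds = [[] for _ in range(n_splits)]
    for g in sorted(by_group, key=lambda g: (-len(by_group[g]), str(g))):
        smallest = min(range(n_splits), key=lambda k: len(folds[k]))
        folds[smallest].extend(by_group[g])
    for k in range(n_splits):
        test = sorted(folds[k])
        train = sorted(i for j in range(n_splits) if j != k for i in folds[j])
        yield train, test


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R² (assumes y_true is not constant)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2}


# Hypothetical toy panel: 4 companies observed for 3 years each.
company_ids = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D"]
splits = list(group_k_fold(company_ids, n_splits=2))
for train_idx, test_idx in splits:
    train_companies = {company_ids[i] for i in train_idx}
    test_companies = {company_ids[i] for i in test_idx}
    # A company's years never appear on both sides of a split.
    assert train_companies.isdisjoint(test_companies)

toy_metrics = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

Keeping all of a company's yearly observations in the same fold prevents a model from being scored on a company it has already seen in training, which is the kind of optimistic bias a plain random split can introduce in panel data. RMSE is the square root of MSE, and the reported values are consistent with this: 4.608² ≈ 21.234.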