Leveraging Machine Learning for Corporate Fraud Detection: A Random Forest Study
Abstract
Corporate fraud often inflicts significant losses on stakeholders and society. This study therefore constructs a model that predicts corporate fraud, with the goal of providing early warning of potential fraudulent activity. The sample consists of listed companies in Taiwan that committed fraud, matched with non-fraudulent companies at a ratio of 1:2. To capture the factors that contribute to fraud comprehensively, 53 indicators are selected from four dimensions: financial statements, corporate governance, market transactions, and the overall economy. The study further divides fraud methods into financial statement fraud and non-financial statement fraud (i.e., hollowing out, misappropriating assets, or manipulating stock prices) and applies two machine learning techniques, decision tree and random forest, for prediction and analysis. The empirical results indicate that: (1) the random forest method, based on ensemble learning, achieves higher prediction accuracy than the decision tree model, and accuracy improves further when fraud methods are distinguished; (2) the type I error of the random forest model is zero, meaning that no non-fraudulent company in the sample was flagged as fraudulent, so every company the model predicted as fraudulent did commit fraud in the following year; and (3) the detailed techniques of fraud evolve structurally over time, which leads to a relatively high type II error.
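The error measures named in results (2) and (3) can be made concrete with a short sketch. It assumes the convention implied by the abstract: a type I error is a non-fraudulent company flagged as fraudulent, and a type II error is a fraudulent company the model misses. The function name `error_rates` and the label vectors are hypothetical illustrations, not the study's data or code.

```python
from typing import Dict, Sequence


def error_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, type I and type II error rates.

    Labels: 1 = fraudulent, 0 = non-fraudulent.
    Assumed convention (inferred from the abstract, not confirmed by it):
      type I  = share of non-fraudulent firms flagged as fraudulent
      type II = share of fraudulent firms the model fails to flag
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one observation is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean, not flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # clean, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud, missed

    n_clean = tn + fp
    n_fraud = tp + fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        "type_i": fp / n_clean if n_clean else 0.0,
        "type_ii": fn / n_fraud if n_fraud else 0.0,
    }


# Hypothetical sample at the 1:2 matching ratio: 3 fraudulent, 6 non-fraudulent.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0]
# Hypothetical predictions: one fraud caught, two missed, no false alarms.
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0]

rates = error_rates(y_true, y_pred)
print(rates)
```

In this made-up sample the type I error is zero because no clean company is flagged, while two of the three fraudulent companies go undetected, which gives a high type II error. That is the same qualitative pattern the abstract reports for the random forest model.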