Default Risk Identification of Chinese Corporate Bonds Using Interpretable Machine Learning
Published in Computational Economics, 2026
Abstract: Research on identifying corporate bond defaults in China faces several challenges, including imbalanced default samples, complex hyperparameter configurations, and limited interpretability of the identification model. To address these issues, this study employs an interpretable machine learning framework that processes the data efficiently. Using samples of Chinese defaulted bonds from 2014 to 2024, the framework first applies the Synthetic Minority Over-sampling Technique (SMOTE) to alleviate the classification bias caused by sample imbalance. The Light Gradient Boosting Machine (LightGBM) is then used for feature selection and default identification, with its hyperparameters tuned by the multi-objective Non-dominated Sorting Genetic Algorithm II (NSGA-II) to improve the model’s generalization capability. Finally, the SHapley Additive exPlanations (SHAP) method is used to interpret the marginal contribution of each default factor to the identification outcomes. Experimental results show that the proposed model achieves an average identification accuracy of over 82.88% across four prediction windows, with an average efficiency metric of 93.54%. SHAP analysis further reveals that risk factors such as the cash asset ratio play a critical role in default identification in China’s bond market. These findings confirm that the proposed approach not only makes the influence of key risk factors on the model’s decisions interpretable but also offers regulatory authorities a scientific basis for policymaking, supporting the development of targeted regulatory frameworks and proactive intervention in high-risk bonds.
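
The oversampling step named in the abstract can be illustrated with a minimal, standard-library sketch of SMOTE-style interpolation. This is not the authors' implementation: the function name `smote`, its parameters, and the toy feature values are assumptions made for illustration only, and a real pipeline would more likely rely on an established library. The idea shown is that each synthetic default sample is placed at a random point on the line segment between a real minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by SMOTE-style interpolation.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other rows
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # The new row lies on the segment between base and neighbour.
        u = rng.random()
        synthetic.append([b + u * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up toy data: two financial ratios for four defaulted bonds.
defaulted = [[0.05, 1.8], [0.08, 2.1], [0.04, 2.4], [0.07, 1.9]]
new_rows = smote(defaulted, n_new=6, k=2)
```

Because every synthetic row is a convex combination of two real default samples, it stays inside the range of the observed minority data, which is why the technique enlarges the default class without inventing values outside it.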
