Bayesian Optimization Cost-Sensitive XGBoost Learning Algorithm for Imbalanced Data in Semiconductor Industry
Haziqah Shamsudin, Umi Kalsom Yusof, Fizza Kashif, Iza Sazanita Isa | Pages: 552-565|

Abstract— This paper proposes an improved ensemble learning model based on extreme gradient boosting (XGBoost) with Bayesian optimization cost-sensitive learning algorithm for dealing with highly imbalanced data in the semiconductor process to achieve the highest possible pass and fail accuracy or recall for the classification performances. Most of the existing models are biased toward the majority class neglecting the minority class. The proposed Bayesian optimization cost-sensitive XGboost model is configured to be applied to the semiconductor dataset. The obtained experimental results – based on benchmarking semiconductor industry dataset – show 91.46% and 23.08% for the pass and fail accuracies, respectively. This confirms that the proposed model is significant for imbalanced cases in semiconductor applications. Moreover, this investigation reveals that the proposed model is able not only to maintain the performance of the majority class, but also to classify well the minority class.