Machine Learning of Predictive Models on Unbalanced Data on Hazardous Asteroids
Abstract
Date received: 21.03.2023
A dataset of asteroids that are potentially hazardous to the Earth is analyzed. A preliminary analysis and data preprocessing are carried out on the basis of descriptive statistics. The correlations between the parameters are used to select the features on which the models are trained. Machine learning models then classify the asteroids in the database as hazardous or non-hazardous; logistic regression, k-nearest neighbors, decision tree, and other methods are applied. Cross-validation is used to find the best method, after which its optimal hyperparameters are determined. The quality of the classifier is evaluated by recall (completeness) and its standard deviation, by the confusion matrix, and by the mean absolute percentage error (MAPE). The results of the analysis and modeling in Python are presented and show that the resulting model predicts with high accuracy.
Keywords: machine learning, predictive model, data analysis, imbalanced data, logistic regression, k-nearest neighbors, decision tree, random forest, support vector machine, cross-validation
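The abstract does not include the authors' code. To illustrate the tuning and evaluation steps it describes, here is a minimal, standard-library-only Python sketch, assuming a hypothetical synthetic dataset in place of the real asteroid table; the helper names (`make_imbalanced_data`, `knn_predict`, `stratified_folds`, `cross_val_recall`) are illustrative and not taken from the article. It uses k-nearest neighbors, one of the methods named above, selects the number of neighbors k by mean recall over stratified 5-fold cross-validation, and reports the standard deviation of recall across folds together with the pooled confusion matrix. It does not reproduce the authors' feature selection, the other classifiers, or the MAPE computation.

```python
import math
import random
from statistics import mean, pstdev


def make_imbalanced_data(n=400, n_pos=40, seed=0):
    """Hypothetical synthetic stand-in for the asteroid table (NOT the real data):
    two numeric features per object; label 1 = 'hazardous' (minority class)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i < n_pos else 0
        cx = 2.0 if label else 0.0  # minority class is shifted in feature space
        X.append((rng.gauss(cx, 1.0), rng.gauss(cx, 1.0)))
        y.append(label)
    return X, y


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return 1 if 2 * sum(train_y[i] for i in nearest) > k else 0


def stratified_folds(y, n_splits=5, seed=0):
    """Split indices into folds that keep the class ratio (important for rare positives)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for cls in (0, 1):
        idx = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % n_splits].append(i)
    return folds


def confusion_matrix(y_true, y_pred):
    """Rows = actual class (0, 1), columns = predicted class (0, 1)."""
    cm = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def recall(cm):
    """Share of actual hazardous objects that were flagged: TP / (TP + FN)."""
    tp, fn = cm[1][1], cm[1][0]
    return tp / (tp + fn) if tp + fn else float("nan")


def cross_val_recall(X, y, k, n_splits=5, seed=0):
    """Mean and standard deviation of per-fold recall, plus the pooled confusion matrix."""
    scores, pooled = [], [[0, 0], [0, 0]]
    for test_idx in stratified_folds(y, n_splits, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        y_true = [y[i] for i in test_idx]
        y_pred = [knn_predict(train_X, train_y, X[i], k) for i in test_idx]
        cm = confusion_matrix(y_true, y_pred)
        scores.append(recall(cm))
        for r in range(2):
            for c in range(2):
                pooled[r][c] += cm[r][c]
    return mean(scores), pstdev(scores), pooled


# Hyperparameter search: keep the k with the highest mean cross-validated recall.
X, y = make_imbalanced_data()
results = {k: cross_val_recall(X, y, k) for k in (1, 3, 5, 7, 9)}
best_k = max(results, key=lambda k: results[k][0])
best_mean, best_std, best_cm = results[best_k]
print(f"best k = {best_k}: recall = {best_mean:.2f} +/- {best_std:.2f}")
print("confusion matrix [[TN, FP], [FN, TP]]:", best_cm)
```

Stratified folds keep the 10% share of positives in every fold, so each per-fold recall is computed on the same number of hazardous objects and the fold-to-fold standard deviation is meaningful. In practice, scikit-learn's `StratifiedKFold`, `cross_val_score(..., scoring="recall")` and `GridSearchCV` cover the same steps for any of the classifiers listed above.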