A machine learning (ML) model's accuracy is the percentage of predictions the model got right. To see what accuracy means in practice, let's walk through a simple binary classification example.
We wish to predict a binary target variable Y (either 0 or 1) for a given input X, using two different classifiers, classifier1 and classifier2. For a new data point x_i, classifier1 predicts y_i = 1 with probability 0.8 (Pr(y_i=1|x_i) = 0.8), while classifier2 predicts it with probability 0.7 (Pr(y_i=1|x_i) = 0.7).
If in reality y_i = 1, classifier1 is more accurate than classifier2, since its prediction was closer to the true label. However, if the real value of y_i is 0, classifier1 is less accurate than classifier2, since it was more confident in a false prediction.
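This comparison can be sketched in a few lines of Python. The probabilities below are the ones from the example above; the error measure (absolute distance between the predicted probability and the true label) is one simple, illustrative choice, not a standard named metric.

```python
# Predicted probabilities Pr(y_i = 1 | x_i) from the example.
p1 = 0.8  # classifier1
p2 = 0.7  # classifier2

def prediction_error(p, y_true):
    """Absolute distance between the predicted probability of
    class 1 and the actual binary label (0 or 1)."""
    return abs(y_true - p)

# Case 1: the true label is 1 -> classifier1 is closer (error 0.2 vs 0.3).
print(round(prediction_error(p1, 1), 2), round(prediction_error(p2, 1), 2))

# Case 2: the true label is 0 -> classifier1 is further off (error 0.8 vs 0.7).
print(round(prediction_error(p1, 0), 2), round(prediction_error(p2, 0), 2))
```

In practice, proper scoring rules such as log loss or the Brier score formalize this idea of penalizing confident wrong predictions.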
What about the mathematical expression of binary classification accuracy? It can be defined in terms of the four possible prediction outcomes:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

(TP = True Positives, TN = True Negatives, FP = False Positives, and FN = False Negatives)
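The formula translates directly into code. The confusion-matrix counts below are made up purely for illustration:

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions: (TP + TN) over all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts: 90 true positives, 5 true negatives,
# 3 false positives, 2 false negatives -> 95 correct out of 100.
print(accuracy(tp=90, tn=5, fp=3, fn=2))  # 0.95
```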
Machine learning models are often used to make important business decisions. For example, they can be used to decide whether a client qualifies for a bank loan, to predict stock prices, or to detect fraud and cancer. The cost of possible errors can be huge, but it can be reduced significantly by optimizing model accuracy.
Accuracy seems like a good metric for optimizing the final machine learning model, but wait a minute! What happens when we are dealing with a class-imbalanced data set, where there is a significant disparity between the number of positive and negative labels? In this case, accuracy doesn't tell the entire story, and we need to factor in other metrics such as the F1-score.
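A small numeric sketch shows how accuracy can mislead on imbalanced data. The counts below are hypothetical: a data set with 990 negatives and only 10 positives, where the classifier catches just 2 of the 10 positives.

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical imbalanced case: 990 negatives, 10 positives.
# The classifier finds only 2 positives (tp=2, fn=8) and makes
# no false alarms (fp=0, tn=990).
print(accuracy(tp=2, tn=990, fp=0, fn=8))  # 0.992 -> looks excellent
print(f1_score(tp=2, fp=0, fn=8))          # ~0.33 -> reveals the problem
```

Accuracy is 99.2% because the majority class dominates, yet the F1-score of roughly 0.33 exposes that the model misses most of the positive cases (recall is only 0.2).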