Loss Functions: The Key to Improving AI Predictions
Loss functions measure how wrong an AI's predictions are. Different loss functions are used for different types of problems (regression or classification).
How Good Is an AI Model at Forecasting?
We can put an actual number on it. In machine learning, a loss function tracks the degree of error in a model's output by quantifying the difference, or loss, between a predicted value and the actual value. If the model's predictions are accurate, the loss is small; if they are inaccurate, the loss is larger.
For example, a colleague built an AI model to forecast how many views his videos would receive on YouTube. The model was fed video titles and predicted the number of views each video would receive in its first week. When comparing the model's forecasts to the actual view counts, the predictions were not very accurate: the model predicted that his cold brew video would bomb and that his pour-over guide would be a hit, but this wasn't the case. This is a hard problem, and loss functions can help improve the model.
Loss functions mathematically define how well a model is performing. By calculating the loss, we can adjust model parameters and observe whether the loss increases (the model gets worse) or decreases (it improves). A machine learning model is considered sufficiently trained when its loss falls below a predefined threshold. At a high level, loss functions fall into two categories: regression loss functions and classification loss functions.
Loss Functions in Regression Models
Regression loss functions measure error in continuous value predictions, such as house prices, temperatures, or YouTube view counts. These functions must capture not just whether a forecast is off, but by how much it diverges from the ground truth.
1. Mean Squared Error (MSE)
The most common regression loss function is Mean Squared Error (MSE), calculated as the average squared difference between the predicted and true values across all training examples.
Squaring the error gives large mistakes a disproportionately heavy impact on overall loss, strongly penalizing outliers.
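As a minimal sketch of the calculation (assuming NumPy; the view counts below are invented for illustration):

```python
import numpy as np

# Hypothetical first-week view counts: actuals vs. model predictions
y_true = np.array([12_000, 45_000, 3_000], dtype=float)
y_pred = np.array([30_000, 20_000, 5_000], dtype=float)

# MSE: average of squared differences; large errors dominate the total
mse = np.mean((y_true - y_pred) ** 2)
print(f"MSE: {mse:,.0f}")
```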
2. Mean Absolute Error (MAE)
MAE, on the other hand, measures the average absolute difference between the predicted and actual values. Unlike MSE, MAE does not square the errors, making it less sensitive to outliers.
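A comparable sketch for MAE, reusing the same invented view counts:

```python
import numpy as np

y_true = np.array([12_000, 45_000, 3_000], dtype=float)
y_pred = np.array([30_000, 20_000, 5_000], dtype=float)

# MAE: average of absolute differences; every error counts linearly
mae = np.mean(np.abs(y_true - y_pred))
print(f"MAE: {mae:,.0f}")
```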
Choosing between MSE and MAE depends on the nature of the data. If there are a few extreme outliers, such as temperature ranges in July in the southern U.S., MSE is a good choice since it heavily penalizes large deviations. However, if the data contains outliers that should not overly influence the model, such as occasional surges in product sales, MAE is a better option.
3. Huber Loss
Huber loss provides a compromise between MSE and MAE, acting like MSE for small errors and MAE for large ones. This makes it useful when large errors should be penalized, but not too harshly.
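Here is one way Huber loss could be written by hand; the delta threshold below is an arbitrary choice for the example:

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    """Quadratic for errors within delta, linear beyond it."""
    error = y_true - y_pred
    is_small = np.abs(error) <= delta
    squared = 0.5 * error ** 2                      # MSE-like region
    linear = delta * (np.abs(error) - 0.5 * delta)  # MAE-like region
    return np.mean(np.where(is_small, squared, linear))

y_true = np.array([12_000, 45_000, 3_000], dtype=float)
y_pred = np.array([30_000, 20_000, 5_000], dtype=float)
print(huber_loss(y_true, y_pred, delta=5_000))
```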
For the YouTube example, MAE came out to an average prediction error of 16,000 views per video. MSE skyrocketed to over 400 million because the large errors were squared. Huber loss also indicated poor predictions but offered a more balanced perspective, penalizing the large errors less severely than MSE. However, these loss values only become meaningful when used to adjust model parameters and observe whether the loss improves.
Loss Functions in Classification Models
Classification loss functions, in contrast to regression loss functions, measure accuracy in categorical predictions. These functions assess how well predicted probabilities or labels match actual categories, such as determining whether an email is spam or not.
1. Cross-Entropy Loss
Cross-entropy is the most widely used classification loss function. It measures how far the model's predicted probability distribution is from the actual distribution of outcomes. Entropy, in this context, represents uncertainty: a coin flip has low entropy, while rolling a six-sided die has higher entropy. Cross-entropy loss compares the model's predicted probabilities to the ground truth labels, heavily penalizing confident but wrong predictions.
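A minimal sketch of binary cross-entropy, with made-up spam probabilities:

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    # Clip probabilities so log(0) never occurs
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# Spam (1) vs. not spam (0); the last two predictions are confidently wrong
y_true = np.array([1, 0, 1, 0], dtype=float)
y_prob = np.array([0.9, 0.1, 0.2, 0.8])
print(binary_cross_entropy(y_true, y_prob))  # roughly 0.86
```

Note how the two confident mistakes contribute far more loss than the two near-correct predictions.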
2. Hinge Loss (Used in SVMs)
Another classification loss function is hinge loss, commonly used in support vector machines (SVMs). Hinge loss rewards confident correct predictions and aims to maximize the margin between classes, which makes it particularly useful in binary classification tasks where the distinction between classes must be clear.
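A short sketch of hinge loss, assuming labels encoded as -1/+1 and raw (unsquashed) model scores:

```python
import numpy as np

def hinge_loss(y_true, scores):
    # Zero loss only when the prediction is correct with a margin of at least 1
    return np.mean(np.maximum(0.0, 1.0 - y_true * scores))

y_true = np.array([1, -1, 1], dtype=float)
scores = np.array([0.8, -2.0, -0.3])  # third example is misclassified
print(hinge_loss(y_true, scores))  # 0.5
```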
Calculating the loss serves as a guide for improving the model. Loss values indicate how far predictions are from actual results, enabling adjustments through optimization. The loss function acts as a feedback mechanism that directs the learning process: lower loss means better alignment between predictions and true outcomes. After adjusting the YouTube prediction model, the new forecasts produced lower loss values across all three functions, with the greatest improvement in MSE, as the model reduced the large prediction error for the pour-over video.
Loss functions not only evaluate model performance but also drive model training through optimization techniques like gradient descent. Gradient descent computes the gradient (slope) of the loss function with respect to each model parameter, which gives the direction in which to adjust that parameter to reduce the loss. The model updates its weight and bias terms iteratively until the loss is sufficiently minimized.
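To make the mechanics concrete, here is a minimal sketch of gradient descent fitting a one-variable linear model on synthetic data (all numbers are invented):

```python
import numpy as np

# Synthetic data generated from y = 3x + 5 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 5.0 + rng.normal(0.0, 1.0, size=100)

w, b, lr = 0.0, 0.0, 0.01
for _ in range(1_000):
    error = (w * x + b) - y
    # Gradients of MSE with respect to w and b
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    w -= lr * grad_w  # step against the gradient
    b -= lr * grad_b

print(f"w = {w:.2f}, b = {b:.2f}")  # should approach 3.0 and 5.0
```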
Conclusion
In summary, a loss function serves as both a scorekeeper that measures model performance and a guide that directs learning. Thanks to loss functions, my colleague can continue tweaking his YouTube AI model to minimize loss and improve prediction accuracy.