Handling Overfitting in Machine Learning
Overfitting is one of the key challenges in Machine Learning. This occurs when model performance on training data is significantly better than unseen data which implies that instead of learning general patterns about the data, model learnt about the noise or anomalies specific to training data. In this article, we will learn about the ways to detect & prevent overfitting.
What is overfitting?
As shown in the graph above, if a model learns about the patterns specific to training data — such as wavy green line in this case — there is a good possibility that it will not perform as good on test data since pattern is not general enough. Now take the example of black line — chances of black line being a general trend is very high. We can also say that even though variance of error in green line would be low — it will have a significant bias.