Outlier Detection in Machine Learning: Techniques, Methods, and How to Handle Overfitting and Underfitting

Ismat Samadov
5 min read · Oct 25, 2024

Outlier detection plays a crucial role in machine learning, especially in data preprocessing, anomaly detection, and improving model performance.

Outliers can disrupt the predictive accuracy of models, leading to overfitting or underfitting, and affecting overall reliability.

This article explores the fundamentals of outlier detection, reviews popular techniques, and addresses methods to manage overfitting and underfitting in machine learning models.

What Are Outliers?

Outliers are data points that differ significantly from the majority of the dataset. They can arise from measurement errors, data entry mistakes, or genuine but rare occurrences within the data. Outliers can skew the training process and produce models that either fit the noise and fail to generalize (overfitting) or are too simple to capture the underlying patterns (underfitting).
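To make the skewing effect concrete, here is a minimal sketch in Python with NumPy. The data is a hypothetical toy example (it is not from any real dataset): ten points lying exactly on the line y = 2x + 1, with one value then corrupted to mimic a data entry error. The helper name fit_line is illustrative only.

```python
import numpy as np

# Hypothetical toy data: points that lie exactly on y = 2x + 1
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0

# Corrupt a single point to simulate an outlier (e.g. a data entry error)
y_outlier = y.copy()
y_outlier[9] = 100.0  # the uncorrupted value would be 19.0


def fit_line(x_values, y_values):
    """Ordinary least-squares straight-line fit; returns (slope, intercept)."""
    slope, intercept = np.polyfit(x_values, y_values, deg=1)
    return slope, intercept


clean_slope, clean_intercept = fit_line(x, y)
skewed_slope, skewed_intercept = fit_line(x, y_outlier)

print(f"slope without outlier: {clean_slope:.2f}")
print(f"slope with outlier:    {skewed_slope:.2f}")

# The mean shifts with the outlier, while the median is unaffected here
print(f"mean:   {np.mean(y):.2f} -> {np.mean(y_outlier):.2f}")
print(f"median: {np.median(y):.2f} -> {np.median(y_outlier):.2f}")
```

In this sketch a single corrupted value out of ten is enough to pull the fitted slope far away from 2, while the median of the targets does not move. This is one reason robust statistics are often preferred when outliers are suspected.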

Types of Outliers

  1. Global Outliers: Data points that are far removed from the overall distribution of the dataset.
  2. Contextual Outliers: Points that are only considered outliers in a specific context, such as time-series…
