Data Preprocessing

Prepare and clean data for machine learning models

55 min | By Priygop Team | Last updated: Feb 2026

What is Data Preprocessing?

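Data preprocessing is the step in a machine learning workflow that cleans, transforms, and prepares raw data so that it can be analyzed and used to train a model. This topic covers two common tasks: handling missing values and scaling features.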

Handling Missing Data

Example
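import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer

# Create sample data with missing values
data = {
    'A': [1, 2, np.nan, 4, 5],
    'B': [1.1, np.nan, 3.3, 4.4, 5.5],
    'C': ['a', 'b', 'c', np.nan, 'e']
}
df = pd.DataFrame(data)

# Count missing values in each column
print(df.isnull().sum())

# Fill missing values column by column, working on a copy
num_cols = ['A', 'B']  # numerical columns
cat_cols = ['C']       # categorical column
df_filled = df.copy()
# Numerical columns: fill with the column mean
df_filled[num_cols] = df[num_cols].fillna(df[num_cols].mean())
# Categorical columns: fill with the most frequent value (mode)
df_filled[cat_cols] = df[cat_cols].fillna(df[cat_cols].mode().iloc[0])

# Using SimpleImputer (the 'mean' strategy only works on numerical columns,
# so column C is left unchanged here)
imputer = SimpleImputer(strategy='mean')
df_imputed = df.copy()
df_imputed[num_cols] = imputer.fit_transform(df[num_cols])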
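
Numerical and categorical columns need different fill strategies. One possible way to bundle both into a single step is scikit-learn's ColumnTransformer. The sketch below is illustrative only: it assumes columns A and B are numerical and C is categorical, and the names num_cols, cat_cols, and preprocessor are hypothetical choices rather than part of the lesson.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Same sample data as in the example above
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [1.1, np.nan, 3.3, 4.4, 5.5],
    'C': ['a', 'b', 'c', np.nan, 'e'],
})

num_cols = ['A', 'B']  # assumed numerical columns
cat_cols = ['C']       # assumed categorical column

# Apply a different imputation strategy to each group of columns
preprocessor = ColumnTransformer([
    ('num', SimpleImputer(strategy='mean'), num_cols),
    ('cat', SimpleImputer(strategy='most_frequent'), cat_cols),
])

# The output columns follow the order of the transformers listed above
imputed = preprocessor.fit_transform(df)
df_imputed = pd.DataFrame(imputed, columns=num_cols + cat_cols)
print(df_imputed)
```

Because the result mixes numbers and strings, the rebuilt DataFrame holds generic object columns. Converting A and B back to floats with astype(float) may be needed before further numerical work.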

Feature Scaling

Example
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split

# Sample data: 100 rows, 3 features, binary labels
X = np.random.randn(100, 3)
y = np.random.randint(0, 2, 100)

# Split data before scaling so test rows do not influence the scaler
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardization (Z-score normalization): fit on the training set only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # reuse the training mean and std

# Min-Max scaling: fit on the training set only
minmax_scaler = MinMaxScaler()
X_train_minmax = minmax_scaler.fit_transform(X_train)
X_test_minmax = minmax_scaler.transform(X_test)  # reuse the training min and max
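
To make the two scalers less of a black box, the sketch below recomputes them by hand and compares the results with scikit-learn. Standardization is z = (x - mean) / std and Min-Max scaling is (x - min) / (max - min), with every statistic taken from the training set. The synthetic data and the variable names here are assumptions for illustration, not part of the lesson.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Hypothetical data: features centred near 10 with a spread of about 3
rng = np.random.default_rng(0)
X_train = rng.normal(loc=10.0, scale=3.0, size=(80, 3))
X_test = rng.normal(loc=10.0, scale=3.0, size=(20, 3))

# All statistics are computed per column from the training set only
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)      # population std (ddof=0), as StandardScaler uses
col_min = X_train.min(axis=0)
col_max = X_train.max(axis=0)

# Apply the training statistics to the test set
X_test_z = (X_test - mean) / std
X_test_01 = (X_test - col_min) / (col_max - col_min)

# Compare the manual results with scikit-learn's scalers
std_ok = np.allclose(X_test_z, StandardScaler().fit(X_train).transform(X_test))
minmax_ok = np.allclose(X_test_01, MinMaxScaler().fit(X_train).transform(X_test))
print(std_ok, minmax_ok)
```

Since the minimum and maximum come from the training set, Min-Max scaled test values can fall slightly outside the range 0 to 1. That is expected and does not mean the scaler was applied incorrectly.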
