Pair Plots & Scatter Analysis

Pair plots (also called scatter-plot matrices) show the relationship between every pair of features in a single figure. They help reveal which feature pairs separate the classes cleanly (useful for classification), whether relationships look linear or non-linear, where clusters and natural groupings appear, and which points are outliers. They are a practical way to build feature-selection intuition before training a model.

15 min•By Priygop Team•Updated 2026

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

# PAIR PLOT -- classic starting point
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)

# sns.pairplot -- full scatter matrix
pair_plot = sns.pairplot(
    df,
    hue="species",
    diag_kind="kde",           # diagonal: kernel density estimate
    plot_kws={"alpha": 0.6},
    palette="tab10",
)
# .figure is the current accessor; .fig is deprecated in recent seaborn (0.11.2+)
pair_plot.figure.suptitle("Iris Pair Plot: All Feature Combinations", y=1.02)
plt.savefig("pairplot.png", dpi=80, bbox_inches="tight")
plt.show()

# READING THE PAIR PLOT:
interpretation = {
    "Diagonal (KDE)":    "Distribution of each feature -- look for multi-modal (class separation) or skewness",
    "Off-diagonal":      "Scatter plot of two features -- look for linear patterns and cluster separation",
    "Well-separated clusters": "Features form clear groups by color -> high discriminative power",
    "Overlapping clouds":      "Features do NOT separate classes well -> low predictive value",
}
print("Pair plot interpretation guide:")
for element, meaning in interpretation.items():
    print(f"  {element:30s}: {meaning}")

# FOCUSED SCATTER: best features only
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
best_pairs = [
    ("petal length (cm)", "petal width (cm)"),
    ("sepal length (cm)", "petal length (cm)"),
    ("sepal width (cm)",  "petal length (cm)"),
]

for ax, (x_col, y_col) in zip(axes, best_pairs):
    for species, color in zip(iris.target_names, ["#2196F3", "#FF5722", "#4CAF50"]):
        subset = df[df["species"] == species]
        ax.scatter(subset[x_col], subset[y_col], label=species, color=color, alpha=0.7, s=40)
    ax.set_xlabel(x_col.split(" (")[0])
    ax.set_ylabel(y_col.split(" (")[0])
    ax.legend(fontsize=8)
    ax.set_title(f"{x_col.split(' (')[0]}\nvs\n{y_col.split(' (')[0]}")

plt.tight_layout()
plt.savefig("scatter_pairs.png", dpi=100, bbox_inches="tight")
plt.show()

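# QUANTIFY CLASS SEPARABILITY
# Per-feature Fisher-style score: between-class variance / mean within-class variance
print("\nClass separability (Fisher score -- higher = more separable):")
numeric_cols = [c for c in df.columns if c != "species"]
for col in numeric_cols:
    overall_mean = df[col].mean()
    groups = [df.loc[df["species"] == s, col] for s in iris.target_names]
    # Size-weighted spread of the class means around the overall mean
    between = sum(len(g) * (g.mean() - overall_mean) ** 2 for g in groups) / len(df)
    # Average sample variance inside each class (no hard-coded class count)
    within = sum(g.var() for g in groups) / len(groups)
    fisher_score = between / (within + 1e-10)  # small constant avoids division by zero
    print(f"  {col:25s}: {fisher_score:.2f}")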
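
As a cross-check on the hand-rolled score above, the one-way ANOVA F-statistic captures the same idea (between-class spread relative to within-class spread) with a different normalization. The sketch below uses scikit-learn's `f_classif`; the variable names `f_values`, `p_values`, and `ranking` are illustrative and not part of the original lesson. With equal class sizes, as in Iris, the two measures are proportional, so the feature ranking should match.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import f_classif

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

# f_classif returns one ANOVA F-statistic and one p-value per feature
f_values, p_values = f_classif(X, y)

# Rank features from most to least separable
ranking = pd.Series(f_values, index=X.columns).sort_values(ascending=False)
print(ranking.round(2))
```

Features near the top of `ranking` are the ones whose panels in the pair plot tend to show the cleanest color separation.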

Tip

Practice pair plots and scatter analysis on small, isolated examples before integrating them into larger projects. Breaking concepts into small experiments builds understanding faster than reading alone.
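
One small experiment of this kind, sketched here on synthetic data, is checking the "linear vs non-linear" reading of a scatter panel with numbers. Pearson correlation measures straight-line association, while Spearman correlation measures any monotonic trend, so a gap between the two hints at a curved relationship. The data and column names below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data (hypothetical, for illustration only)
x = np.linspace(0, 5, 50)
toy = pd.DataFrame({
    "x": x,
    "linear": 2 * x + 1,        # straight-line relationship
    "exponential": np.exp(x),   # monotonic but strongly curved
})

# Correlation of each column with x under both methods
pearson = toy.corr(method="pearson")["x"]
spearman = toy.corr(method="spearman")["x"]

summary = pd.DataFrame({"pearson": pearson, "spearman": spearman}).drop(index="x")
print(summary.round(3))
```

For the `exponential` column, Spearman stays at 1 because the ordering of points is perfectly preserved, while Pearson drops below 1 because the points do not lie on a line.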

Practice Task

Note

(1) Write a working pair plot and scatter analysis example from scratch without looking at your notes. (2) Modify it to handle an edge case (empty input, null values, or an error state). (3) Share your solution in the Priygop community for feedback.

Common Mistake

Warning

A common mistake in pair plot and scatter analysis is skipping edge-case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions so that your ML code is robust and production-ready.
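
The checks described above could be sketched as a small guard that runs before plotting. `prepare_for_pairplot` is a hypothetical helper, not part of pandas or seaborn; its name, parameters, and rules (numeric columns only, rows with nulls dropped) are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

def prepare_for_pairplot(df, hue=None, min_rows=2):
    """Return a cleaned copy of df that is safe to pass to a pair plot.

    Hypothetical helper: the name, parameters, and rules are illustrative.
    """
    if df is None or df.empty:
        raise ValueError("DataFrame is empty; nothing to plot")
    if hue is not None and hue not in df.columns:
        raise KeyError(f"hue column {hue!r} not found")

    # Keep numeric features only; the hue column is handled separately
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    numeric_cols = [c for c in numeric_cols if c != hue]
    if len(numeric_cols) < 2:
        raise ValueError("need at least two numeric columns for a pair plot")

    keep = numeric_cols + ([hue] if hue is not None else [])
    # Drop rows with missing values in any kept column
    cleaned = df[keep].dropna()
    if len(cleaned) < min_rows:
        raise ValueError("too few complete rows left after dropping nulls")
    return cleaned

raw = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0],
    "b": [10.0, 20.0, 30.0, 40.0],
    "note": ["x", "y", "z", "w"],      # non-numeric, dropped
    "label": ["p", "q", "p", None],    # hue column containing a null
})
clean = prepare_for_pairplot(raw, hue="label")
print(clean)
```

The returned frame could then be passed on, for example as `sns.pairplot(clean, hue="label")`. Whether to drop or impute missing rows depends on the dataset; dropping is used here only to keep the sketch short.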
