Pair Plots & Scatter Analysis
Pair plots (also called scatter matrices) visualize the relationship between every pair of features at once. They reveal which feature pairs create clear boundaries between classes (useful for classification), whether relationships are linear or non-linear, where clusters and natural groupings occur, and which points are outliers. They are a good way to build feature-selection intuition before training any model.
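One way to back up the visual read of "linear vs non-linear" with numbers is to compare Pearson correlation (which measures linear association) with Spearman correlation (which measures monotonic association). The sketch below uses made-up synthetic data, not a real dataset, and the column names are hypothetical. A feature pair with a high Spearman value but a noticeably lower Pearson value likely has a curved relationship.

```python
import numpy as np
import pandas as pd

# Synthetic data, invented purely for illustration
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 5.0, 200)
toy = pd.DataFrame({
    "x": x,
    "linear": 2.0 * x + rng.normal(0.0, 0.5, 200),  # straight line plus noise
    "exponential": np.exp(x),                        # monotonic but strongly curved
})

pearson = toy.corr(method="pearson")    # sensitive to linear relationships
spearman = toy.corr(method="spearman")  # rank-based, sensitive to any monotonic trend

print("Pearson:")
print(pearson.round(2))
print("Spearman:")
print(spearman.round(2))
```

In a pair plot, the x/exponential panel would show a curve rather than a line, which is the same signal the gap between the two coefficients gives here.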
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
# PAIR PLOT -- classic starting point
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)
# sns.pairplot -- full scatter matrix
pair_plot = sns.pairplot(
    df,
    hue="species",
    diag_kind="kde",  # diagonal: kernel density estimate
    plot_kws={"alpha": 0.6},
    palette="tab10",
)
pair_plot.fig.suptitle("Iris Pair Plot — All Feature Combinations", y=1.02)
plt.savefig("pairplot.png", dpi=80, bbox_inches="tight")
plt.show()
# READING THE PAIR PLOT:
interpretation = {
    "Diagonal (KDE)": "Distribution of each feature -- look for multi-modal (class separation) or skewness",
    "Off-diagonal": "Scatter plot of two features -- look for linear patterns and cluster separation",
    "Well-separated clusters": "Features form clear groups by color -> high discriminative power",
    "Overlapping clouds": "Features do NOT separate classes well -> low predictive value",
}
print("Pair plot interpretation guide:")
for element, meaning in interpretation.items():
    print(f" {element:30s}: {meaning}")
# FOCUSED SCATTER: best features only
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
best_pairs = [
    ("petal length (cm)", "petal width (cm)"),
    ("sepal length (cm)", "petal length (cm)"),
    ("sepal width (cm)", "petal length (cm)"),
]
for ax, (x_col, y_col) in zip(axes, best_pairs):
    # One scatter call per species so each class gets its own color and legend entry
    for species, color in zip(iris.target_names, ["#2196F3", "#FF5722", "#4CAF50"]):
        subset = df[df["species"] == species]
        ax.scatter(subset[x_col], subset[y_col], label=species, color=color, alpha=0.7, s=40)
    # Strip the " (cm)" unit suffix for shorter axis labels
    ax.set_xlabel(x_col.split(" (")[0])
    ax.set_ylabel(y_col.split(" (")[0])
    ax.legend(fontsize=8)
    ax.set_title(f"{x_col.split(' (')[0]}\nvs\n{y_col.split(' (')[0]}")
plt.tight_layout()
plt.savefig("scatter_pairs.png", dpi=100, bbox_inches="tight")
plt.show()
# QUANTIFY CLASS SEPARABILITY
print("\nBetween-class distance (Fisher score -- higher = more separable):")
numeric_cols = [c for c in df.columns if c != "species"]
for col in numeric_cols:
    overall_mean = df[col].mean()
    groups = [df[df["species"] == s][col] for s in iris.target_names]
    # Between-class variance: size-weighted spread of class means around the overall mean
    between = sum(len(g) * (g.mean() - overall_mean)**2 for g in groups) / len(df)
    # Within-class variance: average of the three per-class variances
    within = sum(g.var() for g in groups) / 3
    fisher_score = between / (within + 1e-10)
    print(f" {col:25s}: {fisher_score:.2f}")
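The per-feature separability score computed in the loop above can also be wrapped in a reusable function. The following is a minimal sketch under stated assumptions: the function name fisher_score is hypothetical, it uses NumPy only, and the two test features are synthetic data invented for illustration. It follows the same ratio as the loop (between-class variance divided by the mean within-class sample variance), which is one common simplified form of a Fisher-style score rather than the only definition.

```python
import numpy as np


def fisher_score(values, labels):
    """Between-class variance divided by mean within-class variance for one feature.

    Hypothetical helper written for illustration; mirrors the loop in the tutorial.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    overall_mean = values.mean()

    between = 0.0
    within_vars = []
    for cls in np.unique(labels):
        group = values[labels == cls]
        # Size-weighted squared distance of the class mean from the overall mean
        between += len(group) * (group.mean() - overall_mean) ** 2
        # ddof=1 gives the sample variance, matching the pandas default
        within_vars.append(group.var(ddof=1))

    between /= len(values)
    within = np.mean(within_vars)
    return between / (within + 1e-10)  # small constant avoids division by zero


# Synthetic data, invented for illustration: three classes of 50 points each
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 50)
separated = np.concatenate([rng.normal(loc, 0.3, 50) for loc in (0.0, 3.0, 6.0)])
overlapping = rng.normal(0.0, 1.0, 150)  # same distribution for every class

score_separated = fisher_score(separated, labels)
score_overlapping = fisher_score(overlapping, labels)
print(f"separated:   {score_separated:.2f}")
print(f"overlapping: {score_overlapping:.2f}")
```

A feature whose classes sit far apart relative to their spread scores high. A feature whose classes share one distribution scores near zero, which matches the "overlapping clouds" case in the interpretation guide.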
Tip
Practice pair plots and scatter analysis in small, isolated examples before integrating them into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
Practice Task
(1) Write a working pair plot and scatter analysis example from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake with pair plots and scatter analysis is skipping edge-case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready ML code.
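As a rough illustration of that kind of validation, the sketch below checks a DataFrame before it is passed to a pair plot. The helper name prepare_for_pairplot and the toy data are hypothetical, and the specific checks (empty frame, missing hue column, fewer than two numeric columns, rows with missing values) are assumptions about what is worth guarding against, not an exhaustive list.

```python
import numpy as np
import pandas as pd


def prepare_for_pairplot(df, hue):
    """Hypothetical helper: validate and clean a DataFrame before pair plotting."""
    if df.empty:
        raise ValueError("DataFrame is empty; nothing to plot.")
    if hue not in df.columns:
        raise ValueError(f"Hue column {hue!r} not found.")

    # Keep only numeric feature columns; text columns cannot be scattered
    numeric_cols = df.drop(columns=[hue]).select_dtypes(include="number").columns.tolist()
    if len(numeric_cols) < 2:
        raise ValueError("Need at least two numeric columns for a pair plot.")

    # Drop rows with missing values in any plotted column
    cleaned = df.dropna(subset=numeric_cols + [hue])
    if cleaned.empty:
        raise ValueError("All rows contain missing values.")
    return cleaned, numeric_cols


# Toy data, invented for illustration: one missing value and one text column
toy = pd.DataFrame({
    "length": [1.0, 2.0, np.nan, 4.0],
    "width": [0.5, 0.7, 0.9, 1.1],
    "note": ["a", "b", "c", "d"],
    "species": ["x", "x", "y", "y"],
})
cleaned, numeric_cols = prepare_for_pairplot(toy, hue="species")
print(len(cleaned), numeric_cols)  # 3 ['length', 'width']
```

The cleaned frame and column list could then be passed on, for example as sns.pairplot(cleaned, vars=numeric_cols, hue="species").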