Interaction Features, Binning & Polynomial Terms
Interaction features encode the combined effect of two or more features, an effect that a linear model cannot discover from each feature on its own. Binning (discretization) converts continuous variables into ordinal groups, capturing non-linear threshold effects (salary band, age group). Polynomial features systematically expand the feature space with products and powers up to a chosen degree, enabling linear models to fit non-linear patterns.
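Before the full walk-through below, here is a minimal sketch of what a polynomial expansion actually produces; the two toy columns and their values are purely illustrative. A degree-2 PolynomialFeatures transform returns the original features, their squares, and their pairwise product (the interaction term).
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_tiny = np.array([[2.0, 3.0],
                   [1.0, 5.0]])               # two toy columns, x1 and x2
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X_tiny)
print(poly.get_feature_names_out(["x1", "x2"]))  # ['x1' 'x2' 'x1^2' 'x1 x2' 'x2^2']
print(X_poly[0])                                 # [2. 3. 4. 6. 9.]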
Creating Interaction, Binning, and Polynomial Features
import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures, KBinsDiscretizer
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
np.random.seed(42)
N = 1500
df = pd.DataFrame({
    "age": np.random.normal(40, 13, N).clip(18, 75),
    "income": np.random.exponential(55000, N).clip(15000, 200000),
    "credit_score": np.random.normal(680, 80, N).clip(300, 850),
    "loan_amount": np.random.exponential(18000, N).clip(1000, 80000),
    "employment_yrs": np.random.exponential(7, N).clip(0, 40),
})
df["default"] = (
(df["loan_amount"] / df["income"] > 0.4) |
(df["credit_score"] < 600) |
(np.random.uniform(0, 1, N) < 0.08)
).astype(int)
X = df.drop("default", axis=1)
y = df["default"]
# ━━━━━━━━━━━━━━━━━━━━━━━━━━
# 1. MANUAL INTERACTION FEATURES
# ━━━━━━━━━━━━━━━━━━━━━━━━━━
X_fe = X.copy()
X_fe["debt_to_income"] = X["loan_amount"] / X["income"]
X_fe["age_x_credit"] = X["age"] * X["credit_score"] / 1000
X_fe["income_x_emp_yrs"] = X["income"] * X["employment_yrs"] / 1e6
X_fe["credit_income_ratio"] = X["credit_score"] / (X["income"] / 10000)
print("Feature impact on AUC-ROC (GBM, 5-fold CV):")
model = GradientBoostingClassifier(n_estimators=100, random_state=42)
base_auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
eng_auc = cross_val_score(model, X_fe, y, cv=5, scoring="roc_auc").mean()
print(f" Raw features: {base_auc:.4f}")
print(f" + Interaction feats: {eng_auc:.4f} (+{eng_auc-base_auc:.4f})")
# ━━━━━━━━━━━━━━━━━━━━━━━━━━
# 2. BINNING (DISCRETIZATION)
# ━━━━━━━━━━━━━━━━━━━━━━━━━━
# Age groups (domain knowledge bins)
df["age_group"] = pd.cut(df["age"],
bins=[0, 25, 35, 50, 65, 100],
labels=["Gen Z", "Millennial", "Gen X", "Boomer", "Senior"]
)
print("\nDefault rate by age group:")
print(df.groupby("age_group", observed=True)["default"].agg(["mean", "count"]).round(3))
# Income quartiles
df["income_quartile"] = pd.qcut(df["income"], q=4, labels=["Q1", "Q2", "Q3", "Q4"])
print("\nDefault rate by income quartile:")
print(df.groupby("income_quartile", observed=True)["default"].agg(["mean", "count"]).round(3))
# KBinsDiscretizer -- sklearn compatible (can go in pipelines)
discretizer = KBinsDiscretizer(n_bins=5, strategy="quantile", encode="ordinal")
X_fe["age_binned"] = discretizer.fit_transform(X[["age"]])
X_fe["income_binned"] = discretizer.fit_transform(X[["income"]])
# ━━━━━━━━━━━━━━━━━━━━━━━━━━
# 3. POLYNOMIAL FEATURES
# ━━━━━━━━━━━━━━━━━━━━━━━━━━
print("\nPolynomial features for Logistic Regression:")
for degree in [1, 2, 3]:
    pipe = Pipeline([
        ("poly", PolynomialFeatures(degree=degree, include_bias=False)),
        ("scaler", StandardScaler()),
        ("model", LogisticRegression(C=0.1, max_iter=2000, random_state=42)),
    ])
    cv_auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean()
    n_feats = PolynomialFeatures(degree=degree, include_bias=False).fit_transform(X).shape[1]
    print(f"  Degree {degree}: {n_feats:4d} features | AUC = {cv_auc:.4f}")
print(" NOTE: GBM doesn't need polynomial features (learns interactions natively)")Tip
Tip
Practice interaction features, binning, and polynomial terms in small, isolated examples before integrating them into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
Practice Task
(1) Write a working example of interaction features, binning, and polynomial terms from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake with interaction features, binning, and polynomial terms is skipping edge-case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready ML code.
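To make that concrete, here is a small sketch of the kind of guard rails the warning refers to; the bin_income helper and its column name are illustrative, not part of the lesson code. It rejects an empty frame and imputes missing values before binning, since KBinsDiscretizer does not accept NaNs.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.pipeline import Pipeline

def bin_income(frame: pd.DataFrame, n_bins: int = 5) -> np.ndarray:
    """Quantile-bin the 'income' column, guarding against empty and missing data."""
    if frame.empty:
        raise ValueError("Input frame is empty; nothing to bin.")
    if "income" not in frame.columns:
        raise KeyError("Expected an 'income' column.")
    pipe = Pipeline([
        ("impute", SimpleImputer(strategy="median")),   # KBinsDiscretizer rejects NaN
        ("bin", KBinsDiscretizer(n_bins=n_bins, strategy="quantile", encode="ordinal")),
    ])
    return pipe.fit_transform(frame[["income"]])

# Example: one missing value is imputed instead of crashing the discretizer.
demo = pd.DataFrame({"income": [32000, 54000, np.nan, 88000, 120000, 41000]})
print(bin_income(demo, n_bins=3).ravel())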