Interaction Features, Binning & Polynomial Terms
Interaction features encode the combined effect of two or more features, an effect that a linear model cannot discover from each feature on its own. Binning (discretization) converts continuous variables into ordinal groups, capturing non-linear threshold effects (salary band, age group). Polynomial features systematically expand the feature space with products and powers up to a chosen degree, enabling linear models to fit non-linear patterns.
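Before the full walk-through below, here is a minimal sketch of what a polynomial expansion actually produces; the two toy columns and their values are purely illustrative. A degree-2 PolynomialFeatures transform returns the original features, their squares, and their pairwise product (the interaction term).
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_tiny = np.array([[2.0, 3.0],
                   [1.0, 5.0]])               # two toy columns, x1 and x2
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X_tiny)
print(poly.get_feature_names_out(["x1", "x2"]))  # ['x1' 'x2' 'x1^2' 'x1 x2' 'x2^2']
print(X_poly[0])                                 # [2. 3. 4. 6. 9.]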
Creating Interaction, Binning, and Polynomial Features
import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures, KBinsDiscretizer
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
np.random.seed(42)
N = 1500
df = pd.DataFrame({
    "age": np.random.normal(40, 13, N).clip(18, 75),
    "income": np.random.exponential(55000, N).clip(15000, 200000),
    "credit_score": np.random.normal(680, 80, N).clip(300, 850),
    "loan_amount": np.random.exponential(18000, N).clip(1000, 80000),
    "employment_yrs": np.random.exponential(7, N).clip(0, 40),
})
df["default"] = (
(df["loan_amount"] / df["income"] > 0.4) |
(df["credit_score"] < 600) |
(np.random.uniform(0, 1, N) < 0.08)
).astype(int)
X = df.drop("default", axis=1)
y = df["default"]
# ━━━━━━━━━━━━━━━━━━━━━━━━━━
# 1. MANUAL INTERACTION FEATURES
# ━━━━━━━━━━━━━━━━━━━━━━━━━━
X_fe = X.copy()
X_fe["debt_to_income"] = X["loan_amount"] / X["income"]
X_fe["age_x_credit"] = X["age"] * X["credit_score"] / 1000
X_fe["income_x_emp_yrs"] = X["income"] * X["employment_yrs"] / 1e6
X_fe["credit_income_ratio"] = X["credit_score"] / (X["income"] / 10000)
print("Feature impact on AUC-ROC (GBM, 5-fold CV):")
model = GradientBoostingClassifier(n_estimators=100, random_state=42)
base_auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
eng_auc = cross_val_score(model, X_fe, y, cv=5, scoring="roc_auc").mean()
print(f" Raw features: {base_auc:.4f}")
print(f" + Interaction feats: {eng_auc:.4f} (+{eng_auc-base_auc:.4f})")
# ━━━━━━━━━━━━━━━━━━━━━━━━━━
# 2. BINNING (DISCRETIZATION)
# ━━━━━━━━━━━━━━━━━━━━━━━━━━
# Age groups (domain knowledge bins)
df["age_group"] = pd.cut(df["age"],
bins=[0, 25, 35, 50, 65, 100],
labels=["Gen Z", "Millennial", "Gen X", "Boomer", "Senior"]
)
print("\nDefault rate by age group:")
print(df.groupby("age_group", observed=True)["default"].agg(["mean", "count"]).round(3))
# Income quartiles
df["income_quartile"] = pd.qcut(df["income"], q=4, labels=["Q1", "Q2", "Q3", "Q4"])
print("\nDefault rate by income quartile:")
print(df.groupby("income_quartile", observed=True)["default"].agg(["mean", "count"]).round(3))
# KBinsDiscretizer -- sklearn compatible (can go in pipelines)
discretizer = KBinsDiscretizer(n_bins=5, strategy="quantile", encode="ordinal")
X_fe["age_binned"] = discretizer.fit_transform(X[["age"]])
X_fe["income_binned"] = discretizer.fit_transform(X[["income"]])
# ━━━━━━━━━━━━━━━━━━━━━━━━━━
# 3. POLYNOMIAL FEATURES
# ━━━━━━━━━━━━━━━━━━━━━━━━━━
print("\nPolynomial features for Logistic Regression:")
for degree in [1, 2, 3]:
    pipe = Pipeline([
        ("poly", PolynomialFeatures(degree=degree, include_bias=False)),
        ("scaler", StandardScaler()),
        ("model", LogisticRegression(C=0.1, max_iter=2000, random_state=42)),
    ])
    cv_auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean()
    n_feats = PolynomialFeatures(degree=degree, include_bias=False).fit_transform(X).shape[1]
    print(f"  Degree {degree}: {n_feats:4d} features | AUC = {cv_auc:.4f}")
print(" NOTE: GBM doesn't need polynomial features (learns interactions natively)")Tip
Tip
Practice interaction features, binning, and polynomial terms in small, isolated examples before integrating them into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
Practice Task
(1) Write a working example of interaction features, binning, and polynomial terms from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake with interaction features, binning, and polynomial terms is skipping edge-case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready ML code.
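To make that concrete, here is a small sketch of the kind of guard rails the warning refers to; the bin_income helper and its column name are illustrative, not part of the lesson code. It rejects an empty frame and imputes missing values before binning, since KBinsDiscretizer does not accept NaNs.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.pipeline import Pipeline

def bin_income(frame: pd.DataFrame, n_bins: int = 5) -> np.ndarray:
    """Quantile-bin the 'income' column, guarding against empty and missing data."""
    if frame.empty:
        raise ValueError("Input frame is empty; nothing to bin.")
    if "income" not in frame.columns:
        raise KeyError("Expected an 'income' column.")
    pipe = Pipeline([
        ("impute", SimpleImputer(strategy="median")),   # KBinsDiscretizer rejects NaN
        ("bin", KBinsDiscretizer(n_bins=n_bins, strategy="quantile", encode="ordinal")),
    ])
    return pipe.fit_transform(frame[["income"]])

# Example: one missing value is imputed instead of crashing the discretizer.
demo = pd.DataFrame({"income": [32000, 54000, np.nan, 88000, 120000, 41000]})
print(bin_income(demo, n_bins=3).ravel())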