Polynomial Regression & Non-linearity
Linear regression draws a straight line, but real relationships are often curved. Polynomial regression extends linear regression with polynomial feature terms (x², x³, x1*x2), which lets the fit curve while the model stays linear in its coefficients, so the same linear algebra works underneath. The key risk: high-degree polynomials overfit dramatically, so combine them with Ridge regularization to control complexity.
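To see why this still counts as linear regression, note that the coefficients enter the model linearly even when the features are powers of x. A minimal NumPy-only sketch (standalone, with illustrative variable names) that recovers a quadratic by ordinary least squares on a hand-built design matrix:

import numpy as np
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 50)
y = 2 * x**2 - 3 * x + 1 + rng.normal(0, 1.5, 50)
A = np.column_stack([np.ones_like(x), x, x**2])  # design matrix [1, x, x^2]
coef, *_ = np.linalg.lstsq(A, y, rcond=None)     # plain least squares
print(coef)  # roughly [1, -3, 2], the true intercept and coefficients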
PolynomialFeatures with Regularization
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score
from sklearn.metrics import mean_squared_error
np.random.seed(42)
# TRUE RELATIONSHIP: quadratic
X = np.sort(np.random.uniform(-3, 3, 100)).reshape(-1, 1)
y = 2 * X.ravel()**2 - 3 * X.ravel() + 1 + np.random.normal(0, 1.5, 100)
shuffle = np.random.permutation(len(X))  # X is sorted, so a sequential slice would put every large x in the test set
X_train, y_train = X[shuffle[:80]], y[shuffle[:80]]
X_test, y_test = X[shuffle[80:]], y[shuffle[80:]]
# COMPARE POLYNOMIAL DEGREES
fig, axes = plt.subplots(1, 4, figsize=(18, 5))
degrees = [1, 2, 5, 15]
for ax, degree in zip(axes, degrees):
    pipeline = Pipeline([
        ("poly", PolynomialFeatures(degree=degree, include_bias=False)),
        ("scale", StandardScaler()),
        ("reg", LinearRegression()),
    ])
    pipeline.fit(X_train, y_train)
    X_plot = np.linspace(-3, 3, 300).reshape(-1, 1)
    y_plot = pipeline.predict(X_plot)
    train_rmse = np.sqrt(mean_squared_error(y_train, pipeline.predict(X_train)))
    test_rmse = np.sqrt(mean_squared_error(y_test, pipeline.predict(X_test)))
    ax.scatter(X_train, y_train, s=15, color="steelblue", alpha=0.7, label="train")
    ax.scatter(X_test, y_test, s=15, color="tomato", marker="^", alpha=0.7, label="test")
    ax.plot(X_plot, y_plot, color="black", linewidth=2)
    ax.set_ylim(-10, 25)
    ax.set_title(f"Degree {degree}\ntrain RMSE={train_rmse:.2f} | test RMSE={test_rmse:.2f}")
    ax.legend(fontsize=8)
plt.suptitle("Polynomial Regression: Underfitting -> Good Fit -> Overfitting", y=1.02)
plt.tight_layout()
plt.savefig("polynomial_regression.png", dpi=100, bbox_inches="tight")
plt.show()
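# SIDE CHECK (a sketch): unregularized high-degree fits tend to rely on
# large, mutually cancelling coefficients, which is what Ridge's penalty
# suppresses. `pipeline` here is the degree-15 model left over from the loop.
coefs = pipeline.named_steps["reg"].coef_
print(f"Degree 15 coefficient magnitudes: max={np.abs(coefs).max():.1f}, mean={np.abs(coefs).mean():.1f}")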
# FIND OPTIMAL DEGREE WITH CROSS-VALIDATION
print("Cross-validation RMSE by polynomial degree:")
for degree in [1, 2, 3, 4, 5, 8, 12]:
    pipe = Pipeline([
        ("poly", PolynomialFeatures(degree=degree, include_bias=False)),
        ("scale", StandardScaler()),
        ("reg", Ridge(alpha=1.0)),  # Ridge prevents overfitting at high degrees
    ])
    cv_rmse = np.sqrt(-cross_val_score(pipe, X_train, y_train, cv=5, scoring="neg_mean_squared_error"))
    print(f" Degree {degree:2d}: CV RMSE = {cv_rmse.mean():.3f} +/- {cv_rmse.std():.3f}")
# INTERACTION TERMS -- non-additive feature relationships
print("\nPolynomialFeatures also creates interaction terms:")
poly = PolynomialFeatures(degree=2, include_bias=False)
X_example = np.array([[2, 3, 5]]) # 3 features
X_poly = poly.fit_transform(X_example)
print(f" Input features: {X_example.tolist()}")
print(f" Output features: {X_poly[0].tolist()}")
print(f" Created: x1, x2, x3, x1^2, x1*x2, x1*x3, x2^2, x2*x3, x3^2")Tip
Tip
Practice polynomial regression and non-linear feature engineering in small, isolated examples before integrating them into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
Practice Task
(1) Write a working polynomial regression example from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake with polynomial regression is skipping edge-case testing: empty inputs, NaN values, and unexpected data types. Always validate boundary conditions to write robust, production-ready ML code.
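For example, a minimal pre-fit guard (a sketch; the helper name and checks are illustrative, not part of scikit-learn):

def fit_poly_safely(pipeline, X, y):
    # Illustrative boundary checks: empty input, NaNs, mismatched shapes.
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.size == 0 or y.size == 0:
        raise ValueError("X and y must be non-empty")
    if np.isnan(X).any() or np.isnan(y).any():
        raise ValueError("X and y must not contain NaN values")
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y must have the same number of rows")
    return pipeline.fit(X, y)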