Multiple Regression & Feature Importance
In multiple regression, each coefficient estimates the effect of its feature while all other features are held constant (ceteris paribus). Raw coefficients are not comparable across features measured on different scales, so standardize the features first and then compare absolute coefficient magnitudes to gauge relative importance. Multicollinearity (correlated features) inflates standard errors and makes individual coefficients unstable.
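To see the scale problem concretely, here is a minimal sketch on synthetic data (the features income_usd and age_years are invented for illustration): rescaling a feature rescales its raw coefficient, while standardized coefficients are in comparable per-SD units.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
rng = np.random.default_rng(0)
income_usd = rng.normal(60_000, 15_000, 500)  # hypothetical feature in dollars
age_years = rng.normal(40, 10, 500)           # hypothetical feature in years
y = 0.00005 * income_usd + 0.1 * age_years + rng.normal(0, 1, 500)
X_raw = np.column_stack([income_usd, age_years])
print("Raw coefficients:", LinearRegression().fit(X_raw, y).coef_)
# income looks negligible only because dollars are a large-scale unit
X_std = StandardScaler().fit_transform(X_raw)
print("Standardized coefficients:", LinearRegression().fit(X_std, y).coef_)
# both are now in "target change per 1 SD of feature" units and directly comparable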
Feature Importance from Linear Regression Coefficients
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
# Load & scale features
housing = fetch_california_housing()
X = pd.DataFrame(housing.data, columns=housing.feature_names)
y = housing.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train_sc = scaler.fit_transform(X_train)
X_test_sc = scaler.transform(X_test)
model = Ridge(alpha=1.0)  # L2-regularized linear regression; shrinkage keeps coefficients stable under mild multicollinearity
model.fit(X_train_sc, y_train)
# INTERPRET STANDARDIZED COEFFICIENTS
coeff_df = pd.DataFrame({
"Feature": housing.feature_names,
"Coefficient": model.coef_,
"Abs_Coefficient": np.abs(model.coef_),
}).sort_values("Abs_Coefficient", ascending=False)
print("Feature importance (standardized coefficients):")
print(coeff_df.round(4).to_string(index=False))
# VISUALIZE
fig, ax = plt.subplots(figsize=(10, 5))
colors = ["tomato" if c < 0 else "steelblue" for c in coeff_df["Coefficient"]]
coeff_df.sort_values("Coefficient").plot(
kind="barh", x="Feature", y="Coefficient",
ax=ax, color=colors, legend=False
)
ax.axvline(0, color="black", linewidth=0.8)
ax.set_title("Feature Coefficients (standardized)\n(Blue=positive effect, Red=negative effect)")
ax.set_xlabel("Coefficient (1 SD increase in feature -> coefficient units change in house value)")
plt.tight_layout()
plt.savefig("feature_importance_lr.png", dpi=100, bbox_inches="tight")
plt.show()
# DETECT MULTICOLLINEARITY WITH VIF
from statsmodels.stats.outliers_influence import variance_inflation_factor
vif_data = pd.DataFrame()
vif_data["Feature"] = X.columns
vif_data["VIF"] = [variance_inflation_factor(X_train_sc, i) for i in range(X_train_sc.shape[1])]
vif_data["Concern"] = vif_data["VIF"].apply(lambda v: "SEVERE" if v > 10 else ("MODERATE" if v > 5 else "OK"))
print("\nVariance Inflation Factor (VIF):")
print(vif_data.sort_values("VIF", ascending=False).to_string(index=False))
print(" VIF > 10: high multicollinearity -> consider removing or combining features")Tip
Tip
Practice Multiple Regression Feature Importance in small, isolated examples before integrating it into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
Practice Task
(1) Write a working example of Multiple Regression Feature Importance from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake with Multiple Regression Feature Importance is skipping edge case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready ML code.
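A minimal validation sketch along those lines (the helper name validate_features is hypothetical, not a scikit-learn API):
import numpy as np
import pandas as pd
def validate_features(X: pd.DataFrame) -> pd.DataFrame:
    """Reject empty or non-numeric input and handle missing values before fitting."""
    if X.empty:
        raise ValueError("Feature matrix is empty; nothing to fit.")
    non_numeric = X.select_dtypes(exclude="number").columns.tolist()
    if non_numeric:
        raise TypeError(f"Non-numeric columns need encoding first: {non_numeric}")
    if X.isna().any().any():
        X = X.fillna(X.median())  # illustrative policy: median imputation
    return X
# Usage: X_clean = validate_features(X_train) before scaling and fitting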