Linear Regression — How It Works
Linear regression fits a straight line (or hyperplane) through data by minimizing the sum of squared residuals (OLS — Ordinary Least Squares). The model learns coefficients (weights) for each feature: price = w1*sqft + w2*bedrooms + w3*location + bias. Each coefficient tells you: if this feature increases by 1 unit (with all else constant), the target changes by w units. Linear regression is fast, interpretable, and a baseline every ML project should start with.
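The "minimizing the sum of squared residuals" part has a closed-form solution, the normal equation w = (XᵀX)⁻¹Xᵀy. A minimal NumPy sketch on synthetic data (the true coefficients 3, 2 and bias 5 are made up for illustration):

```python
import numpy as np

# Synthetic data: y = 3*x1 + 2*x2 + 5 + small noise (coefficients chosen for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3 * X[:, 0] + 2 * X[:, 1] + 5 + rng.normal(scale=0.1, size=100)

# Prepend a column of ones so the bias is learned like any other weight
X_b = np.column_stack([np.ones(len(X)), X])

# Solve the least-squares problem (lstsq is the numerically stable way
# to evaluate the normal equation w = (X^T X)^-1 X^T y)
w, *_ = np.linalg.lstsq(X_b, y, rcond=None)
print(w)  # approximately [5, 3, 2]: bias, then the two feature weights
```

This is exactly what `LinearRegression.fit` computes under the hood (via a similar least-squares solver), which is why OLS training needs no iteration.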
Linear Regression — OLS, Coefficients, and Assumptions
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import fetch_california_housing
import matplotlib.pyplot as plt
# SIMPLE EXAMPLE: understand coefficients
X_simple = np.array([[1000], [1500], [2000], [2500], [3000], [3500]]) # sqft
y_simple = np.array([200000, 280000, 350000, 420000, 490000, 570000]) # price
lr_simple = LinearRegression()
lr_simple.fit(X_simple, y_simple)
print(f"Coefficient (price per sqft): ${lr_simple.coef_[0]:.2f}")
print(f"Intercept (base price): ${lr_simple.intercept_:,.0f}")
print(f"Formula: price = ${lr_simple.coef_[0]:.0f} x sqft + ${lr_simple.intercept_:,.0f}")
print(f"Prediction for 2200 sqft: ${lr_simple.predict([[2200]])[0]:,.0f}")
# MULTIPLE LINEAR REGRESSION on real data
housing = fetch_california_housing()
X = pd.DataFrame(housing.data, columns=housing.feature_names)
y = housing.target # median house value in $100k
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train_sc = scaler.fit_transform(X_train)
X_test_sc = scaler.transform(X_test)
model = LinearRegression()
model.fit(X_train_sc, y_train)
y_pred = model.predict(X_test_sc)
# EVALUATION METRICS
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"\nCalifornia Housing Linear Regression:")
print(f" MAE (Mean Absolute Error): {mae:.3f} ($100k) -- average prediction error")
print(f" RMSE (Root Mean Sq Error): {rmse:.3f} ($100k) -- penalizes large errors more")
print(f" R2 (coefficient of det.): {r2:.3f} -- {r2:.1%} of variance explained")
# INTERPRET COEFFICIENTS (after scaling -- comparable)
coeff_df = pd.DataFrame({
"Feature": housing.feature_names,
"Coefficient": model.coef_.round(4),
}).sort_values("Coefficient", key=abs, ascending=False)
print("\nFeature Importance (standardized coefficients, abs = impact):")
print(coeff_df)
# LINEAR REGRESSION ASSUMPTIONS
assumptions = {
"Linearity": "Relationship between X and y is linear -- check with residual plots",
"Independence": "Observations are independent -- violated in time-series",
"Homoscedasticity": "Constant residual variance -- check residual vs fitted plot",
"Normality": "Residuals are normally distributed -- check Q-Q plot",
"No multicollinearity": "Features not highly correlated -- check VIF > 10 signals problem",
}
print("\nLinear Regression Assumptions:")
for assumption, test in assumptions.items():
print(f" {assumption:22s}: {test}")Tip
Tip
Practice linear regression in small, isolated examples before integrating it into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
The simplest ML model: y = mx + b, fit by minimizing MSE.
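That one-liner can be unpacked into a from-scratch sketch: fit y = m*x + b by gradient descent on the MSE (the learning rate and iteration count are illustrative choices for this toy problem):

```python
import numpy as np

# Toy data on a known line: y = 2x + 1 (no noise, for clarity)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1

m, b = 0.0, 0.0
lr = 0.05  # learning rate (illustrative)
for _ in range(2000):
    y_hat = m * x + b
    err = y_hat - y
    # Gradients of MSE = mean((y_hat - y)^2) with respect to m and b
    m -= lr * 2 * np.mean(err * x)
    b -= lr * 2 * np.mean(err)

print(f"m = {m:.3f}, b = {b:.3f}")  # should approach m=2, b=1
```

OLS has a closed-form answer, so gradient descent is not needed here in practice, but writing it once makes the "minimize MSE" phrase concrete.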
Practice Task
(1) Write a working example of linear regression from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake with linear regression is skipping edge-case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready ML code.
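One way to guard against those edge cases is to validate before fitting. A sketch (`fit_checked` is a hypothetical helper, not part of scikit-learn):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_checked(X, y):
    """Hypothetical wrapper: validate inputs, then fit a LinearRegression."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim != 2:
        raise ValueError(f"X must be 2-D, got shape {X.shape}")
    if len(X) == 0:
        raise ValueError("X is empty -- nothing to fit")
    if len(X) != len(y):
        raise ValueError(f"length mismatch: {len(X)} rows vs {len(y)} targets")
    if np.isnan(X).any() or np.isnan(y).any():
        raise ValueError("NaN values found -- impute or drop before fitting")
    return LinearRegression().fit(X, y)

model = fit_checked([[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0])
print(model.coef_[0])  # slope of y = 2x
```

Failing fast with a clear message beats letting NaNs silently propagate into the fitted coefficients.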