Logistic Regression — Classification with Probabilities
Despite its name, logistic regression is a classification algorithm. It applies the sigmoid function to a linear combination of the features, squashing the output into a probability between 0 and 1. The weights are learned by maximizing the log-likelihood of the training labels, and the decision boundary falls where the predicted probability crosses 0.5. Logistic regression is the gold-standard baseline for binary classification: fast, interpretable, and often surprisingly competitive with more complex models.
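To make "maximizing log-likelihood" concrete: training picks the weights that minimize the average negative log-likelihood of the labels, also known as log loss. A minimal sketch (the toy labels and probabilities below are made up for illustration) that computes it by hand and checks the result against sklearn's log_loss:
import numpy as np
from sklearn.metrics import log_loss
y_true = np.array([1, 0, 1, 1, 0])
p_hat = np.array([0.9, 0.2, 0.7, 0.6, 0.1])  # model's predicted P(y=1)
# average negative log-likelihood -- the quantity training minimizes
nll = -np.mean(y_true * np.log(p_hat) + (1 - y_true) * np.log(1 - p_hat))
print(f"Manual log loss:  {nll:.4f}")                       # ~0.2603
print(f"sklearn log_loss: {log_loss(y_true, p_hat):.4f}")   # same value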
Logistic Regression — Sigmoid, Coefficients, Multiclass
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, roc_auc_score
# THE SIGMOID FUNCTION -- core of logistic regression
z = np.linspace(-8, 8, 200)
sigmoid = 1 / (1 + np.exp(-z))
plt.figure(figsize=(8, 4))
plt.plot(z, sigmoid, "steelblue", linewidth=2.5)
plt.axhline(0.5, color="red", linestyle="--", alpha=0.7, label="Decision boundary (p=0.5)")
plt.fill_between(z, sigmoid, 0.5, where=(sigmoid > 0.5), alpha=0.1, color="green", label="Predict Class 1")
plt.fill_between(z, sigmoid, 0.5, where=(sigmoid < 0.5), alpha=0.1, color="red", label="Predict Class 0")
plt.xlabel("Linear combination z = w0 + w1*x1 + w2*x2 + ...")
plt.ylabel("Probability P(y=1)")
plt.title("Sigmoid Function -- converts linear score to probability")
plt.legend()
plt.tight_layout()
plt.savefig("sigmoid.png", dpi=100, bbox_inches="tight")
plt.show()
# BINARY CLASSIFICATION -- breast cancer
cancer = load_breast_cancer()
X, y = cancer.data, cancer.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
scaler = StandardScaler()
X_train_sc = scaler.fit_transform(X_train)
X_test_sc = scaler.transform(X_test)
model = LogisticRegression(C=1.0, max_iter=1000, random_state=42)  # C is the INVERSE regularization strength (smaller C = stronger L2 penalty)
model.fit(X_train_sc, y_train)
# PROBABILITIES (not just class labels!)
y_prob = model.predict_proba(X_test_sc)[:, 1] # probability of class 1 (benign)
y_pred = model.predict(X_test_sc)
print("Binary Classification -- Breast Cancer:")
print(classification_report(y_test, y_pred, target_names=cancer.target_names))
print(f"AUC-ROC: {roc_auc_score(y_test, y_prob):.4f}")
# CUSTOM THRESHOLD -- adjust when the recall/precision balance matters
# y_prob is P(benign), so RAISING the threshold demands more evidence before
# predicting benign, which flags more borderline cases as malignant
threshold = 0.65  # higher threshold = fewer benign predictions = higher malignant recall
y_pred_thresh = (y_prob > threshold).astype(int)
print(f"\nWith threshold={threshold} (more sensitive to malignant):")
print(classification_report(y_test, y_pred_thresh, target_names=cancer.target_names))
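# THRESHOLD SELECTION (illustrative sketch): instead of guessing a threshold,
# sweep the precision/recall trade-off and pick one that meets your target.
# Precision is not strictly monotonic in the threshold, so treat the lookup
# below as a heuristic.
from sklearn.metrics import precision_recall_curve
prec, rec, pr_thresholds = precision_recall_curve(y_test, y_prob)
idx = int(np.argmax(prec[:-1] >= 0.99))  # first threshold reaching precision >= 0.99
print(f"\nThreshold for precision >= 0.99: {pr_thresholds[idx]:.3f} (recall {rec[idx]:.3f})")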
# COEFFICIENT INTERPRETATION
coeff_df = pd.DataFrame({
"Feature": cancer.feature_names[:10],
"Coefficient": model.coef_[0][:10],
"Odds_Ratio": np.exp(model.coef_[0][:10]),
}).sort_values("Coefficient", key=abs, ascending=False)
print("\nCoefficients and Odds Ratios (first 10 features):")
print(coeff_df.round(4).to_string(index=False))
print(" Odds ratio > 1: feature increases P(benign)")
print(" Odds ratio < 1: feature decreases P(benign)")
# MULTICLASS: one-vs-rest or multinomial
from sklearn.datasets import load_iris
iris = load_iris()
lr_multi = LogisticRegression(max_iter=1000, C=1.0, random_state=42)  # lbfgs uses a multinomial loss for >2 classes by default (the multi_class argument is deprecated)
X_iris_sc = StandardScaler().fit_transform(iris.data)
lr_multi.fit(X_iris_sc, iris.target)
print(f"\nMulticlass accuracy on Iris: {lr_multi.score(X_iris_sc, iris.target):.4f}")
Tip
Practice logistic regression in small, isolated examples before integrating it into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
Classification = predict categories. Regression = predict numbers. Some algorithms do both.
Practice Task
(1) Write a working example of logistic regression with probability outputs from scratch, without looking at notes. (2) Modify it to handle an edge case (an empty input, a null value, or an error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake with logistic regression in practice is skipping edge-case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready ML code.