Feature Transformations — Log, Power, Quantile
When features are highly skewed (income, population, transaction amounts), standard scaling alone is insufficient: it rescales the data but leaves the skew intact. A log transform compresses the right tail of exponential-like distributions. A power transform (Box-Cox, Yeo-Johnson) searches for the transformation that makes the data most Gaussian. A quantile transform maps to a uniform or normal distribution regardless of the original shape. These matter most for linear models and SVMs; tree-based models split on rank order and are largely unaffected by monotonic transforms.
Log, Power, and Quantile Transformations
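Before the full demo below, a quick standalone illustration of why a log compresses a right tail: equal ratios become equal increments, so values spanning orders of magnitude collapse onto a comparable scale.

```python
import numpy as np

# Values spanning three orders of magnitude (e.g. incomes)
x = np.array([1_000.0, 10_000.0, 100_000.0])

# Raw gaps grow tenfold at each step, but log gaps are constant:
print(np.diff(x))            # [ 9000. 90000.]
print(np.diff(np.log10(x)))  # [1. 1.]
```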
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import PowerTransformer, QuantileTransformer, FunctionTransformer
np.random.seed(42)
# Highly right-skewed data (like income, house prices, website visits)
income = np.random.exponential(scale=50000, size=1000).clip(5000, 1000000)
# COMPUTE TRANSFORMATIONS
log_income = np.log1p(income) # log(x+1) -- handles zeros
pt_boxcox = PowerTransformer(method="box-cox") # requires all positive values
pt_yeo = PowerTransformer(method="yeo-johnson") # handles zeros and negatives
qt_uniform = QuantileTransformer(output_distribution="uniform", n_quantiles=500)
qt_normal = QuantileTransformer(output_distribution="normal", n_quantiles=500)
transformations = {
    "Original (exponential)": income,
    "Log (log1p)": log_income,
    "Box-Cox": pt_boxcox.fit_transform(income.reshape(-1, 1)).ravel(),
    "Yeo-Johnson": pt_yeo.fit_transform(income.reshape(-1, 1)).ravel(),
    "Quantile (uniform)": qt_uniform.fit_transform(income.reshape(-1, 1)).ravel(),
    "Quantile (normal)": qt_normal.fit_transform(income.reshape(-1, 1)).ravel(),
}
fig, axes = plt.subplots(1, 6, figsize=(24, 4))
for ax, (name, data) in zip(axes, transformations.items()):
    ax.hist(data, bins=40, color="steelblue", edgecolor="white", alpha=0.8)
    ax.set_title(f"{name}\nskew={pd.Series(data).skew():.2f}")
    ax.set_ylabel("Count")
plt.tight_layout()
plt.savefig("transformations.png", dpi=100, bbox_inches="tight")
plt.show()
# IMPACT ON SKEWNESS (closer to 0 is better for linear models)
print("Skewness after transformation (target: |skew| < 0.5):")
for name, data in transformations.items():
    skew = pd.Series(data).skew()
    status = "GOOD" if abs(skew) < 0.5 else ("OK" if abs(skew) < 1.0 else "STILL SKEWED")
    print(f" {name:28s}: skew = {skew:+.2f} -> {status}")
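PowerTransformer fits its exponent λ per column by maximum likelihood and exposes it as the `lambdas_` attribute (λ near 0 behaves like a log; λ near 1 leaves the shape essentially unchanged). A small standalone sketch, regenerating similar skewed data, to inspect the fitted value:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(42)
income = rng.exponential(scale=50_000, size=1_000).clip(5_000, 1_000_000)
X = income.reshape(-1, 1)

# One fitted lambda per column; for exponential-like data it lands
# near 0, i.e. close to a log transform.
for method in ("box-cox", "yeo-johnson"):
    pt = PowerTransformer(method=method).fit(X)
    print(f"{method:12s} lambda = {pt.lambdas_[0]:+.3f}")
```

Inspecting `lambdas_` is a useful sanity check: if the fitted λ is close to 1, the transform is doing little and a simpler pipeline may suffice.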
# USING IN SKLEARN PIPELINE
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Ridge
# Target depends on log(income): a diminishing-returns relationship, common for monetary features
income_df = pd.DataFrame({"income": income, "target": 3 * np.log1p(income) + np.random.normal(0, 1, 1000)})
without_transform = Pipeline([("reg", Ridge())])
with_log_transform = Pipeline([
    ("log_transform", FunctionTransformer(np.log1p)),
    ("reg", Ridge()),
])
with_yeo_transform = Pipeline([
    ("yeo", PowerTransformer(method="yeo-johnson")),
    ("reg", Ridge()),
])
X_inc = income_df[["income"]]
y_inc = income_df["target"]
from sklearn.model_selection import cross_val_score
for name, pipe in [("No transform", without_transform), ("Log transform", with_log_transform), ("Yeo-Johnson", with_yeo_transform)]:
    r2 = cross_val_score(pipe, X_inc, y_inc, cv=5, scoring="r2").mean()
    print(f" {name:20s}: CV R2 = {r2:.4f}")
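Real tables usually mix skewed and well-behaved columns, so the transform should be applied per column rather than to the whole matrix. A sketch using ColumnTransformer (the column names and distributions here are made up for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer, StandardScaler

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "income": rng.exponential(50_000, 500),  # right-skewed -> log
    "age": rng.normal(40, 12, 500),          # roughly symmetric -> scale only
})

# Hypothetical column split: log-transform the skewed column,
# standard-scale the rest.
ct = ColumnTransformer([
    ("log", FunctionTransformer(np.log1p), ["income"]),
    ("scale", StandardScaler(), ["age"]),
])
Xt = ct.fit_transform(df)
print(Xt.shape)  # (500, 2)
```

This composes with the pipelines above: put the ColumnTransformer as the first step and the regressor as the last, and cross-validation will refit the transforms on each training fold.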
Tip
Practice these transformations in small, isolated examples before integrating them into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
In practice, feature engineering often contributes more to model quality than the choice of algorithm.
Practice Task
(1) Write a working example of log, power, and quantile transformations from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null values, or negative values). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake with these transformations is skipping edge-case testing: np.log1p returns nan for values below -1, Box-Cox raises an error on non-positive inputs, and QuantileTransformer maps unseen values outside the training range to the boundary quantiles. Always validate boundary conditions to write robust, production-ready ML code.
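These edge cases can be checked directly. A minimal sketch of how each transform behaves on zeros and negatives:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

X = np.array([[-3.0], [0.0], [2.5], [100.0]])  # includes zero and a negative

# Box-Cox rejects non-positive values outright.
try:
    PowerTransformer(method="box-cox").fit(X)
except ValueError as e:
    print("box-cox:", e)

# Yeo-Johnson handles the same data without complaint
# (standardize=True by default, so the output has mean ~0).
Xt = PowerTransformer(method="yeo-johnson").fit_transform(X)
print("yeo-johnson mean:", Xt.mean().round(6))

# log1p is only defined for x > -1; below that it yields nan.
with np.errstate(invalid="ignore"):
    print(np.log1p(np.array([-3.0, 0.0, 2.5])))
```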