Time-Series Feature Engineering for ML
Tabular ML on time-series data (forecasting electricity demand, predicting next month's churn) requires engineering lag features (the value 1 day ago, 7 days ago), rolling statistics (7-day mean, 30-day max), change rates, and cyclical encodings. These features convert temporal dependencies into a format any ML model can consume, even models with no built-in time awareness.
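Before the full pipeline below, a minimal sketch of the two core ideas on a hypothetical toy series: `shift()` builds lag features from past values only, and chaining `shift(1)` before `rolling()` keeps the current value out of its own rolling window.

```python
import pandas as pd

# Hypothetical 6-day series to show how shift() builds lag features
df = pd.DataFrame({"value": [10, 12, 11, 15, 14, 16]})
df["lag_1"] = df["value"].shift(1)  # yesterday's value; first row becomes NaN
# 3-day mean of strictly past values (shift first, then roll)
df["roll_mean_3"] = df["value"].shift(1).rolling(3).mean()
print(df)
```

Note the NaN rows at the start: each lag or rolling window needs that much history before it can produce a value, which is why the pipeline below drops them before training.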
Lag, Rolling, and Cyclical Features for Time-Series
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import r2_score, mean_absolute_error
np.random.seed(42)
# SIMULATE DAILY RETAIL SALES WITH TREND + SEASONALITY
dates = pd.date_range("2021-01-01", "2023-12-31", freq="D")
n = len(dates)
trend = np.linspace(100, 150, n)
weekly = 20 * np.sin(2 * np.pi * np.arange(n) / 7) # 7-day cycle
annual = 30 * np.sin(2 * np.pi * np.arange(n) / 365)
noise = np.random.normal(0, 5, n)
sales = (trend + weekly + annual + noise).clip(0)
df = pd.DataFrame({"date": dates, "sales": sales})
df = df.set_index("date")
# ━━━━━━━━━━━━━━━━━━━━━━━━━━
# FEATURE ENGINEERING
# ━━━━━━━━━━━━━━━━━━━━━━━━━━
# 1. DATETIME FEATURES
df["day_of_week"] = df.index.dayofweek # 0=Mon, 6=Sun
df["month"] = df.index.month
df["day_of_year"] = df.index.dayofyear
df["quarter"] = df.index.quarter
df["is_weekend"] = df.index.dayofweek.isin([5, 6]).astype(int)
# 2. CYCLICAL ENCODING (sin/cos so season wraps correctly)
df["dow_sin"] = np.sin(2 * np.pi * df["day_of_week"] / 7)
df["dow_cos"] = np.cos(2 * np.pi * df["day_of_week"] / 7)
df["doy_sin"] = np.sin(2 * np.pi * df["day_of_year"] / 365)
df["doy_cos"] = np.cos(2 * np.pi * df["day_of_year"] / 365)
# 3. LAG FEATURES (use .shift() -- never use future values!)
for lag in [1, 2, 7, 14, 28, 365]:
    df[f"sales_lag_{lag}"] = df["sales"].shift(lag)
# 4. ROLLING STATISTICS
for window in [7, 14, 30]:
    df[f"sales_roll_mean_{window}"] = df["sales"].shift(1).rolling(window).mean()
    df[f"sales_roll_std_{window}"] = df["sales"].shift(1).rolling(window).std()
    df[f"sales_roll_max_{window}"] = df["sales"].shift(1).rolling(window).max()
# 5. DIFFERENCE / CHANGE FEATURES
df["sales_diff_1"] = df["sales"].diff(1) # day-over-day change
df["sales_diff_7"] = df["sales"].diff(7) # week-over-week change
df["sales_pct_7"] = df["sales"].pct_change(7) * 100 # week-over-week % change
# DROP ROWS WITH NaN (from lagging and rolling)
df = df.dropna()
print(f"Dataset after feature engineering: {df.shape}")
print(f"Features created: {df.shape[1] - 1}")
# ━━━━━━━━━━━━━━━━━━━━━━━━━━
# TRAIN-TEST SPLIT (TEMPORAL -- never shuffle!)
# ━━━━━━━━━━━━━━━━━━━━━━━━━━
feature_cols = [c for c in df.columns if c != "sales"]
X = df[feature_cols]
y = df["sales"]
split_date = "2023-01-01"
X_train, y_train = X[X.index < split_date], y[y.index < split_date]
X_test, y_test = X[X.index >= split_date], y[y.index >= split_date]
model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.05, max_depth=4, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"\nTest R2: {r2_score(y_test, y_pred):.4f}")
print(f"Test MAE: {mean_absolute_error(y_test, y_pred):.2f} units/day")
# FEATURE IMPORTANCE
feat_imp = pd.DataFrame({"Feature": feature_cols, "Importance": model.feature_importances_})
print("\nTop 10 features:")
print(feat_imp.nlargest(10, "Importance").to_string(index=False))
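The script imports `TimeSeriesSplit` but evaluates on a single temporal split. For a more robust estimate, expanding-window cross-validation keeps every fold's test block strictly after its training data. A minimal sketch on stand-in data (the feature names here are placeholders, not the engineered columns above):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Stand-in feature matrix and target (placeholders for the engineered df)
rng = np.random.default_rng(0)
X = pd.DataFrame({"f1": rng.normal(size=100), "f2": rng.normal(size=100)})
y = pd.Series(rng.normal(size=100))

tscv = TimeSeriesSplit(n_splits=4)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    # Each fold trains on an expanding window of the past, tests on the next block
    assert train_idx.max() < test_idx.min()  # no future data leaks into training
    print(f"Fold {fold}: train size {len(train_idx)}, test size {len(test_idx)}")
```

Each fold's training window grows while the test window slides forward, mirroring how the model would actually be retrained and deployed over time.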
Tip
Practice time-series feature engineering for ML in small, isolated examples before integrating it into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
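One such small experiment: verify that the sin/cos encoding actually makes the calendar wrap. Raw day-of-year says Dec 31 and Jan 1 are 364 apart; on the encoded circle they are neighbors. A sketch (the helper `doy_encode` is hypothetical, not from the code above):

```python
import numpy as np

def doy_encode(day):
    # Map day-of-year onto the unit circle, as in the doy_sin/doy_cos features
    angle = 2 * np.pi * day / 365
    return np.array([np.sin(angle), np.cos(angle)])

dec_31, jan_1, jul_1 = doy_encode(365), doy_encode(1), doy_encode(182)
print(np.linalg.norm(dec_31 - jan_1))  # small: adjacent on the circle
print(np.linalg.norm(dec_31 - jul_1))  # large: half a year apart
```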
Most ML work is data preparation: garbage in, garbage out.
Practice Task
(1) Write a working example of time-series feature engineering for ML from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake with time-series feature engineering for ML is skipping edge case testing — empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready ML code.
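One way to sketch that validation, assuming a hypothetical `add_lag_features` helper (not part of the code above): guard against an empty frame, a missing column, and NaN gaps before lagging.

```python
import pandas as pd

def add_lag_features(df, col="sales", lags=(1, 7)):
    """Guarded lag-feature builder -- a sketch of the boundary checks described above."""
    if df.empty:
        raise ValueError("empty DataFrame: no history to lag")
    if col not in df.columns:
        raise KeyError(f"missing column: {col}")
    if df[col].isna().any():
        # Forward-fill gaps so lags don't silently propagate NaN
        df = df.assign(**{col: df[col].ffill()})
    out = df.copy()
    for lag in lags:
        out[f"{col}_lag_{lag}"] = out[col].shift(lag)
    return out

demo = pd.DataFrame({"sales": [10.0, None, 12.0, 13.0]})
print(add_lag_features(demo)["sales_lag_1"].tolist())
```

Whether to forward-fill, interpolate, or reject NaN gaps is a modeling decision; the point is to make it explicit rather than let NaN silently flow into the lags.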