Time-Series Feature Engineering for ML
Tabular ML on time-series data (forecasting electricity demand, predicting next month's churn) requires engineering lag features (the value 1 day ago, 7 days ago), rolling statistics (7-day mean, 30-day max), change rates, and cyclical encodings. These features convert temporal dependencies into a format any ML model can consume, even models with no built-in time awareness.
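Before the full pipeline below, a minimal sketch of the two core ideas on a hypothetical toy series: `shift()` builds lag features from past values only, and chaining `shift(1)` before `rolling()` keeps the current value out of its own rolling window.

```python
import pandas as pd

# Hypothetical 6-day series to show how shift() builds lag features
df = pd.DataFrame({"value": [10, 12, 11, 15, 14, 16]})
df["lag_1"] = df["value"].shift(1)  # yesterday's value; first row becomes NaN
# 3-day mean of strictly past values (shift first, then roll)
df["roll_mean_3"] = df["value"].shift(1).rolling(3).mean()
print(df)
```

Note the NaN rows at the start: each lag or rolling window needs that much history before it can produce a value, which is why the pipeline below drops them before training.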
Lag, Rolling, and Cyclical Features for Time-Series
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import r2_score, mean_absolute_error
np.random.seed(42)
# SIMULATE DAILY RETAIL SALES WITH TREND + SEASONALITY
dates = pd.date_range("2021-01-01", "2023-12-31", freq="D")
n = len(dates)
trend = np.linspace(100, 150, n)
weekly = 20 * np.sin(2 * np.pi * np.arange(n) / 7) # 7-day cycle
annual = 30 * np.sin(2 * np.pi * np.arange(n) / 365)
noise = np.random.normal(0, 5, n)
sales = (trend + weekly + annual + noise).clip(0)
df = pd.DataFrame({"date": dates, "sales": sales})
df = df.set_index("date")
# ━━━━━━━━━━━━━━━━━━━━━━━━━━
# FEATURE ENGINEERING
# ━━━━━━━━━━━━━━━━━━━━━━━━━━
# 1. DATETIME FEATURES
df["day_of_week"] = df.index.dayofweek # 0=Mon, 6=Sun
df["month"] = df.index.month
df["day_of_year"] = df.index.dayofyear
df["quarter"] = df.index.quarter
df["is_weekend"] = df.index.dayofweek.isin([5, 6]).astype(int)
# 2. CYCLICAL ENCODING (sin/cos so season wraps correctly)
df["dow_sin"] = np.sin(2 * np.pi * df["day_of_week"] / 7)
df["dow_cos"] = np.cos(2 * np.pi * df["day_of_week"] / 7)
df["doy_sin"] = np.sin(2 * np.pi * df["day_of_year"] / 365)
df["doy_cos"] = np.cos(2 * np.pi * df["day_of_year"] / 365)
# 3. LAG FEATURES (use .shift() -- never use future values!)
for lag in [1, 2, 7, 14, 28, 365]:
    df[f"sales_lag_{lag}"] = df["sales"].shift(lag)
# 4. ROLLING STATISTICS
for window in [7, 14, 30]:
    df[f"sales_roll_mean_{window}"] = df["sales"].shift(1).rolling(window).mean()
    df[f"sales_roll_std_{window}"] = df["sales"].shift(1).rolling(window).std()
    df[f"sales_roll_max_{window}"] = df["sales"].shift(1).rolling(window).max()
# 5. DIFFERENCE / CHANGE FEATURES
df["sales_diff_1"] = df["sales"].diff(1) # day-over-day change
df["sales_diff_7"] = df["sales"].diff(7) # week-over-week change
df["sales_pct_7"] = df["sales"].pct_change(7) * 100 # week-over-week % change
# DROP ROWS WITH NaN (from lagging and rolling)
df = df.dropna()
print(f"Dataset after feature engineering: {df.shape}")
print(f"Features created: {df.shape[1] - 1}")
# ━━━━━━━━━━━━━━━━━━━━━━━━━━
# TRAIN-TEST SPLIT (TEMPORAL -- never shuffle!)
# ━━━━━━━━━━━━━━━━━━━━━━━━━━
feature_cols = [c for c in df.columns if c != "sales"]
X = df[feature_cols]
y = df["sales"]
split_date = "2023-01-01"
X_train, y_train = X[X.index < split_date], y[y.index < split_date]
X_test, y_test = X[X.index >= split_date], y[y.index >= split_date]
model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.05, max_depth=4, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"\nTest R2: {r2_score(y_test, y_pred):.4f}")
print(f"Test MAE: {mean_absolute_error(y_test, y_pred):.2f} units/day")
# FEATURE IMPORTANCE
feat_imp = pd.DataFrame({"Feature": feature_cols, "Importance": model.feature_importances_})
print("\nTop 10 features:")
print(feat_imp.nlargest(10, "Importance").to_string(index=False))
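The script imports `TimeSeriesSplit` but evaluates on a single temporal split. For a more robust estimate, expanding-window cross-validation keeps every fold's test block strictly after its training data. A minimal sketch on stand-in data (the feature names here are placeholders, not the engineered columns above):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Stand-in feature matrix and target (placeholders for the engineered df)
rng = np.random.default_rng(0)
X = pd.DataFrame({"f1": rng.normal(size=100), "f2": rng.normal(size=100)})
y = pd.Series(rng.normal(size=100))

tscv = TimeSeriesSplit(n_splits=4)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    # Each fold trains on an expanding window of the past, tests on the next block
    assert train_idx.max() < test_idx.min()  # no future data leaks into training
    print(f"Fold {fold}: train size {len(train_idx)}, test size {len(test_idx)}")
```

Each fold's training window grows while the test window slides forward, mirroring how the model would actually be retrained and deployed over time.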
Tip
Practice time-series feature engineering for ML in small, isolated examples before integrating it into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
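One such small experiment: verify that the sin/cos encoding actually makes the calendar wrap. Raw day-of-year says Dec 31 and Jan 1 are 364 apart; on the encoded circle they are neighbors. A sketch (the helper `doy_encode` is hypothetical, not from the code above):

```python
import numpy as np

def doy_encode(day):
    # Map day-of-year onto the unit circle, as in the doy_sin/doy_cos features
    angle = 2 * np.pi * day / 365
    return np.array([np.sin(angle), np.cos(angle)])

dec_31, jan_1, jul_1 = doy_encode(365), doy_encode(1), doy_encode(182)
print(np.linalg.norm(dec_31 - jan_1))  # small: adjacent on the circle
print(np.linalg.norm(dec_31 - jul_1))  # large: half a year apart
```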
Most ML work is data preparation: garbage in, garbage out.
Practice Task
(1) Write a working example of time-series feature engineering for ML from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake with time-series feature engineering for ML is skipping edge case testing — empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready ML code.
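One way to sketch that validation, assuming a hypothetical `add_lag_features` helper (not part of the code above): guard against an empty frame, a missing column, and NaN gaps before lagging.

```python
import pandas as pd

def add_lag_features(df, col="sales", lags=(1, 7)):
    """Guarded lag-feature builder -- a sketch of the boundary checks described above."""
    if df.empty:
        raise ValueError("empty DataFrame: no history to lag")
    if col not in df.columns:
        raise KeyError(f"missing column: {col}")
    if df[col].isna().any():
        # Forward-fill gaps so lags don't silently propagate NaN
        df = df.assign(**{col: df[col].ffill()})
    out = df.copy()
    for lag in lags:
        out[f"{col}_lag_{lag}"] = out[col].shift(lag)
    return out

demo = pd.DataFrame({"sales": [10.0, None, 12.0, 13.0]})
print(add_lag_features(demo)["sales_lag_1"].tolist())
```

Whether to forward-fill, interpolate, or reject NaN gaps is a modeling decision; the point is to make it explicit rather than let NaN silently flow into the lags.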