Time-Series & Datetime EDA
A single timestamp can be expanded into many features: hour of day, day of week, month, quarter, year, days since a reference event, is_weekend, is_holiday, and cyclical encodings. Time-series EDA looks for trends (for example, steadily increasing sales), seasonality (for example, higher sales on weekends), and anomalies (for example, sudden drops, which often point to data issues).
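Two of the features listed above, days since a reference event and is_holiday, need information that is not in the timestamp itself. The sketch below shows one way to derive them. The reference date (`launch_date`) and the two-entry holiday list are made-up assumptions for illustration; a real project would supply its own event date and holiday calendar.

```python
import pandas as pd

# Hypothetical inputs: three timestamps, an invented reference event,
# and a hand-written holiday list (not a real holiday calendar).
ts = pd.Series(pd.to_datetime([
    "2023-01-01 09:00",
    "2023-07-04 18:30",
    "2023-12-25 23:15",
]))
launch_date = pd.Timestamp("2023-01-01")
holidays = pd.to_datetime(["2023-01-01", "2023-12-25"])

features = pd.DataFrame({
    "timestamp": ts,
    # Whole days elapsed since the reference event
    "days_since_launch": (ts - launch_date).dt.days,
    # normalize() drops the time of day so dates can be matched to the list
    "is_holiday": ts.dt.normalize().isin(holidays).astype(int),
})
print(features)
```

Normalizing to midnight before the `isin` check matters: a timestamp such as 23:15 on a holiday would not match a date-only holiday entry otherwise.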
Extracting Value from Datetime Features
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(42)
# Create a time series dataset (e-commerce transactions)
dates = pd.date_range("2023-01-01", "2023-12-31 23:00", freq="h") # every hour of 2023, including all of Dec 31
n = len(dates)
df = pd.DataFrame({
    "timestamp": dates,
    "sales": (200 + 50 * np.sin(np.arange(n) * 2 * np.pi / 24)   # daily cycle
              + 100 * np.sin(np.arange(n) * 2 * np.pi / (24*7))  # weekly cycle
              + np.random.normal(0, 20, n)).clip(0),             # noise, floored at 0
    "is_fraud": np.random.choice([0, 1], n, p=[0.99, 0.01]),
})
# STEP 1: EXTRACT DATETIME FEATURES
df["hour"] = df["timestamp"].dt.hour
df["day_of_week"] = df["timestamp"].dt.dayofweek # 0=Mon, 6=Sun
df["day_name"] = df["timestamp"].dt.day_name()
df["month"] = df["timestamp"].dt.month
df["month_name"] = df["timestamp"].dt.month_name()
df["quarter"] = df["timestamp"].dt.quarter
df["is_weekend"] = df["timestamp"].dt.dayofweek.isin([5, 6]).astype(int)
df["week_of_year"] = df["timestamp"].dt.isocalendar().week.astype(int)
print("Extracted datetime features:")
print(df[["timestamp", "hour", "day_of_week", "month", "is_weekend"]].head(5))
# STEP 2: VISUALIZE TEMPORAL PATTERNS
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Daily pattern
hourly_avg = df.groupby("hour")["sales"].mean()
axes[0, 0].plot(hourly_avg.index, hourly_avg.values, color="steelblue", linewidth=2)
axes[0, 0].fill_between(hourly_avg.index, hourly_avg.values, alpha=0.3, color="steelblue")
axes[0, 0].set_title("Average Sales by Hour of Day")
axes[0, 0].set_xlabel("Hour")
# Weekly pattern
day_order = ["Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday"]
weekly_avg = df.groupby("day_name")["sales"].mean().reindex(day_order)
colors_week = ["steelblue"] * 5 + ["tomato"] * 2 # weekend in red
weekly_avg.plot(kind="bar", ax=axes[0, 1], color=colors_week, rot=30)
axes[0, 1].set_title("Average Sales by Day of Week")
# Monthly pattern
monthly_avg = df.groupby("month")["sales"].mean()
axes[1, 0].plot(monthly_avg.index, monthly_avg.values, marker="o", color="coral", linewidth=2)
axes[1, 0].set_title("Average Sales by Month")
axes[1, 0].set_xticks(range(1, 13))
# Fraud by hour
fraud_by_hour = df.groupby("hour")["is_fraud"].mean()
axes[1, 1].bar(fraud_by_hour.index, fraud_by_hour.values, color="tomato", alpha=0.7)
axes[1, 1].set_title("Fraud Rate by Hour of Day")
axes[1, 1].set_xlabel("Hour")
plt.tight_layout()
plt.savefig("temporal_eda.png", dpi=100, bbox_inches="tight")
plt.show()
# STEP 3: CYCLICAL ENCODING (for ML models that don't know 23 is close to 0)
df["hour_sin"] = np.sin(2 * np.pi * df["hour"] / 24)
df["hour_cos"] = np.cos(2 * np.pi * df["hour"] / 24)
df["day_sin"] = np.sin(2 * np.pi * df["day_of_week"] / 7)
df["day_cos"] = np.cos(2 * np.pi * df["day_of_week"] / 7)
print("\nCyclical encoding: hour=23 and hour=0 are now close in encoded space")
print(f" Hour 23: sin={df['hour_sin'].iloc[23]:.3f}, cos={df['hour_cos'].iloc[23]:.3f}")
print(f" Hour 0: sin={df['hour_sin'].iloc[0]:.3f}, cos={df['hour_cos'].iloc[0]:.3f}")
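The walkthrough above focuses on seasonality. Trends and anomalies, the other two patterns time-series EDA looks for, can be explored along the following lines. This is a minimal sketch on separate synthetic data, not a definitive method: the upward slope, the injected six-hour outage, the 7-day window, and the 3.5 cutoff on the robust z-score are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic hourly series (all values are made up for illustration):
# a slow upward trend, a daily cycle, noise, and one injected outage.
rng = np.random.default_rng(0)
idx = pd.date_range("2023-01-01", periods=24 * 60, freq="h")  # 60 days
t = np.arange(len(idx))
sales = pd.Series(
    200 + 0.02 * t + 50 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 10, len(idx)),
    index=idx,
)
sales.iloc[700:706] = 0.0  # hypothetical outage: six hours of zero sales

# Trend: daily means average out the 24-hour cycle, and a centered
# 7-day rolling mean smooths what is left.
daily = sales.resample("D").mean()
trend = daily.rolling(window=7, center=True, min_periods=1).mean()

# Anomalies: compare each hour with the median for that hour of day,
# then score the residual with a robust z-score based on the MAD.
expected = sales.groupby(sales.index.hour).transform("median")
resid = sales - expected
mad = (resid - resid.median()).abs().median()
robust_z = 0.6745 * (resid - resid.median()) / mad
anomalies = sales[robust_z.abs() > 3.5]  # 3.5 is an assumed cutoff

print(f"Trend: {trend.iloc[0]:.1f} -> {trend.iloc[-1]:.1f}")
print(anomalies)
```

Comparing against the per-hour median rather than the overall mean keeps the normal daily cycle from being flagged as anomalous. The median and MAD are used because a few extreme values barely move them, whereas they would inflate a mean and standard deviation.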
Tip
Practice time-series and datetime EDA in small, isolated examples before integrating it into larger projects. Breaking concepts into small experiments builds understanding faster than reading alone.
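One such small experiment, sketched here with hypothetical helper names (`encode_hour`, `encoded_distance`), checks the claim behind cyclical encoding: after the sin/cos transform, hour 23 should be as close to hour 0 as hour 1 is, while hour 12 should be the farthest away.

```python
import numpy as np

def encode_hour(hour):
    """Map an hour (0-23) to a point on the unit circle (sin, cos)."""
    angle = 2 * np.pi * hour / 24
    return np.array([np.sin(angle), np.cos(angle)])

def encoded_distance(h1, h2):
    """Euclidean distance between two hours in the encoded space."""
    return float(np.linalg.norm(encode_hour(h1) - encode_hour(h2)))

d_23_0 = encoded_distance(23, 0)   # neighbours across midnight
d_0_1 = encoded_distance(0, 1)     # ordinary neighbours
d_0_12 = encoded_distance(0, 12)   # opposite sides of the clock

print(f"23 vs 0:  {d_23_0:.3f}")
print(f"0 vs 1:   {d_0_1:.3f}")
print(f"0 vs 12:  {d_0_12:.3f}")
```

On the raw hour column the gap between 23 and 0 is 23, the largest possible; in the encoded space it matches the gap between any two adjacent hours.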
Practice Task
Note
(1) Write a working example of time-series and datetime EDA from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
Warning
A common mistake in time-series and datetime EDA is skipping edge-case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready ML code.
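As a rough illustration of that kind of boundary checking, the sketch below wraps feature extraction in a hypothetical helper, `extract_datetime_features`, which is not part of the lesson's code. It assumes that dropping unparseable or missing timestamps is acceptable; depending on the project, it may be better to count or log them instead.

```python
import pandas as pd

def extract_datetime_features(raw):
    """Hypothetical helper: parse raw timestamps and derive a few features.

    Missing or unparseable values become NaT and are dropped. An empty
    input yields an empty DataFrame with the same columns.
    """
    # errors="coerce" turns anything that cannot be parsed into NaT
    ts = pd.to_datetime(pd.Series(raw, dtype="object"), errors="coerce")
    ts = ts.dropna().reset_index(drop=True)
    return pd.DataFrame({
        "timestamp": ts,
        "hour": ts.dt.hour,
        "day_of_week": ts.dt.dayofweek,  # 0=Mon, 6=Sun
        "is_weekend": ts.dt.dayofweek.isin([5, 6]).astype(int),
    })

# Mixed input: two valid timestamps, one null, one garbage string
mixed = extract_datetime_features(
    ["2023-01-07 10:00", None, "not a date", "2023-01-09 23:30"]
)
print(mixed)

# Empty input: no rows, but the columns are still there
empty = extract_datetime_features([])
print(empty.columns.tolist())
```

Returning an empty frame with the usual columns, rather than raising, lets downstream `groupby` and plotting code run unchanged on an empty batch.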