Model Monitoring — Detecting Drift in Production
Models degrade in production: the world changes but the model doesn't. Data drift (the input distribution shifts), concept drift (the relationship between inputs and labels changes), and plain performance degradation all require monitoring. Production ML without monitoring is flying blind.
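Before reaching for a dedicated monitoring library, the core idea of data drift detection can be shown with a two-sample statistical test: compare a feature's distribution in a reference window against a recent production window. A minimal sketch using SciPy's Kolmogorov-Smirnov test (the window sizes and the 0.05 significance threshold are illustrative assumptions):
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
reference_window = rng.normal(150, 50, 1000)  # feature values at training time
current_window = rng.normal(220, 80, 500)     # feature values seen in production

# Two-sample KS test: a small p-value means the two samples are unlikely
# to come from the same distribution, i.e. the feature has drifted.
statistic, p_value = ks_2samp(reference_window, current_window)
if p_value < 0.05:  # illustrative significance threshold
    print(f"Drift detected: KS statistic={statistic:.3f}, p-value={p_value:.4f}")
else:
    print("No significant drift detected")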
Production Model Monitoring with Evidently
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset
from evidently.metrics import ColumnDriftMetric, DatasetMissingValuesMetric
from evidently.test_suite import TestSuite
from evidently.tests import TestColumnDrift, TestShareOfDriftedColumns, TestShareOfMissingValues
import pandas as pd
import numpy as np
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# EVIDENTLY -- open-source ML monitoring
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Reference data: distribution at training/deployment time
reference_data = pd.DataFrame({
    "text_length": np.random.normal(150, 50, 1000).clip(10, 500),
    "word_count": np.random.normal(25, 8, 1000).clip(2, 80),
    "prediction": np.random.choice(["positive", "negative"], 1000, p=[0.6, 0.4]),
    "confidence": np.random.beta(8, 2, 1000),
})
# Current data: what's coming in production (after 1 month)
current_data = pd.DataFrame({
    "text_length": np.random.normal(220, 80, 500).clip(10, 500),  # texts getting longer!
    "word_count": np.random.normal(35, 15, 500).clip(2, 80),
    "prediction": np.random.choice(["positive", "negative"], 500, p=[0.4, 0.6]),
    "confidence": np.random.beta(4, 3, 500),  # confidence dropping!
})
# ── DATA DRIFT REPORT ─────────────────────────────────
drift_report = Report(metrics=[
    DataDriftPreset(),  # checks all features for distribution shift
    DatasetMissingValuesMetric(),
    ColumnDriftMetric(column_name="text_length"),  # specific column with options
])
drift_report.run(reference_data=reference_data, current_data=current_data)
drift_report.save_html("drift_report.html")
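# The report can also be consumed programmatically instead of (or alongside)
# the HTML export. A minimal sketch -- the exact keys in the result payload are
# an assumption and can differ between Evidently versions, so inspect
# drift_report.as_dict() for your installed release.
report_dict = drift_report.as_dict()
for metric in report_dict["metrics"]:
    if metric["metric"] == "DatasetDriftMetric":
        result = metric["result"]
        print(f"Drifted columns: {result['number_of_drifted_columns']} "
              f"({result['share_of_drifted_columns']:.0%})")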
# ── AUTOMATED TEST SUITE ──────────────────────────────
test_suite = TestSuite(tests=[
    TestShareOfDriftedColumns(lt=0.3),  # FAIL if > 30% of features drift
    TestShareOfMissingValues(lt=0.05),  # FAIL if > 5% of values are missing
    TestColumnDrift(column_name="confidence", stattest="ks"),  # Kolmogorov-Smirnov test
])
test_suite.run(reference_data=reference_data, current_data=current_data)
results = test_suite.as_dict()
for test in results["tests"]:
    status = "PASS" if test["status"] == "SUCCESS" else "FAIL"
    print(f" [{status}] {test['name']}: {test.get('description', '')}")
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# WHAT TO MONITOR IN PRODUCTION
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
monitoring_checklist = {
    "Input distribution": "Track feature distributions (text length, category frequencies) vs baseline",
    "Prediction distribution": "Monitor class balance -- if suddenly 90% positive, something is wrong",
    "Confidence scores": "Watch for bimodal distribution (overconfident) or uniform (model uncertain)",
    "Latency percentiles": "Track p50, p95, p99 latency -- spikes indicate GPU pressure or memory issues",
    "Error rates": "HTTP 500s, timeout rates, invalid input rates",
    "Throughput": "Requests per second -- sudden drops may indicate upstream issues",
    "Model accuracy": "For tasks with delayed labels (fraud detection, click prediction), compute accuracy retrospectively once labels arrive",
}
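# One concrete example from the checklist: prediction class balance. A simple
# sketch comparing each class's share in current data against the reference
# share (the 0.15 tolerance is an illustrative assumption).
reference_share = reference_data["prediction"].value_counts(normalize=True)
current_share = current_data["prediction"].value_counts(normalize=True)
for label in reference_share.index:
    shift = abs(current_share.get(label, 0.0) - reference_share[label])
    if shift > 0.15:  # illustrative tolerance on absolute share change
        print(f"Prediction share for '{label}' shifted by {shift:.0%}")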
# ALERTING THRESHOLDS (common for ML services)
alert_thresholds = {
    "p95_latency_ms": 500,        # alert if 95th percentile latency > 500 ms
    "error_rate_pct": 1,          # alert if > 1% of requests error
    "prediction_drift_pct": 20,   # alert if prediction distribution shifts > 20%
    "feature_drift_features": 3,  # alert if > 3 features drift simultaneously
}
print("\nProduction monitoring thresholds:")
for metric, threshold in alert_thresholds.items():
    print(f" Alert when {metric} exceeds {threshold}")
Tip
Practice model monitoring and drift detection in small, isolated examples before integrating them into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
Note
Practice Task — (1) Write a working example of model monitoring and drift detection from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
Warning
A common mistake with model monitoring and drift detection is skipping edge-case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready AI code.