Monitoring & Observability

Observability rests on three pillars: metrics (Prometheus/Grafana), logs (ELK Stack), and traces (Jaeger/Zipkin). Together they answer three questions: what is happening (metrics), why it is happening (logs), and where the problem is (traces).

40 min•By Priygop Team•Last updated: Feb 2026

Three Pillars of Observability

Metrics — Numerical time-series data. Prometheus scrapes /metrics endpoints. Grafana visualizes. Alert on: high error rate, latency > SLO, memory > 90%
Logs — Structured JSON logs. ELK Stack: Elasticsearch (storage), Logstash (pipeline), Kibana (visualization). Fluentd or Fluent Bit for Kubernetes log collection
Traces — Distributed request tracking. OpenTelemetry instruments code. Jaeger shows request path across microservices. Find which service is slow
SLI (Service Level Indicator) — Measurable metric: request error rate, p99 latency, uptime percentage
SLO (Service Level Objective) — Target for SLI: 99.9% uptime, p99 latency < 500ms. Internal commitment
SLA (Service Level Agreement) — Contract with customers. Keep the SLO stricter than the SLA so there is a buffer before the contract is breached
Error budget — (1 - SLO) × time period. 99.9% uptime = about 8.76 hours/year of allowed downtime (0.1% of 8,760 hours)
Alerting — Alert on SLO burn rate, not raw metrics. Page when the current burn rate is projected to exhaust the error budget within 1 hour
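To make "structured JSON logs" concrete, here is a minimal sketch using only Python's standard `logging` module. The `JsonFormatter` class, the `build_logger` helper, and the `service` and `trace_id` fields are illustrative assumptions, not part of any particular logging stack; a real pipeline would ship these lines to Elasticsearch through a collector.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical context fields passed via `extra=`; skipped when absent.
        for key in ("service", "trace_id"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


def build_logger(name: str) -> logging.Logger:
    """Return a logger that emits JSON lines to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()  # stderr by default
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("checkout")
    log.info("payment failed", extra={"service": "checkout", "trace_id": "abc123"})
```

Carrying a trace ID in every log line is one common way to jump from a log entry to the matching trace, which is why it appears as a field here.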
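The SLI and SLO definitions above can be sketched as a small calculation over request samples. This is an illustration under assumptions: the `Request` type, the function names, and the nearest-rank percentile method are choices made for the example, and a production system would normally compute these values from its metrics backend instead of raw samples.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    latency_ms: float
    ok: bool


def error_rate(requests: list[Request]) -> float:
    """SLI: fraction of requests that failed (0.0 for an empty sample)."""
    if not requests:
        return 0.0
    return sum(1 for r in requests if not r.ok) / len(requests)


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_slo(requests: list[Request], max_error_rate: float, max_p99_ms: float) -> bool:
    """SLO check: error rate within target and p99 latency below the limit."""
    p99 = percentile([r.latency_ms for r in requests], 99)
    return error_rate(requests) <= max_error_rate and p99 < max_p99_ms
```

For example, 99 successful requests at 100 ms plus one failure at 900 ms give an error rate of 1% and a p99 of 100 ms, so the sample passes a "p99 < 500 ms" target but fails a 99.9% success target.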
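The error budget formula, (1 - SLO) × time period, is easy to check by hand. The sketch below works it through for a 99.9% target; the function name and the 365-day year are assumptions made for the example.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def error_budget_hours(slo: float, period_hours: float) -> float:
    """Allowed downtime for an availability SLO given as a fraction, e.g. 0.999."""
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in (0, 1]")
    return (1.0 - slo) * period_hours


if __name__ == "__main__":
    yearly = error_budget_hours(0.999, HOURS_PER_YEAR)
    monthly = error_budget_hours(0.999, 30 * 24)
    print(f"per year: {yearly:.2f} h, per 30 days: {monthly * 60:.1f} min")
```

For 99.9% this works out to about 8.76 hours per year, or about 43.2 minutes per 30-day window.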
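Burn-rate alerting can be sketched the same way. Here burn rate is taken to mean the observed error rate divided by the error rate the SLO allows, so a burn rate of 1 spends the budget exactly over the SLO window. The function names, the 30-day window in the example, and the 1-hour paging threshold as a parameter are assumptions for illustration; real alert rules usually evaluate this over several time windows.

```python
def burn_rate(observed_error_rate: float, slo: float) -> float:
    """How many times faster than allowed the error budget is being spent."""
    allowed = 1.0 - slo
    if allowed <= 0:
        raise ValueError("an SLO of 100% leaves no error budget")
    return observed_error_rate / allowed


def hours_until_exhausted(
    observed_error_rate: float,
    slo: float,
    window_hours: float,
    budget_remaining: float = 1.0,
) -> float:
    """Projected hours until the budget runs out at the current burn rate.

    budget_remaining is the unspent fraction of the budget (1.0 = untouched).
    """
    rate = burn_rate(observed_error_rate, slo)
    if rate == 0:
        return float("inf")
    return budget_remaining * window_hours / rate


def should_page(
    observed_error_rate: float,
    slo: float,
    window_hours: float,
    budget_remaining: float = 1.0,
    threshold_hours: float = 1.0,
) -> bool:
    """Page when the budget is projected to be gone within threshold_hours."""
    projected = hours_until_exhausted(
        observed_error_rate, slo, window_hours, budget_remaining
    )
    return projected <= threshold_hours
```

With a 99.9% SLO over a 720-hour window, a 50% error rate is a burn rate of about 500: an untouched budget would last about 1.44 hours, while a half-spent budget would last about 0.72 hours and trigger a page.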
