Monitoring & Observability
Observability rests on three pillars: metrics (Prometheus/Grafana), logs (ELK Stack), and traces (Jaeger/Zipkin). Together they answer what is happening (metrics), why it is happening (logs), and where the problem is (traces).
40 min•By Priygop Team•Last updated: Feb 2026
Three Pillars of Observability
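- Metrics: numerical time-series data. Prometheus periodically scrapes each service's /metrics endpoint and stores the series; Grafana visualizes them. Typical alerts: high error rate, latency above the SLO target, memory usage above 90%.
- Logs: structured JSON log events. The ELK Stack provides storage and search (Elasticsearch), an ingestion pipeline (Logstash), and visualization (Kibana). On Kubernetes, Fluentd or Fluent Bit typically collects container logs and ships them to the backend.
- Traces: distributed request tracking. OpenTelemetry instruments the code and propagates trace context between services; Jaeger then shows each request's path across microservices, so you can find which service is slow.
- SLI (Service Level Indicator): a measurable metric of service behavior, such as request error rate, p99 latency, or uptime percentage.
- SLO (Service Level Objective): a target value for an SLI, such as 99.9% uptime or p99 latency < 500 ms. It is an internal commitment.
- SLA (Service Level Agreement): a contract with customers, usually with penalties (such as service credits) if it is missed. The SLO is set stricter than the SLA, so the team runs out of its internal error budget, and reacts, before the contract is breached.
- Error budget: (1 - SLO) × time period. A 99.9% uptime SLO over a 365-day year allows 0.001 × 8,760 h ≈ 8.76 hours of downtime.
- Alerting: alert on the SLO burn rate (how fast the error budget is being spent), not on raw metrics. Page when the current burn rate projects that the budget will be exhausted within 1 hour.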
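The Metrics bullet says Prometheus scrapes /metrics endpoints. What it receives is a line-oriented plain-text exposition format. The sketch below renders a single counter in that format using only Python's standard library. The metric name `http_requests_total`, its `method`/`status` labels, and the `RequestCounter` class are hypothetical; a real service would normally use the official `prometheus_client` library rather than hand-rolling the format.

```python
from collections import Counter


class RequestCounter:
    """Hypothetical counter that renders itself in Prometheus text exposition format."""

    def __init__(self, name: str, help_text: str) -> None:
        self.name = name
        self.help_text = help_text
        # Maps a (method, status) label pair to its running count.
        self.counts = Counter()

    def inc(self, method: str, status: int) -> None:
        # Each distinct label combination becomes its own time series.
        self.counts[(method, str(status))] += 1

    def render(self) -> str:
        """Return the text a scraper would receive from GET /metrics."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        # Label values are assumed not to contain quotes or backslashes (no escaping done).
        for (method, status), value in sorted(self.counts.items()):
            lines.append(f'{self.name}{{method="{method}",status="{status}"}} {value}')
        # The format is line-oriented and ends with a newline.
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    requests_total = RequestCounter("http_requests_total", "Total HTTP requests handled.")
    requests_total.inc("GET", 200)
    requests_total.inc("GET", 200)
    requests_total.inc("POST", 500)
    print(requests_total.render(), end="")
```

Serving the output of `render()` over HTTP at `/metrics` (for example with `http.server`) is all Prometheus needs in order to scrape it. Because every label combination is a separate series, labels with unbounded values (such as user IDs) are usually avoided.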
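The Logs bullet calls for structured JSON logs. Emitting one JSON object per line lets collectors such as Fluent Bit or Logstash parse fields directly instead of pattern-matching free text. Here is a minimal sketch using Python's standard `logging` module. The field names (`timestamp`, `level`, `logger`, `message`) and the convention of passing extra fields as `extra={"fields": {...}}` are illustrative assumptions, not a required schema.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object (field names are illustrative)."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured fields passed as logger.info(..., extra={"fields": {...}}).
        entry.update(getattr(record, "fields", {}))
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def make_json_logger(name: str) -> logging.Logger:
    """Logger that writes JSON lines to stderr, where a container log collector can read them."""
    handler = logging.StreamHandler()  # defaults to sys.stderr
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers = [handler]
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger


if __name__ == "__main__":
    log = make_json_logger("checkout")
    log.info("order placed", extra={"fields": {"order_id": "A123", "latency_ms": 42}})
```

Putting values such as `order_id` and `latency_ms` in their own JSON keys, rather than interpolating them into the message, is what makes them searchable as fields in Kibana.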
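The Traces bullet relies on every service passing the same trace ID along with each request, so that a backend such as Jaeger can stitch spans from different services into one trace. OpenTelemetry propagates this by default with the W3C Trace Context `traceparent` header (`version-traceid-parentid-flags`). The sketch below shows that mechanism by hand and handles only the version-00 layout. `SpanContext`, `new_root_span`, `parse_traceparent`, and `child_span` are hypothetical names, not the OpenTelemetry API; in practice the OpenTelemetry SDK does this for you.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# Version-00 layout: version - trace-id (32 hex) - parent-id (16 hex) - trace-flags
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str       # shared by every span of one end-to-end request
    span_id: str        # identifies one operation, e.g. one service hop
    sampled: bool = True

    def to_traceparent(self) -> str:
        """Header value to send on outgoing calls to the next service."""
        return f"00-{self.trace_id}-{self.span_id}-{'01' if self.sampled else '00'}"


def new_root_span() -> SpanContext:
    """Start a new trace where a request enters the system (e.g. an API gateway)."""
    return SpanContext(trace_id=secrets.token_hex(16), span_id=secrets.token_hex(8))


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Parse an incoming header; return None if it is malformed or invalid."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the W3C Trace Context spec.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    # Bit 0 of trace-flags is the "sampled" flag.
    return SpanContext(trace_id, span_id, sampled=bool(int(flags, 16) & 0x01))


def child_span(parent: SpanContext) -> SpanContext:
    """Downstream hop: keep the trace_id, mint a fresh span_id."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)
```

A service that receives `00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01` parses it, calls `child_span`, and forwards the child's `to_traceparent()` value downstream. Every hop then reports spans under the same trace ID, which is how the slow service shows up on a single timeline.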
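The SLI bullet names request error rate and p99 latency as typical indicators. The sketch below computes both from a window of request samples. It assumes that any 5xx status counts as an error and uses the nearest-rank percentile definition. Monitoring systems such as Prometheus usually estimate percentiles from histogram buckets instead, so their figures can differ slightly. `RequestSample` is a hypothetical record type.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestSample:
    status: int         # HTTP status code returned to the client
    latency_ms: float   # server-side request duration


def error_rate(samples: list[RequestSample]) -> float:
    """Fraction of requests that failed, assuming any 5xx status counts as an error."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if s.status >= 500) / len(samples)


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("percentile of an empty sample")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]
```

An availability-style SLI is then simply `1 - error_rate(window)`, and a latency SLO such as "p99 < 500 ms" becomes the check `percentile(latencies, 99) < 500`.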
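The Error budget and SLA bullets reduce to one formula, (1 - target) × period. The sketch below applies it to an internal SLO and to a customer SLA. The 99.5% SLA figure and the 30-day window are hypothetical values chosen only to show why a stricter SLO leaves a safety margin. The period ignores leap years, matching the 8,760-hour year used above.

```python
HOURS_PER_YEAR = 365 * 24    # 8,760 h; leap years ignored
HOURS_PER_30_DAYS = 30 * 24  # 720 h, a common rolling SLO window


def error_budget_hours(target: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Allowed 'bad' time in the period: (1 - target) x period."""
    if not 0 < target <= 1:
        raise ValueError("target must be a fraction in (0, 1], e.g. 0.999")
    return (1 - target) * period_hours


if __name__ == "__main__":
    slo, sla = 0.999, 0.995  # hypothetical: internal SLO stricter than the customer SLA
    slo_budget = error_budget_hours(slo)
    sla_budget = error_budget_hours(sla)
    print(f"SLO {slo:.1%}: {slo_budget:.2f} h/year")
    print(f"SLA {sla:.1%}: {sla_budget:.2f} h/year")
    print(f"Margin before the contract is at risk: {sla_budget - slo_budget:.2f} h/year")
    print(f"SLO budget per 30 days: {error_budget_hours(slo, HOURS_PER_30_DAYS) * 60:.1f} min")
```

With these inputs the SLO budget is 8.76 h/year (43.2 min per 30 days) and the SLA budget is 43.8 h/year. The team therefore exhausts its own budget and reacts long before the contractual limit is reached.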
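The Alerting rule can be made concrete. Burn rate is the observed error rate divided by the error rate the SLO allows (1 - SLO), so a burn rate of 1 spends exactly the whole budget over the SLO window. The sketch assumes a 30-day window and that the fraction of budget still remaining is already known (both assumptions). Under those conditions, projected time to exhaustion is remaining × window ÷ burn rate, and a page fires when that drops to 1 hour or less. All function names are hypothetical; production setups usually express this as alerting rules in the monitoring system (e.g. Prometheus) rather than as application code.

```python
import math


def burn_rate(observed_error_rate: float, slo: float) -> float:
    """Budget spend speed relative to plan: 1.0 uses exactly the whole budget over the SLO window."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction in (0, 1), e.g. 0.999")
    return observed_error_rate / (1 - slo)


def hours_until_exhausted(remaining_budget: float, rate: float, window_hours: float) -> float:
    """Projected hours until the remaining budget (fraction 0..1) reaches zero at this burn rate."""
    if rate <= 0:
        return math.inf   # no errors: the budget is not being spent
    if remaining_budget <= 0:
        return 0.0        # already exhausted
    # At burn rate 1 the full budget lasts one window, so time scales as remaining / rate.
    return remaining_budget * window_hours / rate


def should_page(observed_error_rate: float, slo: float, remaining_budget: float,
                window_hours: float = 30 * 24, page_within_hours: float = 1.0) -> bool:
    """Page a human if the current burn rate would exhaust the budget within page_within_hours."""
    rate = burn_rate(observed_error_rate, slo)
    return hours_until_exhausted(remaining_budget, rate, window_hours) <= page_within_hours
```

For example, a 5% error rate against a 99.9% SLO is a burn rate of 50. With 5% of a 30-day budget left, that gives 0.05 × 720 ÷ 50 = 0.72 h, so it pages. With 60% left it gives 8.64 h, so it does not.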