The Three Pillars of Observability
Observability is the ability to understand the internal state of a system from its external outputs. The three pillars (metrics, logs, and traces) each answer a different question about your production systems, and together they give you a detailed picture of how those systems behave.
Metrics, Logs, and Traces
- Metrics: Numerical measurements over time, such as request rate, error rate, response time, and CPU usage. Metrics tell you WHAT is happening. They are fast to query and cheap to store. Tools: Prometheus, Datadog.
- Logs: Timestamped records of discrete events, such as 'User 12345 logged in' or 'Payment failed: card declined'. Logs tell you WHAT HAPPENED in detail. They are rich in context but expensive at scale. Tools: ELK Stack, Loki.
- Traces: Records of a single request's journey through distributed services: which microservices were called and how long each call took. Traces tell you WHERE the bottleneck is. Tools: Jaeger, Zipkin, and OpenTelemetry (a vendor-neutral standard for instrumenting code and exporting traces, rather than a trace storage backend).
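To make the three signals concrete, here is a minimal sketch of one hypothetical checkout request emitting a metric, a log line, and trace spans. The in-memory `metrics`, `logs`, and `spans` stores and every function name (`incrementCounter`, `log`, `startSpan`, `handleCheckout`) are assumptions standing in for Prometheus, Loki/ELK, and Jaeger; this is not the OpenTelemetry API or any real client library.

```javascript
// Hypothetical in-memory stand-ins for a metrics store, a log store, and a trace store.
const metrics = new Map(); // "name{labels}" -> counter value
const logs = [];           // one structured record per event
const spans = [];          // finished spans

// Sequential hex IDs keep the sketch deterministic; real tracers use random IDs.
let nextId = 1;
const newId = () => (nextId++).toString(16).padStart(16, "0");

// Metric: increment a counter keyed by name + labels. Only the aggregate survives.
function incrementCounter(name, labels) {
  const labelStr = Object.entries(labels).map(([k, v]) => `${k}="${v}"`).join(",");
  const key = `${name}{${labelStr}}`;
  metrics.set(key, (metrics.get(key) ?? 0) + 1);
}

// Log: a timestamped record of one discrete event, with whatever context is useful.
function log(level, message, context) {
  logs.push({ timestamp: new Date().toISOString(), level, message, ...context });
}

// Trace: a span times one unit of work; spans sharing a traceId form one request's journey.
function startSpan(name, traceId, parentSpanId = null) {
  const span = { traceId, spanId: newId(), parentSpanId, name, startMs: Date.now(), endMs: null };
  return {
    span,
    end() {
      span.endMs = Date.now();
      spans.push(span);
    },
  };
}

// A hypothetical checkout handler that emits all three signals for one request.
function handleCheckout(userId, cardAccepted) {
  const traceId = newId();
  const root = startSpan("POST /checkout", traceId);
  const payment = startSpan("payment-service.charge", traceId, root.span.spanId);
  // ... the call to the (hypothetical) payment service would happen here ...
  payment.end();

  const status = cardAccepted ? 200 : 402;
  incrementCounter("http_requests_total", { route: "/checkout", status });
  if (!cardAccepted) {
    // Putting the traceId in the log line lets you jump from this event to its trace.
    log("error", "Payment failed: card declined", { userId, traceId });
  }
  root.end();
  return status;
}

handleCheckout(12345, true);
handleCheckout(12345, false);
console.log(metrics);
console.log(logs);
```

Notice the trade-off described above: the counter keeps only a total per label set, the log line keeps per-user detail, and the shared `traceId` ties that log line to the spans of the same request.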
Why Observability Matters
- MTTD (Mean Time to Detect): How quickly do you learn that something is broken? Good alerting on metrics can cut MTTD from hours to seconds or minutes
- MTTR (Mean Time to Recover): How quickly can you diagnose and fix the problem? Traces and logs can cut MTTR from hours to minutes by pointing straight at the failing component
- Proactive alerting: Learn about problems before users report them, for example by catching a rise in HTTP 500 errors before it affects all of your traffic
- Performance optimization: Identify slow database queries, memory leaks, and bottlenecks in production
- Capacity planning: Understand resource usage trends to predict when you need to scale
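MTTD and MTTR are plain averages over past incidents. The sketch below assumes hypothetical incident records with `startedAt`, `detectedAt`, and `resolvedAt` timestamps (the values are made up for illustration) and measures recovery from the start of the incident; some teams measure MTTR from detection instead, so check which definition your organization uses.

```javascript
// Hypothetical incident records with made-up ISO-8601 timestamps.
const incidents = [
  { id: "INC-1", startedAt: "2026-01-10T09:00:00Z", detectedAt: "2026-01-10T09:04:00Z", resolvedAt: "2026-01-10T09:40:00Z" },
  { id: "INC-2", startedAt: "2026-01-12T14:00:00Z", detectedAt: "2026-01-12T14:02:00Z", resolvedAt: "2026-01-12T14:20:00Z" },
];

// Difference between two ISO timestamps, in minutes.
const minutesBetween = (from, to) => (Date.parse(to) - Date.parse(from)) / 60000;

// Mean of (toField - fromField) across all records; undefined (NaN) with no incidents.
function meanMinutes(records, fromField, toField) {
  if (records.length === 0) return NaN;
  const total = records.reduce((sum, r) => sum + minutesBetween(r[fromField], r[toField]), 0);
  return total / records.length;
}

const mttd = meanMinutes(incidents, "startedAt", "detectedAt"); // time until someone knew
const mttr = meanMinutes(incidents, "startedAt", "resolvedAt"); // time until service recovered
console.log(`MTTD: ${mttd} min, MTTR: ${mttr} min`); // prints "MTTD: 3 min, MTTR: 30 min"
```

Tracking both numbers over time shows whether better alerting (lower MTTD) or better diagnostics (lower MTTR) is paying off.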
The Observability Stack
```javascript
// Modern observability stack (2026)
const observabilityStack = {
  metrics: {
    collection: "Prometheus: pulls (scrapes) metrics from /metrics endpoints",
    storage: "Prometheus TSDB (local) or Thanos/Cortex (long-term, distributed)",
    visualization: "Grafana: dashboards, alerts, Explore",
    cloudAlternative: "AWS CloudWatch, Google Cloud Monitoring, Datadog",
  },
  logs: {
    collection: "Grafana Alloy (successor to the deprecated Promtail) or Fluentd: collect logs from pods/nodes",
    storage: "Loki (low cost, label-based indexing) or Elasticsearch (full-text search)",
    visualization: "Grafana Explore (Loki) or Kibana (Elasticsearch)",
    cloudAlternative: "AWS CloudWatch Logs, Google Cloud Logging",
  },
  traces: {
    instrumentation: "OpenTelemetry SDKs: vendor-neutral tracing instrumentation for many languages",
    collection: "OpenTelemetry Collector: receives, processes, and exports traces",
    backend: "Jaeger or Grafana Tempo: store and query distributed traces",
    visualization: "Grafana (Tempo data source) or Jaeger UI",
    cloudAlternative: "AWS X-Ray, Google Cloud Trace, Datadog APM",
  },
  alerts: {
    rules: "Prometheus alerting rules: define alert conditions",
    manager: "Alertmanager: grouping, routing, deduplication, silencing",
    channels: "Slack, PagerDuty, Opsgenie, email",
  },
};
```
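Prometheus collection is pull-based: each service exposes a plain-text `/metrics` endpoint and Prometheus scrapes it on a schedule. The following is a minimal sketch of such an endpoint using only Node's built-in `http` module; the `http_requests_total` counter, the `inc` and `renderMetrics` helpers, and the port are hypothetical choices for illustration. In practice you would normally use a client library such as `prom-client` rather than hand-rolling the text format.

```javascript
const http = require("node:http");

// Hypothetical counter state: metric name -> Map(labelString -> value).
const counters = new Map();
const helpText = new Map();

// The text format requires escaping backslash, double quote, and line feed in label values.
function escapeLabelValue(value) {
  return String(value).replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Increment the counter series identified by name + labels.
function inc(name, help, labels = {}) {
  helpText.set(name, help);
  if (!counters.has(name)) counters.set(name, new Map());
  const series = counters.get(name);
  const labelStr = Object.entries(labels)
    .map(([k, v]) => `${k}="${escapeLabelValue(v)}"`)
    .join(",");
  series.set(labelStr, (series.get(labelStr) ?? 0) + 1);
}

// Render every counter as "# HELP", "# TYPE", then one line per labelled series.
function renderMetrics() {
  const lines = [];
  for (const [name, series] of counters) {
    lines.push(`# HELP ${name} ${helpText.get(name)}`);
    lines.push(`# TYPE ${name} counter`);
    for (const [labelStr, value] of series) {
      lines.push(labelStr ? `${name}{${labelStr}} ${value}` : `${name} ${value}`);
    }
  }
  return lines.join("\n") + "\n";
}

const REQUESTS = "http_requests_total";
const REQUESTS_HELP = "Total HTTP requests handled.";

// Prometheus pulls GET /metrics; every other path counts as application traffic.
const server = http.createServer((req, res) => {
  if (req.url === "/metrics") {
    res.writeHead(200, { "Content-Type": "text/plain; version=0.0.4; charset=utf-8" });
    res.end(renderMetrics());
    return;
  }
  const status = req.url === "/" ? 200 : 404;
  inc(REQUESTS, REQUESTS_HELP, { method: req.method, status });
  res.writeHead(status);
  res.end();
});
// server.listen(3000); // port chosen arbitrarily for this sketch

// Simulate two handled requests and print what a scrape would return.
inc(REQUESTS, REQUESTS_HELP, { method: "GET", status: 200 });
inc(REQUESTS, REQUESTS_HELP, { method: "GET", status: 200 });
console.log(renderMetrics());
```

To have Prometheus collect it, uncomment `server.listen(...)` and add a `scrape_configs` job that targets that host and port.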
Tip
Practice each pillar in a small, isolated example (for instance, add one counter, one structured log line, and one trace span to a toy service) before wiring up a full observability stack. Small experiments build genuine understanding faster than reading alone.
Practice Task
(1) Without looking at your notes, instrument a small program with all three pillars: a metric, a log, and a trace. (2) Extend it to handle an edge case (empty input, a null value, or an error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake when working with the three pillars of observability is skipping edge-case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready DevOps code.
Key Takeaways
- Observability is the ability to understand the internal state of a system from its external outputs.
- MTTD (Mean Time to Detect): good observability can cut the time it takes to notice a failure from hours to seconds or minutes
- MTTR (Mean Time to Recover): traces and logs can cut diagnosis and repair time from hours to minutes
- Proactive alerting: catch problems, such as a rise in HTTP 500 errors, before users report them and before they affect all traffic