The Three Pillars of Observability
Observability is the ability to understand the internal state of a system by examining its external outputs. Modern distributed systems are too complex to monitor with simple up/down checks — you need metrics, logs, and traces working together to understand not just that something is wrong, but why it's happening and where.
Monitoring vs Observability
Monitoring tells you when something is broken. Observability tells you why it's broken and helps you figure out what you didn't even know to look for.
In the early days of web applications, monitoring was simple: ping the server, check if it responds, alert if it doesn't. That works when you have one server. It fails catastrophically when you have 500 microservices, each making calls to a dozen other services, all running across three cloud regions.
Observability emerged from this complexity. It borrows a concept from control theory: a system is observable if you can infer its internal state from its external outputs. For software, those outputs are metrics, logs, and distributed traces — the three pillars of observability.
The practical difference shows up during incidents. Monitoring tells you 'payment service is slow.' Observability lets you trace a single user's failed request through 8 microservices, pinpoint the database query taking 3.2 seconds, identify it's hitting a missing index on the orders table, and fix the problem — all without guessing.
The Three Pillars
- Metrics: Numerical measurements aggregated over time. Examples: HTTP error rate, p99 request latency, memory usage %. Tools: Prometheus (collection), Grafana (visualization). Cheap to store and fast to query, but they lose individual request detail.
- Logs: Timestamped records of discrete events. Examples: 'User 42 placed order #1234', 'DB query took 3,200ms'. Tools: ELK Stack, Loki. Expensive at scale, but they contain rich debug detail.
- Traces: A request's full journey across services. A trace shows which services were called, in what order, and how long each took, which makes traces essential for microservices. Tools: Jaeger, Zipkin, Tempo. OpenTelemetry is the vendor-neutral instrumentation standard, and it covers metrics and logs as well as traces.
When to Use Each Pillar
The three pillars answer different questions. Metrics answer 'Is my system healthy right now?' and feed your dashboards. When a metric exceeds a threshold, an alert fires. Metrics tell you that something is wrong, but not why.
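As a rough illustration of the threshold idea, the sketch below aggregates a window of request samples into two metrics (error rate and p99 latency) and checks them against alert thresholds. The sample data, the threshold values, and the names `RequestSample`, `summarize`, and `firing_alerts` are all assumptions made for this example. In practice a system such as Prometheus would do the collection and rule evaluation.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestSample:
    latency_ms: float
    status: int  # HTTP status code


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Collapse individual requests into aggregate metrics (per-request detail is lost here)."""
    if not samples:
        raise ValueError("summarize() needs at least one sample")
    errors = sum(1 for s in samples if s.status >= 500)
    return {
        "error_rate": errors / len(samples),
        "p99_ms": percentile([s.latency_ms for s in samples], 99),
    }


def firing_alerts(summary, max_error_rate=0.01, max_p99_ms=500.0):
    """Return the names of the (hypothetical) alerts whose thresholds are exceeded."""
    alerts = []
    if summary["error_rate"] > max_error_rate:
        alerts.append("HighErrorRate")
    if summary["p99_ms"] > max_p99_ms:
        alerts.append("HighLatencyP99")
    return alerts


# Made-up window: 97 fast successful requests and 3 slow failures.
samples = [RequestSample(20.0, 200) for _ in range(97)]
samples += [RequestSample(3200.0, 500) for _ in range(3)]

summary = summarize(samples)
print(summary)
print(firing_alerts(summary))
```

Note that the summary says nothing about which requests failed or why. That gap is what logs and traces fill.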
Logs answer 'What exactly happened?' — when an alert fires, you search logs to find the specific error messages, stack traces, and request context.
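A minimal sketch of that search step follows, assuming the services write structured logs as one JSON object per line. The field names (`level`, `service`, `request_id`, `duration_ms`), the timestamps, and the log lines themselves are invented for illustration. A real deployment would run an equivalent query in a tool such as the ELK Stack or Loki.

```python
import json

# Hypothetical log lines; the fourth one is deliberately malformed.
RAW_LOGS = """
{"ts": "2024-05-01T12:00:00Z", "level": "INFO", "service": "orders", "request_id": "req-1", "msg": "User 42 placed order #1234"}
{"ts": "2024-05-01T12:00:01Z", "level": "INFO", "service": "payments", "request_id": "req-2", "msg": "Charging card for order #1235"}
{"ts": "2024-05-01T12:00:04Z", "level": "ERROR", "service": "payments", "request_id": "req-2", "msg": "DB query took 3,200ms", "duration_ms": 3200}
<<truncated line>>
{"ts": "2024-05-01T12:00:05Z", "level": "INFO", "service": "orders", "request_id": "req-3", "msg": "User 7 placed order #1236"}
"""


def parse_logs(raw):
    """Parse JSON-lines text into dicts, skipping blank or malformed lines."""
    records = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # keep going: one bad line should not hide the rest
    return records


def search(records, **filters):
    """Return the records whose fields equal every given filter value."""
    return [r for r in records if all(r.get(k) == v for k, v in filters.items())]


records = parse_logs(RAW_LOGS)

# Step 1: the alert fired, so look for errors.
errors = search(records, level="ERROR")

# Step 2: pull every event for the failing request to see its full context.
for record in search(records, request_id=errors[0]["request_id"]):
    print(record["ts"], record["service"], record["level"], record["msg"])
```

Filtering on a shared `request_id` only works if every service logs that field, which is one reason structured logging tends to pay off during incidents.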
Traces answer 'Where in the system did it go wrong?' — in microservices a single user request may touch 10 services. A trace shows: API Gateway (2ms) → Auth Service (15ms) → Product Service (8ms) → Database (3,200ms ← the bottleneck). Without traces, finding the slow service in a chain of 10 requires inspecting logs in each service individually.
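The bottleneck hunt above can be sketched with a toy span model. Each span records its parent and its start and end times, and the time spent inside a service itself is its duration minus the duration of its child spans. The `Span` class and the timings are assumptions chosen to reproduce the numbers in the example. The sketch also assumes child calls run one after another and that each service appears once in the trace. Real tracing backends such as Jaeger or Tempo use a richer data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    service: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms


def self_times(spans):
    """Map each service to the time spent in its own span, excluding child spans.

    Assumes children of a span do not overlap (sequential calls).
    """
    child_total = {}
    for span in spans:
        if span.parent_id is not None:
            child_total[span.parent_id] = child_total.get(span.parent_id, 0) + span.duration_ms
    return {s.service: s.duration_ms - child_total.get(s.span_id, 0) for s in spans}


# One made-up trace: the gateway calls auth, then product, which queries the database.
trace = [
    Span("a", None, "API Gateway", 0, 3225),
    Span("b", "a", "Auth Service", 1, 16),
    Span("c", "a", "Product Service", 16, 3224),
    Span("d", "c", "Database", 20, 3220),
]

times = self_times(trace)
bottleneck = max(times, key=times.get)

for service, ms in times.items():
    marker = "  <- the bottleneck" if service == bottleneck else ""
    print(f"{service}: {ms}ms{marker}")
```

Ranking by self time rather than total duration matters here: the gateway span is the longest overall, but almost all of that time is spent waiting on the database.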
Tip
Practice each of the three pillars in a small, isolated example before integrating them into a larger project. Small experiments build understanding faster than reading alone.
Practice Task
(1) Write a working example that emits a metric, a log line, and a trace span for one request, from scratch and without looking at notes. (2) Modify it to handle an edge case such as an empty input, a null value, or an error state. (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake when instrumenting for the three pillars of observability is skipping edge-case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions so that your instrumentation stays reliable in production.
Key Takeaways
- Observability is the ability to understand the internal state of a system by examining its external outputs.
- Metrics show that something is wrong. They are cheap to store and fast to query, but they lose individual request detail.
- Logs show what exactly happened. They carry rich debug detail but are expensive at scale.
- Traces show where in the system a request went wrong, which makes them essential for microservices.