The Three Pillars of Observability

Observability is the ability to understand the internal state of a system from its external outputs. The three pillars (metrics, logs, and traces) are complementary signals that together give you broad visibility into your production systems.

20 min•By Priygop Team•Updated 2026

Metrics, Logs, and Traces

Metrics: Numerical measurements over time, such as request rate, error rate, response time, and CPU usage. Metrics tell you WHAT is happening. They are fast to query and cheap to store. Tools: Prometheus, Datadog.
Logs: Timestamped records of discrete events, such as 'User 12345 logged in' or 'Payment failed: card declined'. Logs tell you WHAT HAPPENED in detail. They are context-rich but expensive at scale. Tools: ELK Stack, Loki.
Traces: Records of a request's journey through distributed services, showing which microservices were called and how long each took. Traces tell you WHERE the bottleneck is. Tools: Jaeger, Zipkin, OpenTelemetry.
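To make the difference between the three signals concrete, here is a minimal sketch of one request handler that emits all three. Everything in it is hypothetical: the in-memory `metrics`, `logs`, and `spans` stores and the `handleRequest` function stand in for what a real service would send to tools such as Prometheus, Loki, or Jaeger.

```javascript
// Hypothetical in-memory stores, used only for illustration.
const metrics = { http_requests_total: 0, http_errors_total: 0 };
const logs = [];
const spans = [];

// Handles one simulated request and records a metric, a log line, and a span.
function handleRequest(traceId, userId, shouldFail) {
  const startMs = Date.now();

  // Metric: a counter that only says WHAT is happening (how many, how often).
  metrics.http_requests_total += 1;
  const status = shouldFail ? 500 : 200;
  if (shouldFail) {
    metrics.http_errors_total += 1;
  }

  // Log: a timestamped record of WHAT HAPPENED, with context for this event.
  logs.push({
    timestamp: new Date(startMs).toISOString(),
    level: shouldFail ? "error" : "info",
    message: shouldFail ? "Payment failed: card declined" : "Payment accepted",
    traceId,
    userId,
  });

  // Trace span: records WHERE time was spent for this request.
  spans.push({
    traceId,
    name: "handleRequest",
    startMs,
    durationMs: Date.now() - startMs,
    status,
  });

  return status;
}

handleRequest("trace-1", 12345, false);
handleRequest("trace-2", 12345, true);
```

Sharing the same `traceId` between the log entry and the span is the design choice that matters here: it lets you jump from an error log to the trace of the exact request that produced it.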

[Diagram: DevOps unifies development and operations in a continuous cycle]

Why Observability Matters

MTTD (Mean Time to Detect): How fast do you know something is broken? With metrics and alerting in place, detection can drop from hours to minutes or even seconds.
MTTR (Mean Time to Recover): How fast can you diagnose and fix? Traces and logs together can cut diagnosis time from hours to minutes.
Proactive alerting: Know about problems before users report them, for example by catching a rise in HTTP 500 errors while it still affects only a small share of traffic.
Performance optimization: Identify slow database queries, memory leaks, and bottlenecks in production
Capacity planning: Understand resource usage trends to predict when you need to scale
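MTTD and MTTR from the list above are simple averages over past incidents, so they are easy to compute once you record timestamps. The sketch below assumes a hypothetical `incidents` array with times in epoch milliseconds, and it assumes MTTR is measured from the start of the incident (some teams measure it from detection instead).

```javascript
// Hypothetical incident records; timestamps are milliseconds since some start point.
const incidents = [
  { startedAt: 0, detectedAt: 60_000, resolvedAt: 600_000 },
  { startedAt: 0, detectedAt: 180_000, resolvedAt: 1_200_000 },
];

// Arithmetic mean; returns NaN for an empty list so callers can detect "no data".
function mean(values) {
  if (values.length === 0) return NaN;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Mean Time to Detect: average delay between failure and detection.
function meanTimeToDetect(records) {
  return mean(records.map((i) => i.detectedAt - i.startedAt));
}

// Mean Time to Recover: average delay between failure and resolution
// (assumption: measured from incident start, not from detection).
function meanTimeToRecover(records) {
  return mean(records.map((i) => i.resolvedAt - i.startedAt));
}

const MS_PER_MINUTE = 60_000;
const mttdMinutes = meanTimeToDetect(incidents) / MS_PER_MINUTE; // 2
const mttrMinutes = meanTimeToRecover(incidents) / MS_PER_MINUTE; // 15
```

Tracking these two numbers over time is one way to check whether an investment in observability is actually paying off.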

The Observability Stack

// Modern observability stack (2026), described as a plain JavaScript object.
// Each value is a descriptive note, not executable configuration.

const observabilityStack = {
  // Metrics: numeric time series scraped from services
  metrics: {
    collection: "Prometheus: pull-based scraping of /metrics endpoints",
    storage: "Prometheus TSDB (local) or Thanos/Cortex (long-term, distributed)",
    visualization: "Grafana: dashboards, alerts, Explore",
    cloudAlternative: "AWS CloudWatch, Google Cloud Monitoring, Datadog",
  },
  // Logs: event records shipped from pods and nodes
  logs: {
    collection: "Grafana Alloy (successor to Promtail) or Fluentd: collect logs from pods/nodes",
    storage: "Loki (low cost, label-based) or Elasticsearch (full-text search)",
    visualization: "Grafana Explore (Loki) or Kibana (Elasticsearch)",
    cloudAlternative: "AWS CloudWatch Logs, Google Cloud Logging",
  },
  // Traces: per-request timing across services
  traces: {
    instrumentation: "OpenTelemetry SDK: vendor-neutral tracing instrumentation for many languages",
    collection: "OpenTelemetry Collector: receives, processes, exports traces",
    backend: "Jaeger or Tempo: store and query distributed traces",
    visualization: "Grafana (with a Tempo or Jaeger data source) or Jaeger UI",
    cloudAlternative: "AWS X-Ray, Google Cloud Trace, Datadog APM",
  },
  // Alerts: rules evaluated on metrics, then routed to people
  alerts: {
    rules: "Prometheus alerting rules: define conditions",
    manager: "Alertmanager: routing, deduplication, silencing",
    channels: "Slack, PagerDuty, Opsgenie, email",
  },
};
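The stack above relies on services exposing a /metrics endpoint that Prometheus scrapes, and on alerting rules that compare metrics to a threshold. As a rough sketch of both ideas, the code below renders counters in the Prometheus text exposition format and checks an error ratio. The names `renderCounter`, `escapeLabelValue`, and `errorRatioExceeds` are hypothetical helpers written for this example; a real service would normally use a Prometheus client library and define alert conditions in Prometheus itself.

```javascript
// Escapes backslash, double quote, and newline, as the text format requires for label values.
function escapeLabelValue(value) {
  return String(value)
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n");
}

// Renders one counter metric family as Prometheus text exposition lines.
function renderCounter(name, help, samples) {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} counter`];
  for (const { labels, value } of samples) {
    const pairs = Object.entries(labels)
      .map(([key, v]) => `${key}="${escapeLabelValue(v)}"`)
      .join(",");
    lines.push(pairs ? `${name}{${pairs}} ${value}` : `${name} ${value}`);
  }
  return lines.join("\n") + "\n";
}

// Example payload that a hypothetical /metrics handler could return.
const metricsBody = renderCounter("http_requests_total", "Total HTTP requests.", [
  { labels: { method: "GET", status: "200" }, value: 42 },
  { labels: { method: "GET", status: "500" }, value: 3 },
]);

// Simplified stand-in for an alerting rule: fire when errors exceed a share of traffic.
function errorRatioExceeds(errors, total, threshold) {
  return total > 0 && errors / total > threshold;
}

const shouldAlert = errorRatioExceeds(3, 45, 0.05); // 3 of 45 requests failed, threshold 5%
```

In a real deployment the threshold check would live in a Prometheus alerting rule and Alertmanager would handle routing; the function here only illustrates the condition being evaluated.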

Tip

Practice each of the three pillars in small, isolated examples before integrating them into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.

Practice Task

(1) Without looking at notes, instrument a small service so that each request emits a metric, a log line, and a trace span. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.

Common Mistake

A common mistake when working with the three pillars is skipping edge-case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready DevOps code.

Key Takeaways

Observability is the ability to understand the internal state of a system from its external outputs.
Metrics tell you what is happening, logs tell you what happened in detail, and traces tell you where the bottleneck is.
Good observability lowers both MTTD (how fast you detect a problem) and MTTR (how fast you diagnose and fix it).
Proactive alerting lets you catch rising error rates before most users are affected.
