Bottleneck Identification & Analysis
Finding performance bottlenecks requires systematic analysis across multiple layers: application code, database queries, network, and infrastructure. A performance test that shows 'p95 = 3 seconds' tells you there is a problem; bottleneck analysis tells you where the problem is and what to fix. This skill is what makes performance testers so valuable.
Bottleneck Analysis Framework
- Step 1: Isolate the slow endpoint. Compare per-endpoint p95 times; the endpoint with dramatically higher times is the starting point.
- Step 2: Check server resource utilization. Watch CPU, memory, network I/O, and disk I/O during peak load. As a rule of thumb, sustained CPU above about 70% or memory above about 85% suggests the server is running out of headroom.
- Step 3: Analyze database queries. Many web application bottlenecks are in the database. Run EXPLAIN ANALYZE on slow queries and check for missing indexes and N+1 queries.
- Step 4: Profile the application. Add APM (Application Performance Monitoring) or distributed tracing, for example Datadog, New Relic, or Jaeger, and identify which code path consumes the most time.
- Step 5: Analyze the network. Check whether time-to-first-byte (TTFB) is high, which points to server processing, or whether download time is high, which points to a large payload or a slow network.
- Step 6: Analyze the connection pool. Database or HTTP connection pool exhaustion causes queueing: response times spike while requests wait for a free connection.
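Step 1 can be sketched in a few lines of JavaScript (Node.js). The sketch assumes the line-delimited JSON that `k6 run --out json=results.json` writes, where each `Point` entry carries a metric name, a value in milliseconds, and a `tags` object. The helper names `percentile` and `p95ByEndpoint`, the choice to group by the `name` tag with a fallback to `url`, and the nearest-rank percentile method are illustrative assumptions, not part of k6, and the sample points are made up.

```javascript
// Nearest-rank percentile: the smallest sample that has at least p% of all
// samples at or below it.
function percentile(values, p) {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Group http_req_duration points by endpoint and return the p95 per endpoint,
// slowest first. `lines` holds one JSON string per line of the k6 output file.
function p95ByEndpoint(lines) {
  const byEndpoint = new Map();
  for (const line of lines) {
    if (!line.trim()) continue; // skip blank lines
    const entry = JSON.parse(line);
    if (entry.type !== "Point" || entry.metric !== "http_req_duration") continue;
    const tags = entry.data.tags || {};
    const key = tags.name || tags.url || "(untagged)";
    if (!byEndpoint.has(key)) byEndpoint.set(key, []);
    byEndpoint.get(key).push(entry.data.value);
  }
  return [...byEndpoint.entries()]
    .map(([endpoint, values]) => ({
      endpoint,
      count: values.length,
      p95: percentile(values, 95),
    }))
    .sort((a, b) => b.p95 - a.p95);
}

// Made-up sample points for illustration only (not real test output).
const point = (url, value) =>
  JSON.stringify({ type: "Point", metric: "http_req_duration", data: { value, tags: { url } } });

const sampleLines = [
  point("/api/orders", 7900),
  point("/api/orders", 8100),
  point("/api/orders", 650),
  point("/api/products", 120),
  point("/api/products", 240),
  JSON.stringify({ type: "Metric", metric: "http_req_duration", data: {} }), // ignored: not a Point
];

console.log(p95ByEndpoint(sampleLines));
```

For a real run, the lines could come from `fs.readFileSync("results.json", "utf8").split("\n")`. Sorting slowest first puts the endpoint to investigate at the top of the list.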
Bottleneck Investigation with k6 Data
// ══════════════════════════════════════════════════════════════
// REAL PERFORMANCE INVESTIGATION SCENARIO
// Observation: p95 response time for GET /api/orders = 8 seconds under 100 users
// Expected SLA: p95 < 500ms
// ══════════════════════════════════════════════════════════════
// STEP 1: k6 per-URL breakdown shows GET /api/orders is the outlier
// Results from `k6 run --out json=results.json <script>`, grouped by URL:
// http_req_duration{url:"/api/orders"}: p(95)=8132ms ← 16x over SLA!
// http_req_duration{url:"/api/products"}: p(95)=245ms ← Fine
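// STEP 2: Server monitoring shows CPU normal, DB connections at 100%
// All pooled MySQL connections are in use → requests wait in queue for a free one
// Fix attempt 1: Increase connection pool from 10 → 50
// (a stopgap: it does not explain why each request holds a connection for so long)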
// STEP 3: Database slow query log during test:
// Query: SELECT * FROM orders o
// JOIN order_items i ON i.order_id = o.id
// JOIN products p ON p.id = i.product_id
// WHERE o.user_id = ?
// ORDER BY o.created_at DESC
// Execution time: 2.4 seconds (on 500k orders table)
// EXPLAIN: Full table scan on order_items (NO INDEX on order_id!)
// STEP 4: Add missing index:
// CREATE INDEX idx_order_items_order_id ON order_items(order_id);
// CREATE INDEX idx_orders_user_id ON orders(user_id);
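// Query execution time: 2400ms → 4ms (600x improvement!)
// Each request now holds its connection for milliseconds instead of seconds,
// so far fewer connections are needed at the same load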
// STEP 5: Rerun load test after fixes:
// http_req_duration{url:"/api/orders"}: p(95)=87ms ← Below SLA!
// ── BOTTLENECK TYPES AND THEIR SYMPTOMS ──────────────────────
const bottleneckPatterns = [
  {
    bottleneck: "Missing database index",
    symptoms: ["Slow as data grows", "CPU spikes on DB server", "EXPLAIN shows full table scan"],
    fix: "Add index on frequently queried/joined column"
  },
  {
    bottleneck: "N+1 query problem",
    symptoms: ["Response time scales linearly with result size", "1 page load → 51 SQL queries"],
    fix: "Use JOIN instead of separate queries per item"
  },
  {
    bottleneck: "Connection pool exhaustion",
    symptoms: ["Sudden response time spike at specific concurrency", "Requests queue, then all fail at once"],
    fix: "Increase pool size or add connection timeout with retry"
  },
  {
    bottleneck: "Memory leak",
    symptoms: ["Performance degrades over time (soak test)", "Memory grows continuously, never released"],
    fix: "Profile heap dumps; close connections/streams properly"
  },
  {
    bottleneck: "Inefficient algorithm",
    symptoms: ["CPU at 100% under moderate load", "One endpoint much slower than others with similar data volumes"],
    fix: "Profile with CPU profiler; optimize O(n²) to O(n log n) where possible"
  }
];
Common Mistakes
- Guessing at the bottleneck without data: don't optimize random code; use profilers, APM, and slow query logs to find the actual bottleneck.
- Fixing symptoms instead of the root cause: adding caching to a slow endpoint can hide an N+1 query problem; find and fix the actual inefficiency.
- Retesting after fixes without a baseline comparison: always rerun the same load profile and compare post-fix results against the pre-fix results.
- Not involving developers in the analysis: performance testers find where it's slow and developers know why it's slow, so collaborate on the analysis, not just the reporting.
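The baseline comparison from the list above can be sketched as a small JavaScript (Node.js) helper. It assumes each run has already been reduced to a plain object mapping endpoint to p95 in milliseconds. That shape, the function name `compareToBaseline`, and the 500 ms default SLA (borrowed from the investigation scenario) are illustrative assumptions rather than a k6 feature. The `/api/orders` figures come from the scenario; the post-fix `/api/products` value is made up for the example.

```javascript
// Compare per-endpoint p95 values (ms) from two runs of the SAME load profile.
// Returns one row per endpoint with the percentage change and an SLA check.
function compareToBaseline(baseline, current, slaMs = 500) {
  const endpoints = new Set([...Object.keys(baseline), ...Object.keys(current)]);
  const rows = [];
  for (const endpoint of endpoints) {
    const before = baseline[endpoint];
    const after = current[endpoint];
    // An endpoint present in only one run cannot be compared.
    if (before === undefined || after === undefined) {
      rows.push({ endpoint, before, after, changePct: null, meetsSla: null });
      continue;
    }
    // Percentage change relative to the baseline, rounded to one decimal place.
    const changePct =
      before === 0 ? null : Math.round(((after - before) / before) * 1000) / 10;
    rows.push({ endpoint, before, after, changePct, meetsSla: after < slaMs });
  }
  return rows;
}

// Pre-fix and post-fix p95 values in milliseconds.
const baselineRun = { "/api/orders": 8132, "/api/products": 245 };
const postFixRun = { "/api/orders": 87, "/api/products": 250 }; // 250 is a hypothetical value

const comparison = compareToBaseline(baselineRun, postFixRun);
console.table(comparison);
```

Reporting the change for every endpoint, not only the one that was fixed, makes it easier to spot a fix that helps one endpoint while slowing another.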
Tip
Practice bottleneck identification and analysis in small, isolated examples before integrating it into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
Practice Task
(1) Write a working example of bottleneck identification and analysis from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake in bottleneck identification and analysis is skipping edge-case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready test code.