Performance Metrics & Thresholds
Performance metrics are only valuable when you understand what they mean, how to read them, and what thresholds to set. This topic covers the complete performance metrics vocabulary, industry-standard SLA thresholds, and how to interpret results to find the real bottlenecks rather than chasing metrics that look alarming but aren't actionable.
Key Performance Metrics Dictionary
- Throughput (TPS/RPS): Transactions (or Requests) Per Second — how many requests the system handles per second. Higher is better. Formula: Total requests ÷ Test duration (see the calculation sketch after this list).
- Response Time: How long ONE request takes from send to receive. Key percentiles: p50 (median), p90, p95, p99.
- p95: 95% of users experience this response time or better. The standard SLA metric. If p95 = 800ms, 5% of users wait >800ms.
- p99: 99% of users experience this response time or better. Captures the worst-case experience excluding extreme outliers.
- Error Rate: % of requests that returned an error (4xx, 5xx, timeouts). Target: <1% under normal load.
- Latency vs Response Time: Latency = network travel time. Response time includes server processing + latency. Distinguish them when debugging.
- Concurrent Users: Number of users actively making requests simultaneously. Different from requests per second.
- Think Time: Time a user waits between actions. Affects realistic concurrent user count calculation.
- Apdex Score: Application Performance Index. Scores between 0–1 based on satisfied/tolerating/frustrated users. >0.8 = Good.
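The relationships above can be made concrete with a small calculation. A minimal sketch in plain JavaScript (not tied to k6 or JMeter; the nearest-rank percentile method and the 500 ms Apdex target are illustrative assumptions):
// Summarize a hypothetical array of per-request results into the metrics above.
function percentile(sortedMs, p) {
  // Nearest-rank percentile on an ascending-sorted array of durations (ms)
  const idx = Math.ceil((p / 100) * sortedMs.length) - 1;
  return sortedMs[Math.max(0, idx)];
}

function summarize(results, testDurationSec, apdexTargetMs = 500) {
  const durations = results.map(r => r.durationMs).sort((a, b) => a - b);
  const errors = results.filter(r => r.failed).length;
  const satisfied = durations.filter(d => d <= apdexTargetMs).length;
  const tolerating = durations.filter(d => d > apdexTargetMs && d <= 4 * apdexTargetMs).length;
  return {
    throughputRps: results.length / testDurationSec,       // Total requests ÷ Test duration
    p50: percentile(durations, 50),
    p95: percentile(durations, 95),                        // primary SLA metric
    p99: percentile(durations, 99),                        // worst case excluding extreme outliers
    errorRate: errors / results.length,                    // target < 1% under normal load
    apdex: (satisfied + tolerating / 2) / results.length,  // > 0.8 = Good
  };
}

// Example: three requests completed within a 1-second window
console.log(summarize(
  [{ durationMs: 120, failed: false }, { durationMs: 480, failed: false }, { durationMs: 2200, failed: true }],
  1,
));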
Industry Standard Thresholds and k6 Implementation
// ══════════════════════════════════════════════════════════════
// INDUSTRY STANDARD SLA THRESHOLDS
// (Google's research: 53% of mobile users leave after 3 seconds)
// ══════════════════════════════════════════════════════════════
const industryThresholds = {
  // Consumer web applications (Google, Amazon standards):
  webApp: {
    pageLoad:    { excellent: 1000, good: 2000, acceptable: 3000 }, // ms
    apiResponse: { excellent: 200,  good: 500,  acceptable: 1000 },
    checkout:    { excellent: 1000, good: 2000, acceptable: 4000 },
    search:      { excellent: 200,  good: 400,  acceptable: 800 },
  },
  // API SLAs (p95/p99 values in milliseconds):
  api: {
    readEndpoints:   { p95: 300,  p99: 500 },
    writeEndpoints:  { p95: 500,  p99: 1000 },
    reportEndpoints: { p95: 3000, p99: 5000 },
  },
  errorRates: {
    excellent: 0.001, // 0.1% - excellent
    good:      0.01,  // 1% - acceptable
    poor:      0.05,  // 5% - investigate
    critical:  0.1,   // 10% - stop the test
  },
};

// k6 threshold implementation:
export const options = {
  thresholds: {
    // Standard API thresholds (adjust for your app's SLA)
    'http_req_duration{type:read}':     ['p(95)<300', 'p(99)<500'],
    'http_req_duration{type:write}':    ['p(95)<500', 'p(99)<1000'],
    'http_req_duration{type:checkout}': ['p(95)<2000'],
    'http_req_failed': ['rate<0.01'], // < 1% error rate
    // Group-specific thresholds:
    'group_duration{group:::Authentication}': ['p(95)<500'],
    'group_duration{group:::Checkout}': ['p(95)<3000'],
  },
};
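The {type:...} and group thresholds above only produce data if the test script actually applies those tags and groups to its requests. A minimal sketch of that wiring, with placeholder URLs and payloads:
import http from 'k6/http';
import { group } from 'k6';

export default function () {
  // Tag requests so the http_req_duration{type:read} / {type:write} sub-metrics exist
  http.get('https://example.test/api/products', { tags: { type: 'read' } });
  http.post('https://example.test/api/orders', JSON.stringify({ sku: 'A1' }), {
    headers: { 'Content-Type': 'application/json' },
    tags: { type: 'write' },
  });

  // group() produces the group_duration{group:::Checkout} sub-metric
  group('Checkout', () => {
    http.get('https://example.test/api/cart', { tags: { type: 'checkout' } });
  });
}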
// ── READING JMETER AGGREGATE REPORT ──────────────────────────
// Column: Samples = total requests executed
// Column: Average = mean response time (misleading with outliers!)
// Column: Min / Max = extreme values (useful for identifying outliers)
// Column: Std. Dev. = consistency; high = inconsistent performance
// Column: 90% Line = p90 response time (use this for SLA)
// Column: 95% Line = p95 response time (primary SLA metric)
// Column: 99% Line = p99 (worst-case user experience)
// Column: Error% = error rate — must be < 1% for healthy system
// Column: Throughput = requests per second (cross-check with expected TPS)
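// Cross-check sketch (Little's Law; the numbers below are illustrative assumptions):
//   expected TPS ≈ concurrent users ÷ (avg response time + think time)
//   e.g. 100 users ÷ (0.5s response + 4.5s think) ≈ 20 requests/second
// If measured throughput is far below this, suspect a bottleneck or an overloaded load generator.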
// RED FLAGS IN RESULTS:
// 🔴 p99 > 10× p50 (extreme outliers suggest occasional failures)
// 🔴 Error rate increasing as load increases (capacity limit reached)
// 🔴 Response time increasing linearly as users increase (no auto-scaling)
// 🔴 Throughput plateauing before target load reached (bottleneck)
Common Mistakes
- Using average response time as the SLA metric — averages mask outliers; p95 and p99 represent what real users experience
- Setting the same threshold for all endpoints — search should be <300ms, report generation can be <5000ms; one-size thresholds are meaningless
- Ignoring throughput plateau — if TPS stops increasing as users increase, you've found the capacity ceiling; this is critical data, not a test failure
- Not measuring during the sustained phase — wait for the system to stabilize (after ramp-up) before measuring; initial cache misses inflate early response times (see the scenario sketch below)
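One way to act on the last point is to isolate ramp-up from the measurement window in k6 itself. A minimal sketch, assuming a 2-minute ramp and a 10-minute steady state (the durations, VU count, and URL are placeholders); k6 tags every metric with its scenario name, so the threshold below ignores ramp-up samples:
import http from 'k6/http';

export const options = {
  scenarios: {
    ramp: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [{ duration: '2m', target: 50 }], // warm caches and connection pools
    },
    steady: {
      executor: 'constant-vus',
      vus: 50,
      duration: '10m',
      startTime: '2m', // begins once the ramp has finished
    },
  },
  thresholds: {
    // Only samples from the steady scenario count against the SLA
    'http_req_duration{scenario:steady}': ['p(95)<500'],
  },
};

export default function () {
  http.get('https://example.test/api/products'); // placeholder endpoint
}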
Tip
Practice performance metrics and thresholds in small, isolated examples before integrating them into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
Practice Task
(1) Write a working example of performance metrics and thresholds from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
Warning
A common mistake with performance metrics and thresholds is skipping edge case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready test code.
Key Takeaways
- Use percentiles (p95, p99) for SLAs rather than averages; averages mask the outliers that real users actually experience.
- Set thresholds per endpoint type: search and read endpoints need much tighter limits than report generation.
- Keep error rates under 1% at normal load, and treat error rates that climb with load as a sign the capacity limit has been reached.
- Measure after ramp-up, during the sustained phase, and treat a throughput plateau as the capacity ceiling rather than a test failure.