Performance Metrics & Thresholds
Performance metrics are only valuable when you understand what they mean, how to read them, and what thresholds to set. This topic covers the complete performance metrics vocabulary, industry-standard SLA thresholds, and how to interpret results to find the real bottlenecks rather than chasing metrics that look alarming but aren't actionable.
Key Performance Metrics Dictionary
- Throughput (TPS/RPS): Transactions (or Requests) Per Second — how many requests the system handles per second. Higher is better. Formula: Total requests ÷ Test duration (see the calculation sketch after this list).
- Response Time: How long ONE request takes from send to receive. Key percentiles: p50 (median), p90, p95, p99.
- p95: 95% of users experience this response time or better. The standard SLA metric. If p95 = 800ms, 5% of users wait >800ms.
- p99: 99% of users experience this response time or better. Captures the worst-case experience excluding extreme outliers.
- Error Rate: % of requests that returned an error (4xx, 5xx, timeouts). Target: <1% under normal load.
- Latency vs Response Time: Latency = network travel time. Response time includes server processing + latency. Distinguish them when debugging.
- Concurrent Users: Number of users actively making requests simultaneously. Different from requests per second.
- Think Time: Time a user waits between actions. Affects realistic concurrent user count calculation.
- Apdex Score: Application Performance Index. Scores between 0–1 based on satisfied/tolerating/frustrated users. >0.8 = Good.
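The relationships above can be made concrete with a small calculation. A minimal sketch in plain JavaScript (not tied to k6 or JMeter; the nearest-rank percentile method and the 500 ms Apdex target are illustrative assumptions):
// Summarize a hypothetical array of per-request results into the metrics above.
function percentile(sortedMs, p) {
  // Nearest-rank percentile on an ascending-sorted array of durations (ms)
  const idx = Math.ceil((p / 100) * sortedMs.length) - 1;
  return sortedMs[Math.max(0, idx)];
}

function summarize(results, testDurationSec, apdexTargetMs = 500) {
  const durations = results.map(r => r.durationMs).sort((a, b) => a - b);
  const errors = results.filter(r => r.failed).length;
  const satisfied = durations.filter(d => d <= apdexTargetMs).length;
  const tolerating = durations.filter(d => d > apdexTargetMs && d <= 4 * apdexTargetMs).length;
  return {
    throughputRps: results.length / testDurationSec,       // Total requests ÷ Test duration
    p50: percentile(durations, 50),
    p95: percentile(durations, 95),                        // primary SLA metric
    p99: percentile(durations, 99),                        // worst case excluding extreme outliers
    errorRate: errors / results.length,                    // target < 1% under normal load
    apdex: (satisfied + tolerating / 2) / results.length,  // > 0.8 = Good
  };
}

// Example: three requests completed within a 1-second window
console.log(summarize(
  [{ durationMs: 120, failed: false }, { durationMs: 480, failed: false }, { durationMs: 2200, failed: true }],
  1,
));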
Industry Standard Thresholds and k6 Implementation
// ══════════════════════════════════════════════════════════════
// INDUSTRY STANDARD SLA THRESHOLDS
// (Google's research: 53% of mobile users leave after 3 seconds)
// ══════════════════════════════════════════════════════════════
const industryThresholds = {
  // Consumer web applications (Google, Amazon standards):
  webApp: {
    pageLoad:    { excellent: 1000, good: 2000, acceptable: 3000 }, // ms
    apiResponse: { excellent: 200,  good: 500,  acceptable: 1000 },
    checkout:    { excellent: 1000, good: 2000, acceptable: 4000 },
    search:      { excellent: 200,  good: 400,  acceptable: 800 },
  },
  // API SLAs (p95/p99 values in milliseconds):
  api: {
    readEndpoints:   { p95: 300,  p99: 500 },
    writeEndpoints:  { p95: 500,  p99: 1000 },
    reportEndpoints: { p95: 3000, p99: 5000 },
  },
  errorRates: {
    excellent: 0.001, // 0.1% - excellent
    good:      0.01,  // 1% - acceptable
    poor:      0.05,  // 5% - investigate
    critical:  0.1,   // 10% - stop the test
  },
};

// k6 threshold implementation:
export const options = {
  thresholds: {
    // Standard API thresholds (adjust for your app's SLA)
    'http_req_duration{type:read}':     ['p(95)<300', 'p(99)<500'],
    'http_req_duration{type:write}':    ['p(95)<500', 'p(99)<1000'],
    'http_req_duration{type:checkout}': ['p(95)<2000'],
    'http_req_failed': ['rate<0.01'], // < 1% error rate
    // Group-specific thresholds:
    'group_duration{group:::Authentication}': ['p(95)<500'],
    'group_duration{group:::Checkout}': ['p(95)<3000'],
  },
};
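The {type:...} and group thresholds above only produce data if the test script actually applies those tags and groups to its requests. A minimal sketch of that wiring, with placeholder URLs and payloads:
import http from 'k6/http';
import { group } from 'k6';

export default function () {
  // Tag requests so the http_req_duration{type:read} / {type:write} sub-metrics exist
  http.get('https://example.test/api/products', { tags: { type: 'read' } });
  http.post('https://example.test/api/orders', JSON.stringify({ sku: 'A1' }), {
    headers: { 'Content-Type': 'application/json' },
    tags: { type: 'write' },
  });

  // group() produces the group_duration{group:::Checkout} sub-metric
  group('Checkout', () => {
    http.get('https://example.test/api/cart', { tags: { type: 'checkout' } });
  });
}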
// ── READING JMETER AGGREGATE REPORT ──────────────────────────
// Column: Samples = total requests executed
// Column: Average = mean response time (misleading with outliers!)
// Column: Min / Max = extreme values (useful for identifying outliers)
// Column: Std. Dev. = consistency; high = inconsistent performance
// Column: 90% Line = p90 response time (use this for SLA)
// Column: 95% Line = p95 response time (primary SLA metric)
// Column: 99% Line = p99 (worst-case user experience)
// Column: Error% = error rate — must be < 1% for healthy system
// Column: Throughput = requests per second (cross-check with expected TPS)
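// Cross-check sketch (Little's Law; the numbers below are illustrative assumptions):
//   expected TPS ≈ concurrent users ÷ (avg response time + think time)
//   e.g. 100 users ÷ (0.5s response + 4.5s think) ≈ 20 requests/second
// If measured throughput is far below this, suspect a bottleneck or an overloaded load generator.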
// RED FLAGS IN RESULTS:
// 🔴 p99 > 10× p50 (extreme outliers suggest occasional failures)
// 🔴 Error rate increasing as load increases (capacity limit reached)
// 🔴 Response time increasing linearly as users increase (no auto-scaling)
// 🔴 Throughput plateauing before target load reached (bottleneck)
Common Mistakes
- Using average response time as the SLA metric — averages mask outliers; p95 and p99 represent what real users experience
- Setting the same threshold for all endpoints — search should be <300ms, report generation can be <5000ms; one-size thresholds are meaningless
- Ignoring throughput plateau — if TPS stops increasing as users increase, you've found the capacity ceiling; this is critical data, not a test failure
- Not measuring during the sustained phase — wait for the system to stabilize (after ramp-up) before measuring; initial cache misses inflate early response times (see the scenario sketch below)
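One way to act on the last point is to isolate ramp-up from the measurement window in k6 itself. A minimal sketch, assuming a 2-minute ramp and a 10-minute steady state (the durations, VU count, and URL are placeholders); k6 tags every metric with its scenario name, so the threshold below ignores ramp-up samples:
import http from 'k6/http';

export const options = {
  scenarios: {
    ramp: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [{ duration: '2m', target: 50 }], // warm caches and connection pools
    },
    steady: {
      executor: 'constant-vus',
      vus: 50,
      duration: '10m',
      startTime: '2m', // begins once the ramp has finished
    },
  },
  thresholds: {
    // Only samples from the steady scenario count against the SLA
    'http_req_duration{scenario:steady}': ['p(95)<500'],
  },
};

export default function () {
  http.get('https://example.test/api/products'); // placeholder endpoint
}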
Tip
Practice performance metrics and thresholds in small, isolated examples before integrating them into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
Practice Task
(1) Write a working example of performance metrics and thresholds from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
Warning
A common mistake with performance metrics and thresholds is skipping edge case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready test code.
Key Takeaways
- Use percentiles (p95, p99) for SLAs rather than averages; averages mask the outliers that real users actually experience.
- Set thresholds per endpoint type: search and read endpoints need much tighter limits than report generation.
- Keep error rates under 1% at normal load, and treat error rates that climb with load as a sign the capacity limit has been reached.
- Measure after ramp-up, during the sustained phase, and treat a throughput plateau as the capacity ceiling rather than a test failure.