Data-Driven Testing (CSV, JSON, Excel)

Data-driven testing (DDT) separates test logic from test data: the test script runs the same logic repeatedly against different sets of input data. A single test function can cover hundreds of data combinations, non-technical stakeholders can add scenarios by editing a CSV file, and coverage improves for input-heavy features such as forms, pricing engines, and eligibility rules.

35 min•By Priygop Team•Updated 2026

Data-Driven Testing Approaches

# ══════════════════════════════════════════════════════════════
# APPROACH 1: pytest.mark.parametrize (simplest — inline data)
# ══════════════════════════════════════════════════════════════
import pytest

@pytest.mark.parametrize("email,password,expected_url,expected_msg", [
    ("alice@test.com",  "Test@1234",  "/dashboard",   None),              # Valid
    ("wrong@test.com",  "Test@1234",  "/login",        "Invalid credentials"),
    ("alice@test.com",  "wrongpass",  "/login",        "Invalid credentials"),
    ("",                "Test@1234",  "/login",        "Email is required"),
    ("alice@test.com",  "",           "/login",        "Password is required"),
    ("admin@test.com",  "Admin@1234", "/admin",        None),              # Admin user
])
def test_login_data_driven(driver, base_url, email, password, expected_url, expected_msg):
    driver.get(f"{base_url}/login")
    driver.find_element("id", "email").send_keys(email)
    driver.find_element("id", "password").send_keys(password)
    driver.find_element("id", "submit").click()
    
    assert expected_url in driver.current_url
    if expected_msg:
        error = driver.find_element("css selector", ".error-message")
        assert expected_msg in error.text

# ══════════════════════════════════════════════════════════════
# APPROACH 2: CSV file data source (for large datasets)
# ══════════════════════════════════════════════════════════════
# test_data/login_data.csv:
# email,password,expected_url,expected_msg
# alice@test.com,Test@1234,/dashboard,
# wrong@test.com,Test@1234,/login,Invalid credentials

import csv
import pytest

def read_csv_data(filepath):
    """Read test data from CSV and return list of tuples"""
    data = []
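    # newline='' is what the csv module docs recommend; explicit encoding avoids platform defaults
    with open(filepath, newline='', encoding='utf-8') as f: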
        reader = csv.DictReader(f)
        for row in reader:
            data.append((
                row['email'],
                row['password'],
                row['expected_url'],
                row['expected_msg'] or None
            ))
    return data

@pytest.mark.parametrize("email,password,expected_url,expected_msg",
    read_csv_data("test_data/login_data.csv"))
def test_login_csv(driver, base_url, email, password, expected_url, expected_msg):
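    # Same test logic as Approach 1; only the data source differs
    driver.get(f"{base_url}/login")
    driver.find_element("id", "email").send_keys(email)
    driver.find_element("id", "password").send_keys(password)
    driver.find_element("id", "submit").click()

    assert expected_url in driver.current_url
    if expected_msg:
        error = driver.find_element("css selector", ".error-message")
        assert expected_msg in error.text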

# ══════════════════════════════════════════════════════════════
# APPROACH 3: JSON file data source
# ══════════════════════════════════════════════════════════════
# test_data/products.json:
# [
#   {"sku": "P001", "name": "iPhone", "price": 999, "inStock": true},
#   {"sku": "P002", "name": "MacBook", "price": 1299, "inStock": true},
#   {"sku": "P003", "name": "AirPods", "price": 179, "inStock": false}
# ]

import json

def load_json_data(filepath):
    with open(filepath, encoding='utf-8') as f:  # JSON files are normally UTF-8
        return json.load(f)

products = load_json_data("test_data/products.json")

@pytest.mark.parametrize("product", products)
def test_product_card_displays_correctly(driver, base_url, product):
    driver.get(f"{base_url}/products/{product['sku']}")
    name_el = driver.find_element("css selector", ".product-name")
    price_el = driver.find_element("css selector", ".product-price")
    
    assert name_el.text == product['name']
    assert f"${product['price']}" in price_el.text
    
    stock_badge = driver.find_element("css selector", ".stock-badge")
    expected_text = "In Stock" if product['inStock'] else "Out of Stock"
    assert stock_badge.text == expected_text
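
The CSV and JSON loaders above share most of their logic, and the parametrized tests they feed show up in reports with auto-generated IDs. The following is a minimal sketch of one way to combine the loaders and produce readable test IDs. The helper names `load_cases` and `case_ids` are hypothetical and not part of pytest; the only pytest feature assumed is the `ids` argument of `pytest.mark.parametrize`, which accepts a list of strings.

```python
import csv
import json
from pathlib import Path


def load_cases(path):
    """Load test cases as a list of dicts from a .csv or .json file."""
    path = Path(path)
    if path.suffix == ".csv":
        # newline="" lets the csv module handle line endings itself
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    if path.suffix == ".json":
        with path.open(encoding="utf-8") as f:
            return json.load(f)
    raise ValueError(f"Unsupported data file type: {path.suffix}")


def case_ids(cases, key):
    """Build one readable test ID per case from the given field."""
    return [f"{index}-{case[key] or 'blank'}" for index, case in enumerate(cases)]


# Possible usage in a test module (assumes a test_data/ folder next to the file):
# DATA_DIR = Path(__file__).parent / "test_data"
# cases = load_cases(DATA_DIR / "login_data.csv")
#
# @pytest.mark.parametrize("case", cases, ids=case_ids(cases, "email"))
# def test_login(driver, base_url, case):
#     ...
```

Resolving the data folder relative to the test file, as in the commented usage, means the tests do not depend on the directory pytest is started from.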

Common Mistakes

Hardcoding test data inside test function bodies: data belongs in external files or parametrize decorators, not embedded in test logic
Not testing invalid data: DDT is often used only for valid inputs; always include invalid and edge-case rows in your data files
CSV files without headers: always include a header row and use DictReader for readable column access
Mixing production data into test data files: test data files should use fake or anonymized data; never include real user credentials in committed files
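
Two of the mistakes above (missing headers and valid-only data) can be caught automatically before any browser test runs. The following is a rough sketch of such a check, assuming the login CSV layout used earlier. The function name `validate_login_rows` and the rule "a row with an expected message counts as an invalid-input case" are illustrative assumptions, not an established convention.

```python
import csv
import io

REQUIRED_COLUMNS = {"email", "password", "expected_url", "expected_msg"}


def validate_login_rows(csv_text):
    """Return a list of problems found in login test data; an empty list means OK."""
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = []

    # With no header row, DictReader treats the first data row as the header,
    # so the required column names will be reported as missing.
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
        return problems

    rows = list(reader)
    # Assumption: a non-empty expected_msg marks a negative (invalid-input) case.
    if not any(row["expected_msg"] for row in rows):
        problems.append("no invalid-input rows found")
    return problems
```

A check like this could run as an ordinary unit test against each data file, so a broken or one-sided file fails fast with a clear message.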

Tip

Practice data-driven testing with CSV, JSON, and Excel data in small, isolated examples before integrating it into larger projects. Breaking concepts into small experiments builds understanding faster than reading alone.

Diagram

Parameterize with external data.

Practice Task

(1) Write a working data-driven test (using CSV, JSON, or Excel data) from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.

Common Mistake

A common mistake in data-driven testing is skipping edge cases: empty inputs, null values, and unexpected data types. Always include boundary conditions in your data so the tests stay robust against real-world input.
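
As an illustration of edge-case rows, here is a small sketch that keeps empty, null, and wrong-type inputs in JSON (which, unlike CSV, can tell an empty string from null). Everything in it is hypothetical: `is_valid_username` stands in for whatever code is under test, and `run_edge_cases` is a made-up helper that lists the edge cases a validator wrongly accepts.

```python
import json

# Edge-case data kept apart from the test logic; every value here should be rejected.
EDGE_CASES_JSON = """
[
  {"id": "empty_string", "value": ""},
  {"id": "null_value", "value": null},
  {"id": "wrong_type_number", "value": 42},
  {"id": "whitespace_only", "value": "   "}
]
"""


def is_valid_username(value):
    """Hypothetical validator under test: accepts non-blank strings only."""
    return isinstance(value, str) and value.strip() != ""


def run_edge_cases(cases, validator):
    """Return the ids of edge cases that the validator wrongly accepted."""
    return [case["id"] for case in cases if validator(case["value"])]


cases = json.loads(EDGE_CASES_JSON)
wrongly_accepted = run_edge_cases(cases, is_valid_username)
```

In a pytest suite the same `cases` list could be passed to `pytest.mark.parametrize` so each edge case is reported as its own test.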

Key Takeaways

Data-driven testing (DDT) separates test logic from test data: the same test logic runs repeatedly with different sets of input data.
Keep test data in external files or parametrize decorators, not embedded in test logic.
Include invalid and edge-case rows in your data files, not only valid inputs.
Always include a header row in CSV files and use DictReader for readable column access.
