
SQL Interview Questions

Master these 31 carefully curated questions to ace your next SQL interview.

Quick Answer

SQL manages relational databases. DDL (define schema), DML (manipulate data), DCL (permissions), TCL (transactions).

Detailed Explanation

DDL: CREATE, ALTER, DROP, TRUNCATE. DML: SELECT, INSERT, UPDATE, DELETE. DCL: GRANT, REVOKE. TCL: COMMIT, ROLLBACK, SAVEPOINT. SQL is declarative. Standard ANSI SQL with vendor extensions (PostgreSQL, MySQL, SQL Server).
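One statement from each category, as a quick illustration (PostgreSQL syntax; the employees table and reporting_role are made up for the example):

```sql
-- DDL: define schema
CREATE TABLE employees (
    id     SERIAL PRIMARY KEY,
    name   TEXT NOT NULL,
    salary NUMERIC(10, 2)
);

-- DML: manipulate data
INSERT INTO employees (name, salary) VALUES ('Ada', 90000);
UPDATE employees SET salary = 95000 WHERE name = 'Ada';

-- DCL: permissions
GRANT SELECT ON employees TO reporting_role;

-- TCL: transactions
BEGIN;
DELETE FROM employees WHERE salary IS NULL;
ROLLBACK;  -- undo the delete; COMMIT would make it permanent
```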

Quick Answer

INNER (matching rows), LEFT (all left + matching), RIGHT (all right + matching), FULL (all from both), CROSS (cartesian).

Detailed Explanation

INNER JOIN: only matching rows. LEFT JOIN: all from left, NULL for non-matching right. FULL OUTER: all from both. CROSS JOIN: every combination. SELF JOIN: table joined with itself. Performance: CROSS is the most expensive (cartesian product); outer joins are typically costlier than INNER because non-matching rows cannot be discarded early.

Quick Answer

Indexes are data structures speeding up retrieval by creating sorted references to rows, like a book's index.

Detailed Explanation

Types: B-tree (default), Hash (exact match), GIN (full-text), GiST (geometric). Benefits: faster SELECT, WHERE, JOIN. Costs: slower INSERT/UPDATE, extra storage. Index columns in WHERE/JOIN/ORDER BY. EXPLAIN ANALYZE to verify usage.

Quick Answer

Window functions calculate across related rows without collapsing them: ROW_NUMBER, RANK, LAG, LEAD over PARTITION BY.

Detailed Explanation

Syntax: function() OVER (PARTITION BY col ORDER BY col). Functions: ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, SUM, AVG. Unlike GROUP BY, preserves individual rows. Use for rankings, running totals, comparing to previous rows.

Quick Answer

CTEs (WITH clause) are temporary named result sets improving readability and enabling recursive queries for hierarchies.

Detailed Explanation

WITH cte AS (SELECT ...) SELECT * FROM cte. Recursive: WITH RECURSIVE for hierarchical data (org charts, categories). CTEs are not always materialized — the optimizer may inline them.

Quick Answer

Normalization eliminates data redundancy by organizing into related tables (1NF-3NF). Denormalization adds redundancy for read speed.

Detailed Explanation

1NF: atomic values. 2NF: no partial dependencies. 3NF: no transitive dependencies. Benefits: integrity, reduced storage. Denormalization: intentional redundancy to avoid joins. Normalize for OLTP, denormalize for OLAP.

Quick Answer

Use EXPLAIN ANALYZE, add indexes, rewrite queries, optimize JOINs, use pagination, and partition large tables.

Detailed Explanation

Steps: (1) EXPLAIN ANALYZE for execution plan. (2) Add indexes for sequential scans. (3) Check JOIN order. (4) Covering indexes. (5) Avoid SELECT *. (6) Replace subqueries with JOINs. (7) Pagination. (8) Partition large tables. (9) Connection pooling. (10) Materialized views for complex aggregations.

Quick Answer

Atomicity (all or nothing), Consistency (valid state), Isolation (concurrent safety), Durability (survives crashes).

Detailed Explanation

Isolation levels: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE. Trade safety for performance. BEGIN/COMMIT/ROLLBACK. PostgreSQL default: READ COMMITTED. SERIALIZABLE prevents all anomalies but may need retry logic.
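Atomicity in practice looks like this (the accounts table is hypothetical): both updates commit together or neither does.

```sql
-- A money transfer must be all-or-nothing.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;  -- on error, ROLLBACK instead, leaving both rows untouched
```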

Quick Answer

Profile with EXPLAIN, add composite indexes, use materialized views, consider partitioning and pre-aggregation.

Detailed Explanation

Steps: (1) EXPLAIN ANALYZE. (2) Add composite indexes. (3) Materialized views for aggregations. (4) Partition by date. (5) Pre-aggregate in summary tables. (6) Consider columnar storage for analytics. (7) Parallel query. (8) Application-level caching.
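Step (3) above can be sketched like this (PostgreSQL syntax; the orders table is hypothetical) — the expensive aggregation runs once and reads hit the precomputed result:

```sql
-- Pre-aggregate a heavy report query into a materialized view.
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT created_at::date AS day, SUM(total) AS revenue
FROM orders
GROUP BY created_at::date;

-- Refresh on a schedule (cron, pg_cron, etc.) to pick up new orders.
REFRESH MATERIALIZED VIEW daily_revenue;
```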

Quick Answer

Core: users, products, categories, orders, order_items, addresses, payments, reviews with proper foreign keys and indexes.

Detailed Explanation

Tables: users(id, email), products(id, title, price, category_id, stock), categories(id, name, parent_id), orders(id, user_id, status, total), order_items(order_id, product_id, quantity, price), addresses, payments, reviews. Indexes on foreign keys and frequently queried columns.

Quick Answer

WHERE filters rows before grouping; HAVING filters groups after GROUP BY aggregation.

Detailed Explanation

WHERE: applied to individual rows before GROUP BY, cannot use aggregate functions (SUM, COUNT, AVG). HAVING: applied to groups after GROUP BY, can use aggregate functions. Example: SELECT dept, COUNT(*) FROM employees WHERE salary > 50000 GROUP BY dept HAVING COUNT(*) > 5 — first filters rows by salary, then groups and filters groups by count. HAVING without GROUP BY applies to entire result as one group. Performance: WHERE reduces data before aggregation (more efficient). Use WHERE for row-level filters, HAVING for aggregate conditions.
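The example above, formatted (hypothetical employees table):

```sql
-- WHERE filters rows first; HAVING filters the resulting groups.
SELECT dept, COUNT(*) AS headcount
FROM employees
WHERE salary > 50000      -- row-level filter, before grouping
GROUP BY dept
HAVING COUNT(*) > 5;      -- group-level filter, after aggregation
```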

Quick Answer

Joins combine rows from two or more tables based on related columns. Types: INNER, LEFT, RIGHT, FULL, CROSS, SELF.

Detailed Explanation

INNER JOIN: returns matching rows from both tables. LEFT JOIN: all rows from left + matching from right (NULL for no match). RIGHT JOIN: all rows from right + matching from left. FULL OUTER JOIN: all rows from both (NULLs where no match). CROSS JOIN: cartesian product (every row × every row). SELF JOIN: table joined with itself (hierarchical data, comparisons). Natural join: auto-matches same-named columns (avoid — fragile). Performance: join order matters, indexes on join columns critical. Use EXPLAIN to analyze join performance.
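A typical LEFT JOIN, sketched against hypothetical users and orders tables — users with no orders still appear, with a count of zero:

```sql
SELECT u.id, u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name;
```

Note COUNT(o.id) rather than COUNT(*): it skips the NULLs produced for non-matching rows, so order-less users get 0 instead of 1.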

Quick Answer

UNION combines results and removes duplicates (with sort overhead); UNION ALL keeps all rows including duplicates (faster).

Detailed Explanation

UNION: combines result sets, removes duplicate rows (implicit DISTINCT + sort). UNION ALL: combines without removing duplicates — faster because no sorting/comparison. Rules: same number of columns, compatible data types, column names from first SELECT. Use UNION ALL when you know there are no duplicates or duplicates are acceptable — much better performance. INTERSECT: rows in both. EXCEPT/MINUS: rows in first but not second. UNION ALL is almost always preferred over UNION for performance.
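Side by side, with hypothetical customers and newsletter_subscribers tables:

```sql
-- UNION deduplicates (extra sort/hash work behind the scenes).
SELECT email FROM customers
UNION
SELECT email FROM newsletter_subscribers;   -- each email appears once

-- UNION ALL simply concatenates — faster, duplicates kept.
SELECT email FROM customers
UNION ALL
SELECT email FROM newsletter_subscribers;
```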

Quick Answer

Indexes are data structures (usually B-tree) that speed up data retrieval at the cost of slower writes and additional storage.

Detailed Explanation

B-tree index: default, ordered structure for range and equality queries. Hash index: equality lookups only (faster for exact match). Composite index: multiple columns — column order matters (leftmost prefix rule). Covering index: includes all query columns, avoiding table lookup. Unique index: enforces uniqueness. Partial index: indexes subset of rows (WHERE condition). Full-text index: text search. GIN/GiST: for arrays, JSON, geometric data (PostgreSQL). Trade-offs: faster reads, slower writes (index maintenance), disk space. Use EXPLAIN ANALYZE to verify index usage.
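A sketch of the composite and partial cases (PostgreSQL syntax; the orders table is hypothetical):

```sql
-- Composite index: serves WHERE user_id = ? AND status = ?,
-- and (leftmost prefix rule) WHERE user_id = ? alone —
-- but NOT WHERE status = ? alone.
CREATE INDEX idx_orders_user_status ON orders (user_id, status);

-- Partial index: only index the rows queries actually target.
CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE status = 'pending';

-- Always verify the planner uses it.
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 42 AND status = 'pending';
```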

Quick Answer

Window functions perform calculations across a set of rows related to the current row, without collapsing rows like GROUP BY.

Detailed Explanation

Syntax: function() OVER (PARTITION BY col ORDER BY col ROWS BETWEEN ...). Functions: ROW_NUMBER() (unique sequential), RANK() (gaps on ties), DENSE_RANK() (no gaps), NTILE(n) (divide into buckets), LAG/LEAD (previous/next row), FIRST_VALUE/LAST_VALUE, SUM/AVG/COUNT as running totals. Frame: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (running total). PARTITION BY creates independent windows. Use cases: running totals, rankings, moving averages, comparing current to previous row, top-N per group. More powerful than GROUP BY for analytical queries.
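A sketch combining ranking, LAG, and a running total in one query (hypothetical employees table):

```sql
SELECT
    dept,
    name,
    salary,
    -- ranking within each department
    RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS dept_rank,
    -- previous row's value in the same window (NULL for the top earner)
    LAG(salary) OVER (PARTITION BY dept ORDER BY salary DESC) AS next_higher_salary,
    -- explicit frame: running total from the top earner down
    SUM(salary) OVER (PARTITION BY dept ORDER BY salary DESC
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM employees;
```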

Quick Answer

Normalization organizes data to reduce redundancy and improve integrity through progressive normal forms (1NF through 5NF).

Detailed Explanation

1NF: atomic values, no repeating groups. 2NF: 1NF + no partial dependencies (all non-key columns depend on entire primary key). 3NF: 2NF + no transitive dependencies (non-key columns don't depend on other non-key columns). BCNF: every determinant is a candidate key. 4NF: no multi-valued dependencies. 5NF: no join dependencies. Denormalization: intentionally adding redundancy for read performance (reporting, analytics). Most applications normalize to 3NF. Data warehouses often denormalize (star/snowflake schema). Balance normalization with query performance.
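A small before/after sketch of moving to 3NF (table and column names are illustrative):

```sql
-- Unnormalized: customer facts repeated on every invoice row.
-- invoices(id, customer_name, customer_email, item, price)

-- 3NF: each customer fact is stored once, keyed by customer id.
CREATE TABLE customers (
    id    SERIAL PRIMARY KEY,
    name  TEXT NOT NULL,
    email TEXT UNIQUE NOT NULL
);

CREATE TABLE invoices (
    id          SERIAL PRIMARY KEY,
    customer_id INT NOT NULL REFERENCES customers(id),
    item        TEXT NOT NULL,
    price       NUMERIC(10, 2) NOT NULL
);
```

Updating a customer's email now touches one row instead of every invoice — the integrity benefit normalization buys.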

Quick Answer

Stored procedures are precompiled SQL programs stored in the database, offering performance, security, and code reuse benefits.

Detailed Explanation

Benefits: precompiled execution plans (faster), reduced network traffic (batch operations), centralized business logic, security (grant EXECUTE without table access), transaction management. Drawbacks: harder to version control, database-vendor lock-in, debugging complexity, can hide business logic. Use for: complex data operations, batch processing, data validation rules, audit logging. Avoid for: simple queries, logic that changes frequently, portability requirements. Functions return values; procedures perform actions.
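A minimal PL/pgSQL sketch of the security benefit (PostgreSQL; the accounts table and app_role are hypothetical):

```sql
-- Encapsulate a transfer; callers never touch the table directly.
CREATE FUNCTION transfer(from_id INT, to_id INT, amount NUMERIC)
RETURNS void
LANGUAGE plpgsql
AS $$
BEGIN
    UPDATE accounts SET balance = balance - amount WHERE id = from_id;
    UPDATE accounts SET balance = balance + amount WHERE id = to_id;
END;
$$;

-- Grant EXECUTE without granting any access to the underlying table.
GRANT EXECUTE ON FUNCTION transfer(INT, INT, NUMERIC) TO app_role;
```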

Quick Answer

Isolation levels control how transactions interact: Read Uncommitted, Read Committed, Repeatable Read, and Serializable.

Detailed Explanation

Read Uncommitted: sees uncommitted changes (dirty reads). Read Committed: only committed data (PostgreSQL default). Repeatable Read: consistent snapshot throughout transaction (MySQL InnoDB default). Serializable: full isolation, transactions appear sequential. Problems: Dirty reads (reading uncommitted data), Non-repeatable reads (row changes between reads), Phantom reads (new rows appear matching query). Higher isolation = more consistency but lower concurrency. MVCC (Multi-Version Concurrency Control) in PostgreSQL/MySQL keeps multiple row versions so readers don't block writers.
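Raising the level for a single transaction looks like this (standard SQL; the accounts table is hypothetical). Under SERIALIZABLE, be prepared to retry:

```sql
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT SUM(balance) FROM accounts;            -- consistent snapshot
UPDATE accounts SET balance = 0 WHERE id = 1;
COMMIT;  -- may fail with a serialization error; retry the whole transaction
```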

Quick Answer

Use EXPLAIN ANALYZE, add indexes, rewrite joins, reduce result set early, avoid SELECT *, and use query caching.

Detailed Explanation

Steps: (1) EXPLAIN ANALYZE — identify full table scans, nested loops. (2) Add indexes on WHERE/JOIN/ORDER BY columns. (3) Avoid SELECT * — fetch only needed columns. (4) Replace subqueries with JOINs when possible. (5) Use EXISTS instead of IN for correlated subqueries. (6) Limit results early (WHERE before GROUP BY). (7) Avoid wrapping indexed columns in functions in WHERE (UPPER(col) = 'VALUE' prevents index use unless an expression index exists). (8) Partition large tables. (9) Materialized views for complex aggregations. (10) Query caching (Redis). (11) Check for N+1 queries in application code. (12) Database-specific: PostgreSQL pg_stat_statements.
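Steps (1) and (7) in practice (PostgreSQL syntax; the orders and users tables are hypothetical):

```sql
-- Step 1: read the plan. A "Seq Scan" on a large table with a
-- selective filter usually means a missing index.
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_email = 'a@b.com';

-- Step 7 pitfall: a function applied to the COLUMN defeats its index...
SELECT * FROM users WHERE LOWER(email) = 'ada@example.com';

-- ...unless you index the expression itself:
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
```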

Quick Answer

CTEs define named temporary result sets within a query using WITH clause, improving readability and enabling recursive queries.

Detailed Explanation

Syntax: WITH cte_name AS (SELECT ...) SELECT * FROM cte_name. Multiple CTEs: WITH cte1 AS (...), cte2 AS (...) SELECT .... Recursive CTE: WITH RECURSIVE tree AS (base UNION ALL recursive) — for hierarchical data (org charts, category trees, graphs). Benefits: readability, reuse within query, recursive traversal. Performance: some databases materialize CTEs (PostgreSQL), others inline (MySQL). CTE vs subquery: same performance usually, CTE is more readable. CTE vs temp table: CTE exists only for one query; temp table persists.
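A recursive CTE walking a category tree from the roots down (hypothetical categories table with a self-referencing parent_id):

```sql
WITH RECURSIVE tree AS (
    -- base case: root categories
    SELECT id, name, parent_id, 1 AS depth
    FROM categories
    WHERE parent_id IS NULL
    UNION ALL
    -- recursive step: children of rows found so far
    SELECT c.id, c.name, c.parent_id, t.depth + 1
    FROM categories c
    JOIN tree t ON c.parent_id = t.id
)
SELECT * FROM tree ORDER BY depth, name;
```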

Quick Answer

Use migration tools, apply incrementally, test on staging, ensure backward compatibility, and plan rollback strategies.

Detailed Explanation

Best practices: (1) Migration tools: Flyway, Liquibase, Django migrations, Knex.js. (2) Version control migrations. (3) Forward-only: migration + rollback script. (4) Backward compatible: add columns with defaults before code deploy, remove later. (5) Schema changes: add column → deploy code → backfill data → add constraints. (6) Large table changes: pt-online-schema-change (MySQL), pg_repack (PostgreSQL). (7) Test on staging with production-size data. (8) Blue-green deploy: run old and new code simultaneously during migration.
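Step (5) split into backward-compatible pieces might look like this (PostgreSQL; the users table and timezone column are illustrative):

```sql
-- Migration 1: add the column as nullable — old code keeps working.
ALTER TABLE users ADD COLUMN timezone TEXT;

-- Deploy code that writes the new column, then backfill existing rows
-- (in batches on large tables, to limit lock time):
UPDATE users SET timezone = 'UTC' WHERE timezone IS NULL;

-- Migration 2: only now tighten the constraint.
ALTER TABLE users ALTER COLUMN timezone SET NOT NULL;
```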

Quick Answer

Core tables: users, products, categories, orders, order_items, payments, addresses with proper normalization and indexes.

Detailed Explanation

Schema: users (id, email, name, password_hash). products (id, name, price, category_id, stock, sku, description). categories (id, name, parent_id for hierarchy). orders (id, user_id, total, status, created_at). order_items (order_id, product_id, quantity, price_at_purchase — snapshot price). addresses (user_id, type, street, city, zip). payments (order_id, method, amount, status, transaction_id). Indexes: user email, product sku, order user_id+status. Considerations: soft deletes, audit trail, inventory management (optimistic locking), search (full-text or Elasticsearch).
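A minimal sketch of the core tables (PostgreSQL syntax; columns trimmed for brevity):

```sql
CREATE TABLE users (
    id    SERIAL PRIMARY KEY,
    email TEXT UNIQUE NOT NULL
);

CREATE TABLE orders (
    id         SERIAL PRIMARY KEY,
    user_id    INT NOT NULL REFERENCES users(id),
    status     TEXT NOT NULL DEFAULT 'pending',
    total      NUMERIC(10, 2) NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE order_items (
    order_id          INT NOT NULL REFERENCES orders(id),
    product_id        INT NOT NULL,            -- references products(id)
    quantity          INT NOT NULL CHECK (quantity > 0),
    price_at_purchase NUMERIC(10, 2) NOT NULL, -- snapshot, not live price
    PRIMARY KEY (order_id, product_id)
);

CREATE INDEX idx_orders_user_status ON orders (user_id, status);
```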

Quick Answer

Sharding horizontally partitions data across multiple databases based on a shard key, enabling horizontal scaling.

Detailed Explanation

Strategy: choose shard key (user_id, region) that distributes data evenly. Types: range-based (id 1-1000 → shard 1), hash-based (hash(id) % num_shards), directory-based (lookup table). Challenges: cross-shard queries (expensive), rebalancing when adding shards, no cross-shard foreign keys, distributed transactions. Alternatives before sharding: read replicas, caching, vertical partitioning, connection pooling. Tools: Vitess (MySQL), Citus (PostgreSQL). Start with single database, optimize, then shard when truly necessary. Most applications never need sharding.

Quick Answer

CAP theorem states distributed systems can guarantee only two of three: Consistency, Availability, and Partition tolerance.

Detailed Explanation

Consistency: all nodes see same data simultaneously. Availability: every request gets a response. Partition tolerance: system works despite network splits. CP systems: prioritize consistency (MongoDB with strict read). AP systems: prioritize availability (Cassandra, DynamoDB). CA: impossible in distributed systems (partitions are inevitable). In practice: choose between consistency and availability during partition. PACELC extends CAP: during Partition choose A or C; Else (no partition) choose Latency or Consistency. Most systems default to eventual consistency for better performance.

Quick Answer

SQL databases are relational with structured schemas and ACID transactions; NoSQL offers flexible schemas with horizontal scaling.

Detailed Explanation

SQL (PostgreSQL, MySQL): structured schemas, ACID transactions, powerful joins, strong consistency, vertical scaling primarily. NoSQL types: Document (MongoDB — flexible JSON), Key-Value (Redis — fast lookups), Column-family (Cassandra — wide rows), Graph (Neo4j — relationships). NoSQL benefits: horizontal scaling, flexible schema, high write throughput. SQL benefits: data integrity, complex queries, transaction support. Choose SQL for: financial data, complex relationships, ACID requirements. Choose NoSQL for: high scale, flexible data, simple queries, eventual consistency acceptable.

Quick Answer

DELETE removes specific rows with rollback; TRUNCATE removes all rows fast without logging each; DROP removes the entire table.

Detailed Explanation

DELETE: DML, removes rows matching WHERE (all if no WHERE), logged per row, can rollback, triggers fire, slower for bulk. TRUNCATE: DDL, removes ALL rows, minimal logging (deallocates pages), faster, resets auto-increment, cannot rollback in some databases. DROP: DDL, removes table structure and data entirely, cannot rollback. Performance: TRUNCATE >> DELETE for clearing tables. DELETE with WHERE for selective removal. CASCADE option in DROP/TRUNCATE affects dependent objects. Foreign key constraints may prevent TRUNCATE.
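Side by side (the sessions table is hypothetical):

```sql
DELETE FROM sessions WHERE expires_at < now(); -- selective, row-by-row, rollbackable
TRUNCATE TABLE sessions;                       -- all rows, fast, often resets auto-increment
DROP TABLE sessions;                           -- data AND table structure gone
```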

Quick Answer

Triggers are stored programs that automatically execute in response to INSERT, UPDATE, or DELETE events on a table.

Detailed Explanation

Types: BEFORE (validate/modify data pre-operation), AFTER (log/audit post-operation), INSTEAD OF (override operation on views). Row-level: fires for each affected row. Statement-level: fires once per statement. Access: OLD (pre-change values), NEW (post-change values). Use cases: audit trails, data validation, auto-updating timestamps, maintaining denormalized data, cascade operations. Caution: hidden logic (hard to debug), performance impact, can cause infinite loops (trigger fires trigger). Alternative: application-level logic for complex business rules.
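A common BEFORE trigger sketch — auto-maintaining an updated_at column (PostgreSQL 11+ syntax; the products table is hypothetical):

```sql
CREATE FUNCTION set_updated_at() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    NEW.updated_at := now();   -- NEW holds the post-change row
    RETURN NEW;
END;
$$;

CREATE TRIGGER products_updated_at
BEFORE UPDATE ON products
FOR EACH ROW
EXECUTE FUNCTION set_updated_at();
```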

Quick Answer

Deadlock occurs when transactions wait for each other's locks indefinitely; databases detect and kill one transaction to resolve.

Detailed Explanation

Scenario: Transaction A locks row 1, waits for row 2. Transaction B locks row 2, waits for row 1 — circular wait. Database detects deadlock and rolls back one transaction (victim). Prevention: (1) Access tables/rows in consistent order. (2) Keep transactions short. (3) Use appropriate isolation level. (4) Add indexes (reduces locking). (5) Use row-level locking (not table). (6) SELECT FOR UPDATE NOWAIT to fail instead of wait. (7) Retry logic for deadlock victims. Monitoring: MySQL: SHOW ENGINE INNODB STATUS. PostgreSQL: pg_stat_activity, deadlock_timeout setting.
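Prevention step (1) in code — acquire both row locks in a consistent order so two concurrent transfers can never wait on each other (the accounts table is hypothetical):

```sql
BEGIN;
-- Every transaction locks the rows in the same (id) order.
SELECT * FROM accounts
WHERE id IN (1, 2)
ORDER BY id
FOR UPDATE;

UPDATE accounts SET balance = balance - 100 WHERE id = 2;
UPDATE accounts SET balance = balance + 100 WHERE id = 1;
COMMIT;
```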

Quick Answer

Use keyset/cursor pagination instead of OFFSET, limit result set, add proper indexes, and consider caching.

Detailed Explanation

OFFSET pagination: SELECT * FROM items ORDER BY id LIMIT 20 OFFSET 10000 — scans and discards 10000 rows. Slow for large offsets. Keyset pagination: WHERE id > last_seen_id ORDER BY id LIMIT 20 — uses index, constant performance regardless of page. Cursor-based: encode last item's sort values as cursor token. Requirements: stable sort column, index on sort column. Additional: estimated total count (SELECT reltuples FROM pg_class vs exact COUNT), infinite scroll vs page numbers, cache frequently accessed pages, materialized views for complex queries.
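The two approaches side by side (the items table is hypothetical):

```sql
-- OFFSET pagination: the database reads and discards 10,000 rows
-- before returning page 501. Cost grows with the offset.
SELECT * FROM items ORDER BY id LIMIT 20 OFFSET 10000;

-- Keyset pagination: the client passes back the last id it saw,
-- and the index jumps straight to the next page. Constant cost.
SELECT * FROM items WHERE id > 10020 ORDER BY id LIMIT 20;
```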

Quick Answer

Use built-in full-text indexes (PostgreSQL tsvector, MySQL FULLTEXT), or external search engines like Elasticsearch for complex needs.

Detailed Explanation

PostgreSQL: CREATE INDEX idx ON articles USING GIN (to_tsvector('english', title || body)). Query: WHERE to_tsvector('english', title) @@ to_tsquery('search & terms'). ts_rank() for relevance scoring. MySQL: FULLTEXT index, MATCH...AGAINST syntax. Limitations: SQL full-text works for basic search. For advanced needs: Elasticsearch (distributed, fuzzy matching, facets, synonyms, autocomplete), MeiliSearch (lightweight), or Typesense. Architecture: sync data from DB to search engine, search engine for queries, DB for source of truth. Consider: indexing latency, data consistency, autocomplete requirements.
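The PostgreSQL pieces above, assembled into one ranked query (the articles table is hypothetical):

```sql
CREATE INDEX idx_articles_fts ON articles
USING GIN (to_tsvector('english', title || ' ' || body));

SELECT id, title,
       ts_rank(to_tsvector('english', title || ' ' || body),
               to_tsquery('english', 'database & index')) AS rank
FROM articles
WHERE to_tsvector('english', title || ' ' || body)
      @@ to_tsquery('english', 'database & index')
ORDER BY rank DESC;
```

Note the indexed expression and the WHERE expression must match exactly for the GIN index to be used.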

Ready to master SQL Interview Questions?

Start learning with our comprehensive course and practice these questions.