SQL Mastery Guide - Part 4: Subqueries and CTEs

Introduction

Welcome to Part 4 of our SQL Mastery series. In Parts 1-3, we covered SELECT fundamentals, JOINs, and aggregations. Now we explore subqueries and Common Table Expressions (CTEs) - powerful techniques for building complex queries from simpler pieces.
Subqueries let you nest queries inside other queries. CTEs provide named, reusable query blocks. Together, they're essential for solving real-world data problems.
This part covers Questions 76-100, completing our 100-question journey.

Schema Reference

We continue with our e-commerce database:
sql
CREATE TABLE users ( user_id SERIAL PRIMARY KEY, email VARCHAR(255) UNIQUE NOT NULL, full_name VARCHAR(100), created_at TIMESTAMP DEFAULT NOW(), country VARCHAR(50), is_active BOOLEAN DEFAULT true ); CREATE TABLE products ( product_id SERIAL PRIMARY KEY, name VARCHAR(255) NOT NULL, price DECIMAL(10,2), category_id INT, stock_quantity INT DEFAULT 0, created_at TIMESTAMP DEFAULT NOW() ); CREATE TABLE orders ( order_id SERIAL PRIMARY KEY, user_id INT REFERENCES users(user_id), order_date TIMESTAMP DEFAULT NOW(), status VARCHAR(20) DEFAULT 'pending', total_amount DECIMAL(10,2) ); CREATE TABLE order_items ( item_id SERIAL PRIMARY KEY, order_id INT REFERENCES orders(order_id), product_id INT REFERENCES products(product_id), quantity INT NOT NULL, unit_price DECIMAL(10,2) NOT NULL ); CREATE TABLE reviews ( review_id SERIAL PRIMARY KEY, product_id INT REFERENCES products(product_id), user_id INT REFERENCES users(user_id), rating INT CHECK (rating BETWEEN 1 AND 5), created_at TIMESTAMP DEFAULT NOW() ); -- Hierarchical table for CTE examples CREATE TABLE employees ( employee_id SERIAL PRIMARY KEY, name VARCHAR(100), manager_id INT REFERENCES employees(employee_id), department VARCHAR(50), salary DECIMAL(10,2), hire_date DATE ); CREATE TABLE categories ( category_id SERIAL PRIMARY KEY, name VARCHAR(100), parent_category_id INT REFERENCES categories(category_id) );

Question 76: Scalar Subquery in SELECT

Problem

For each product, show its price and how it compares to the average price.

Solution

sql
SELECT product_id, name, price, (SELECT AVG(price) FROM products) AS avg_price, price - (SELECT AVG(price) FROM products) AS diff_from_avg FROM products WHERE price IS NOT NULL ORDER BY diff_from_avg DESC;

Explanation

A scalar subquery returns exactly one value (one row, one column). It can be used wherever a single value is expected.
How it works:
  1. Inner query
    SELECT AVG(price) FROM products
    executes once
  2. Result (e.g., 75.50) substitutes into outer query
  3. Outer query runs with that constant value

Common Mistakes

Mistake 1: Subquery returns multiple rows
sql
-- ERROR! Subquery returns multiple values SELECT name, (SELECT price FROM products) FROM products; -- "Subquery returns more than one row" -- Fix: Ensure single value with aggregate or LIMIT SELECT name, (SELECT MAX(price) FROM products) FROM products;
Mistake 2: Subquery returns multiple columns
sql
-- ERROR! Subquery has multiple columns SELECT name, (SELECT MIN(price), MAX(price) FROM products) FROM products; -- Fix: One subquery per column SELECT name, (SELECT MIN(price) FROM products) AS min_price, (SELECT MAX(price) FROM products) AS max_price FROM products;

Performance Considerations

sql
-- Scalar subquery in SELECT executes for EVERY row -- This is inefficient for row-dependent subqueries: SELECT name, (SELECT COUNT(*) FROM order_items WHERE product_id = p.product_id) FROM products p; -- Subquery runs once per product! -- Better: Use JOIN or window function SELECT p.name, COUNT(oi.item_id) AS order_count FROM products p LEFT JOIN order_items oi ON p.product_id = oi.product_id GROUP BY p.product_id, p.name;

Question 77: Subquery in WHERE - Single Value

Problem

Find products priced above average.

Solution

sql
SELECT product_id, name, price FROM products WHERE price > (SELECT AVG(price) FROM products) ORDER BY price DESC;

Explanation

Subquery in WHERE filters rows based on dynamically computed value.
Execution:
  1. Subquery computes AVG(price) = 75.50
  2. Outer query becomes WHERE price > 75.50
  3. Filter and return results

Common Mistakes

Mistake 1: Comparing to multi-row subquery with operators
sql
-- ERROR if subquery returns multiple rows SELECT * FROM products WHERE price > (SELECT price FROM products WHERE category_id = 1); -- Fix: Use ANY/ALL for multi-row SELECT * FROM products WHERE price > ALL (SELECT price FROM products WHERE category_id = 1);

Question 78: IN Subquery

Problem

Find users who have placed at least one order.

Solution

sql
SELECT user_id, email, full_name FROM users WHERE user_id IN (SELECT DISTINCT user_id FROM orders) ORDER BY full_name;

Explanation

IN checks if value exists in the subquery result set.
sql
-- Equivalent to: WHERE user_id = 1 OR user_id = 5 OR user_id = 12 ...

Common Mistakes

Mistake 1: IN with NULL values
sql
-- If subquery contains NULL, unexpected results SELECT * FROM products WHERE category_id IN (SELECT category_id FROM special_categories); -- If subquery returns (1, 2, NULL), products with category_id = 3 are correctly excluded -- But NOT IN with NULL is problematic (see next question)

Performance Considerations

sql
-- IN subquery often converted to semi-join by optimizer EXPLAIN ANALYZE SELECT * FROM users WHERE user_id IN (SELECT user_id FROM orders); -- Shows: Hash Semi Join (efficient) -- Not: Subquery executed for each row -- For large IN lists from subqueries, consider JOIN: SELECT DISTINCT u.* FROM users u JOIN orders o ON u.user_id = o.user_id;

Question 79: NOT IN Subquery - The NULL Trap

Problem

Find users who have never placed an order.

Solution

sql
-- Safe version (handles NULLs) SELECT user_id, email, full_name FROM users WHERE user_id NOT IN ( SELECT user_id FROM orders WHERE user_id IS NOT NULL -- Critical! ); -- Or better, use NOT EXISTS (see Question 82)

Explanation

NOT IN has a dangerous NULL trap:
sql
-- If orders.user_id contains any NULL: SELECT * FROM users WHERE user_id NOT IN (SELECT user_id FROM orders); -- Returns ZERO rows! Even for users who never ordered. -- Why? -- NOT IN (1, 2, NULL) means: -- user_id != 1 AND user_id != 2 AND user_id != NULL -- user_id != NULL is UNKNOWN, not TRUE -- UNKNOWN AND anything = UNKNOWN -- So WHERE clause is never TRUE

Common Mistakes

Mistake 1: Forgetting NULL in subquery
sql
-- Dangerous: Will return nothing if any NULL in subquery SELECT * FROM users WHERE user_id NOT IN (SELECT user_id FROM orders); -- Safe alternatives: -- 1. Filter NULLs WHERE user_id NOT IN (SELECT user_id FROM orders WHERE user_id IS NOT NULL) -- 2. Use NOT EXISTS (recommended) WHERE NOT EXISTS (SELECT 1 FROM orders WHERE orders.user_id = users.user_id) -- 3. Use LEFT JOIN + NULL check SELECT u.* FROM users u LEFT JOIN orders o ON u.user_id = o.user_id WHERE o.order_id IS NULL;

Performance Considerations

sql
-- NOT IN can be slow - often becomes anti-join or filter -- NOT EXISTS is usually better optimized -- LEFT JOIN with NULL check often fastest for large tables EXPLAIN ANALYZE SELECT * FROM users WHERE NOT EXISTS (SELECT 1 FROM orders WHERE orders.user_id = users.user_id); -- Shows: Hash Anti Join (efficient)

Question 80: ANY and ALL

Problem

Find products more expensive than any product in category 1.

Solution

sql
-- ANY: More expensive than at least one category-1 product SELECT product_id, name, price FROM products WHERE price > ANY ( SELECT price FROM products WHERE category_id = 1 ); -- ALL: More expensive than every category-1 product SELECT product_id, name, price FROM products WHERE price > ALL ( SELECT price FROM products WHERE category_id = 1 );

Explanation

  • > ANY (subquery): Greater than at least one value (same as > MIN)
  • > ALL (subquery): Greater than every value (same as > MAX)
  • = ANY (subquery): Same as IN
sql
-- Equivalents: > ANY (1, 5, 10) -- Same as > 1 (the minimum) > ALL (1, 5, 10) -- Same as > 10 (the maximum) < ANY (1, 5, 10) -- Same as < 10 (the maximum) < ALL (1, 5, 10) -- Same as < 1 (the minimum)

Common Mistakes

Mistake 1: Confusing ANY and ALL
sql
-- "More expensive than any product" means > MIN (lowest bar) -- "More expensive than all products" means > MAX (highest bar) -- Usually you want ALL for "beat everyone" scenarios
Mistake 2: Empty subquery behavior
sql
-- If subquery returns no rows: > ALL (empty) -- TRUE (vacuously true, no values to beat) > ANY (empty) -- FALSE (no values to compare to)

Question 81: EXISTS Subquery

Problem

Find products that have at least one review.

Solution

sql
SELECT p.product_id, p.name FROM products p WHERE EXISTS ( SELECT 1 FROM reviews r WHERE r.product_id = p.product_id );

Explanation

EXISTS returns TRUE if subquery returns at least one row, FALSE otherwise.
Key characteristics:
  • Stops at first match (short-circuit)
  • Doesn't care about values, only existence
  • Usually correlates with outer query

Common Mistakes

Mistake 1: Using SELECT * in EXISTS
sql
-- Works but misleading - columns don't matter WHERE EXISTS (SELECT * FROM reviews WHERE ...) -- Better: Use SELECT 1 to be clear WHERE EXISTS (SELECT 1 FROM reviews WHERE ...)
Mistake 2: Non-correlated EXISTS
sql
-- This EXISTS returns same result for every row WHERE EXISTS (SELECT 1 FROM reviews WHERE rating > 3) -- Either TRUE for all rows or FALSE for all rows -- Usually you want correlated: WHERE EXISTS (SELECT 1 FROM reviews WHERE reviews.product_id = products.product_id)

Performance Considerations

sql
-- EXISTS is optimized to stop at first match -- For checking existence, EXISTS beats COUNT -- Bad: WHERE (SELECT COUNT(*) FROM reviews WHERE product_id = p.product_id) > 0 -- Counts all reviews unnecessarily -- Good: WHERE EXISTS (SELECT 1 FROM reviews WHERE product_id = p.product_id) -- Stops at first review found

Question 82: NOT EXISTS - Finding Missing Relations

Problem

Find users who have never written a review.

Solution

sql
SELECT u.user_id, u.full_name, u.email FROM users u WHERE NOT EXISTS ( SELECT 1 FROM reviews r WHERE r.user_id = u.user_id );

Explanation

NOT EXISTS returns TRUE if subquery returns zero rows.
This is the safest way to find "missing" relationships - no NULL trap like NOT IN.

Common Mistakes

Mistake 1: Using NOT IN instead
sql
-- Dangerous if reviews.user_id can be NULL SELECT * FROM users WHERE user_id NOT IN (SELECT user_id FROM reviews); -- Safe SELECT * FROM users u WHERE NOT EXISTS (SELECT 1 FROM reviews r WHERE r.user_id = u.user_id);

Performance Considerations

sql
-- NOT EXISTS typically becomes Anti Semi Join EXPLAIN ANALYZE SELECT * FROM users u WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.user_id); -- Ensure index on the correlated column CREATE INDEX idx_orders_user ON orders(user_id);

Question 83: Correlated Subquery

Problem

Find each customer's most recent order.

Solution

sql
SELECT o.order_id, o.user_id, o.order_date, o.total_amount FROM orders o WHERE o.order_date = ( SELECT MAX(o2.order_date) FROM orders o2 WHERE o2.user_id = o.user_id );

Explanation

A correlated subquery references columns from the outer query. It executes once per outer row.
Execution:
  1. For user_id = 1: Find MAX(order_date) where user_id = 1
  2. Check if current row's date matches
  3. Repeat for each order row

Common Mistakes

Mistake 1: Thinking it's efficient
sql
-- Correlated subqueries execute per row = potentially slow -- For 10,000 orders, subquery runs 10,000 times -- Often better to rewrite as JOIN: SELECT o.* FROM orders o JOIN ( SELECT user_id, MAX(order_date) AS max_date FROM orders GROUP BY user_id ) latest ON o.user_id = latest.user_id AND o.order_date = latest.max_date; -- Or use window function: SELECT * FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date DESC) AS rn FROM orders ) sub WHERE rn = 1;

Performance Considerations

sql
-- Correlated subqueries need careful indexing -- Ensure index on correlation columns CREATE INDEX idx_orders_user_date ON orders(user_id, order_date); -- Check execution plan EXPLAIN ANALYZE SELECT ...; -- Look for: "SubPlan" or "InitPlan" indicators

Question 84: Subquery in FROM (Derived Table)

Problem

Calculate the percentage of total revenue each category represents.

Solution

sql
SELECT category_revenue.category_id, c.name AS category_name, category_revenue.revenue, total.total_revenue, ROUND(category_revenue.revenue * 100.0 / total.total_revenue, 2) AS pct_of_total FROM ( SELECT p.category_id, SUM(oi.quantity * oi.unit_price) AS revenue FROM products p JOIN order_items oi ON p.product_id = oi.product_id GROUP BY p.category_id ) AS category_revenue CROSS JOIN ( SELECT SUM(quantity * unit_price) AS total_revenue FROM order_items ) AS total JOIN categories c ON category_revenue.category_id = c.category_id ORDER BY revenue DESC;

Explanation

A subquery in FROM creates a derived table (or inline view). It's treated like a regular table.
Use cases:
  • Pre-aggregating data before joining
  • Creating intermediate results
  • Simplifying complex queries

Common Mistakes

Mistake 1: Forgetting alias
sql
-- ERROR: Derived table needs alias SELECT * FROM (SELECT product_id FROM products); -- Correct SELECT * FROM (SELECT product_id FROM products) AS derived;
Mistake 2: Selecting columns not in derived table
sql
-- ERROR: 'name' not available SELECT name FROM (SELECT product_id FROM products) AS p; -- Derived table only has product_id

Performance Considerations

sql
-- Derived tables can be materialized or merged into outer query -- Check EXPLAIN to see what optimizer does -- Tip: For reused derived tables, consider CTE (more readable)

Question 85: Common Table Expression (CTE) Basics

Problem

Find users whose total spending exceeds the average total spending per user.

Solution

sql
WITH user_spending AS ( SELECT user_id, SUM(total_amount) AS total_spent FROM orders WHERE status = 'completed' GROUP BY user_id ), avg_spending AS ( SELECT AVG(total_spent) AS avg_total FROM user_spending ) SELECT us.user_id, u.full_name, us.total_spent, avs.avg_total FROM user_spending us CROSS JOIN avg_spending avs JOIN users u ON us.user_id = u.user_id WHERE us.total_spent > avs.avg_total ORDER BY us.total_spent DESC;

Explanation

CTE (Common Table Expression) is a named temporary result set defined with WITH.
Benefits:
  • Improves readability
  • Can be referenced multiple times
  • Breaks complex queries into logical steps
Syntax:
sql
WITH cte_name AS ( SELECT ... ), another_cte AS ( SELECT ... FROM cte_name -- Can reference previous CTE ) SELECT ... FROM cte_name JOIN another_cte ...;

Common Mistakes

Mistake 1: Trying to use CTE after semicolon
sql
-- ERROR: CTE scope ends at semicolon WITH my_cte AS (...); SELECT * FROM my_cte; -- CTE not found! -- Correct: No semicolon before final SELECT WITH my_cte AS (...) SELECT * FROM my_cte;
Mistake 2: Referencing CTE before it's defined
sql
-- ERROR: cte2 references cte1 before it exists WITH cte2 AS (SELECT * FROM cte1), cte1 AS (SELECT 1) SELECT * FROM cte2; -- Correct order: WITH cte1 AS (SELECT 1), cte2 AS (SELECT * FROM cte1) SELECT * FROM cte2;

Performance Considerations

sql
-- CTE behavior varies by database: -- PostgreSQL < 12: CTEs are optimization barriers (always materialized) -- PostgreSQL >= 12: Optimizer can inline CTEs -- MySQL 8.0+: CTEs supported, optimizer can inline -- SQL Server: Inline by default -- Force materialization when needed: WITH cte AS MATERIALIZED (...) -- PostgreSQL 12+

Question 86: Multiple CTEs

Problem

Create a sales report showing daily, weekly, and monthly summaries.

Solution

sql
WITH daily_sales AS ( SELECT DATE(order_date) AS sale_date, SUM(total_amount) AS daily_total FROM orders WHERE status = 'completed' GROUP BY DATE(order_date) ), weekly_sales AS ( SELECT DATE_TRUNC('week', sale_date) AS week_start, SUM(daily_total) AS weekly_total FROM daily_sales GROUP BY DATE_TRUNC('week', sale_date) ), monthly_sales AS ( SELECT DATE_TRUNC('month', sale_date) AS month_start, SUM(daily_total) AS monthly_total FROM daily_sales GROUP BY DATE_TRUNC('month', sale_date) ) SELECT d.sale_date, d.daily_total, w.weekly_total, m.monthly_total, ROUND(d.daily_total * 100.0 / m.monthly_total, 2) AS pct_of_month FROM daily_sales d JOIN weekly_sales w ON DATE_TRUNC('week', d.sale_date) = w.week_start JOIN monthly_sales m ON DATE_TRUNC('month', d.sale_date) = m.month_start ORDER BY d.sale_date;

Explanation

Multiple CTEs create a pipeline:
  1. daily_sales: Base aggregation
  2. weekly_sales: Built from daily_sales
  3. monthly_sales: Also from daily_sales
  4. Final SELECT: Combines all levels
This is much cleaner than nested subqueries.

Question 87: CTE vs Subquery - When to Use Which

Problem

Demonstrate equivalent query with CTE and subquery.

Solution

sql
-- Subquery version (nested, harder to read) SELECT p.product_id, p.name, p.price, stats.avg_price, stats.max_price FROM products p CROSS JOIN ( SELECT AVG(price) AS avg_price, MAX(price) AS max_price FROM products ) AS stats WHERE p.price > stats.avg_price; -- CTE version (flat, easier to read) WITH product_stats AS ( SELECT AVG(price) AS avg_price, MAX(price) AS max_price FROM products ) SELECT p.product_id, p.name, p.price, ps.avg_price, ps.max_price FROM products p CROSS JOIN product_stats ps WHERE p.price > ps.avg_price;

Explanation

Use CTE when:
  • Query logic has multiple steps
  • Same derived table needed multiple times
  • Recursion required
  • Readability is priority
Use Subquery when:
  • Simple, one-time use
  • Performance critical (subqueries may optimize better)
  • CTE not supported (older databases)

Performance Considerations

sql
-- CTE referenced twice vs subquery twice: -- CTE (potentially computed once, stored) WITH expensive_calc AS ( SELECT ... -- complex calculation ) SELECT * FROM expensive_calc UNION ALL SELECT * FROM expensive_calc; -- Subquery (may compute twice) SELECT * FROM (SELECT ... ) AS t1 UNION ALL SELECT * FROM (SELECT ... ) AS t2; -- Modern optimizers often handle both efficiently -- Test with EXPLAIN ANALYZE for your specific case

Question 88: Recursive CTE - Employee Hierarchy

Problem

Build complete management hierarchy for each employee.

Solution

sql
WITH RECURSIVE employee_hierarchy AS ( -- Base case: Top-level managers (no manager) SELECT employee_id, name, manager_id, 1 AS level, name AS hierarchy_path FROM employees WHERE manager_id IS NULL UNION ALL -- Recursive case: Employees with managers SELECT e.employee_id, e.name, e.manager_id, eh.level + 1, eh.hierarchy_path || ' > ' || e.name FROM employees e JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id ) SELECT employee_id, name, level, hierarchy_path FROM employee_hierarchy ORDER BY hierarchy_path;

Explanation

Recursive CTE processes hierarchical or graph data:
  1. Anchor member: Starting point (employees with no manager)
  2. Recursive member: References the CTE itself, linked via JOIN
  3. UNION ALL: Combines anchor and recursive results
  4. Termination: Stops when recursive member returns no rows
Iteration 1 (anchor): CEO Iteration 2: VP1, VP2 (report to CEO) Iteration 3: Managers (report to VPs) Iteration 4: Staff (report to managers) ... until no more employees found

Common Mistakes

Mistake 1: Infinite recursion
sql
-- Missing termination condition WITH RECURSIVE bad AS ( SELECT 1 AS n UNION ALL SELECT n + 1 FROM bad -- Never stops! ) SELECT * FROM bad; -- Add termination: WITH RECURSIVE safe AS ( SELECT 1 AS n UNION ALL SELECT n + 1 FROM safe WHERE n < 100 -- Stops at 100 ) SELECT * FROM safe;
Mistake 2: Using UNION instead of UNION ALL
sql
-- UNION removes duplicates, may cause issues WITH RECURSIVE hierarchy AS ( ... UNION -- Might eliminate legitimate duplicate values ... ) -- Usually use UNION ALL for recursive CTEs

Performance Considerations

sql
-- Index on the recursive join column CREATE INDEX idx_employees_manager ON employees(manager_id); -- Set recursion limit (PostgreSQL) SET max_recursive_iterations = 100; -- Watch for exponential growth in badly structured data

Question 89: Recursive CTE - Category Tree

Problem

Show all categories with their full path (e.g., "Electronics > Computers > Laptops").

Solution

sql
WITH RECURSIVE category_tree AS ( -- Root categories (no parent) SELECT category_id, name, parent_category_id, name AS full_path, 1 AS depth FROM categories WHERE parent_category_id IS NULL UNION ALL -- Child categories SELECT c.category_id, c.name, c.parent_category_id, ct.full_path || ' > ' || c.name, ct.depth + 1 FROM categories c JOIN category_tree ct ON c.parent_category_id = ct.category_id ) SELECT category_id, name, full_path, depth FROM category_tree ORDER BY full_path;

Explanation

This builds hierarchical paths:
category_id | name | full_path | depth 1 | Electronics| Electronics | 1 2 | Computers | Electronics > Computers | 2 3 | Laptops | Electronics > Computers > Laptops | 3 4 | Desktops | Electronics > Computers > Desktops | 3 5 | Clothing | Clothing | 1
Useful for:
  • Breadcrumb navigation
  • Sitemap generation
  • Hierarchical dropdowns

Question 90: Recursive CTE - Finding All Descendants

Problem

Find all employees who report to a specific manager (at any level).

Solution

sql
WITH RECURSIVE all_reports AS ( -- Direct reports to manager 5 SELECT employee_id, name, manager_id, 1 AS report_level FROM employees WHERE manager_id = 5 UNION ALL -- Reports of reports (recursive) SELECT e.employee_id, e.name, e.manager_id, ar.report_level + 1 FROM employees e JOIN all_reports ar ON e.manager_id = ar.employee_id ) SELECT * FROM all_reports ORDER BY report_level, name;

Explanation

Starting from manager_id = 5:
  1. Find all direct reports
  2. Find all their reports
  3. Continue until no more reports
Useful for:
  • Cascade delete checks
  • Permission hierarchies
  • Organizational reporting

Question 91: Recursive CTE - Generating Series

Problem

Generate a date series for the last 30 days (without generate_series function).

Solution

sql
WITH RECURSIVE date_series AS ( SELECT CURRENT_DATE - INTERVAL '29 days' AS date_val UNION ALL SELECT date_val + INTERVAL '1 day' FROM date_series WHERE date_val < CURRENT_DATE ) SELECT date_val::DATE AS date FROM date_series;

Explanation

This generates dates without relying on built-in functions:
date_val 2024-01-01 2024-01-02 ... 2024-01-30
Use cases:
  • Fill gaps in time series data
  • Create calendar tables
  • Generate test data

Alternative with generate_series (PostgreSQL)

sql
-- If available, use built-in function SELECT generate_series( CURRENT_DATE - INTERVAL '29 days', CURRENT_DATE, INTERVAL '1 day' )::DATE AS date;

Question 92: Subquery for Row-by-Row Calculations

Problem

For each product, show units sold and revenue, compared to category average.

Solution

sql
SELECT p.product_id, p.name, p.category_id, COALESCE(SUM(oi.quantity), 0) AS units_sold, COALESCE(SUM(oi.quantity * oi.unit_price), 0) AS revenue, ( SELECT ROUND(AVG(sub_revenue), 2) FROM ( SELECT SUM(oi2.quantity * oi2.unit_price) AS sub_revenue FROM products p2 LEFT JOIN order_items oi2 ON p2.product_id = oi2.product_id WHERE p2.category_id = p.category_id GROUP BY p2.product_id ) category_products ) AS category_avg_revenue FROM products p LEFT JOIN order_items oi ON p.product_id = oi.product_id GROUP BY p.product_id, p.name, p.category_id ORDER BY revenue DESC;

Explanation

The correlated subquery calculates average revenue for products in the same category.
Better approach using CTE:
sql
WITH product_revenue AS ( SELECT p.product_id, p.name, p.category_id, COALESCE(SUM(oi.quantity * oi.unit_price), 0) AS revenue FROM products p LEFT JOIN order_items oi ON p.product_id = oi.product_id GROUP BY p.product_id, p.name, p.category_id ), category_avg AS ( SELECT category_id, ROUND(AVG(revenue), 2) AS avg_revenue FROM product_revenue GROUP BY category_id ) SELECT pr.*, ca.avg_revenue AS category_avg_revenue, pr.revenue - ca.avg_revenue AS diff_from_category_avg FROM product_revenue pr JOIN category_avg ca ON pr.category_id = ca.category_id ORDER BY pr.revenue DESC;

Question 93: Lateral Join (Correlated Subquery in FROM)

Problem

For each user, get their 3 most recent orders.

Solution

sql
-- PostgreSQL / MySQL 8+ syntax SELECT u.user_id, u.full_name, recent_orders.order_id, recent_orders.order_date, recent_orders.total_amount FROM users u CROSS JOIN LATERAL ( SELECT order_id, order_date, total_amount FROM orders o WHERE o.user_id = u.user_id ORDER BY order_date DESC LIMIT 3 ) AS recent_orders ORDER BY u.user_id, recent_orders.order_date DESC;

Explanation

LATERAL allows subquery in FROM to reference columns from preceding tables.
Without LATERAL:
sql
-- ERROR: Cannot reference u.user_id in subquery SELECT * FROM users u JOIN ( SELECT * FROM orders WHERE user_id = u.user_id -- Error! ) AS o ON true;
With LATERAL:
sql
-- Works: Subquery can see u.user_id SELECT * FROM users u CROSS JOIN LATERAL ( SELECT * FROM orders WHERE user_id = u.user_id ) AS o;

Common Mistakes

Mistake 1: Using CROSS JOIN LATERAL with empty results
sql
-- CROSS JOIN LATERAL excludes users with no matching orders -- Use LEFT JOIN LATERAL to include them: SELECT u.*, recent.* FROM users u LEFT JOIN LATERAL ( SELECT * FROM orders WHERE user_id = u.user_id LIMIT 3 ) AS recent ON true;

Performance Considerations

sql
-- LATERAL is like correlated subquery in FROM clause -- Ensure proper indexes CREATE INDEX idx_orders_user_date ON orders(user_id, order_date DESC); -- Alternative: Window function approach WITH ranked_orders AS ( SELECT o.*, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date DESC) AS rn FROM orders o ) SELECT u.*, ro.* FROM users u JOIN ranked_orders ro ON u.user_id = ro.user_id AND ro.rn <= 3;

Question 94: Subquery in UPDATE

Problem

Update product prices to match the average price in their category.

Solution

sql
-- Set each product's price to its category average UPDATE products p SET price = ( SELECT AVG(p2.price) FROM products p2 WHERE p2.category_id = p.category_id ) WHERE category_id IS NOT NULL; -- Better: Using CTE (PostgreSQL) WITH category_averages AS ( SELECT category_id, AVG(price) AS avg_price FROM products GROUP BY category_id ) UPDATE products p SET price = ca.avg_price FROM category_averages ca WHERE p.category_id = ca.category_id;

Explanation

Subqueries in UPDATE can:
  • Set values based on calculations
  • Reference the row being updated
  • Join with other tables for complex logic

Common Mistakes

Mistake 1: Subquery returns multiple rows
sql
-- ERROR if subquery returns multiple values UPDATE products SET price = (SELECT price FROM other_prices); -- Ensure single value per row UPDATE products p SET price = (SELECT op.price FROM other_prices op WHERE op.product_id = p.product_id);
Mistake 2: Not considering NULL
sql
-- If subquery returns NULL, price becomes NULL UPDATE products SET price = (SELECT ... ); -- Use COALESCE for safety UPDATE products SET price = COALESCE((SELECT ...), price); -- Keep original if NULL

Question 95: Subquery in DELETE

Problem

Delete orders that have no order items.

Solution

sql
-- Using NOT EXISTS (safest) DELETE FROM orders o WHERE NOT EXISTS ( SELECT 1 FROM order_items oi WHERE oi.order_id = o.order_id ); -- Using NOT IN (watch for NULLs) DELETE FROM orders WHERE order_id NOT IN ( SELECT DISTINCT order_id FROM order_items WHERE order_id IS NOT NULL ); -- Using LEFT JOIN (PostgreSQL) DELETE FROM orders o USING ( SELECT o2.order_id FROM orders o2 LEFT JOIN order_items oi ON o2.order_id = oi.order_id WHERE oi.item_id IS NULL ) AS empty_orders WHERE o.order_id = empty_orders.order_id;

Explanation

Similar to SELECT, DELETE can use subqueries to identify rows to remove.

Common Mistakes

Mistake 1: Deleting too much with poorly scoped subquery
sql
-- Dangerous: Delete all orders if ANY order has no items DELETE FROM orders WHERE (SELECT COUNT(*) FROM order_items) = 0; -- Correct: Correlate the subquery DELETE FROM orders o WHERE (SELECT COUNT(*) FROM order_items oi WHERE oi.order_id = o.order_id) = 0;

Question 96: Subquery in INSERT

Problem

Insert top 10 customers into a VIP table.

Solution

sql
-- Create VIP table CREATE TABLE vip_customers ( user_id INT PRIMARY KEY, full_name VARCHAR(100), total_spent DECIMAL(12,2), added_date TIMESTAMP DEFAULT NOW() ); -- Insert top spenders INSERT INTO vip_customers (user_id, full_name, total_spent) SELECT u.user_id, u.full_name, SUM(o.total_amount) AS total_spent FROM users u JOIN orders o ON u.user_id = o.user_id WHERE o.status = 'completed' GROUP BY u.user_id, u.full_name ORDER BY total_spent DESC LIMIT 10;

Explanation

INSERT ... SELECT inserts results of a query into a table.
Use cases:
  • Archiving data
  • Creating summary tables
  • ETL operations
  • Populating denormalized tables

Common Mistakes

Mistake 1: Column mismatch
sql
-- ERROR: Column count/types don't match INSERT INTO vip_customers (user_id, full_name, total_spent) SELECT user_id, full_name -- Missing third column! FROM users;
Mistake 2: Duplicate key violations
sql
-- If VIP already exists, INSERT fails INSERT INTO vip_customers (user_id, ...) SELECT ... -- Might include existing user_id -- Use ON CONFLICT (PostgreSQL) or INSERT IGNORE (MySQL) INSERT INTO vip_customers (user_id, full_name, total_spent) SELECT ... ON CONFLICT (user_id) DO UPDATE SET total_spent = EXCLUDED.total_spent;

Question 97: CTE with INSERT/UPDATE/DELETE

Problem

Archive old orders and delete them in one operation.

Solution

sql
-- PostgreSQL: CTE with data modification WITH archived AS ( INSERT INTO orders_archive (order_id, user_id, order_date, total_amount, archived_at) SELECT order_id, user_id, order_date, total_amount, NOW() FROM orders WHERE order_date < CURRENT_DATE - INTERVAL '2 years' RETURNING order_id ) DELETE FROM orders WHERE order_id IN (SELECT order_id FROM archived);

Explanation

PostgreSQL allows data-modifying CTEs:
  1. CTE inserts old orders into archive, returns IDs
  2. Main query deletes those orders from original table
  3. Both happen in same transaction

Common Mistakes

Mistake 1: Assuming all databases support this
sql
-- This is PostgreSQL-specific -- MySQL/SQL Server need separate statements: INSERT INTO orders_archive SELECT ... FROM orders WHERE ...; DELETE FROM orders WHERE ...;

Question 98: Complex Business Query - Customer Cohort Analysis

Problem

Analyze customer retention by signup month (cohort analysis).

Solution

sql
WITH customer_cohorts AS ( SELECT user_id, DATE_TRUNC('month', created_at) AS cohort_month FROM users ), customer_activities AS ( SELECT o.user_id, DATE_TRUNC('month', o.order_date) AS activity_month FROM orders o WHERE o.status = 'completed' GROUP BY o.user_id, DATE_TRUNC('month', o.order_date) ), cohort_activity AS ( SELECT cc.cohort_month, ca.activity_month, COUNT(DISTINCT cc.user_id) AS active_users FROM customer_cohorts cc JOIN customer_activities ca ON cc.user_id = ca.user_id WHERE ca.activity_month >= cc.cohort_month GROUP BY cc.cohort_month, ca.activity_month ), cohort_sizes AS ( SELECT cohort_month, COUNT(DISTINCT user_id) AS cohort_size FROM customer_cohorts GROUP BY cohort_month ) SELECT ca.cohort_month, cs.cohort_size, ca.activity_month, EXTRACT(MONTH FROM AGE(ca.activity_month, ca.cohort_month)) AS months_since_signup, ca.active_users, ROUND(ca.active_users * 100.0 / cs.cohort_size, 2) AS retention_pct FROM cohort_activity ca JOIN cohort_sizes cs ON ca.cohort_month = cs.cohort_month ORDER BY ca.cohort_month, ca.activity_month;

Explanation

This creates a retention matrix:
  • Cohort: Group of customers by signup month
  • Activity: When they placed orders
  • Retention: Percentage still active N months later
cohort | size | month_0 | month_1 | month_2 | ... 2024-01 | 100 | 100% | 45% | 32% | 2024-02 | 150 | 100% | 50% | 38% |

Use Cases

  • Measuring product stickiness
  • Identifying when customers churn
  • Comparing cohort quality over time

Question 99: Complex Business Query - Funnel Analysis

Problem

Analyze conversion funnel: page view > add to cart > checkout > purchase.

Solution

sql
WITH page_views AS ( SELECT DISTINCT user_id, DATE(viewed_at) AS event_date FROM page_views WHERE page_url LIKE '/product/%' ), cart_adds AS ( SELECT DISTINCT user_id, DATE(created_at) AS event_date FROM cart_items ), checkouts AS ( SELECT DISTINCT user_id, DATE(created_at) AS event_date FROM checkout_sessions ), purchases AS ( SELECT DISTINCT user_id, DATE(order_date) AS event_date FROM orders WHERE status IN ('completed', 'processing') ), daily_funnel AS ( SELECT pv.event_date, COUNT(DISTINCT pv.user_id) AS viewed, COUNT(DISTINCT ca.user_id) AS added_to_cart, COUNT(DISTINCT co.user_id) AS checked_out, COUNT(DISTINCT pu.user_id) AS purchased FROM page_views pv LEFT JOIN cart_adds ca ON pv.user_id = ca.user_id AND pv.event_date = ca.event_date LEFT JOIN checkouts co ON ca.user_id = co.user_id AND ca.event_date = co.event_date LEFT JOIN purchases pu ON co.user_id = pu.user_id AND co.event_date = pu.event_date GROUP BY pv.event_date ) SELECT event_date, viewed, added_to_cart, ROUND(added_to_cart * 100.0 / NULLIF(viewed, 0), 2) AS view_to_cart_pct, checked_out, ROUND(checked_out * 100.0 / NULLIF(added_to_cart, 0), 2) AS cart_to_checkout_pct, purchased, ROUND(purchased * 100.0 / NULLIF(checked_out, 0), 2) AS checkout_to_purchase_pct, ROUND(purchased * 100.0 / NULLIF(viewed, 0), 2) AS overall_conversion_pct FROM daily_funnel ORDER BY event_date;

Explanation

Funnel analysis tracks user progression through stages:
event_date | viewed | cart | checkout | purchased | conversion 2024-01-15 | 1000 | 300 | 150 | 120 | 12%
Each stage shows:
  • Absolute numbers
  • Conversion rate from previous stage
  • Overall conversion (end-to-end)

Question 100: Complex Business Query - Revenue Attribution

Problem

Attribute revenue to first-touch marketing channel.

Solution

sql
WITH first_touch AS ( -- Find first touchpoint for each user SELECT DISTINCT ON (user_id) user_id, channel, campaign, touch_date FROM marketing_touches ORDER BY user_id, touch_date ASC ), user_revenue AS ( -- Total revenue per user SELECT user_id, SUM(total_amount) AS lifetime_revenue, COUNT(*) AS order_count, MIN(order_date) AS first_order, MAX(order_date) AS last_order FROM orders WHERE status = 'completed' GROUP BY user_id ) SELECT ft.channel, ft.campaign, COUNT(DISTINCT ft.user_id) AS acquired_users, COUNT(DISTINCT ur.user_id) AS converted_users, ROUND(COUNT(DISTINCT ur.user_id) * 100.0 / COUNT(DISTINCT ft.user_id), 2) AS conversion_rate, COALESCE(SUM(ur.lifetime_revenue), 0) AS attributed_revenue, ROUND(COALESCE(AVG(ur.lifetime_revenue), 0), 2) AS avg_customer_value FROM first_touch ft LEFT JOIN user_revenue ur ON ft.user_id = ur.user_id GROUP BY ft.channel, ft.campaign ORDER BY attributed_revenue DESC;

Explanation

First-touch attribution credits revenue to the channel that first introduced the customer:
channel | campaign | users | converted | revenue | avg_value google | brand_2024 | 5000 | 800 | $120,000 | $150 facebook | retargeting | 3000 | 450 | $67,500 | $150 organic | NULL | 10000 | 1200 | $180,000 | $150

Use Cases

  • Marketing ROI analysis
  • Budget allocation
  • Campaign optimization

Summary: Subqueries and CTEs Cheat Sheet

Subquery Types

TypeLocationReturnsExample
ScalarSELECT, WHERESingle value
(SELECT MAX(price) FROM products)
RowWHERESingle rowWHERE (a, b) = (SELECT ...)
TableFROMMultiple rows/colsFROM (SELECT ...) AS t
CorrelatedAnyDepends on outer
WHERE price > (SELECT AVG(price) WHERE category_id = p.category_id)

Subquery Operators

OperatorMeaning
INValue in set
NOT INValue not in set (watch NULLs!)
EXISTSSubquery returns rows
NOT EXISTSSubquery returns no rows
ANY/SOMEComparison to any value
ALLComparison to all values

CTE Syntax

sql
WITH cte_name AS ( SELECT ... ), cte2 AS ( SELECT ... FROM cte_name ) SELECT ... FROM cte_name JOIN cte2 ...;

Recursive CTE Pattern

sql
WITH RECURSIVE cte AS ( -- Anchor: starting rows SELECT ... WHERE condition UNION ALL -- Recursive: reference CTE SELECT ... FROM table JOIN cte ON ... ) SELECT * FROM cte;

Performance Tips

  1. EXISTS vs IN: EXISTS often faster, no NULL issues
  2. NOT EXISTS vs NOT IN: Always prefer NOT EXISTS
  3. Correlated vs JOIN: JOIN usually faster for large datasets
  4. CTE materialization: Check if your database materializes or inlines
  5. LATERAL: Useful but like correlated subquery in FROM

Congratulations!

You've completed all 100 questions in the SQL Mastery Guide!
What you've learned:
  • Part 1 (Q1-25): SELECT fundamentals, filtering, sorting
  • Part 2 (Q26-50): All types of JOINs
  • Part 3 (Q51-75): Aggregations and window functions
  • Part 4 (Q76-100): Subqueries and CTEs
Next steps:
  1. Practice these patterns on real data
  2. Use EXPLAIN ANALYZE to understand query performance
  3. Learn your specific database's optimizations
  4. Build increasingly complex queries combining all techniques
Happy querying!
All Blogs
Tags:sqlsubqueriesCTEsrecursive queriesperformanceinterview