SQL Mastery Guide - Part 4: Subqueries and CTEs

Introduction

Welcome to Part 4 of our SQL Mastery series. In Parts 1-3, we covered SELECT fundamentals, JOINs, and aggregations. Now we explore subqueries and Common Table Expressions (CTEs) - powerful techniques for building complex queries from simpler pieces.

Subqueries let you nest queries inside other queries. CTEs provide named, reusable query blocks. Together, they're essential for solving real-world data problems.

This part covers Questions 76-100, completing our 100-question journey.

Schema Reference

We continue with our e-commerce database:

sql
CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    full_name VARCHAR(100),
    created_at TIMESTAMP DEFAULT NOW(),
    country VARCHAR(50),
    is_active BOOLEAN DEFAULT true
);

CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    price DECIMAL(10,2),
    category_id INT,
    stock_quantity INT DEFAULT 0,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    user_id INT REFERENCES users(user_id),
    order_date TIMESTAMP DEFAULT NOW(),
    status VARCHAR(20) DEFAULT 'pending',
    total_amount DECIMAL(10,2)
);

CREATE TABLE order_items (
    item_id SERIAL PRIMARY KEY,
    order_id INT REFERENCES orders(order_id),
    product_id INT REFERENCES products(product_id),
    quantity INT NOT NULL,
    unit_price DECIMAL(10,2) NOT NULL
);

CREATE TABLE reviews (
    review_id SERIAL PRIMARY KEY,
    product_id INT REFERENCES products(product_id),
    user_id INT REFERENCES users(user_id),
    rating INT CHECK (rating BETWEEN 1 AND 5),
    created_at TIMESTAMP DEFAULT NOW()
);

-- Hierarchical table for CTE examples
CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    manager_id INT REFERENCES employees(employee_id),
    department VARCHAR(50),
    salary DECIMAL(10,2),
    hire_date DATE
);

CREATE TABLE categories (
    category_id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    parent_category_id INT REFERENCES categories(category_id)
);

Question 76: Scalar Subquery in SELECT

Problem

For each product, show its price and how it compares to the average price.

Solution

sql
SELECT
    product_id,
    name,
    price,
    (SELECT AVG(price) FROM products) AS avg_price,
    price - (SELECT AVG(price) FROM products) AS diff_from_avg
FROM products
WHERE price IS NOT NULL
ORDER BY diff_from_avg DESC;

Explanation

A scalar subquery returns exactly one value (one row, one column). It can be used wherever a single value is expected.

How it works:

Inner query
SELECT AVG(price) FROM products
executes once
Result (e.g., 75.50) substitutes into outer query
Outer query runs with that constant value

Common Mistakes

Mistake 1: Subquery returns multiple rows

sql
-- ERROR! Subquery returns multiple values
SELECT name, (SELECT price FROM products) FROM products;
-- "Subquery returns more than one row"

-- Fix: Ensure single value with aggregate or LIMIT
SELECT name, (SELECT MAX(price) FROM products) FROM products;

Mistake 2: Subquery returns multiple columns

sql
-- ERROR! Subquery has multiple columns
SELECT name, (SELECT MIN(price), MAX(price) FROM products)
FROM products;

-- Fix: One subquery per column
SELECT
    name,
    (SELECT MIN(price) FROM products) AS min_price,
    (SELECT MAX(price) FROM products) AS max_price
FROM products;

Performance Considerations

sql
-- Scalar subquery in SELECT executes for EVERY row
-- This is inefficient for row-dependent subqueries:
SELECT
    name,
    (SELECT COUNT(*) FROM order_items WHERE product_id = p.product_id)
FROM products p;
-- Subquery runs once per product!

-- Better: Use JOIN or window function
SELECT
    p.name,
    COUNT(oi.item_id) AS order_count
FROM products p
LEFT JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY p.product_id, p.name;

Question 77: Subquery in WHERE - Single Value

Problem

Find products priced above average.

Solution

sql
SELECT product_id, name, price
FROM products
WHERE price > (SELECT AVG(price) FROM products)
ORDER BY price DESC;

Explanation

Subquery in WHERE filters rows based on dynamically computed value.

Execution:

Subquery computes AVG(price) = 75.50
Outer query becomes WHERE price > 75.50
Filter and return results

Common Mistakes

Mistake 1: Comparing to multi-row subquery with operators

sql
-- ERROR if subquery returns multiple rows
SELECT * FROM products
WHERE price > (SELECT price FROM products WHERE category_id = 1);

-- Fix: Use ANY/ALL for multi-row
SELECT * FROM products
WHERE price > ALL (SELECT price FROM products WHERE category_id = 1);

Question 78: IN Subquery

Problem

Find users who have placed at least one order.

Solution

sql
SELECT user_id, email, full_name
FROM users
WHERE user_id IN (SELECT DISTINCT user_id FROM orders)
ORDER BY full_name;

Explanation

IN checks if value exists in the subquery result set.

sql
-- Equivalent to:
WHERE user_id = 1 OR user_id = 5 OR user_id = 12 ...

Common Mistakes

Mistake 1: IN with NULL values

sql
-- If subquery contains NULL, unexpected results
SELECT * FROM products
WHERE category_id IN (SELECT category_id FROM special_categories);
-- If subquery returns (1, 2, NULL), products with category_id = 3 are correctly excluded
-- But NOT IN with NULL is problematic (see next question)

Performance Considerations

sql
-- IN subquery often converted to semi-join by optimizer
EXPLAIN ANALYZE
SELECT * FROM users
WHERE user_id IN (SELECT user_id FROM orders);

-- Shows: Hash Semi Join (efficient)
-- Not: Subquery executed for each row

-- For large IN lists from subqueries, consider JOIN:
SELECT DISTINCT u.*
FROM users u
JOIN orders o ON u.user_id = o.user_id;

Question 79: NOT IN Subquery - The NULL Trap

Problem

Find users who have never placed an order.

Solution

sql
-- Safe version (handles NULLs)
SELECT user_id, email, full_name
FROM users
WHERE user_id NOT IN (
    SELECT user_id
    FROM orders
    WHERE user_id IS NOT NULL  -- Critical!
);

-- Or better, use NOT EXISTS (see Question 82)

Explanation

NOT IN has a dangerous NULL trap:

sql
-- If orders.user_id contains any NULL:
SELECT * FROM users WHERE user_id NOT IN (SELECT user_id FROM orders);
-- Returns ZERO rows! Even for users who never ordered.

-- Why?
-- NOT IN (1, 2, NULL) means:
-- user_id != 1 AND user_id != 2 AND user_id != NULL
-- user_id != NULL is UNKNOWN, not TRUE
-- UNKNOWN AND anything = UNKNOWN
-- So WHERE clause is never TRUE

Common Mistakes

Mistake 1: Forgetting NULL in subquery

sql
-- Dangerous: Will return nothing if any NULL in subquery
SELECT * FROM users WHERE user_id NOT IN (SELECT user_id FROM orders);

-- Safe alternatives:
-- 1. Filter NULLs
WHERE user_id NOT IN (SELECT user_id FROM orders WHERE user_id IS NOT NULL)

-- 2. Use NOT EXISTS (recommended)
WHERE NOT EXISTS (SELECT 1 FROM orders WHERE orders.user_id = users.user_id)

-- 3. Use LEFT JOIN + NULL check
SELECT u.* FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE o.order_id IS NULL;

Performance Considerations

sql
-- NOT IN can be slow - often becomes anti-join or filter
-- NOT EXISTS is usually better optimized
-- LEFT JOIN with NULL check often fastest for large tables

EXPLAIN ANALYZE
SELECT * FROM users
WHERE NOT EXISTS (SELECT 1 FROM orders WHERE orders.user_id = users.user_id);
-- Shows: Hash Anti Join (efficient)

Question 80: ANY and ALL

Problem

Find products more expensive than any product in category 1.

Solution

sql
-- ANY: More expensive than at least one category-1 product
SELECT product_id, name, price
FROM products
WHERE price > ANY (
    SELECT price
    FROM products
    WHERE category_id = 1
);

-- ALL: More expensive than every category-1 product
SELECT product_id, name, price
FROM products
WHERE price > ALL (
    SELECT price
    FROM products
    WHERE category_id = 1
);

Explanation

> ANY (subquery): Greater than at least one value (same as > MIN)
> ALL (subquery): Greater than every value (same as > MAX)
= ANY (subquery): Same as IN

sql
-- Equivalents:
> ANY (1, 5, 10) -- Same as > 1 (the minimum)
> ALL (1, 5, 10) -- Same as > 10 (the maximum)
< ANY (1, 5, 10) -- Same as < 10 (the maximum)
< ALL (1, 5, 10) -- Same as < 1 (the minimum)

Common Mistakes

Mistake 1: Confusing ANY and ALL

sql
-- "More expensive than any product" means > MIN (lowest bar)
-- "More expensive than all products" means > MAX (highest bar)

-- Usually you want ALL for "beat everyone" scenarios

Mistake 2: Empty subquery behavior

sql
-- If subquery returns no rows:
> ALL (empty) -- TRUE (vacuously true, no values to beat)
> ANY (empty) -- FALSE (no values to compare to)

Question 81: EXISTS Subquery

Problem

Find products that have at least one review.

Solution

sql
SELECT p.product_id, p.name
FROM products p
WHERE EXISTS (
    SELECT 1
    FROM reviews r
    WHERE r.product_id = p.product_id
);

Explanation

EXISTS returns TRUE if subquery returns at least one row, FALSE otherwise.

Key characteristics:

Stops at first match (short-circuit)
Doesn't care about values, only existence
Usually correlates with outer query

Common Mistakes

Mistake 1: Using SELECT * in EXISTS

sql
-- Works but misleading - columns don't matter
WHERE EXISTS (SELECT * FROM reviews WHERE ...)

-- Better: Use SELECT 1 to be clear
WHERE EXISTS (SELECT 1 FROM reviews WHERE ...)

Mistake 2: Non-correlated EXISTS

sql
-- This EXISTS returns same result for every row
WHERE EXISTS (SELECT 1 FROM reviews WHERE rating > 3)
-- Either TRUE for all rows or FALSE for all rows

-- Usually you want correlated:
WHERE EXISTS (SELECT 1 FROM reviews WHERE reviews.product_id = products.product_id)

Performance Considerations

sql
-- EXISTS is optimized to stop at first match
-- For checking existence, EXISTS beats COUNT
-- Bad:
WHERE (SELECT COUNT(*) FROM reviews WHERE product_id = p.product_id) > 0
-- Counts all reviews unnecessarily

-- Good:
WHERE EXISTS (SELECT 1 FROM reviews WHERE product_id = p.product_id)
-- Stops at first review found

Question 82: NOT EXISTS - Finding Missing Relations

Problem

Find users who have never written a review.

Solution

sql
SELECT u.user_id, u.full_name, u.email
FROM users u
WHERE NOT EXISTS (
    SELECT 1
    FROM reviews r
    WHERE r.user_id = u.user_id
);

Explanation

NOT EXISTS returns TRUE if subquery returns zero rows.

This is the safest way to find "missing" relationships - no NULL trap like NOT IN.

Common Mistakes

Mistake 1: Using NOT IN instead

sql
-- Dangerous if reviews.user_id can be NULL
SELECT * FROM users WHERE user_id NOT IN (SELECT user_id FROM reviews);

-- Safe
SELECT * FROM users u
WHERE NOT EXISTS (SELECT 1 FROM reviews r WHERE r.user_id = u.user_id);

Performance Considerations

sql
-- NOT EXISTS typically becomes Anti Semi Join
EXPLAIN ANALYZE
SELECT * FROM users u
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.user_id);

-- Ensure index on the correlated column
CREATE INDEX idx_orders_user ON orders(user_id);

Question 83: Correlated Subquery

Problem

Find each customer's most recent order.

Solution

sql
SELECT o.order_id, o.user_id, o.order_date, o.total_amount
FROM orders o
WHERE o.order_date = (
    SELECT MAX(o2.order_date)
    FROM orders o2
    WHERE o2.user_id = o.user_id
);

Explanation

A correlated subquery references columns from the outer query. It executes once per outer row.

Execution:

For user_id = 1: Find MAX(order_date) where user_id = 1
Check if current row's date matches
Repeat for each order row

Common Mistakes

Mistake 1: Thinking it's efficient

sql
-- Correlated subqueries execute per row = potentially slow
-- For 10,000 orders, subquery runs 10,000 times

-- Often better to rewrite as JOIN:
SELECT o.*
FROM orders o
JOIN (
    SELECT user_id, MAX(order_date) AS max_date
    FROM orders
    GROUP BY user_id
) latest ON o.user_id = latest.user_id AND o.order_date = latest.max_date;

-- Or use window function:
SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date DESC) AS rn
    FROM orders
) sub WHERE rn = 1;

Performance Considerations

sql
-- Correlated subqueries need careful indexing
-- Ensure index on correlation columns
CREATE INDEX idx_orders_user_date ON orders(user_id, order_date);

-- Check execution plan
EXPLAIN ANALYZE SELECT ...;
-- Look for: "SubPlan" or "InitPlan" indicators

Question 84: Subquery in FROM (Derived Table)

Problem

Calculate the percentage of total revenue each category represents.

Solution

sql
SELECT
    category_revenue.category_id,
    c.name AS category_name,
    category_revenue.revenue,
    total.total_revenue,
    ROUND(category_revenue.revenue * 100.0 / total.total_revenue, 2) AS pct_of_total
FROM (
    SELECT
        p.category_id,
        SUM(oi.quantity * oi.unit_price) AS revenue
    FROM products p
    JOIN order_items oi ON p.product_id = oi.product_id
    GROUP BY p.category_id
) AS category_revenue
CROSS JOIN (
    SELECT SUM(quantity * unit_price) AS total_revenue
    FROM order_items
) AS total
JOIN categories c ON category_revenue.category_id = c.category_id
ORDER BY revenue DESC;

Explanation

A subquery in FROM creates a derived table (or inline view). It's treated like a regular table.

Use cases:

Pre-aggregating data before joining
Creating intermediate results
Simplifying complex queries

Common Mistakes

Mistake 1: Forgetting alias

sql
-- ERROR: Derived table needs alias
SELECT * FROM (SELECT product_id FROM products);

-- Correct
SELECT * FROM (SELECT product_id FROM products) AS derived;

Mistake 2: Selecting columns not in derived table

sql
-- ERROR: 'name' not available
SELECT name FROM (SELECT product_id FROM products) AS p;
-- Derived table only has product_id

Performance Considerations

sql
-- Derived tables can be materialized or merged into outer query
-- Check EXPLAIN to see what optimizer does

-- Tip: For reused derived tables, consider CTE (more readable)

Question 85: Common Table Expression (CTE) Basics

Problem

Find users whose total spending exceeds the average total spending per user.

Solution

sql
WITH user_spending AS (
    SELECT
        user_id,
        SUM(total_amount) AS total_spent
    FROM orders
    WHERE status = 'completed'
    GROUP BY user_id
),
avg_spending AS (
    SELECT AVG(total_spent) AS avg_total FROM user_spending
)
SELECT
    us.user_id,
    u.full_name,
    us.total_spent,
    avs.avg_total
FROM user_spending us
CROSS JOIN avg_spending avs
JOIN users u ON us.user_id = u.user_id
WHERE us.total_spent > avs.avg_total
ORDER BY us.total_spent DESC;

Explanation

CTE (Common Table Expression) is a named temporary result set defined with WITH.

Benefits:

Improves readability
Can be referenced multiple times
Breaks complex queries into logical steps

Syntax:

sql
WITH cte_name AS (
    SELECT ...
),
another_cte AS (
    SELECT ... FROM cte_name  -- Can reference previous CTE
)
SELECT ... FROM cte_name JOIN another_cte ...;

Common Mistakes

Mistake 1: Trying to use CTE after semicolon

sql
-- ERROR: CTE scope ends at semicolon
WITH my_cte AS (...);
SELECT * FROM my_cte;  -- CTE not found!

-- Correct: No semicolon before final SELECT
WITH my_cte AS (...)
SELECT * FROM my_cte;

Mistake 2: Referencing CTE before it's defined

sql
-- ERROR: cte2 references cte1 before it exists
WITH cte2 AS (SELECT * FROM cte1),
     cte1 AS (SELECT 1)
SELECT * FROM cte2;

-- Correct order:
WITH cte1 AS (SELECT 1),
     cte2 AS (SELECT * FROM cte1)
SELECT * FROM cte2;

Performance Considerations

sql
-- CTE behavior varies by database:
-- PostgreSQL < 12: CTEs are optimization barriers (always materialized)
-- PostgreSQL >= 12: Optimizer can inline CTEs
-- MySQL 8.0+: CTEs supported, optimizer can inline
-- SQL Server: Inline by default

-- Force materialization when needed:
WITH cte AS MATERIALIZED (...)  -- PostgreSQL 12+

Question 86: Multiple CTEs

Problem

Create a sales report showing daily, weekly, and monthly summaries.

Solution

sql
WITH daily_sales AS (
    SELECT
        DATE(order_date) AS sale_date,
        SUM(total_amount) AS daily_total
    FROM orders
    WHERE status = 'completed'
    GROUP BY DATE(order_date)
),
weekly_sales AS (
    SELECT
        DATE_TRUNC('week', sale_date) AS week_start,
        SUM(daily_total) AS weekly_total
    FROM daily_sales
    GROUP BY DATE_TRUNC('week', sale_date)
),
monthly_sales AS (
    SELECT
        DATE_TRUNC('month', sale_date) AS month_start,
        SUM(daily_total) AS monthly_total
    FROM daily_sales
    GROUP BY DATE_TRUNC('month', sale_date)
)
SELECT
    d.sale_date,
    d.daily_total,
    w.weekly_total,
    m.monthly_total,
    ROUND(d.daily_total * 100.0 / m.monthly_total, 2) AS pct_of_month
FROM daily_sales d
JOIN weekly_sales w ON DATE_TRUNC('week', d.sale_date) = w.week_start
JOIN monthly_sales m ON DATE_TRUNC('month', d.sale_date) = m.month_start
ORDER BY d.sale_date;

Explanation

Multiple CTEs create a pipeline:

daily_sales: Base aggregation
weekly_sales: Built from daily_sales
monthly_sales: Also from daily_sales
Final SELECT: Combines all levels

This is much cleaner than nested subqueries.

Question 87: CTE vs Subquery - When to Use Which

Problem

Demonstrate equivalent query with CTE and subquery.

Solution

sql
-- Subquery version (nested, harder to read)
SELECT
    p.product_id,
    p.name,
    p.price,
    stats.avg_price,
    stats.max_price
FROM products p
CROSS JOIN (
    SELECT AVG(price) AS avg_price, MAX(price) AS max_price
    FROM products
) AS stats
WHERE p.price > stats.avg_price;

-- CTE version (flat, easier to read)
WITH product_stats AS (
    SELECT AVG(price) AS avg_price, MAX(price) AS max_price
    FROM products
)
SELECT
    p.product_id,
    p.name,
    p.price,
    ps.avg_price,
    ps.max_price
FROM products p
CROSS JOIN product_stats ps
WHERE p.price > ps.avg_price;

Explanation

Use CTE when:

Query logic has multiple steps
Same derived table needed multiple times
Recursion required
Readability is priority

Use Subquery when:

Simple, one-time use
Performance critical (subqueries may optimize better)
CTE not supported (older databases)

Performance Considerations

sql
-- CTE referenced twice vs subquery twice:

-- CTE (potentially computed once, stored)
WITH expensive_calc AS (
    SELECT ... -- complex calculation
)
SELECT * FROM expensive_calc
UNION ALL
SELECT * FROM expensive_calc;

-- Subquery (may compute twice)
SELECT * FROM (SELECT ... ) AS t1
UNION ALL
SELECT * FROM (SELECT ... ) AS t2;

-- Modern optimizers often handle both efficiently
-- Test with EXPLAIN ANALYZE for your specific case

Question 88: Recursive CTE - Employee Hierarchy

Problem

Build complete management hierarchy for each employee.

Solution

sql
WITH RECURSIVE employee_hierarchy AS (
    -- Base case: Top-level managers (no manager)
    SELECT
        employee_id,
        name,
        manager_id,
        1 AS level,
        name AS hierarchy_path
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive case: Employees with managers
    SELECT
        e.employee_id,
        e.name,
        e.manager_id,
        eh.level + 1,
        eh.hierarchy_path || ' > ' || e.name
    FROM employees e
    JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT
    employee_id,
    name,
    level,
    hierarchy_path
FROM employee_hierarchy
ORDER BY hierarchy_path;

Explanation

Recursive CTE processes hierarchical or graph data:

Anchor member: Starting point (employees with no manager)
Recursive member: References the CTE itself, linked via JOIN
UNION ALL: Combines anchor and recursive results
Termination: Stops when recursive member returns no rows

Iteration 1 (anchor): CEO
Iteration 2: VP1, VP2 (report to CEO)
Iteration 3: Managers (report to VPs)
Iteration 4: Staff (report to managers)
... until no more employees found

Common Mistakes

Mistake 1: Infinite recursion

sql
-- Missing termination condition
WITH RECURSIVE bad AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM bad  -- Never stops!
)
SELECT * FROM bad;

-- Add termination:
WITH RECURSIVE safe AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM safe WHERE n < 100  -- Stops at 100
)
SELECT * FROM safe;

Mistake 2: Using UNION instead of UNION ALL

sql
-- UNION removes duplicates, may cause issues
WITH RECURSIVE hierarchy AS (
    ...
    UNION  -- Might eliminate legitimate duplicate values
    ...
)
-- Usually use UNION ALL for recursive CTEs

Performance Considerations

sql
-- Index on the recursive join column
CREATE INDEX idx_employees_manager ON employees(manager_id);

-- Set recursion limit (PostgreSQL)
SET max_recursive_iterations = 100;

-- Watch for exponential growth in badly structured data

Question 89: Recursive CTE - Category Tree

Problem

Show all categories with their full path (e.g., "Electronics > Computers > Laptops").

Solution

sql
WITH RECURSIVE category_tree AS (
    -- Root categories (no parent)
    SELECT
        category_id,
        name,
        parent_category_id,
        name AS full_path,
        1 AS depth
    FROM categories
    WHERE parent_category_id IS NULL

    UNION ALL

    -- Child categories
    SELECT
        c.category_id,
        c.name,
        c.parent_category_id,
        ct.full_path || ' > ' || c.name,
        ct.depth + 1
    FROM categories c
    JOIN category_tree ct ON c.parent_category_id = ct.category_id
)
SELECT
    category_id,
    name,
    full_path,
    depth
FROM category_tree
ORDER BY full_path;

Explanation

This builds hierarchical paths:

category_id | name       | full_path                          | depth
1           | Electronics| Electronics                        | 1
2           | Computers  | Electronics > Computers            | 2
3           | Laptops    | Electronics > Computers > Laptops  | 3
4           | Desktops   | Electronics > Computers > Desktops | 3
5           | Clothing   | Clothing                           | 1

Useful for:

Breadcrumb navigation
Sitemap generation
Hierarchical dropdowns

Question 90: Recursive CTE - Finding All Descendants

Problem

Find all employees who report to a specific manager (at any level).

Solution

sql
WITH RECURSIVE all_reports AS (
    -- Direct reports to manager 5
    SELECT
        employee_id,
        name,
        manager_id,
        1 AS report_level
    FROM employees
    WHERE manager_id = 5

    UNION ALL

    -- Reports of reports (recursive)
    SELECT
        e.employee_id,
        e.name,
        e.manager_id,
        ar.report_level + 1
    FROM employees e
    JOIN all_reports ar ON e.manager_id = ar.employee_id
)
SELECT * FROM all_reports
ORDER BY report_level, name;

Explanation

Starting from manager_id = 5:

Find all direct reports
Find all their reports
Continue until no more reports

Useful for:

Cascade delete checks
Permission hierarchies
Organizational reporting

Question 91: Recursive CTE - Generating Series

Problem

Generate a date series for the last 30 days (without generate_series function).

Solution

sql
WITH RECURSIVE date_series AS (
    SELECT CURRENT_DATE - INTERVAL '29 days' AS date_val

    UNION ALL

    SELECT date_val + INTERVAL '1 day'
    FROM date_series
    WHERE date_val < CURRENT_DATE
)
SELECT date_val::DATE AS date
FROM date_series;

Explanation

This generates dates without relying on built-in functions:

date_val
2024-01-01
2024-01-02
...
2024-01-30

Use cases:

Fill gaps in time series data
Create calendar tables
Generate test data

Alternative with generate_series (PostgreSQL)

sql
-- If available, use built-in function
SELECT generate_series(
    CURRENT_DATE - INTERVAL '29 days',
    CURRENT_DATE,
    INTERVAL '1 day'
)::DATE AS date;

Question 92: Subquery for Row-by-Row Calculations

Problem

For each product, show units sold and revenue, compared to category average.

Solution

sql
SELECT
    p.product_id,
    p.name,
    p.category_id,
    COALESCE(SUM(oi.quantity), 0) AS units_sold,
    COALESCE(SUM(oi.quantity * oi.unit_price), 0) AS revenue,
    (
        SELECT ROUND(AVG(sub_revenue), 2)
        FROM (
            SELECT SUM(oi2.quantity * oi2.unit_price) AS sub_revenue
            FROM products p2
            LEFT JOIN order_items oi2 ON p2.product_id = oi2.product_id
            WHERE p2.category_id = p.category_id
            GROUP BY p2.product_id
        ) category_products
    ) AS category_avg_revenue
FROM products p
LEFT JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY p.product_id, p.name, p.category_id
ORDER BY revenue DESC;

Explanation

The correlated subquery calculates average revenue for products in the same category.

Better approach using CTE:

sql
WITH product_revenue AS (
    SELECT
        p.product_id,
        p.name,
        p.category_id,
        COALESCE(SUM(oi.quantity * oi.unit_price), 0) AS revenue
    FROM products p
    LEFT JOIN order_items oi ON p.product_id = oi.product_id
    GROUP BY p.product_id, p.name, p.category_id
),
category_avg AS (
    SELECT
        category_id,
        ROUND(AVG(revenue), 2) AS avg_revenue
    FROM product_revenue
    GROUP BY category_id
)
SELECT
    pr.*,
    ca.avg_revenue AS category_avg_revenue,
    pr.revenue - ca.avg_revenue AS diff_from_category_avg
FROM product_revenue pr
JOIN category_avg ca ON pr.category_id = ca.category_id
ORDER BY pr.revenue DESC;

Question 93: Lateral Join (Correlated Subquery in FROM)

Problem

For each user, get their 3 most recent orders.

Solution

sql
-- PostgreSQL / MySQL 8+ syntax
SELECT
    u.user_id,
    u.full_name,
    recent_orders.order_id,
    recent_orders.order_date,
    recent_orders.total_amount
FROM users u
CROSS JOIN LATERAL (
    SELECT order_id, order_date, total_amount
    FROM orders o
    WHERE o.user_id = u.user_id
    ORDER BY order_date DESC
    LIMIT 3
) AS recent_orders
ORDER BY u.user_id, recent_orders.order_date DESC;

Explanation

LATERAL allows subquery in FROM to reference columns from preceding tables.

Without LATERAL:

sql
-- ERROR: Cannot reference u.user_id in subquery
SELECT * FROM users u
JOIN (
    SELECT * FROM orders WHERE user_id = u.user_id  -- Error!
) AS o ON true;

With LATERAL:

sql
-- Works: Subquery can see u.user_id
SELECT * FROM users u
CROSS JOIN LATERAL (
    SELECT * FROM orders WHERE user_id = u.user_id
) AS o;

Common Mistakes

Mistake 1: Using CROSS JOIN LATERAL with empty results

sql
-- CROSS JOIN LATERAL excludes users with no matching orders
-- Use LEFT JOIN LATERAL to include them:
SELECT u.*, recent.*
FROM users u
LEFT JOIN LATERAL (
    SELECT * FROM orders WHERE user_id = u.user_id LIMIT 3
) AS recent ON true;

Performance Considerations

sql
-- LATERAL is like correlated subquery in FROM clause
-- Ensure proper indexes
CREATE INDEX idx_orders_user_date ON orders(user_id, order_date DESC);

-- Alternative: Window function approach
WITH ranked_orders AS (
    SELECT
        o.*,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date DESC) AS rn
    FROM orders o
)
SELECT u.*, ro.*
FROM users u
JOIN ranked_orders ro ON u.user_id = ro.user_id AND ro.rn <= 3;

Question 94: Subquery in UPDATE

Problem

Update product prices to match the average price in their category.

Solution

sql
-- Set each product's price to its category average
UPDATE products p
SET price = (
    SELECT AVG(p2.price)
    FROM products p2
    WHERE p2.category_id = p.category_id
)
WHERE category_id IS NOT NULL;

-- Better: Using CTE (PostgreSQL)
WITH category_averages AS (
    SELECT category_id, AVG(price) AS avg_price
    FROM products
    GROUP BY category_id
)
UPDATE products p
SET price = ca.avg_price
FROM category_averages ca
WHERE p.category_id = ca.category_id;

Explanation

Subqueries in UPDATE can:

Set values based on calculations
Reference the row being updated
Join with other tables for complex logic

Common Mistakes

Mistake 1: Subquery returns multiple rows

sql
-- ERROR if subquery returns multiple values
UPDATE products SET price = (SELECT price FROM other_prices);

-- Ensure single value per row
UPDATE products p
SET price = (SELECT op.price FROM other_prices op WHERE op.product_id = p.product_id);

Mistake 2: Not considering NULL

sql
-- If subquery returns NULL, price becomes NULL
UPDATE products SET price = (SELECT ... );

-- Use COALESCE for safety
UPDATE products SET price = COALESCE((SELECT ...), price);  -- Keep original if NULL

Question 95: Subquery in DELETE

Problem

Delete orders that have no order items.

Solution

sql
-- Using NOT EXISTS (safest)
DELETE FROM orders o
WHERE NOT EXISTS (
    SELECT 1 FROM order_items oi WHERE oi.order_id = o.order_id
);

-- Using NOT IN (watch for NULLs)
DELETE FROM orders
WHERE order_id NOT IN (
    SELECT DISTINCT order_id FROM order_items WHERE order_id IS NOT NULL
);

-- Using LEFT JOIN (PostgreSQL)
DELETE FROM orders o
USING (
    SELECT o2.order_id
    FROM orders o2
    LEFT JOIN order_items oi ON o2.order_id = oi.order_id
    WHERE oi.item_id IS NULL
) AS empty_orders
WHERE o.order_id = empty_orders.order_id;

Explanation

Similar to SELECT, DELETE can use subqueries to identify rows to remove.

Common Mistakes

Mistake 1: Deleting too much with poorly scoped subquery

sql
-- Dangerous: Delete all orders if ANY order has no items
DELETE FROM orders WHERE (SELECT COUNT(*) FROM order_items) = 0;

-- Correct: Correlate the subquery
DELETE FROM orders o
WHERE (SELECT COUNT(*) FROM order_items oi WHERE oi.order_id = o.order_id) = 0;

Question 96: Subquery in INSERT

Problem

Insert top 10 customers into a VIP table.

Solution

sql
-- Create VIP table
CREATE TABLE vip_customers (
    user_id INT PRIMARY KEY,
    full_name VARCHAR(100),
    total_spent DECIMAL(12,2),
    added_date TIMESTAMP DEFAULT NOW()
);

-- Insert top spenders
INSERT INTO vip_customers (user_id, full_name, total_spent)
SELECT
    u.user_id,
    u.full_name,
    SUM(o.total_amount) AS total_spent
FROM users u
JOIN orders o ON u.user_id = o.user_id
WHERE o.status = 'completed'
GROUP BY u.user_id, u.full_name
ORDER BY total_spent DESC
LIMIT 10;

Explanation

INSERT ... SELECT inserts results of a query into a table.

Use cases:

Archiving data
Creating summary tables
ETL operations
Populating denormalized tables

Common Mistakes

Mistake 1: Column mismatch

sql
-- ERROR: Column count/types don't match
INSERT INTO vip_customers (user_id, full_name, total_spent)
SELECT user_id, full_name  -- Missing third column!
FROM users;

Mistake 2: Duplicate key violations

sql
-- If VIP already exists, INSERT fails
INSERT INTO vip_customers (user_id, ...)
SELECT ... -- Might include existing user_id

-- Use ON CONFLICT (PostgreSQL) or INSERT IGNORE (MySQL)
INSERT INTO vip_customers (user_id, full_name, total_spent)
SELECT ...
ON CONFLICT (user_id) DO UPDATE
SET total_spent = EXCLUDED.total_spent;

Question 97: CTE with INSERT/UPDATE/DELETE

Problem

Archive old orders and delete them in one operation.

Solution

sql
-- PostgreSQL: CTE with data modification
WITH archived AS (
    INSERT INTO orders_archive (order_id, user_id, order_date, total_amount, archived_at)
    SELECT order_id, user_id, order_date, total_amount, NOW()
    FROM orders
    WHERE order_date < CURRENT_DATE - INTERVAL '2 years'
    RETURNING order_id
)
DELETE FROM orders
WHERE order_id IN (SELECT order_id FROM archived);

Explanation

PostgreSQL allows data-modifying CTEs:

CTE inserts old orders into archive, returns IDs
Main query deletes those orders from original table
Both happen in same transaction

Common Mistakes

Mistake 1: Assuming all databases support this

sql
-- This is PostgreSQL-specific
-- MySQL/SQL Server need separate statements:
INSERT INTO orders_archive SELECT ... FROM orders WHERE ...;
DELETE FROM orders WHERE ...;

Question 98: Complex Business Query - Customer Cohort Analysis

Problem

Analyze customer retention by signup month (cohort analysis).

Solution

sql
WITH customer_cohorts AS (
    SELECT
        user_id,
        DATE_TRUNC('month', created_at) AS cohort_month
    FROM users
),
customer_activities AS (
    SELECT
        o.user_id,
        DATE_TRUNC('month', o.order_date) AS activity_month
    FROM orders o
    WHERE o.status = 'completed'
    GROUP BY o.user_id, DATE_TRUNC('month', o.order_date)
),
cohort_activity AS (
    SELECT
        cc.cohort_month,
        ca.activity_month,
        COUNT(DISTINCT cc.user_id) AS active_users
    FROM customer_cohorts cc
    JOIN customer_activities ca ON cc.user_id = ca.user_id
    WHERE ca.activity_month >= cc.cohort_month
    GROUP BY cc.cohort_month, ca.activity_month
),
cohort_sizes AS (
    SELECT
        cohort_month,
        COUNT(DISTINCT user_id) AS cohort_size
    FROM customer_cohorts
    GROUP BY cohort_month
)
SELECT
    ca.cohort_month,
    cs.cohort_size,
    ca.activity_month,
    EXTRACT(MONTH FROM AGE(ca.activity_month, ca.cohort_month)) AS months_since_signup,
    ca.active_users,
    ROUND(ca.active_users * 100.0 / cs.cohort_size, 2) AS retention_pct
FROM cohort_activity ca
JOIN cohort_sizes cs ON ca.cohort_month = cs.cohort_month
ORDER BY ca.cohort_month, ca.activity_month;

Explanation

This creates a retention matrix:

Cohort: Group of customers by signup month
Activity: When they placed orders
Retention: Percentage still active N months later

cohort    | size | month_0 | month_1 | month_2 | ...
2024-01   | 100  | 100%    | 45%     | 32%     |
2024-02   | 150  | 100%    | 50%     | 38%     |

Use Cases

Measuring product stickiness
Identifying when customers churn
Comparing cohort quality over time

Question 99: Complex Business Query - Funnel Analysis

Problem

Analyze conversion funnel: page view > add to cart > checkout > purchase.

Solution

sql
WITH page_views AS (
    SELECT DISTINCT user_id, DATE(viewed_at) AS event_date
    FROM page_views
    WHERE page_url LIKE '/product/%'
),
cart_adds AS (
    SELECT DISTINCT user_id, DATE(created_at) AS event_date
    FROM cart_items
),
checkouts AS (
    SELECT DISTINCT user_id, DATE(created_at) AS event_date
    FROM checkout_sessions
),
purchases AS (
    SELECT DISTINCT user_id, DATE(order_date) AS event_date
    FROM orders
    WHERE status IN ('completed', 'processing')
),
daily_funnel AS (
    SELECT
        pv.event_date,
        COUNT(DISTINCT pv.user_id) AS viewed,
        COUNT(DISTINCT ca.user_id) AS added_to_cart,
        COUNT(DISTINCT co.user_id) AS checked_out,
        COUNT(DISTINCT pu.user_id) AS purchased
    FROM page_views pv
    LEFT JOIN cart_adds ca ON pv.user_id = ca.user_id AND pv.event_date = ca.event_date
    LEFT JOIN checkouts co ON ca.user_id = co.user_id AND ca.event_date = co.event_date
    LEFT JOIN purchases pu ON co.user_id = pu.user_id AND co.event_date = pu.event_date
    GROUP BY pv.event_date
)
SELECT
    event_date,
    viewed,
    added_to_cart,
    ROUND(added_to_cart * 100.0 / NULLIF(viewed, 0), 2) AS view_to_cart_pct,
    checked_out,
    ROUND(checked_out * 100.0 / NULLIF(added_to_cart, 0), 2) AS cart_to_checkout_pct,
    purchased,
    ROUND(purchased * 100.0 / NULLIF(checked_out, 0), 2) AS checkout_to_purchase_pct,
    ROUND(purchased * 100.0 / NULLIF(viewed, 0), 2) AS overall_conversion_pct
FROM daily_funnel
ORDER BY event_date;

Explanation

Funnel analysis tracks user progression through stages:

event_date | viewed | cart | checkout | purchased | conversion
2024-01-15 | 1000   | 300  | 150      | 120       | 12%

Each stage shows:

Absolute numbers
Conversion rate from previous stage
Overall conversion (end-to-end)

Question 100: Complex Business Query - Revenue Attribution

Problem

Attribute revenue to first-touch marketing channel.

Solution

sql
WITH first_touch AS (
    -- Find first touchpoint for each user
    SELECT DISTINCT ON (user_id)
        user_id,
        channel,
        campaign,
        touch_date
    FROM marketing_touches
    ORDER BY user_id, touch_date ASC
),
user_revenue AS (
    -- Total revenue per user
    SELECT
        user_id,
        SUM(total_amount) AS lifetime_revenue,
        COUNT(*) AS order_count,
        MIN(order_date) AS first_order,
        MAX(order_date) AS last_order
    FROM orders
    WHERE status = 'completed'
    GROUP BY user_id
)
SELECT
    ft.channel,
    ft.campaign,
    COUNT(DISTINCT ft.user_id) AS acquired_users,
    COUNT(DISTINCT ur.user_id) AS converted_users,
    ROUND(COUNT(DISTINCT ur.user_id) * 100.0 / COUNT(DISTINCT ft.user_id), 2) AS conversion_rate,
    COALESCE(SUM(ur.lifetime_revenue), 0) AS attributed_revenue,
    ROUND(COALESCE(AVG(ur.lifetime_revenue), 0), 2) AS avg_customer_value
FROM first_touch ft
LEFT JOIN user_revenue ur ON ft.user_id = ur.user_id
GROUP BY ft.channel, ft.campaign
ORDER BY attributed_revenue DESC;

Explanation

First-touch attribution credits revenue to the channel that first introduced the customer:

channel   | campaign    | users | converted | revenue   | avg_value
google    | brand_2024  | 5000  | 800       | $120,000  | $150
facebook  | retargeting | 3000  | 450       | $67,500   | $150
organic   | NULL        | 10000 | 1200      | $180,000  | $150

Use Cases

Marketing ROI analysis
Budget allocation
Campaign optimization

Summary: Subqueries and CTEs Cheat Sheet

Subquery Types

Type	Location	Returns	Example
Scalar	SELECT, WHERE	Single value	`(SELECT MAX(price) FROM products)`
Row	WHERE	Single row	`WHERE (a, b) = (SELECT ...)`
Table	FROM	Multiple rows/cols	`FROM (SELECT ...) AS t`
Correlated	Any	Depends on outer	`WHERE price > (SELECT AVG(price) WHERE category_id = p.category_id)`

Subquery Operators

Operator	Meaning
IN	Value in set
NOT IN	Value not in set (watch NULLs!)
EXISTS	Subquery returns rows
NOT EXISTS	Subquery returns no rows
ANY/SOME	Comparison to any value
ALL	Comparison to all values

CTE Syntax

sql
WITH cte_name AS (
    SELECT ...
),
cte2 AS (
    SELECT ... FROM cte_name
)
SELECT ... FROM cte_name JOIN cte2 ...;

Recursive CTE Pattern

sql
WITH RECURSIVE cte AS (
    -- Anchor: starting rows
    SELECT ... WHERE condition

    UNION ALL

    -- Recursive: reference CTE
    SELECT ... FROM table JOIN cte ON ...
)
SELECT * FROM cte;

Performance Tips

EXISTS vs IN: EXISTS often faster, no NULL issues
NOT EXISTS vs NOT IN: Always prefer NOT EXISTS
Correlated vs JOIN: JOIN usually faster for large datasets
CTE materialization: Check if your database materializes or inlines
LATERAL: Useful but like correlated subquery in FROM

Congratulations!

You've completed all 100 questions in the SQL Mastery Guide!

What you've learned:

Part 1 (Q1-25): SELECT fundamentals, filtering, sorting
Part 2 (Q26-50): All types of JOINs
Part 3 (Q51-75): Aggregations and window functions
Part 4 (Q76-100): Subqueries and CTEs

Next steps:

Practice these patterns on real data
Use EXPLAIN ANALYZE to understand query performance
Learn your specific database's optimizations
Build increasingly complex queries combining all techniques

Happy querying!