SQL Mastery Guide Part 2: Joins Mastery

Why Joins Matter

Joins are the heart of relational databases. They're what make "relational" meaningful connecting data across tables based on relationships.

After 20+ years of database work, I can tell you: understanding joins deeply separates junior developers from senior engineers. This part covers every join type with real performance implications.

We continue with our e-commerce schema from Part 1.

Section 1: INNER JOIN

Question 26: Basic INNER JOIN

Difficulty: Beginner | Topic: INNER JOIN Basics

Problem: Get all orders with their customer information.

Solution:

sql
SELECT
    o.order_id,
    o.order_number,
    o.total_amount,
    o.created_at,
    u.user_id,
    u.email,
    u.first_name,
    u.last_name
FROM orders o
INNER JOIN users u ON o.user_id = u.user_id;

Explanation: INNER JOIN returns only rows where the join condition matches in BOTH tables.

orders table:         users table:           Result:
order_id | user_id    user_id | name         order_id | user_id | name
---------|--------    --------|-----         ---------|---------|-----
1        | 100        100     | Alice    →   1        | 100     | Alice
2        | 101        101     | Bob      →   2        | 101     | Bob
3        | 102        103     | Charlie      3        | (no match, excluded)

Order 3's user_id (102) has no match in users, so it's excluded.

Common Mistakes:

sql
-- Missing table alias causes ambiguous column error
SELECT order_id, user_id, email  -- Which user_id?
FROM orders
INNER JOIN users ON orders.user_id = users.user_id;

-- CORRECT: Use aliases
SELECT o.order_id, o.user_id, u.email
FROM orders o
INNER JOIN users u ON o.user_id = u.user_id;

Performance Concerns: With 50M orders and 10M users:

Without index: Full table scans, comparing every row combination
With index on join column: Index lookup, dramatically faster

Indexing Suggestion:

sql
-- Index on the column being joined TO
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- The primary key on users.user_id is already indexed

Question 27: Multi-Table INNER JOIN

Difficulty: Intermediate | Topic: Chained Joins

Problem: Get order details including customer name, product name, and quantity.

Solution:

sql
SELECT
    o.order_number,
    u.first_name || ' ' || u.last_name AS customer_name,
    p.name AS product_name,
    oi.quantity,
    oi.unit_price,
    oi.total_price
FROM orders o
INNER JOIN users u ON o.user_id = u.user_id
INNER JOIN order_items oi ON o.order_id = oi.order_id
INNER JOIN products p ON oi.product_id = p.product_id;

Explanation: You can chain multiple joins. Each join connects to the result of the previous joins.

Join Order:

orders → users (via user_id)
       → order_items (via order_id)
              → products (via product_id)

Performance Concerns: The optimizer decides join order. Sometimes it gets it wrong.

Checking Join Order:

sql
EXPLAIN ANALYZE
SELECT ...  -- your query

Look for the join order in the plan. If a large table is scanned first, consider query hints or restructuring.

When Join Order Matters:

Start with the most selective filter
Join smaller result sets first
Avoid Cartesian products in intermediate steps

Question 28: INNER JOIN with Filter

Difficulty: Intermediate | Topic: Filtered Joins

Problem: Get active products with their category names.

Solution:

sql
SELECT
    p.product_id,
    p.name AS product_name,
    p.price,
    c.name AS category_name
FROM products p
INNER JOIN categories c ON p.category_id = c.category_id
WHERE p.status = 'active'
  AND c.is_active = TRUE;

WHERE vs JOIN Condition:

sql
-- Filter in WHERE (standard approach)
FROM products p
INNER JOIN categories c ON p.category_id = c.category_id
WHERE p.status = 'active';

-- Filter in JOIN condition (same result for INNER JOIN)
FROM products p
INNER JOIN categories c
    ON p.category_id = c.category_id
    AND p.status = 'active';

For INNER JOIN, both produce the same result. For LEFT JOIN, they're different! (See Question 34)

Question 29: Self-Referencing Data with JOIN

Difficulty: Intermediate | Topic: Join Patterns

Problem: Get products with their category hierarchy (category and parent category).

Solution:

sql
SELECT
    p.product_id,
    p.name AS product_name,
    c.name AS category,
    parent.name AS parent_category
FROM products p
INNER JOIN categories c ON p.category_id = c.category_id
LEFT JOIN categories parent ON c.parent_id = parent.category_id;

Why LEFT JOIN for parent? Top-level categories have no parent (parent_id IS NULL). We still want those products.

Section 2: LEFT JOIN (LEFT OUTER JOIN)

Question 30: Basic LEFT JOIN

Difficulty: Beginner | Topic: LEFT JOIN Basics

Problem: Get all users and their order counts (including users with no orders).

Solution:

sql
SELECT
    u.user_id,
    u.email,
    u.first_name,
    COUNT(o.order_id) AS order_count
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id, u.email, u.first_name;

Explanation: LEFT JOIN returns ALL rows from the left table, even if there's no match in the right table.

users table:           orders table:          Result:
user_id | name         order_id | user_id    user_id | name    | order_id
--------|-----         ---------|--------    --------|---------|---------
100     | Alice        1        | 100        100     | Alice   | 1
101     | Bob          2        | 100        100     | Alice   | 2
102     | Charlie                             101     | Bob     | NULL
                                             102     | Charlie | NULL

Alice has 2 orders. Bob and Charlie have NULL (no orders).

COUNT Behavior:

COUNT(*) counts all rows (including NULLs) → would show 1 for Bob
COUNT(o.order_id) counts non-NULL values → shows 0 for Bob

Question 31: LEFT JOIN Anti-Pattern (Finding Missing Data)

Difficulty: Intermediate | Topic: Anti-Join

Problem: Find users who have never placed an order.

Solution:

sql
SELECT u.user_id, u.email, u.first_name
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE o.order_id IS NULL;

Explanation:

LEFT JOIN keeps all users
Users without orders have NULL in all orders columns
WHERE o.order_id IS NULL filters to only those users

Why Check order_id, Not user_id?

sql
-- WRONG: This might match NULLs that exist in orders
WHERE o.user_id IS NULL

-- CORRECT: Primary key is never NULL in a matched row
WHERE o.order_id IS NULL

Alternative Approaches:

sql
-- NOT EXISTS (often faster)
SELECT u.user_id, u.email, u.first_name
FROM users u
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.user_id = u.user_id
);

-- NOT IN (risky with NULLs)
SELECT user_id, email, first_name
FROM users
WHERE user_id NOT IN (SELECT user_id FROM orders WHERE user_id IS NOT NULL);

Question 32: LEFT JOIN with Aggregation

Difficulty: Intermediate | Topic: Aggregate with Joins

Problem: Get each customer's total spending (including customers who never bought).

Solution:

sql
SELECT
    u.user_id,
    u.email,
    u.first_name,
    COALESCE(SUM(o.total_amount), 0) AS total_spent,
    COUNT(o.order_id) AS order_count,
    COALESCE(AVG(o.total_amount), 0) AS avg_order_value
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
    AND o.status NOT IN ('cancelled', 'refunded')  -- Filter in JOIN!
GROUP BY u.user_id, u.email, u.first_name
ORDER BY total_spent DESC;

Why Filter in JOIN, Not WHERE?

sql
-- WRONG: This excludes users with no valid orders
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE o.status NOT IN ('cancelled', 'refunded');  -- Removes NULL rows!

-- CORRECT: Filter in JOIN keeps users with no orders
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
    AND o.status NOT IN ('cancelled', 'refunded');

Question 33: Multiple LEFT JOINs

Difficulty: Intermediate | Topic: Chained LEFT JOINs

Problem: Get product information with optional brand and category (some products may have NULL brand_id).

Solution:

sql
SELECT
    p.product_id,
    p.name AS product_name,
    p.price,
    COALESCE(c.name, 'Uncategorized') AS category,
    COALESCE(b.name, 'No Brand') AS brand
FROM products p
LEFT JOIN categories c ON p.category_id = c.category_id
LEFT JOIN brands b ON p.brand_id = b.brand_id
WHERE p.status = 'active';

Performance Concerns: Multiple LEFT JOINs can create large intermediate result sets. Each join multiplies potential rows.

Question 34: LEFT JOIN Filter Location Trap

Difficulty: Advanced | Topic: Join Logic

Problem: Get all users with their orders from 2024 (users without 2024 orders should still appear).

Solution:

sql
-- WRONG: Filters out users with no 2024 orders
SELECT u.user_id, u.email, o.order_id, o.created_at
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE o.created_at >= '2024-01-01';  -- This removes NULL rows!

-- CORRECT: Filter in JOIN condition
SELECT u.user_id, u.email, o.order_id, o.created_at
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
    AND o.created_at >= '2024-01-01';

Visual Explanation:

WRONG approach (WHERE):
users:           orders:                     After LEFT JOIN:        After WHERE:
user_id          user_id | date              user_id | date          user_id | date
-------          --------|-----              --------|-----          --------|-----
1                1       | 2023-05           1       | 2023-05       (excluded - 2023)
2                1       | 2024-03           1       | 2024-03   →   1       | 2024-03
                 2       | 2023-08           2       | 2023-08       (excluded - 2023)
                                                                     (user 2 disappears!)

CORRECT approach (AND in JOIN):
After LEFT JOIN with date filter:
user_id | date
--------|-----
1       | 2024-03    (2023 order filtered in JOIN)
2       | NULL       (no 2024 orders, but user kept!)

Section 3: RIGHT JOIN and FULL JOIN

Question 35: RIGHT JOIN

Difficulty: Intermediate | Topic: RIGHT JOIN

Problem: Get all products and their order counts (including products never ordered).

Solution:

sql
-- Using RIGHT JOIN (less common)
SELECT
    p.product_id,
    p.name,
    COUNT(oi.item_id) AS times_ordered
FROM order_items oi
RIGHT JOIN products p ON oi.product_id = p.product_id
GROUP BY p.product_id, p.name;

-- Equivalent LEFT JOIN (more common, recommended)
SELECT
    p.product_id,
    p.name,
    COUNT(oi.item_id) AS times_ordered
FROM products p
LEFT JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY p.product_id, p.name;

When to Use RIGHT JOIN: Almost never. You can always rewrite as LEFT JOIN by swapping table order. LEFT JOIN is more readable because we read left-to-right.

Question 36: FULL OUTER JOIN

Difficulty: Intermediate | Topic: FULL JOIN

Problem: Find mismatches between two inventory systems.

Schema for this question:

sql
CREATE TABLE warehouse_inventory (
    sku VARCHAR(50) PRIMARY KEY,
    quantity INT
);

CREATE TABLE system_inventory (
    sku VARCHAR(50) PRIMARY KEY,
    quantity INT
);

Solution:

sql
SELECT
    COALESCE(w.sku, s.sku) AS sku,
    w.quantity AS warehouse_qty,
    s.quantity AS system_qty,
    CASE
        WHEN w.sku IS NULL THEN 'In system only'
        WHEN s.sku IS NULL THEN 'In warehouse only'
        WHEN w.quantity != s.quantity THEN 'Quantity mismatch'
        ELSE 'Match'
    END AS status
FROM warehouse_inventory w
FULL OUTER JOIN system_inventory s ON w.sku = s.sku
WHERE w.sku IS NULL
   OR s.sku IS NULL
   OR w.quantity != s.quantity;

Explanation: FULL OUTER JOIN returns ALL rows from BOTH tables, matching where possible.

warehouse:          system:              Result:
sku  | qty          sku  | qty           sku  | w_qty | s_qty | status
-----|---           -----|---            -----|-------|-------|------
A    | 10           A    | 10        →   A    | 10    | 10    | Match
B    | 20           B    | 15        →   B    | 20    | 15    | Mismatch
C    | 5                              →   C    | 5     | NULL  | Warehouse only
                    D    | 8         →   D    | NULL  | 8     | System only

When to Use FULL JOIN:

Data reconciliation between systems
Finding orphaned records
Merge operations

MySQL Note: MySQL doesn't support FULL OUTER JOIN. Workaround:

sql
SELECT * FROM warehouse_inventory w
LEFT JOIN system_inventory s ON w.sku = s.sku
UNION
SELECT * FROM warehouse_inventory w
RIGHT JOIN system_inventory s ON w.sku = s.sku;

Section 4: SELF JOIN

Question 37: Basic SELF JOIN

Difficulty: Intermediate | Topic: Self-Referencing Tables

Problem: Get categories with their parent category names.

Solution:

sql
SELECT
    c.category_id,
    c.name AS category_name,
    c.level,
    p.name AS parent_category_name
FROM categories c
LEFT JOIN categories p ON c.parent_id = p.category_id
ORDER BY c.path;

Explanation: SELF JOIN joins a table to itself. You MUST use different aliases.

categories table:
category_id | name        | parent_id
------------|-------------|----------
1           | Electronics | NULL
2           | Phones      | 1
3           | Laptops     | 1
4           | Smartphones | 2

Result:
category_id | category_name | parent_category_name
------------|---------------|---------------------
1           | Electronics   | NULL
2           | Phones        | Electronics
3           | Laptops       | Electronics
4           | Smartphones   | Phones

Question 38: Employee Hierarchy

Difficulty: Intermediate | Topic: Self JOIN for Hierarchy

Schema:

sql
CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    manager_id INT REFERENCES employees(employee_id),
    department VARCHAR(50),
    salary DECIMAL(10,2)
);

Problem: Get employees with their manager names.

Solution:

sql
SELECT
    e.employee_id,
    e.name AS employee_name,
    e.department,
    e.salary,
    m.name AS manager_name
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id;

Finding Employees Who Earn More Than Their Manager:

sql
SELECT
    e.name AS employee_name,
    e.salary AS employee_salary,
    m.name AS manager_name,
    m.salary AS manager_salary
FROM employees e
INNER JOIN employees m ON e.manager_id = m.employee_id
WHERE e.salary > m.salary;

Question 39: Finding Duplicates with SELF JOIN

Difficulty: Intermediate | Topic: Duplicate Detection

Problem: Find users with duplicate email addresses (case-insensitive).

Solution:

sql
SELECT
    u1.user_id,
    u1.email,
    u2.user_id AS duplicate_user_id,
    u2.email AS duplicate_email
FROM users u1
INNER JOIN users u2
    ON LOWER(u1.email) = LOWER(u2.email)
    AND u1.user_id < u2.user_id;  -- Avoid self-match and duplicate pairs

Why u1.user_id < u2.user_id? Without it, you'd get:

Each row matched with itself
Both (A, B) and (B, A) pairs

With <, you only get each pair once.

Section 5: CROSS JOIN

Question 40: Basic CROSS JOIN

Difficulty: Intermediate | Topic: Cartesian Product

Problem: Generate all possible product-color combinations.

Schema:

sql
CREATE TABLE colors (
    color_id SERIAL PRIMARY KEY,
    name VARCHAR(50)
);
-- Contains: Red, Blue, Green

CREATE TABLE sizes (
    size_id SERIAL PRIMARY KEY,
    name VARCHAR(20)
);
-- Contains: S, M, L, XL

Solution:

sql
SELECT
    c.name AS color,
    s.name AS size,
    c.name || '-' || s.name AS variant_code
FROM colors c
CROSS JOIN sizes s
ORDER BY c.name, s.size_id;

Result:

color | size | variant_code
------|------|-------------
Blue  | S    | Blue-S
Blue  | M    | Blue-M
Blue  | L    | Blue-L
Blue  | XL   | Blue-XL
Green | S    | Green-S
... (12 total rows: 3 colors × 4 sizes)

Performance Warning: CROSS JOIN produces rows = table1_rows × table2_rows.

1000 × 1000 = 1,000,000 rows!
Only use with small tables or intentionally

Use Cases:

Generating combinations (colors × sizes)
Creating date ranges with all dates × all products
Simulation data

Question 41: CROSS JOIN for Reporting

Difficulty: Intermediate | Topic: Report Generation

Problem: Generate a sales report with all months, even months with zero sales.

Solution:

sql
-- Generate all months for 2024
WITH months AS (
    SELECT generate_series(
        '2024-01-01'::date,
        '2024-12-01'::date,
        '1 month'::interval
    ) AS month_start
)
SELECT
    TO_CHAR(m.month_start, 'YYYY-MM') AS month,
    COALESCE(SUM(o.total_amount), 0) AS total_sales,
    COALESCE(COUNT(o.order_id), 0) AS order_count
FROM months m
LEFT JOIN orders o
    ON o.created_at >= m.month_start
    AND o.created_at < m.month_start + INTERVAL '1 month'
    AND o.status NOT IN ('cancelled', 'refunded')
GROUP BY m.month_start
ORDER BY m.month_start;

Section 6: Semi-Join and Anti-Join

Question 42: Semi-Join with EXISTS

Difficulty: Intermediate | Topic: Semi-Join

Problem: Find products that have been ordered at least once.

Solution:

sql
-- Semi-join using EXISTS
SELECT p.product_id, p.name, p.price
FROM products p
WHERE EXISTS (
    SELECT 1
    FROM order_items oi
    WHERE oi.product_id = p.product_id
);

Semi-Join vs INNER JOIN:

sql
-- INNER JOIN may return duplicates!
SELECT p.product_id, p.name
FROM products p
INNER JOIN order_items oi ON p.product_id = oi.product_id;
-- Product "iPhone" appears 1000 times if ordered 1000 times

-- Semi-join returns each product once
SELECT p.product_id, p.name
FROM products p
WHERE EXISTS (SELECT 1 FROM order_items oi WHERE oi.product_id = p.product_id);
-- Product "iPhone" appears once

Performance: EXISTS stops at the first match. It doesn't need to find all matching rows just one.

Question 43: Semi-Join with IN

Difficulty: Intermediate | Topic: IN vs EXISTS

Problem: Find customers who ordered in January 2024.

Solution:

sql
-- Using IN
SELECT user_id, email, first_name
FROM users
WHERE user_id IN (
    SELECT user_id
    FROM orders
    WHERE created_at >= '2024-01-01'
      AND created_at < '2024-02-01'
);

-- Using EXISTS (often faster for large subqueries)
SELECT user_id, email, first_name
FROM users u
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.user_id = u.user_id
      AND o.created_at >= '2024-01-01'
      AND o.created_at < '2024-02-01'
);

When to Use Which:

Scenario	Recommended
Small subquery result (< 1000 rows)	IN
Large subquery result	EXISTS
Subquery is a literal list	IN
Correlated condition	EXISTS

Question 44: Anti-Join with NOT EXISTS

Difficulty: Intermediate | Topic: Anti-Join

Problem: Find products never ordered.

Solution:

sql
SELECT p.product_id, p.name, p.price, p.created_at
FROM products p
WHERE NOT EXISTS (
    SELECT 1
    FROM order_items oi
    WHERE oi.product_id = p.product_id
);

Why NOT EXISTS is Safer Than NOT IN:

sql
-- DANGEROUS: If order_items.product_id has ANY NULL value,
-- NOT IN returns zero rows!
SELECT * FROM products
WHERE product_id NOT IN (SELECT product_id FROM order_items);

-- SAFE: NOT EXISTS handles NULLs correctly
SELECT * FROM products p
WHERE NOT EXISTS (SELECT 1 FROM order_items oi WHERE oi.product_id = p.product_id);

Section 7: Advanced Join Patterns

Question 45: Latest Record Per Group (Top 1 Per Group)

Difficulty: Advanced | Topic: Join Pattern

Problem: Get each user's most recent order.

Solution:

sql
-- Method 1: Correlated subquery
SELECT o.*
FROM orders o
WHERE o.created_at = (
    SELECT MAX(o2.created_at)
    FROM orders o2
    WHERE o2.user_id = o.user_id
);

-- Method 2: Window function with DISTINCT ON (PostgreSQL)
SELECT DISTINCT ON (user_id)
    order_id,
    user_id,
    order_number,
    total_amount,
    created_at
FROM orders
ORDER BY user_id, created_at DESC;

-- Method 3: Window function with ROW_NUMBER
WITH ranked_orders AS (
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
    FROM orders
)
SELECT * FROM ranked_orders WHERE rn = 1;

-- Method 4: LATERAL join (PostgreSQL, very efficient)
SELECT u.user_id, u.email, o.*
FROM users u
CROSS JOIN LATERAL (
    SELECT *
    FROM orders
    WHERE user_id = u.user_id
    ORDER BY created_at DESC
    LIMIT 1
) o;

Performance Comparison:

Method	Index Required	Performance
Correlated subquery	(user_id, created_at)	Slow for large tables
DISTINCT ON	(user_id, created_at DESC)	Fast, PostgreSQL only
ROW_NUMBER	(user_id, created_at)	Medium, scans all rows
LATERAL	(user_id, created_at DESC)	Fast, PostgreSQL only

Question 46: Joining with Date Ranges

Difficulty: Advanced | Topic: Range Joins

Problem: Find the applicable discount for each order based on order date.

Schema:

sql
CREATE TABLE discounts (
    discount_id SERIAL PRIMARY KEY,
    code VARCHAR(50),
    percentage DECIMAL(5,2),
    start_date DATE,
    end_date DATE
);

Solution:

sql
SELECT
    o.order_id,
    o.order_number,
    o.total_amount,
    o.created_at,
    d.code AS discount_code,
    d.percentage AS discount_percentage
FROM orders o
LEFT JOIN discounts d
    ON o.created_at::date BETWEEN d.start_date AND d.end_date
    AND d.code = 'SUMMER2024';  -- Or whatever matching criteria

Performance Warning: BETWEEN for date ranges can be slow. Consider using a date dimension table or range types.

Using PostgreSQL Range Types:

sql
ALTER TABLE discounts ADD COLUMN valid_range daterange;
UPDATE discounts SET valid_range = daterange(start_date, end_date, '[]');
CREATE INDEX idx_discounts_range ON discounts USING GIST (valid_range);

-- Efficient range query
SELECT o.*, d.code
FROM orders o
LEFT JOIN discounts d ON o.created_at::date <@ d.valid_range;

Question 47: Joining on Multiple Conditions

Difficulty: Intermediate | Topic: Composite Joins

Problem: Get order items with product prices at the time of order.

Schema:

sql
CREATE TABLE product_price_history (
    history_id SERIAL PRIMARY KEY,
    product_id BIGINT,
    price DECIMAL(10,2),
    effective_from DATE,
    effective_to DATE
);

Solution:

sql
SELECT
    oi.item_id,
    oi.order_id,
    p.name AS product_name,
    oi.quantity,
    pph.price AS historical_price,
    oi.unit_price AS charged_price
FROM order_items oi
INNER JOIN orders o ON oi.order_id = o.order_id
INNER JOIN products p ON oi.product_id = p.product_id
LEFT JOIN product_price_history pph
    ON oi.product_id = pph.product_id
    AND o.created_at::date >= pph.effective_from
    AND (pph.effective_to IS NULL OR o.created_at::date <= pph.effective_to);

Question 48: Non-Equi Joins

Difficulty: Advanced | Topic: Non-Equality Joins

Problem: Find all orders that are larger than the customer's average order.

Solution:

sql
WITH customer_avg AS (
    SELECT
        user_id,
        AVG(total_amount) AS avg_order_value
    FROM orders
    WHERE status NOT IN ('cancelled', 'refunded')
    GROUP BY user_id
)
SELECT
    o.order_id,
    o.order_number,
    o.user_id,
    o.total_amount,
    ca.avg_order_value,
    o.total_amount - ca.avg_order_value AS above_average_by
FROM orders o
INNER JOIN customer_avg ca
    ON o.user_id = ca.user_id
    AND o.total_amount > ca.avg_order_value  -- Non-equi condition
ORDER BY above_average_by DESC;

Question 49: Many-to-Many Join

Difficulty: Intermediate | Topic: Junction Tables

Schema:

sql
CREATE TABLE product_tags (
    product_id BIGINT REFERENCES products(product_id),
    tag_id INT REFERENCES tags(tag_id),
    PRIMARY KEY (product_id, tag_id)
);

CREATE TABLE tags (
    tag_id SERIAL PRIMARY KEY,
    name VARCHAR(50)
);

Problem: Get products with all their tags as a comma-separated list.

Solution:

sql
SELECT
    p.product_id,
    p.name AS product_name,
    STRING_AGG(t.name, ', ' ORDER BY t.name) AS tags
FROM products p
LEFT JOIN product_tags pt ON p.product_id = pt.product_id
LEFT JOIN tags t ON pt.tag_id = t.tag_id
GROUP BY p.product_id, p.name;

Finding Products with Specific Tags:

sql
-- Products with ALL specified tags
SELECT p.product_id, p.name
FROM products p
INNER JOIN product_tags pt ON p.product_id = pt.product_id
INNER JOIN tags t ON pt.tag_id = t.tag_id
WHERE t.name IN ('wireless', 'bluetooth', 'portable')
GROUP BY p.product_id, p.name
HAVING COUNT(DISTINCT t.tag_id) = 3;  -- Must have all 3 tags

-- Products with ANY of specified tags
SELECT DISTINCT p.product_id, p.name
FROM products p
INNER JOIN product_tags pt ON p.product_id = pt.product_id
INNER JOIN tags t ON pt.tag_id = t.tag_id
WHERE t.name IN ('wireless', 'bluetooth', 'portable');

Question 50: Avoiding N+1 with JOIN

Difficulty: Advanced | Topic: Performance Pattern

Problem: Get user details with their last 5 orders in a single query.

Bad Pattern (N+1):

python
# Application code - N+1 problem
users = db.query("SELECT * FROM users LIMIT 100")
for user in users:
    orders = db.query(f"SELECT * FROM orders WHERE user_id = {user.id} ORDER BY created_at DESC LIMIT 5")
# 101 queries total!

Good Pattern (Single Query):

sql
WITH ranked_orders AS (
    SELECT
        o.*,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
    FROM orders o
)
SELECT
    u.user_id,
    u.email,
    u.first_name,
    ro.order_id,
    ro.order_number,
    ro.total_amount,
    ro.created_at
FROM users u
LEFT JOIN ranked_orders ro
    ON u.user_id = ro.user_id
    AND ro.rn <= 5
WHERE u.status = 'active'
ORDER BY u.user_id, ro.rn;
-- 1 query total!

Alternative with LATERAL:

sql
SELECT u.user_id, u.email, o.*
FROM users u
CROSS JOIN LATERAL (
    SELECT order_id, order_number, total_amount, created_at
    FROM orders
    WHERE user_id = u.user_id
    ORDER BY created_at DESC
    LIMIT 5
) o
WHERE u.status = 'active';

Section 8: Join Performance Deep Dive

Join Algorithm Types

Understanding how databases execute joins helps you optimize them.

NESTED LOOP JOIN:
For each row in outer table:
    For each row in inner table:
        If join condition matches:
            Output row

Best for: Small tables, indexed inner table
Worst for: Large tables without indexes

HASH JOIN:
1. Build hash table from smaller table
2. Probe hash table for each row from larger table

Best for: Large tables, equality joins
Worst for: Non-equality joins, memory-constrained systems

MERGE JOIN:
1. Sort both tables on join key
2. Walk through both sorted lists, matching rows

Best for: Large sorted data, range conditions
Worst for: Unsorted data (sort overhead)

Question: Analyzing Join Performance

How to Read EXPLAIN for Joins:

sql
EXPLAIN ANALYZE
SELECT o.order_id, u.email
FROM orders o
INNER JOIN users u ON o.user_id = u.user_id
WHERE o.created_at >= '2024-01-01';

Look for:

Hash Join  (cost=... rows=... width=...)
  Hash Cond: (o.user_id = u.user_id)
  ->  Seq Scan on orders o  (cost=...)
        Filter: (created_at >= '2024-01-01')
  ->  Hash  (cost=...)
        ->  Seq Scan on users u (cost=...)

Key Indicators:

Seq Scan → No index used, consider adding one
Nested Loop with large outer table → May be slow
Hash → Building hash table, check memory
Sort → Sorting for merge join, check if index can help

Summary: Part 2 Key Takeaways

Join Types Quick Reference

Join Type	Returns	Use When
INNER JOIN	Matching rows only	Need data from both tables
LEFT JOIN	All left + matching right	Include non-matching rows from left
RIGHT JOIN	All right + matching left	Rare, use LEFT JOIN instead
FULL OUTER JOIN	All rows from both	Data reconciliation
CROSS JOIN	Cartesian product	Generate combinations
SELF JOIN	Table joined to itself	Hierarchies, duplicates
Semi-Join (EXISTS)	Rows with matches	Check existence
Anti-Join (NOT EXISTS)	Rows without matches	Find missing data

Performance Rules

Always index join columns - Foreign keys, join conditions
Filter early - Apply WHERE before joining when possible
**Avoid SELECT *** - Only select needed columns
LEFT JOIN filter placement - Use ON for outer table filters
Avoid NOT IN with NULLs - Use NOT EXISTS instead
Use EXISTS over IN - For large subqueries
Check EXPLAIN plans - Verify join algorithms

Indexing for Joins

sql
-- Index the foreign key (child table)
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
CREATE INDEX idx_order_items_product_id ON order_items(product_id);

-- Composite index for filtered joins
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at DESC);

Coming in Part 3: Aggregations and GROUP BY

COUNT, SUM, AVG, MIN, MAX
GROUP BY mechanics
HAVING vs WHERE
Window functions introduction
Partitioning and ranking
Rolling calculations

This is Part 2 of the SQL Mastery Guide series. Practice these join patterns they appear in almost every production query.