SQL Mastery Guide Part 2: Joins Mastery

Why Joins Matter

Joins are the heart of relational databases. They're what make "relational" meaningful—connecting data across tables based on relationships.
After 20+ years of database work, I can tell you: understanding joins deeply separates junior developers from senior engineers. This part covers every join type with real performance implications.
We continue with our e-commerce schema from Part 1.

Section 1: INNER JOIN

Question 26: Basic INNER JOIN

Difficulty: Beginner | Topic: INNER JOIN Basics
Problem: Get all orders with their customer information.
Solution:
sql
SELECT o.order_id, o.order_number, o.total_amount, o.created_at, u.user_id, u.email, u.first_name, u.last_name FROM orders o INNER JOIN users u ON o.user_id = u.user_id;
Explanation: INNER JOIN returns only rows where the join condition matches in BOTH tables.
orders table: users table: Result: order_id | user_id user_id | name order_id | user_id | name ---------|-------- --------|----- ---------|---------|----- 1 | 100 100 | Alice → 1 | 100 | Alice 2 | 101 101 | Bob → 2 | 101 | Bob 3 | 102 103 | Charlie 3 | (no match, excluded)
Order 3's user_id (102) has no match in users, so it's excluded.
Common Mistakes:
sql
-- Missing table alias causes ambiguous column error SELECT order_id, user_id, email -- Which user_id? FROM orders INNER JOIN users ON orders.user_id = users.user_id; -- CORRECT: Use aliases SELECT o.order_id, o.user_id, u.email FROM orders o INNER JOIN users u ON o.user_id = u.user_id;
Performance Concerns: With 50M orders and 10M users:
  • Without index: Full table scans, comparing every row combination
  • With index on join column: Index lookup, dramatically faster
Indexing Suggestion:
sql
-- Index on the column being joined TO CREATE INDEX idx_orders_user_id ON orders(user_id); -- The primary key on users.user_id is already indexed

Question 27: Multi-Table INNER JOIN

Difficulty: Intermediate | Topic: Chained Joins
Problem: Get order details including customer name, product name, and quantity.
Solution:
sql
SELECT o.order_number, u.first_name || ' ' || u.last_name AS customer_name, p.name AS product_name, oi.quantity, oi.unit_price, oi.total_price FROM orders o INNER JOIN users u ON o.user_id = u.user_id INNER JOIN order_items oi ON o.order_id = oi.order_id INNER JOIN products p ON oi.product_id = p.product_id;
Explanation: You can chain multiple joins. Each join connects to the result of the previous joins.
Join Order:
orders → users (via user_id) → order_items (via order_id) → products (via product_id)
Performance Concerns: The optimizer decides join order. Sometimes it gets it wrong.
Checking Join Order:
sql
EXPLAIN ANALYZE SELECT ... -- your query
Look for the join order in the plan. If a large table is scanned first, consider query hints or restructuring.
When Join Order Matters:
  • Start with the most selective filter
  • Join smaller result sets first
  • Avoid Cartesian products in intermediate steps

Question 28: INNER JOIN with Filter

Difficulty: Intermediate | Topic: Filtered Joins
Problem: Get active products with their category names.
Solution:
sql
SELECT p.product_id, p.name AS product_name, p.price, c.name AS category_name FROM products p INNER JOIN categories c ON p.category_id = c.category_id WHERE p.status = 'active' AND c.is_active = TRUE;
WHERE vs JOIN Condition:
sql
-- Filter in WHERE (standard approach) FROM products p INNER JOIN categories c ON p.category_id = c.category_id WHERE p.status = 'active'; -- Filter in JOIN condition (same result for INNER JOIN) FROM products p INNER JOIN categories c ON p.category_id = c.category_id AND p.status = 'active';
For INNER JOIN, both produce the same result. For LEFT JOIN, they're different! (See Question 34)

Question 29: Self-Referencing Data with JOIN

Difficulty: Intermediate | Topic: Join Patterns
Problem: Get products with their category hierarchy (category and parent category).
Solution:
sql
SELECT p.product_id, p.name AS product_name, c.name AS category, parent.name AS parent_category FROM products p INNER JOIN categories c ON p.category_id = c.category_id LEFT JOIN categories parent ON c.parent_id = parent.category_id;
Why LEFT JOIN for parent? Top-level categories have no parent (parent_id IS NULL). We still want those products.

Section 2: LEFT JOIN (LEFT OUTER JOIN)

Question 30: Basic LEFT JOIN

Difficulty: Beginner | Topic: LEFT JOIN Basics
Problem: Get all users and their order counts (including users with no orders).
Solution:
sql
SELECT u.user_id, u.email, u.first_name, COUNT(o.order_id) AS order_count FROM users u LEFT JOIN orders o ON u.user_id = o.user_id GROUP BY u.user_id, u.email, u.first_name;
Explanation: LEFT JOIN returns ALL rows from the left table, even if there's no match in the right table.
users table: orders table: Result: user_id | name order_id | user_id user_id | name | order_id --------|----- ---------|-------- --------|---------|--------- 100 | Alice 1 | 100 100 | Alice | 1 101 | Bob 2 | 100 100 | Alice | 2 102 | Charlie 101 | Bob | NULL 102 | Charlie | NULL
Alice has 2 orders. Bob and Charlie have NULL (no orders).
COUNT Behavior:
  • COUNT(*) counts all rows (including NULLs) → would show 1 for Bob
  • COUNT(o.order_id) counts non-NULL values → shows 0 for Bob

Question 31: LEFT JOIN Anti-Pattern (Finding Missing Data)

Difficulty: Intermediate | Topic: Anti-Join
Problem: Find users who have never placed an order.
Solution:
sql
SELECT u.user_id, u.email, u.first_name FROM users u LEFT JOIN orders o ON u.user_id = o.user_id WHERE o.order_id IS NULL;
Explanation:
  1. LEFT JOIN keeps all users
  2. Users without orders have NULL in all orders columns
  3. WHERE o.order_id IS NULL filters to only those users
Why Check order_id, Not user_id?
sql
-- WRONG: This might match NULLs that exist in orders WHERE o.user_id IS NULL -- CORRECT: Primary key is never NULL in a matched row WHERE o.order_id IS NULL
Alternative Approaches:
sql
-- NOT EXISTS (often faster) SELECT u.user_id, u.email, u.first_name FROM users u WHERE NOT EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.user_id ); -- NOT IN (risky with NULLs) SELECT user_id, email, first_name FROM users WHERE user_id NOT IN (SELECT user_id FROM orders WHERE user_id IS NOT NULL);

Question 32: LEFT JOIN with Aggregation

Difficulty: Intermediate | Topic: Aggregate with Joins
Problem: Get each customer's total spending (including customers who never bought).
Solution:
sql
SELECT u.user_id, u.email, u.first_name, COALESCE(SUM(o.total_amount), 0) AS total_spent, COUNT(o.order_id) AS order_count, COALESCE(AVG(o.total_amount), 0) AS avg_order_value FROM users u LEFT JOIN orders o ON u.user_id = o.user_id AND o.status NOT IN ('cancelled', 'refunded') -- Filter in JOIN! GROUP BY u.user_id, u.email, u.first_name ORDER BY total_spent DESC;
Why Filter in JOIN, Not WHERE?
sql
-- WRONG: This excludes users with no valid orders FROM users u LEFT JOIN orders o ON u.user_id = o.user_id WHERE o.status NOT IN ('cancelled', 'refunded'); -- Removes NULL rows! -- CORRECT: Filter in JOIN keeps users with no orders FROM users u LEFT JOIN orders o ON u.user_id = o.user_id AND o.status NOT IN ('cancelled', 'refunded');

Question 33: Multiple LEFT JOINs

Difficulty: Intermediate | Topic: Chained LEFT JOINs
Problem: Get product information with optional brand and category (some products may have NULL brand_id).
Solution:
sql
SELECT p.product_id, p.name AS product_name, p.price, COALESCE(c.name, 'Uncategorized') AS category, COALESCE(b.name, 'No Brand') AS brand FROM products p LEFT JOIN categories c ON p.category_id = c.category_id LEFT JOIN brands b ON p.brand_id = b.brand_id WHERE p.status = 'active';
Performance Concerns: Multiple LEFT JOINs can create large intermediate result sets. Each join multiplies potential rows.

Question 34: LEFT JOIN Filter Location Trap

Difficulty: Advanced | Topic: Join Logic
Problem: Get all users with their orders from 2024 (users without 2024 orders should still appear).
Solution:
sql
-- WRONG: Filters out users with no 2024 orders SELECT u.user_id, u.email, o.order_id, o.created_at FROM users u LEFT JOIN orders o ON u.user_id = o.user_id WHERE o.created_at >= '2024-01-01'; -- This removes NULL rows! -- CORRECT: Filter in JOIN condition SELECT u.user_id, u.email, o.order_id, o.created_at FROM users u LEFT JOIN orders o ON u.user_id = o.user_id AND o.created_at >= '2024-01-01';
Visual Explanation:
WRONG approach (WHERE): users: orders: After LEFT JOIN: After WHERE: user_id user_id | date user_id | date user_id | date ------- --------|----- --------|----- --------|----- 1 1 | 2023-05 1 | 2023-05 (excluded - 2023) 2 1 | 2024-03 1 | 2024-03 → 1 | 2024-03 2 | 2023-08 2 | 2023-08 (excluded - 2023) (user 2 disappears!) CORRECT approach (AND in JOIN): After LEFT JOIN with date filter: user_id | date --------|----- 1 | 2024-03 (2023 order filtered in JOIN) 2 | NULL (no 2024 orders, but user kept!)

Section 3: RIGHT JOIN and FULL JOIN

Question 35: RIGHT JOIN

Difficulty: Intermediate | Topic: RIGHT JOIN
Problem: Get all products and their order counts (including products never ordered).
Solution:
sql
-- Using RIGHT JOIN (less common) SELECT p.product_id, p.name, COUNT(oi.item_id) AS times_ordered FROM order_items oi RIGHT JOIN products p ON oi.product_id = p.product_id GROUP BY p.product_id, p.name; -- Equivalent LEFT JOIN (more common, recommended) SELECT p.product_id, p.name, COUNT(oi.item_id) AS times_ordered FROM products p LEFT JOIN order_items oi ON p.product_id = oi.product_id GROUP BY p.product_id, p.name;
When to Use RIGHT JOIN: Almost never. You can always rewrite as LEFT JOIN by swapping table order. LEFT JOIN is more readable because we read left-to-right.

Question 36: FULL OUTER JOIN

Difficulty: Intermediate | Topic: FULL JOIN
Problem: Find mismatches between two inventory systems.
Schema for this question:
sql
CREATE TABLE warehouse_inventory ( sku VARCHAR(50) PRIMARY KEY, quantity INT ); CREATE TABLE system_inventory ( sku VARCHAR(50) PRIMARY KEY, quantity INT );
Solution:
sql
SELECT COALESCE(w.sku, s.sku) AS sku, w.quantity AS warehouse_qty, s.quantity AS system_qty, CASE WHEN w.sku IS NULL THEN 'In system only' WHEN s.sku IS NULL THEN 'In warehouse only' WHEN w.quantity != s.quantity THEN 'Quantity mismatch' ELSE 'Match' END AS status FROM warehouse_inventory w FULL OUTER JOIN system_inventory s ON w.sku = s.sku WHERE w.sku IS NULL OR s.sku IS NULL OR w.quantity != s.quantity;
Explanation: FULL OUTER JOIN returns ALL rows from BOTH tables, matching where possible.
warehouse: system: Result: sku | qty sku | qty sku | w_qty | s_qty | status -----|--- -----|--- -----|-------|-------|------ A | 10 A | 10 → A | 10 | 10 | Match B | 20 B | 15 → B | 20 | 15 | Mismatch C | 5 → C | 5 | NULL | Warehouse only D | 8 → D | NULL | 8 | System only
When to Use FULL JOIN:
  • Data reconciliation between systems
  • Finding orphaned records
  • Merge operations
MySQL Note: MySQL doesn't support FULL OUTER JOIN. Workaround:
sql
SELECT * FROM warehouse_inventory w LEFT JOIN system_inventory s ON w.sku = s.sku UNION SELECT * FROM warehouse_inventory w RIGHT JOIN system_inventory s ON w.sku = s.sku;

Section 4: SELF JOIN

Question 37: Basic SELF JOIN

Difficulty: Intermediate | Topic: Self-Referencing Tables
Problem: Get categories with their parent category names.
Solution:
sql
SELECT c.category_id, c.name AS category_name, c.level, p.name AS parent_category_name FROM categories c LEFT JOIN categories p ON c.parent_id = p.category_id ORDER BY c.path;
Explanation: SELF JOIN joins a table to itself. You MUST use different aliases.
categories table: category_id | name | parent_id ------------|-------------|---------- 1 | Electronics | NULL 2 | Phones | 1 3 | Laptops | 1 4 | Smartphones | 2 Result: category_id | category_name | parent_category_name ------------|---------------|--------------------- 1 | Electronics | NULL 2 | Phones | Electronics 3 | Laptops | Electronics 4 | Smartphones | Phones

Question 38: Employee Hierarchy

Difficulty: Intermediate | Topic: Self JOIN for Hierarchy
Schema:
sql
CREATE TABLE employees ( employee_id SERIAL PRIMARY KEY, name VARCHAR(100), manager_id INT REFERENCES employees(employee_id), department VARCHAR(50), salary DECIMAL(10,2) );
Problem: Get employees with their manager names.
Solution:
sql
SELECT e.employee_id, e.name AS employee_name, e.department, e.salary, m.name AS manager_name FROM employees e LEFT JOIN employees m ON e.manager_id = m.employee_id;
Finding Employees Who Earn More Than Their Manager:
sql
SELECT e.name AS employee_name, e.salary AS employee_salary, m.name AS manager_name, m.salary AS manager_salary FROM employees e INNER JOIN employees m ON e.manager_id = m.employee_id WHERE e.salary > m.salary;

Question 39: Finding Duplicates with SELF JOIN

Difficulty: Intermediate | Topic: Duplicate Detection
Problem: Find users with duplicate email addresses (case-insensitive).
Solution:
sql
SELECT u1.user_id, u1.email, u2.user_id AS duplicate_user_id, u2.email AS duplicate_email FROM users u1 INNER JOIN users u2 ON LOWER(u1.email) = LOWER(u2.email) AND u1.user_id < u2.user_id; -- Avoid self-match and duplicate pairs
Why u1.user_id < u2.user_id? Without it, you'd get:
  • Each row matched with itself
  • Both (A, B) and (B, A) pairs
With <, you only get each pair once.

Section 5: CROSS JOIN

Question 40: Basic CROSS JOIN

Difficulty: Intermediate | Topic: Cartesian Product
Problem: Generate all possible product-color combinations.
Schema:
sql
CREATE TABLE colors ( color_id SERIAL PRIMARY KEY, name VARCHAR(50) ); -- Contains: Red, Blue, Green CREATE TABLE sizes ( size_id SERIAL PRIMARY KEY, name VARCHAR(20) ); -- Contains: S, M, L, XL
Solution:
sql
SELECT c.name AS color, s.name AS size, c.name || '-' || s.name AS variant_code FROM colors c CROSS JOIN sizes s ORDER BY c.name, s.size_id;
Result:
color | size | variant_code ------|------|------------- Blue | S | Blue-S Blue | M | Blue-M Blue | L | Blue-L Blue | XL | Blue-XL Green | S | Green-S ... (12 total rows: 3 colors × 4 sizes)
Performance Warning: CROSS JOIN produces rows = table1_rows × table2_rows.
  • 1000 × 1000 = 1,000,000 rows!
  • Only use with small tables or intentionally
Use Cases:
  • Generating combinations (colors × sizes)
  • Creating date ranges with all dates × all products
  • Simulation data

Question 41: CROSS JOIN for Reporting

Difficulty: Intermediate | Topic: Report Generation
Problem: Generate a sales report with all months, even months with zero sales.
Solution:
sql
-- Generate all months for 2024 WITH months AS ( SELECT generate_series( '2024-01-01'::date, '2024-12-01'::date, '1 month'::interval ) AS month_start ) SELECT TO_CHAR(m.month_start, 'YYYY-MM') AS month, COALESCE(SUM(o.total_amount), 0) AS total_sales, COALESCE(COUNT(o.order_id), 0) AS order_count FROM months m LEFT JOIN orders o ON o.created_at >= m.month_start AND o.created_at < m.month_start + INTERVAL '1 month' AND o.status NOT IN ('cancelled', 'refunded') GROUP BY m.month_start ORDER BY m.month_start;

Section 6: Semi-Join and Anti-Join

Question 42: Semi-Join with EXISTS

Difficulty: Intermediate | Topic: Semi-Join
Problem: Find products that have been ordered at least once.
Solution:
sql
-- Semi-join using EXISTS SELECT p.product_id, p.name, p.price FROM products p WHERE EXISTS ( SELECT 1 FROM order_items oi WHERE oi.product_id = p.product_id );
Semi-Join vs INNER JOIN:
sql
-- INNER JOIN may return duplicates! SELECT p.product_id, p.name FROM products p INNER JOIN order_items oi ON p.product_id = oi.product_id; -- Product "iPhone" appears 1000 times if ordered 1000 times -- Semi-join returns each product once SELECT p.product_id, p.name FROM products p WHERE EXISTS (SELECT 1 FROM order_items oi WHERE oi.product_id = p.product_id); -- Product "iPhone" appears once
Performance: EXISTS stops at the first match. It doesn't need to find all matching rows—just one.

Question 43: Semi-Join with IN

Difficulty: Intermediate | Topic: IN vs EXISTS
Problem: Find customers who ordered in January 2024.
Solution:
sql
-- Using IN SELECT user_id, email, first_name FROM users WHERE user_id IN ( SELECT user_id FROM orders WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01' ); -- Using EXISTS (often faster for large subqueries) SELECT user_id, email, first_name FROM users u WHERE EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.user_id AND o.created_at >= '2024-01-01' AND o.created_at < '2024-02-01' );
When to Use Which:
ScenarioRecommended
Small subquery result (< 1000 rows)IN
Large subquery resultEXISTS
Subquery is a literal listIN
Correlated conditionEXISTS

Question 44: Anti-Join with NOT EXISTS

Difficulty: Intermediate | Topic: Anti-Join
Problem: Find products never ordered.
Solution:
sql
SELECT p.product_id, p.name, p.price, p.created_at FROM products p WHERE NOT EXISTS ( SELECT 1 FROM order_items oi WHERE oi.product_id = p.product_id );
Why NOT EXISTS is Safer Than NOT IN:
sql
-- DANGEROUS: If order_items.product_id has ANY NULL value, -- NOT IN returns zero rows! SELECT * FROM products WHERE product_id NOT IN (SELECT product_id FROM order_items); -- SAFE: NOT EXISTS handles NULLs correctly SELECT * FROM products p WHERE NOT EXISTS (SELECT 1 FROM order_items oi WHERE oi.product_id = p.product_id);

Section 7: Advanced Join Patterns

Question 45: Latest Record Per Group (Top 1 Per Group)

Difficulty: Advanced | Topic: Join Pattern
Problem: Get each user's most recent order.
Solution:
sql
-- Method 1: Correlated subquery SELECT o.* FROM orders o WHERE o.created_at = ( SELECT MAX(o2.created_at) FROM orders o2 WHERE o2.user_id = o.user_id ); -- Method 2: Window function with DISTINCT ON (PostgreSQL) SELECT DISTINCT ON (user_id) order_id, user_id, order_number, total_amount, created_at FROM orders ORDER BY user_id, created_at DESC; -- Method 3: Window function with ROW_NUMBER WITH ranked_orders AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn FROM orders ) SELECT * FROM ranked_orders WHERE rn = 1; -- Method 4: LATERAL join (PostgreSQL, very efficient) SELECT u.user_id, u.email, o.* FROM users u CROSS JOIN LATERAL ( SELECT * FROM orders WHERE user_id = u.user_id ORDER BY created_at DESC LIMIT 1 ) o;
Performance Comparison:
MethodIndex RequiredPerformance
Correlated subquery(user_id, created_at)Slow for large tables
DISTINCT ON(user_id, created_at DESC)Fast, PostgreSQL only
ROW_NUMBER(user_id, created_at)Medium, scans all rows
LATERAL(user_id, created_at DESC)Fast, PostgreSQL only

Question 46: Joining with Date Ranges

Difficulty: Advanced | Topic: Range Joins
Problem: Find the applicable discount for each order based on order date.
Schema:
sql
CREATE TABLE discounts ( discount_id SERIAL PRIMARY KEY, code VARCHAR(50), percentage DECIMAL(5,2), start_date DATE, end_date DATE );
Solution:
sql
SELECT o.order_id, o.order_number, o.total_amount, o.created_at, d.code AS discount_code, d.percentage AS discount_percentage FROM orders o LEFT JOIN discounts d ON o.created_at::date BETWEEN d.start_date AND d.end_date AND d.code = 'SUMMER2024'; -- Or whatever matching criteria
Performance Warning: BETWEEN for date ranges can be slow. Consider using a date dimension table or range types.
Using PostgreSQL Range Types:
sql
ALTER TABLE discounts ADD COLUMN valid_range daterange; UPDATE discounts SET valid_range = daterange(start_date, end_date, '[]'); CREATE INDEX idx_discounts_range ON discounts USING GIST (valid_range); -- Efficient range query SELECT o.*, d.code FROM orders o LEFT JOIN discounts d ON o.created_at::date <@ d.valid_range;

Question 47: Joining on Multiple Conditions

Difficulty: Intermediate | Topic: Composite Joins
Problem: Get order items with product prices at the time of order.
Schema:
sql
CREATE TABLE product_price_history ( history_id SERIAL PRIMARY KEY, product_id BIGINT, price DECIMAL(10,2), effective_from DATE, effective_to DATE );
Solution:
sql
SELECT oi.item_id, oi.order_id, p.name AS product_name, oi.quantity, pph.price AS historical_price, oi.unit_price AS charged_price FROM order_items oi INNER JOIN orders o ON oi.order_id = o.order_id INNER JOIN products p ON oi.product_id = p.product_id LEFT JOIN product_price_history pph ON oi.product_id = pph.product_id AND o.created_at::date >= pph.effective_from AND (pph.effective_to IS NULL OR o.created_at::date <= pph.effective_to);

Question 48: Non-Equi Joins

Difficulty: Advanced | Topic: Non-Equality Joins
Problem: Find all orders that are larger than the customer's average order.
Solution:
sql
WITH customer_avg AS ( SELECT user_id, AVG(total_amount) AS avg_order_value FROM orders WHERE status NOT IN ('cancelled', 'refunded') GROUP BY user_id ) SELECT o.order_id, o.order_number, o.user_id, o.total_amount, ca.avg_order_value, o.total_amount - ca.avg_order_value AS above_average_by FROM orders o INNER JOIN customer_avg ca ON o.user_id = ca.user_id AND o.total_amount > ca.avg_order_value -- Non-equi condition ORDER BY above_average_by DESC;

Question 49: Many-to-Many Join

Difficulty: Intermediate | Topic: Junction Tables
Schema:
sql
CREATE TABLE product_tags ( product_id BIGINT REFERENCES products(product_id), tag_id INT REFERENCES tags(tag_id), PRIMARY KEY (product_id, tag_id) ); CREATE TABLE tags ( tag_id SERIAL PRIMARY KEY, name VARCHAR(50) );
Problem: Get products with all their tags as a comma-separated list.
Solution:
sql
SELECT p.product_id, p.name AS product_name, STRING_AGG(t.name, ', ' ORDER BY t.name) AS tags FROM products p LEFT JOIN product_tags pt ON p.product_id = pt.product_id LEFT JOIN tags t ON pt.tag_id = t.tag_id GROUP BY p.product_id, p.name;
Finding Products with Specific Tags:
sql
-- Products with ALL specified tags SELECT p.product_id, p.name FROM products p INNER JOIN product_tags pt ON p.product_id = pt.product_id INNER JOIN tags t ON pt.tag_id = t.tag_id WHERE t.name IN ('wireless', 'bluetooth', 'portable') GROUP BY p.product_id, p.name HAVING COUNT(DISTINCT t.tag_id) = 3; -- Must have all 3 tags -- Products with ANY of specified tags SELECT DISTINCT p.product_id, p.name FROM products p INNER JOIN product_tags pt ON p.product_id = pt.product_id INNER JOIN tags t ON pt.tag_id = t.tag_id WHERE t.name IN ('wireless', 'bluetooth', 'portable');

Question 50: Avoiding N+1 with JOIN

Difficulty: Advanced | Topic: Performance Pattern
Problem: Get user details with their last 5 orders in a single query.
Bad Pattern (N+1):
python
# Application code - N+1 problem users = db.query("SELECT * FROM users LIMIT 100") for user in users: orders = db.query(f"SELECT * FROM orders WHERE user_id = {user.id} ORDER BY created_at DESC LIMIT 5") # 101 queries total!
Good Pattern (Single Query):
sql
WITH ranked_orders AS ( SELECT o.*, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn FROM orders o ) SELECT u.user_id, u.email, u.first_name, ro.order_id, ro.order_number, ro.total_amount, ro.created_at FROM users u LEFT JOIN ranked_orders ro ON u.user_id = ro.user_id AND ro.rn <= 5 WHERE u.status = 'active' ORDER BY u.user_id, ro.rn; -- 1 query total!
Alternative with LATERAL:
sql
SELECT u.user_id, u.email, o.* FROM users u CROSS JOIN LATERAL ( SELECT order_id, order_number, total_amount, created_at FROM orders WHERE user_id = u.user_id ORDER BY created_at DESC LIMIT 5 ) o WHERE u.status = 'active';

Section 8: Join Performance Deep Dive

Join Algorithm Types

Understanding how databases execute joins helps you optimize them.
NESTED LOOP JOIN: For each row in outer table: For each row in inner table: If join condition matches: Output row Best for: Small tables, indexed inner table Worst for: Large tables without indexes HASH JOIN: 1. Build hash table from smaller table 2. Probe hash table for each row from larger table Best for: Large tables, equality joins Worst for: Non-equality joins, memory-constrained systems MERGE JOIN: 1. Sort both tables on join key 2. Walk through both sorted lists, matching rows Best for: Large sorted data, range conditions Worst for: Unsorted data (sort overhead)

Question: Analyzing Join Performance

How to Read EXPLAIN for Joins:
sql
EXPLAIN ANALYZE SELECT o.order_id, u.email FROM orders o INNER JOIN users u ON o.user_id = u.user_id WHERE o.created_at >= '2024-01-01';
Look for:
Hash Join (cost=... rows=... width=...) Hash Cond: (o.user_id = u.user_id) -> Seq Scan on orders o (cost=...) Filter: (created_at >= '2024-01-01') -> Hash (cost=...) -> Seq Scan on users u (cost=...)
Key Indicators:
  • Seq Scan → No index used, consider adding one
  • Nested Loop with large outer table → May be slow
  • Hash → Building hash table, check memory
  • Sort → Sorting for merge join, check if index can help

Summary: Part 2 Key Takeaways

Join Types Quick Reference

Join TypeReturnsUse When
INNER JOINMatching rows onlyNeed data from both tables
LEFT JOINAll left + matching rightInclude non-matching rows from left
RIGHT JOINAll right + matching leftRare, use LEFT JOIN instead
FULL OUTER JOINAll rows from bothData reconciliation
CROSS JOINCartesian productGenerate combinations
SELF JOINTable joined to itselfHierarchies, duplicates
Semi-Join (EXISTS)Rows with matchesCheck existence
Anti-Join (NOT EXISTS)Rows without matchesFind missing data

Performance Rules

  1. Always index join columns - Foreign keys, join conditions
  2. Filter early - Apply WHERE before joining when possible
  3. **Avoid SELECT *** - Only select needed columns
  4. LEFT JOIN filter placement - Use ON for outer table filters
  5. Avoid NOT IN with NULLs - Use NOT EXISTS instead
  6. Use EXISTS over IN - For large subqueries
  7. Check EXPLAIN plans - Verify join algorithms

Indexing for Joins

sql
-- Index the foreign key (child table) CREATE INDEX idx_orders_user_id ON orders(user_id); CREATE INDEX idx_order_items_order_id ON order_items(order_id); CREATE INDEX idx_order_items_product_id ON order_items(product_id); -- Composite index for filtered joins CREATE INDEX idx_orders_user_created ON orders(user_id, created_at DESC);

Coming in Part 3: Aggregations and GROUP BY

  • COUNT, SUM, AVG, MIN, MAX
  • GROUP BY mechanics
  • HAVING vs WHERE
  • Window functions introduction
  • Partitioning and ranking
  • Rolling calculations

This is Part 2 of the SQL Mastery Guide series. Practice these join patterns—they appear in almost every production query.
All Blogs
Tags:sqljoinsdatabaseperformanceinterview