SQL Mastery Guide Part 2: Joins Mastery
Why Joins Matter
Joins are the heart of relational databases. They're what make "relational" meaningful—connecting data across tables based on relationships.
After 20+ years of database work, I can tell you: understanding joins deeply separates junior developers from senior engineers. This part covers every join type with real performance implications.
We continue with our e-commerce schema from Part 1.
Section 1: INNER JOIN
Question 26: Basic INNER JOIN
Difficulty: Beginner | Topic: INNER JOIN Basics
Problem: Get all orders with their customer information.
Solution:
sqlSELECT o.order_id, o.order_number, o.total_amount, o.created_at, u.user_id, u.email, u.first_name, u.last_name FROM orders o INNER JOIN users u ON o.user_id = u.user_id;
Explanation:
INNER JOIN returns only rows where the join condition matches in BOTH tables.orders table: users table: Result: order_id | user_id user_id | name order_id | user_id | name ---------|-------- --------|----- ---------|---------|----- 1 | 100 100 | Alice → 1 | 100 | Alice 2 | 101 101 | Bob → 2 | 101 | Bob 3 | 102 103 | Charlie 3 | (no match, excluded)
Order 3's user_id (102) has no match in users, so it's excluded.
Common Mistakes:
sql-- Missing table alias causes ambiguous column error SELECT order_id, user_id, email -- Which user_id? FROM orders INNER JOIN users ON orders.user_id = users.user_id; -- CORRECT: Use aliases SELECT o.order_id, o.user_id, u.email FROM orders o INNER JOIN users u ON o.user_id = u.user_id;
Performance Concerns:
With 50M orders and 10M users:
- Without index: Full table scans, comparing every row combination
- With index on join column: Index lookup, dramatically faster
Indexing Suggestion:
sql-- Index on the column being joined TO CREATE INDEX idx_orders_user_id ON orders(user_id); -- The primary key on users.user_id is already indexed
Question 27: Multi-Table INNER JOIN
Difficulty: Intermediate | Topic: Chained Joins
Problem: Get order details including customer name, product name, and quantity.
Solution:
sqlSELECT o.order_number, u.first_name || ' ' || u.last_name AS customer_name, p.name AS product_name, oi.quantity, oi.unit_price, oi.total_price FROM orders o INNER JOIN users u ON o.user_id = u.user_id INNER JOIN order_items oi ON o.order_id = oi.order_id INNER JOIN products p ON oi.product_id = p.product_id;
Explanation:
You can chain multiple joins. Each join connects to the result of the previous joins.
Join Order:
orders → users (via user_id) → order_items (via order_id) → products (via product_id)
Performance Concerns:
The optimizer decides join order. Sometimes it gets it wrong.
Checking Join Order:
sqlEXPLAIN ANALYZE SELECT ... -- your query
Look for the join order in the plan. If a large table is scanned first, consider query hints or restructuring.
When Join Order Matters:
- Start with the most selective filter
- Join smaller result sets first
- Avoid Cartesian products in intermediate steps
Question 28: INNER JOIN with Filter
Difficulty: Intermediate | Topic: Filtered Joins
Problem: Get active products with their category names.
Solution:
sqlSELECT p.product_id, p.name AS product_name, p.price, c.name AS category_name FROM products p INNER JOIN categories c ON p.category_id = c.category_id WHERE p.status = 'active' AND c.is_active = TRUE;
WHERE vs JOIN Condition:
sql-- Filter in WHERE (standard approach) FROM products p INNER JOIN categories c ON p.category_id = c.category_id WHERE p.status = 'active'; -- Filter in JOIN condition (same result for INNER JOIN) FROM products p INNER JOIN categories c ON p.category_id = c.category_id AND p.status = 'active';
For
INNER JOIN, both produce the same result. For LEFT JOIN, they're different! (See Question 34)Question 29: Self-Referencing Data with JOIN
Difficulty: Intermediate | Topic: Join Patterns
Problem: Get products with their category hierarchy (category and parent category).
Solution:
sqlSELECT p.product_id, p.name AS product_name, c.name AS category, parent.name AS parent_category FROM products p INNER JOIN categories c ON p.category_id = c.category_id LEFT JOIN categories parent ON c.parent_id = parent.category_id;
Why LEFT JOIN for parent?
Top-level categories have no parent (
parent_id IS NULL). We still want those products.Section 2: LEFT JOIN (LEFT OUTER JOIN)
Question 30: Basic LEFT JOIN
Difficulty: Beginner | Topic: LEFT JOIN Basics
Problem: Get all users and their order counts (including users with no orders).
Solution:
sqlSELECT u.user_id, u.email, u.first_name, COUNT(o.order_id) AS order_count FROM users u LEFT JOIN orders o ON u.user_id = o.user_id GROUP BY u.user_id, u.email, u.first_name;
Explanation:
LEFT JOIN returns ALL rows from the left table, even if there's no match in the right table.users table: orders table: Result: user_id | name order_id | user_id user_id | name | order_id --------|----- ---------|-------- --------|---------|--------- 100 | Alice 1 | 100 100 | Alice | 1 101 | Bob 2 | 100 100 | Alice | 2 102 | Charlie 101 | Bob | NULL 102 | Charlie | NULL
Alice has 2 orders. Bob and Charlie have NULL (no orders).
COUNT Behavior:
COUNT(*)counts all rows (including NULLs) → would show 1 for BobCOUNT(o.order_id)counts non-NULL values → shows 0 for Bob
Question 31: LEFT JOIN Anti-Pattern (Finding Missing Data)
Difficulty: Intermediate | Topic: Anti-Join
Problem: Find users who have never placed an order.
Solution:
sqlSELECT u.user_id, u.email, u.first_name FROM users u LEFT JOIN orders o ON u.user_id = o.user_id WHERE o.order_id IS NULL;
Explanation:
LEFT JOINkeeps all users- Users without orders have NULL in all orders columns
WHERE o.order_id IS NULLfilters to only those users
Why Check order_id, Not user_id?
sql-- WRONG: This might match NULLs that exist in orders WHERE o.user_id IS NULL -- CORRECT: Primary key is never NULL in a matched row WHERE o.order_id IS NULL
Alternative Approaches:
sql-- NOT EXISTS (often faster) SELECT u.user_id, u.email, u.first_name FROM users u WHERE NOT EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.user_id ); -- NOT IN (risky with NULLs) SELECT user_id, email, first_name FROM users WHERE user_id NOT IN (SELECT user_id FROM orders WHERE user_id IS NOT NULL);
Question 32: LEFT JOIN with Aggregation
Difficulty: Intermediate | Topic: Aggregate with Joins
Problem: Get each customer's total spending (including customers who never bought).
Solution:
sqlSELECT u.user_id, u.email, u.first_name, COALESCE(SUM(o.total_amount), 0) AS total_spent, COUNT(o.order_id) AS order_count, COALESCE(AVG(o.total_amount), 0) AS avg_order_value FROM users u LEFT JOIN orders o ON u.user_id = o.user_id AND o.status NOT IN ('cancelled', 'refunded') -- Filter in JOIN! GROUP BY u.user_id, u.email, u.first_name ORDER BY total_spent DESC;
Why Filter in JOIN, Not WHERE?
sql-- WRONG: This excludes users with no valid orders FROM users u LEFT JOIN orders o ON u.user_id = o.user_id WHERE o.status NOT IN ('cancelled', 'refunded'); -- Removes NULL rows! -- CORRECT: Filter in JOIN keeps users with no orders FROM users u LEFT JOIN orders o ON u.user_id = o.user_id AND o.status NOT IN ('cancelled', 'refunded');
Question 33: Multiple LEFT JOINs
Difficulty: Intermediate | Topic: Chained LEFT JOINs
Problem: Get product information with optional brand and category (some products may have NULL brand_id).
Solution:
sqlSELECT p.product_id, p.name AS product_name, p.price, COALESCE(c.name, 'Uncategorized') AS category, COALESCE(b.name, 'No Brand') AS brand FROM products p LEFT JOIN categories c ON p.category_id = c.category_id LEFT JOIN brands b ON p.brand_id = b.brand_id WHERE p.status = 'active';
Performance Concerns:
Multiple LEFT JOINs can create large intermediate result sets. Each join multiplies potential rows.
Question 34: LEFT JOIN Filter Location Trap
Difficulty: Advanced | Topic: Join Logic
Problem: Get all users with their orders from 2024 (users without 2024 orders should still appear).
Solution:
sql-- WRONG: Filters out users with no 2024 orders SELECT u.user_id, u.email, o.order_id, o.created_at FROM users u LEFT JOIN orders o ON u.user_id = o.user_id WHERE o.created_at >= '2024-01-01'; -- This removes NULL rows! -- CORRECT: Filter in JOIN condition SELECT u.user_id, u.email, o.order_id, o.created_at FROM users u LEFT JOIN orders o ON u.user_id = o.user_id AND o.created_at >= '2024-01-01';
Visual Explanation:
WRONG approach (WHERE): users: orders: After LEFT JOIN: After WHERE: user_id user_id | date user_id | date user_id | date ------- --------|----- --------|----- --------|----- 1 1 | 2023-05 1 | 2023-05 (excluded - 2023) 2 1 | 2024-03 1 | 2024-03 → 1 | 2024-03 2 | 2023-08 2 | 2023-08 (excluded - 2023) (user 2 disappears!) CORRECT approach (AND in JOIN): After LEFT JOIN with date filter: user_id | date --------|----- 1 | 2024-03 (2023 order filtered in JOIN) 2 | NULL (no 2024 orders, but user kept!)
Section 3: RIGHT JOIN and FULL JOIN
Question 35: RIGHT JOIN
Difficulty: Intermediate | Topic: RIGHT JOIN
Problem: Get all products and their order counts (including products never ordered).
Solution:
sql-- Using RIGHT JOIN (less common) SELECT p.product_id, p.name, COUNT(oi.item_id) AS times_ordered FROM order_items oi RIGHT JOIN products p ON oi.product_id = p.product_id GROUP BY p.product_id, p.name; -- Equivalent LEFT JOIN (more common, recommended) SELECT p.product_id, p.name, COUNT(oi.item_id) AS times_ordered FROM products p LEFT JOIN order_items oi ON p.product_id = oi.product_id GROUP BY p.product_id, p.name;
When to Use RIGHT JOIN:
Almost never. You can always rewrite as LEFT JOIN by swapping table order. LEFT JOIN is more readable because we read left-to-right.
Question 36: FULL OUTER JOIN
Difficulty: Intermediate | Topic: FULL JOIN
Problem: Find mismatches between two inventory systems.
Schema for this question:
sqlCREATE TABLE warehouse_inventory ( sku VARCHAR(50) PRIMARY KEY, quantity INT ); CREATE TABLE system_inventory ( sku VARCHAR(50) PRIMARY KEY, quantity INT );
Solution:
sqlSELECT COALESCE(w.sku, s.sku) AS sku, w.quantity AS warehouse_qty, s.quantity AS system_qty, CASE WHEN w.sku IS NULL THEN 'In system only' WHEN s.sku IS NULL THEN 'In warehouse only' WHEN w.quantity != s.quantity THEN 'Quantity mismatch' ELSE 'Match' END AS status FROM warehouse_inventory w FULL OUTER JOIN system_inventory s ON w.sku = s.sku WHERE w.sku IS NULL OR s.sku IS NULL OR w.quantity != s.quantity;
Explanation:
FULL OUTER JOIN returns ALL rows from BOTH tables, matching where possible.warehouse: system: Result: sku | qty sku | qty sku | w_qty | s_qty | status -----|--- -----|--- -----|-------|-------|------ A | 10 A | 10 → A | 10 | 10 | Match B | 20 B | 15 → B | 20 | 15 | Mismatch C | 5 → C | 5 | NULL | Warehouse only D | 8 → D | NULL | 8 | System only
When to Use FULL JOIN:
- Data reconciliation between systems
- Finding orphaned records
- Merge operations
MySQL Note:
MySQL doesn't support
FULL OUTER JOIN. Workaround:sqlSELECT * FROM warehouse_inventory w LEFT JOIN system_inventory s ON w.sku = s.sku UNION SELECT * FROM warehouse_inventory w RIGHT JOIN system_inventory s ON w.sku = s.sku;
Section 4: SELF JOIN
Question 37: Basic SELF JOIN
Difficulty: Intermediate | Topic: Self-Referencing Tables
Problem: Get categories with their parent category names.
Solution:
sqlSELECT c.category_id, c.name AS category_name, c.level, p.name AS parent_category_name FROM categories c LEFT JOIN categories p ON c.parent_id = p.category_id ORDER BY c.path;
Explanation:
SELF JOIN joins a table to itself. You MUST use different aliases.
categories table: category_id | name | parent_id ------------|-------------|---------- 1 | Electronics | NULL 2 | Phones | 1 3 | Laptops | 1 4 | Smartphones | 2 Result: category_id | category_name | parent_category_name ------------|---------------|--------------------- 1 | Electronics | NULL 2 | Phones | Electronics 3 | Laptops | Electronics 4 | Smartphones | Phones
Question 38: Employee Hierarchy
Difficulty: Intermediate | Topic: Self JOIN for Hierarchy
Schema:
sqlCREATE TABLE employees ( employee_id SERIAL PRIMARY KEY, name VARCHAR(100), manager_id INT REFERENCES employees(employee_id), department VARCHAR(50), salary DECIMAL(10,2) );
Problem: Get employees with their manager names.
Solution:
sqlSELECT e.employee_id, e.name AS employee_name, e.department, e.salary, m.name AS manager_name FROM employees e LEFT JOIN employees m ON e.manager_id = m.employee_id;
Finding Employees Who Earn More Than Their Manager:
sqlSELECT e.name AS employee_name, e.salary AS employee_salary, m.name AS manager_name, m.salary AS manager_salary FROM employees e INNER JOIN employees m ON e.manager_id = m.employee_id WHERE e.salary > m.salary;
Question 39: Finding Duplicates with SELF JOIN
Difficulty: Intermediate | Topic: Duplicate Detection
Problem: Find users with duplicate email addresses (case-insensitive).
Solution:
sqlSELECT u1.user_id, u1.email, u2.user_id AS duplicate_user_id, u2.email AS duplicate_email FROM users u1 INNER JOIN users u2 ON LOWER(u1.email) = LOWER(u2.email) AND u1.user_id < u2.user_id; -- Avoid self-match and duplicate pairs
Why
u1.user_id < u2.user_id?
Without it, you'd get:- Each row matched with itself
- Both (A, B) and (B, A) pairs
With
<, you only get each pair once.Section 5: CROSS JOIN
Question 40: Basic CROSS JOIN
Difficulty: Intermediate | Topic: Cartesian Product
Problem: Generate all possible product-color combinations.
Schema:
sqlCREATE TABLE colors ( color_id SERIAL PRIMARY KEY, name VARCHAR(50) ); -- Contains: Red, Blue, Green CREATE TABLE sizes ( size_id SERIAL PRIMARY KEY, name VARCHAR(20) ); -- Contains: S, M, L, XL
Solution:
sqlSELECT c.name AS color, s.name AS size, c.name || '-' || s.name AS variant_code FROM colors c CROSS JOIN sizes s ORDER BY c.name, s.size_id;
Result:
color | size | variant_code ------|------|------------- Blue | S | Blue-S Blue | M | Blue-M Blue | L | Blue-L Blue | XL | Blue-XL Green | S | Green-S ... (12 total rows: 3 colors × 4 sizes)
Performance Warning:
CROSS JOIN produces rows = table1_rows × table2_rows.
- 1000 × 1000 = 1,000,000 rows!
- Only use with small tables or intentionally
Use Cases:
- Generating combinations (colors × sizes)
- Creating date ranges with all dates × all products
- Simulation data
Question 41: CROSS JOIN for Reporting
Difficulty: Intermediate | Topic: Report Generation
Problem: Generate a sales report with all months, even months with zero sales.
Solution:
sql-- Generate all months for 2024 WITH months AS ( SELECT generate_series( '2024-01-01'::date, '2024-12-01'::date, '1 month'::interval ) AS month_start ) SELECT TO_CHAR(m.month_start, 'YYYY-MM') AS month, COALESCE(SUM(o.total_amount), 0) AS total_sales, COALESCE(COUNT(o.order_id), 0) AS order_count FROM months m LEFT JOIN orders o ON o.created_at >= m.month_start AND o.created_at < m.month_start + INTERVAL '1 month' AND o.status NOT IN ('cancelled', 'refunded') GROUP BY m.month_start ORDER BY m.month_start;
Section 6: Semi-Join and Anti-Join
Question 42: Semi-Join with EXISTS
Difficulty: Intermediate | Topic: Semi-Join
Problem: Find products that have been ordered at least once.
Solution:
sql-- Semi-join using EXISTS SELECT p.product_id, p.name, p.price FROM products p WHERE EXISTS ( SELECT 1 FROM order_items oi WHERE oi.product_id = p.product_id );
Semi-Join vs INNER JOIN:
sql-- INNER JOIN may return duplicates! SELECT p.product_id, p.name FROM products p INNER JOIN order_items oi ON p.product_id = oi.product_id; -- Product "iPhone" appears 1000 times if ordered 1000 times -- Semi-join returns each product once SELECT p.product_id, p.name FROM products p WHERE EXISTS (SELECT 1 FROM order_items oi WHERE oi.product_id = p.product_id); -- Product "iPhone" appears once
Performance:
EXISTS stops at the first match. It doesn't need to find all matching rows—just one.Question 43: Semi-Join with IN
Difficulty: Intermediate | Topic: IN vs EXISTS
Problem: Find customers who ordered in January 2024.
Solution:
sql-- Using IN SELECT user_id, email, first_name FROM users WHERE user_id IN ( SELECT user_id FROM orders WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01' ); -- Using EXISTS (often faster for large subqueries) SELECT user_id, email, first_name FROM users u WHERE EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.user_id AND o.created_at >= '2024-01-01' AND o.created_at < '2024-02-01' );
When to Use Which:
| Scenario | Recommended |
|---|---|
| Small subquery result (< 1000 rows) | IN |
| Large subquery result | EXISTS |
| Subquery is a literal list | IN |
| Correlated condition | EXISTS |
Question 44: Anti-Join with NOT EXISTS
Difficulty: Intermediate | Topic: Anti-Join
Problem: Find products never ordered.
Solution:
sqlSELECT p.product_id, p.name, p.price, p.created_at FROM products p WHERE NOT EXISTS ( SELECT 1 FROM order_items oi WHERE oi.product_id = p.product_id );
Why NOT EXISTS is Safer Than NOT IN:
sql-- DANGEROUS: If order_items.product_id has ANY NULL value, -- NOT IN returns zero rows! SELECT * FROM products WHERE product_id NOT IN (SELECT product_id FROM order_items); -- SAFE: NOT EXISTS handles NULLs correctly SELECT * FROM products p WHERE NOT EXISTS (SELECT 1 FROM order_items oi WHERE oi.product_id = p.product_id);
Section 7: Advanced Join Patterns
Question 45: Latest Record Per Group (Top 1 Per Group)
Difficulty: Advanced | Topic: Join Pattern
Problem: Get each user's most recent order.
Solution:
sql-- Method 1: Correlated subquery SELECT o.* FROM orders o WHERE o.created_at = ( SELECT MAX(o2.created_at) FROM orders o2 WHERE o2.user_id = o.user_id ); -- Method 2: Window function with DISTINCT ON (PostgreSQL) SELECT DISTINCT ON (user_id) order_id, user_id, order_number, total_amount, created_at FROM orders ORDER BY user_id, created_at DESC; -- Method 3: Window function with ROW_NUMBER WITH ranked_orders AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn FROM orders ) SELECT * FROM ranked_orders WHERE rn = 1; -- Method 4: LATERAL join (PostgreSQL, very efficient) SELECT u.user_id, u.email, o.* FROM users u CROSS JOIN LATERAL ( SELECT * FROM orders WHERE user_id = u.user_id ORDER BY created_at DESC LIMIT 1 ) o;
Performance Comparison:
| Method | Index Required | Performance |
|---|---|---|
| Correlated subquery | (user_id, created_at) | Slow for large tables |
| DISTINCT ON | (user_id, created_at DESC) | Fast, PostgreSQL only |
| ROW_NUMBER | (user_id, created_at) | Medium, scans all rows |
| LATERAL | (user_id, created_at DESC) | Fast, PostgreSQL only |
Question 46: Joining with Date Ranges
Difficulty: Advanced | Topic: Range Joins
Problem: Find the applicable discount for each order based on order date.
Schema:
sqlCREATE TABLE discounts ( discount_id SERIAL PRIMARY KEY, code VARCHAR(50), percentage DECIMAL(5,2), start_date DATE, end_date DATE );
Solution:
sqlSELECT o.order_id, o.order_number, o.total_amount, o.created_at, d.code AS discount_code, d.percentage AS discount_percentage FROM orders o LEFT JOIN discounts d ON o.created_at::date BETWEEN d.start_date AND d.end_date AND d.code = 'SUMMER2024'; -- Or whatever matching criteria
Performance Warning:
BETWEEN for date ranges can be slow. Consider using a date dimension table or range types.Using PostgreSQL Range Types:
sqlALTER TABLE discounts ADD COLUMN valid_range daterange; UPDATE discounts SET valid_range = daterange(start_date, end_date, '[]'); CREATE INDEX idx_discounts_range ON discounts USING GIST (valid_range); -- Efficient range query SELECT o.*, d.code FROM orders o LEFT JOIN discounts d ON o.created_at::date <@ d.valid_range;
Question 47: Joining on Multiple Conditions
Difficulty: Intermediate | Topic: Composite Joins
Problem: Get order items with product prices at the time of order.
Schema:
sqlCREATE TABLE product_price_history ( history_id SERIAL PRIMARY KEY, product_id BIGINT, price DECIMAL(10,2), effective_from DATE, effective_to DATE );
Solution:
sqlSELECT oi.item_id, oi.order_id, p.name AS product_name, oi.quantity, pph.price AS historical_price, oi.unit_price AS charged_price FROM order_items oi INNER JOIN orders o ON oi.order_id = o.order_id INNER JOIN products p ON oi.product_id = p.product_id LEFT JOIN product_price_history pph ON oi.product_id = pph.product_id AND o.created_at::date >= pph.effective_from AND (pph.effective_to IS NULL OR o.created_at::date <= pph.effective_to);
Question 48: Non-Equi Joins
Difficulty: Advanced | Topic: Non-Equality Joins
Problem: Find all orders that are larger than the customer's average order.
Solution:
sqlWITH customer_avg AS ( SELECT user_id, AVG(total_amount) AS avg_order_value FROM orders WHERE status NOT IN ('cancelled', 'refunded') GROUP BY user_id ) SELECT o.order_id, o.order_number, o.user_id, o.total_amount, ca.avg_order_value, o.total_amount - ca.avg_order_value AS above_average_by FROM orders o INNER JOIN customer_avg ca ON o.user_id = ca.user_id AND o.total_amount > ca.avg_order_value -- Non-equi condition ORDER BY above_average_by DESC;
Question 49: Many-to-Many Join
Difficulty: Intermediate | Topic: Junction Tables
Schema:
sqlCREATE TABLE product_tags ( product_id BIGINT REFERENCES products(product_id), tag_id INT REFERENCES tags(tag_id), PRIMARY KEY (product_id, tag_id) ); CREATE TABLE tags ( tag_id SERIAL PRIMARY KEY, name VARCHAR(50) );
Problem: Get products with all their tags as a comma-separated list.
Solution:
sqlSELECT p.product_id, p.name AS product_name, STRING_AGG(t.name, ', ' ORDER BY t.name) AS tags FROM products p LEFT JOIN product_tags pt ON p.product_id = pt.product_id LEFT JOIN tags t ON pt.tag_id = t.tag_id GROUP BY p.product_id, p.name;
Finding Products with Specific Tags:
sql-- Products with ALL specified tags SELECT p.product_id, p.name FROM products p INNER JOIN product_tags pt ON p.product_id = pt.product_id INNER JOIN tags t ON pt.tag_id = t.tag_id WHERE t.name IN ('wireless', 'bluetooth', 'portable') GROUP BY p.product_id, p.name HAVING COUNT(DISTINCT t.tag_id) = 3; -- Must have all 3 tags -- Products with ANY of specified tags SELECT DISTINCT p.product_id, p.name FROM products p INNER JOIN product_tags pt ON p.product_id = pt.product_id INNER JOIN tags t ON pt.tag_id = t.tag_id WHERE t.name IN ('wireless', 'bluetooth', 'portable');
Question 50: Avoiding N+1 with JOIN
Difficulty: Advanced | Topic: Performance Pattern
Problem: Get user details with their last 5 orders in a single query.
Bad Pattern (N+1):
python# Application code - N+1 problem users = db.query("SELECT * FROM users LIMIT 100") for user in users: orders = db.query(f"SELECT * FROM orders WHERE user_id = {user.id} ORDER BY created_at DESC LIMIT 5") # 101 queries total!
Good Pattern (Single Query):
sqlWITH ranked_orders AS ( SELECT o.*, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn FROM orders o ) SELECT u.user_id, u.email, u.first_name, ro.order_id, ro.order_number, ro.total_amount, ro.created_at FROM users u LEFT JOIN ranked_orders ro ON u.user_id = ro.user_id AND ro.rn <= 5 WHERE u.status = 'active' ORDER BY u.user_id, ro.rn; -- 1 query total!
Alternative with LATERAL:
sqlSELECT u.user_id, u.email, o.* FROM users u CROSS JOIN LATERAL ( SELECT order_id, order_number, total_amount, created_at FROM orders WHERE user_id = u.user_id ORDER BY created_at DESC LIMIT 5 ) o WHERE u.status = 'active';
Section 8: Join Performance Deep Dive
Join Algorithm Types
Understanding how databases execute joins helps you optimize them.
NESTED LOOP JOIN: For each row in outer table: For each row in inner table: If join condition matches: Output row Best for: Small tables, indexed inner table Worst for: Large tables without indexes HASH JOIN: 1. Build hash table from smaller table 2. Probe hash table for each row from larger table Best for: Large tables, equality joins Worst for: Non-equality joins, memory-constrained systems MERGE JOIN: 1. Sort both tables on join key 2. Walk through both sorted lists, matching rows Best for: Large sorted data, range conditions Worst for: Unsorted data (sort overhead)
Question: Analyzing Join Performance
How to Read EXPLAIN for Joins:
sqlEXPLAIN ANALYZE SELECT o.order_id, u.email FROM orders o INNER JOIN users u ON o.user_id = u.user_id WHERE o.created_at >= '2024-01-01';
Look for:
Hash Join (cost=... rows=... width=...) Hash Cond: (o.user_id = u.user_id) -> Seq Scan on orders o (cost=...) Filter: (created_at >= '2024-01-01') -> Hash (cost=...) -> Seq Scan on users u (cost=...)
Key Indicators:
Seq Scan→ No index used, consider adding oneNested Loopwith large outer table → May be slowHash→ Building hash table, check memorySort→ Sorting for merge join, check if index can help
Summary: Part 2 Key Takeaways
Join Types Quick Reference
| Join Type | Returns | Use When |
|---|---|---|
| INNER JOIN | Matching rows only | Need data from both tables |
| LEFT JOIN | All left + matching right | Include non-matching rows from left |
| RIGHT JOIN | All right + matching left | Rare, use LEFT JOIN instead |
| FULL OUTER JOIN | All rows from both | Data reconciliation |
| CROSS JOIN | Cartesian product | Generate combinations |
| SELF JOIN | Table joined to itself | Hierarchies, duplicates |
| Semi-Join (EXISTS) | Rows with matches | Check existence |
| Anti-Join (NOT EXISTS) | Rows without matches | Find missing data |
Performance Rules
- Always index join columns - Foreign keys, join conditions
- Filter early - Apply WHERE before joining when possible
- **Avoid SELECT *** - Only select needed columns
- LEFT JOIN filter placement - Use ON for outer table filters
- Avoid NOT IN with NULLs - Use NOT EXISTS instead
- Use EXISTS over IN - For large subqueries
- Check EXPLAIN plans - Verify join algorithms
Indexing for Joins
sql-- Index the foreign key (child table) CREATE INDEX idx_orders_user_id ON orders(user_id); CREATE INDEX idx_order_items_order_id ON order_items(order_id); CREATE INDEX idx_order_items_product_id ON order_items(product_id); -- Composite index for filtered joins CREATE INDEX idx_orders_user_created ON orders(user_id, created_at DESC);
Coming in Part 3: Aggregations and GROUP BY
- COUNT, SUM, AVG, MIN, MAX
- GROUP BY mechanics
- HAVING vs WHERE
- Window functions introduction
- Partitioning and ranking
- Rolling calculations
This is Part 2 of the SQL Mastery Guide series. Practice these join patterns—they appear in almost every production query.