SQL Mastery Guide Part 1: Foundations
Introduction: How to Use This Guide
After 20+ years of designing databases for fintech, e-commerce, and high-scale platforms, I've seen thousands of engineers struggle with SQL—not because it's hard, but because they learned it wrong.
Most tutorials teach you SELECT * FROM table and call it a day. Real production databases have millions of rows. That SELECT * would bring your server to its knees.
This guide is different. Every query comes with:
- The problem to solve
- The solution
- Why it works
- What can go wrong
- How to make it fast
- When NOT to use this pattern
Practice Schema: E-Commerce Platform
We'll use this schema throughout the series:
sql-- Users table: 10 million users CREATE TABLE users ( user_id BIGSERIAL PRIMARY KEY, email VARCHAR(255) NOT NULL UNIQUE, username VARCHAR(50) NOT NULL UNIQUE, password_hash VARCHAR(255) NOT NULL, first_name VARCHAR(100), last_name VARCHAR(100), phone VARCHAR(20), status VARCHAR(20) DEFAULT 'active', -- active, suspended, deleted email_verified BOOLEAN DEFAULT FALSE, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, last_login_at TIMESTAMP, login_count INT DEFAULT 0 ); -- Products table: 500,000 products CREATE TABLE products ( product_id BIGSERIAL PRIMARY KEY, sku VARCHAR(50) NOT NULL UNIQUE, name VARCHAR(255) NOT NULL, description TEXT, category_id INT REFERENCES categories(category_id), brand_id INT REFERENCES brands(brand_id), price DECIMAL(10,2) NOT NULL, cost_price DECIMAL(10,2), stock_quantity INT DEFAULT 0, status VARCHAR(20) DEFAULT 'active', -- active, discontinued, out_of_stock weight_kg DECIMAL(8,3), created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); -- Orders table: 50 million orders CREATE TABLE orders ( order_id BIGSERIAL PRIMARY KEY, order_number VARCHAR(20) NOT NULL UNIQUE, user_id BIGINT NOT NULL REFERENCES users(user_id), status VARCHAR(20) NOT NULL, -- pending, paid, shipped, delivered, cancelled, refunded subtotal DECIMAL(12,2) NOT NULL, tax_amount DECIMAL(12,2) DEFAULT 0, shipping_amount DECIMAL(12,2) DEFAULT 0, discount_amount DECIMAL(12,2) DEFAULT 0, total_amount DECIMAL(12,2) NOT NULL, currency CHAR(3) DEFAULT 'USD', shipping_address_id BIGINT, billing_address_id BIGINT, payment_method VARCHAR(50), paid_at TIMESTAMP, shipped_at TIMESTAMP, delivered_at TIMESTAMP, cancelled_at TIMESTAMP, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); -- Order Items table: 150 million items CREATE TABLE order_items ( item_id BIGSERIAL PRIMARY KEY, order_id BIGINT NOT NULL REFERENCES orders(order_id), product_id BIGINT NOT NULL REFERENCES products(product_id), quantity INT NOT NULL, unit_price DECIMAL(10,2) NOT NULL, total_price DECIMAL(12,2) NOT NULL, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); -- Categories table: 500 categories CREATE TABLE categories ( category_id SERIAL PRIMARY KEY, name VARCHAR(100) NOT NULL, parent_id INT REFERENCES categories(category_id), level INT DEFAULT 1, path VARCHAR(500), -- e.g., '/electronics/phones/smartphones' is_active BOOLEAN DEFAULT TRUE ); -- Payments table: 40 million payments CREATE TABLE payments ( payment_id BIGSERIAL PRIMARY KEY, order_id BIGINT NOT NULL REFERENCES orders(order_id), amount DECIMAL(12,2) NOT NULL, currency CHAR(3) DEFAULT 'USD', method VARCHAR(50) NOT NULL, -- credit_card, debit_card, upi, wallet status VARCHAR(20) NOT NULL, -- pending, completed, failed, refunded gateway_ref VARCHAR(100), error_message TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, processed_at TIMESTAMP );
Section 1: SELECT Basics
Question 1: Select All Columns
Difficulty: Beginner | Topic: Basic SELECT
Problem: Retrieve all information about all products.
Solution:
sql-- The naive approach (AVOID in production) SELECT * FROM products; -- The correct approach SELECT product_id, sku, name, description, category_id, brand_id, price, cost_price, stock_quantity, status, weight_kg, created_at, updated_at FROM products;
Explanation:
SELECT * fetches every column. While convenient for exploration, it's problematic in production:- Fetches unnecessary data (like large TEXT descriptions)
- Breaks when schema changes (new columns appear unexpectedly)
- Prevents covering indexes from being used
Common Mistakes:
- Using
SELECT *in application code - Not considering column order for readability
Performance Concerns:
- With 500,000 products and a TEXT description column,
SELECT *might transfer gigabytes of data - Network becomes the bottleneck
When NOT to Use:
- Never use
SELECT *in production application code - Acceptable only for ad-hoc exploration in a SQL client
Question 2: Select Specific Columns
Difficulty: Beginner | Topic: Column Selection
Problem: Get product names and prices for a product catalog page.
Solution:
sqlSELECT product_id, name, price FROM products;
Explanation:
Only select what you need. For a catalog page, you need ID (for links), name, and price. Not description, not stock, not dates.
Performance Concerns:
- Fewer columns = less data transfer
- Enables use of covering indexes (we'll cover this in Part 6)
Indexing Suggestion:
sql-- A covering index for catalog queries CREATE INDEX idx_products_catalog ON products(product_id, name, price);
Question 3: Column Aliases
Difficulty: Beginner | Topic: Aliases
Problem: Get product information with clearer column names for API response.
Solution:
sqlSELECT product_id AS id, name AS product_name, price AS unit_price, stock_quantity AS available_stock FROM products;
Explanation:
Aliases rename columns in the result set. Useful when:
- Column names are unclear (
sku→product_code) - Preparing data for specific API formats
- Using expressions (we'll see later)
Common Mistakes:
sql-- WRONG: Can't use alias in WHERE (in most databases) SELECT name AS product_name FROM products WHERE product_name = 'iPhone'; -- CORRECT: Use original column name in WHERE SELECT name AS product_name FROM products WHERE name = 'iPhone';
Question 4: Calculated Columns
Difficulty: Beginner | Topic: Expressions
Problem: Show products with their profit margin (price - cost_price).
Solution:
sqlSELECT product_id, name, price, cost_price, (price - cost_price) AS profit, ROUND((price - cost_price) / price * 100, 2) AS profit_margin_percent FROM products WHERE cost_price IS NOT NULL;
Explanation:
You can perform calculations in SELECT:
- Arithmetic:
+,-,*,/ - Functions:
ROUND(),ABS(), etc. - Give calculated columns meaningful aliases
Common Mistakes:
sql-- WRONG: Division by zero SELECT (price - cost_price) / cost_price FROM products; -- CORRECT: Handle zero SELECT CASE WHEN cost_price = 0 THEN 0 ELSE (price - cost_price) / cost_price END AS markup_ratio FROM products;
Performance Concerns:
- Calculations are performed for every row
- Cannot be indexed (unless using computed/generated columns)
- For heavy calculations, consider materialized views
Section 2: WHERE Clause
Question 5: Simple Equality Filter
Difficulty: Beginner | Topic: WHERE Basics
Problem: Find all orders with status 'pending'.
Solution:
sqlSELECT order_id, order_number, user_id, total_amount, created_at FROM orders WHERE status = 'pending';
Explanation:
WHERE filters rows before they're returned. Only rows matching the condition are included.Performance Concerns:
With 50 million orders, filtering without an index means scanning all rows.
Indexing Suggestion:
sql-- Index for status lookups CREATE INDEX idx_orders_status ON orders(status); -- Check how many 'pending' orders exist (cardinality matters!) SELECT status, COUNT(*) FROM orders GROUP BY status;
When Index Helps vs Hurts:
- If 1% of orders are 'pending' (500K rows) → Index helps a lot
- If 90% of orders are 'pending' → Full table scan might be faster
Question 6: Multiple Conditions with AND
Difficulty: Beginner | Topic: Compound Conditions
Problem: Find active products with price over $100.
Solution:
sqlSELECT product_id, name, price, stock_quantity FROM products WHERE status = 'active' AND price > 100;
Explanation:
AND requires both conditions to be true. Rows must be active AND have price > 100.EXPLAIN Analysis:
sqlEXPLAIN ANALYZE SELECT product_id, name, price, stock_quantity FROM products WHERE status = 'active' AND price > 100;
Look for:
Seq Scan→ No index used, scanning all rowsIndex Scan→ Using an indexBitmap Heap Scan→ Using index then fetching rows
Indexing Suggestion:
sql-- Composite index (order matters!) CREATE INDEX idx_products_status_price ON products(status, price);
Why This Order?
Put equality conditions first (
status = 'active'), then range conditions (price > 100). The index can efficiently filter by status, then scan the price range.Question 7: Multiple Conditions with OR
Difficulty: Beginner | Topic: OR Conditions
Problem: Find orders that are either shipped or delivered.
Solution:
sqlSELECT order_id, order_number, status, created_at FROM orders WHERE status = 'shipped' OR status = 'delivered'; -- Better alternative using IN SELECT order_id, order_number, status, created_at FROM orders WHERE status IN ('shipped', 'delivered');
Explanation:
OR requires at least one condition to be true. IN is cleaner for multiple values of the same column.Performance Concerns:
OR conditions can prevent index usage in some databases. The optimizer may need to do an "index merge" or fall back to full scan.Indexing Suggestion:
sql-- Single index works well with IN CREATE INDEX idx_orders_status ON orders(status);
Common Mistakes:
sql-- WRONG: Mixing columns with OR can be slow SELECT * FROM orders WHERE status = 'pending' OR user_id = 12345; -- BETTER: Use UNION for different column filters SELECT * FROM orders WHERE status = 'pending' UNION ALL SELECT * FROM orders WHERE user_id = 12345 AND status != 'pending';
Question 8: NOT Condition
Difficulty: Beginner | Topic: Negation
Problem: Find all products that are NOT discontinued.
Solution:
sql-- Using NOT SELECT product_id, name, status FROM products WHERE NOT status = 'discontinued'; -- Using != (same result) SELECT product_id, name, status FROM products WHERE status != 'discontinued'; -- Using <> (ANSI standard) SELECT product_id, name, status FROM products WHERE status <> 'discontinued';
Performance Concerns:
NOT and != often can't use indexes efficiently. If 5% of products are discontinued, you're fetching 95% of the table—might as well full scan.Better Alternative:
sql-- If you know the valid statuses SELECT product_id, name, status FROM products WHERE status IN ('active', 'out_of_stock');
When NOT to Use:
- Avoid
NOT INwith nullable columns (see Question 15) - Avoid
NOT LIKEon large tables
Question 9: BETWEEN for Range Queries
Difficulty: Beginner | Topic: Range Filtering
Problem: Find products priced between $50 and $200.
Solution:
sqlSELECT product_id, name, price FROM products WHERE price BETWEEN 50 AND 200; -- Equivalent to: SELECT product_id, name, price FROM products WHERE price >= 50 AND price <= 200;
Explanation:
BETWEEN is inclusive on both ends. $50 and $200 are included in results.Common Mistakes:
sql-- WRONG: Expecting BETWEEN to be exclusive -- This INCLUDES 50 and 200 -- If you want exclusive: SELECT * FROM products WHERE price > 50 AND price < 200;
Performance Concerns:
Range queries can use B-tree indexes efficiently.
Indexing Suggestion:
sqlCREATE INDEX idx_products_price ON products(price);
Date Range Example:
sql-- Orders from January 2024 SELECT * FROM orders WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31 23:59:59'; -- Better for dates (explicit and avoids time issues): SELECT * FROM orders WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01';
Question 10: LIKE for Pattern Matching
Difficulty: Beginner | Topic: Pattern Matching
Problem: Find products whose name contains 'phone'.
Solution:
sql-- Contains 'phone' anywhere SELECT product_id, name FROM products WHERE name LIKE '%phone%'; -- Starts with 'iPhone' SELECT product_id, name FROM products WHERE name LIKE 'iPhone%'; -- Case-insensitive (PostgreSQL) SELECT product_id, name FROM products WHERE name ILIKE '%phone%';
Explanation:
%matches any sequence of characters (including empty)_matches exactly one characterILIKEis case-insensitive (PostgreSQL specific)
Performance Concerns:
This is crucial to understand:
sql-- CAN use index (prefix search) WHERE name LIKE 'iPhone%' -- CANNOT use index (leading wildcard) WHERE name LIKE '%phone%'
Why? B-tree indexes are sorted. You can quickly find all entries starting with 'iPhone', but finding entries containing 'phone' anywhere requires scanning everything.
Solutions for Full-Text Search:
sql-- Create a full-text search index CREATE INDEX idx_products_name_gin ON products USING GIN (to_tsvector('english', name)); -- Use full-text search SELECT * FROM products WHERE to_tsvector('english', name) @@ to_tsquery('phone');
Question 11: IN Operator
Difficulty: Beginner | Topic: Set Membership
Problem: Find orders for specific users.
Solution:
sqlSELECT order_id, user_id, total_amount, created_at FROM orders WHERE user_id IN (1001, 1002, 1003, 1004, 1005);
Explanation:
IN checks if a value exists in a set. Cleaner than multiple OR conditions.Performance Concerns:
- Small lists (< 100 values): Index lookup for each value, then combine
- Large lists (> 1000 values): May degrade to full scan
When to Use JOIN Instead:
sql-- Instead of IN with many values SELECT o.* FROM orders o WHERE o.user_id IN (SELECT user_id FROM premium_users); -- Better: Use JOIN SELECT o.* FROM orders o INNER JOIN premium_users pu ON o.user_id = pu.user_id;
Indexing Suggestion:
sqlCREATE INDEX idx_orders_user_id ON orders(user_id);
Question 12: NULL Handling
Difficulty: Intermediate | Topic: NULL Semantics
Problem: Find products that don't have a cost price set.
Solution:
sql-- CORRECT SELECT product_id, name, price, cost_price FROM products WHERE cost_price IS NULL; -- WRONG (this returns no rows!) SELECT product_id, name, price, cost_price FROM products WHERE cost_price = NULL;
Explanation:
NULL represents "unknown" or "missing." In SQL:
NULL = NULLis not true (it's unknown!)NULL != NULLis not true either- Only
IS NULLandIS NOT NULLwork correctly
Common Mistakes:
sql-- WRONG: Trying to compare with NULL WHERE cost_price != NULL -- Returns nothing! -- WRONG: IN with NULL WHERE status IN ('active', NULL) -- NULL is ignored! -- CORRECT: Explicit NULL check WHERE status = 'active' OR status IS NULL
COALESCE for Default Values:
sql-- Show 0 when cost_price is NULL SELECT product_id, name, price, COALESCE(cost_price, 0) AS cost_price, price - COALESCE(cost_price, 0) AS profit FROM products;
Performance Concerns:
IS NULLcan use indexes (partial indexes work great)COALESCEin WHERE clause prevents index usage
Indexing Suggestion:
sql-- Partial index for NULL cost_price products CREATE INDEX idx_products_null_cost ON products(product_id) WHERE cost_price IS NULL;
Question 13: NULLIF Function
Difficulty: Intermediate | Topic: NULL Handling
Problem: Calculate average order value, treating zero-value orders as NULL (exclude from average).
Solution:
sqlSELECT AVG(NULLIF(total_amount, 0)) AS avg_order_value FROM orders;
Explanation:
NULLIF(a, b) returns NULL if a = b, otherwise returns a.Use cases:
- Exclude zero values from aggregations
- Prevent division by zero:
value / NULLIF(divisor, 0)
Common Mistakes:
sql-- Division by zero error SELECT profit / quantity FROM order_items; -- Safe division SELECT profit / NULLIF(quantity, 0) FROM order_items;
Section 3: Sorting and Limiting
Question 14: ORDER BY Basics
Difficulty: Beginner | Topic: Sorting
Problem: Get the most expensive products first.
Solution:
sqlSELECT product_id, name, price FROM products WHERE status = 'active' ORDER BY price DESC;
Explanation:
ORDER BY column ASC- ascending (default, smallest first)ORDER BY column DESC- descending (largest first)
Multiple Column Sorting:
sql-- Sort by category, then by price within each category SELECT product_id, name, category_id, price FROM products ORDER BY category_id ASC, price DESC;
Performance Concerns:
Sorting requires memory and CPU. For large result sets:
- Database loads matching rows
- Sorts them in memory (or spills to disk if too large)
- Returns sorted results
Indexing Suggestion:
sql-- Index can provide pre-sorted data CREATE INDEX idx_products_category_price ON products(category_id ASC, price DESC); -- Now this query uses the index order, no sorting needed: SELECT * FROM products WHERE category_id = 5 ORDER BY price DESC;
Question 15: LIMIT and OFFSET
Difficulty: Beginner | Topic: Pagination
Problem: Get the first 20 products for page 1 of a product listing.
Solution:
sql-- Page 1: First 20 products SELECT product_id, name, price FROM products WHERE status = 'active' ORDER BY created_at DESC LIMIT 20 OFFSET 0; -- Page 2: Next 20 products SELECT product_id, name, price FROM products WHERE status = 'active' ORDER BY created_at DESC LIMIT 20 OFFSET 20; -- Page N formula: OFFSET = (page_number - 1) * page_size
Performance Concerns:
OFFSET is a performance killer!
Page 1: OFFSET 0 → Read 20 rows Page 10: OFFSET 180 → Read 200 rows, discard 180 Page 100: OFFSET 1980 → Read 2000 rows, discard 1980
With deep pagination, you're reading and discarding thousands of rows.
Better Alternative: Keyset Pagination
sql-- Instead of OFFSET, remember the last value -- Page 1: SELECT product_id, name, price, created_at FROM products WHERE status = 'active' ORDER BY created_at DESC, product_id DESC LIMIT 20; -- Page 2 (using last row's values from page 1): SELECT product_id, name, price, created_at FROM products WHERE status = 'active' AND (created_at, product_id) < ('2024-01-15 10:30:00', 12345) ORDER BY created_at DESC, product_id DESC LIMIT 20;
Why Keyset is Better:
- Always reads exactly 20 rows
- Uses index efficiently
- Consistent performance regardless of page number
When to Use OFFSET:
- Small tables (< 10,000 rows)
- When users won't paginate deeply
- Admin interfaces with low traffic
When to Use Keyset:
- Large tables
- API endpoints
- Infinite scroll interfaces
- Any high-traffic pagination
Question 16: TOP N Queries
Difficulty: Intermediate | Topic: Top N Patterns
Problem: Find the 5 most expensive products in each category.
Solution (PostgreSQL):
sqlWITH ranked_products AS ( SELECT product_id, name, category_id, price, ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY price DESC) AS rank FROM products WHERE status = 'active' ) SELECT * FROM ranked_products WHERE rank <= 5;
Explanation:
ROW_NUMBER()assigns a unique number to each rowPARTITION BY category_idrestarts numbering for each categoryORDER BY price DESCensures highest price gets rank 1- Filter
rank <= 5keeps only top 5 per category
Alternative with LATERAL (PostgreSQL):
sqlSELECT c.category_id, c.name AS category_name, p.* FROM categories c CROSS JOIN LATERAL ( SELECT product_id, name, price FROM products WHERE category_id = c.category_id AND status = 'active' ORDER BY price DESC LIMIT 5 ) p;
Performance Comparison:
- Window function: Scans all products, ranks them, filters
- LATERAL: For each category, efficiently gets top 5 using index
Indexing Suggestion:
sqlCREATE INDEX idx_products_category_price ON products(category_id, price DESC) WHERE status = 'active';
Question 17: DISTINCT
Difficulty: Beginner | Topic: Deduplication
Problem: Get all unique categories that have products.
Solution:
sqlSELECT DISTINCT category_id FROM products WHERE status = 'active';
Explanation:
DISTINCT removes duplicate rows from the result set.DISTINCT vs GROUP BY:
sql-- These are equivalent: SELECT DISTINCT category_id FROM products; SELECT category_id FROM products GROUP BY category_id;
But
GROUP BY allows aggregations, DISTINCT doesn't.Performance Concerns:
DISTINCT requires sorting or hashing to identify duplicates.For Counting Distinct Values:
sql-- How many unique customers ordered? SELECT COUNT(DISTINCT user_id) FROM orders;
Warning:
COUNT(DISTINCT) on large tables is expensive—it must track every unique value.Common Mistakes:
sql-- WRONG: DISTINCT applies to entire row, not just first column SELECT DISTINCT category_id, product_id FROM products; -- This returns all products (each product_id is unique!) -- What you probably wanted: SELECT DISTINCT category_id FROM products;
Section 4: Practical Filtering Patterns
Question 18: Date Filtering
Difficulty: Intermediate | Topic: Date Operations
Problem: Find all orders from the last 30 days.
Solution:
sql-- PostgreSQL SELECT order_id, user_id, total_amount, created_at FROM orders WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'; -- MySQL SELECT order_id, user_id, total_amount, created_at FROM orders WHERE created_at >= DATE_SUB(CURDATE(), INTERVAL 30 DAY); -- SQL Server SELECT order_id, user_id, total_amount, created_at FROM orders WHERE created_at >= DATEADD(day, -30, GETDATE());
Common Mistakes:
sql-- WRONG: Function on indexed column prevents index usage WHERE DATE(created_at) = '2024-01-15' -- CORRECT: Use range WHERE created_at >= '2024-01-15' AND created_at < '2024-01-16' -- WRONG: String comparison issues WHERE created_at = '2024-01-15' -- Misses times! -- CORRECT: Cover the full day WHERE created_at >= '2024-01-15 00:00:00' AND created_at < '2024-01-16 00:00:00'
Indexing Suggestion:
sqlCREATE INDEX idx_orders_created_at ON orders(created_at);
Question 19: Status-Based Filtering
Difficulty: Intermediate | Topic: Workflow States
Problem: Find orders that need attention (pending payment or failed payment).
Solution:
sqlSELECT o.order_id, o.order_number, o.total_amount, o.status AS order_status, p.status AS payment_status, o.created_at FROM orders o LEFT JOIN payments p ON o.order_id = p.order_id WHERE o.status = 'pending' OR p.status = 'failed' ORDER BY o.created_at ASC;
Alternative with CASE:
sqlSELECT order_id, order_number, total_amount, status, CASE WHEN status = 'pending' THEN 'Awaiting Payment' WHEN status = 'paid' THEN 'Payment Received' WHEN status = 'shipped' THEN 'In Transit' WHEN status = 'delivered' THEN 'Completed' WHEN status = 'cancelled' THEN 'Cancelled' ELSE 'Unknown' END AS status_display FROM orders;
Indexing Suggestion:
sql-- Partial index for specific statuses CREATE INDEX idx_orders_pending ON orders(created_at) WHERE status = 'pending'; CREATE INDEX idx_payments_failed ON payments(order_id) WHERE status = 'failed';
Question 20: Price Range Filters
Difficulty: Intermediate | Topic: E-commerce Filters
Problem: Implement product filtering by price range for a shopping site.
Solution:
sql-- With dynamic price ranges (passed as parameters) SELECT product_id, name, price, category_id FROM products WHERE status = 'active' AND price >= :min_price -- Parameter AND price <= :max_price -- Parameter ORDER BY price ASC; -- With predefined ranges SELECT CASE WHEN price < 25 THEN 'Under $25' WHEN price < 50 THEN '$25 - $50' WHEN price < 100 THEN '$50 - $100' WHEN price < 200 THEN '$100 - $200' ELSE '$200+' END AS price_range, COUNT(*) AS product_count FROM products WHERE status = 'active' GROUP BY 1 ORDER BY MIN(price);
Building Dynamic Filter Counts:
sql-- Show how many products in each price range SELECT SUM(CASE WHEN price < 25 THEN 1 ELSE 0 END) AS under_25, SUM(CASE WHEN price >= 25 AND price < 50 THEN 1 ELSE 0 END) AS range_25_50, SUM(CASE WHEN price >= 50 AND price < 100 THEN 1 ELSE 0 END) AS range_50_100, SUM(CASE WHEN price >= 100 THEN 1 ELSE 0 END) AS over_100 FROM products WHERE status = 'active' AND category_id = :category_id;
Question 21: Multi-Filter Search
Difficulty: Intermediate | Topic: Dynamic Filtering
Problem: Build a product search that handles multiple optional filters.
Solution:
sql-- The flexible approach SELECT product_id, name, price, category_id, brand_id FROM products WHERE status = 'active' AND (:category_id IS NULL OR category_id = :category_id) AND (:brand_id IS NULL OR brand_id = :brand_id) AND (:min_price IS NULL OR price >= :min_price) AND (:max_price IS NULL OR price <= :max_price) AND (:search_term IS NULL OR name ILIKE '%' || :search_term || '%') ORDER BY created_at DESC LIMIT 20;
Performance Warning:
This pattern is convenient but problematic:
ORconditions prevent efficient index usage- Query plan is chosen based on statistics, not actual parameters
Better Approach: Build Query Dynamically
go// Application code (Go example) query := "SELECT product_id, name, price FROM products WHERE status = 'active'" params := []interface{}{} if categoryID != nil { query += " AND category_id = $" + strconv.Itoa(len(params)+1) params = append(params, categoryID) } if minPrice != nil { query += " AND price >= $" + strconv.Itoa(len(params)+1) params = append(params, minPrice) } // ... etc
Question 22: Existence Check
Difficulty: Intermediate | Topic: Conditional Logic
Problem: Find users who have placed at least one order.
Solution:
sql-- Method 1: EXISTS (usually fastest) SELECT u.user_id, u.email, u.first_name FROM users u WHERE EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.user_id ); -- Method 2: IN SELECT user_id, email, first_name FROM users WHERE user_id IN (SELECT user_id FROM orders); -- Method 3: JOIN (may have duplicates!) SELECT DISTINCT u.user_id, u.email, u.first_name FROM users u INNER JOIN orders o ON u.user_id = o.user_id;
Performance Comparison:
EXISTSstops at first match (efficient!)INbuilds a list then checks membershipJOINmay produce duplicates, needsDISTINCT
When to Use Each:
| Pattern | Best For |
|---|---|
| EXISTS | Large subquery results, just checking existence |
| IN | Small list of values, readable code |
| JOIN | When you need data from both tables |
Question 23: Non-Existence Check
Difficulty: Intermediate | Topic: Anti-Patterns
Problem: Find users who have never placed an order.
Solution:
sql-- Method 1: NOT EXISTS (safest and often fastest) SELECT u.user_id, u.email, u.first_name FROM users u WHERE NOT EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.user_id ); -- Method 2: LEFT JOIN with NULL check (anti-join) SELECT u.user_id, u.email, u.first_name FROM users u LEFT JOIN orders o ON u.user_id = o.user_id WHERE o.order_id IS NULL; -- Method 3: NOT IN (DANGEROUS with NULLs!) SELECT user_id, email, first_name FROM users WHERE user_id NOT IN (SELECT user_id FROM orders);
Critical Warning about NOT IN:
sql-- If orders.user_id contains any NULL values: SELECT * FROM users WHERE user_id NOT IN (SELECT user_id FROM orders); -- Returns ZERO rows! Because NOT IN with NULL is always UNKNOWN. -- Safe version: SELECT * FROM users WHERE user_id NOT IN (SELECT user_id FROM orders WHERE user_id IS NOT NULL);
Performance:
NOT EXISTSandLEFT JOINanti-patterns perform similarly- Modern optimizers often convert between them
NOT INcan be slower and has NULL issues
Question 24: Conditional Aggregation
Difficulty: Intermediate | Topic: Filtered Counts
Problem: Count orders by status in a single query.
Solution:
sqlSELECT COUNT(*) AS total_orders, COUNT(CASE WHEN status = 'pending' THEN 1 END) AS pending_orders, COUNT(CASE WHEN status = 'paid' THEN 1 END) AS paid_orders, COUNT(CASE WHEN status = 'shipped' THEN 1 END) AS shipped_orders, COUNT(CASE WHEN status = 'delivered' THEN 1 END) AS delivered_orders, COUNT(CASE WHEN status = 'cancelled' THEN 1 END) AS cancelled_orders FROM orders WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'; -- PostgreSQL FILTER syntax (cleaner) SELECT COUNT(*) AS total_orders, COUNT(*) FILTER (WHERE status = 'pending') AS pending_orders, COUNT(*) FILTER (WHERE status = 'paid') AS paid_orders, COUNT(*) FILTER (WHERE status = 'shipped') AS shipped_orders, COUNT(*) FILTER (WHERE status = 'delivered') AS delivered_orders, COUNT(*) FILTER (WHERE status = 'cancelled') AS cancelled_orders FROM orders WHERE created_at >= CURRENT_DATE - INTERVAL '30 days';
Why This is Better Than Multiple Queries:
sql-- BAD: 6 queries hitting the same table SELECT COUNT(*) FROM orders WHERE created_at >= ...; SELECT COUNT(*) FROM orders WHERE created_at >= ... AND status = 'pending'; SELECT COUNT(*) FROM orders WHERE created_at >= ... AND status = 'paid'; -- ... 4 more queries -- GOOD: 1 query, 1 table scan SELECT COUNT(*), COUNT(CASE WHEN status = 'pending' THEN 1 END), ... FROM orders WHERE created_at >= ...;
Question 25: Complex Boolean Logic
Difficulty: Intermediate | Topic: Query Logic
Problem: Find high-value customers who either: (a) have spent over $1000 total, OR (b) have placed more than 10 orders in the last year.
Solution:
sqlSELECT u.user_id, u.email, u.first_name, u.last_name, COALESCE(o.total_spent, 0) AS total_spent, COALESCE(o.order_count, 0) AS order_count FROM users u LEFT JOIN ( SELECT user_id, SUM(total_amount) AS total_spent, COUNT(*) AS order_count FROM orders WHERE created_at >= CURRENT_DATE - INTERVAL '1 year' AND status NOT IN ('cancelled', 'refunded') GROUP BY user_id ) o ON u.user_id = o.user_id WHERE o.total_spent > 1000 OR o.order_count > 10;
Breaking Down the Logic:
- Subquery calculates spending and order count per user
LEFT JOINkeeps all users (we filter later)WHEREapplies the high-value criteriaCOALESCEhandles users with no orders
Indexing Suggestion:
sqlCREATE INDEX idx_orders_user_created ON orders(user_id, created_at) WHERE status NOT IN ('cancelled', 'refunded');
Summary: Part 1 Key Takeaways
What We Covered
- SELECT basics - Always specify columns, avoid SELECT *
- WHERE clauses - AND, OR, NOT, BETWEEN, LIKE, IN
- NULL handling - IS NULL, COALESCE, NULLIF
- Sorting - ORDER BY with indexes
- Pagination - LIMIT/OFFSET vs Keyset
Golden Rules
| Rule | Why |
|---|---|
| Never use SELECT * | Network overhead, schema changes, prevents covering indexes |
| Always filter on indexed columns | Avoid full table scans |
| Beware of OR | Can prevent index usage |
| Handle NULLs explicitly | NULL comparisons are tricky |
| Use keyset pagination | OFFSET is slow for deep pages |
| Test with production data volumes | Queries that work with 1000 rows may fail with 10 million |
Indexing Checklist
- Index columns used in WHERE clauses
- Index columns used in JOIN conditions
- Index columns used in ORDER BY
- Consider composite indexes for multi-column filters
- Use partial indexes for filtered queries
- Check EXPLAIN plans to verify index usage
Coming in Part 2: Keys, Constraints, and Normalization
- Primary keys: Natural vs Surrogate
- Foreign keys: When to use, when to skip
- Unique constraints: Partial uniqueness
- Composite keys: Design patterns
- Normalization: 1NF through 3NF with examples
- Denormalization: When to break the rules
This is Part 1 of the SQL Mastery Guide series. Each part builds on the previous, progressing from foundations to advanced optimization techniques.