SQL Mastery Guide Part 1: Foundations

Introduction: How to Use This Guide

After 20+ years of designing databases for fintech, e-commerce, and high-scale platforms, I've seen thousands of engineers struggle with SQL—not because it's hard, but because they learned it wrong.
Most tutorials teach you SELECT * FROM table and call it a day. Real production databases have millions of rows. That SELECT * would bring your server to its knees.
This guide is different. Every query comes with:
  • The problem to solve
  • The solution
  • Why it works
  • What can go wrong
  • How to make it fast
  • When NOT to use this pattern
Practice Schema: E-Commerce Platform
We'll use this schema throughout the series:
sql
-- Users table: 10 million users CREATE TABLE users ( user_id BIGSERIAL PRIMARY KEY, email VARCHAR(255) NOT NULL UNIQUE, username VARCHAR(50) NOT NULL UNIQUE, password_hash VARCHAR(255) NOT NULL, first_name VARCHAR(100), last_name VARCHAR(100), phone VARCHAR(20), status VARCHAR(20) DEFAULT 'active', -- active, suspended, deleted email_verified BOOLEAN DEFAULT FALSE, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, last_login_at TIMESTAMP, login_count INT DEFAULT 0 ); -- Products table: 500,000 products CREATE TABLE products ( product_id BIGSERIAL PRIMARY KEY, sku VARCHAR(50) NOT NULL UNIQUE, name VARCHAR(255) NOT NULL, description TEXT, category_id INT REFERENCES categories(category_id), brand_id INT REFERENCES brands(brand_id), price DECIMAL(10,2) NOT NULL, cost_price DECIMAL(10,2), stock_quantity INT DEFAULT 0, status VARCHAR(20) DEFAULT 'active', -- active, discontinued, out_of_stock weight_kg DECIMAL(8,3), created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); -- Orders table: 50 million orders CREATE TABLE orders ( order_id BIGSERIAL PRIMARY KEY, order_number VARCHAR(20) NOT NULL UNIQUE, user_id BIGINT NOT NULL REFERENCES users(user_id), status VARCHAR(20) NOT NULL, -- pending, paid, shipped, delivered, cancelled, refunded subtotal DECIMAL(12,2) NOT NULL, tax_amount DECIMAL(12,2) DEFAULT 0, shipping_amount DECIMAL(12,2) DEFAULT 0, discount_amount DECIMAL(12,2) DEFAULT 0, total_amount DECIMAL(12,2) NOT NULL, currency CHAR(3) DEFAULT 'USD', shipping_address_id BIGINT, billing_address_id BIGINT, payment_method VARCHAR(50), paid_at TIMESTAMP, shipped_at TIMESTAMP, delivered_at TIMESTAMP, cancelled_at TIMESTAMP, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); -- Order Items table: 150 million items CREATE TABLE order_items ( item_id BIGSERIAL PRIMARY KEY, order_id BIGINT NOT NULL REFERENCES orders(order_id), product_id BIGINT NOT NULL REFERENCES products(product_id), quantity INT NOT NULL, unit_price DECIMAL(10,2) NOT NULL, total_price DECIMAL(12,2) NOT NULL, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); -- Categories table: 500 categories CREATE TABLE categories ( category_id SERIAL PRIMARY KEY, name VARCHAR(100) NOT NULL, parent_id INT REFERENCES categories(category_id), level INT DEFAULT 1, path VARCHAR(500), -- e.g., '/electronics/phones/smartphones' is_active BOOLEAN DEFAULT TRUE ); -- Payments table: 40 million payments CREATE TABLE payments ( payment_id BIGSERIAL PRIMARY KEY, order_id BIGINT NOT NULL REFERENCES orders(order_id), amount DECIMAL(12,2) NOT NULL, currency CHAR(3) DEFAULT 'USD', method VARCHAR(50) NOT NULL, -- credit_card, debit_card, upi, wallet status VARCHAR(20) NOT NULL, -- pending, completed, failed, refunded gateway_ref VARCHAR(100), error_message TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, processed_at TIMESTAMP );

Section 1: SELECT Basics

Question 1: Select All Columns

Difficulty: Beginner | Topic: Basic SELECT
Problem: Retrieve all information about all products.
Solution:
sql
-- The naive approach (AVOID in production) SELECT * FROM products; -- The correct approach SELECT product_id, sku, name, description, category_id, brand_id, price, cost_price, stock_quantity, status, weight_kg, created_at, updated_at FROM products;
Explanation: SELECT * fetches every column. While convenient for exploration, it's problematic in production:
  • Fetches unnecessary data (like large TEXT descriptions)
  • Breaks when schema changes (new columns appear unexpectedly)
  • Prevents covering indexes from being used
Common Mistakes:
  • Using SELECT * in application code
  • Not considering column order for readability
Performance Concerns:
  • With 500,000 products and a TEXT description column, SELECT * might transfer gigabytes of data
  • Network becomes the bottleneck
When NOT to Use:
  • Never use SELECT * in production application code
  • Acceptable only for ad-hoc exploration in a SQL client

Question 2: Select Specific Columns

Difficulty: Beginner | Topic: Column Selection
Problem: Get product names and prices for a product catalog page.
Solution:
sql
SELECT product_id, name, price FROM products;
Explanation: Only select what you need. For a catalog page, you need ID (for links), name, and price. Not description, not stock, not dates.
Performance Concerns:
  • Fewer columns = less data transfer
  • Enables use of covering indexes (we'll cover this in Part 6)
Indexing Suggestion:
sql
-- A covering index for catalog queries CREATE INDEX idx_products_catalog ON products(product_id, name, price);

Question 3: Column Aliases

Difficulty: Beginner | Topic: Aliases
Problem: Get product information with clearer column names for API response.
Solution:
sql
SELECT product_id AS id, name AS product_name, price AS unit_price, stock_quantity AS available_stock FROM products;
Explanation: Aliases rename columns in the result set. Useful when:
  • Column names are unclear (skuproduct_code)
  • Preparing data for specific API formats
  • Using expressions (we'll see later)
Common Mistakes:
sql
-- WRONG: Can't use alias in WHERE (in most databases) SELECT name AS product_name FROM products WHERE product_name = 'iPhone'; -- CORRECT: Use original column name in WHERE SELECT name AS product_name FROM products WHERE name = 'iPhone';

Question 4: Calculated Columns

Difficulty: Beginner | Topic: Expressions
Problem: Show products with their profit margin (price - cost_price).
Solution:
sql
SELECT product_id, name, price, cost_price, (price - cost_price) AS profit, ROUND((price - cost_price) / price * 100, 2) AS profit_margin_percent FROM products WHERE cost_price IS NOT NULL;
Explanation: You can perform calculations in SELECT:
  • Arithmetic: +, -, *, /
  • Functions: ROUND(), ABS(), etc.
  • Give calculated columns meaningful aliases
Common Mistakes:
sql
-- WRONG: Division by zero SELECT (price - cost_price) / cost_price FROM products; -- CORRECT: Handle zero SELECT CASE WHEN cost_price = 0 THEN 0 ELSE (price - cost_price) / cost_price END AS markup_ratio FROM products;
Performance Concerns:
  • Calculations are performed for every row
  • Cannot be indexed (unless using computed/generated columns)
  • For heavy calculations, consider materialized views

Section 2: WHERE Clause

Question 5: Simple Equality Filter

Difficulty: Beginner | Topic: WHERE Basics
Problem: Find all orders with status 'pending'.
Solution:
sql
SELECT order_id, order_number, user_id, total_amount, created_at FROM orders WHERE status = 'pending';
Explanation: WHERE filters rows before they're returned. Only rows matching the condition are included.
Performance Concerns: With 50 million orders, filtering without an index means scanning all rows.
Indexing Suggestion:
sql
-- Index for status lookups CREATE INDEX idx_orders_status ON orders(status); -- Check how many 'pending' orders exist (cardinality matters!) SELECT status, COUNT(*) FROM orders GROUP BY status;
When Index Helps vs Hurts:
  • If 1% of orders are 'pending' (500K rows) → Index helps a lot
  • If 90% of orders are 'pending' → Full table scan might be faster

Question 6: Multiple Conditions with AND

Difficulty: Beginner | Topic: Compound Conditions
Problem: Find active products with price over $100.
Solution:
sql
SELECT product_id, name, price, stock_quantity FROM products WHERE status = 'active' AND price > 100;
Explanation: AND requires both conditions to be true. Rows must be active AND have price > 100.
EXPLAIN Analysis:
sql
EXPLAIN ANALYZE SELECT product_id, name, price, stock_quantity FROM products WHERE status = 'active' AND price > 100;
Look for:
  • Seq Scan → No index used, scanning all rows
  • Index Scan → Using an index
  • Bitmap Heap Scan → Using index then fetching rows
Indexing Suggestion:
sql
-- Composite index (order matters!) CREATE INDEX idx_products_status_price ON products(status, price);
Why This Order? Put equality conditions first (status = 'active'), then range conditions (price > 100). The index can efficiently filter by status, then scan the price range.

Question 7: Multiple Conditions with OR

Difficulty: Beginner | Topic: OR Conditions
Problem: Find orders that are either shipped or delivered.
Solution:
sql
SELECT order_id, order_number, status, created_at FROM orders WHERE status = 'shipped' OR status = 'delivered'; -- Better alternative using IN SELECT order_id, order_number, status, created_at FROM orders WHERE status IN ('shipped', 'delivered');
Explanation: OR requires at least one condition to be true. IN is cleaner for multiple values of the same column.
Performance Concerns: OR conditions can prevent index usage in some databases. The optimizer may need to do an "index merge" or fall back to full scan.
Indexing Suggestion:
sql
-- Single index works well with IN CREATE INDEX idx_orders_status ON orders(status);
Common Mistakes:
sql
-- WRONG: Mixing columns with OR can be slow SELECT * FROM orders WHERE status = 'pending' OR user_id = 12345; -- BETTER: Use UNION for different column filters SELECT * FROM orders WHERE status = 'pending' UNION ALL SELECT * FROM orders WHERE user_id = 12345 AND status != 'pending';

Question 8: NOT Condition

Difficulty: Beginner | Topic: Negation
Problem: Find all products that are NOT discontinued.
Solution:
sql
-- Using NOT SELECT product_id, name, status FROM products WHERE NOT status = 'discontinued'; -- Using != (same result) SELECT product_id, name, status FROM products WHERE status != 'discontinued'; -- Using <> (ANSI standard) SELECT product_id, name, status FROM products WHERE status <> 'discontinued';
Performance Concerns: NOT and != often can't use indexes efficiently. If 5% of products are discontinued, you're fetching 95% of the table—might as well full scan.
Better Alternative:
sql
-- If you know the valid statuses SELECT product_id, name, status FROM products WHERE status IN ('active', 'out_of_stock');
When NOT to Use:
  • Avoid NOT IN with nullable columns (see Question 15)
  • Avoid NOT LIKE on large tables

Question 9: BETWEEN for Range Queries

Difficulty: Beginner | Topic: Range Filtering
Problem: Find products priced between $50 and $200.
Solution:
sql
SELECT product_id, name, price FROM products WHERE price BETWEEN 50 AND 200; -- Equivalent to: SELECT product_id, name, price FROM products WHERE price >= 50 AND price <= 200;
Explanation: BETWEEN is inclusive on both ends. $50 and $200 are included in results.
Common Mistakes:
sql
-- WRONG: Expecting BETWEEN to be exclusive -- This INCLUDES 50 and 200 -- If you want exclusive: SELECT * FROM products WHERE price > 50 AND price < 200;
Performance Concerns: Range queries can use B-tree indexes efficiently.
Indexing Suggestion:
sql
CREATE INDEX idx_products_price ON products(price);
Date Range Example:
sql
-- Orders from January 2024 SELECT * FROM orders WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31 23:59:59'; -- Better for dates (explicit and avoids time issues): SELECT * FROM orders WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01';

Question 10: LIKE for Pattern Matching

Difficulty: Beginner | Topic: Pattern Matching
Problem: Find products whose name contains 'phone'.
Solution:
sql
-- Contains 'phone' anywhere SELECT product_id, name FROM products WHERE name LIKE '%phone%'; -- Starts with 'iPhone' SELECT product_id, name FROM products WHERE name LIKE 'iPhone%'; -- Case-insensitive (PostgreSQL) SELECT product_id, name FROM products WHERE name ILIKE '%phone%';
Explanation:
  • % matches any sequence of characters (including empty)
  • _ matches exactly one character
  • ILIKE is case-insensitive (PostgreSQL specific)
Performance Concerns: This is crucial to understand:
sql
-- CAN use index (prefix search) WHERE name LIKE 'iPhone%' -- CANNOT use index (leading wildcard) WHERE name LIKE '%phone%'
Why? B-tree indexes are sorted. You can quickly find all entries starting with 'iPhone', but finding entries containing 'phone' anywhere requires scanning everything.
Solutions for Full-Text Search:
sql
-- Create a full-text search index CREATE INDEX idx_products_name_gin ON products USING GIN (to_tsvector('english', name)); -- Use full-text search SELECT * FROM products WHERE to_tsvector('english', name) @@ to_tsquery('phone');

Question 11: IN Operator

Difficulty: Beginner | Topic: Set Membership
Problem: Find orders for specific users.
Solution:
sql
SELECT order_id, user_id, total_amount, created_at FROM orders WHERE user_id IN (1001, 1002, 1003, 1004, 1005);
Explanation: IN checks if a value exists in a set. Cleaner than multiple OR conditions.
Performance Concerns:
  • Small lists (< 100 values): Index lookup for each value, then combine
  • Large lists (> 1000 values): May degrade to full scan
When to Use JOIN Instead:
sql
-- Instead of IN with many values SELECT o.* FROM orders o WHERE o.user_id IN (SELECT user_id FROM premium_users); -- Better: Use JOIN SELECT o.* FROM orders o INNER JOIN premium_users pu ON o.user_id = pu.user_id;
Indexing Suggestion:
sql
CREATE INDEX idx_orders_user_id ON orders(user_id);

Question 12: NULL Handling

Difficulty: Intermediate | Topic: NULL Semantics
Problem: Find products that don't have a cost price set.
Solution:
sql
-- CORRECT SELECT product_id, name, price, cost_price FROM products WHERE cost_price IS NULL; -- WRONG (this returns no rows!) SELECT product_id, name, price, cost_price FROM products WHERE cost_price = NULL;
Explanation: NULL represents "unknown" or "missing." In SQL:
  • NULL = NULL is not true (it's unknown!)
  • NULL != NULL is not true either
  • Only IS NULL and IS NOT NULL work correctly
Common Mistakes:
sql
-- WRONG: Trying to compare with NULL WHERE cost_price != NULL -- Returns nothing! -- WRONG: IN with NULL WHERE status IN ('active', NULL) -- NULL is ignored! -- CORRECT: Explicit NULL check WHERE status = 'active' OR status IS NULL
COALESCE for Default Values:
sql
-- Show 0 when cost_price is NULL SELECT product_id, name, price, COALESCE(cost_price, 0) AS cost_price, price - COALESCE(cost_price, 0) AS profit FROM products;
Performance Concerns:
  • IS NULL can use indexes (partial indexes work great)
  • COALESCE in WHERE clause prevents index usage
Indexing Suggestion:
sql
-- Partial index for NULL cost_price products CREATE INDEX idx_products_null_cost ON products(product_id) WHERE cost_price IS NULL;

Question 13: NULLIF Function

Difficulty: Intermediate | Topic: NULL Handling
Problem: Calculate average order value, treating zero-value orders as NULL (exclude from average).
Solution:
sql
SELECT AVG(NULLIF(total_amount, 0)) AS avg_order_value FROM orders;
Explanation: NULLIF(a, b) returns NULL if a = b, otherwise returns a.
Use cases:
  • Exclude zero values from aggregations
  • Prevent division by zero: value / NULLIF(divisor, 0)
Common Mistakes:
sql
-- Division by zero error SELECT profit / quantity FROM order_items; -- Safe division SELECT profit / NULLIF(quantity, 0) FROM order_items;

Section 3: Sorting and Limiting

Question 14: ORDER BY Basics

Difficulty: Beginner | Topic: Sorting
Problem: Get the most expensive products first.
Solution:
sql
SELECT product_id, name, price FROM products WHERE status = 'active' ORDER BY price DESC;
Explanation:
  • ORDER BY column ASC - ascending (default, smallest first)
  • ORDER BY column DESC - descending (largest first)
Multiple Column Sorting:
sql
-- Sort by category, then by price within each category SELECT product_id, name, category_id, price FROM products ORDER BY category_id ASC, price DESC;
Performance Concerns: Sorting requires memory and CPU. For large result sets:
  1. Database loads matching rows
  2. Sorts them in memory (or spills to disk if too large)
  3. Returns sorted results
Indexing Suggestion:
sql
-- Index can provide pre-sorted data CREATE INDEX idx_products_category_price ON products(category_id ASC, price DESC); -- Now this query uses the index order, no sorting needed: SELECT * FROM products WHERE category_id = 5 ORDER BY price DESC;

Question 15: LIMIT and OFFSET

Difficulty: Beginner | Topic: Pagination
Problem: Get the first 20 products for page 1 of a product listing.
Solution:
sql
-- Page 1: First 20 products SELECT product_id, name, price FROM products WHERE status = 'active' ORDER BY created_at DESC LIMIT 20 OFFSET 0; -- Page 2: Next 20 products SELECT product_id, name, price FROM products WHERE status = 'active' ORDER BY created_at DESC LIMIT 20 OFFSET 20; -- Page N formula: OFFSET = (page_number - 1) * page_size
Performance Concerns: OFFSET is a performance killer!
Page 1: OFFSET 0 → Read 20 rows Page 10: OFFSET 180 → Read 200 rows, discard 180 Page 100: OFFSET 1980 → Read 2000 rows, discard 1980
With deep pagination, you're reading and discarding thousands of rows.
Better Alternative: Keyset Pagination
sql
-- Instead of OFFSET, remember the last value -- Page 1: SELECT product_id, name, price, created_at FROM products WHERE status = 'active' ORDER BY created_at DESC, product_id DESC LIMIT 20; -- Page 2 (using last row's values from page 1): SELECT product_id, name, price, created_at FROM products WHERE status = 'active' AND (created_at, product_id) < ('2024-01-15 10:30:00', 12345) ORDER BY created_at DESC, product_id DESC LIMIT 20;
Why Keyset is Better:
  • Always reads exactly 20 rows
  • Uses index efficiently
  • Consistent performance regardless of page number
When to Use OFFSET:
  • Small tables (< 10,000 rows)
  • When users won't paginate deeply
  • Admin interfaces with low traffic
When to Use Keyset:
  • Large tables
  • API endpoints
  • Infinite scroll interfaces
  • Any high-traffic pagination

Question 16: TOP N Queries

Difficulty: Intermediate | Topic: Top N Patterns
Problem: Find the 5 most expensive products in each category.
Solution (PostgreSQL):
sql
WITH ranked_products AS ( SELECT product_id, name, category_id, price, ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY price DESC) AS rank FROM products WHERE status = 'active' ) SELECT * FROM ranked_products WHERE rank <= 5;
Explanation:
  1. ROW_NUMBER() assigns a unique number to each row
  2. PARTITION BY category_id restarts numbering for each category
  3. ORDER BY price DESC ensures highest price gets rank 1
  4. Filter rank <= 5 keeps only top 5 per category
Alternative with LATERAL (PostgreSQL):
sql
SELECT c.category_id, c.name AS category_name, p.* FROM categories c CROSS JOIN LATERAL ( SELECT product_id, name, price FROM products WHERE category_id = c.category_id AND status = 'active' ORDER BY price DESC LIMIT 5 ) p;
Performance Comparison:
  • Window function: Scans all products, ranks them, filters
  • LATERAL: For each category, efficiently gets top 5 using index
Indexing Suggestion:
sql
CREATE INDEX idx_products_category_price ON products(category_id, price DESC) WHERE status = 'active';

Question 17: DISTINCT

Difficulty: Beginner | Topic: Deduplication
Problem: Get all unique categories that have products.
Solution:
sql
SELECT DISTINCT category_id FROM products WHERE status = 'active';
Explanation: DISTINCT removes duplicate rows from the result set.
DISTINCT vs GROUP BY:
sql
-- These are equivalent: SELECT DISTINCT category_id FROM products; SELECT category_id FROM products GROUP BY category_id;
But GROUP BY allows aggregations, DISTINCT doesn't.
Performance Concerns: DISTINCT requires sorting or hashing to identify duplicates.
For Counting Distinct Values:
sql
-- How many unique customers ordered? SELECT COUNT(DISTINCT user_id) FROM orders;
Warning: COUNT(DISTINCT) on large tables is expensive—it must track every unique value.
Common Mistakes:
sql
-- WRONG: DISTINCT applies to entire row, not just first column SELECT DISTINCT category_id, product_id FROM products; -- This returns all products (each product_id is unique!) -- What you probably wanted: SELECT DISTINCT category_id FROM products;

Section 4: Practical Filtering Patterns

Question 18: Date Filtering

Difficulty: Intermediate | Topic: Date Operations
Problem: Find all orders from the last 30 days.
Solution:
sql
-- PostgreSQL SELECT order_id, user_id, total_amount, created_at FROM orders WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'; -- MySQL SELECT order_id, user_id, total_amount, created_at FROM orders WHERE created_at >= DATE_SUB(CURDATE(), INTERVAL 30 DAY); -- SQL Server SELECT order_id, user_id, total_amount, created_at FROM orders WHERE created_at >= DATEADD(day, -30, GETDATE());
Common Mistakes:
sql
-- WRONG: Function on indexed column prevents index usage WHERE DATE(created_at) = '2024-01-15' -- CORRECT: Use range WHERE created_at >= '2024-01-15' AND created_at < '2024-01-16' -- WRONG: String comparison issues WHERE created_at = '2024-01-15' -- Misses times! -- CORRECT: Cover the full day WHERE created_at >= '2024-01-15 00:00:00' AND created_at < '2024-01-16 00:00:00'
Indexing Suggestion:
sql
CREATE INDEX idx_orders_created_at ON orders(created_at);

Question 19: Status-Based Filtering

Difficulty: Intermediate | Topic: Workflow States
Problem: Find orders that need attention (pending payment or failed payment).
Solution:
sql
SELECT o.order_id, o.order_number, o.total_amount, o.status AS order_status, p.status AS payment_status, o.created_at FROM orders o LEFT JOIN payments p ON o.order_id = p.order_id WHERE o.status = 'pending' OR p.status = 'failed' ORDER BY o.created_at ASC;
Alternative with CASE:
sql
SELECT order_id, order_number, total_amount, status, CASE WHEN status = 'pending' THEN 'Awaiting Payment' WHEN status = 'paid' THEN 'Payment Received' WHEN status = 'shipped' THEN 'In Transit' WHEN status = 'delivered' THEN 'Completed' WHEN status = 'cancelled' THEN 'Cancelled' ELSE 'Unknown' END AS status_display FROM orders;
Indexing Suggestion:
sql
-- Partial index for specific statuses CREATE INDEX idx_orders_pending ON orders(created_at) WHERE status = 'pending'; CREATE INDEX idx_payments_failed ON payments(order_id) WHERE status = 'failed';

Question 20: Price Range Filters

Difficulty: Intermediate | Topic: E-commerce Filters
Problem: Implement product filtering by price range for a shopping site.
Solution:
sql
-- With dynamic price ranges (passed as parameters) SELECT product_id, name, price, category_id FROM products WHERE status = 'active' AND price >= :min_price -- Parameter AND price <= :max_price -- Parameter ORDER BY price ASC; -- With predefined ranges SELECT CASE WHEN price < 25 THEN 'Under $25' WHEN price < 50 THEN '$25 - $50' WHEN price < 100 THEN '$50 - $100' WHEN price < 200 THEN '$100 - $200' ELSE '$200+' END AS price_range, COUNT(*) AS product_count FROM products WHERE status = 'active' GROUP BY 1 ORDER BY MIN(price);
Building Dynamic Filter Counts:
sql
-- Show how many products in each price range SELECT SUM(CASE WHEN price < 25 THEN 1 ELSE 0 END) AS under_25, SUM(CASE WHEN price >= 25 AND price < 50 THEN 1 ELSE 0 END) AS range_25_50, SUM(CASE WHEN price >= 50 AND price < 100 THEN 1 ELSE 0 END) AS range_50_100, SUM(CASE WHEN price >= 100 THEN 1 ELSE 0 END) AS over_100 FROM products WHERE status = 'active' AND category_id = :category_id;

Difficulty: Intermediate | Topic: Dynamic Filtering
Problem: Build a product search that handles multiple optional filters.
Solution:
sql
-- The flexible approach SELECT product_id, name, price, category_id, brand_id FROM products WHERE status = 'active' AND (:category_id IS NULL OR category_id = :category_id) AND (:brand_id IS NULL OR brand_id = :brand_id) AND (:min_price IS NULL OR price >= :min_price) AND (:max_price IS NULL OR price <= :max_price) AND (:search_term IS NULL OR name ILIKE '%' || :search_term || '%') ORDER BY created_at DESC LIMIT 20;
Performance Warning: This pattern is convenient but problematic:
  • OR conditions prevent efficient index usage
  • Query plan is chosen based on statistics, not actual parameters
Better Approach: Build Query Dynamically
go
// Application code (Go example) query := "SELECT product_id, name, price FROM products WHERE status = 'active'" params := []interface{}{} if categoryID != nil { query += " AND category_id = $" + strconv.Itoa(len(params)+1) params = append(params, categoryID) } if minPrice != nil { query += " AND price >= $" + strconv.Itoa(len(params)+1) params = append(params, minPrice) } // ... etc

Question 22: Existence Check

Difficulty: Intermediate | Topic: Conditional Logic
Problem: Find users who have placed at least one order.
Solution:
sql
-- Method 1: EXISTS (usually fastest) SELECT u.user_id, u.email, u.first_name FROM users u WHERE EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.user_id ); -- Method 2: IN SELECT user_id, email, first_name FROM users WHERE user_id IN (SELECT user_id FROM orders); -- Method 3: JOIN (may have duplicates!) SELECT DISTINCT u.user_id, u.email, u.first_name FROM users u INNER JOIN orders o ON u.user_id = o.user_id;
Performance Comparison:
  • EXISTS stops at first match (efficient!)
  • IN builds a list then checks membership
  • JOIN may produce duplicates, needs DISTINCT
When to Use Each:
PatternBest For
EXISTSLarge subquery results, just checking existence
INSmall list of values, readable code
JOINWhen you need data from both tables

Question 23: Non-Existence Check

Difficulty: Intermediate | Topic: Anti-Patterns
Problem: Find users who have never placed an order.
Solution:
sql
-- Method 1: NOT EXISTS (safest and often fastest) SELECT u.user_id, u.email, u.first_name FROM users u WHERE NOT EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.user_id ); -- Method 2: LEFT JOIN with NULL check (anti-join) SELECT u.user_id, u.email, u.first_name FROM users u LEFT JOIN orders o ON u.user_id = o.user_id WHERE o.order_id IS NULL; -- Method 3: NOT IN (DANGEROUS with NULLs!) SELECT user_id, email, first_name FROM users WHERE user_id NOT IN (SELECT user_id FROM orders);
Critical Warning about NOT IN:
sql
-- If orders.user_id contains any NULL values: SELECT * FROM users WHERE user_id NOT IN (SELECT user_id FROM orders); -- Returns ZERO rows! Because NOT IN with NULL is always UNKNOWN. -- Safe version: SELECT * FROM users WHERE user_id NOT IN (SELECT user_id FROM orders WHERE user_id IS NOT NULL);
Performance:
  • NOT EXISTS and LEFT JOIN anti-patterns perform similarly
  • Modern optimizers often convert between them
  • NOT IN can be slower and has NULL issues

Question 24: Conditional Aggregation

Difficulty: Intermediate | Topic: Filtered Counts
Problem: Count orders by status in a single query.
Solution:
sql
SELECT COUNT(*) AS total_orders, COUNT(CASE WHEN status = 'pending' THEN 1 END) AS pending_orders, COUNT(CASE WHEN status = 'paid' THEN 1 END) AS paid_orders, COUNT(CASE WHEN status = 'shipped' THEN 1 END) AS shipped_orders, COUNT(CASE WHEN status = 'delivered' THEN 1 END) AS delivered_orders, COUNT(CASE WHEN status = 'cancelled' THEN 1 END) AS cancelled_orders FROM orders WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'; -- PostgreSQL FILTER syntax (cleaner) SELECT COUNT(*) AS total_orders, COUNT(*) FILTER (WHERE status = 'pending') AS pending_orders, COUNT(*) FILTER (WHERE status = 'paid') AS paid_orders, COUNT(*) FILTER (WHERE status = 'shipped') AS shipped_orders, COUNT(*) FILTER (WHERE status = 'delivered') AS delivered_orders, COUNT(*) FILTER (WHERE status = 'cancelled') AS cancelled_orders FROM orders WHERE created_at >= CURRENT_DATE - INTERVAL '30 days';
Why This is Better Than Multiple Queries:
sql
-- BAD: 6 queries hitting the same table SELECT COUNT(*) FROM orders WHERE created_at >= ...; SELECT COUNT(*) FROM orders WHERE created_at >= ... AND status = 'pending'; SELECT COUNT(*) FROM orders WHERE created_at >= ... AND status = 'paid'; -- ... 4 more queries -- GOOD: 1 query, 1 table scan SELECT COUNT(*), COUNT(CASE WHEN status = 'pending' THEN 1 END), ... FROM orders WHERE created_at >= ...;

Question 25: Complex Boolean Logic

Difficulty: Intermediate | Topic: Query Logic
Problem: Find high-value customers who either: (a) have spent over $1000 total, OR (b) have placed more than 10 orders in the last year.
Solution:
sql
SELECT u.user_id, u.email, u.first_name, u.last_name, COALESCE(o.total_spent, 0) AS total_spent, COALESCE(o.order_count, 0) AS order_count FROM users u LEFT JOIN ( SELECT user_id, SUM(total_amount) AS total_spent, COUNT(*) AS order_count FROM orders WHERE created_at >= CURRENT_DATE - INTERVAL '1 year' AND status NOT IN ('cancelled', 'refunded') GROUP BY user_id ) o ON u.user_id = o.user_id WHERE o.total_spent > 1000 OR o.order_count > 10;
Breaking Down the Logic:
  1. Subquery calculates spending and order count per user
  2. LEFT JOIN keeps all users (we filter later)
  3. WHERE applies the high-value criteria
  4. COALESCE handles users with no orders
Indexing Suggestion:
sql
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at) WHERE status NOT IN ('cancelled', 'refunded');

Summary: Part 1 Key Takeaways

What We Covered

  1. SELECT basics - Always specify columns, avoid SELECT *
  2. WHERE clauses - AND, OR, NOT, BETWEEN, LIKE, IN
  3. NULL handling - IS NULL, COALESCE, NULLIF
  4. Sorting - ORDER BY with indexes
  5. Pagination - LIMIT/OFFSET vs Keyset

Golden Rules

RuleWhy
Never use SELECT *Network overhead, schema changes, prevents covering indexes
Always filter on indexed columnsAvoid full table scans
Beware of ORCan prevent index usage
Handle NULLs explicitlyNULL comparisons are tricky
Use keyset paginationOFFSET is slow for deep pages
Test with production data volumesQueries that work with 1000 rows may fail with 10 million

Indexing Checklist

  • Index columns used in WHERE clauses
  • Index columns used in JOIN conditions
  • Index columns used in ORDER BY
  • Consider composite indexes for multi-column filters
  • Use partial indexes for filtered queries
  • Check EXPLAIN plans to verify index usage

Coming in Part 2: Keys, Constraints, and Normalization

  • Primary keys: Natural vs Surrogate
  • Foreign keys: When to use, when to skip
  • Unique constraints: Partial uniqueness
  • Composite keys: Design patterns
  • Normalization: 1NF through 3NF with examples
  • Denormalization: When to break the rules

This is Part 1 of the SQL Mastery Guide series. Each part builds on the previous, progressing from foundations to advanced optimization techniques.
All Blogs
Tags:sqldatabasequeriesinterviewperformance