SQL Mastery Guide Part 1: Foundations

Introduction: How to Use This Guide

After 20+ years of designing databases for fintech, e-commerce, and high-scale platforms, I've seen thousands of engineers struggle with SQL not because it's hard, but because they learned it wrong.

Most tutorials teach you SELECT * FROM table and call it a day. Real production databases have millions of rows. That SELECT * would bring your server to its knees.

This guide is different. Every query comes with:

The problem to solve
The solution
Why it works
What can go wrong
How to make it fast
When NOT to use this pattern

Practice Schema: E-Commerce Platform

We'll use this schema throughout the series:

sql
-- Users table: 10 million users
CREATE TABLE users (
    user_id         BIGSERIAL PRIMARY KEY,
    email           VARCHAR(255) NOT NULL UNIQUE,
    username        VARCHAR(50) NOT NULL UNIQUE,
    password_hash   VARCHAR(255) NOT NULL,
    first_name      VARCHAR(100),
    last_name       VARCHAR(100),
    phone           VARCHAR(20),
    status          VARCHAR(20) DEFAULT 'active',  -- active, suspended, deleted
    email_verified  BOOLEAN DEFAULT FALSE,
    created_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_login_at   TIMESTAMP,
    login_count     INT DEFAULT 0
);

-- Products table: 500,000 products
CREATE TABLE products (
    product_id      BIGSERIAL PRIMARY KEY,
    sku             VARCHAR(50) NOT NULL UNIQUE,
    name            VARCHAR(255) NOT NULL,
    description     TEXT,
    category_id     INT REFERENCES categories(category_id),
    brand_id        INT REFERENCES brands(brand_id),
    price           DECIMAL(10,2) NOT NULL,
    cost_price      DECIMAL(10,2),
    stock_quantity  INT DEFAULT 0,
    status          VARCHAR(20) DEFAULT 'active',  -- active, discontinued, out_of_stock
    weight_kg       DECIMAL(8,3),
    created_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Orders table: 50 million orders
CREATE TABLE orders (
    order_id        BIGSERIAL PRIMARY KEY,
    order_number    VARCHAR(20) NOT NULL UNIQUE,
    user_id         BIGINT NOT NULL REFERENCES users(user_id),
    status          VARCHAR(20) NOT NULL,  -- pending, paid, shipped, delivered, cancelled, refunded
    subtotal        DECIMAL(12,2) NOT NULL,
    tax_amount      DECIMAL(12,2) DEFAULT 0,
    shipping_amount DECIMAL(12,2) DEFAULT 0,
    discount_amount DECIMAL(12,2) DEFAULT 0,
    total_amount    DECIMAL(12,2) NOT NULL,
    currency        CHAR(3) DEFAULT 'USD',
    shipping_address_id BIGINT,
    billing_address_id  BIGINT,
    payment_method  VARCHAR(50),
    paid_at         TIMESTAMP,
    shipped_at      TIMESTAMP,
    delivered_at    TIMESTAMP,
    cancelled_at    TIMESTAMP,
    created_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Order Items table: 150 million items
CREATE TABLE order_items (
    item_id         BIGSERIAL PRIMARY KEY,
    order_id        BIGINT NOT NULL REFERENCES orders(order_id),
    product_id      BIGINT NOT NULL REFERENCES products(product_id),
    quantity        INT NOT NULL,
    unit_price      DECIMAL(10,2) NOT NULL,
    total_price     DECIMAL(12,2) NOT NULL,
    created_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Categories table: 500 categories
CREATE TABLE categories (
    category_id     SERIAL PRIMARY KEY,
    name            VARCHAR(100) NOT NULL,
    parent_id       INT REFERENCES categories(category_id),
    level           INT DEFAULT 1,
    path            VARCHAR(500),  -- e.g., '/electronics/phones/smartphones'
    is_active       BOOLEAN DEFAULT TRUE
);

-- Payments table: 40 million payments
CREATE TABLE payments (
    payment_id      BIGSERIAL PRIMARY KEY,
    order_id        BIGINT NOT NULL REFERENCES orders(order_id),
    amount          DECIMAL(12,2) NOT NULL,
    currency        CHAR(3) DEFAULT 'USD',
    method          VARCHAR(50) NOT NULL,  -- credit_card, debit_card, upi, wallet
    status          VARCHAR(20) NOT NULL,  -- pending, completed, failed, refunded
    gateway_ref     VARCHAR(100),
    error_message   TEXT,
    created_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    processed_at    TIMESTAMP
);

Section 1: SELECT Basics

Question 1: Select All Columns

Difficulty: Beginner | Topic: Basic SELECT

Problem: Retrieve all information about all products.

Solution:

sql
-- The naive approach (AVOID in production)
SELECT * FROM products;

-- The correct approach
SELECT
    product_id, sku, name, description, category_id,
    brand_id, price, cost_price, stock_quantity,
    status, weight_kg, created_at, updated_at
FROM products;

Explanation: SELECT * fetches every column. While convenient for exploration, it's problematic in production:

Fetches unnecessary data (like large TEXT descriptions)
Breaks when schema changes (new columns appear unexpectedly)
Prevents covering indexes from being used

Common Mistakes:

Using SELECT * in application code
Not considering column order for readability

Performance Concerns:

With 500,000 products and a TEXT description column, SELECT * might transfer gigabytes of data
Network becomes the bottleneck

When NOT to Use:

Never use SELECT * in production application code
Acceptable only for ad-hoc exploration in a SQL client

Question 2: Select Specific Columns

Difficulty: Beginner | Topic: Column Selection

Problem: Get product names and prices for a product catalog page.

Solution:

sql
SELECT product_id, name, price
FROM products;

Explanation: Only select what you need. For a catalog page, you need ID (for links), name, and price. Not description, not stock, not dates.

Performance Concerns:

Fewer columns = less data transfer
Enables use of covering indexes (we'll cover this in Part 6)

Indexing Suggestion:

sql
-- A covering index for catalog queries
CREATE INDEX idx_products_catalog ON products(product_id, name, price);

Question 3: Column Aliases

Difficulty: Beginner | Topic: Aliases

Problem: Get product information with clearer column names for API response.

Solution:

sql
SELECT
    product_id AS id,
    name AS product_name,
    price AS unit_price,
    stock_quantity AS available_stock
FROM products;

Explanation: Aliases rename columns in the result set. Useful when:

Column names are unclear (sku → product_code)
Preparing data for specific API formats
Using expressions (we'll see later)

Common Mistakes:

sql
-- WRONG: Can't use alias in WHERE (in most databases)
SELECT name AS product_name FROM products WHERE product_name = 'iPhone';

-- CORRECT: Use original column name in WHERE
SELECT name AS product_name FROM products WHERE name = 'iPhone';

Question 4: Calculated Columns

Difficulty: Beginner | Topic: Expressions

Problem: Show products with their profit margin (price - cost_price).

Solution:

sql
SELECT
    product_id,
    name,
    price,
    cost_price,
    (price - cost_price) AS profit,
    ROUND((price - cost_price) / price * 100, 2) AS profit_margin_percent
FROM products
WHERE cost_price IS NOT NULL;

Explanation: You can perform calculations in SELECT:

Arithmetic: +, -, *, /
Functions: ROUND(), ABS(), etc.
Give calculated columns meaningful aliases

Common Mistakes:

sql
-- WRONG: Division by zero
SELECT (price - cost_price) / cost_price FROM products;

-- CORRECT: Handle zero
SELECT
    CASE WHEN cost_price = 0 THEN 0
         ELSE (price - cost_price) / cost_price
    END AS markup_ratio
FROM products;

Performance Concerns:

Calculations are performed for every row
Cannot be indexed (unless using computed/generated columns)
For heavy calculations, consider materialized views

Section 2: WHERE Clause

Question 5: Simple Equality Filter

Difficulty: Beginner | Topic: WHERE Basics

Problem: Find all orders with status 'pending'.

Solution:

sql
SELECT order_id, order_number, user_id, total_amount, created_at
FROM orders
WHERE status = 'pending';

Explanation: WHERE filters rows before they're returned. Only rows matching the condition are included.

Performance Concerns: With 50 million orders, filtering without an index means scanning all rows.

Indexing Suggestion:

sql
-- Index for status lookups
CREATE INDEX idx_orders_status ON orders(status);

-- Check how many 'pending' orders exist (cardinality matters!)
SELECT status, COUNT(*) FROM orders GROUP BY status;

When Index Helps vs Hurts:

If 1% of orders are 'pending' (500K rows) → Index helps a lot
If 90% of orders are 'pending' → Full table scan might be faster

Question 6: Multiple Conditions with AND

Difficulty: Beginner | Topic: Compound Conditions

Problem: Find active products with price over $100.

Solution:

sql
SELECT product_id, name, price, stock_quantity
FROM products
WHERE status = 'active'
  AND price > 100;

Explanation: AND requires both conditions to be true. Rows must be active AND have price > 100.

EXPLAIN Analysis:

sql
EXPLAIN ANALYZE
SELECT product_id, name, price, stock_quantity
FROM products
WHERE status = 'active' AND price > 100;

Look for:

Seq Scan → No index used, scanning all rows
Index Scan → Using an index
Bitmap Heap Scan → Using index then fetching rows

Indexing Suggestion:

sql
-- Composite index (order matters!)
CREATE INDEX idx_products_status_price ON products(status, price);

Why This Order? Put equality conditions first (status = 'active'), then range conditions (price > 100). The index can efficiently filter by status, then scan the price range.

Question 7: Multiple Conditions with OR

Difficulty: Beginner | Topic: OR Conditions

Problem: Find orders that are either shipped or delivered.

Solution:

sql
SELECT order_id, order_number, status, created_at
FROM orders
WHERE status = 'shipped' OR status = 'delivered';

-- Better alternative using IN
SELECT order_id, order_number, status, created_at
FROM orders
WHERE status IN ('shipped', 'delivered');

Explanation: OR requires at least one condition to be true. IN is cleaner for multiple values of the same column.

Performance Concerns: OR conditions can prevent index usage in some databases. The optimizer may need to do an "index merge" or fall back to full scan.

Indexing Suggestion:

sql
-- Single index works well with IN
CREATE INDEX idx_orders_status ON orders(status);

Common Mistakes:

sql
-- WRONG: Mixing columns with OR can be slow
SELECT * FROM orders
WHERE status = 'pending' OR user_id = 12345;

-- BETTER: Use UNION for different column filters
SELECT * FROM orders WHERE status = 'pending'
UNION ALL
SELECT * FROM orders WHERE user_id = 12345 AND status != 'pending';

Question 8: NOT Condition

Difficulty: Beginner | Topic: Negation

Problem: Find all products that are NOT discontinued.

Solution:

sql
-- Using NOT
SELECT product_id, name, status
FROM products
WHERE NOT status = 'discontinued';

-- Using != (same result)
SELECT product_id, name, status
FROM products
WHERE status != 'discontinued';

-- Using <> (ANSI standard)
SELECT product_id, name, status
FROM products
WHERE status <> 'discontinued';

Performance Concerns: NOT and != often can't use indexes efficiently. If 5% of products are discontinued, you're fetching 95% of the table might as well full scan.

Better Alternative:

sql
-- If you know the valid statuses
SELECT product_id, name, status
FROM products
WHERE status IN ('active', 'out_of_stock');

When NOT to Use:

Avoid NOT IN with nullable columns (see Question 15)
Avoid NOT LIKE on large tables

Question 9: BETWEEN for Range Queries

Difficulty: Beginner | Topic: Range Filtering

Problem: Find products priced between $50 and $200.

Solution:

sql
SELECT product_id, name, price
FROM products
WHERE price BETWEEN 50 AND 200;

-- Equivalent to:
SELECT product_id, name, price
FROM products
WHERE price >= 50 AND price <= 200;

Explanation: BETWEEN is inclusive on both ends. $50 and $200 are included in results.

Common Mistakes:

sql
-- WRONG: Expecting BETWEEN to be exclusive
-- This INCLUDES 50 and 200

-- If you want exclusive:
SELECT * FROM products WHERE price > 50 AND price < 200;

Performance Concerns: Range queries can use B-tree indexes efficiently.

Indexing Suggestion:

sql
CREATE INDEX idx_products_price ON products(price);

Date Range Example:

sql
-- Orders from January 2024
SELECT * FROM orders
WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31 23:59:59';

-- Better for dates (explicit and avoids time issues):
SELECT * FROM orders
WHERE created_at >= '2024-01-01'
  AND created_at < '2024-02-01';

Question 10: LIKE for Pattern Matching

Difficulty: Beginner | Topic: Pattern Matching

Problem: Find products whose name contains 'phone'.

Solution:

sql
-- Contains 'phone' anywhere
SELECT product_id, name
FROM products
WHERE name LIKE '%phone%';

-- Starts with 'iPhone'
SELECT product_id, name
FROM products
WHERE name LIKE 'iPhone%';

-- Case-insensitive (PostgreSQL)
SELECT product_id, name
FROM products
WHERE name ILIKE '%phone%';

Explanation:

% matches any sequence of characters (including empty)
_ matches exactly one character
ILIKE is case-insensitive (PostgreSQL specific)

Performance Concerns: This is crucial to understand:

sql
-- CAN use index (prefix search)
WHERE name LIKE 'iPhone%'

-- CANNOT use index (leading wildcard)
WHERE name LIKE '%phone%'

Why? B-tree indexes are sorted. You can quickly find all entries starting with 'iPhone', but finding entries containing 'phone' anywhere requires scanning everything.

Solutions for Full-Text Search:

sql
-- Create a full-text search index
CREATE INDEX idx_products_name_gin ON products USING GIN (to_tsvector('english', name));

-- Use full-text search
SELECT * FROM products
WHERE to_tsvector('english', name) @@ to_tsquery('phone');

Question 11: IN Operator

Difficulty: Beginner | Topic: Set Membership

Problem: Find orders for specific users.

Solution:

sql
SELECT order_id, user_id, total_amount, created_at
FROM orders
WHERE user_id IN (1001, 1002, 1003, 1004, 1005);

Explanation: IN checks if a value exists in a set. Cleaner than multiple OR conditions.

Performance Concerns:

Small lists (< 100 values): Index lookup for each value, then combine
Large lists (> 1000 values): May degrade to full scan

When to Use JOIN Instead:

sql
-- Instead of IN with many values
SELECT o.* FROM orders o
WHERE o.user_id IN (SELECT user_id FROM premium_users);

-- Better: Use JOIN
SELECT o.*
FROM orders o
INNER JOIN premium_users pu ON o.user_id = pu.user_id;

Indexing Suggestion:

sql
CREATE INDEX idx_orders_user_id ON orders(user_id);

Question 12: NULL Handling

Difficulty: Intermediate | Topic: NULL Semantics

Problem: Find products that don't have a cost price set.

Solution:

sql
-- CORRECT
SELECT product_id, name, price, cost_price
FROM products
WHERE cost_price IS NULL;

-- WRONG (this returns no rows!)
SELECT product_id, name, price, cost_price
FROM products
WHERE cost_price = NULL;

Explanation: NULL represents "unknown" or "missing." In SQL:

NULL = NULL is not true (it's unknown!)
NULL != NULL is not true either
Only IS NULL and IS NOT NULL work correctly

Common Mistakes:

sql
-- WRONG: Trying to compare with NULL
WHERE cost_price != NULL  -- Returns nothing!

-- WRONG: IN with NULL
WHERE status IN ('active', NULL)  -- NULL is ignored!

-- CORRECT: Explicit NULL check
WHERE status = 'active' OR status IS NULL

COALESCE for Default Values:

sql
-- Show 0 when cost_price is NULL
SELECT
    product_id,
    name,
    price,
    COALESCE(cost_price, 0) AS cost_price,
    price - COALESCE(cost_price, 0) AS profit
FROM products;

Performance Concerns:

IS NULL can use indexes (partial indexes work great)
COALESCE in WHERE clause prevents index usage

Indexing Suggestion:

sql
-- Partial index for NULL cost_price products
CREATE INDEX idx_products_null_cost
ON products(product_id)
WHERE cost_price IS NULL;

Question 13: NULLIF Function

Difficulty: Intermediate | Topic: NULL Handling

Problem: Calculate average order value, treating zero-value orders as NULL (exclude from average).

Solution:

sql
SELECT
    AVG(NULLIF(total_amount, 0)) AS avg_order_value
FROM orders;

Explanation: NULLIF(a, b) returns NULL if a = b, otherwise returns a.

Use cases:

Exclude zero values from aggregations
Prevent division by zero: value / NULLIF(divisor, 0)

Common Mistakes:

sql
-- Division by zero error
SELECT profit / quantity FROM order_items;

-- Safe division
SELECT profit / NULLIF(quantity, 0) FROM order_items;

Section 3: Sorting and Limiting

Question 14: ORDER BY Basics

Difficulty: Beginner | Topic: Sorting

Problem: Get the most expensive products first.

Solution:

sql
SELECT product_id, name, price
FROM products
WHERE status = 'active'
ORDER BY price DESC;

Explanation:

ORDER BY column ASC - ascending (default, smallest first)
ORDER BY column DESC - descending (largest first)

Multiple Column Sorting:

sql
-- Sort by category, then by price within each category
SELECT product_id, name, category_id, price
FROM products
ORDER BY category_id ASC, price DESC;

Performance Concerns: Sorting requires memory and CPU. For large result sets:

Database loads matching rows
Sorts them in memory (or spills to disk if too large)
Returns sorted results

Indexing Suggestion:

sql
-- Index can provide pre-sorted data
CREATE INDEX idx_products_category_price
ON products(category_id ASC, price DESC);

-- Now this query uses the index order, no sorting needed:
SELECT * FROM products
WHERE category_id = 5
ORDER BY price DESC;

Question 15: LIMIT and OFFSET

Difficulty: Beginner | Topic: Pagination

Problem: Get the first 20 products for page 1 of a product listing.

Solution:

sql
-- Page 1: First 20 products
SELECT product_id, name, price
FROM products
WHERE status = 'active'
ORDER BY created_at DESC
LIMIT 20 OFFSET 0;

-- Page 2: Next 20 products
SELECT product_id, name, price
FROM products
WHERE status = 'active'
ORDER BY created_at DESC
LIMIT 20 OFFSET 20;

-- Page N formula: OFFSET = (page_number - 1) * page_size

Performance Concerns: OFFSET is a performance killer!

Page 1:  OFFSET 0    → Read 20 rows
Page 10: OFFSET 180  → Read 200 rows, discard 180
Page 100: OFFSET 1980 → Read 2000 rows, discard 1980

With deep pagination, you're reading and discarding thousands of rows.

Better Alternative: Keyset Pagination

sql
-- Instead of OFFSET, remember the last value
-- Page 1:
SELECT product_id, name, price, created_at
FROM products
WHERE status = 'active'
ORDER BY created_at DESC, product_id DESC
LIMIT 20;

-- Page 2 (using last row's values from page 1):
SELECT product_id, name, price, created_at
FROM products
WHERE status = 'active'
  AND (created_at, product_id) < ('2024-01-15 10:30:00', 12345)
ORDER BY created_at DESC, product_id DESC
LIMIT 20;

Why Keyset is Better:

Always reads exactly 20 rows
Uses index efficiently
Consistent performance regardless of page number

When to Use OFFSET:

Small tables (< 10,000 rows)
When users won't paginate deeply
Admin interfaces with low traffic

When to Use Keyset:

Large tables
API endpoints
Infinite scroll interfaces
Any high-traffic pagination

Question 16: TOP N Queries

Difficulty: Intermediate | Topic: Top N Patterns

Problem: Find the 5 most expensive products in each category.

Solution (PostgreSQL):

sql
WITH ranked_products AS (
    SELECT
        product_id,
        name,
        category_id,
        price,
        ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY price DESC) AS rank
    FROM products
    WHERE status = 'active'
)
SELECT * FROM ranked_products
WHERE rank <= 5;

Explanation:

ROW_NUMBER() assigns a unique number to each row
PARTITION BY category_id restarts numbering for each category
ORDER BY price DESC ensures highest price gets rank 1
Filter rank <= 5 keeps only top 5 per category

Alternative with LATERAL (PostgreSQL):

sql
SELECT c.category_id, c.name AS category_name, p.*
FROM categories c
CROSS JOIN LATERAL (
    SELECT product_id, name, price
    FROM products
    WHERE category_id = c.category_id
      AND status = 'active'
    ORDER BY price DESC
    LIMIT 5
) p;

Performance Comparison:

Window function: Scans all products, ranks them, filters
LATERAL: For each category, efficiently gets top 5 using index

Indexing Suggestion:

sql
CREATE INDEX idx_products_category_price
ON products(category_id, price DESC)
WHERE status = 'active';

Question 17: DISTINCT

Difficulty: Beginner | Topic: Deduplication

Problem: Get all unique categories that have products.

Solution:

sql
SELECT DISTINCT category_id
FROM products
WHERE status = 'active';

Explanation: DISTINCT removes duplicate rows from the result set.

DISTINCT vs GROUP BY:

sql
-- These are equivalent:
SELECT DISTINCT category_id FROM products;
SELECT category_id FROM products GROUP BY category_id;

But GROUP BY allows aggregations, DISTINCT doesn't.

Performance Concerns: DISTINCT requires sorting or hashing to identify duplicates.

For Counting Distinct Values:

sql
-- How many unique customers ordered?
SELECT COUNT(DISTINCT user_id) FROM orders;

Warning: COUNT(DISTINCT) on large tables is expensive it must track every unique value.

Common Mistakes:

sql
-- WRONG: DISTINCT applies to entire row, not just first column
SELECT DISTINCT category_id, product_id FROM products;
-- This returns all products (each product_id is unique!)

-- What you probably wanted:
SELECT DISTINCT category_id FROM products;

Section 4: Practical Filtering Patterns

Question 18: Date Filtering

Difficulty: Intermediate | Topic: Date Operations

Problem: Find all orders from the last 30 days.

Solution:

sql
-- PostgreSQL
SELECT order_id, user_id, total_amount, created_at
FROM orders
WHERE created_at >= CURRENT_DATE - INTERVAL '30 days';

-- MySQL
SELECT order_id, user_id, total_amount, created_at
FROM orders
WHERE created_at >= DATE_SUB(CURDATE(), INTERVAL 30 DAY);

-- SQL Server
SELECT order_id, user_id, total_amount, created_at
FROM orders
WHERE created_at >= DATEADD(day, -30, GETDATE());

Common Mistakes:

sql
-- WRONG: Function on indexed column prevents index usage
WHERE DATE(created_at) = '2024-01-15'

-- CORRECT: Use range
WHERE created_at >= '2024-01-15'
  AND created_at < '2024-01-16'

-- WRONG: String comparison issues
WHERE created_at = '2024-01-15'  -- Misses times!

-- CORRECT: Cover the full day
WHERE created_at >= '2024-01-15 00:00:00'
  AND created_at < '2024-01-16 00:00:00'

Indexing Suggestion:

sql
CREATE INDEX idx_orders_created_at ON orders(created_at);

Question 19: Status-Based Filtering

Difficulty: Intermediate | Topic: Workflow States

Problem: Find orders that need attention (pending payment or failed payment).

Solution:

sql
SELECT
    o.order_id,
    o.order_number,
    o.total_amount,
    o.status AS order_status,
    p.status AS payment_status,
    o.created_at
FROM orders o
LEFT JOIN payments p ON o.order_id = p.order_id
WHERE o.status = 'pending'
   OR p.status = 'failed'
ORDER BY o.created_at ASC;

Alternative with CASE:

sql
SELECT
    order_id,
    order_number,
    total_amount,
    status,
    CASE
        WHEN status = 'pending' THEN 'Awaiting Payment'
        WHEN status = 'paid' THEN 'Payment Received'
        WHEN status = 'shipped' THEN 'In Transit'
        WHEN status = 'delivered' THEN 'Completed'
        WHEN status = 'cancelled' THEN 'Cancelled'
        ELSE 'Unknown'
    END AS status_display
FROM orders;

Indexing Suggestion:

sql
-- Partial index for specific statuses
CREATE INDEX idx_orders_pending
ON orders(created_at)
WHERE status = 'pending';

CREATE INDEX idx_payments_failed
ON payments(order_id)
WHERE status = 'failed';

Question 20: Price Range Filters

Difficulty: Intermediate | Topic: E-commerce Filters

Problem: Implement product filtering by price range for a shopping site.

Solution:

sql
-- With dynamic price ranges (passed as parameters)
SELECT product_id, name, price, category_id
FROM products
WHERE status = 'active'
  AND price >= :min_price  -- Parameter
  AND price <= :max_price  -- Parameter
ORDER BY price ASC;

-- With predefined ranges
SELECT
    CASE
        WHEN price < 25 THEN 'Under $25'
        WHEN price < 50 THEN '$25 - $50'
        WHEN price < 100 THEN '$50 - $100'
        WHEN price < 200 THEN '$100 - $200'
        ELSE '$200+'
    END AS price_range,
    COUNT(*) AS product_count
FROM products
WHERE status = 'active'
GROUP BY 1
ORDER BY MIN(price);

Building Dynamic Filter Counts:

sql
-- Show how many products in each price range
SELECT
    SUM(CASE WHEN price < 25 THEN 1 ELSE 0 END) AS under_25,
    SUM(CASE WHEN price >= 25 AND price < 50 THEN 1 ELSE 0 END) AS range_25_50,
    SUM(CASE WHEN price >= 50 AND price < 100 THEN 1 ELSE 0 END) AS range_50_100,
    SUM(CASE WHEN price >= 100 THEN 1 ELSE 0 END) AS over_100
FROM products
WHERE status = 'active'
  AND category_id = :category_id;

Question 21: Multi-Filter Search

Difficulty: Intermediate | Topic: Dynamic Filtering

Problem: Build a product search that handles multiple optional filters.

Solution:

sql
-- The flexible approach
SELECT product_id, name, price, category_id, brand_id
FROM products
WHERE status = 'active'
  AND (:category_id IS NULL OR category_id = :category_id)
  AND (:brand_id IS NULL OR brand_id = :brand_id)
  AND (:min_price IS NULL OR price >= :min_price)
  AND (:max_price IS NULL OR price <= :max_price)
  AND (:search_term IS NULL OR name ILIKE '%' || :search_term || '%')
ORDER BY created_at DESC
LIMIT 20;

Performance Warning: This pattern is convenient but problematic:

OR conditions prevent efficient index usage
Query plan is chosen based on statistics, not actual parameters

Better Approach: Build Query Dynamically

go
// Application code (Go example)
query := "SELECT product_id, name, price FROM products WHERE status = 'active'"
params := []interface{}{}

if categoryID != nil {
    query += " AND category_id = $" + strconv.Itoa(len(params)+1)
    params = append(params, categoryID)
}

if minPrice != nil {
    query += " AND price >= $" + strconv.Itoa(len(params)+1)
    params = append(params, minPrice)
}
// ... etc

Question 22: Existence Check

Difficulty: Intermediate | Topic: Conditional Logic

Problem: Find users who have placed at least one order.

Solution:

sql
-- Method 1: EXISTS (usually fastest)
SELECT u.user_id, u.email, u.first_name
FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.user_id = u.user_id
);

-- Method 2: IN
SELECT user_id, email, first_name
FROM users
WHERE user_id IN (SELECT user_id FROM orders);

-- Method 3: JOIN (may have duplicates!)
SELECT DISTINCT u.user_id, u.email, u.first_name
FROM users u
INNER JOIN orders o ON u.user_id = o.user_id;

Performance Comparison:

EXISTS stops at first match (efficient!)
IN builds a list then checks membership
JOIN may produce duplicates, needs DISTINCT

When to Use Each:

Pattern	Best For
EXISTS	Large subquery results, just checking existence
IN	Small list of values, readable code
JOIN	When you need data from both tables

Question 23: Non-Existence Check

Difficulty: Intermediate | Topic: Anti-Patterns

Problem: Find users who have never placed an order.

Solution:

sql
-- Method 1: NOT EXISTS (safest and often fastest)
SELECT u.user_id, u.email, u.first_name
FROM users u
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.user_id = u.user_id
);

-- Method 2: LEFT JOIN with NULL check (anti-join)
SELECT u.user_id, u.email, u.first_name
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE o.order_id IS NULL;

-- Method 3: NOT IN (DANGEROUS with NULLs!)
SELECT user_id, email, first_name
FROM users
WHERE user_id NOT IN (SELECT user_id FROM orders);

Critical Warning about NOT IN:

sql
-- If orders.user_id contains any NULL values:
SELECT * FROM users WHERE user_id NOT IN (SELECT user_id FROM orders);
-- Returns ZERO rows! Because NOT IN with NULL is always UNKNOWN.

-- Safe version:
SELECT * FROM users
WHERE user_id NOT IN (SELECT user_id FROM orders WHERE user_id IS NOT NULL);

Performance:

NOT EXISTS and LEFT JOIN anti-patterns perform similarly
Modern optimizers often convert between them
NOT IN can be slower and has NULL issues

Question 24: Conditional Aggregation

Difficulty: Intermediate | Topic: Filtered Counts

Problem: Count orders by status in a single query.

Solution:

sql
SELECT
    COUNT(*) AS total_orders,
    COUNT(CASE WHEN status = 'pending' THEN 1 END) AS pending_orders,
    COUNT(CASE WHEN status = 'paid' THEN 1 END) AS paid_orders,
    COUNT(CASE WHEN status = 'shipped' THEN 1 END) AS shipped_orders,
    COUNT(CASE WHEN status = 'delivered' THEN 1 END) AS delivered_orders,
    COUNT(CASE WHEN status = 'cancelled' THEN 1 END) AS cancelled_orders
FROM orders
WHERE created_at >= CURRENT_DATE - INTERVAL '30 days';

-- PostgreSQL FILTER syntax (cleaner)
SELECT
    COUNT(*) AS total_orders,
    COUNT(*) FILTER (WHERE status = 'pending') AS pending_orders,
    COUNT(*) FILTER (WHERE status = 'paid') AS paid_orders,
    COUNT(*) FILTER (WHERE status = 'shipped') AS shipped_orders,
    COUNT(*) FILTER (WHERE status = 'delivered') AS delivered_orders,
    COUNT(*) FILTER (WHERE status = 'cancelled') AS cancelled_orders
FROM orders
WHERE created_at >= CURRENT_DATE - INTERVAL '30 days';

Why This is Better Than Multiple Queries:

sql
-- BAD: 6 queries hitting the same table
SELECT COUNT(*) FROM orders WHERE created_at >= ...;
SELECT COUNT(*) FROM orders WHERE created_at >= ... AND status = 'pending';
SELECT COUNT(*) FROM orders WHERE created_at >= ... AND status = 'paid';
-- ... 4 more queries

-- GOOD: 1 query, 1 table scan
SELECT COUNT(*), COUNT(CASE WHEN status = 'pending' THEN 1 END), ...
FROM orders WHERE created_at >= ...;

Question 25: Complex Boolean Logic

Difficulty: Intermediate | Topic: Query Logic

Problem: Find high-value customers who either: (a) have spent over $1000 total, OR (b) have placed more than 10 orders in the last year.

Solution:

sql
SELECT
    u.user_id,
    u.email,
    u.first_name,
    u.last_name,
    COALESCE(o.total_spent, 0) AS total_spent,
    COALESCE(o.order_count, 0) AS order_count
FROM users u
LEFT JOIN (
    SELECT
        user_id,
        SUM(total_amount) AS total_spent,
        COUNT(*) AS order_count
    FROM orders
    WHERE created_at >= CURRENT_DATE - INTERVAL '1 year'
      AND status NOT IN ('cancelled', 'refunded')
    GROUP BY user_id
) o ON u.user_id = o.user_id
WHERE o.total_spent > 1000
   OR o.order_count > 10;

Breaking Down the Logic:

Subquery calculates spending and order count per user
LEFT JOIN keeps all users (we filter later)
WHERE applies the high-value criteria
COALESCE handles users with no orders

Indexing Suggestion:

sql
CREATE INDEX idx_orders_user_created
ON orders(user_id, created_at)
WHERE status NOT IN ('cancelled', 'refunded');

Summary: Part 1 Key Takeaways

What We Covered

SELECT basics - Always specify columns, avoid SELECT *
WHERE clauses - AND, OR, NOT, BETWEEN, LIKE, IN
NULL handling - IS NULL, COALESCE, NULLIF
Sorting - ORDER BY with indexes
Pagination - LIMIT/OFFSET vs Keyset

Golden Rules

Rule	Why
Never use SELECT *	Network overhead, schema changes, prevents covering indexes
Always filter on indexed columns	Avoid full table scans
Beware of OR	Can prevent index usage
Handle NULLs explicitly	NULL comparisons are tricky
Use keyset pagination	OFFSET is slow for deep pages
Test with production data volumes	Queries that work with 1000 rows may fail with 10 million

Indexing Checklist

Index columns used in WHERE clauses
Index columns used in JOIN conditions
Index columns used in ORDER BY
Consider composite indexes for multi-column filters
Use partial indexes for filtered queries
Check EXPLAIN plans to verify index usage

Coming in Part 2: Keys, Constraints, and Normalization

Primary keys: Natural vs Surrogate
Foreign keys: When to use, when to skip
Unique constraints: Partial uniqueness
Composite keys: Design patterns
Normalization: 1NF through 3NF with examples
Denormalization: When to break the rules

This is Part 1 of the SQL Mastery Guide series. Each part builds on the previous, progressing from foundations to advanced optimization techniques.