Every time I inherit a slow application, I start the same way: I open Query Store, sort by average CPU, and look at the top 10 offenders. Nine times out of ten, the fix is an index. But "add an index" is deceptively simple advice — the which, where, and why of it takes years to internalize.
Here's the mental model I've landed on.
Why indexes get misunderstood
The naive take is that indexes are just magic speed-up things. Slap one on the column you're filtering and you're done. That works — until it doesn't.
The real picture: an index is a sorted copy of (part of) your data, stored separately, maintained on every write. You're trading write overhead and storage for read speed. That trade-off has to be deliberate.
The three questions
Before creating any index, I ask:
- What queries will benefit? (Look at actual execution plans, not guesses)
- How does the table get written to? (Write-heavy tables pay a higher index tax)
- Is there an existing index I can extend instead? (Covering a nearby query is often cheaper than a new index)
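For that third question, the existing indexes on a table are easy to pull from the catalog views before you reach for CREATE INDEX. A quick sketch using the standard sys.indexes / sys.index_columns views (the table name is just an example):

```sql
-- List existing indexes on a table with their key and included columns,
-- to check whether one can be extended instead of adding a new one
SELECT i.name AS index_name,
       i.type_desc,
       COL_NAME(ic.object_id, ic.column_id) AS column_name,
       ic.key_ordinal,          -- 0 for INCLUDE columns
       ic.is_included_column
FROM sys.indexes i
JOIN sys.index_columns ic
  ON ic.object_id = i.object_id
 AND ic.index_id  = i.index_id
WHERE i.object_id = OBJECT_ID('dbo.Orders')
ORDER BY i.name, ic.key_ordinal;
```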
Clustered vs. non-clustered
A SQL Server table has at most one clustered index — the clustered index is the table, physically ordered by those key columns (a table without one is a heap). Everything else is a non-clustered index, which stores the indexed columns plus a pointer back to the clustered row.
-- The clustered index is usually your primary key
-- Choose it for range scans, not just point lookups
CREATE TABLE Orders (
    OrderId INT IDENTITY(1,1) PRIMARY KEY, -- clustered by default
    CustomerId INT NOT NULL,
    OrderDate DATETIME2 NOT NULL,
    Status TINYINT NOT NULL
);
If you're doing a lot of WHERE OrderDate BETWEEN @start AND @end, clustering on
OrderDate might outperform clustering on an identity column. This is the kind of
decision that's easy to get wrong at schema design time and expensive to change later.
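One way to sketch that alternative — keep the identity column as the primary key, but move the clustered index onto the date (a design sketch, not a blanket recommendation):

```sql
-- Cluster on the range-scan column; keep the identity PK non-clustered
CREATE TABLE Orders (
    OrderId    INT IDENTITY(1,1) NOT NULL,
    CustomerId INT NOT NULL,
    OrderDate  DATETIME2 NOT NULL,
    Status     TINYINT NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (OrderId)
);

-- Date-range queries now read contiguous pages in order
-- (non-unique: SQL Server adds a hidden uniquifier per duplicate date)
CREATE CLUSTERED INDEX CX_Orders_OrderDate
    ON Orders (OrderDate);
```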
The seek vs. scan distinction
The most important thing to internalize from an execution plan:
- Index Seek → SQL Server jumped directly to the rows. Good.
- Index Scan → SQL Server read the whole (or most of the) index. Often bad.
A scan on a small table is fine. A scan on a 50M-row table is a full table read, even if it's going through a non-clustered index.
Selective predicates (high cardinality columns like OrderId, Email) produce
seeks. Non-selective predicates (Status IN (1,2) on a column with three values) often
produce scans regardless of indexing.
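Selectivity is easy to measure before you commit to an index. A rough sketch against the Orders example — the ratio is a rule of thumb, not a hard threshold:

```sql
-- Rough selectivity check: distinct values vs. total rows
SELECT COUNT(DISTINCT Status) AS distinct_values,
       COUNT(*)               AS total_rows,
       CAST(COUNT(DISTINCT Status) AS FLOAT) / COUNT(*) AS selectivity
FROM Orders;
-- Near 1.0 (OrderId, Email): seeks are likely to pay off
-- Near 0.0 (Status with three values): expect scans anyway
```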
Covering indexes are underused
A covering index includes all the columns a query needs, so SQL Server never has to fetch the base table row (an operation the plan shows as a key lookup). Key lookups are the silent killers — they look small in the plan but explode under load, one extra lookup per matching row.
-- Query that frequently runs:
SELECT OrderId, OrderDate, Status
FROM Orders
WHERE CustomerId = @customerId
AND Status = 1;
-- Covering index: key column first, then INCLUDE the rest
CREATE INDEX IX_Orders_Customer_Status
ON Orders (CustomerId, Status)
INCLUDE (OrderId, OrderDate);
The rule of thumb: filter and sort columns go in the key; all other needed columns go in INCLUDE.
Column order in composite indexes
Column order matters — a lot. SQL Server can seek into a composite index only if the query references the leading edge (leftmost key columns).
-- Index on (CustomerId, OrderDate)
-- ✅ Used: WHERE CustomerId = 1
-- ✅ Used: WHERE CustomerId = 1 AND OrderDate > '2024-01-01'
-- ❌ No seek: WHERE OrderDate > '2024-01-01' (skips the leading column; at best a scan)
This is the "leftmost prefix" rule. It's why you sometimes see an index on (A, B) and
another on (B) for the same table — they serve different query shapes.
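Concretely, that two-index arrangement might look like this (index names are illustrative):

```sql
-- Serves: WHERE CustomerId = @c, with or without an OrderDate range
CREATE INDEX IX_Orders_Customer_Date
    ON Orders (CustomerId, OrderDate);

-- Serves: WHERE OrderDate > @d alone — date-only queries
-- can't seek the index above, because they skip its leading column
CREATE INDEX IX_Orders_Date
    ON Orders (OrderDate)
    INCLUDE (CustomerId);
```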
Filtered indexes: an underrated tool
If a large percentage of your queries filter on the same subset (e.g., only active records), a filtered index can be dramatically smaller and faster than a full index:
-- 98% of reads are on active orders only
CREATE INDEX IX_Orders_Active
ON Orders (OrderDate, CustomerId)
WHERE Status = 1;
The index is only built over rows where Status = 1. Smaller index, better cache
utilization, faster seeks.
Caveat: the query's WHERE clause has to provably cover the filter predicate, or the optimizer won't consider the index — literals work; parameters usually don't, because the value isn't known at plan compile time.
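A sketch of the match/no-match behavior against the IX_Orders_Active index above — the second query is the classic trap:

```sql
-- ✅ Can use IX_Orders_Active: the literal matches the index filter
SELECT OrderId, OrderDate
FROM Orders
WHERE Status = 1 AND OrderDate >= '2024-01-01';

-- ❌ Usually can't: the plan must also be correct when @status <> 1,
--    so the optimizer falls back to an unfiltered index or a scan
SELECT OrderId, OrderDate
FROM Orders
WHERE Status = @status AND OrderDate >= '2024-01-01';
```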
How I investigate a slow query
- Get the actual execution plan (SET STATISTICS IO, TIME ON, or SSMS / Azure Data Studio)
- Find the most expensive operator (fat arrows in the plan, big estimated-vs-actual row count mismatches)
- Look for key lookups — almost always worth eliminating with INCLUDE columns
- Look for table scans — is the predicate selective enough for an index to help?
- Check missing index hints in the plan — they're a suggestion, not gospel, but worth considering
- Verify with sys.dm_exec_query_stats or Query Store whether the query improved after the change
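The missing-index hints mentioned above also live in DMVs, which is handy when you don't have the plan in front of you. A sketch over the standard missing-index views — treat the output as candidates to evaluate, not commands to run:

```sql
-- Missing-index suggestions, ranked by the optimizer's rough impact estimate
SELECT TOP 10
       d.statement AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       s.avg_user_impact,
       s.user_seeks
FROM sys.dm_db_missing_index_details d
JOIN sys.dm_db_missing_index_groups g
  ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats s
  ON s.group_handle = g.index_group_handle
ORDER BY s.avg_user_impact * s.user_seeks DESC;
```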
The thing nobody tells you
Indexes don't just slow down writes — they also consume buffer pool cache. An over-indexed table might be reading fast in dev (warm cache, small data volume) and slow in prod (cold cache, 200 million rows fighting for buffer pool space with your expensive indexes).
Index tuning is a practice, not a one-time fix. Check usage with
sys.dm_db_index_usage_stats periodically and drop indexes that haven't been used
since the last service restart. Unused indexes are write overhead with zero read benefit.
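That periodic check can be sketched like this — indexes with writes but zero reads since the last restart are the drop candidates:

```sql
-- Non-clustered indexes that are written to but never read
-- (counters reset on service restart, so judge over a long uptime)
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name                   AS index_name,
       s.user_updates           AS writes,
       s.user_seeks + s.user_scans + s.user_lookups AS reads
FROM sys.dm_db_index_usage_stats s
JOIN sys.indexes i
  ON i.object_id = s.object_id
 AND i.index_id  = s.index_id
WHERE s.database_id = DB_ID()
  AND i.type_desc = 'NONCLUSTERED'
  AND s.user_seeks + s.user_scans + s.user_lookups = 0
ORDER BY s.user_updates DESC;
```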
Got a query that's still slow after indexing? Most of the time it's one of three
things: statistics are stale (run UPDATE STATISTICS), the query is parameterized
poorly (parameter sniffing), or the shape of the query prevents seeks (implicit
conversions, functions on columns). Those are posts for another day.