Hey there. In this article, I will share a few simple ways to speed up SQL queries and make them more efficient. The examples focus on PostgreSQL, but most of the ideas apply to other relational databases too.
The goal is not to memorize tricks. The goal is to understand why certain query shapes make it easier for the database engine to use indexes, avoid repeated scans, and return results faster.
1. Replace IN with a JOIN on a Virtual Table
Problem
Having a large list of values in an IN clause can lead to many checks for each row. With a small list this may not matter, but with long lists it can make the planner's job harder and add unnecessary CPU work.
Solution
Use a virtual table with VALUES, then join it to the main table. This can give the database a query shape that is easier to optimize.
-- BEFORE:
EXPLAIN
SELECT order_id, city
FROM orders
WHERE city IN ('Berlin', 'Paris', 'Rome');
-- AFTER:
EXPLAIN
SELECT o.order_id, o.city
FROM orders AS o
JOIN (
VALUES ('Berlin'), ('Paris'), ('Rome')
) AS v(city_name)
ON o.city = v.city_name;
Why It Can Be Faster
By treating the list of values as a table, PostgreSQL can sometimes create a more efficient execution plan than it would with a long IN clause.
This is especially useful when the value list is large, repeated, or generated dynamically by the application.
2. Use ANY(ARRAY[]) Instead of IN in PostgreSQL
Problem
A large IN list can slow down a query because each possible match must be checked.
Solution
In PostgreSQL, try the = ANY(ARRAY[...]) syntax. It can be clearer when the application already has values as an array, and PostgreSQL can stop checking once a match is found.
-- BEFORE:
EXPLAIN
SELECT product_id, quantity
FROM order_items
WHERE product_id IN (101, 202, 303, 404);
-- AFTER:
EXPLAIN
SELECT product_id, quantity
FROM order_items
WHERE product_id = ANY(ARRAY[101, 202, 303, 404]);
Why It Can Be Faster
ANY can reduce unnecessary comparisons in some cases, especially when combined with a clean query plan and indexed columns.
This technique is PostgreSQL-specific, so check behavior before applying it across different database engines.
3. Use JOIN Instead of a Correlated Subquery
Problem
Correlated subqueries run in relation to each row from the outer query. That can cause repeated scans of the same table and become expensive as the dataset grows.
Solution
Replace the correlated subquery with a regular JOIN or a non-correlated subquery when possible.
-- BEFORE:
EXPLAIN
SELECT c.customer_id, c.name
FROM customers AS c
WHERE EXISTS (
SELECT 1
FROM orders AS o
WHERE o.customer_id = c.customer_id
AND o.amount > 1000
);
-- AFTER:
EXPLAIN
SELECT DISTINCT c.customer_id, c.name
FROM customers AS c
JOIN orders AS o
ON c.customer_id = o.customer_id
WHERE o.amount > 1000;
Why It Can Be Faster
A JOIN gives PostgreSQL more room to use indexes and optimize the relationship between the two tables. It also avoids executing the same lookup repeatedly for every row in the outer query.
The DISTINCT is needed here because one customer can have several qualifying orders.
4. Use BETWEEN Instead of Date Functions
Problem
Expressions like this are common:
EXTRACT(YEAR FROM order_date) = 2026
The problem is that applying a function to a column can prevent the database from using a normal index on that column efficiently. PostgreSQL has to evaluate the function for many rows instead of scanning a clean date range.
Solution
Use explicit date ranges with BETWEEN, or even better, an inclusive lower bound and exclusive upper bound for timestamp columns.
-- BEFORE:
EXPLAIN
SELECT *
FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2026
AND EXTRACT(MONTH FROM order_date) = 5;
-- AFTER:
EXPLAIN
SELECT *
FROM orders
WHERE order_date BETWEEN '2026-05-01'::DATE
AND '2026-05-31'::DATE;
For timestamp columns, this shape is often safer:
SELECT *
FROM orders
WHERE order_date >= '2026-05-01'::DATE
AND order_date < '2026-06-01'::DATE;
Why It Can Be Faster
Filtering directly on the indexed date column enables an index range scan. The database can jump to the relevant part of the index instead of applying date functions row by row.
5. Use EXISTS Instead of JOIN When Checking for Existence
Problem
If you only need to know whether at least one related row exists, a JOIN can fetch more rows than necessary.
Solution
Use EXISTS. It can stop as soon as it finds the first matching row.
-- BEFORE:
EXPLAIN
SELECT COUNT(DISTINCT o.order_id)
FROM orders AS o
JOIN order_items AS i
ON o.order_id = i.order_id;
-- AFTER:
EXPLAIN
SELECT COUNT(DISTINCT o.order_id)
FROM orders AS o
WHERE EXISTS (
SELECT 1
FROM order_items AS i
WHERE i.order_id = o.order_id
);
Why It Can Be Faster
The query does not need to retrieve all related rows. Once PostgreSQL finds a matching record, the condition is satisfied and the engine can move on.
This is a clean pattern for permission checks, relationship checks, and filters where the actual joined data is not needed.
6. Apply Small Query Hygiene Improvements
Some optimizations are not dramatic on their own, but they add up quickly in production systems.
Select Only the Columns You Need
Avoid SELECT * when you only need a few fields. Pulling unnecessary columns increases IO, memory use, network transfer, and sometimes prevents index-only scans.
-- BEFORE:
SELECT *
FROM customers;
-- AFTER:
SELECT customer_id, name, email
FROM customers;
Add LIMIT When You Only Need a Sample
If the UI or script only needs a small subset of rows, use LIMIT.
SELECT order_id, created_at, status
FROM orders
ORDER BY created_at DESC
LIMIT 50;
Avoid Functions in WHERE Conditions
Prefer index-friendly predicates:
-- BEFORE:
WHERE SUBSTRING(code, 1, 3) = 'ABC'
-- AFTER:
WHERE code LIKE 'ABC%'
The second form gives the database a better chance to use an index on code.
Optimize Conditional Aggregations
In PostgreSQL, FILTER can make conditional counts clearer than repeated subqueries or verbose CASE expressions.
-- BEFORE:
EXPLAIN
SELECT SUM(CASE WHEN status = 'NEW' THEN 1 END) AS new_orders,
SUM(CASE WHEN status = 'CLOSED' THEN 1 END) AS closed_orders
FROM orders;
-- AFTER:
EXPLAIN
SELECT COUNT(*) FILTER (WHERE status = 'NEW') AS new_orders,
COUNT(*) FILTER (WHERE status = 'CLOSED') AS closed_orders
FROM orders;
Use Logical Expressions for Simple Boolean Checks
For simple boolean logic, this:
table1.is_deleted OR table2.is_deleted
is usually clearer than a large CASE expression that returns a boolean.
7. Use ROW_NUMBER() as an Alternative to DISTINCT
Goal
Extract unique values efficiently, especially in large datasets.
Solution
Sometimes ROW_NUMBER() with partitioning can be faster than DISTINCT, particularly when the partitioning columns are indexed and the query needs more control over which row is kept.
-- BEFORE:
EXPLAIN
SELECT COUNT(DISTINCT user_id)
FROM logins
WHERE login_date BETWEEN '2026-01-01'::DATE
AND '2026-01-31'::DATE;
-- AFTER:
EXPLAIN
SELECT COUNT(user_id)
FROM (
SELECT user_id,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY user_id) AS rn
FROM logins
WHERE login_date BETWEEN '2026-01-01'::DATE
AND '2026-01-31'::DATE
) AS tmp
WHERE rn = 1;
Why It Can Be Faster
ROW_NUMBER() can avoid a heavy distinct-sorting operation by assigning row numbers within each group. It also gives you more flexibility if you need to choose the newest, oldest, or highest-priority row per group.
That said, always verify with EXPLAIN ANALYZE. DISTINCT can still be the better choice for simple cases.
My Thoughts
SQL query optimization is a core step in any database-backed project. The best improvements often come from small query-shape changes:
- Replace long
INlists with virtual tables or PostgreSQL arrays when appropriate. - Use range predicates instead of applying functions to indexed columns.
- Choose
EXISTSwhen you only need to check whether a related row exists. - Avoid unnecessary columns, rows, repeated scans, and heavy sorting.
- Keep queries readable enough that the next engineer can maintain them.
The most important habit is to measure. Use EXPLAIN, EXPLAIN ANALYZE, real data volumes, and realistic filters. A query that looks elegant in isolation may behave differently under production data distribution.
Good SQL performance is not magic. It is usually the result of making the database do less unnecessary work.


