Hey there. In this article, I will share a few simple ways to speed up SQL queries and make them more efficient. The examples focus on PostgreSQL, but most of the ideas apply to other relational databases too.
The goal is not to memorize tricks. The goal is to understand why certain query shapes make it easier for the database engine to use indexes, avoid repeated scans, and return results faster.

1. Replace IN with a JOIN on a Virtual Table
Problem (1)
Having a large list of values in an IN clause can lead to many checks for each row. With a small list this may not matter, but with long lists it can make the planner’s job harder and add unnecessary CPU work.
Solution (1)
Use a virtual table with VALUES, then join it to the main table. This can give the database a query shape that is easier to optimize.
-- BEFORE:EXPLAINSELECT order_id, city FROM orders WHERE city IN ('Berlin', 'Paris', 'Rome');
-- AFTER:EXPLAINSELECT o.order_id, o.city FROM orders AS o JOIN ( VALUES ('Berlin'), ('Paris'), ('Rome') ) AS v(city_name) ON o.city = v.city_name;Why It Can Be Faster (1)
By treating the list of values as a table, PostgreSQL can sometimes create a more efficient execution plan than it would with a long IN clause.
This is especially useful when the value list is large, repeated, or generated dynamically by the application.
2. Use ANY(ARRAY[]) Instead of IN in PostgreSQL
Problem (2)
A large IN list can slow down a query because each possible match must be checked.
Solution (2)
In PostgreSQL, try the = ANY(ARRAY[...]) syntax. It can be clearer when the application already has values as an array, and PostgreSQL can stop checking once a match is found.
-- BEFORE:EXPLAINSELECT product_id, quantity FROM order_items WHERE product_id IN (101, 202, 303, 404);
-- AFTER:EXPLAINSELECT product_id, quantity FROM order_items WHERE product_id = ANY(ARRAY[101, 202, 303, 404]);Why It Can Be Faster (2)
ANY can reduce unnecessary comparisons in some cases, especially when combined with a clean query plan and indexed columns.
This technique is PostgreSQL-specific, so check behavior before applying it across different database engines.
3. Use JOIN Instead of a Correlated Subquery
Problem (3)
Correlated subqueries run in relation to each row from the outer query. That can cause repeated scans of the same table and become expensive as the dataset grows.
Solution (3)
Replace the correlated subquery with a regular JOIN or a non-correlated subquery when possible.
-- BEFORE:EXPLAINSELECT c.customer_id, c.name FROM customers AS c WHERE EXISTS ( SELECT 1 FROM orders AS o WHERE o.customer_id = c.customer_id AND o.amount > 1000 );
-- AFTER:EXPLAINSELECT DISTINCT c.customer_id, c.name FROM customers AS c JOIN orders AS o ON c.customer_id = o.customer_id WHERE o.amount > 1000;Why It Can Be Faster (3)
A JOIN gives PostgreSQL more room to use indexes and optimize the relationship between the two tables. It also avoids executing the same lookup repeatedly for every row in the outer query.
The DISTINCT is needed here because one customer can have several qualifying orders.
4. Use BETWEEN Instead of Date Functions
Problem (4)
Expressions like this are common:
EXTRACT(YEAR FROM order_date) = 2026The problem is that applying a function to a column can prevent the database from using a normal index on that column efficiently. PostgreSQL has to evaluate the function for many rows instead of scanning a clean date range.
Solution (4)
Use explicit date ranges with BETWEEN, or even better, an inclusive lower bound and exclusive upper bound for timestamp columns.
-- BEFORE:EXPLAINSELECT * FROM orders WHERE EXTRACT(YEAR FROM order_date) = 2026 AND EXTRACT(MONTH FROM order_date) = 5;
-- AFTER:EXPLAINSELECT * FROM orders WHERE order_date BETWEEN '2026-05-01'::DATE AND '2026-05-31'::DATE;For timestamp columns, this shape is often safer:
SELECT * FROM orders WHERE order_date >= '2026-05-01'::DATE AND order_date < '2026-06-01'::DATE;Why It Can Be Faster (4)
Filtering directly on the indexed date column enables an index range scan. The database can jump to the relevant part of the index instead of applying date functions row by row.
5. Use EXISTS Instead of JOIN When Checking for Existence
Problem (5)
If you only need to know whether at least one related row exists, a JOIN can fetch more rows than necessary.
Solution (5)
Use EXISTS. It can stop as soon as it finds the first matching row.
-- BEFORE:EXPLAINSELECT COUNT(DISTINCT o.order_id) FROM orders AS o JOIN order_items AS i ON o.order_id = i.order_id;
-- AFTER:EXPLAINSELECT COUNT(DISTINCT o.order_id) FROM orders AS o WHERE EXISTS ( SELECT 1 FROM order_items AS i WHERE i.order_id = o.order_id );Why It Can Be Faster (5)
The query does not need to retrieve all related rows. Once PostgreSQL finds a matching record, the condition is satisfied and the engine can move on.
This is a clean pattern for permission checks, relationship checks, and filters where the actual joined data is not needed.
6. Apply Small Query Hygiene Improvements
Some optimizations are not dramatic on their own, but they add up quickly in production systems.
Select Only the Columns You Need
Avoid SELECT * when you only need a few fields. Pulling unnecessary columns increases IO, memory use, network transfer, and sometimes prevents index-only scans.
-- BEFORE:SELECT * FROM customers;
-- AFTER:SELECT customer_id, name, email FROM customers;Add LIMIT When You Only Need a Sample
If the UI or script only needs a small subset of rows, use LIMIT.
SELECT order_id, created_at, status FROM orders ORDER BY created_at DESC LIMIT 50;Avoid Functions in WHERE Conditions
Prefer index-friendly predicates:
-- BEFORE:WHERE SUBSTRING(code, 1, 3) = 'ABC'
-- AFTER:WHERE code LIKE 'ABC%'The second form gives the database a better chance to use an index on code.
Optimize Conditional Aggregations
In PostgreSQL, FILTER can make conditional counts clearer than repeated subqueries or verbose CASE expressions.
-- BEFORE:EXPLAINSELECT SUM(CASE WHEN status = 'NEW' THEN 1 END) AS new_orders, SUM(CASE WHEN status = 'CLOSED' THEN 1 END) AS closed_orders FROM orders;
-- AFTER:EXPLAINSELECT COUNT(*) FILTER (WHERE status = 'NEW') AS new_orders, COUNT(*) FILTER (WHERE status = 'CLOSED') AS closed_orders FROM orders;Use Logical Expressions for Simple Boolean Checks
For simple boolean logic, this:
table1.is_deleted OR table2.is_deletedis usually clearer than a large CASE expression that returns a boolean.
7. Use ROW_NUMBER() as an Alternative to DISTINCT
Goal
Extract unique values efficiently, especially in large datasets.
Solution (7)
Sometimes ROW_NUMBER() with partitioning can be faster than DISTINCT, particularly when the partitioning columns are indexed and the query needs more control over which row is kept.
-- BEFORE:EXPLAINSELECT COUNT(DISTINCT user_id) FROM logins WHERE login_date BETWEEN '2026-01-01'::DATE AND '2026-01-31'::DATE;
-- AFTER:EXPLAINSELECT COUNT(user_id) FROM ( SELECT user_id, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY user_id) AS rn FROM logins WHERE login_date BETWEEN '2026-01-01'::DATE AND '2026-01-31'::DATE ) AS tmp WHERE rn = 1;Why It Can Be Faster (7)
ROW_NUMBER() can avoid a heavy distinct-sorting operation by assigning row numbers within each group. It also gives you more flexibility if you need to choose the newest, oldest, or highest-priority row per group.
That said, always verify with EXPLAIN ANALYZE. DISTINCT can still be the better choice for simple cases.
My Thoughts
SQL query optimization is a core step in any database-backed project. The best improvements often come from small query-shape changes:
- Replace long
INlists with virtual tables or PostgreSQL arrays when appropriate. - Use range predicates instead of applying functions to indexed columns.
- Choose
EXISTSwhen you only need to check whether a related row exists. - Avoid unnecessary columns, rows, repeated scans, and heavy sorting.
- Keep queries readable enough that the next engineer can maintain them.
The most important habit is to measure. Use EXPLAIN, EXPLAIN ANALYZE, real data volumes, and realistic filters. A query that looks elegant in isolation may behave differently under production data distribution.
Good SQL performance is not magic. It is usually the result of making the database do less unnecessary work.
