Posted on

Query pattern: OR across different tables

When a query uses a construct like

SELECT ... FROM a JOIN b ON (...) WHERE a.c1 = X OR b.c2 = Y

execution will inevitably degrade as the dataset grows.

The optimiser can use an index merge when an OR involves two indexed columns of a single table, but that is no help here. The optimiser has to pick one table to read first, and whichever table it picks, a matching row might qualify only through the condition on the other table. It therefore cannot use an index to narrow down the first table and ends up scanning it in full, so the query will never be efficient.

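The real answer is that the query construct is wrong: combining conditions on two different tables with OR inside one JOIN is the wrong tool. The correct approach for this type of query is a UNION, where each branch filters on one table only, so the optimiser can start each branch from that table and use its index:

SELECT ... FROM a JOIN b ON (...) WHERE a.c1 = X
UNION [ALL]
SELECT ... FROM a JOIN b ON (...) WHERE b.c2 = Y

Plain UNION removes duplicates, so a row matching both conditions is returned once, just as with OR (provided the select list includes enough columns, such as the primary keys, to tell genuinely distinct rows apart). UNION ALL avoids the cost of that deduplication but returns such a row twice.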
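The following is a minimal sketch for checking such a rewrite. It assumes Python's built-in sqlite3 module, with SQLite standing in for MySQL. SQLite's planner and its EXPLAIN QUERY PLAN output differ from MySQL's EXPLAIN, so the printed plans are only for inspection. The tables a and b, the join column b.a_id, the indexes and the sample data are all hypothetical. The script confirms that the OR query and a UNION version that keeps the join in each branch return the same rows, and then prints each plan.

```python
import sqlite3

def build_db() -> sqlite3.Connection:
    """Create hypothetical tables a and b (joined on b.a_id = a.id) with sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (id INTEGER PRIMARY KEY, c1 INTEGER);
        CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER, c2 INTEGER);
        CREATE INDEX a_c1 ON a (c1);
        CREATE INDEX b_c2 ON b (c2);
        CREATE INDEX b_a_id ON b (a_id);
    """)
    conn.executemany("INSERT INTO a (id, c1) VALUES (?, ?)",
                     [(i, i % 10) for i in range(1, 101)])
    # Every b row points at an existing a row (a_id is in 1..100).
    conn.executemany("INSERT INTO b (id, a_id, c2) VALUES (?, ?, ?)",
                     [(i, (i % 100) + 1, i % 7) for i in range(1, 301)])
    return conn

# The problematic pattern: one JOIN with an OR spanning both tables.
OR_QUERY = """
    SELECT a.id, b.id FROM a JOIN b ON b.a_id = a.id
    WHERE a.c1 = ? OR b.c2 = ?
"""

# The rewrite: each branch filters on one table, so it can start from that table's index.
UNION_QUERY = """
    SELECT a.id, b.id FROM a JOIN b ON b.a_id = a.id WHERE a.c1 = ?
    UNION
    SELECT a.id, b.id FROM a JOIN b ON b.a_id = a.id WHERE b.c2 = ?
"""

def run(conn: sqlite3.Connection, sql: str, x: int, y: int) -> list:
    """Execute a query with the two filter values and return its rows sorted."""
    return sorted(conn.execute(sql, (x, y)).fetchall())

def plan(conn: sqlite3.Connection, sql: str, x: int, y: int) -> list:
    """Return the 'detail' column (index 3) of SQLite's EXPLAIN QUERY PLAN rows."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (x, y))]

if __name__ == "__main__":
    conn = build_db()
    x, y = 3, 5
    or_rows = run(conn, OR_QUERY, x, y)
    union_rows = run(conn, UNION_QUERY, x, y)
    print("rows:", len(or_rows), "same result:", or_rows == union_rows)
    for label, sql in (("OR", OR_QUERY), ("UNION", UNION_QUERY)):
        print(label)
        for detail in plan(conn, sql, x, y):
            print("   ", detail)
```

Plain UNION is safe in this sketch because the select list contains both primary keys, so deduplication only merges rows that matched both conditions. On a real MySQL server, compare the EXPLAIN output of both forms instead of relying on SQLite's plans.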

This mistake occurs relatively often. People try to reduce the number of queries needed to achieve their objective, they are (somewhat) familiar with JOINs, and they either don't know about UNION or use it so rarely that it doesn't come to mind when a problem calls for it. So, this is your reminder 😉
