Posted on 11 Comments

Abuse of LEFT JOIN

I teach the exact differences and use of the different join types and keywords in my courses (specifically the Optimisation by Design day). A few tidbits:

  • Left and right joins are outer joins.
  • A,B is an inner join (equivalent to a cross join if you don’t have a join condition).

But the point of this post… when looking at apps, I often see LEFT JOINs abused. That is, the query does not actually require a left join. The result might be ok, but that entirely depends on the data contained in the tables; no guarantee.

Apart from understanding joins, I’ve found that there’s a very simple but effective test for whether or not to use a left join…. “if you can’t think of a specific reason, you don’t” 😉

Posted on 11 Comments

11 thoughts on “Abuse of LEFT JOIN

  1. “LEFT JOIN” seems to be some sort of cargo cult, i heard rumors that it was faster than a straight join on some product in a distant past but don’t remember any details …

    And then there is a small fraction that just uses it as they think it makes their queries look smart and advanced without actually having any clue what it does … i even had a customer a while ago that called me up to report that he had found a bug in MySQL with queries returning erroneous NULL results (and i’m not even sure whether that guy ever got my point ;/ )


    hartmut

  2. “A,B is an inner join (equivalent to a cross join if you don’t have a join condition).”

    It turns out that the term ‘join condition’ is not really firmly defined – not in the case of the “comma join”.

    You can try something like ‘a join condition is a condition that relates two joined tables by comparing their columns’ or something like that. But that leaves us with:

    A INNER JOIN B
    ON B.Column = ‘Value’

    may seem like nonsense, but what if you have:

    A LEFT JOIN B
    ON B.Column = ‘Value’

    Surely, the condition following the ON can’t be anything else but the join condition, even though it does not serve to relate tables to each other. See also the standard:

    <join condition> ::= ON <search condition>

    So, now we move on:

    A INNER JOIN B
    ON TRUE

    I can’t call this an inner join (the concept), although it is an INNER JOIN (the syntax). By that same token,

    A INNER JOIN B — omitted ON clause

    is an INNER JOIN (the syntax) but not an inner join (the concept)
    Of course the real wtf is that MySQL allows this at all – in standard SQL the INNER JOIN syntax requires a join specification, that is either a USING or an ON clause.

    Similarly, this:

    A CROSS JOIN B
    ON A.ID = B.A_ID

    Cannot be called a cross join, although it is a CROSS JOIN. (Same wtf. MySQL allows adding an ON or USING clause, which is non-standard, and if you ask me, non-sense too)

  3. I recommend always specifying the type of join you want — that’s why I always put INNER JOIN when I mean it, not just JOIN. That way if I really want:

    INNER JOIN
    LEFT JOIN
    CROSS JOIN

    then the person reading my queries knows that’s exactly what I want. I NEVER use comma joins or just “JOIN”. In that way the queries become just a bit more self-documenting.

  4. Yep I’d have much preferred MySQL to enforce the ON/USING clause when using explicit join syntax. However, it hasn’t in the past, so enforcing it in the future is impossible without breaking things.
    I suppose it could be changed with an SQL_MODE and that may indeed be an idea. I wasn’t aware of the exact SQL standard for the join syntax (although I do have “SQL-99 Complete, Really” on the bookshelf and could therefore check 😉
    So, perhaps we can suggest a patch to enforce it with an SQL_MODE setting.

  5. I agree and that’s what I teach.
    Although I admit I have actually used JOIN rather than INNER JOIN in my own apps…. LEFT JOIN and CROSS JOIN however I always spell out like that.

    No need for LEFT OUTER JOIN really, that’s really fluff, unless we want to re-iterate that a left join is really always an outer join 😉

  6. Just to add that MySQL changes left join to an inner join automatically, if the query allows to – for example, if all rows having NULL for the second table are filtered out by WHERE condition.

    It helps the optimizer to select a better execution plan, as inner-joined tables can be joined in any order.


    Sergei

  7. True – but let’s stick with the main issue and not encourage bad habits to persist 😉

  8. Regardless of whether you use optional keywords or not, the syntax is only self-documenting if the reader has the remotest clue of the difference between an inner and outer join. Comments above suggest that there are plenty of users who don’t have such a clue!

  9. in my OLAP applications it is the other way round: never use an INNER JOIN if you don’t have a specific reason for it. 🙂

    there is data from many sources, some sources only delivering a subset of the columns, some rows containing mostly dummy data to get external data into the db at all, the key fields have OCR or data entry errors, some people have local extended versions of key tables which means keys can be locally legal but unknown to us. to not inadvertently lose data rows just by looking up something it is important to get used to LEFT JOIN in any but the most controlled situations.

  10. Most probably CROSS JOIN in MySQL is not true cross join because of historical reasons, just like Oracle and SQL Server have old outer join operators, which are not in SQL standard. Of course this syntax used with ON clause is really a way to write unmaintainable code.
    I think most of the time Left joins are used because
    1) developers don’t know the difference of join types at all
    2) just to be safe, in some mystical case the data are not available in outer joined table.
    It is also worth to remember that join conditions can always be true for example
    SELECT * FROM A INNER JOIN B ON (1=1) and this leads to cross (cartesian) join. The same is true also for outer joins. Therefore only the knowledge and accuracy of more or less guarantee query to return data as expected.
    And speaking about join conditions vs conditions in where clause for inner joins it is not relevant where these are put. But for outer joins that really does matter. It is further described in my article about SQL join types.

  11. Sorry, I think I have to disagree.
    I think using an ON (or USING) clause is much more maintainable than putting a join condition in the WHERE. Of course some outer join situations *require* that specific conditions go in the ON, but that’s not the main reason.
    With many joins, the WHERE just becomes spaghetti.

Comments are closed.