Posted on

Chicken & egg problem, or flawed design?

Consider the following query:

SELECT substring_index(column_name, ",", -1) AS blah
  FROM table_name WHERE blah='something'

Naturally, that returns with ERROR 1054 (42S22): Unknown column ‘blah’ in ‘where clause’, since you can’t have calculated columns in the WHERE clause (chicken & egg problem).

Strictly speaking, you can easily resolve this by using HAVING instead of WHERE. The query will work as desired…. but damn it’ll be slow! The server will scan through every row in the table, calculate blah, and then apply the filter.

Not good eh. So what’s the actual problem? Design. Not just the query, but the schema. Multi-valued columns don’t just break a rule of Codd, they are simply a bad idea if you need to do any search operation on them – any queries dealing with such designs will be overly complex, ugly, and slow.

Normalisation is your friend here! (and at most other times too 😉

Posted on