Posted on 4 Comments

Dealing with NULLs

I frequently see a question fly past on why an expression like WHERE col = NULL does not come up with the desired result, even though it superficially looks perfectly sane. To address this, we can recap some high school maths, and at the same time finally find out that some things learnt there are actually very practical and useful later in life… I mean, who would have guessed that learning about sets and Venn diagrams was really giving you the fundamentals of relational databases?

So anyway, the proper form would be WHERE col IS NULL. NULLs can’t be compared in the normal way, because NULL is not a value. NULL is the absence of a value. If you show this visually, the domain of valid values for a column lies inside the circle of a Venn diagram, while the NULL is outside. It’s not a member of the domain.
There are also other functions for dealing with NULLs in queries, such as IFNULL(), COALESCE(), and so on.

One of Codd’s rules for relational databases deals with NULLs:

Rule 3: Systematic treatment of null values:
The DBMS must allow each field to remain null (or empty). Specifically, it must support a representation of “missing information and inapplicable information” that is systematic, distinct from all regular values, and independent of data type.

It’s good practise to always specify a column to be NOT NULL, unless you really need NULLs. With MySQL, there’s a real practical reason for this: it’s faster. Since NULL is not a value, it can’t be stored in the regular space for that particular column type. Consider a TINYINT UNSIGNED, which in its 8 bits has a range of 0-255. How or where would you indicate NULL? So, for columns that can be NULL, MySQL uses a separate bit in the row, and looks that bit first to check for NULL. By specifying NOT NULL for the column, this saves doing the extra compare operation. And thus it’s quicker. It may not seem like much, but all good little habits makes for fast apps.

Just for comparison, Oracle actually takes a shortcut with this by storing NULL the same as an empty string (“”). This makes Oracle break Codd’s rule#3 (it’s not using a distinct and separate from all regular values), so going by that assessment it can’t be a relational database. Haha funny. Wikipedia’s page on this already correctly notes: “the rules are so strict that all popular relational database systems fail Codd’s criteria in one way or another.” So there you go.

IFNULL() is a useful function: SELECT IFNULL(col,”N/A”) AS val
This will output N/A if col happens to be NULL.

COALESCE() is that one brilliant function you always forget the name of… it returns the first not-NULL:
SELECT COALESCE(col1,col2,col3) AS val
This will output the value of col2 if col1 happens to be NULL; col3 doesn’t matter in this case as COALECSE() just returns the first match from the list (and col2 happens to not be NULL. You wouldn’t believe how useful this function is! Remember it.

Posted on 4 Comments

4 thoughts on “Dealing with NULLs

  1. Discrete mathematics and relational databases are so strongly linked often I can barely discern the difference.

  2. That’s because it’s not a case of analogy, it’s simply the one being based on the other.
    So the connection is one of direct heritage.

  3. “NULL is not a value. NULL is the absence of a value.”

    “[NULL is] not a member of the domain.”

    Well, Codd may agree, but the SQL standard does not. From ISO 9075-1:2003; “4.4.2 The Null value”

    “Every data type includes a special value, called the null value, sometimes denoted by the keyword NULL.”

    “the null value is neither equal to any other value nor not equal to any other value — it is unknown whether or not it is equal to any given value”

    And this last statement is sufficient to explain why X = NULL can’t work – because by definition, we can’t know whether X is equal to NULL or not.

    Of course as you rightly point out – NULL is used to *represent* the absence or inapplicability of a value

    Personally, I do not see any contradiction in these different interpretations of NULL. It is just a difference of “NULL as an entity” and “NULL as a syntactical and computational construct”.

    (Where codd’s rule corresponds to the former, and the SQL null-value concept corresponds to the latter)

    It is easy to see why the concept of NULL-value as a ‘special’ value makes more sense then “the absence of a value”. If we really have to assume that NULL is the absence of a value then

    X = NULL

    would be the same as

    X =

    (If it really holds true that NULL is the absence of a value that I can substitute NULL with nothing)

    Of course, SQL the language does not agree:

    X =

    is a syntax error, and

    X = NULL

    is a perfectly valid expression (at least froma syntactical point of view)

  4. Late in the game here — not only is it faster, but many developers don’t take NULL as a possibility into account. For instance, count(col) doesn’t count NULL values, and there are many parts of code which will try to operate on columns, ie, CONCAT(a,b) or SUM(a,b) or a/b, which will cause many headaches if NULLs are allowed.

    Then again, sometimes you need NULL. Basically, if there’s no difference between ” and NULL for your app, use ”. But if there is, and sometimes you want a distinction between “blank” and “no data”, then you want NULL.

Comments are closed.