Friend and former colleague Roland Bouwman has written an excellent update on the GROUP BY implementation in MySQL.
MySQL’s implementation of GROUP BY has historically been quirky. Sometimes that quirkiness has been useful, but often it causes grief as SQL authors can make mistakes that are executed but don’t produce the results they want (or expect).
SELECT cat, COUNT(val) as cnt, othercol FROM tbl GROUP BY cat
The ‘cat‘ column is in the GROUP BY clause, the COUNT(val) is an aggregate, but the ‘othercol‘ column is … well… neither. What used to effectively happen is that the server would pick one othercol value from within each group. As I noted before, sometimes useful but often a pest as the server wouldn’t know if you just made a mistake or whether you actually intended to use this ‘feature’.
The long existent sql_mode option ONLY_FULL_GROUP_BY mitigated some of that (but not fully in all cases – see Roland’s explanation).
With the sql_mode option enabled, the server chucks an error for faulty constructs, the most common one being:
Error: 1055 SQLSTATE: 42000 (ER_WRONG_FIELD_WITH_GROUP)
Message: ‘col’ isn’t in GROUP BY
MySQL 5.7.5 and above have ONLY_FULL_GROUP_BY enabled by default, with an updated implementation, so GROUP BY will now generally behave properly.
We know that many existing applications often rely on the previously “standard” but technically quirky/faulty behaviour. So when upgrading to this version or beyond, it’s important to
- use and test in development and staging environment first. This also means upgrading your dev environment first.
- scan the code for GROUP BY use, and ensure that the queries are correct.
- have proper error handling and reporting in your application.
- in some cases disable the sql_mode ONLY_FULL_GROUP_BY option, but that’s really a last resort. It’s best to fix the application.
When you write your create table statements, always make sure that you make them non-ambiguous. That way even though other servers might have different configurations, you make sure your table will be created in the same way.
Imagine for instance you are developing an application on a development server, nicely storing all the scripts you need to create the same database on your production server. If the same script creates a table differently on both servers, that might cause you a lot of headache later on. At Open Query, we strive to minimise (or preferrably eliminate) headaches.
One of the parts of the create table statement that has the largest impact is the storage engine specification. When you omit the storage engine from the create table statement, your table is automatically created with the default storage engine type configured for the server. Since the storage engine is a very important choice when designing your tables, you want to make sure that it is always the correct type.
Here’s an example: instead of writing CREATE TABLE city (city_id int, city_name varchar(100)) you should write: CREATE TABLE city (city_id int, city_name varchar(100)) ENGINE=InnoDB
It is a simple adjustment, but it will save you from possible problems if you just make a habit out of specifying the storage engine.
But wait, there one more thing! It’s also very important you have sql_mode=NO_ENGINE_SUBSTITUTION in your my.cnf, otherwise your table may still silently become a MyISAM table if your desired engine (for any reason) is disabled in the binary, configuration, or at runtime. With this setting error, such a situation will cause an error – so you know for sure.
In our good practice / bad practice series, we will provide you with byte/bite sized pieces of advice on what we consider good (or bad) practice in MySQL-land. The topics can be just about anything, so expect random things to come up. Also, the level of advancedness greatly varies. A topic might be a no-brainer for some, a reminder for others and a revelation for a third person. We strive to tender to all of you!