Open Query looking for new colleagues!

My colleagues and I are looking for extra talent – is that you?

What we do: help clients prevent problems (rather than being the fire department). We work on a subscription basis, although we also do some ad-hoc consulting and training. Apart from MySQL/MariaDB query and DBA work, we do quite a bit of system administration, mainly on Red Hat and Debian based distros; expect to see replication and the MySQL-MMM multi-master system. You’d work from home, wherever that might be, so you will need to be self-motivated (but we do keep in touch online).

What we’re not: a full-time employer. With us, you make a life rather than a living. Everybody is contracted part-time. You can make enough to live comfortably, but that has nothing to do with hours. If you’re stressed about not filling all hours in your week with work-work-work, we’re not the company for you… there’s more to life than work, and we feel that’s really important.

Haven’t scared you off yet? Groovy. Take a peek at our jobs page for additional detail and contact info. Hope to hear from you!

Various Anniversaries

This week, ten years ago, I was in London for MySQL AB’s first “train the trainer” course, also meeting (for the first time) my first boss at MySQL, Kaj. I’d been hired mid-August as employee #25, also doing training but actually primarily as tech writer for the MySQL documentation (taking over from Jeremy Cole; essentially I was the documentation team for quite some time ;-). So from this you can deduce that yes, I was hired without meeting Kaj or anyone else in person! I don’t think we even had a phone call, only email. Oh, the days 😉

The training week itself was of course disrupted quite a bit by the events in New York. We had Jeremy, who had come in on a UA flight from the US, and others from all over the place… it also taught some students a lesson about browsing the net while in a training course: it can end up very distracting.

The oddest event I remember about that particular trip happened upon departure from Heathrow: someone with a clipboard went round the long queues asking whether anybody was carrying eyebrow tweezers. No other items/questions, just that.

I stayed with MySQL for about 6 years until, after a brief break, I started my own company Open Query in 2007 (about half a year before the Sun acquisition). So this September marks the 4-year anniversary of that event already. I spotted an old business card earlier, reminding me that early on we offered not just MySQL consulting and training, but also (OSS) business advice – that’s now essentially spun off to Upstarta.

The MySQL side of my business has changed quite significantly as well, going from the usual reactive consulting to proactive subscriptions, in part based on Pythian‘s successful model. A key difference has been that we don’t do emergencies. This disruptive shift happened somewhat by luck, after a talk at Linux Users of Victoria in April 2009. Ben Dechrai made a video recording of this interactive “Relax! A Failure is not an Emergency…” try-out. It also mentioned the BlueHackers initiative/stickers.

While some people, including competitors, regard our “no emergencies” approach as nuts ;-), it has worked out very well; apart from making customers happy, it’s created the sane lifestyle I was looking for (so I could spend more time with my daughter) and enabled contracting others as well. We’re still growing organically, having adapted our internal tools and processes for the proactive service approach along the way – obviously, it’s now more about project management than handling incident tickets.

Like my time at MySQL AB, my journey since then has so far proven interesting, educational, and mostly enjoyable. Later in the year I aim to once again buy a house with a modest garden. It’s the independence that will have made that (and my other explorations) possible. Who knows what lies ahead – it’s most fun when you create your own future!

On Password Strength

XKCD (as usual) makes a very good point – this time about password strength – and I reckon it’s something app developers need to consider urgently. Geeks can debate the exact amount of entropy, but that’s not really the issue: insisting on mixed upper/lower case and/or non-alpha and/or numerical components in a user password does not really improve security, and definitely makes life more difficult for users.

So basically, the functions that check “is this a strong password” should seriously reconsider their approach, particularly when they’re used to decide whether the app accepts a password as “good enough” at all.

Update: Jeff Preshing has written an xkcd password generator. Users probably should choose their own four words, but it’s a nice example and a similar method could be used by an app to give “password suggestions” that are still safe.
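
For illustration, here’s what such a suggestion generator could look like in Python. This is just a sketch, not Jeff’s implementation: the wordlist path is an assumption (any plain one-word-per-line dictionary will do), and a real app would want its own curated list.

import math
import secrets

# Assumption: a plain one-word-per-line dictionary file; adjust for your system.
WORDLIST = "/usr/share/dict/words"

def make_passphrase(num_words=4):
    with open(WORDLIST) as f:
        # Keep short, purely alphabetic words so the result stays easy to type.
        words = sorted({w.strip().lower() for w in f
                        if w.strip().isalpha() and 3 <= len(w.strip()) <= 8})
    # secrets.choice() uses a CSPRNG; random.choice() would be weaker.
    phrase = " ".join(secrets.choice(words) for _ in range(num_words))
    # Entropy if the words are picked uniformly at random: num_words * log2(pool size).
    bits = num_words * math.log2(len(words))
    return phrase, bits

if __name__ == "__main__":
    phrase, bits = make_passphrase()
    print("%s  (~%d bits)" % (phrase, bits))

Note that the entropy estimate in the comment only holds when the words are picked randomly, which is the case for app-generated suggestions.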

HDlatency – now with quick option

I’ve done a minor update to the hdlatency tool (get it from Launchpad); it now has a --quick option to make it only do its tests with 16KB blocks rather than a whole range of sizes. This is much quicker, and since 16KB is the InnoDB page size it’s the most relevant size for MySQL/MariaDB deployments.

However, I didn’t just remove the other stuff, because it can be very helpful in tracking down problems and putting misconceptions to rest. On SANs (and local RAID of course) you have things like block sizes and stripe sizes, and opinions on what might be faster. Interestingly, the real world doesn’t always agree with the opinions.

As Mark Callaghan correctly pointed out when I first published it, hdlatency does not provide anything new in terms of functionality; the db IO tests of sysbench cover it all. A key advantage of hdlatency is that it doesn’t have any dependencies: it’s a single small piece of C code that will compile and run on very minimalistic environments. We often don’t control the base environment we have to work in, and that’s why hdlatency was initially written. It’s just a quick little tool that does the job.

We find hdlatency particularly useful for comparing environments, primarily at the same client. For instance, the client might consider moving from one storage solution to another – well, in that case it’s useful to know whether we can expect an actual performance benefit.

The burst data rate (big sequential read or write) that often gets quoted for a SAN or even an individual disk is of little interest for database use, since the key performance bottleneck lies in random access I/O: the disk head(s) will need to move. So it’s important to get real, relevant numbers, rather than just going with magic vendor numbers that are not really relevant to you. Also, you can have a fast storage system attached via a slow interface, in which case the performance will not be at all what you’d want to see. It can be quite bad.
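
For a rough idea of what measuring this looks like, here’s a small Python sketch of a random-read latency probe in the same spirit as hdlatency. This illustrates the technique only, not the actual tool; the test file path is an assumption, and you’d create a suitably large file on the storage you want to measure.

import os
import random
import time

TEST_FILE = "/data/testfile"   # assumption: a large pre-created file on the storage under test
BLOCK_SIZE = 16 * 1024         # 16KB, the InnoDB page size
NUM_READS = 1000

def random_read_latency():
    # For honest numbers you'd want O_DIRECT to bypass the OS page cache,
    # but its alignment requirements would make this sketch longer, so it's
    # omitted here.
    fd = os.open(TEST_FILE, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        latencies = []
        for _ in range(NUM_READS):
            # Pick a random block-aligned offset and time a single read.
            offset = random.randrange(0, size - BLOCK_SIZE)
            offset -= offset % BLOCK_SIZE
            start = time.monotonic()
            os.pread(fd, BLOCK_SIZE, offset)
            latencies.append(time.monotonic() - start)
    finally:
        os.close(fd)
    latencies.sort()
    print("avg %.3f ms, median %.3f ms, max %.3f ms" % (
        sum(latencies) / len(latencies) * 1000,
        latencies[len(latencies) // 2] * 1000,
        latencies[-1] * 1000))

if __name__ == "__main__":
    random_read_latency()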

To get an absolute baseline on what sane numbers look like, also run hdlatency on a local desktop HD. This may seem odd, but you might well encounter storage systems that show lower performance than that. ‘nuf said.

If you’re willing to share, I’d be quite interested in seeing some (--quick) output data from you – just make sure you state what storage it is: type of interface, etc. Simply drop it in a comment to this post, so it can benefit more people. Thanks!

Slides from DrupalDownUnder2011 on Tuning for Drupal

By popular request, here’s the PDF of the slides of this talk as presented in January 2011 in Brisbane; it’s fairly self-explanatory. Note that it’s not really extensive “tuning”; it just fixes up a few things that are usually “wrong” in default installs, creating a saner baseline. If you want to get to optimal correctness and more performance, other things need to be done as well.

Open Query, new on Fifth Ave

Some of you already know, since you helped us move: we recently shifted Open Query’s main office to Fifth Avenue, next door to Elizabeth’s. The new place is comfortable, and I really like it so far. Anna is also happy with her new admin space, and cat Figaro has found an empty spot on a bookshelf to stretch out on!

The lease costs are a bit steep, as is common these days… chances are we’ll just buy our next place.


Follow-Up: yes, this was an April 1st post. But everything in the above post is the truth; it’s just phrased to be very open to a bit of misinterpretation 😉

I find that the real world provides plenty of fun and unbelievable yet true tidbits, so why bother making up nonsense!

Importing a file dumped from MySQL with mysqldump into drizzle

As big fans of new technology, we try to keep up to date with what’s happening in the industry. As such, I decided to start using drizzle on my development machine since they announced GA this week.

First exercise: import a file dumped from a MySQL server I don’t have access to into drizzle. Normally you can run drizzledump on the MySQL server and make it dump a drizzle-compatible file. Not in this case, so I decided to sed my way through the various errors. Not pretty, and I hope that at some point we’ll have a tool that can convert a mysqldump into a drizzle-compatible file, but it works for now.

Here’s what I had to do. Note that this is by no means complete, nor does it come with any guarantees; it’s just a starting point.
# This file started by setting a SQL_MODE. That doesn't exist in 
# drizzle, so we comment it out
sed -i "s/^SET SQL_MODE/#SET SQL_MODE/g" mysqldump.sql 

# The create database statement set a default character set. 
# Everything in drizzle is UTF8, so let's lose it!
sed -i "s/DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci//g" mysqldump.sql 

# The table definitions mentioned a default character set. 
# Everything in drizzle is UTF8, so let's lose it!
sed -i 's/DEFAULT CHARSET=utf8//g' mysqldump.sql 

# No MyISAM except for temporary tables, so away with it.
sed -i 's/ENGINE=MyISAM//g' mysqldump.sql 

# Invalid timestamps are not accepted in drizzle, so this should be a null 
# value. Since some of the columns in this file are actually NOT NULL defined, 
# for now I just set those dates to 1970. UGLY, but works for me. Don't do this 
# on anything that will ever go anywhere near production though!
sed -i "s/'0000-00-00/'1970-01-01/g" mysqldump.sql 

# tinyint doesn't exist anymore, so just replace with integer. Note that you'll
# have to do this for all data types that no longer exist in drizzle.
# (matching [0-9]* rather than a greedy .* so we don't eat the rest of the line)
sed -i "s/tinyint([0-9]*)/integer/g" mysqldump.sql
Hope this helps others!

Cache pre-loading on mysqld startup

The following quirky dynamic SQL will scan each index of each table so that they’re loaded into the key_buffer (MyISAM) or innodb_buffer_pool (InnoDB). If you also use the PBXT engine which does have a row cache but no clustered primary key, you could also incorporate some full table scans.

To make mysqld execute this on startup, create /var/lib/mysql/initfile.sql and make it owned by mysql:mysql (chown mysql:mysql /var/lib/mysql/initfile.sql).

SET SESSION group_concat_max_len=100*1024*1024;
SELECT GROUP_CONCAT(CONCAT('SELECT COUNT(`',column_name,'`) FROM `',table_schema,'`.`',table_name,'` FORCE INDEX (`',index_name,'`)') SEPARATOR ' UNION ALL ') INTO @sql FROM information_schema.statistics WHERE table_schema NOT IN ('information_schema','mysql') AND seq_in_index = 1;
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
SET SESSION group_concat_max_len=@@global.group_concat_max_len;

and in my.cnf add a line in the [mysqld] block

init-file = /var/lib/mysql/initfile.sql

That’s all. mysqld reads that file on startup and executes each line. Since we can build the whole set of SELECTs in a single (admittedly quirky) query and then use dynamic SQL to execute the result, we don’t even need to create a stored procedure.

Of course this kind of simplistic “get everything” only really makes sense if the entire dataset+indexes fit in memory, otherwise you’ll want to be more selective. Still, you could use the above as a basis, perhaps using another table to provide a list of tables/indexes to be excluded – or if the schema is really stable, simply have a list of tables/indexes to be included instead of dynamically using information_schema.

Practical (albeit niche) application:

In a system with multiple slaves, a newly added slave starts with cold caches, but since with load balancing it will only pick up some of the load, that often works out ok. However, some environments have dual masters, but an application that is not able to do read/write splits to utilise slaves; in that case all the reads also go to the active master. Consequently, the passive master will have relatively cold caches (only rows/indexes that have been updated will be in memory), so in case of a failover the amount of disk reads for the many concurrent SELECT queries will go through the roof, temporarily slowing the effective performance to a dismal crawl: each query takes longer because of the required additional disk access, so depending on the setup the server may even run out of connections, which in turn upsets the application servers. It would sort itself out, but a) it looks very bad on the frontend, and b) it may take a number of minutes.

The above construct prevents that scenario, and as mentioned it can be used as a basis to deal with other situations. Not many people know about the init-file option, so this is a nice example.

If you want to know how the SQL works, read on. The original line is very long so I’ll reprint it below with some reformatting:

SELECT GROUP_CONCAT(CONCAT(
  'SELECT COUNT(`',column_name,'`)
          FROM `',table_schema,'`.`',table_name,
          '` FORCE INDEX (`',index_name,'`)'
       ) SEPARATOR ' UNION ALL ')
  INTO @sql
  FROM information_schema.statistics
  WHERE table_schema NOT IN ('information_schema','mysql')
  AND seq_in_index = 1;

The outer query grabs each regular db/table/index/firstcol name that exists in the server, writing out a SELECT query that counts all non-NULL values of the indexed column (so it must scan the index), forcing that specific index. We then abuse the versatile and flexible GROUP_CONCAT() function to glue all those SELECTs together, with “UNION ALL” in between. The result is a single very long string, so we need to raise the maximum allowed GROUP_CONCAT() output beforehand to prevent truncation.

Oracle Blamed for Laws of Nature

A catchy headline, and I believe more accurate than Oracle Puts the Squeeze on SMBs with MySQL Price Hike (Network World) and MySQL price hikes reveal depth of Oracle’s wallet love [MySQL Jacking up MySQL Prices] (The Register). Slightly more realistic is Oracle kills low-priced MySQL support (again The Register).

First, let’s review what Oracle has actually done: they ditched the MySQL Enterprise Basic and Silver offerings. For Oracle, that makes sense. Their intended client base is “enterprise” (high end, think big corporates) and their MySQL sales and cost structure reflect this. It’s not a new thing that came with MySQL at Oracle; MySQL at Sun Microsystems, and MySQL AB before it, had the same approach.

A company simply cannot operate below its market – that is not simply a matter of choice; it is dictated by its processes and cost structure. Smart people like Clayton Christensen at Harvard Business School have done ample research on this; here I’ll just give one simple example:

If you hire a sales person on commission and their quarterly quota is $100k, then they have to talk with clients that have at least a $10k-$20k potential (qualified leads), and they need to close (sign contract) with at least 10 within the period. They simply cannot spend any time on talking with potential $1k customers.

We may lament this state of affairs, but you can see how, given the choices made (sales person hired, commission system, quota), it’s as inevitable as an apple falling when you drop it. The way I describe this at Upstarta: if a company wants different results, they need to make sure that their business processes and cost structure lead them in that direction. But the simple fact is that most companies don’t have an internal feedback cycle that keeps an eye on these things, so they just go with the flow of consequences of common choices: aim for large(r) clients, grow turnover, get higher operational costs along the way – that in itself is a cycle and the only direction this particular one can go is up. As a natural consequence, over time old low-end offerings and clients need to be jettisoned – one way or another.

I say hooray for Oracle for finally acknowledging this, since Sun Microsystems and MySQL AB before it did not (for whatever reason). This is years overdue. Whether the original MySQL company should have aimed to also serve smaller clients is an entirely separate topic – and one which I covered at length previously (including internally in my time at MySQL AB), but it’s very much a station long passed. Once you float upward in the market, you can’t operate or move downward.

Now, are SMBs using MySQL actually getting squeezed by Oracle? They are not. There is no lock-in. This is about service contracts, not licensing. As we all know, MySQL is GPL licensed, and internal use (even on a website or SaaS offering) is well within GPL parameters. There are a number of different companies offering service for MySQL, with different types of service and delivery models and a correspondingly wide range of pricing. So SMBs, like anyone else, have a choice; each can pick the type of service most suited to their needs. Let us celebrate and promote that freedom within the MySQL ecosystem, rather than being outraged about dropped apples falling!
