Arjen's journal

Syndicate content
Open Query: MySQL, Open Source & other ponderings Arjen Lentz 2008-07-03T11:49:44Z
Updated: 51 min 19 sec ago

FORCE INDEX

Thu, 03/07/2008 - 11:36
SELECT ... FROM tblProduct as prd FORCE INDEX(vProductName,iCategoryId,eProductType,vProductImage,fPrice,eProductStatus,dProductDateAdded) LEFT OUTER JOIN (SELECT bidjoin.* FROM tblProduct as prdjoin, tblBid as bidjoin WHERE bidjoin.iProductId = prdjoin.iProductId) bid ON prd.iProductId = bid.iProductId , tblCategory as cat FORCE INDEX(vCategoryName,eCategoryStatus,dCategoryDateAdded) , tblAuctionProduct as aucprd FORCE INDEX(iRequiredBids,iProductId,eAuctionType,iFreeBidLevel,fAdminFee,dDateClosed) , tblPurchaseProduct as purprd FORCE INDEX(dProductCloseDate,iProductId,iTotalInventory,dLastUpdated)I know, unreadable. But the main issue today is the FORCE INDEX. This is just one query (real, with permission - I just adjusted some identifiers) from an app that is litterally full of queries using FORCE INDEX. So what does FORCE INDEX do? It forces the MySQL optimiser to use the specified index (or choose from one of the specified, if multiple), even if it reckons it's not the best choice. Likewise there's an IGNORE INDEX modifier that denies the optimiser the choice for using a specified index in a particular query.

I generally hold that these modifiers should only be used for testing things and tracking down optimiser issues (not that common these days, but 3.23 had plenty that 4.0 fixed). This because when you hardcode them into an app, perhaps fixing a real problem you see *now*, your data will still change over time and so the optimiser needs the freedom to change its choices too. It might be ok for a week, a month, or even a year, but at some point it's going to cost.

The above query takes about 5 times as long compared to the same query without the FORCE modifiers. Considering the app is chockers with it, I'm guessing that it might have been coded by someone who didn't know that MySQL actually has an optimiser at all. Who can say... anyway it serves here as an extreme example.

The take-away message is: use these modifiers with care, and be very hesitant ever adding it to production code. There's generally always a better way to resolve whatever problem you might be observing.

OSS-based appliances: Cybersource/datasafe

Tue, 01/07/2008 - 00:47
I'm pretty happy with my 1TB Apple Time Capsule. Bought it while at the MySQL Conf in April, and it does the right thing for my situation.

Con Zymaris and his crowd at Cybersource in Melbourne made something like this ages ago, aimed at small businesses: Cybersource/DATASAFE. That's a pretty neat solution, and an excellent example of how an OSS-based solution can be deployed in a business, regardless of what other technologies might already exist on the premises. The box will work just fine in a Windows environment.

Always focus on the solution (and what practical needs it solves for the client), not the technology (or the philosophy) - with a happy customer, you'll get plenty of opportunity (over time) to discuss what OSS is really about, and you're likely to find a very willing ear at that point.

Training schedule readability

Mon, 30/06/2008 - 23:36
A tricky meta-question. I've been experimenting with ways to display an overview of training days (or days grouped by audience/topics, like Developers or DBAs), locations, and schedule. Product Audience BNESYDCBRMELADLAKL MySQL Dev SepAugSepAug MySQL DBA SepJulAugAugAugJul MySQL HA Aug PostgreSQL DBA JulJul PHP QA Dev AugAug With the number of locations I have now, it's easy to run out of space horizontally. And vertically can't make the list too long either, otherwise the nice simple overview is lost. It's a difficult problem and I haven't fully resolved it. Displaying the locations vertically and the topics horizontally doesn't seem to work either. Different pages on the Open Query web site now display things differently depending on context, and I hope to learn more from that experiment. The above is an example of this. Compact, but I'm not sure it conveys sufficient information. Within each group, people are able to sign up for individual course days. It's work in progress I suppose. Suggestions welcome!

Finding useless indexes

Mon, 30/06/2008 - 02:47
I'll say beforehand that the following is not very clean - for a number of reasons. I'll discuss some issues after the query. The query returns all indexes in a db where the cardinality is lower than 30% of the rows, thus making it unlikely that the server will ever use that index. SELECT s.table_name, concat(s.index_name,'(',group_concat(s.column_name order by s.seq_in_index),')') as idx, GROUP_CONCAT(s.cardinality ORDER BY s.seq_in_index) AS card, t.table_rows FROM information_schema.tables t JOIN information_schema.statistics s USING (table_schema,table_name) WHERE t.table_schema='dbname' AND t.table_rows > 1000 AND s.non_unique GROUP BY s.table_name,s.index_name HAVING (card + 0) < (t.table_rows / 3); Let's discuss...
The number of rows in a table used here will be accurate for MyISAM; for InnoDB it's essentiall a rough guess.
Cardinality in this context is the number of distinct values in a column. This statistic needs to be updated using ANALYZE TABLE, otherwise it might either not be available or just outdated.
Since there can be composite indexes (index on a,b or more columns, rather than just one column), the cardinality is per column and in the output we show this comma-separated; of course for anything but the first, it's relative to the previous columns. The HAVING clause just ends up using the cardinality of the first column, which statistically is not really correct. If someone has a bright idea on how to grab or calculate a sensible cardinality figure for composite indexes, please do comment.
And yes, some selected columns (like t.table_rows) fall outside the grouping, however they remain the same so that's "ok" for this quick hack and MySQL allows this unless you have sql_mode=ONLY_FULL_GROUP_BY enabled.

The fact that the calculated cardinality figure (for composite indexes) is dodgy may not actually be relevant for the purpose of this query. The point of the query is that generally, the server will not use an index if it needs to look at >30% of the rows anyway. That's not *actually* how it works inside the server (as Peter Zaitsev can explain in great detail ;-) but as an easy rule-of-thumb it's close enough to reality. That's not the whole story, because if all the columns referenced in a query are in the index you have a so-called covering index, and the server knows that scanning an index is quicker than scanning the table. So in that case, an index might still make sense, but this query can't know about that case.
Also, for very small tables, the server will prefer a table scan anyway; to weed out the worst of this, only tables with >1000 rows are shown in the output. Again, a rather crude filter with plenty of flaws. But again, it may suffice for this purpose.

Normally, what you'd do is pick out the obvious indexes on yes/no, male/female, active/deleted style columns. They're easy to spot. But if you run the above query, much more may show up. Do unused indexes hurt? Well, they slow down INSERT/UPDATE/DELETE and ALTER TABLE statements. So this query can be useful as a rough filter, then human brains can be used to decide which indexes to keep and which ones to get rid of.

mysql cmdline tricks: output control (\G, pager, tee)

Fri, 27/06/2008 - 05:13
Compared to SQL*Plus (Oracle's cmdline client) which can as I understand produce complete paginated reports, the 'mysql' tool has fairly simple output formatting. I'm not generally fussed about that, as there are perfectly good other tools do make snazzy reports with. Some very cross platform (like web based), output PDF, not a problem. You can even write your own quickly these days.

But the cmdline tool has its use. I always teach my students the basics of it, and insist that some exercises are done using this tool. Familiarity (which involves some practice) helps there. Why? If all else fails, you generally do have mysql available on a machine. It's not hindered by remote ssh logins, or anything else. And, of course, for quickly checking something. And that's where the proficiency helps again. So, normally, you get this type of ASCII table:mysql> select "Hello, world!" AS foo; --------------- | foo | --------------- | Hello, world! | --------------- which is nice. But for some commands, this type of formatting would make the output unreadable. Key examples are SHOW CREATE TABLE, SHOW CREATE TRIGGER, and SHOW WARNINGS (particularly after EXPLAIN). Now check this out:mysql> select "Hello, world!" AS foo \G *************************** 1. row *************************** foo: Hello, world!If, instead of a semicolon you specify \G, the output is formatted vertically. A separator line above each row, and then each column prefixed by its name/alias followed by the value of that column within the row. It's much more readable for cases where you get a one-row result set with only a few columns, at least one of which might be longer than the width of your console.

The above also teaches us another lesson: the semicolon is not actually part of the SQL command. It's merely a delimiter that tells the client code to send the query to the server. If you're in a programming or scripting language using the MySQL API, you wouldn't include it in the query string.

A related trick is to use the "pager" command. If you specify pager less on the mysql command prompt, the output will thereafter be run through less. So then you can scroll through a larger result set, search it, etc. (see man less).

Final one for this post, there's "tee". This command followed by a filename (or program/pipe!) lets you log everything you do, queries (what you type) as well as output. Very handy, particularly during training and testing, or when you need to make a bugreport or support request.

Website usability vs performance - Measly Mouse revisited

Fri, 27/06/2008 - 01:26
Peter Zaitsev wrote an interesting item on front-end performance of a website.

I've always tried to look at the front-end from the user perspective, rather than purely technical. Once you weed out what's not really necessary for the user, and also deal with issues like "how important is it that this number is live", you generally look at a fairly different site already ;-)

Before my time at MySQL, I wrote a little gizmo called Measly Mouse which leads a modest but still active life. When reading on from here, please remember it was designed in 2001 and hasn't really been changed since.

Measly Mouse retrieves a page and deals with redirects, CSS and other includes like images, and tries to apply some basic metric to see how sensible the page is. Basic usability testing shows that people cannot choose between more than about 7 or so items. So you can imagine how the brain desperately fails to deal with most websites (website creators often feel everything is so important it must be on the front page), or through training filters out most things than don't relate directly to what the person is looking for. The other key factor is size. The bigger the page, and the more includes, the longer it takes. That's annoying. So the Measly scoring formula is as follows: BYTES + ((REQUESTS + LINKS) * 1024) where:
  • BYTES - is the total page content including stylesheets, images, etc;
  • REQUESTS - is the total number of requests required (including redirects) to get all page content;
  • LINKS - is the total number of clickable items on the page, including forms fields.
Naturally all pages have some includes and some links, it's just a matter of finding balance to keep the site usable. Some sites use so many redirects.... nutty.

At the time there was quite a lot of debate on the simple methodology and the owners of some sites got pretty upset when someone ran their front page through Measly and it ended up in the top 10 ;-)
I still reckon the concept has merit though... please do make your own judgement and feel free to comment.
You may find some aggregating (like mailing list archive) sites in the top 10... I personally take those entries less serious, because they're generally focused on a niche (geek) group which does not conform to a general user profile. Still, it's quite possible their user interface could be improved!

As to how the tool works... it actually parses the pages in PHP using regexes (again remember the time it was built). Although it still works reasonably well, could be vastly improved now and catch more of a modern page.
But what would be really great, is if someone would care turn Measly Mouse into a Firefox plugin. Inside Firefox you have clean access to a page, so the analysis becomes extremely easy. For any page, the plugin could calculate the Measly Mouse (MM?) score, and perhaps optionally submit it to a central location. Who would like to pick this up?

Some impressive hw hacking of Eee PC

Fri, 27/06/2008 - 00:18
This fellow is pretty amazing: http://beta.ivancover.com/wiki/index.php/Eee_PC_Internal_Upgrades. To the already tiny Eee PC, he added (internally!):
  1. USB hub
  2. GPS with antenna
  3. Bluetooth
  4. Card reader w/ additioal SSD
  5. Power switch (10 dip) for switching all extra foo on/off
  6. Wifi upgrade 802.11n
  7. FM transmitter
  8. Modem (admittedly there's design space for that)
  9. Touch screen
  10. Temperature sensor
  11. Heatsink
That's pretty cool...

Got my copy! High Performance MySQL (2nd edition)

Thu, 26/06/2008 - 02:05

Grand, I got my personal copy from the courier today!
Many people kept asking me over recent months "when is it coming, when can I get it"... so now you can. Officially out, and available.

Click on the book pic if you want to grab your copy via Amazon (which is probably one of the cheapest sources, even if you're overseas).

I'm also looking at acquiring a bundle of them through the author arrangement with O'Reilly, to give away to Open Query training students - well, for the courses related to MySQL at least ;-)

The grey art of eBay - the feedback process borked

Sat, 21/06/2008 - 05:01
Since a few months, sellers can no longer provide neutral or negative feedback to buyers. Not sure if I completely agree with this as I reckon buyers can be quite dodgy and how else to capture that. But the objective was to weed out retaliatory feedback. I.e. buyer has a bad experience, provides neutral or negative feedback, and in return for his honesty gets back similar feedback. So that's no longer possible. I suppose that's good.

I actually had such an experience at the beginning of the year, before the new policy. That one (in this case neutral) retaliatory feedback seriously impacts my rating to just over 90%. Not much of an issue as a buyer, but if I want to sell something, potential buyers might regard me as dodgy just on the basis of that number. Of course they could delve into the exact feedback comment and figure out what's been going on to some degree, but who bothers with such detail while browsing eBay for goodies?

The formula for the rating is positive / (positive+neutral+negative), and all that limited to the last 12 months. Let's take my simple case where I get one neutral. For 11 transactions over the last year, you get 10/(10+1) = 90%. Ok so I don't buy stuff on eBay all the time. Most people don't. But let's double the number of transactions and see how the formula works out: 20/(20+1) = 95%. Still not good for seller karma. Anyway, you get the picture; this kind of thing is damaging, and you'd really have to go on a buying spree to make up for it. With different buyers or on different weeks, otherwise it's counted as a duplicate and disregarded. And heaven beware if any of the many different sellers also decides to give you a neutral rating!

Ok ok, so the new eBay policy prevents this exact scenario from occurring again. But consider the mechanism if you're a seller. Buyers *will* provide you with feedback, and using the same formula a single malicious buyer can destroy your reputation. You suddenly have to sell a lot more to make up for it, except you may not be able to, because of your rating! In a nutshell, unless you're a powerseller moving dozens of items a month, you're pretty much an open target. If you, like I, sell the odd item once in a while, you don't have a chance.

But hang on, isn't there a policy that allows you to get feedback removed? Yes, but only in very specific cases. For this, the comment would have to be defamatory, referencing an unrelated item/transaction, a policy file number, a URL or other contact details, stuff like that. Also, a clearly negative comment on a positive rating qualifies. But not the other way round.
So, the loophole is a gaping one, and that is: you just scribble something nonsensical or at least meaningless; that way it won't qualify for removal. So you can just give a seller a neutral (just as damaging as negative, as far as the formula is concerned!) with a comment of "blah de blah" and that does the trick.

Someone please tell me that I went wrong with my logic - and where!

Making my TomTom GO 720 GPS work again... and Linux!

Fri, 20/06/2008 - 14:14
I spent some time today on that, it had gone increasingly gaga after some software and POI updates, then lost the plot completely, and then didn't want to boot anymore. The usual reset/restore procedures didn't improve things, so I invented my own which did not involve a restore procedure. I basically formatted the flash and reset the NVRAM and then added the basic stuff until it did stuff again ;-) So now my GO works once more. Horay... just need to figure out where the favourites were stored so I can restore them, setting that up from scratch is a pest.

Along the way, while searching for info online, I actually found out that the TomTom GO series actually runs a Linux 2.6 kernel. There's a few custom modules for the specific hardware, but basically it's just an ARM processor.
So there ya go. Good stuff!

MySQL moves from BitKeeper to Bazaar

Fri, 20/06/2008 - 01:33
Horay! Thank goodness, and about time too!

On the latter, I do appreciate that any such switch is a serious change for a development team, I was involved with this while at MySQL so I'm well aware of the issues ;-)
Nevertheless, it was high time as it once again allows the rest of the world (outside of Sun/MySQL) to grab a repo and peruse the history properly. Long time ago (before bitkeeper turned fully evil) this was possible also. See MySQL is using Bazaar and Launchpad for their source code.

We had a distributed revision control talk in Bris a few weeks ago, by Ian Clatworthy of Canonical. We managed to do a video, but it's not yet online. Various attendants including Damian Hickey (of the Freeway ecommerce project) now want to move to either bzr (Bazaar) or hg (Mercurial) as well, which is great (yes git is ok too). Move from what? Subversion. It's a different way of working.

Look, subversion is better than cvs, but that doesn't mean it's brilliant. It seriously hinders and borks development processes, particularly for teams. Changing over is, particularly with bzr, quite easy. bzr can start operating in a similar setup to svn, and then you can shift across to the additional beautiful features it has. Good stuff.

Keyboard Bashing on Eee PC

Wed, 18/06/2008 - 05:03
Is your child not quite ready for proper typing, but does she love "typing stories" on the keyboard The trick is finding a program that allows them to muck around without the possibility of triggering special functions through some keypress or mouse movement. With the help from Greg Black & James McPherson on the #humbug channel, this is what I have devised for Phoebe's Eee PC:

Using emEditor, create a menu item "KeyBashing" somewhere and give it a keyboard as icon. Commandline /usr/bin/uxterm -fn 10x20 -geometry 79x21+0+0 -title 'Keyboard Bashing!' /home/user/typer.sh

typer.sh contains the following:stty -isig
echo Welcome! Start typing - exit by closing window
echo
cat >/dev/nullMake typer.sh executable with chmod +x typer.sh
The stty -isig command disables keycombos like Ctrl-Z and Ctrl-C.

After editing menus, you can use Ctrl-Alt-Backspace to restart X.

One laptop per Phoebe

Wed, 18/06/2008 - 02:25
I acquired an Eee PC for my Phoebs. On special somewhere, unfortunately not a pink one ;-)
While I see that lots of people are replacing the Asus/Xandros OS with Ubuntu for Eee or other distros, I've decided to stick with the default for now. It's a very easy desktop to use, with all clutter, confusion and traps disabled but still accessible through other means. Traps you ask? Yea... traps. A three-year-old with a mouse is a dangerous mix ;-)

So, why have a computer for a 3 year old anyway, and why an Eee, and ....?
Phoebe is already quite aware that work generally involves using a computer, so she insists having one on her little play desk: "for work"; that's no reason to give her one though. Eee starts very quickly, being flash based and in its default simple desktop mode. Smurf gets easily bored so any regular machine just doesn't cut it (believe me I've tried).

But kids these days must to become computer literate. Being able to use a mouse and a keyboard is not an optional extra. So, letting them play with it from early on, it's a natural thing rather than something complicated later.

I installed GCompris and put it into the Learn menu, as Phoebe loves to play with the mouse exercises.
She also liked typing, but I haven't yet found a program that allows just simple "keyboard bashing" without icon clutter (more traps). Suggestions welcome - perhaps an add-on for GCompris?

And no, I won't be installing MySQL on this machine!

What's Jim up to after Falcon?

Sat, 14/06/2008 - 06:09
Well I don't quite know either (yet ;-) but I spotted a hint: his LinkedIn profile now says "President at NimbusDB", with stated goal "To build a database in the clouds that scales to 1000X disk-based database systems." Sounds good.

Which Jim? Jim Starkey, most recently known from MySQL's Falcon storage engine.

Open Source Developers Conference 2008 (Sydney) CfP still open

Fri, 13/06/2008 - 11:05
You still have some time to get your proposal in for OSDC 2008 in Sydney. Yes I lead the conference organisation last year, but there's a whole new team for this one and I can blissfully be an attendee or, if at least one of my proposals gets accepted, a speaker.

Last year was actually also my first time at OSDC, and it's really a great event... by developers, for developers... about creating OSS, or using OSS to create other stuff, and about the processes involved. You learn heaps. Remember, the worst killer for learning something, is to say "I already know that".

There's heaps each of us doesn't know, and OSDC is an ideal place to share your own experience and pick up some more info for your wetware as well as tune it, and of course have heaps of fun. OSDC is also lunches and other munches, the included-for-all conference dinner, and just the random chats in the corridor. Not forgetting the lightning talks at the end of each day, that's like a whole special OSDC culture!

Tracking MySQL replication lag

Fri, 13/06/2008 - 01:37
I sometimes encounter the issue where a replication slave is behind the master, while happily connected and in state of "waiting for binlog event". Clearly it somehow didn't get the memo that new stuff was available. I know others have seen this too, but for lack of reproducibility it's not yet a bug report.
Mark Barger of ANYwebcam has been kind enough to tweak a script so Nagios can alert on this problem, see the Nagios Scripts page in the Resources/Tools section of the OQ website. See the page itself for further details on monitoring and dealing with the problem.

Tracking MySQL replication status in general is important anyway, as basically a master doesn't "care" whether a slave is connected or not (the system is slave driven) and a slave can quite validly not be connected or replicating at any point in time (stopped to take a backup snapshot, do some big reporting on this slave, etc).

MySQL Licenses - Perception and Reality

Thu, 12/06/2008 - 11:21
Perception and reality don't always converge with users when it comes to MySQL licensing, even with Sun/MySQL focusing on selling its MySQL Enterprise support subscriptions. Just today I was talking with a client who thought that to get a MySQL Enterprise subscription, it was required to purchase licenses.

Not true. Simple. They hadn't talked with a MySQL sales person, so this wasn't the (usual ;-) warping of reality that I tend to see, but somehow perusing the information on the net they'd come to the above conclusion. Easy to dispell once understood, but much more complicated (if not impossible) to figure out how they came to that conclusion in the first place....

They also didn't realise that the MySQL Enterprise binaries are also GPL. Ohwell.

But really, nobody wants to care about licensing, it's this necessary evil that you have to deal with. Best to keep it simple, or even better, make it a non-issue! MySQL's subscription model was intended to do just that. Didn't quite work out that way - but I do believe the intent was good at least in that respect.

Open Query training around Australia & New Zealand

Fri, 06/06/2008 - 04:03
Moving ahead nicely, Open Query now has training courses scheduled in Brisbane, Sydney, Canberra, Melbourne, Adelaide, and Auckland. If that still makes you feel left out, please do let me know so we can work on it!

Course modules are typically one day, with a few grouped together so there are Dev, DBA, HA sets. There's quite a selection of topics for MySQL, as well as PHP (QA in PHP Projects Workshop taught by Sebastian Bergmann), and PostgreSQL (taught by Jonathon Coombes). Due to popular demand, Sebastian and I have decided to do a second run of his workshop in Brisbane, after the Melbourne one in August. Should be good!

For courses in August onward, there's a discount of $50/person/module for bookings before June 15th. For earlier courses, you can still save $25/person/module if you book before June 15th also. And if you're a student... well, then you can join any day module for as little as $50 total!
Product Audience Course (one-day modules unless noted otherwise) MySQL Dev Optimisation by Design Dev/DBA Storage Engine Optimisation Dev 5.0 Upgrade and Advanced Features MySQL DBA Installation, Security and User Management DBA Backup and Recovery DBA InnoDB Performance Tuning MySQL HA Scaling and High Availability Solutions HA DRBD and Heartbeat for MySQL Workshop HA Replication Workshop HA Cluster Workshop PHP Dev Quality Assurance in PHP Projects (3 days) PostgreSQL DBA Administration DBA PostgreSQL Performance Tuning DBA PostgreSQL Replication Workshop DBA Bucardo PostgreSQL Replication Workshop

Joint MySQL/PHP meetup in Bris: talk on distributed revision control!

Fri, 06/06/2008 - 03:47
If you're around Brisbane, do join us next Tuesday evening for a combined MySQL/PHP meetup... info at http://mysql.meetup.com/84/

Ian Clatworthy (Canonical) will be presenting on Distributed Version Control, such as is used in Bazaar, Mercurial and Git. You may be using SVN/SVK now, but DVCS is definitely worth a look; it's a different approach that makes you (and your team) more productive, and avoids many common version control problems (think "everybody stop working we're doing a merge", or "eek I had this right half a hour ago").

To give you an idea of how important this stuff is.... a brief snip from a conversation I just had with a developer:D: I was wondering whether you got to set up that mercurial repository?
A: sorry not yet. but I presume you've created a local one with hg init so that can be pushed at any time when I create it, it won't delay anything.
D: oh didn't know it worked like that
A: it does. your repo is identical to the host's.In a nutshell, with distributed revision control, when you need a new repository you just create one locally, you don't need to wait for someone to create one on a main server. It can, at any point, be synchronised there.

RepRap Brisbane

Fri, 06/06/2008 - 00:26
We've created a google group for our regular communications on the
RepRap activities in Brisbane. http://groups.google.com/group/reprap-brisbane/web

To learn what a RepRap is, see http://reprap.org/
In a nutshell, it's open source hardware, chemistry, mechanics, electronics, software, education, fun.
Or, in Chris Dibona's words: "Think of RepRap as a China on your desktop."

If you want to be part of it, please join the group.
There's a few messages and links to get you up to speed.
We're aiming to have a live meeting in a week and a bit.