Tag Archives: Red-Black tree

Tokutek’s Fractal Tree Indexes

Tokutek’s Bradley did a session on their Fractal Tree Index technology at the MySQL Conference (and an OpenSQL Camp before that – but I wasn’t at that one), and my first thought was: great, now we get to see what and where the magic is. On second thought, I realised you may not want to know.

I know I’m going to be a party pooper here, but I do feel it’s important for people to be aware of the consequences of looking at this stuff (there’s slide PDFs online as well as video), and software patents in general. I reckon Tokutek has done some cool things, but the patents are a serious problem.

Tokutek’s technology has patents pending, and is thus patent encumbered. What does this mean for you? It means that if you look at their “how they did it” info and you happen to code something that later ends up in a related patent lawsuit, you and the company you work for will be liable for triple damages. That’s basic US patent law, if you knowingly infringe you pay thrice. If you were at either session and are involved in database development work, you may wish to talk with your boss and legal council.

I made the assessment for myself (although I’m in Australia, there’s the Free Trade Agreement with patent-related provisions, so I am exposed) and decided that since Open Query’s activities are well within my control, it’s a manageable risk. So yep I’ve looked at the details. I’ll review some broad aspects below – I am not a lawyer but if the above worries you, to be sure, now is the time to stop reading and not see the rest of this post.


The insertion methodology is an interesting and nifty trick. It’s more CPU intensive but reduces disk I/O, and is thus faster for high volume inserts (the exact spot where B-trees and derivatives tend to be slower).

First of all, it’s important to appreciate why the B-tree family of indexing algorithms exist. They acknowledge that disk I/O is a) relatively expensive and b) operates in blocks (that is, writing/grabbing a larger chunk is more efficient when you’re reading from disk anyway). So B-trees store groups of keys together and thus try to minimise disk I/O particularly on lookup, balanced B-trees (B+tree algorithm etc) go wide rather than deep so for billions of entries you could still have a max of 6-8 disk blocks to fetch. Inserts (and deletes) can be more costly, particularly with page splits (merges for deletes) and rebalancing operations. Blocks are also not full, which is technically wasteful on your storage – it’s a tradeoff.

If you have an index purely in memory, algorithms that don’t work with blocks are more efficient, MySQL (NDB)Cluster uses T-trees and MySQL’s MEMORY tables have red/black trees which are a balanced (weighted) binary tree. If you’re interested in the structure and basic logic for each of the algorithms involved, Wikipedia tends to have good descriptions and diagrams, and there are many resources on the web including neatly animated demos of how inserts work, and so on.

So, Tokutek’s method is basically an enhancement on B-trees, it’s relevant as long as we deal with not just spinning disks but block devices that operate in large(r) read/write chunks. For spinning disks, seek time is an important factor. For SSD it is not, but SSD still works with relatively large blocks of data: you can’t just write 3 bytes, if you do the SSD actually reads the rest of the block and rewrites it (with your new 3 bytes) elsewhere, marking the old block for re-use (since SSD requires an erase cycle before it can write again). These technologies will be with us for a while yet, so enhancements are useful.

Monetisation models (and patents) aside, I reckon it’d be best to see enhancements such as these added to existing storage engines and indexing implementations (think text indexers and many other applications – it’s by no means limited to plain RDBMS or databases in general). Then it would quickly benefit a large group of users.

Building a basic storage engine is not that hard for an experienced database coder, but it takes time to mature and there are many aspects and trade-offs to it. It’s taken years for InnoDB to mature and for people to understand how to optimally use it. Planting a new/separate storage engine on the market to monetise a new indexing scheme makes -to me- only sense in the monetisation context. It makes absolutely no sense when looking at the technical aspects or the needs of the users.

For companies using MySQL/MariaDB because the code is available and they’re not locked into a single vendor for bugfixing and enhancements (just look at what Percona has done with InnoDB!), buying/using proprietary extensions makes no sense. I do by no means wish to diminish the accomplishments of the innovative minds at Tokutek, and I appreciate their tough predicament in terms of finding a way to monetise on their innovation, but what we have now is problematic.

In a nutshell, my excitement on behalf of my clients is hindered by the proprietary and patent aspects. Which is a great pity! We need to seriously think about alternative ways for smart people to benefit from their innovation, without effectively hindering broad adoption. Using different monetisation means may mean less money is made – however, do also consider the cost (both in time and money) of the patenting and product development process (in this case for a complete storage engine). That’s all overhead and significantly burdens future profitability. You need to consider these things the moment you create something, before going down the road of patenting or setting up a business as those things are in fact defining decisions – they define how you approach the market and how much money you need to make to make any profit at all. There are methods to cheaply explore what might be a right way for you (and quickly eliminate wrong ways), but some “wrong ways” are permanent, you can’t backtrack and you definitely lose any time advantage (which is of course more relevant if you don’t patent).