Posted on

Supporting Scalable Online Statistical Processing

Interesting Google tech talk: Supporting Scalable Online Statistical Processing.

It runs for an hour and has a slowish start, but around the 10-minute mark the beef starts. Basically, rather than computing complete aggregates, he uses statistical sampling to provide a reasonable estimate (an unbiased guess) of the result.
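To make the sampling idea concrete, here is a minimal Python sketch of estimating a SUM from a growing random sample. It is not the speaker's system. It assumes a single in-memory column, and the function name `online_sum_estimates` and its `report_every` parameter are hypothetical. Rows are read in random order; after each batch, the running sample mean scaled by the table size gives an unbiased estimate of the SUM, with a rough 95% interval from the normal approximation.

```python
import math
import random
from typing import Iterator, Sequence, Tuple


def online_sum_estimates(
    values: Sequence[float], report_every: int = 1000, seed: int = 0
) -> Iterator[Tuple[int, float, float]]:
    """Yield (rows_seen, estimated_sum, half_width_95) as the sample grows.

    Hypothetical helper: rows are read in a random order, so every prefix is a
    uniform random sample without replacement, and N * (sample mean) is an
    unbiased estimate of the full SUM.
    """
    n_total = len(values)
    order = list(range(n_total))
    random.Random(seed).shuffle(order)

    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's algorithm)
    for idx in order:
        x = values[idx]
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
        if count % report_every == 0 or count == n_total:
            estimate = n_total * mean
            if count > 1:
                sample_var = m2 / (count - 1)
                # Finite population correction: the interval shrinks to zero
                # once every row has been seen.
                fpc = 1.0 - count / n_total
                half_width = 1.96 * n_total * math.sqrt(sample_var / count * fpc)
            else:
                half_width = math.inf
            yield count, estimate, half_width


# Usage on synthetic data (made up for illustration).
data_rng = random.Random(42)
data = [data_rng.uniform(0, 100) for _ in range(100_000)]
for seen, est, hw in online_sum_estimates(data, report_every=20_000):
    print(f"{seen:>7} rows: SUM ~ {est:,.0f} +/- {hw:,.0f}")
print(f"exact SUM = {sum(data):,.0f}")
```

The interval narrows as more rows are read and collapses to zero once the whole table has been seen, so a query could stop as soon as the estimate is precise enough for the user.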

This makes sense, statistically!
It might be possible to transplant his system into a MySQL storage engine, but the engine would need to perform joins itself (in-engine joins), something MySQL doesn't yet support.
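Joins matter because a sampled join behaves differently from a sampled table. If both tables are sampled, a joined row appears only when both of its source rows were kept. The following hedged sketch illustrates this under assumptions that are not from the talk: independent Bernoulli samples of each table and a hypothetical two-table schema. Scaling the sampled join's sum by `1 / (p_left * p_right)` still gives an unbiased estimate, but only if the code doing the sampling also performs the join. All names here (`bernoulli_sample`, `join_sum`, `estimate_join_sum`) are illustrative.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical schema for illustration: left rows are (join_key, value),
# right rows are just join keys.
LeftRow = Tuple[int, float]


def bernoulli_sample(rows: Sequence, rate: float, rng: random.Random) -> List:
    """Keep each row independently with probability `rate`."""
    return [row for row in rows if rng.random() < rate]


def join_sum(left: Sequence[LeftRow], right_keys: Sequence[int]) -> float:
    """Exact SUM(left.value) over left JOIN right ON key, via a hash join."""
    matches: Dict[int, int] = defaultdict(int)
    for key in right_keys:
        matches[key] += 1  # how many right rows each key joins with
    return sum(value * matches.get(key, 0) for key, value in left)


def estimate_join_sum(
    left: Sequence[LeftRow],
    right_keys: Sequence[int],
    p_left: float,
    p_right: float,
    seed: int = 0,
) -> float:
    """Unbiased estimate of join_sum from independent Bernoulli samples.

    A joined pair survives only if both of its rows are sampled, which
    happens with probability p_left * p_right, so the sampled join's sum
    is scaled up by 1 / (p_left * p_right).
    """
    rng = random.Random(seed)
    sample_left = bernoulli_sample(left, p_left, rng)
    sample_right = bernoulli_sample(right_keys, p_right, rng)
    return join_sum(sample_left, sample_right) / (p_left * p_right)


# Usage on synthetic data (made up for illustration).
data_rng = random.Random(1)
left = [(i % 1000, data_rng.uniform(0, 100)) for i in range(20_000)]
right_keys = list(range(1000))
exact = join_sum(left, right_keys)
estimates = [estimate_join_sum(left, right_keys, 0.3, 0.3, seed=s) for s in range(100)]
print(f"exact = {exact:,.0f}")
print(f"one estimate = {estimates[0]:,.0f}")
print(f"mean of 100 estimates = {sum(estimates) / len(estimates):,.0f}")
```

A single estimate from roughly 9% of the joined rows can be noticeably off, but the average over many independent samples converges on the exact value. That behaviour is what "unbiased" promises.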

(Thanks Ian for the link)
