Posted on

Cleaning up data: OpenRefine

Always a problem: imported data tends to be messy.  So, you want to clean it – and preferably before it gets into your database!

OpenRefine has existed for some years already, and I particularly like that it runs locally (on Linux, Mac, Windows) rather than being an server “elsewhere”. It does use a web interface, but you can run the (Java based) backend on your laptop or another local place.  Have a look at the videos on the site to see how it works and what different tricks OpenRefine can do for you.

And another thing I like – it’s possibly to call it programmatically.  Once you work out that you need to do certain operations on a particular dataset to sanitise it, you should be able to automate the process for when you grab more of the same data later.

Posted on