OpenRefine Usage survey (2014)

Earlier this year the community had a discussion regarding the substainibility of the community and draft for a bounty model and governance model have been proposed with little feedback from the community.


Prepare SQL update where query in OpenRefine

Following the article regarding how to prepare SQL SELECT, INSERT INTO, DELETE query using OpenRefine, we will now see how the template function can be used to prepare UPDATE WHERE statement.

UPDATE WHERE statement are slightly different from a select or an insert statement since we want to define a different where clause for each record. Thus we cannot wrap all rows we want to update into a single query. We will need to write a separate statement for each record in our OpenRefine project and define the SET and WHERE conditions.


Prepare SQL SELECT, INSERT INTO, DELETE query using OpenRefine

I know that a lot of us export data from MySQL / SQL databases to clean them in OpenRefine before loading them back in their original database. Before, I was exporting my project to csv and loading the csv using some command utilities for MySQL, it worked by that was a painful process with a lot of details to pay attention to (encoding, field separator ...). But all this was was before I found a new way to use the template option of OpenRefine to prepare large select, update, insert or delete SQL statement

So instead of exporting to csv and importing through an other interface / tool like phpmyadmin you can use the template function of OpenRefine to preparethat will iterate through all the row of your project.


Padding left and right

Padding is the action of adding 0 to the left or the right of a text value until you reach a certain string lenght.

The padding function can be useful in OpenRefine if you lost leading zero while importing or transforming your data.

Of couse you can hack the example below to add letters or any other type of charactere.

padding left up to four digit
"0000"[0,4-value.length()] + value

padding righ up to four digit
value + "0000"[0,4-value.length()]



The Named Entity Extractor extension by Free You Metadata (from around the web)

The Free Your Metadata Named Entity Extractor extension helps you to enrich your data in OpenRefine using AlchemyAPI, DBpedia Lookup and Zemanta. The extension works on plain text field and any unstructured (meta)data


Mining and OpenRefine(ing) JISCMail: (from around the web)

A look at OER-DISCUSS [Listserv] JISC CETIS MASHe: a complete tutorial to scrap data from a mailing list and analyse participant and contribution.

Read the full article.

Finding (Nearly) Duplicate Items in a Data Column (from around the web)

An other great article by Tony Hirst. This tutorial will show you how to use clustering function (ngram and fingerprint) directly in your facet. Really handy.

Read the full article.