Australia's Chief Scientist

Where’s Wally

For any data modelling problem, who is the best person to analyse it? One of the things we’re learning from crowdsourcing is that Wally could be anywhere.

A bank will use data modelling to profile which of its customers are most likely to default. A medical researcher will analyse data and look for patterns. If he or she finds them, lives can be saved.

William Dampier (not the seventeenth-century explorer, but a twenty-first-century “scientific” explorer) had a rich dataset on HIV patients. After a decade of analysis, the best model in the scientific literature predicted the progression of viral load in these patients with 70% accuracy.

Then William hosted a global data prediction competition…

William asked “Where’s Wally?”, and within a week and a half a model with 70.8% accuracy had been produced. Three months later, accuracy had reached 77%. The state of the art in the scientific literature had been advanced by 10 percent (from 70% to 77% accuracy) in just three months!

It was the work of over 100 teams from 30 countries. Many were PhD-level specialists from around the world, all experts in analysing data, all volunteering countless hours of their free time to perform the analysis. The winner’s prize? Only $500! A commercial competition to predict freeway commute times for the NSW Government, with a $10,000 prize, attracted over 50 entries in its first week alone.

Data prediction competitions are particularly effective because there are countless techniques that can be applied to any problem. Any single analyst or consultant is likely to have the skills and resources to try only a few. Only by opening the problem to a wide audience, with different participants trying different techniques, can we reach the frontier of what’s possible.

Competitions flush out the best technique and the best analyst for your problem. The “freelance” community includes many PhD-level specialists who crave real-world data to benchmark and refine their techniques, and who can use competitions to enhance their professional reputations.

Right now, Wally’s waiting to find out what’s hiding in your data: something that no one else has found… yet.

Wally’s the best analyst in the world for your data.  Trouble is, right now we don’t know where he is.

A data prediction competition can help you find out.

What could the world’s best analysts find in your data?

This article was written by Kaggle Chairman, Nicholas Gruen.