Netflix Prize May Have A Winner

June 29, 2009

The Netflix Prize has entered its 30-day notification period now that a team has announced a 10.05% improvement over the original Cinematch algorithm.

Some further background on the contest can be found in a nice writeup in Wired from last year.


Datamining “Where’s George?”

May 5, 2009

The New York Times had an interesting article on Sunday regarding computer modeling of the swine flu epidemic.

The article highlights two university teams that are building computer models of the spread of the virus. Some of the data being used is obvious: air traffic and commuter traffic data. One data source is not so obvious: the Where’s George? site, which lets people track US dollar bills by serial number. I find this an interesting choice, and it apparently provided useful data.

A couple of thoughts:

Read the rest of this entry »


Datamining Everquest

March 28, 2009

A group of academic researchers has obtained the complete server logs for the Everquest 2 MMORPG. The logs cover four years of data for over 400,000 players, and the resulting dataset is nearly 60TB. That’s right, terabytes. Combined with some demographic surveys, there is interesting datamining potential here.

This is also interesting because apparently the standard tools don’t quite scale to the task of analyzing this data:

Regardless of format, many one-pass, exhaustive algorithms simply choke on a dataset this large, which is forcing his group to use some incremental analysis methods or to work with subsets of the data.
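
To make "incremental analysis" a little more concrete, here is a minimal Python sketch of one common incremental technique: keeping a running mean and variance (Welford's method) while reading data one chunk at a time, so the full dataset never has to sit in memory. This is only an illustration of the general idea, not the researchers' actual method. The RunningStats class, the summarize function, and the sample session lengths are all made up for this example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunningStats:
    """Running count, mean, and variance, updated one value at a time."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        # Welford's update: adjust the mean, then accumulate the deviation.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        # Population variance; 0.0 when no values have been seen.
        return self.m2 / self.count if self.count else 0.0


def summarize(chunks: Iterable[Iterable[float]]) -> RunningStats:
    """Consume chunks one at a time; only the running totals are kept."""
    stats = RunningStats()
    for chunk in chunks:
        for value in chunk:
            stats.update(value)
    return stats


# Hypothetical session lengths in minutes, split into three chunks
# (stand-ins for pieces of a log far too large to load at once).
chunks = [[30.0, 45.0], [60.0], [15.0, 50.0]]
stats = summarize(chunks)
```

In practice the chunks would come from a generator that reads the logs piece by piece, so memory use stays constant no matter how large the dataset is.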

Some items in the results that I found interesting:

Read the rest of this entry »