Datamining “Where’s George?”

The New York Times had an interesting article on Sunday regarding computer modeling of the swine flu epidemic.

The article highlights two university teams that are building computer models of the spread of the virus. Some of the data being used is obvious: air traffic and commuter traffic data. One data source is not so obvious: the Where’s George? site, which lets people track US dollar bills by serial number. I think this is an interesting choice, and it apparently provided useful data.

A couple of thoughts:

  • I didn’t realize that the entire Where’s George? database was available for datamining. Is the database available for downloading to anyone? I can’t find any details on their site.
  • The Where’s George? participants are entirely self-selected, so while the data was apparently relatively accurate for this simulation, it may not be entirely representative.
  • The Times article refers to the Where’s George? data as "a map of face-to-face transactions", which may be true for some of the data but is most likely not true for the majority of the data (see my comment above about the self-selection problem).
  • While BookCrossing data will also suffer from the self-selection problem, I would be curious to see whether that database is also available and how it compares with the Where’s George? data. (I will note that Where’s George? refuses to publish the farthest/fastest bills to prevent people from manipulating the data, whereas BookCrossing publishes data for the most traveled books.)
