The newest IEEE Spectrum has a nice article on debugging a data anomaly with the Pioneer 10 and Pioneer 11 spacecraft. It is a short and interesting read – one of the key issues was the need to review many years of historical data retrieved from a variety of physical media formats.
Chernoff Face Tutorial on Flowing Data
September 20, 2010
If you’ve read Blindsight, then you have come across Chernoff faces.
I recently spotted a tutorial on Flowing Data for using Chernoff faces with R.
Data Dangers Redux
September 16, 2010
A recent bit from the New York Times covers burglars picking houses based on Facebook status updates. Not surprising at all. The article does remember to namecheck PleaseRobMe.com.
See also Tom Scott’s project that datamines public phone numbers from Facebook.
Realtime Datamining Of Location Data
March 16, 2010
All the recent coverage of PleaseRobMe.com has focused on one particular idea of what can be done with realtime datamining of location data. (And how long until we see mashups combining PleaseRobMe.com with social cataloging sites such as LibraryThing?)
Where My Ladies At? (link goes to John Bollozos’ descriptive writeup) is another such site – it mines data from apps such as FourSquare and Gowalla and compares it against a database of female names in order to identify places where lots of females are at a given moment. I’m really at a loss to say more about it.
See also Jim Bumgardner’s “Mayor Of The North Pole” for a perspective on forging location data.
[Tip o’ the hat to Dan G for the Where My Ladies At? pointer.]
Another Profile Of Demand Media
March 16, 2010
Five months after the Wired article, Time Magazine has a recent profile of Demand Media.
The article is short and doesn’t have much additional data compared to the Wired article. However, the author submitted approximately 20 articles to Demand, and did some experimenting such as including factual errors to see if they would be caught.
Datamining Facebook
February 9, 2010
Just a quick post of two links about datamining Facebook:
- Pete Warden’s post: How to split up the US
- Marshall Kirkpatrick’s post: The Man Who Looked Into Facebook’s Soul
Pete’s divisions of the US are interesting to consider.
(I haven’t read all the commentary yet, but it’s clear that Mr. Warden needs some pop culture data to help him understand why Ashley is a popular name in the South and why Twilight is popular in Utah. I think the answers are fairly self-evident.)
Datamining The Government
February 3, 2010
The article is a bit old now (June 2009), but Wired had an interesting interview with Vivek Kundra about Data.gov. This is the usual Web 2.0 pitch about making data transparent and available and hoping that crowdsourcing will magically create useful things.
I will however admit to having been intrigued by the concept of one of the apps mentioned in the article:
In DC, someone combined several of the data sets released by local government—maps, liquor license info, crime statistics—into an app called Stumble Safely, which shows users the safest way to walk home when drunk.
Now someone just needs to mash it up with some augmented reality software or turn-by-turn GPS directions (“Turn left at the next corner. Stagger 2 blocks east. Try not to walk into that telephone pole.”).
Also of interest is DataMasher, which I spotted on LifeHacker. It appears to be a site for mashing up various government data sources. You can also save your own mashup and make it available to others. Looks interesting. So far, the Highest Rated and Most Discussed mashups seem to focus on health, mortality, guns, alcohol, obesity, and reproduction.
Finally, if you’ve read Freakonomics, check out this article on “bad boy” baby names as a predictor of behavior.
Datamining Games
February 3, 2010
The rise of the online always-on videogame opens a new world of stat tracking. The recent changes in this area are well beyond simple high score boards or achievements/trophies. For example, consider the article “You Are Being Watched” from a recent issue of the Official Xbox Magazine. The article details the datamining that Bungie is doing for Halo 3 and Halo 3: ODST, that Criterion is doing for Burnout Paradise, and that Valve is doing for Team Fortress 2 and Left 4 Dead.
All of these companies are gathering data that shows them how their games are really being played. One use for this data is to potentially make improvements and bug fixes. In the case of Bungie, players can actually log onto bungie.net and see their own stats and personal heat maps for the matches they have played. Valve shares some of the overall data, and has recently started adding personalized data (for Steam players only).
For the personalized data, it would be interesting to see some numbers for how many players actually review their stats and whether it has an impact on their playing.
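As a rough illustration of what a personal heat map involves, here is a minimal Python sketch that bins event positions (say, where a player died) into grid cells and counts them. This is not how Bungie or Valve actually do it; the function name, the cell size, and the sample coordinates are all made up for the example.

```python
from collections import Counter


def build_heat_map(events, cell_size=10.0):
    """Count events per grid cell.

    events: iterable of (x, y) map coordinates (hypothetical units).
    cell_size: width and height of one square grid cell (an assumption).
    """
    counts = Counter()
    for x, y in events:
        # Floor-divide each coordinate to find the cell it falls in.
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts


# Made-up event locations from a single match.
sample_events = [(3.0, 4.0), (7.5, 2.0), (12.0, 4.0), (8.0, 9.9), (25.0, 31.0)]

heat = build_heat_map(sample_events)
hottest_cell, hottest_count = heat.most_common(1)[0]
print(hottest_cell, hottest_count)
```

A real heat map would then color each cell by its count and overlay the result on the level map; the counting step is the core of it.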
See also:
- Kotaku’s regular coverage of the fact that Nintendo makes Wii playtime data available
- Major Nelson’s regular updates on Xbox Live game popularity
- A Slashdot item about some neural network software being used to monitor games and assist players
While I’m clearing out the videogame datamining links…
- A thesis analyzing squad tactics in team-based FPS games (Counter-Strike in this instance).
- Two more items on data showing that female gamers outnumber males: CNet News and Water Cooler Games
Search Box Candor
January 30, 2010
It is becoming increasingly clear that we don’t lie to search engines. As the AOL search data scandal revealed, you can give away your identity simply through egosurfing.
People ask search engines all sorts of questions. And the autocomplete features recently added to the search boxes at Google and Bing are quite revealing about what people are searching for. Two recent articles at Slate point this out most readily. The first article has a number of interesting examples of what people are typing into the Google search box, and calls for submissions from readers. The second article is the more interesting of the two: the suggestions that Google provides change with your wording, and the difference between the suggestions for “is it wrong to” and “is it ethical to” is quite telling.
Which brings us to the outing of anonymous blogger Belle de Jour. It is not especially surprising that her identity was figured out from her online writings. What is interesting is that someone figured out her identity, kept it secret, and used a Googlewhack in order to spot when others began to suspect her identity around six years later.
Update Feb 23, ’10: See also AutoComplete Me for more Google examples.
Plotting Social Networks (Of Fictional Characters) Over Time
January 30, 2010
From the non-academic world, some infographics charting social network interactions over time. In this case, the source is the xkcd comic strip – and the social networks are Star Wars, The Lord of the Rings and three other films.
The orcs in the Lord of the Rings graphic are particularly reminiscent of Minard’s Napoleon map.