FlowingData’s Best Visualizations Of 2009

January 30, 2010

The FlowingData site has recently posted its list of the Best Visualizations Of 2009. Of particular interest is Ben Fry’s work with Charles Darwin’s text.


A Survey Of Streaming SQL

January 29, 2010

The latest issue of CACM has an article entitled “Data In Flight” by Julian Hyde (chief architect of SQLstream). The article is a survey of streaming SQL technology and how it may apply to ever-increasing data streams.
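The core idea behind streaming SQL is a continuous query: instead of running once over stored rows, the query stays active and emits results as new rows arrive. As a rough illustration only (this is not SQLstream's API, not SQL, and not code from the article), here is a minimal Python sketch of a continuous sliding-window average. The function name `sliding_window_avg`, the `(timestamp, value)` event shape, and the assumption that events arrive in timestamp order are all my own choices for the example.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def sliding_window_avg(
    events: Iterable[Tuple[float, float]], window: float
) -> Iterator[Tuple[float, float]]:
    """Hypothetical continuous query: for each arriving (timestamp, value)
    event, yield (timestamp, average of values in the window (ts - window, ts]).

    Assumes window > 0 and that events arrive in timestamp order.
    """
    buf = deque()  # (timestamp, value) pairs currently inside the window
    total = 0.0    # running sum of the values in buf
    for ts, value in events:
        buf.append((ts, value))
        total += value
        # Evict events that have fallen out of the time window.
        while buf[0][0] <= ts - window:
            _, old = buf.popleft()
            total -= old
        # Emit a result immediately; the "query" never finishes on its own.
        yield ts, total / len(buf)


if __name__ == "__main__":
    stream = [(0, 10), (1, 20), (2, 30), (5, 40)]
    for ts, avg in sliding_window_avg(stream, window=3):
        print(ts, avg)
```

Because the function is a generator, it works equally well on a finite list or on an unbounded source such as a socket reader; only the events currently inside the window are kept in memory.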

I want to highlight two small items from the article. The first is an assertion that web application authors are generalists:

The technologies for powering Web applications must be fairly straightforward for two reasons: first, because it must be possible to evolve a Web application rapidly and then to deploy it at scale with a minimum of hassle; second, because the people writing Web applications are generalists and are not prepared to learn the kind of complex, hard-to-tune technologies used by systems programmers.

The second: about two-thirds of the way through the article, he finally makes the logical connection to CEP (complex event processing) and throws in an aside about an ongoing religious war. Is this the CEP/Rete debate that I am aware of, or some other debate?

CEP has been used within the industry as a blanket term to describe the entire field of streaming query systems. This is regrettable because it has resulted in a religious war between SQL-based and non-SQL-based vendors and, in overly focusing on financial services applications, has caused other application areas to be neglected.


Graphing The Beatles

January 20, 2010

Spotted this week (on waxy.org, I think): an interesting project that is creating a series of infographics from data about The Beatles. I especially like the lyric self-reference graphic.


One Billion Spams Analyzed

December 17, 2009

Here is an interesting set of results from an analysis of 1 billion spam emails. Apparently there are at least 956 variant spellings of “viagra” that spam filters need to take into account.
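To give a feel for why so many variants matter, here is a minimal Python sketch of one simple countermeasure: normalize a token by mapping lookalike characters back to letters and stripping punctuation before comparing it to a target word. This is not the method used in the analysis; the `LOOKALIKES` table, `normalize`, and `matches_target` are hypothetical names, and the character mappings are my own assumptions.

```python
import re

# Hypothetical lookalike table: characters spammers might substitute for letters.
# "1" is mapped to "i" here, although in other words it could stand for "l".
LOOKALIKES = str.maketrans({
    "1": "i", "!": "i", "|": "i",
    "@": "a", "4": "a",
    "0": "o", "3": "e",
    "$": "s", "5": "s",
})


def normalize(word: str) -> str:
    """Lowercase, map lookalike characters to letters, then drop anything outside a-z."""
    mapped = word.lower().translate(LOOKALIKES)
    return re.sub(r"[^a-z]", "", mapped)


def matches_target(word: str, target: str = "viagra") -> bool:
    """True if the normalized token equals the target word exactly."""
    return normalize(word) == target


if __name__ == "__main__":
    for token in ["V1@GR@", "v.i.a.g.r.a", "vi4gra", "viagara"]:
        print(token, matches_target(token))
```

An exact comparison after normalization still misses variants with inserted or doubled letters (such as "viagara"), which is part of why hundreds of spellings are a real problem for filters.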


“an extra dollar for fact-checking”

November 6, 2009

The latest issue of Wired has an interesting article on Demand Media. If you’ve ever used sites such as eHow, then you may have encountered Demand Media without even realizing it.

Demand Media generates web content, and a lot of it. It appears that they have an algorithm that analyzes popular web search terms, advertising rates, and their competition, and then spits out ideas for content. The example output shown in the article is “how to make butterflies for cake decorating”. That’s after two proofreaders have munged the set of terms from the original output into a sentence. (I don’t know if this example is contrived or real, but it does lead to a real article.)

Once they have a topic, they use freelancers to create articles and/or video tutorials. They pay as much as $20 per clip to the filmmakers, whereas the headline proofreaders get 8 cents per headline.

These folks are pumping out enormous amounts of content. The article says that by next summer they will be publishing 1 million items a month. They already have 170,000 videos on YouTube.

Anyway, the article is an interesting read. I’ll close with a quote:

“We’re not talking about $1,000 videos, so a couple dollars here or there can make a serious difference. For instance, pay an extra dollar for fact-checking.”


Big Data (Again) and NoSQL

October 8, 2009

If you still haven’t read Adam Jacobs’ article “The Pathologies of Big Data” that I linked to previously, go read it.

Also interesting is Dare Obasanjo’s recent post on denormalization, in which he makes some similar arguments, complete with examples.

I had also never heard of the supposed “NoSQL Movement” that Dare mentions. Very interesting.


Big Data

August 6, 2009

There is a nice article by Adam Jacobs in the most recent issue of CACM entitled “The Pathologies of Big Data”. Jacobs discusses the fact that it is often easier to put data into a database than it is to get data out, as well as strategies for improving how we work with large datasets. It’s an interesting read.


Direct3D 11 Compute Shaders

August 4, 2009

In other news, the upcoming Microsoft Direct3D 11 will feature compute shaders. If I read correctly, this is shipping in Windows 7. NVIDIA is already promoting its compatibility with CUDA. Apparently, this technology is also sometimes called DX Compute.


A Survey Of Programming Video Cards For Other Purposes

August 3, 2009

The August 2009 issue of ;login: from USENIX has a nice article on programming video cards by Tim Kaldewey entitled “Programming Video Cards For Database Applications”. Sadly, the article is only available to USENIX members until August of 2010.

Kaldewey surveys the past and present of programming video cards for non-graphics purposes – from the early days of using the graphics APIs to fool the GPU into thinking it is rendering graphics when it is really performing a general-purpose calculation, to the present era of general-purpose APIs such as CUDA.

He also shows a back-of-the-envelope calculation for building out a 100-teraflop data center using 100 GPUs versus 1,400 CPUs, including the differences in power consumption.
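Dividing those totals evenly gives the per-device throughput the comparison implies. This is my own arithmetic from the two numbers above, assuming the 100 teraflops is spread evenly across devices; the article's actual per-device assumptions and its power figures are not reproduced here.

```latex
\frac{100\,\text{TFLOPS}}{100\ \text{GPUs}} = 1\,\text{TFLOPS per GPU}
\qquad
\frac{100\,\text{TFLOPS}}{1400\ \text{CPUs}} = \frac{100\,000\,\text{GFLOPS}}{1400} \approx 71\,\text{GFLOPS per CPU}
\qquad
\frac{1400}{100} = 14\ \text{CPUs per GPU}
```

In other words, under that even-split assumption each GPU stands in for roughly 14 CPUs, which is what makes the power comparison interesting.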

If you are a USENIX member, the article is a good read. Sadly, it won’t be current when it finally becomes freely available.

[The same issue of ;login: also has a nice article by Leo Meyerovich: “Rethinking Browser Performance”.]


Farewell To Popfly

July 17, 2009

The official Popfly blog states that the service is being shut down.

I’m a little sad to see a neat mashup tool get shut down. The integration with Silverlight and the ability to use the Popfly widgets on the Windows desktop were unique features.

I wish the best of luck to the Popfly team in their next endeavors.