Last week I attended the Strata 2011 Conference, out in Santa Clara. This was my first O’Reilly conference, and I wasn’t entirely sure what level of technical rigor to expect. (I am used to the PyCon and SciPy conferences, which are very tech-centric.) Many of the talks in the program piqued my interest, and I was excited to find out what others were doing and thinking in this somewhat vaguely defined field of “big data”.
The first day of the conference consisted of several tutorial tracks. I chose to attend the visualization track, although I did drop in to the “Data Bootcamp” at one point to check it out. The first visualization tutorial was presented by two guys from Juice Analytics. I initially had concerns that this would simply be a plug for their favorite tools, but it actually turned out to be quite informative. Ken Hilburn’s slides were visually interesting and the content was worth my time. (You can download them from bit.ly/JuiceStrata2011.) They covered not only graph design and how to apply the perceptual principles, but also talked about visual communications and even dashboard design and the emerging field of interactive graphs.
Of course, the latter field is near and dear to my heart, and one of the things I was actively looking for at Strata was anyone or any technology that really attempts to bring the power of interactive visualization to “big data analysis”. In the same way that Bertin lays out a “semiology of graphics”, I was hoping to find people who had any insights into a structured architecture and conceptual framework for interactive graphics. I don’t want to ruin the surprise, but basically I found no one at Strata who was even thinking about this.
The second tutorial session in the visualization track was a presentation by Naomi Robbins, author of Creating More Effective Graphs, entitled “Communicating Data Clearly”. This presentation was probably one of my favorite talks of the conference, because it was so full of good information and insight, and it was presented in a clear, straightforward, professional manner. Whereas many other presenters got caught up in trying to polish their style and resorted to name-dropping to enhance the prestige of their material, Naomi spent every minute of her 3+ hours giving her audience good, actionable information about data visualization. In fact, throughout the rest of the conference, I found myself playing a fun game of “What would Naomi dislike about this plot?” every time someone popped up a graph. That’s when you know you’ve gained a skill.
Unfortunately, although her presentation was very informative about visualization in general, the specific issue of effective visualization of “big data” was not seriously addressed, except in cases where naive downsampling is acceptable.
At the end of the day, I walked around the main Startup Showcase area as various startups were setting up. I ran into the founder and CEO of DataSift, Nick Halstead, who talked to me briefly about his company. DataSift aggregates and correlates multiple large data feeds from various social services and web services, including the full “Twitter firehose”. The same folks/parent company had previously released TweetMeme, a service that reports, in realtime, the hottest links on Twitter.
Another interesting company was BillGuard, a social credit card fraud-detection startup. Its founder and chief architect, Raphael Ouzan, practiced his pitch on me before the Startup Showcase formally started, and I gave him some feedback on it. Later he came and thanked me for helping him practice, because BillGuard was one of the two judge-selected winners of the Startup Showcase!