March Madness for data junkies

[Disclaimer: The following post is partly a reprise of one I wrote last year]

March Madness is almost here, and my workplace productivity is bound to suffer a little (don’t worry Kyte crew — I promise I’ll get all my stuff done). Selection Sunday is this weekend, and then it’s all about bracketology. I always look around the Internetz for a little help, and there’s no shortage of resources out there. There are roughly three ways to approach it…

Tap the hive mind


Yahoo Sports has an application called the “Team Ranker” that’s sort of like a Hot-or-Not for evaluating possible matchups. The theory is that the masses will collectively gravitate toward the most likely outcome. The obvious risk is that the Team Ranker application might be dominated by people who know nothing about college basketball and make their picks more or less at random. Imagine the Yahoo Answers kids attacking this one. Yikes.

Fanboys might be a problem too. Duke and UCLA, for example, have a lot of them (and plenty of haters too), so however viable they might be as contenders, I would worry about people expressing their desires instead of their predictions. Finally, the official tournament seeds and rankings are themselves driven in part by a collection of opinions, so even if Yahoo’s Team Ranker is dominated by true college basketball aficionados, I would expect the results to follow the seeds.

Turn to the Experts

I’ve done well with this strategy in past tournaments, but it’s not a sure bet. Taken as a whole, the experts tend to follow the seeds, and they inevitably split on all the toss-up games, so you still have to use your gut to a certain extent. The other challenge is that the expert commentary you can find is pretty disjointed. There are a lot of bits and pieces out there – separate breakdowns by region and conference, lots of hypothetical head-to-head matchups and riffs on narrow subjects like “injuries to watch” – so it’s difficult to synthesize it into any kind of cohesive set of picks. That said, the free resources I tend to look at are the obvious ones:

Each of these sites has its stable of pundits who crank out a furious stream of blog posts and articles between the time the field of 64 is announced and the first tip-off. The trick is to sift through the noise and spot the nuggets that can help you. Most of all, I look for predictions – especially whole brackets.

DIY science geekery


This is especially fertile ground for data junkies. Impress your friends by rattling off the latest betting odds or spouting opinions about how the Pomeroy Pythag Model stacks up against the Key Game Play stats model – if you can find any of this info for free. If you’re willing to pay, however, there are all kinds of nifty online tools to play with. One called Bracket Brains lets you dive deep into individual matchups. It costs anywhere from $26.95 to $79.95, although they do offer a free version that gives you a taste. Matchup by matchup, it provides a whole range of parameters you can tinker with to help you make your picks.

You can adjust how you think various slices of things like recent performance, strength of schedule and Vegas spread will factor in to each matchup. You can look at similar matchups from past tournaments (based on the parameters you set). You can even view a map showing the distance each team will travel to the game venue. As you tinker with the weightings of all these parameters, the projected outcome of the matchup in question changes in real time.
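To make the weighting idea concrete, here is a minimal sketch in Python. It is not how Bracket Brains works internally (I don't know their model); the function name, the factor names and every number below are hypothetical, and the blend is a plain weighted average chosen only for illustration.

```python
def projected_margin(edges, weights):
    """Blend per-factor edges (points in favor of team A) using user-set weights.

    edges:   hypothetical per-factor estimates, positive when they favor team A
    weights: how much the user wants each factor to count
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(edges[name] * weights[name] for name in weights) / total


# Made-up inputs for one matchup.
edges = {"recent_performance": 4.0, "strength_of_schedule": -1.5, "vegas_spread": 3.0}

# Equal weights, then the same matchup with the Vegas spread weighted 4x.
equal = {"recent_performance": 1, "strength_of_schedule": 1, "vegas_spread": 1}
vegas_heavy = {"recent_performance": 1, "strength_of_schedule": 1, "vegas_spread": 4}

print(round(projected_margin(edges, equal), 2))
print(round(projected_margin(edges, vegas_heavy), 2))
```

Changing a weight and re-running is the command-line equivalent of dragging a slider and watching the projection move.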

Another tool called Bracket Caster runs simulations based on each team’s past performance and calculated chances of winning against any other team. According to the description, every possible tournament game has been simulated one play at a time and repeated 10,000 times. Using this data, you can run your own simulations of the regional brackets, or look at a high-level analysis of any individual matchup.
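The repeat-it-10,000-times idea is easy to try at home. The sketch below is a much cruder stand-in than what Bracket Caster describes: instead of simulating one play at a time, it assumes you already have a single win probability for team A (the 0.65 here is invented) and simply flips a weighted coin for each simulated game.

```python
import random


def simulate_matchup(p_win_a, trials=10_000, seed=None):
    """Return the share of simulated games that team A wins.

    p_win_a is an assumed single-game win probability for team A.
    """
    rng = random.Random(seed)  # a private generator keeps runs repeatable
    wins_a = sum(1 for _ in range(trials) if rng.random() < p_win_a)
    return wins_a / trials


share = simulate_matchup(0.65, seed=1)
print(f"Team A won {share:.1%} of the simulated games")
```

With enough trials the share settles close to the probability you fed in, so the interesting work is all in estimating that probability in the first place.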

Finally, one category of basketball statistics – efficiency – has become especially popular as a way to measure any team’s true merit and predict its performance in future games.


A team’s offensive efficiency is defined simply as points scored per 100 possessions. Defensive efficiency is points allowed per 100 possessions. Defining a “possession” is somewhat more complicated, and I’ll spare you the details (go here if you’re interested). Last year, a Sports Illustrated blogger named Luke Winn wrote a compelling examination of just how good a predictor efficiency is (the actual post seems to have moved), which he nicely summed up as follows: “From 2004-07, only two teams outside the top 49 in defensive efficiency made the Elite Eight, and zero teams outside the top 25 made the Final Four.”
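The arithmetic behind both numbers fits in a few lines of Python. This sketch takes the possession count as a given, since estimating possessions is the complicated part I skipped above, and the game totals are made up for illustration.

```python
def efficiency(points, possessions):
    """Points per 100 possessions."""
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    return 100.0 * points / possessions


# Hypothetical single-game totals, not real data.
offensive = efficiency(points=78, possessions=68)  # points scored
defensive = efficiency(points=65, possessions=68)  # points allowed

print(f"offensive efficiency: {offensive:.1f}")
print(f"defensive efficiency: {defensive:.1f}")
```

Higher is better on offense and lower is better on defense, which is why the rankings Winn cites sort the two lists in opposite directions.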

OK, back to work everyone.

Scout Labs launches its social media and brand monitoring solution

From ClickZ:

More than two years after its founding, San Francisco startup Scout Labs has unveiled its first software suite, an assortment of Web monitoring tools that allows marketers to monitor chatter about their brands across social and consumer-generated media.

The first phrase of the article says a lot. The launch of Scout Labs was a long time coming, and I was there at the very beginning. When my friend Jenny called me more than two years ago and laid out her vision for the Scout Labs software-as-a-service product, I jumped on board without any hesitation. From my years of consulting, I knew how hungry marketers are for data about what’s happening in social media circles, and I knew how much they routinely paid for it. I knew the idea of delivering this as an affordable SaaS tool was a winner. I took the title “Experience Architect” because I wanted it to describe my function rather than my position in the org chart, and along with Margaret and Jon, we embarked.

I poured a lot into Scout Labs for 18 months, before I became the first (and so far only) member of the original team to leave. Partly, I was impatient. Two-and-a-half years is a long time, and there were some pretty big bumps along the way.

In those two years a whole bunch of other companies entered what had been a totally empty playing field. The space was given a name (or several – social media analysis, social media monitoring, online brand monitoring); leaders emerged. Scout Labs drifted further and further from the center of attention until we weren’t mentioned at all anywhere. The challenge kept getting bigger.

Early on, we thought of the tool as having three pillars of functionality. We called these Tune In, Jump In and Collaborate. The foundation of everything was the ability to “Tune In” to the content real people generate about things marketers and brand managers care about. We knew it had to sift through the content, tease out the significant things and present these in ways that provide insight and meaning. We knew it shouldn’t be a passive or voyeuristic app, and that it needed to enable marketers and brand managers to engage with both fans and detractors. And finally, we wanted to enable and encourage teams to share their insights and work efficiently with each other.

As a UE guy, I furiously cranked out screens for our dream application – as well as the real one of course. I labored over a myriad of ways to slice and dice and visualize all the juicy data we expected to have in the tool. I whiteboarded like hell. Then I waited… for the data mostly. We all waited. Except for the engineers, who labored just as furiously to go after that data.

And that’s the other thing that happened in the course of those “more than two years.”

You can have all the cool visualizations and analysis you want, but it’s worthless without good data. And although there’s suddenly some stiff competition out there – from free tools as well as some very expensive services – I believe this is where Scout Labs will rule. Some of the early criticism in the first wave of blog posts and comments about Scout Labs (largely from people who haven’t actually used the tool) is that it’s pretty barebones. It’s thin on analysis.

But the word from the beta testers is that the content it returns is better, more complete and more interesting than what they were getting from Google Alerts. Better than Google. Now that’s good.

Now that the data Scout Labs returns is finally rock solid, the sexy slicing and dicing and visualizations will come soon. Much less than two years from now ;)

Congrats Scouts!

The Data Visualization Palette

I might expand this into a larger article at some point, but for now it’s just something I decided to cobble together for a quick post. Thinking about data visualization was a big part of my job at Scout Labs, and this represents my palette for expressing data in picture form.

Since color consists of three factors (hue, value and saturation), it’s three for the price of one from a data visualization standpoint. Hue can communicate difference, but value and saturation can communicate other dimensions – like degree of difference. Color is tricky though. You have to be careful to accommodate colorblind people and black and white printing.
Size is good for expressing one dimension of difference between things. It suggests something quantitative. If precision matters, then it’s safer to vary size along just one axis (e.g. length). Studies show that people are bad at judging area and angles. They can tell when one line is roughly twice as long as another, but they’re wildly off when they try to guess the exact difference in area between, say, two adjacent circles or two sections of a pie chart.
Shape is a good way of creating very basic distinctions between things – or classes of things. It works well, for example, in scatter diagrams and other visualizations that plot data in two- or three-dimensional space.
Decoration is good when you want to make an item or a small subset of items stand out from a larger set. Decoration can be more or less subtle, so I like to use it to represent variation as opposed to difference.
For position to mean anything, it helps to have stable reference points – like x and y axes (i.e. a grid). Meaning is expressed by the position of objects relative to each other of course, but more importantly it’s expressed in the position of objects relative to the axes.
Motion can be a powerful way to add directional nuance around things like trends, or to wrap in concepts like velocity, but the biggest drawback, obviously, is that motion isn’t possible on paper and needs to be translated into something else.

Obviously these aren’t mutually exclusive. People are capable of grokking a number of concepts from a single visualization, so I usually combine dimensions from the palette. Sometimes I combine things just for efficiency – to get more out of each pixel so to speak. More often, I combine things when I feel like they make sense together.

For example, I might use hue to represent positive or negative sentiment in a product review, saturation or value to represent the intensity of the sentiment, and size to represent the reach of the source.
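Here is roughly what that mapping could look like in code. It's a sketch using Python's standard `colorsys` module; the function name, the blue/orange hue pair, the fixed brightness and the size range are all my own arbitrary choices, not something taken from a real tool.

```python
import colorsys


def encode_review(sentiment, reach, max_reach, min_size=4.0, max_size=40.0):
    """Map one review to (hex_color, marker_length).

    sentiment: -1.0 (very negative) to 1.0 (very positive)
    reach:     audience size of the source, between 0 and max_reach (max_reach > 0)
    """
    sentiment = max(-1.0, min(1.0, sentiment))

    # Hue carries the positive/negative distinction. Blue vs. orange is
    # easier on colorblind readers than red vs. green.
    hue = 0.6 if sentiment >= 0 else 0.08
    # Saturation carries intensity: neutral reviews fade toward gray.
    saturation = abs(sentiment)
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, 0.9)
    hex_color = "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )

    # Size carries reach, scaled along a single dimension (length).
    share = max(0.0, min(1.0, reach / max_reach))
    marker_length = min_size + (max_size - min_size) * share
    return hex_color, marker_length


print(encode_review(sentiment=0.8, reach=5_000, max_reach=20_000))
print(encode_review(sentiment=-0.3, reach=20_000, max_reach=20_000))
```

Scaling length rather than area follows the point about size above: people compare lengths far more reliably than they compare areas.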

Return of the Douchebag

Perusing the online blogopolis today, I saw that Xeni over at boingboing has proposed a new greeting card that would thank the recipient for “not douching out.”

“Douchebag” was a fairly common insult back when I was in high school in the ’80s, but it seems to have made a comeback. Jon Stewart uses it fairly regularly, and I even used it in a blog post a while back to describe how I feel about Chad from those Alltel commercials.

I don’t remember hearing the word much or at all between, say, 1988 and 2005, so out of curiosity I did a Google Trends search for it:

Apparently I’m right. As you can see, the word is virtually nonexistent in searches before around the fall of 2005, and then it suddenly leaps back into the vernacular. I wonder what happened in 2005. What kind of douchebaggery was it that made people reach back in the collective consciousness and pull ‘douchebag’ out of the mothballs?

I don’t know, but if I were to put on my CSI hat, I’d probably start my investigation in Austin TX, where there seems to be a concentration of doucheness:

UPDATE: In September 2005, during the aftermath of Hurricane Katrina, George Bush congratulated then-FEMA director Michael Brown with the now infamous line, “Brownie, you’re doing a heckuva job.” If that’s not douche-worthy, then I don’t know what is.

As a final thought, I’ll share my own opinion on ‘douchebag’ and its return. I like it. It just seems to be the perfect term to describe a certain collection of human qualities. Just what those qualities are is something we were talking about at work a few months ago, and we didn’t come up with an answer beyond ‘you know doucheness when you see it.’

We did agree that ‘douchebag’ describes a collection of qualities as opposed to a type of behavior. As an example of a behavior-describing word, we thought of ‘asshole.’

So, if you make a reckless move and cut someone off in traffic, you’re definitely an asshole. If you’re driving a Hummer and talking on your cell phone, then there’s a good chance you’re also a douchebag.

Make sense?

Pronouns used by the candidates in their post-caucus speeches


File this under “random.” I’m not sure what to make of this, but as I was listening to the speeches of the various candidates after the caucus results were in, it occurred to me to count the number of times they used various pronouns.

Incidentally, I found no evidence that Fred Thompson actually spoke at all after his third-place finish, so I substituted McCain – who was virtually tied with Thompson anyway.

Some things to note: I omitted instances of “you” in the numerous thank yous that started most of the speeches. McCain’s and Romney’s speeches were especially short – under four minutes each. I believe Obama’s was the longest, although I didn’t time it, and the applause he drew throughout certainly made it longer still. His 22 utterances of the second-person pronoun were double Romney’s and nearly triple those of any other candidate. Clinton, perhaps not surprisingly, referred to herself quite a bit, using the first-person singular more than any other candidate. Edwards, who also gave a long speech, used “we” and “us” more than the other candidates. It’s also worth noting that he devoted a significant part of his speech to stories about particular people (using third-person pronouns for them).

For that matter, all the candidates used the third person numerous times to refer to family members and associates, to the other candidates and often in abstract references to “Americans” in general, as well as particular groups of American citizens. Initially, I was counting all the third person pronouns too, but the candidates used “they” to refer to such a wide variety of concepts that it became too muddy for my simple analysis.
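I did my counting by ear, but the same tally is easy to automate if you have a transcript. The sketch below is one way to do it in Python; which word forms belong in each group is my own assumption (I've lumped possessives and reflexives in with the plain pronouns), and it mirrors the two choices described above by skipping “thank you” and leaving third-person pronouns out.

```python
import re
from collections import Counter

# Assumed groupings; adjust to taste.
PRONOUN_GROUPS = {
    "first_singular": {"i", "me", "my", "mine", "myself"},
    "first_plural": {"we", "us", "our", "ours", "ourselves"},
    "second": {"you", "your", "yours", "yourself", "yourselves"},
}


def count_pronouns(speech):
    """Count first- and second-person pronouns in a speech transcript."""
    # Drop "thank you" so the opening pleasantries don't inflate the "you" count.
    text = re.sub(r"\bthank\s+you\b", " ", speech.lower())
    # Splitting on non-letters also breaks contractions apart,
    # so "we're" is counted as "we".
    words = re.findall(r"[a-z]+", text)

    counts = Counter()
    for word in words:
        for group, members in PRONOUN_GROUPS.items():
            if word in members:
                counts[group] += 1
    return counts


quote = "I am so ready for the rest of this campaign, and I am so ready to lead."
print(count_pronouns(quote))
```

Run over full transcripts instead of a single quote, this produces the kind of per-candidate totals discussed above.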

Anyway, here are some key quotes from the evening:

Obama: “You said the time has come to move beyond the bitterness and pettiness and anger that’s consumed Washington.”

Edwards: “It’s our responsibility to ensure that we leave America better than we found it; that we give our children a better life than we’ve had.”

Clinton: “I am so ready for the rest of this campaign, and I am so ready to lead.”

Huckabee: “Ladies and gentlemen, I recognize that running for office, it’s not hating those who are in front of us. It’s loving those who are behind us.”

Romney: “We love you. We’re going to miss you for a few months, but we’re coming back. We’ll never forget what you’ve done for us. We love Iowa.”

McCain: “We can feel the momentum that — the same kind of momentum we felt in 2000. I’m very confident with a strong positive finish here that we’re going to win here in New Hampshire.”

red vs. blue – wherever you are

A group calling themselves gravity monkey has gone mobile with FundRace’s geo-coded FEC data. They’ve created a Java app called red | blue (pronounced “red or blue”) that can tell you whether or not you’re standing in enemy territory.


© 2009 Shawn Smith | Creative Commons.