Monday, March 27, 2006

Titanic Disaster Revisited

There are many stories and myths about the sinking of the Titanic.

We recently compiled the data on the lifeboats (721 passengers who entered a lifeboat) and a nice pattern came out in the fluctuation diagram of “launch sequence no.” against “class”.


(female passengers highlighted)

The “fate” of the lifeboats is listed under http://www.encyclopedia-titanica.org/lifeboats/

What can be learned from the fluctuation diagram?

  • 1st class was served with the first 6 boats
  • The next 4 boats were mainly 2nd class
  • Boats 11-14 had many 3rd class passengers
  • Boats 3 and 12 were less filled and almost no men entered these boats
  • Boats 18-20 are not well filled

There is more to see when reading the story and the graphic side by side.

Posted by Martin at 08:51:00 | Permalink | Comments (4)

The Good & the Bad [3/2006]

This time no fancy graph, but a nice example of “less can be more”. I took this month’s example from the R Graph Gallery.
Here’s “The Bad”:

The graphic above tries to depict an r by s table in a 3-d view. Uwe’s example has several problems:

  • The 3-d view makes judging the heights almost impossible
  • The chosen view point (seems to be the default) puts 2 to 3 bars into a row, which makes it even harder to read the plot
  • What does the gray shading of the bars mean?
  • Why use meaningless random data for the example … (if there is nothing to interpret in the data, it is hard to prove that the plot has problems in interpreting the data displayed)

“The Good” uses Bertin’s “Accident” data. It is a simple fluctuation diagram of Age vs. Vehicle, which works very well for displaying simple tables:
The sizes of the tiles are simply proportional to the counts in each category. Patterns and trends are easy to detect, though it lacks the fancy 3-d property …
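
To make the tile sizing concrete, here is a minimal sketch of how the tiles of a fluctuation diagram could be computed. It assumes that the tile area (not the side length) is proportional to the count and that the largest cell fills its grid slot completely; the function name and the small table are made up for illustration and are not Bertin's data.

```python
import math

def fluctuation_tiles(counts):
    """Return the relative side length (0..1) of each tile.

    Assumption: tile AREA is proportional to the cell count, and the
    largest cell fills its grid slot completely (side length 1.0).
    """
    largest = max(max(row) for row in counts)
    if largest == 0:
        # Empty table: every tile has size zero.
        return [[0.0 for _ in row] for row in counts]
    # Area ~ count, so side length ~ sqrt(count).
    return [[math.sqrt(c / largest) for c in row] for row in counts]

# Hypothetical 3 x 2 table of counts (NOT Bertin's accident data).
table = [
    [100, 25],
    [50, 0],
    [4, 16],
]

sides = fluctuation_tiles(table)
for row in sides:
    print("  ".join(f"{s:.2f}" for s in row))
```

Taking the square root is the design choice that matters here: scaling the side length directly by the count would exaggerate large cells quadratically.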

(There will be a “Statgraphics 101” on Mosaic plots and the like soon …)

Posted by Martin at 08:32:31 | Permalink | No Comments »

R Graph Gallery

There is a gallery of R graphics at http://addictedtor.free.fr/graphiques/index.php.

The nice thing about this gallery is that you can get the sources of all examples, so this can be a good starting point for your own custom graphics in R.

There is a similar page - not so sleek - at http://zoonek2.free.fr/UNIX/48_R/04.html which is a bit more instructive.

Neither page raises the question of whether the presented graphics are useful, though …

Posted by Martin at 08:12:46 | Permalink | Comments (2)

Friday, February 3, 2006

User Interface Advice?!

There is a very funny list of the definite Do’s in good interface design …

http://toastytech.com/guis/uirant.html

In my own experience, I find it very hard to teach student programmers the Do’s and Don’ts in good user interface design. If the only thing they are used to is Windows or some X11 window manager, you kind of start at 0.

I wonder whether the ironic way is more effective than teaching the how-tos directly (given they understand the irony!).

Posted by Martin at 18:53:23 | Permalink | No Comments »

Thursday, January 5, 2006

The Good & the Bad [1/2006]

The new year starts with a Good & Bad posting …

The following figure was taken from:

http://www.forbes.com/2005/09/15/hometowns-networths-america-richest_05rich400_map.html © Forbes 2005

and explained by “Each disk represents one of the 400 richest Americans. They are arranged by hometown, and their size represents the person’s wealth:”


So this is obviously “The Bad”!

The guys who designed this graphic were obviously very ambitious to get as much information into it as possible. Let’s forget about the different years; what do we have here?

  1. Different colors for different states (actually meaningless)
  2. 10 different colors for the wealth ranges (arbitrarily ordered)
  3. The circle diameters are somehow proportional to the wealth (coding essentially the same information as the 10 colors)
  4. Heights for the number of cases at a certain location (with quirky 3-d effect)
  5. … and of course the geographic location.
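
As a side note on item 3, here is a small sketch of why diameter-proportional disks are risky: the visible area grows with the square of the diameter. The wealth figures and the 10-unit reference diameter are hypothetical and are not taken from the Forbes graphic.

```python
import math

def area_from_diameter(d):
    """Area of a disk with diameter d."""
    return math.pi * (d / 2) ** 2

def diameter_for_area_scaling(value, reference_value, reference_diameter):
    """Diameter that makes the disk AREA proportional to value."""
    return reference_diameter * math.sqrt(value / reference_value)

# Hypothetical fortunes (in billions); one is 4 times the other.
small, large = 1.0, 4.0

# Diameter-proportional coding: the diameter is 4 times as big ...
d_small = 10.0
d_large = d_small * (large / small)
# ... so the area is about 16 times as big.
area_ratio = area_from_diameter(d_large) / area_from_diameter(d_small)

# Area-proportional coding: the diameter only doubles.
d_large_by_area = diameter_for_area_scaling(large, small, d_small)

print(f"area ratio with diameter coding: {area_ratio:.1f}")
print(f"diameter with area coding: {d_large_by_area:.1f}")
```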

Since I don’t have the underlying data to reproduce the plot (it would be too painful to get it from the Forbes website), I can only present a graphic that does better for different data.

This is a choropleth map on US County level (showing rental prices). Adding the (x,y) coordinates of the actual cities where the millionaires live would be a good idea, too.

Posted by Martin at 08:18:53 | Permalink | Comments (5)

Friday, November 11, 2005

The Good & the Bad [11/2005]

Yes, it has been a while since I posted a “Good & Bad” …

But when I saw this “novel decision tree plot” on an advertisement by C&H for Paul’s R Graphics book, I got inspired again …

Now here is “The Bad”:

Ctree

Let me explain what went wrong with the R graphic:

  1. A tree, which is just a special graph, consists of nodes and edges
    A full-featured barchart in a leaf is certainly a doubtful glyph for a leaf/node!
  2. The size of the nodes must be read as text …
  3. Side by side barcharts are among the weaker representations to display a proportion
  4. The numbering of the nodes is non-standard and does not help reading the information
  5. Now think of a tree with, say, 10 inner nodes and 11 leaves! How big must the plotting device be to display so much (overhead) information?

… and here is “The Good” (from Simon Urbanek’s KLIMT)

KLIMT1

This representation is much clearer. Not to overstress Tufte, but the data-ink-ratio in the KLIMT plot is hard to beat!

I personally would prefer the next display (which only takes two key-strokes to change from the first plot!), which puts leaves on the “correct” level and has proportionally sized nodes.

KLIMT2

Posted by Martin at 10:50:29 | Permalink | No Comments »

Thursday, November 10, 2005

R Graphics by Paul Murrell

RGraphics

Well, here it finally is:
The R Graphics book. Probably the ultimate resource for constructing R graphics.

What you don’t get is anything on statistical graphics. The book won’t give you any advice on the efficient use of graphics for statistical data analysis or for diagnostics of statistical models. At most, it gives several examples of nifty info graphics - which is still nice.

(For those of you who always wanted to know how to create a crossword in R - that’s it!)

Posted by Martin at 13:52:20 | Permalink | No Comments »

Friday, August 12, 2005

Best of Show

One of the better sessions at the 2005 JSM was on “Data and the Digital Arts”. Here are some visualizations which are quite nice. Not from a general statistical analysis view, but still nice to watch. What is interesting with all examples is the use of smooth animation, which is probably not necessary for most of them, but can be very helpful to preserve context.

- enjoy

  • Baby wizard (Martin Wattenberg)
  • ZIP Codes (Ben Fry)
  • Text Arc (W. Bradford Paley)
Posted by Martin at 00:24:09 | Permalink | No Comments »

Tuesday, August 2, 2005

News Visualization

The “Buzztracker” is a nice visualization of current world news. Here is a screenshot of Aug. 2nd.

Buzztracker

Circles are sized according to the amount of news related to a site, lines link sites which are connected by an event in an article.
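
The two encodings described above can be sketched as simple counting. This is only an illustration under my own assumptions, not Buzztracker's actual method: each article is reduced to the set of places it mentions, circle size is the number of articles mentioning a place, and a line's weight is the number of articles mentioning both places. The place names and articles are invented.

```python
from collections import Counter
from itertools import combinations

def buzz_counts(articles):
    """Count mentions per place and co-mentions per pair of places.

    articles: list of sets of place names, one set per article.
    Returns (mentions, links), both Counter objects.
    """
    mentions = Counter()  # place -> number of articles (circle size)
    links = Counter()     # (place_a, place_b) -> shared articles (line)
    for places in articles:
        mentions.update(places)
        # Sort so each pair is always stored in the same order.
        for a, b in combinations(sorted(places), 2):
            links[(a, b)] += 1
    return mentions, links

# Hypothetical articles, each reduced to the places it mentions.
articles = [
    {"London", "Baghdad"},
    {"London", "Washington", "Baghdad"},
    {"Tokyo"},
]

mentions, links = buzz_counts(articles)
print(mentions.most_common())
print(links.most_common())
```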

There is also a dashboard widget available.

Posted by Martin at 08:03:18 | Permalink | No Comments »

Wednesday, July 20, 2005

Tour de France 2005 Special

This special will track the development of the Tour de France 2005. Using parallel coordinate plots, the stage result, total elapsed time and the current ranking are displayed from day to day.
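
For readers who want to rebuild the three variables themselves, here is a minimal sketch of how cumulative time and ranking could be derived from per-stage times. The rider names and times are hypothetical, the function name is my own, and ties as well as time bonuses and withdrawals are ignored.

```python
from itertools import accumulate

def tour_standings(stage_times):
    """Derive cumulative times and ranks after every stage.

    stage_times: dict mapping rider -> list of stage times in seconds
                 (all riders are assumed to have finished every stage).
    Returns (totals, ranks), each mapping rider -> one value per stage.
    """
    totals = {r: list(accumulate(t)) for r, t in stage_times.items()}
    n_stages = len(next(iter(stage_times.values())))
    ranks = {r: [] for r in stage_times}
    for s in range(n_stages):
        # Lowest total elapsed time after stage s gets rank 1.
        ordered = sorted(stage_times, key=lambda r: totals[r][s])
        for position, rider in enumerate(ordered, start=1):
            ranks[rider].append(position)
    return totals, ranks

# Hypothetical riders and stage times in seconds (not real Tour data).
times = {
    "RIDER_A": [3600, 3700, 3650],
    "RIDER_B": [3590, 3720, 3660],
    "RIDER_C": [3610, 3680, 3700],
}

totals, ranks = tour_standings(times)
for rider in times:
    print(rider, totals[rider], ranks[rider])
```

Each of the three lists per rider (stage time, total, rank) is then one polyline in the corresponding parallel coordinate plot, with one axis per stage.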

All Stages

Panels: Stage Results, Cumulative Time, Ranks

Stage 02: ZANINI, VAN BON and VANSEVENANT fell behind in stage 2.
Stage 03: HINAULT, ALBASINI and ZABALLA fell behind in stage 3.
Stage 04: ZABRISKIE lost the yellow jersey after crashing close to the finish.
Stage 05: Calm day, no trouble. ZABALLA is the first withdrawal.
Stage 06: VINOKOUROV’s first (small) step towards ARMSTRONG.
Stage 07: Another mass arrival.
Stage 08: The first mountains spread the field - ZABRISKIE now almost last!
Stage 09: VOIGT gets the yellow jersey, ARMSTRONG back to No. 3, ZABRISKIE out.
Stage 10: ARMSTRONG in yellow again, VOIGT back to 72, ULLRICH fighting.
Stage 11: VINOKOUROV’s day, VOIGT out.
Stage 12: Calm day after the last Alps stage.
Stage 13: Day of the sprinter. VALVERDE out.
Stage 14: TOTSCHNIG wins, BOTERO lost almost 30′.
Stage 15: RASMUSSEN and BASSO switch places, ULLRICH can’t keep pace.
Stage 16: No big changes, BOTERO definitely doesn’t like the Pyrenees.
Stage 17: SAVOLDELLI wins longest stage with biggest gap, KLÖDEN out.
Stage 18: ULLRICH gains 37″ on RASMUSSEN.
Stage 19: Second T-Mobile victory. ULLRICH still needs to make up more than 2′ on RASMUSSEN.
Stage 20: All kinds of bad luck for RASMUSSEN, ULLRICH now 3rd.
Stage 21: VINOKOUROV’s goodbye to T-Mobile …

Thanks for following this special, and see you next year!

This is the parallel coordinate plot for the total time of the 2004 Tour de France for all stages.

TDF2004
Lance Armstrong and Jan Ullrich are highlighted.

To play with the data yourself, get the Mondrian software and the dataset.

Posted by Martin at 19:01:55 | Permalink | Comments (2)