  1. Internet Connection Without a Telephone

    October 27, 2012 thestatgraphics

    These days everyone has to cut costs. What do I still need a telephone for, I sometimes ask myself. After all, there are Facebook, WhatsApp and Skype. So I check the remaining terms of my contracts and then look for an internet connection without a telephone line. More and more companies offer this nowadays. Usually it works via a surf stick (a mobile broadband USB modem). The speed is sufficient.

    This alternative is also good for people who cannot get a telephone line, either because of negative Schufa (credit bureau) entries, or because they live so far out or are so often on the move that a fixed telephone line is simply not feasible.

    I can only advise you to think this option through as well. As I said, you can save quite a lot of money.


  2. Distance Learning in Social Pedagogy

    thestatgraphics

    I would like to be socially engaged in my life. But how? You have to eat and you have to sleep. Food and a roof over your head cost money. But you don’t earn money by helping the weak and the needy, or so I thought. I have now found my dream job: I am going to become a streetworker, so I can help people. I won’t get rich, but I can pay my rent and some food with it.

    To become a streetworker you need a completed degree in social pedagogy. Since nothing like that is offered where I live and I like to manage my own time, I chose a distance-learning course in social pedagogy. That way I stay completely independent and still get a bit closer to my dream. A great thing, I think.


  3. My Final Greetings!

    thestatgraphics

    It was so nice! But at some point enough is enough, and I take my leave.

    I will give further details on my strange behavior later …

    “Buch macht kluch” (books make you smart). Few people probably know this saying. But it does not always have to be one book. Two are often better than just one ;)

    With that in mind: enjoy reading, even if it is not printed on paper. That doesn’t matter. At least it spares the rainforest.
    Concordance Visualization
    Books featuring the “Search Inside” option in Amazon also have a concordance (here is a definition) added to the book’s information.

    Here are the 100 most frequently used words in Graphics of Large Datasets. Obviously, we are talking about how to “plot large data in figures”.

    (Pause the mouse over a keyword to see the number of occurrences, or click the link to see the context where the word is found)

    al  algorithm  analysis  approach  area  barchart  bars  between  bins  cases  categorical  categories  cells  chapter  class  companies  coordinate  data  datasets  density  diagram  different  displays  distribution  et  even  example  few  fig  figure  first  give  graph  graphics  group  highlighted  histogram  important  information  ing  interactive  large  layout  left  level  lines  may  means  methods  million  mosaic  new  nodes  number  parallel  pixel  plot  points  possible  problem  range  responses  results  right  sales  sample  sampling  scale  scatterplot  section  see  selection  set  several  should  shown  shows  size  small  software  space  split  state  statistical  structure  three  time  tion  tree  two  use  used  users  values  variables  view  visualization  weight  years  zooming
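
    A concordance list like the one above boils down to counting word frequencies. The following is only a rough sketch of that idea, not Amazon’s method: the function name `top_words`, the tokenization rule and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

def top_words(text, n=100, min_length=2):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Assumption: a "word" is a run of ASCII letters, compared case-insensitively.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) >= min_length)
    return counts.most_common(n)

# Hypothetical sample text, not taken from the book.
sample = "Large data need large plots; plots of large data need care."
top = top_words(sample, n=3)

# An alphabetical listing, as in the concordance above.
alphabetical = sorted(word for word, _ in top)
```

    Fragments such as “ing” and “tion” in the list above suggest that the real concordance split hyphenated line breaks into separate tokens; a simple tokenizer like this one would do the same.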

    October 17, 2006
    Communities of Interest …
    … is the name for subgraphs of a large network (e.g. telephone calls) which have certain target properties.

    Chris Volinsky has a very nice page, which lets you look for subgraphs that connect authors by papers in computer science journals (based on DBLP) or actors connected by movies (based on IMDB).

    Here is the proximity graph that connects me (not being a computer scientist) with Donald E. Knuth:

    Visit Chris’ page here.

    statistical graphics 101: Histograms
    It’s been too long since the last posting (on barcharts) in the teaching corner. This one will be on histograms.

    Histograms are often mistaken for barcharts. The fundamental distinction between the two is

    Barcharts show counts (or weights) for the discrete axis of a categorical variable
    Histograms show an approximation of the density function (if scaled accordingly) of a continuous variable.

    As a consequence, the only thing that can be quantified in a barchart is the bar height (or better, the bar length, which makes it independent of orientation). In a histogram, on the other hand, the area of each box is proportional to the density approximation. If all bars in a barchart have the same width, or gaps are drawn in a histogram (which is complete nonsense), the two plots can get mixed up.
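
    To make the “area, not height” point concrete, here is a small sketch using NumPy with deliberately unequal bin widths. The data are simulated and the bin edges are arbitrary assumptions; the point is only that box height is relative frequency divided by bin width, so that the areas add up to one.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(loc=50, scale=10, size=1000)  # hypothetical continuous variable

# Deliberately unequal bin widths (arbitrary choice for illustration).
edges = np.array([0, 30, 40, 45, 50, 55, 60, 70, 100], dtype=float)
counts, _ = np.histogram(values, bins=edges)
widths = np.diff(edges)

# Density height = relative frequency / bin width, so area = relative frequency.
n_inside = counts.sum()
heights = counts / (n_inside * widths)
areas = heights * widths
```

    With equal bin widths the heights are proportional to the counts, which is exactly why the two plot types are so easily confused.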

    (% votes for Kerry in the 2004 presidential election)

    Much has been written about the optimal bin width for histograms – almost nothing about the choice of the anchor point. Changing the latter often changes the shape of the histogram more dramatically than choosing between 8, 9 or 10 bins.
    Setting the anchor point from 0 to -2.4 yields:

    (changing the anchor point to -2.4)
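
    The effect of the anchor point is easy to reproduce on a toy example. The sketch below is not the election data: the twelve values, the helper `histogram_counts` and the anchors 0 and -2.5 are made up, and the helper assumes the anchor lies at or below the smallest value.

```python
import numpy as np

def histogram_counts(values, anchor, width):
    """Counts per bin for bins starting at anchor with a fixed width.

    Assumes anchor <= min(values); the last bin is closed on the right,
    following numpy.histogram.
    """
    values = np.asarray(values, dtype=float)
    n_bins = int(np.floor((values.max() - anchor) / width)) + 1
    edges = anchor + width * np.arange(n_bins + 1)
    counts, _ = np.histogram(values, bins=edges)
    return edges, counts

# Hypothetical data with three clusters of four points each.
data = np.array([1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 14], dtype=float)

edges_a, counts_a = histogram_counts(data, anchor=0.0, width=5.0)   # flat: 4, 4, 4
edges_b, counts_b = histogram_counts(data, anchor=-2.5, width=5.0)  # peaked: 2, 4, 4, 2
```

    Same data, same bin width, yet one histogram looks uniform and the other unimodal, purely because the bins start somewhere else.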

    In most applications, there are sensible breaks which can be chosen. Since we look at 3,111 voting districts, we can use far more bins and start at 0 with bin width of 5.

    (using meaningful parameters (0, 5) – density estimate added)

    If we link a second attribute into the histogram, the whole thing gets more exciting!

    (all districts where more than 15% have a college degree selected)

    We can’t really see what is going on here (although we might guess that there is a slightly higher proportion of highlighting towards the higher votes for Kerry).

    When we use the same normalization trick as for Spineplots (see previous post), we get the clearer picture of the Spinogram:

    Well, that’s what we would have expected, probably except for the increase at the left end of the scale.
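
    The normalization behind a spinogram can be sketched in a few lines: every bin gets the same height, its width is proportional to the bin count, and the highlighted share is drawn as a fraction of that height. The function below is an illustration under these assumptions, not the code of any particular software; the name `spinogram` and the toy data are hypothetical.

```python
import numpy as np

def spinogram(values, selected, edges):
    """Return (relative bin widths, highlighted share per bin)."""
    values = np.asarray(values, dtype=float)
    selected = np.asarray(selected, dtype=bool)
    totals, _ = np.histogram(values, bins=edges)
    hits, _ = np.histogram(values[selected], bins=edges)
    widths = totals / totals.sum()  # bar width encodes the bin count
    with np.errstate(invalid="ignore", divide="ignore"):
        # Empty bins have no defined share, so they are reported as NaN.
        share = np.where(totals > 0, hits / totals, np.nan)
    return widths, share

# Hypothetical values with a hypothetical selection.
vals = [10, 20, 30, 40, 60, 70, 80, 90]
sel = [False, False, False, True, True, False, True, True]
widths, share = spinogram(vals, sel, edges=[0, 50, 100])
```

    Because all bars have the same height, the highlighted shares can be compared directly across bins, which is what the plain highlighted histogram does not allow.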

    The problem with the histograms used so far is that we looked at voting districts, not at voters! This will distort the impression if the districts are not of equal size. Weighting the histogram above by district size will move the mode further to the right.

    (the weighted histogram represents voters not districts)
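
    Here is a small sketch of what the weighting does, using the `weights` argument of `numpy.histogram`. The six districts and their voter counts are invented for illustration; only the mechanism matters.

```python
import numpy as np

# Hypothetical districts: percentage for one candidate and number of voters.
pct = np.array([22.0, 31.0, 38.0, 47.0, 58.0, 71.0])
voters = np.array([1_000, 2_000, 1_500, 20_000, 150_000, 400_000])
edges = np.arange(0, 101, 25)  # bins of width 25 anchored at 0

# Unweighted: every district counts once.
district_counts, _ = np.histogram(pct, bins=edges)

# Weighted: every district counts with its number of voters.
voter_counts, _ = np.histogram(pct, bins=edges, weights=voters)

mode_by_district = int(np.argmax(district_counts))
mode_by_voters = int(np.argmax(voter_counts))
```

    In this made-up example the few large districts sit at higher percentages, so the modal bin moves one step to the right once voters rather than districts are counted.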

    Finally we get the weighted spinogram, which probably lends more support to the hypothesis … of the selected group.

    Antony pointed me to this nice example found on BBC News.

    So what is the message here? “Chinese and other foreigners (not being British nationals) more and more fill British jails …”

    Well, as we look at the percentage changes, we do not have any clue about the underlying group sizes. As whites are by far the larger group in this example, the absolute increase for whites is certainly much bigger. Any better display at hand?

    So-called “Skyline Plots” – as implemented in RENOIR – take the absolute size of groups into account by adjusting the bin width, such that the plot covers both aspects: absolute and relative change.

    (This is certainly a different example from the one on BBC News, but without the absolute figures it is impossible to recreate the skyline plot for the prison example.)
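
    The geometry of such a plot can be sketched as follows. This is not RENOIR’s implementation, only one plausible reading of the description: bar width is proportional to the group’s starting size and bar height is the relative change, so that bar area is proportional to the absolute change. The function name and the three groups are hypothetical.

```python
def skyline_bars(groups):
    """groups: list of (name, size_before, size_after) tuples.

    Returns one dict per group with the bar's left edge, width and height.
    """
    total_before = sum(before for _, before, _ in groups)
    bars = []
    left = 0.0
    for name, before, after in groups:
        width = before / total_before       # absolute group size
        height = (after - before) / before  # relative change
        bars.append({"name": name, "left": left, "width": width, "height": height})
        left += width
    return bars

# Hypothetical group sizes before and after.
groups = [("A", 8000, 8400), ("B", 1500, 1800), ("C", 500, 750)]
bars = skyline_bars(groups)
areas = {b["name"]: b["width"] * b["height"] for b in bars}
```

    In this toy example group C has by far the tallest bar (+50%), but group A has the largest area, i.e. the largest absolute increase, which is exactly the information a chart of percentage changes alone hides.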

    Looking at the colors, we find the odd choice of coding an increase in prisoners in green and a decrease in red. (This does not make much sense, unless the graph comes from the company which runs the prison …)
    June 30, 2006
    Le Tour 2006
    That’s it for this year. Au revoir 2007!
    Stage Results     cumulative Time     Ranks
    Stage     Total     Rank
    (click on the images to enlarge)

    Prolog: Did anyone have Thor HUSHOVD on his list?
    Stage 01: Except for 5, all arrived in the peloton
    Stage 02: MC EWEN already on 3.
    Stage 03: First drop outs
    Stage 04: BOONEN still keeps the yellow jersey
    Stage 05: Only O’GRADY can’t keep the pace of the top 10 from the Prolog
    Stage 06: First 37 still in a window of one minute
    Stage 07: This was the day of team T-Mobile!
    Stage 08: No stage to remember …
    Stage 09: Still waiting for the mountains, so we look at Sebastian JOLY …
    Stage 10: MERCADO and DESSEL out of the blue?
    Stage 11: LANDIS now in yellow
    Stage 12: POPOVYCH’s day, now on 10. 5 withdrawals after the mountains.
    Stage 13: PEREIRO SIO’s second place awards him the yellow jersey.
    Stage 14: No changes within the top 17.
    Stage 15: LANDIS back in yellow; two more mountain stages; down to 152
    Stage 16: LANDIS passes yellow to PEREIRO, KLÖDEN the real winner in the end?
    Stage 17: LANDIS back after great ride; top 3 within 30”
    Stage 18: No changes, we are all waiting for the show down tomorrow
    Stage 19: Everything as expected, LANDIS too strong for PEREIRO
    Stage 20: Profile of a winner …

    To play with the data yourself, get the Mondrian software and the dataset. Thanks go to Sergej Potapov, who wrote the script to manage the data!
    Chart Junk?!
    Here is an example of the so-called “Sectioned Density Plot”, which was recently published in “The American Statistician” (Vol 60, No. 2, 167-174).

    Using a simple histogram, maybe with an added density estimator, and/or a simple standard boxplot for group comparison does the job here. No need to “invent” a new plot, which introduces more problems than it solves.
    Actually this plot makes a good case against the use of graphics …
    June 12, 2006
    The Good & the Bad [6/2006]
    Haven’t we been preaching against 3-d barcharts and the like for a long time? Here is what we didn’t think of in our wildest dreams: an animated 3-d barchart!

    As you might guess, this is the usage statistics from this blog over the last week. It is not hard to draw “The Good” (far less sensational)

    Tour de France 2005 Update
    Well, the Tour de France 2006 is only a few weeks ahead, probably still time to give an update on the 2005 data.
    We now have the data recompiled with drop outs (thanks to Sergej).

    Interesting to see that almost half of the cyclists categorized as “sprinter” don’t make it to the Champs-Elysees.

    Looking at the ranks is quite funny as well. Here’s what happens when you start as first and last …
    (Certainly, David Zabriskie would probably have looked better without the crash in the team time trial)
    MS Chart Junk
    For those of you who do not stop by at JunkCharts regularly, here is what John S. found:

    Probably one of the best examples of “featurism”; computer scientists set free with no idea of application …

    February 03, 2006
    User Interface Advice?!
    There is a very funny list of the definite Do’s in good interface design …

    In my own experience, I find it very hard to teach student programmers the Do’s and Don’ts in good user interface design. If the only thing they are used to is Windows or some X11 window manager, you kind of start at 0.

    I wonder, whether the ironic way is more effective than teaching the how to’s directly (given they understand the irony!)?

    January 05, 2006
    The Good & the Bad [1/2006]
    The new year starts with a Good & Bad posting …

    The following figure was taken from:

    and explained by “Each disk represents one of the 400 richest Americans. They are arranged by hometown, and their size represents the person’s wealth:”

    So this is obviously “The Bad”!

    The guys who designed this graphic were obviously very ambitious to get as much information into it as possible. Let’s forget about the different years. What do we have here:

    Different colors for different states (actually meaningless)
    10 different colors for the wealth ranges (arbitrarily ordered)
    the circle diameters are somehow proportional to the wealth (coding essentially the same as the 10 colors)
    Heights for the number of cases at a certain location (with quirky 3-d effect)
    … and of course the geographic location.

    November 11, 2005
    The Good & the Bad [11/2005]
    Yes, it has been a while since I posted a “Good & Bad” …

    But as I saw this “novel decision tree plot” on an advertisement by C&H for Paul’s R Graphics book, I got inspired again …

    Now here is “The Bad”:

    Ctree

    Let me explain, what went wrong with the R graphics:

    A tree, which is just a special graph, consists of nodes and edges
    A full-featured barchart in a leaf is certainly a doubtful glyph for a leaf/node!
    The size of the nodes must be read as text …
    Side by side barcharts are among the weaker representations to display a proportion
    The numbering of the nodes is non-standard and does not help reading the information
    Now think of a tree with, say, 10 inner nodes and 11 leaves! How big must the plotting device be to display so much (overhead) information?

    … and here is “The Good” (from Simon Urbanek’s KLIMT)

    KLIMT1

    This representation is much clearer. Not to overstress Tufte, but the data-ink-ratio in the KLIMT plot is hard to beat!

    I personally would prefer the next display (which only takes two key-strokes to change from the first plot!), which puts leaves on the “correct” level and has proportionally sized nodes.

    KLIMT2
    November 10, 2005
    R Graphics by Paul Murrell

    RGraphics

    Well, here it finally is:
    The R Graphics book. Probably the ultimate resource for constructing R graphics.

    What you don’t get is anything on statistical graphics. The book won’t give you any advice on an efficient use of graphics for statistical data analysis or diagnostics of statistical models. At most, it gives several examples of nifty info graphics – which is still nice.

    (For those of you who always wanted to know how to create a crossword in R – that’s it!)
    August 12, 2005
    Best of Show
    One of the better sessions at the 2005 JSM was on “Data and the Digital Arts”. Here are some visualizations which are quite nice. Not from a general statistical analysis view, but still nice to watch. What is interesting with all examples is the use of smooth animation, which is probably not necessary for most of them, but can be very helpful to preserve context.

    - enjoy
    Baby wizard     ZIP Codes     Text Arc
    BW     ZIP     TA
    Martin Wattenberg     Ben Fry     W. Bradford Paley
    (click on the images to enlarge)
    August 02, 2005
    News Visualization
    The “Buzztracker” is a nice visualization of current world news. Here is a screenshot of Aug. 2nd.

    Buzztracker

    Circles are sized according to the amount of news related to a site, lines link sites which are connected by an event in an article.

    There is also a dashboard widget available:
    July 20, 2005
    Tour de France 2005 Special
    This special will track the development of the Tour de France 2005. Using parallel coordinate plots, the stage result, total elapsed time and the current ranking are displayed from day to day.
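
    The three views used in this special (stage result, cumulative time, rank) are all derived from one table of stage times. Here is a minimal sketch of that bookkeeping with NumPy; the three riders and their times are invented, and this is not the script used to prepare the actual dataset.

```python
import numpy as np

# Hypothetical stage times in seconds: rows are riders, columns are stages.
riders = ["RIDER_A", "RIDER_B", "RIDER_C"]
stage_seconds = np.array([
    [3600, 3610, 3700],
    [3605, 3600, 3650],
    [3650, 3640, 3500],
])

# Total elapsed time after each stage.
cumulative = np.cumsum(stage_seconds, axis=1)

# Rank after each stage (1 = shortest total time); ties are broken arbitrarily.
ranks = cumulative.argsort(axis=0).argsort(axis=0) + 1
```

    Each column of `stage_seconds`, `cumulative` or `ranks` then becomes one axis of the corresponding parallel coordinate plot, and each rider one line across the axes.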

    All Stages
    Stage Results     cumulative Time     Ranks
    Stage     Total     Rank
    (click on the images to enlarge)

    Stage 02: ZANINI, VAN BON and VANSEVENANT fell behind in stage 2.
    Stage 03: HINAULT, ALBASINI and ZABALLA fell behind in stage 3.
    Stage 04: ZABRISKIE lost the yellow jersey after crashing close to the finish.
    Stage 05: Calm day, no trouble. ZABALLA is the first withdrawal.
    Stage 06: VINOKOUROV’s first (small) step towards ARMSTRONG.
    Stage 07: Another mass arrival.
    Stage 08: The first mountains spread the field – ZABRISKIE now almost last!
    Stage 09: VOIGT gets the yellow jersey, ARMSTRONG back to No. 3, ZABRISKIE out.
    Stage 10: ARMSTRONG in yellow again, VOIGT back to 72, Ullrich fighting.
    Stage 11: VINOKOUROV’s day, VOIGT out.
    Stage 12: Calm day after the last Alps stage.
    Stage 13: Day of the sprinter. VALVERDE out.
    Stage 14: TOTSCHNIG wins, BOTERO lost almost 30′.
    Stage 15: RASMUSSEN and BASSO switch places, Ullrich can’t keep pace.
    Stage 16: No big changes, BOTERO definitely doesn’t like the Pyrenees.
    Stage 17: SAVOLDELLI wins longest stage with biggest gap, KLÖDEN out.
    Stage 18: ULLRICH gains 37″ on RASMUSSEN.
    Stage 19: Second T-Mobile victory. ULLRICH needs more than 2′ to RASMUSSEN.
    Stage 20: All the bad luck for RASMUSSEN, ULLRICH now 3rd.
    Stage 21: VINOKOUROV’s good bye to T-Mobile …

    Thanks for following this special, and see you next year!

    This is the parallel coordinate plot for the total time of the 2004 Tour de France for all stages.

    TDF2004
    Lance Armstrong and Jan Ullrich are highlighted.

    To play with the data yourself, get the Mondrian software and the dataset.
    May 01, 2005
    The Good & the Bad [5/2005]
    This month’s edition is just perfect to show how NOT to do it!
    The Bad of the month of May is from a talk given by Kurt Hornik at the compstat 2004 meeting in Prague. It looks as follows:

    This is the famous barley data used in Bill Cleveland’s Visualizing Data many times.
    Well, on a first view we would say it looks good …

    Here is what went wrong with this graphic:

    Never use areas to display continuous variables. Continuous values should be plotted along an axis as points or other sensible glyphs.
    Use stacked barcharts only for proportions that add up to a fixed amount (say 100%). Put the least varying class at the bottom of the stack, the more varying classes at the top.
    Avoid ”scale hopping”, i.e. the things that should be compared must be plotted along ONE scale.

    (Not to mention that the legend messes up the colors …)
    Can you see anything ‘out of line’ in the data?

    Using the same lattice package in R, we can do much better. Here is the Good:
    Now we use points, and only have one scale for the whole plot … and, aha! Something’s wrong with the field in Morris.
    But talking about this feature and talking about statistical graphics, there is only one very simple and long known plot to display the feature in the data: the Interaction Plot.

    The feature we spotted in the data is nothing other than an interaction of the factor year with the site Morris, which is most likely due to a transcription error.
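
    What an interaction plot draws is simply the mean response per site and year, connected by year. The sketch below uses invented yields for three placeholder sites, not the barley data, to show how a site with a reversed year effect stands out in those cell means.

```python
from collections import defaultdict

# Hypothetical observations: (site, year, yield). Not the real barley values.
observations = [
    ("SiteA", 1931, 40.0), ("SiteA", 1931, 44.0), ("SiteA", 1932, 34.0), ("SiteA", 1932, 36.0),
    ("SiteB", 1931, 30.0), ("SiteB", 1931, 32.0), ("SiteB", 1932, 25.0), ("SiteB", 1932, 27.0),
    ("SiteC", 1931, 28.0), ("SiteC", 1931, 30.0), ("SiteC", 1932, 35.0), ("SiteC", 1932, 37.0),
]

def cell_means(rows):
    """Mean response for every (site, year) combination."""
    sums = defaultdict(lambda: [0.0, 0])
    for site, year, value in rows:
        sums[(site, year)][0] += value
        sums[(site, year)][1] += 1
    return {key: total / n for key, (total, n) in sums.items()}

means = cell_means(observations)
sites = sorted({site for site, _, _ in observations})

# Year effect per site: parallel lines in the plot mean equal differences here.
year_effect = {s: means[(s, 1932)] - means[(s, 1931)] for s in sites}
reversed_sites = [s for s in sites if year_effect[s] > 0]
```

    In the plot, the lines for the first two sites run roughly parallel while the third one crosses them; the sign flip in `year_effect` is the numerical counterpart of that crossing.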

    Going from the first to the third plot, we more and more focus on the ‘right’ information, and need less ink to draw it, which nicely corresponds to Tufte’s data/ink ratio.

    Obviously, boxplots are another good choice for visualizing this kind of data.
    April 22, 2005

    This month’s graphics is the Barchart. The barchart is used to visualize categorical data. It is often confused with the histogram, which can only be used if the data is continuous. The following barchart shows the distribution of all passengers of the Titanic according to their classes.
    All passengers who survived are highlighted in red. As it is not an easy task to compare the proportions between the classes, one might want to switch to the Spineplot view. In a spineplot, the proportionality is exchanged between width and height, but the highlighting direction is kept the same.
    In a spineplot it is a trivial task to compare the highlighting across categories.
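
    The switch from barchart to spineplot is just a change in what width and height encode. Here is a sketch using the class counts as commonly tabulated for the Titanic data (for example in R’s `Titanic` dataset); treat the numbers as illustrative rather than as the exact data behind the plots in this post.

```python
# Passenger counts per class and number of survivors, as commonly tabulated.
classes = ["1st", "2nd", "3rd", "Crew"]
total = [325, 285, 706, 885]
survived = [203, 118, 178, 212]

n = sum(total)

# Barchart: equal widths, bar height = count, highlighted height = survivors.
# Spineplot: equal heights, bar width = share of all passengers,
#            highlighted height = survival rate within the class.
widths = [t / n for t in total]
rates = [s / t for s, t in zip(survived, total)]

for name, w, r in zip(classes, widths, rates):
    print(f"{name:>4}: width {w:.3f}, highlighted {r:.3f}")
```

    Since every bar now has the same height, the highlighted heights are the survival rates themselves and can be read off and compared directly.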

    April 01, 2005
    The Good & the Bad [4/2005]
    Two examples of plotting geographical information.
    The first, “The Bad”, is from “Informationen zur politischen Bildung, No. 285” (Information for political education) and gives a very good example of how badly human perception works for judging the area of a circle.

    GNP

    The second, “The Good”, is from the 19th century and shows a map of several kinds of cattle in Bavaria, Germany. The author of this graph uses a quite fancy double overlay to mix the two measures in one map. (It can be found in Howard Wainer’s “Visual Revelations”.)

    Cattle

    [ The Good & the Bad is a monthly posting showing outstanding examples of statistical data visualization ]