I found this amusing dataset in high school while trying to use a horrid statistical package (version 1 of DataDesk, I think). To the naive eye, it looks like evidence that radios drive people mad, or did so back in the England of the 1920s and ’30s. The units are millions of radio licenses issued in the U.K. vs. number of “notified mental defectives” per 10,000 of estimated population. The correlation coefficient is 0.99 and the P-value is on the order of 10^-10. There are a number of lessons we can learn from this — correlation does not prove causation, how to do labels by each point in Mathematica using the Epilog command, and to remember to shut off your radio.

How do we make such a graph?

First, our data:

`masterlist = {{1924, 1.35, 8}, {1925, 0.196, 8}, {1926, 2.27, 9}, {1927, 2.483, 10}, {1928, 2.73, 11}, {1929, 3.091, 11}, {1930, 3.647, 12}, {1931, 4.62, 16}, {1932, 5.497, 18}, {1933, 6.26, 19}, {1934, 7.012, 20}, {1935, 7.618, 21}, {1936, 8.131, 22}, {1937, 8.593, 23}};`

You can make a quick table of this data with

`TableForm[masterlist,`

TableHeadings -> {None, {"\nYear", "\nRadios", "Mental\nDefectives"}}]

For the purpose of graphing the relationship between radios and mental defectives, we’ll want to separate the data from the yearly labels. First, the data:

`justData = Transpose[{Transpose[masterlist][[2]], Transpose[masterlist][[3]]}];`

The labels are going to be included in the graph using the Epilog[] command. This can be done at the time we run the Plot[] command but I find it easier to run it first for clarity and ease of debugging.

`dateEpilog =`

Map[Text[#[[1]], #[[2]]] &,

Transpose[{Map[ToString[#] &, Range[1924, 1937]],

Transpose[Transpose[justData] + {0, 1}]}]];

Then we can do the graph, including the Epilog we just constructed to label the points:

`dotplot = ListPlot[justData, Epilog -> dateEpilog];`

We could call it complete here, though I prefer to show the regression line on the same axes as the points. We’ll determine that line with the Fit[] command, then graph it, and combine it with the dotplot using the Show[] command.

defectiveFit = Fit[justData, {1, x}, x];

` `

`Show[dotplot, Plot[defectiveFit, {x, 0, 9}, PlotStyle -> Orange],`

AxesLabel -> {"Radios", "Mental Defectives"},

PlotLabel -> "Correlation does not mean causation\nin the U.K."]

This gives us the graph seen at the top of this post — data nicely labeled, with the regression line shown. Sometimes it’s best to leave the PlotMarkers blank and just use the labels (the years in this case) as markers, Edward Tufte style. The hard part is getting the Epilog right, as that puts the labels in the right places. I hope this example has been useful.

How strong is the relationship between the number of radios and the number of mental defectives? In old versions of Mathematica, this was answered by loading the LinearRegression package and using the Regress[] command. In modern versions, we do it like this:

linearmodel = LinearModelFit[justData, x, x];

linearmodel["ParameterTable"]

You can get Pearson’s correlation coefficition with the Correlation[] command. It’s about 0.99. Credit where credit is due, this data seems to have been originally compiled, and offered as an illustration of spurious correlation in “An Introduction to the Theory of Statistics” by G. E. Yule and M. G. Kendall in the early 20th century.

Pingback: Mental defectives, part ii | monkeywrench