Category Archives: mathematica

Class and Survival on the Titanic

Detailed data about the passengers on the Titanic can be found widely online. I won’t replicate it all, though here is some basic class-level information about who survived the sinking on April 15, 1912.

data = {{"", "Lost", "Saved"}, {"Second Class Men", 147, 13}, {"Third Class Men", 399, 55}, {"Crew Men", 686, 189}, {"Third Class Children", 53, 23}, {"First Class Men", 115, 58}, {"Third Class Women", 81, 98}, {"Second Class Women", 15, 78}, {"Crew Women", 2, 21}, {"First Class Women", 5, 139}, {"First Class Children", 0, 5}, {"Second Class Children", 0, 24}};

first = RotateLeft[SortBy[Select[data, StringSplit[#[[1]]][[1]] == "First" &], First]];
second = RotateLeft[SortBy[Select[data, StringSplit[#[[1]]][[1]] == "Second" &], First]];
third = RotateLeft[SortBy[Select[data, StringSplit[#[[1]]][[1]] == "Third" &],
crew = SortBy[Select[data, StringSplit[#[[1]]][[1]] == "Crew" &], First];

A useful function:

Options[addSurvivalRatio] = {ratiodigits -> 2};

addSurvivalRatio[series_, OptionsPattern[]] :=
If[OptionValue[ratiodigits] == "none",
Map[{Last[StringSplit[#[[1]]]], #[[2]], #[[3]], #[[3]]/(#[[2]] + #[[3]]) // N} &, series],
Map[{Last[StringSplit[#[[1]]]], #[[2]], #[[3]], NumberForm[#[[3]]/(#[[2]] + #[[3]]) //N, {OptionValue[ratiodigits], OptionValue[ratiodigits]}]} &,

An overview, showing survivorship by class and man/woman/child status:

BarChart[Map[Transpose[addSurvivalRatio[#, ratiodigits -> "none"]][[4]] &, {first, second, third, crew}],
ChartElementFunction -> "FadingRectangle",
ChartLabels -> {Placed[{"First Class", "Second Class", "Third Class","Crew"}, Above], Placed[{"Men", "Women", "Children"}, Below, Rotate[#, Pi/2.4] &]},
PlotLabel -> "Survival Ratios\n"]

survivorship by class and sex

Walter Lord’s book A Night to Remember portrays third class passengers on the Titanic as nearly doomed, and suggests that some were effectively held belowdecks while first and second class passengers were evacuated to the lifeboats. This theme was picked up in James Cameron’s 1997 movie Titanic. And in Walter Lord’s 1998 book, The Night Lives On, he makes this explicit, with a chapter titled “What Happened to the Goodwins?” about a third class family that entirely perished. He implies that no similar losses were suffered in other classes.

Map[{#[[1]], NumberForm[#[[2]], 2]} &, overallTable] //
TableForm[#, TableHeadings -> {None, {"Group", "Survival"}}] &

overall survivorship table

Though it is true that third class as a whole did not do as well as first and second, the worst performing group on the ship was not among their number. The most fatal group that night was the second class men. Lord obscures this in his text by comparing the survivorship rate in third class to “first and second class” combined, thus hiding the fact that being a second class man on the Titanic was essentially a death sentence.

makeTable[group_] :=
Text[Grid[Prepend[addSurvivalRatio[group], titles2], Alignment -> {{Left, Right, Right, Right}, {Bottom, {Automatic}}}, Spacings -> {{Automatic, {2.2}}}, Background -> {Automatic, Automatic, {{2, 4} -> Pink}}, Dividers -> {None, {2 -> True}}]]

Grid[Transpose[{Map[Text[Style[#, Bold]] &, {"First Class", "Second Class","Third Class", "Crew"}], Map[makeTable[#] &, {first, second, third, crew}]}], Spacings -> {6, 3}, Alignment -> {Right, Automatic}]

how did the men do?

And while it is true that no children among the first and second class families perished, second class lost many families (Lilian and Earnest Carter, John and Sara Chapman, Harry and Shadrach Gale, Edgar and Frederick Giles, Leonard, Lewis and Stanley Hickman, Clifford and Ernest Jeffreys, William and Anna Lahtinen, and William and Dorothy Turpin).

Generally women did much better than men, even third class women did much better than first class men. It has been noticed by others that the survival rates among Americans was higher than among British citizens. Looking through the data, I notice that there were seven men on board with the title “Father,” all traveling second class, and they all died. Survivorship among priests was thus 0%, though their second-class status may well have been their primary risk factor.

With only 11 class/sex groups and only one sinking, there isn’t enough data to say much about the distribution of deaths, though it’s clearly not normally distributed. See the histogram below. A “fair” sinking would have low kurtosis, but kurtosis for this sinking was greater than 1.

Histogram[Flatten[Map[Transpose[addSurvivalRatio[#, ratiodigits -> "none"]][[4]] &, {first, second, third, crew}]], 12, PlotRange -> All,
PlotLabel ->
"Group survival probability not normally distributed,\nkurtosis " <>
ToString[NumberForm[Kurtosis[Flatten[Map[Transpose[addSurvivalRatio[#, ratiodigits -> "none"]][[4]] &, {first,second, third, crew}]]], 4]]]

histogram of survivorship by class and sex

Records have survived showing the fares paid by many passengers; the range was very wide even within each class. Converting the Pounds/shillings/pence used at the time to decimal pricing, and using a dummy variable of 1 to indicate survival, 0 to indicating death, we can run a regression to show the strength of wealth as a predictor of survival on the Titanic. I omit the raw data below because it requires so much space, but you should be able to find it online or e-mail me for a copy.

victimPaymentsDecimal =
Map[Total[# /. {s -> 1/20, d -> 1/(12*20)}] &, victimPayments]

survivorPaymentsDecimal =
Map[Total[# /. {s -> 1/20, d -> 1/(12*20)}] &, survivorPayments];

wDummies =
Join[Map[{#, 1} &, survivorPaymentsDecimal],
Map[{#, 0} &, victimPaymentsDecimal]];

dummyFit = Fit[wDummies, {x, 1}, x];

Show[ListPlot[wDummies, PlotRange -> All],
Plot[dummyFit, {x, 0, 520}, PlotStyle -> Pink], PlotRange -> All,
PlotLabel -> "Titanic Survivorship vs. Fare",
AxesLabel -> {"£", "Survivorship"}]

regression of survivorship vs. fare paid to the white star line

Lots of people died at all fare levels, though wealth is a significant predictor of survivorship. Not much changes if we omit the outliers who had paid more than 500 pounds sterling for their tickets. I’ll spare you the code and the graph, as it looks very similar to the above. However, for a final summary —

lm = LinearModelFit[wDummiesNoOutliers, x, x];


regression of survivorship vs. fare paid to the white star line

the earliest Mathematica logo

In the earliest Mathematica documentation, including the development pages visible in the scrapbook on the Wolfram website, an extremely early version of their evolving polyhedral logo is visible. It looks like this:
mathematica v1 logo

The image was, at the time, created with the following code:

In[]:= Needs["Graphics`Polyhedra`"]

In[]:= Show[Graphics3D[Stellate[Icosahedron[]]]]

This code would have worked unmodified for many releases of the software, through version 5 I suspect. Nowadays, however, to get the same output we would have to refer to the new library “PolyhedronOperations”, and use a variety of new options to shut off the otherwise automatic multi-coloration of the figure.

In[]:= Needs["PolyhedronOperations`"]

In[]:= Graphics3D[{Opacity[1], Glow[], FaceForm[Gray],
Stellate[PolyhedronData["Icosahedron", "Faces"]]},
Lighting -> "Neutral"]

And we get
mathematica logo 1 as rendered by Mathematica 8

IRR and NPV in Mathematica

If you work in finance, you probably compute net present values (NPVs) and internal rates of return (IRRs) from time to time. As of version 7, Mathematica does not have built-in functions for these. There are add-on libraries that one might buy but it seems wrong to have to pay for something that is both so simple and so important. I have tried to convince Wolfram to add functions for computing these to the standard libraries, but thus far without success. So I will provide sample code here for anybody looking to do these things in Mathematica.

Generally when working with financial data series, I prefer to use data objects that have an identifying header rather than just raw data. In the long run, it makes life much simpler. In a later post I will explain this in more detail and give examples. For this post, I will not use that technique, and will show how to complete the calculations of NPV and IRR in Mathematica for a simple list of {date/datum} pairs.

First, since we are going to be using Mathematica’s date functions, and my sample dates are in “Month/Day/Year” format, we had better shut off some warnings:


Now a function that takes a list of dates and a fixed annual discount rate, and returns a discount factor for each date.

sternDiscountFactors[dateList_, r_] := Module[{absDates, firstDate},
absDates = Map[AbsoluteTime[#] &, dateList];
firstDate = Min[absDates];
Transpose[{dateList, Map[1/(1 + r)^DateDifference[firstDate, #, "Year"][[1]] &, absDates]}]]

For an example of how this works, run
In[]:= sternDiscountFactors[{"1/1/2001", "1/1/2002", "1/1/2005", "1/10/2005"}, .05]

Out[]= {{"1/1/2001", 1.}, {"1/1/2002", 0.952381}, {"1/1/2005", 0.822702}, {"1/10/2005", 0.821713}}

Then we need a function for computing the net present value of a cash flow, given a fixed annual discount rate. Basically, it takes the dot product of vectors representing the discount factor on each date and the net payment on that date.

sternNetPV[dateAndCashFlowList_, r_] :=
Module[{cashflows, discountfactors},
cashflows = Transpose[dateAndCashFlowList][[2]];
discountfactors =
Transpose[sternDiscountFactors[Transpose[dateAndCashFlowList][[1]], r]][[2]];

Here is some sample data that we will use to test the NPV function and, in a minute, the IRR function:

testFlow = {{"1/1/2001", -1}, {"1/1/2002", .3}, {"1/1/2005", .4}, {"1/10/2005", .5}};

You can then test the NPV function with
In[]:= sternNetPV[testFlow, .05]

Out[]= 0.0256519

Computing the internal rate of return is now simple:

sternIrr[dateAndCashFlowList_] :=
Module[{eqn, r}, eqn = sternNetPV[dateAndCashFlowList, r] == 0; FindRoot[eqn, {r, .1}]]

and this can be tested with
In[]:= sternIrr[testFlow]

Out[]={r$3479 -> 0.0584316}

Aquí tiene. In real world applications, one would not typically rely on a fixed annual discount rate, and the NPV function would in most important contexts use a yield curve instead.

Number of Radios vs. the Number of Notified Mental Defectives

I found this amusing dataset in high school while trying to use a horrid statistical package (version 1 of DataDesk, I think). To the naive eye, it looks like evidence that radios drive people mad, or did so back in the England of the 1920s and ’30s. The units are millions of radio licenses issued in the U.K. vs. number of “notified mental defectives” per 10,000 of estimated population. The correlation coefficient is 0.99 and the P-value is on the order of 10^-10. There are a number of lessons we can learn from this — correlation does not prove causation, how to do labels by each point in Mathematica using the Epilog command, and to remember to shut off your radio.

How do we make such a graph?

First, our data:
masterlist = {{1924, 1.35, 8}, {1925, 0.196, 8}, {1926, 2.27, 9}, {1927, 2.483, 10}, {1928, 2.73, 11}, {1929, 3.091, 11}, {1930, 3.647, 12}, {1931, 4.62, 16}, {1932, 5.497, 18}, {1933, 6.26, 19}, {1934,  7.012, 20}, {1935, 7.618, 21}, {1936, 8.131, 22}, {1937, 8.593, 23}};

You can make a quick table of this data with
TableHeadings -> {None, {"\nYear", "\nRadios", "Mental\nDefectives"}}]

For the purpose of graphing the relationship between radios and mental defectives, we’ll want to separate the data from the yearly labels. First, the data:

justData = Transpose[{Transpose[masterlist][[2]], Transpose[masterlist][[3]]}];

The labels are going to be included in the graph using the Epilog[] command. This can be done at the time we run the Plot[] command but I find it easier to run it first for clarity and ease of debugging.

dateEpilog =
Map[Text[#[[1]], #[[2]]] &,
Transpose[{Map[ToString[#] &, Range[1924, 1937]],
Transpose[Transpose[justData] + {0, 1}]}]];

Then we can do the graph, including the Epilog we just constructed to label the points:

dotplot = ListPlot[justData, Epilog -> dateEpilog];

We could call it complete here, though I prefer to show the regression line on the same axes as the points. We’ll determine that line with the Fit[] command, then graph it, and combine it with the dotplot using the Show[] command.

defectiveFit = Fit[justData, {1, x}, x];

Show[dotplot, Plot[defectiveFit, {x, 0, 9}, PlotStyle -> Orange],
AxesLabel -> {"Radios", "Mental Defectives"},
PlotLabel -> "Correlation does not mean causation\nin the U.K."]

This gives us the graph seen at the top of this post — data nicely labeled, with the regression line shown. Sometimes it’s best to leave the PlotMarkers blank and just use the labels (the years in this case) as markers, Edward Tufte style. The hard part is getting the Epilog right, as that puts the labels in the right places. I hope this example has been useful.

How strong is the relationship between the number of radios and the number of mental defectives? In old versions of Mathematica, this was answered by loading the LinearRegression package and using the Regress[] command. In modern versions, we do it like this:

linearmodel = LinearModelFit[justData, x, x];

You can get Pearson’s correlation coefficition with the Correlation[] command. It’s about 0.99. Credit where credit is due, this data seems to have been originally compiled, and offered as an illustration of spurious correlation in “An Introduction to the Theory of Statistics” by G. E. Yule and M. G. Kendall in the early 20th century.