Main point: One should be skeptical when presented with statistics. One way to articulate well-founded skepticism is to follow three steps of reasoning.
The average American is expected to live seventy-three years. Therefore if you are sixty-eight you can expect to live five more years, and should plan accordingly.
The above quote is from Nassim Taleb's book Fooled by Randomness, where he attributes those exact words to a journalist. Taleb points out the error: a conclusion drawn in one context has been moved to another. It has been lifted into a higher order without taking that order into account. In this example, the life expectancy computed for all Americans is applied to Americans who have already reached the age of 68.
These pitfalls can be detected and assessed: we can build small programs that check a statement and show to what extent it can be trusted. Like Taleb, I use Monte Carlo simulations, following three simple steps:
- Build a distribution that yields the elements of interest in the statement, in this case, dead people.
- Read the statement carefully and extract conditionals.
- Build new specialized distributions taking conditionals into account.
I have done this with Danish death data rather than American data.
Sampling the Death Realm
From the Danish institute for statistics (Statistics Denmark), I can download a list of deaths from 2020 [1]. The information I get is gender, age, year of death and, by inference, that they are Danish nationals. The list looks like the following:
```haskell
observations = [(0, Men), ... (73, Women), (73, Women), (73, Women), (74, Men), (74, Men), (74, Men), (74, Men), ... ]
```
where the first element is the age and the second element the gender. The list is long: exactly 54,645 deaths were recorded in Denmark in 2020. From that list we simply draw a random element, and that's it: we now have our distribution.
I like to think of this as the deathRealm. In Haskell it can be defined as follows [2]:
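```haskell
-- Uniform distribution over all recorded deaths (P is the probability monad)
deathRealmRaw :: P (Age, Gender)
deathRealmRaw = uniform observations

-- The name used in the text and in deathRealmPerAge below
deathRealm :: P (Age, Gender)
deathRealm = deathRealmRaw
```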
To calculate the expected age, we simply draw 10,000 elements from that list at random and average their ages. Doing so gives an expected value of 77.65 years. This differs from the American figure, but the idea is the same.
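The draw-and-average step can be sketched without the post's P monad, whose definition is not shown. The sketch below is a minimal stand-in under stated assumptions: `Gender`, `Age`, the toy `observations` (made-up numbers, not the Statistics Denmark data) and the hypothetical helpers `lcgStream`, `uniformDraws` and `estimateExpectedAge` are all introduced here for illustration. Random numbers come from a hand-rolled linear congruential generator so the block needs nothing beyond the Prelude.

```haskell
-- Hypothetical stand-ins: the post's own Age, Gender and P are not shown.
data Gender = Men | Women deriving (Show, Eq)
type Age = Int

-- Toy observations, made up for illustration (NOT the Statistics Denmark data).
observations :: [(Age, Gender)]
observations =
  [(0, Men), (55, Women), (68, Men), (73, Women), (74, Men), (81, Women), (88, Women), (92, Men)]

-- Infinite stream of pseudo-random numbers in [0, 2^32) from a linear
-- congruential generator (Numerical Recipes constants). Not cryptographic.
lcgStream :: Integer -> [Integer]
lcgStream = tail . iterate (\x -> (1664525 * x + 1013904223) `mod` 4294967296)

-- Infinite stream of uniform draws from a non-empty list. Scaling by n and
-- dividing by 2^32 uses the high bits, which are better distributed than
-- the low bits of an LCG.
uniformDraws :: [a] -> Integer -> [a]
uniformDraws xs seed = [xs !! fromInteger ((r * n) `div` 4294967296) | r <- lcgStream seed]
  where
    n = toInteger (length xs)

-- Monte Carlo estimate of the expected age: average the ages of the first k draws.
estimateExpectedAge :: Int -> [(Age, Gender)] -> Double
estimateExpectedAge k draws = fromIntegral (sum (map fst (take k draws))) / fromIntegral k
```

For example, `estimateExpectedAge 10000 (uniformDraws observations 2020)` should land close to 66.375, the exact mean age of the eight toy observations. With the real list of 54,645 deaths, `!!` makes every draw linear in the list length, so an array would be the faster choice.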
The Statement
The statement of interest from the quote is the second half:
Therefore if you are sixty-eight you can expect to live five more years, and should plan accordingly.
Identifying the orders, or conditionals, is the craft, since some may be stated directly and others only indirectly. Regardless, the statement says that you are 68 years old, so we will at least incorporate that into our distribution. This is done swiftly by discarding every death at an age below 68, since the person in the statement has already reached that age. To be a bit more general, I built a function that can do this for any given age.
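```haskell
-- Condition on having reached `age`: redraw until the sampled age is at least `age`
deathRealmPerAge :: Age -> P (Age, Gender)
deathRealmPerAge age = do
  (sampledAge, gender) <- deathRealm
  if sampledAge >= age
    then return (sampledAge, gender)
    else deathRealmPerAge age
```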
The code draws a sample and checks whether its age is at least the declared age. If so, we use it; otherwise we try again. As before, we draw 10,000 elements and average their ages.
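The "try again" loop is rejection sampling, and it can be sketched as a lazy filter over a stream of uniform draws. This is an illustrative stand-in, not the post's P-based code: the types, the toy `observations` (made-up numbers) and the hypothetical helpers `uniformDraws`, `drawsGivenAge`, `meanAge` and `exactMeanGivenAge` are assumptions introduced here, and the small LCG sampler is repeated so the block runs on its own. Because the underlying draw is uniform over a finite list, the sketch also computes the exact conditional mean by filtering the list once, as a check on the sampled estimate.

```haskell
-- Hypothetical stand-ins, repeated so this block runs on its own.
data Gender = Men | Women deriving (Show, Eq)
type Age = Int

-- Toy observations (made up, NOT the Statistics Denmark data).
observations :: [(Age, Gender)]
observations =
  [(0, Men), (55, Women), (68, Men), (73, Women), (74, Men), (81, Women), (88, Women), (92, Men)]

-- Toy LCG-based uniform sampler: an infinite stream of draws from a non-empty list.
uniformDraws :: [a] -> Integer -> [a]
uniformDraws xs seed = [xs !! fromInteger ((r * n) `div` m) | r <- tail (iterate step seed)]
  where
    m :: Integer
    m = 4294967296
    n = toInteger (length xs)
    step x = (1664525 * x + 1013904223) `mod` m

-- Rejection sampling: lazily drop every draw younger than the given age,
-- mirroring the "otherwise try again" recursion in deathRealmPerAge.
-- Like that recursion, it never yields anything if no observation qualifies.
drawsGivenAge :: Age -> [(Age, Gender)] -> Integer -> [(Age, Gender)]
drawsGivenAge age xs seed = filter ((>= age) . fst) (uniformDraws xs seed)

-- Average age of the first k draws from a stream.
meanAge :: Int -> [(Age, Gender)] -> Double
meanAge k draws = fromIntegral (sum (map fst (take k draws))) / fromIntegral k

-- Exact check: conditioning a uniform draw on age >= a is the same as
-- drawing uniformly from the filtered list, so its mean can be computed directly.
exactMeanGivenAge :: Age -> [(Age, Gender)] -> Double
exactMeanGivenAge age xs = fromIntegral (sum ages) / fromIntegral (length ages)
  where
    ages = [a | (a, _) <- xs, a >= age]
```

For the toy list, `exactMeanGivenAge 68 observations` is 476 / 6 ≈ 79.33, and `meanAge 10000 (drawsGivenAge 68 observations 2020)` should land close to it. Rejection sampling wastes every draw below the cut-off, so for rare conditions drawing uniformly from the filtered list avoids the retries altogether.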
```
> expectedAge $ deathRealmPerAge 68
82.76
```
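Conditioning on having reached 68 raises the expected age at death by more than five years, from 77.65 to 82.76. In other words, if you have documented that you are able to reach a certain age, you can expect to become somewhat older than the unconditional average suggests.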
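The arithmetic can be spelled out with the two Monte Carlo estimates above. Writing $A$ for the age at death of a uniformly drawn 2020 death and $n_x$ for the number of recorded deaths at age $x$ (notation introduced here), the conditioned distribution has the expectation

$$
\mathbb{E}[A \mid A \ge a] = \frac{\sum_{x \ge a} x \, n_x}{\sum_{x \ge a} n_x}
$$

and the remaining years for a 68-year-old come out as

$$
\begin{aligned}
\mathbb{E}[A] - 68 &\approx 77.65 - 68 = 9.65 \text{ years (unconditioned)} \\
\mathbb{E}[A \mid A \ge 68] - 68 &\approx 82.76 - 68 = 14.76 \text{ years (conditioned)}
\end{aligned}
$$

a gap of $82.76 - 77.65 = 5.11$ years. Since the discarded deaths are all younger than every kept one, conditioning on $A \ge a$ can never lower the average.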
Naturally, one cannot build computational models and evaluate every statement on the fly. But the main takeaway from Nassim Taleb's example is the implicit lift from one context to another. Spotting such a lift should trigger some form of skepticism.
[1] The death statistics in Denmark for 2020.
[2] I have used the probability monad to model the problems.