The best books on Statistics

recommended by Andrew Gelman

Award-winning statistician and political scientist Andrew Gelman says that uncertainty is an important part of life, and recognition of that uncertainty is itself an important step. He picks the best books on statistics.

The Bill James Historical Baseball Abstract
by Bill James

Judgment under Uncertainty: Heuristics and Biases
by Daniel Kahneman, Paul Slovic and Amos Tversky

How Animals Work
by Knut Schmidt-Nielsen

The Honest Rainmaker
by A J Liebling

How to Talk So Kids Will Listen and Listen So Kids Will Talk
by Adele Faber and Elaine Mazlish

These books you’ve picked – have you chosen them to get people interested in statistics? Or are they more for people who are already interested in statistics as a way to think about statistics?

Statistics is what people think math is. Statistics is about patterns, and that’s what people think math is about. The difference is that in math, you have to get very complicated before you get to interesting patterns. The math that we can all easily do – things like circles and triangles and squares – doesn’t really describe reality that much. Mandelbrot, when he wrote about fractals and the general idea of self-similar processes, made it clear that if you want to describe nature, or social reality, you need very complicated mathematical constructions. The math that we all learn in high school is just not going to be enough to capture the interesting features of real-world patterns. Statistics, however, can capture a lot more patterns at a less technical level, because statistics, unlike mathematics, is all about uncertainty and variation. So the books I thought of are all non-technical, but they’re all about variation and comparison and patterns. I put them in order from most statistical to least statistical. Most people would probably consider only the first of them to be really about statistics, but they’re all about statistical thinking, as I view it.

Your first book, then, is The Bill James Baseball Abstracts, from 1982 to 1986. I have to confess I know nothing about either statistics or baseball…

Baseball and statistics traditionally go together. One of my inspirations to become a statistician was reading The Bill James Baseball Abstracts. I can’t remember what Bill James did before he started writing, but he had an unusual career: I believe he was a night watchman. He was not employed by any baseball team or academic organisation. He just decided, on his own, that he wanted to study baseball statistics. He wrote a series of books called The Baseball Abstracts that were widely published starting in 1982 and became cult classics. In these books he mixes stories about baseball and goofy statistics – which in the pre-ESPN era weren’t widely available – with in-depth analysis of questions such as: Which is more important, speed or power? At what age are baseball players most productive?

People had looked at that before, and apparently there was baseball lore which said baseball players are most productive between the ages of 28 and 32. Bill James looked at the statistics, and it did look like players between the ages of 28 and 32 were the best players. But then he looked more carefully and it turned out that this wasn’t really true: there was a selection effect. The players who were staying in the game past the age of 30 – which is actually an advanced age in baseball years – were the best players. And if you looked at the individual players, it turned out that they were mostly peaking around the age of 27. The conventional wisdom was wrong, and it was wrong because people weren’t directly asking the question that they should have been asking. Bill James was amazing, because when he wanted to ask a specific question he focused right in on it, which is the opposite of how people used to do baseball statistics.
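
The selection effect can be reproduced with a toy model. This is a minimal sketch under made-up assumptions, not James’s data or method: three hypothetical players whose performance peaks at 27 by construction and falls by one unit per year on either side, with weaker players assumed to leave the league earlier. Averaging over whoever is still playing at each age pushes the league-wide peak well past 27, even though no individual player peaks later than 27.

```python
from statistics import mean

# Hypothetical toy league: (talent, last age played). Every player enters at 23.
# The numbers are invented; the only assumption that matters is that
# weaker players are released earlier than stronger ones.
PLAYERS = [(10, 28), (20, 30), (40, 35)]
FIRST_AGE = 23
PEAK_AGE = 27  # true individual peak for every player, by construction


def performance(talent, age):
    """Output falls off by one unit per year away from the peak age."""
    return talent - abs(age - PEAK_AGE)


def league_average_by_age():
    """Average output of whoever is still in the league at each age (cross-sectional view)."""
    last_age = max(last for _, last in PLAYERS)
    averages = {}
    for age in range(FIRST_AGE, last_age + 1):
        active = [performance(t, age) for t, last in PLAYERS if age <= last]
        averages[age] = mean(active)
    return averages


def individual_peaks():
    """Age at which each player personally performed best (following careers)."""
    peaks = []
    for talent, last in PLAYERS:
        ages = range(FIRST_AGE, last + 1)
        peaks.append(max(ages, key=lambda a: performance(talent, a)))
    return peaks


if __name__ == "__main__":
    averages = league_average_by_age()
    print("individual peaks:", individual_peaks())
    print("age with best league average:", max(averages, key=averages.get))
```

Following each player across his own career, rather than averaging over whoever is on a roster at a given age, is what removes the bias in this toy setup.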

He also studied – and baseball fans will care about this – the Chicago Cubs, who traditionally perform very badly, and whether that was because they are the only team that still plays a lot of day games. Was that hurting them? Most teams now play at night. Were the Cubs tired because they were playing a lot of day games?

Were they?

I think so. If it were true that the Cubs were getting exhausted from playing day after day in the hot sun, you’d expect them to perform worse nearer the end of the season. I think he did find that was the case.

And he was able to do all this with no formal training?

Yes. He just put a lot of effort into it and worked hard. I think it also helps, when you start to publish, that you become part of a community. In the early books he talks a lot about just getting the data. He created an organisation called Project Scoresheet, where all these people would gather the data on all the baseball games and send it to each other. It was only possible because he really cared. I’m a baseball fan, but I don’t care like that. Still, it’s worth reading: when a good writer is obsessed with something, it’s fun to read them and share the obsession. After a while his Abstracts started to go downhill: I felt he started falling in love with his own voice, and the books became more about opinion and less about fact. Then he stopped the annual Abstracts and started doing other things, which I think was a good decision.

Let’s go on to Judgment Under Uncertainty.

Daniel Kahneman, Paul Slovic and Amos Tversky were three psychologists who studied judgement and decision-making – how we assess uncertainty in our lives and how we make decisions based on that. That’s a topic economists and psychologists have studied for a long time, but for the longest time people studied it with normative models: they would have a model of how people should behave and see whether people followed it.

Kahneman, Slovic and Tversky did a series of experiments which started with studies that were pretty complicated. They asked professional statistical researchers and research psychologists who were doing real data analysis the kind of conceptual questions that might appear on a statistics final exam: questions that might be hard if you don’t know statistics, but shouldn’t be hard if you’re a pro. What they found is that the pros were getting it wrong. This is always interesting to me. When someone who doesn’t know anything makes a mistake, it’s sort of boring. But when someone whose job it is to get things right gets it wrong, that’s interesting. When someone who has every incentive to get things right gets it wrong, it makes you think there is something going on cognitively: a cognitive bias.

They did a series of studies that started with fairly complicated questions about statistical significance that people were getting wrong, and over the years they boiled them down to simpler and simpler questions that people still couldn’t get right, sometimes called cognitive illusions. This book was the first place that a lot of these results were published. It came out in the early 1980s, and it’s a collection of articles: it has about 25 different chapters by different people, the top people in the field, describing all sorts of experiments. I like to say that this is the best-edited book that I’ve ever seen, at least since the New Testament. The work has become gradually more popular over the last few decades; now it’s sometimes called behavioural economics, but it’s basically psychology. I think it’s just incredible: studies of overconfidence, of how people estimate uncertain quantities, of the importance of framing. I could give you a million examples where, if you describe a decision option in a different way, people make a different choice.

Yes, give me some examples. What were the pros getting wrong?

The kind of questions they were getting wrong had to do with uncertainty. One question was this: you have a large hospital, every month a number of boys and girls are born there, and there’s some variation in the percentage of boys born each month. The basic statistical idea is that the larger the hospital, the less variation there is in the percentage of boys or girls born every month. A lot of people know that with a larger sample, the proportion you observe varies less from one sample to the next. But somehow they asked it in such a natural-seeming way that everybody would get it wrong. People were expecting a level of stability that wasn’t occurring.
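
The idea behind the hospital question can be checked directly. This is a minimal sketch, assuming every birth is independently a boy with probability 0.5 and using 15 and 45 births per period as hypothetical hospital sizes: the share of boys has standard deviation sqrt(p(1 - p) / n), so the smaller hospital sees a month with more than 60% boys far more often.

```python
from math import comb, sqrt


def sd_of_share(n_births, p_boy=0.5):
    """Standard deviation of the share of boys among n_births independent births."""
    return sqrt(p_boy * (1 - p_boy) / n_births)


def prob_share_above(n_births, threshold=0.6, p_boy=0.5):
    """Exact binomial probability that MORE than `threshold` of the births are boys."""
    return sum(
        comb(n_births, k) * p_boy**k * (1 - p_boy) ** (n_births - k)
        for k in range(n_births + 1)
        if k / n_births > threshold
    )


if __name__ == "__main__":
    for n in (15, 45):  # hypothetical small and large hospitals
        print(n, round(sd_of_share(n), 3), round(prob_share_above(n), 3))
```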

The naive version is that people believe in the law of averages, so they think that if the roulette wheel comes up black three straight times, it’s more likely to come up red next. We all know that’s not so, unless it’s a rigged roulette wheel (and roulette wheels generally aren’t rigged, because casinos don’t need to rig them to make money, so that’s not usually an issue). This is a more sophisticated version of the same error, in an experimental context. It turned out that psychologists were expecting their experimental results to automatically balance out, in a way that someone with statistical training should know will not really happen.
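
The roulette version of the fallacy is easy to check by simulation. This is a minimal sketch, assuming a fair single-zero wheel with independent spins: the share of reds right after three blacks in a row should match the overall share of reds, about 18/37, because the wheel has no memory.

```python
import random


def spin_wheel(n_spins, seed=0):
    """Simulate a fair single-zero wheel: 18 red, 18 black and 1 green pocket.

    The mapping of pocket numbers to colours is arbitrary, not the real wheel layout.
    """
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_spins):
        pocket = rng.randrange(37)  # 0..36, each equally likely
        if pocket == 0:
            outcomes.append("green")
        elif pocket <= 18:
            outcomes.append("red")
        else:
            outcomes.append("black")
    return outcomes


def red_rate_after_black_streak(outcomes, streak=3):
    """Share of red among spins that immediately follow `streak` blacks in a row."""
    following = [
        outcomes[i]
        for i in range(streak, len(outcomes))
        if all(o == "black" for o in outcomes[i - streak : i])
    ]
    return sum(o == "red" for o in following) / len(following)


if __name__ == "__main__":
    spins = spin_wheel(200_000)
    print("red overall:      ", spins.count("red") / len(spins))
    print("red after 3 black:", red_rate_after_black_streak(spins))
    print("theory:           ", 18 / 37)
```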

This is a different example from the babies – the mathematics is similar but the framing is different – and it had to do with this: if you’re doing a research study and you’re expecting a certain result, how likely is it that you get something similar to what you expect? People overestimated how similar it would be to their expectations. The researchers knew about the idea of uncertainty and statistical significance, but they tended to think of it more as an obstacle to be overcome than as a true bit of uncertainty that they had to address in real life.
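
The replication question can be made concrete with a normal approximation. This is a minimal sketch, assuming the true effect is exactly what the first study estimated, and using a hypothetical original z-score of 2.23 (just past the usual 1.96 cutoff): an identical replication comes out significant in the same direction only about 61% of the time, and a half-size one only about 35%.

```python
from statistics import NormalDist


def replication_chance(original_z, size_ratio=1.0, crit_z=1.96):
    """Chance a replication is significant in the same direction as the original study.

    Assumes the true effect equals the original estimate, so the replication's
    z-score is normal with mean original_z * sqrt(size_ratio) and unit variance.
    """
    expected_z = original_z * size_ratio**0.5
    return 1 - NormalDist().cdf(crit_z - expected_z)


if __name__ == "__main__":
    z = 2.23  # hypothetical original result, just significant
    print(round(replication_chance(z), 2))
    print(round(replication_chance(z, size_ratio=0.5), 2))
```

Even this assumption is generous: if the original estimate was inflated by noise, the true chance of a successful replication is lower still.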

What about simpler questions in the more recent studies?

Some of the more recent studies involved what they call almanac questions – for example, they’d ask people the date of an uncertain historical event, or the population of Saudi Arabia. They would ask, ‘In what year did the State of Tennessee join the United States?’ Well, you know it’s some time later than 1776, but you have to guess. Before giving people the question they prompt them with a number, but the number is unrelated. They’ll mention 1822 in passing, and either say explicitly that it’s a random number or just slip it in some other way. Then it turns out people use that number. It’s not that they say Tennessee joined the Union in exactly 1822, but their answer will be closer to 1822 than if you give them a different prompt, say 1799. Our brains are just machines, so it makes sense that we use whatever information is there. But it’s not really appropriate decision-making.

This work started out as a bit of a curiosity in the field of psychology, but we get a lot of insight from it; it is absolutely essential to understanding how humans think. Just as visual illusions give you insight into how the brain sees things, cognitive illusions show us the shortcuts that our brain uses to make decisions.

The book sounds great. I think people generally find it pretty fascinating to see how the brain gets things wrong.

It is. It’s fun to read. I get a little upset that a lot of the discussion of this work has gone into behavioural economics and nudging people. That stuff is fine too, but the work is much broader than that. Economics is on everyone’s mind right now, but this is not just about economics.

It affects everything. We’re increasingly put in positions where we have to make decisions based on statistics – even when we visit the doctor. ‘If you take this drug it’ll halve your chance of X, but it could also double your chance of Y.’ When you ask for guidance, all they can tell you is that it’s a personal decision.

The phrase that Bill James has is that the alternative to doing good statistics is not no statistics, it’s bad statistics. Bill James had an ongoing feud with various baseball writers who put down statistics. He would write about people who would say, ‘Statistics are fine, but what you really need to do is see people play. Baseball is about athleticism and heart, it’s not about numbers.’ What Bill James pointed out is that when these people talked about their favourite players, they would talk about their statistics: so-and-so batted .300. So they were relying on statistics, just in an unsophisticated way. They were still using the written record. To say that you don’t want to use statistics – that’s just not an option.

People are very bad at dealing with statistics, though. Look at the number of people who are scared of flying (and I’m not necessarily excluding myself here) when the chance of crashing is one in 11 million.

I was on a panel for the National Institutes of Health evaluating grants. One of the proposals had to do with studying the effects of water-pipe smoking, the hookah. There was a discussion around the table. The NIH is a United States government organisation, and not many people in the US really smoke hookahs, so should we fund it? Someone said, ‘Well, actually it’s becoming more popular among the young.’ And if younger people smoke it, they have a longer lifetime exposure, and apparently there is some evidence that the dose of carcinogens you get from hookah smoking might be 20 times the dose from smoking a cigarette. I don’t know the details of the math, but it was a lot. So even if not many people do it, once you multiply the number of people who do it by the risk each of them faces, you get a lot of lung cancer.

Then someone at the table – and I couldn’t believe this – said, ‘My uncle smoked a hookah pipe all his life, and he lived until he was 90 years old.’ And I had a sudden flash of insight, which was this. Suppose you have something that actually kills half the people exposed to it. The other half survive, so you will still meet plenty of people who were exposed and didn’t die of it. And even if you’re a heavy smoker, your chance of dying of lung cancer is less than 50%, so even with something as extreme as smoking and lung cancer, there are lots of cases where people don’t die of the disease. If you’re willing to accept anecdotal evidence, there will always be plenty of it all around you pointing in the wrong direction, and it won’t tell you anything. That’s why the psychology is so fascinating, because even well-trained people make mistakes. It makes you realise that we need institutions that protect us from ourselves…
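
The arithmetic behind that flash of insight fits in a few lines. This is a minimal sketch with hypothetical numbers and the simplifying assumption that everyone exposed faces the same independent risk: even if an exposure killed half the people who had it, someone who knows five exposed people would know at least one survivor about 97% of the time.

```python
def chance_of_survivor_story(risk, n_people_known):
    """Probability that at least one exposed person you know was NOT killed by the exposure.

    Simplifying assumption: everyone faces the same risk, independently.
    """
    return 1 - risk**n_people_known


def expected_deaths_and_survivors(n_exposed, risk):
    """Expected numbers of exposed people killed by, and spared by, the exposure."""
    deaths = n_exposed * risk
    return deaths, n_exposed - deaths


if __name__ == "__main__":
    # Hypothetical: an exposure that kills half of those exposed; you know five of them.
    print(chance_of_survivor_story(0.5, 5))
    print(expected_deaths_and_survivors(1_000, 0.5))
```

The lower the risk, the more survivor stories there are, which is why anecdotes like the uncle’s say almost nothing about whether the risk is real.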

What kind of institutions?

If you’re a research psychologist, you need the institution of formal statistics to protect you from your false intuitions, which, left unchecked, will lead you to make all sorts of mistaken claims. Similarly for medical research: it’s very easy to fool oneself, even if you’re well trained. This man didn’t realise that even if hookah smoking doesn’t kill every single person who does it, it can still potentially be a problem.

Your next choice is How Animals Work by Knut Schmidt-Nielsen. One of the Amazon reviews made me laugh: ‘There are many people in the world who don’t realise that animal physiology is the most intensely interesting thing to study. This book will make sure you are not one of those people.’

That’s about right. It’s statistical because it’s full of graphs. Schmidt-Nielsen’s books have graphs of metabolic rates versus running speed and flying speed for different animals, exhaled air temperature of lizards, all sorts of things.

A graph is always a statistic?

Yes, a graph of data is always a statistic. Most of this is not statistics, though; it’s really physics. How can birds fly and lift themselves up in the air? How do dogs cool themselves by panting? It sounds sort of obvious: they’re dripping water, and as the water evaporates it cools the tongue. But, as he points out, if you’re a dog panting to cool your tongue, you then have to get the cooled blood in your tongue circulating to the rest of your body; in effect, you have to circulate all your blood through your tongue to cool yourself off. So how do you get it there? You have to move the blood fast.

The book has a lot of things like that. He’s looking at things that people take for granted and saying: you can’t just take these things for granted, these are amazing feats of engineering. Another reason I connect it with statistics, besides the graphs, is that there’s an interplay between physical modelling (real substantive modelling using the laws of physics) and data collection and statistical analysis. It goes back and forth. People gathered data that inspired him to come up with a physical description, and then he gathered more data, or other people did, and sometimes it turns out the description works and sometimes it turns out it doesn’t. And that’s what statistics is all about: building real models, using real information.

Next we have The Honest Rainmaker by AJ Liebling.

AJ Liebling was an old-time magazine journalist. This particular book was based on a bunch of articles about an old guy he knew, who went by the name of Colonel John R Stingo. It wasn’t his real name, but it was the name he liked to be called. He had gone through life doing a number of things, including newspaper writing, and one of the things he had done was rainmaking. The rainmakers would go to farm areas of the United States and make contracts with farmers promising to make it rain. Their entire rainmaking was based on actuarial principles. They worked out the frequency of precipitation and wrote very clever contracts, of the heads-I-win, tails-you-lose variety. If it rained, they would get all sorts of payments, and if it didn’t rain, they would have to pay. But they would somehow always set up the contract so they wouldn’t have to pay… Liebling is one of the great writers of all time, and I felt that this particular book had strong statistical content.

Which is?

The statistical content is that you have to have a sense of the probability of rain for it to work out.
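
The rainmakers’ arithmetic can be written as an expected-value calculation. This is a minimal sketch with made-up contract terms (nothing here comes from Liebling’s book): if the rainmaker collects a fee when it rains and pays out when it doesn’t, the contract is profitable on average whenever the chance of rain in the contract window exceeds payout / (fee + payout), which is why knowing the local frequency of precipitation matters.

```python
def expected_profit(p_rain, fee_if_rain, payout_if_dry):
    """Rainmaker's expected profit per contract: collect a fee if it rains, pay out if not."""
    return p_rain * fee_if_rain - (1 - p_rain) * payout_if_dry


def break_even_rain_probability(fee_if_rain, payout_if_dry):
    """Chance of rain at which the contract's expected profit is exactly zero."""
    return payout_if_dry / (fee_if_rain + payout_if_dry)


if __name__ == "__main__":
    fee, payout = 1_000, 500  # hypothetical contract terms
    print(break_even_rain_probability(fee, payout))
    for p in (0.2, 0.5, 0.8):  # hypothetical chances of rain in the contract window
        print(p, expected_profit(p, fee, payout))
```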

And people didn’t get on to him?

They probably did, on occasion. That’s kind of the point – he’s this lovable rogue, a guy who was never all that successful, but who would somehow just manage to stay afloat. To me the whole book felt very statistical. There was something about it… but it’s basically storytelling. Most people wouldn’t consider it a statistics book. Another book along the same lines that I’d recommend is Jimmy the Greek’s autobiography. Jimmy the Greek was a Vegas oddsmaker who got famous in the 1970s as a TV football commentator and was eventually thrown off the air after making some vaguely racist comments. Anyway, his autobiography is great; it’s full of statistical stories, starting with how he made his first fortune taking bets on the 1948 election. But that’s another story…

Your last book is the least statistical of the five. I looked up some of the reviews and people really do rave about it though – with comments like ‘this book changed my life’ etc.

Yes, perhaps the best book ever written: How to Talk So Kids Will Listen and Listen So Kids Will Talk, by Adele Faber and Elaine Mazlish. This was a book that I read years ago because it happened to be on my sister’s bookshelf, and it really did change my life. It has all sorts of practical advice. There are simple things, like if a kid has to do something, you offer them choices: instead of saying, ‘We have to go now’, you say, ‘Do you want to go now or in five minutes?’ You don’t offer them a choice they can’t take. These tricks don’t work forever, of course. We started doing this with the kids and then of course they started saying, ‘Don’t give me choices.’

Children are smart that way.

But I found it was useful in my life before I ever had kids. There are a lot of very clever ideas in there about ways you can role-play and practise things. And it’s great to have tools like this when you teach: every year there’s a new crop of students, and the tricks work like new. Like most skills, it seems very doable when you read the book. When you actually try to do it in real life it’s a lot more difficult, unless you’re naturally cut out for it… which I’m probably not. I think it’s implicitly statistical because it’s really about what works and what doesn’t work. There probably is research on it, but if there isn’t, there could be.

It’s also sort of fascinating at a deeper level: if these ideas are so powerful, which I think they are, why are they so hard to do? Why is it so hard to do the right thing? That’s true in many aspects of life. It’s easier to tick somebody off than to say something nice. Why is that? It doesn’t make sense that it would be so difficult to do the constructive thing, but often it takes a lot of work. It’s just a wonderful book. As this is one of those rare occasions when I’m allowed to recommend books to people, I thought I’d better put it on the list.

One of the reviewers said it didn’t just help with their children, but in all their relationships with other people.

It definitely works with my students. It made a big difference to me. My wife is just good at this stuff naturally: she’s a social worker, and when she read the book she said, ‘Yes, this is very reasonable.’ She didn’t think it was so special, because she already knew it. But for me, it was very special.

So what happens with your students? Do you give them choices? ‘Do you want to take your exam today or tomorrow?’

It’s hard for me to remember, because with students I’ve developed my own approach. I wrote a book called Teaching Statistics: A Bag of Tricks, which is all about how to involve students. But sometimes things come up, difficult situations. Also, when students are in my office, I have to physically use my left arm to hold back my right hand so I don’t pick up a pen or chalk. I tell the students, ‘You hold the chalk, you go to the board.’ A lot of professors know that, but it’s just so tempting to start writing oneself. You have to really think from the other person’s perspective and get them involved in solving their own problems. There are things in the book like: when someone is supposed to do something and they don’t do it, you just have to tell them that it’s important to you. That kind of thing.

When people say that there are ‘lies, damned lies and statistics’, what does that really mean?

Bill James once said that you can lie in statistics just like you can lie in English or French or any other language. Sure, the more powerful a language is, the more ways you can lie using it. There are a bunch of great quotes about statistics. There’s another one, sometimes attributed to Mark Twain: ‘It ain’t what you don’t know that hurts you, it’s what you don’t know you don’t know.’ And there’s Earl Weaver: ‘It’s what you learn after you know it all that counts.’ There are a lot of sayings that emphasise that not only is uncertainty an important part of life, but recognition of that uncertainty is itself an important step.

January 3, 2011

Andrew Gelman

Andrew Gelman is a professor of statistics and political science and director of the Applied Statistics Center at Columbia University. He has received the Outstanding Statistical Application award from the American Statistical Association, the award for best article published in the American Political Science Review, and the Council of Presidents of Statistical Societies award for outstanding contributions by a person under the age of 40.

Bayesian Data Analysis, Second Edition
by Andrew Gelman with John B Carlin, Hal S Stern and Donald B Rubin

Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives
edited by Andrew Gelman and Xiao-Li Meng

Data Analysis Using Regression and Multilevel/Hierarchical Models
by Andrew Gelman with Jennifer Hill

Teaching Statistics
by Andrew Gelman with Deborah Nolan

A Quantitative Tour of the Social Sciences
edited by Andrew Gelman and Jeronimo Cortina
