The best books on Health and the Internet

recommended by Elad Yom-Tov

Crowdsourced Health: How What You Do on the Internet Will Improve Medicine
by Elad Yom-Tov


A quick search of your symptoms on the internet may lead to an acute case of 'cyberchondria.' But it may also provide data which will improve health and even save lives. Elad Yom-Tov, the author of Crowdsourced Health, recommends books showing how internet data and data science can provide exciting new ways of conducting health and medical research.

Interview by Helen Phillips

How did a computer scientist, with expertise in machine learning and search engines, become interested in health and medical research?

It started with my PhD, which was about helping people who could not otherwise communicate to use their EEG traces (their brainwaves) to operate a computer. That was a pretty amazing application, because you sit in front of a computer, you think about a letter you want to write and it magically appears there. My background was not in medicine but in electrical engineering and computer science, and for a few years I left that area of trying to help people and moved into other applications of machine learning.

But then, about six years ago, I changed jobs and started working for Yahoo Research. I started thinking about what Yahoo, or internet companies in general, have that nobody else does. I thought that the wealth of data people leave with internet companies (social media posts, the questions you type into a search engine, the photos you post on Flickr) tells us a lot about human behaviour that we have no other way of capturing, and that we really should be able to use it to learn about people’s health.

Your book choices cover different elements that you’ve brought together in your research on internet data and health. Your first choice is Better, by Atul Gawande, which is specifically about achieving success or improving performance in medicine. What did you take from this book?

Gawande is a wonderful writer. He has a way of explaining the medical world that is both compassionate and very much data-driven. This is something that’s close to my heart, because that’s what we are trying to do: we are trying to use data in order to learn about people. In this book he has quite a few examples of counterintuitive things, things that you wouldn’t have thought would be the way they are, and yet the data show what’s actually happening.

One of the examples I like most is about cystic fibrosis, a condition where there is one clear measure of success: how long people survive. There are several dozen centres in the US that treat cystic fibrosis, and for many years their average survival numbers were not released to the public. When they did start releasing these numbers, many people thought it was not a good idea. They expected that patients would leave their centres and move to the best ones, and that the service overall would suffer.

Yet what actually happened was that all the centres improved, or at least on average they improved, because they had a way of measuring themselves against others and of knowing whom they could turn to for help in order to improve. To me, this is a wonderful example of why these data should be out there. Perhaps it should be done carefully, but we should have these data available to help the system improve.

So data can be a very powerful thing. The second book on your list, The Patient Will See You Now, deals more with the type of data you are interested in: internet data. Is the internet changing medicine?

The title is, of course, a pun on ‘the doctor will see you now.’ The thesis is that since you can now have all of your data online, patients are empowered to be bigger players in medicine and to take more control of their own health. The book talks about traditional medical records, meaning the kinds of data currently used in medical research and by doctors. It asks what will happen when everybody’s data is available in digital form and how this would affect research.

So you could contribute your data to a bigger cause, to a bigger data set, to enable us to do far better research. The other part of the thesis is more personal: applications or systems that would give very personalised advice or treatment, which is something I also discuss in my book.

The idea that my smartphone might offer me health advice or a diagnosis is slightly terrifying. After all, the internet is a notoriously bad place to search for medical information. It’s very easy to convince yourself you’re horribly ill.

Yes. My colleagues, Ryen White and Eric Horvitz, coined the term ‘cyberchondria’ for this. You start with a very benign symptom and within two or three searches you’re sure you are going to die. This is not a new phenomenon; it was out there even when there were only public libraries.

Three Men in a Boat?

I love that book, yes! Information seeking is just one place that we can provide benefit. By looking at how people find information we can make that better. We can inform them of what is, perhaps, the more reliable information. That is just one area where I think data can contribute.

The other contribution comes when people have a very difficult time reporting what we need for our research, for example adverse reactions to drugs, and when we want a more sensitive indicator than what current medical research gives us.

Tell me more about the kind of data you are working with.

The biggest volume of data that we have is search queries: there are several hundred million queries per day in the US. Then there are Facebook posts and Twitter messages, where again we see between several hundred million and several billion examples per day. So if you compare that volume of data with what traditional sources of medical records contain, there is orders of magnitude more internet data. Of course not everything in internet data is connected to health. It’s much noisier than traditional sources of data, but that’s just a challenge that we need to solve. It’s not a barrier to using these data.

And how does this data become Crowdsourced Health?

Let’s begin with the word crowdsourced. That’s a term that was coined a few years ago to capture the notion of people contributing information or work towards a larger goal. Here we are using it in a way that is a bit different from what it was originally coined for, because the contribution here is not, strictly speaking, voluntary. People are not saying ‘here, use my Twitter feed or my Facebook posts to learn about medicine.’ But we are able to do that: we use the data that many people have independently provided to learn about medicine and health.

One example is some work we recently did on evaluating the effectiveness of flu vaccination for children in England. In 2013, Public Health England decided to vaccinate children in a number of cities in order to see whether vaccinating children against the flu would reduce the overall number of flu cases. Everybody who has a child knows that children are very good at transmitting the flu. So they vaccinated in a number of cities and wanted to compare the number of doctor visits and hospitalisations for influenza in those cities with the rest of England. But the flu season wasn’t very serious, so too few people in those cities were hospitalised or saw a doctor, compared with the rest of the country, to draw a conclusion.

Very few people who have the flu actually see a medical practitioner. Most of them will stay at home for a couple of days, drink tea, and that will be it. But they also do something else: they go online and ask what to do about the flu, what the symptoms of the flu are, things like that, or they might go on Twitter and write about their awful flu. So internet data provides a much more sensitive way of measuring how many people have flu in the population. Even if you stay at home and never see a doctor, you are still likely to talk about it online.

So we used search data and Twitter data, and we showed that there was about a 25 to 30 per cent reduction in the number of flu cases in the cities where children were vaccinated, compared with those where they were not. We could see a much larger number of cases than Public Health England could trace.
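The kind of comparison described above can be illustrated with a deliberately simplified sketch; it is not the method used in the study. It assumes that the share of an area's search queries mentioning the flu is a usable proxy for how common the illness is there, and then expresses the difference between vaccinated and unvaccinated areas as a relative reduction. The keyword list, the function names and all numbers are hypothetical, chosen only for illustration.

```python
# Simplified illustration only: hypothetical keyword list and made-up numbers,
# not the methodology or data of the study described in the interview.

FLU_TERMS = {"flu", "influenza"}  # hypothetical list of flu-related words


def is_flu_query(query: str) -> bool:
    """Naive word-level match: True if any whitespace-separated word is a flu term."""
    return any(word in FLU_TERMS for word in query.lower().split())


def flu_query_rate(queries: list[str]) -> float:
    """Fraction of an area's queries that mention the flu (a rough proxy for prevalence)."""
    if not queries:
        raise ValueError("need at least one query")
    return sum(is_flu_query(q) for q in queries) / len(queries)


def relative_reduction(rate_vaccinated: float, rate_control: float) -> float:
    """Relative drop of the proxy rate in vaccinated areas compared with control areas."""
    if rate_control <= 0:
        raise ValueError("control rate must be positive")
    return (rate_control - rate_vaccinated) / rate_control


if __name__ == "__main__":
    sample = ["flu symptoms", "weather today", "influenza treatment", "cheap flights"]
    print(flu_query_rate(sample))  # 2 of the 4 queries match
    # Hypothetical rates: 0.072% of queries in vaccinated areas vs 0.1% elsewhere.
    print(f"{relative_reduction(0.00072, 0.001):.0%}")
```

A real analysis would also have to account for differences between areas, such as population size, how many people search online and the timing of the flu season, which is why a raw ratio like this is only a starting point.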

So your next two book selections are about research that has been done with this kind of internet data. Tell us about the first, A Billion Wicked Thoughts.

A Billion Wicked Thoughts covers a topic that everybody likes to talk about, sexual behaviour, and yet very few people want to talk to researchers about it. Think about it: would you talk to a scientist who came to you and asked about your sexual preferences, or about the websites you browse for these purposes on the internet? We know that people have a difficult time giving these kinds of details.

So what these researchers did was look at data that had already been collected: internet data. Instead of asking people to provide it explicitly, you can take the data from a very large population of people and analyse it to get at their preferences and their behaviours. So it’s a similar type of research, applied not to medicine but to human behaviour.

One review said Alfred Kinsey only scratched the surface. Can internet data reveal something we couldn’t otherwise discover?

Yes, well, if you think about it, what are you likely to get if you’re not very careful about the kinds of surveys you do? You’re likely to get people who are happy to contribute their data, and they may not be representative of the population. Or you are likely to get people who will not tell you the truth, or maybe not the entire truth. Whereas if you passively look at what people are doing, you are likely to get a more accurate picture. The next book is perhaps an even better example of that.

Tell me about your fourth book, Dataclysm by Christian Rudder. Is this the future of social sciences research?

I don’t know if it’s the future of the social sciences. I don’t think the future of medical research is crowdsourced health, but I think it’s one more important tool in the toolbox. Christian Rudder was the chief scientist of a dating website, OkCupid, and in the book he describes both what they learned about people’s preferences and some of the interventions, the changes they made to the website, to see what effect they had on people’s dating behaviour.

Actually, one of his examples is a histogram of the heights of men on OkCupid, and he shows that it doesn’t fit the height distribution that the Centers for Disease Control collect. It’s shifted to the right: people report a height that’s about 2 inches, or 5 centimetres, greater than what you would expect in the general population. Many more people say they are 6 feet tall than you would expect.

So that raises an important potential problem with research using internet data – how reliable is it? We all know that our Facebook lives are quite different from our reality.

We know that when people have an incentive not to tell the truth, they are likely not to tell the truth. We have opposite examples as well, from France in 1844, where people could dodge the draft by being too short. It was estimated that about 2 per cent of people lied about their height to dodge the draft. So when people have an incentive to lie, they will lie.

But, on average, what people report in search queries and in the questions they ask on sites like Yahoo Answers is remarkably accurate. If we look at the data where people report their heights, it matches the known distribution of heights in the US beautifully. That holds for heights, for weights…

Of course we cannot say of every individual whether they are telling the truth. Some people don’t want to tell the truth, others don’t know it; you may not know your exact weight right now. But this kind of internet data correlates remarkably well with what we know from the physical measurements that have been taken of this population.

When you’re anonymous on a search engine in your own room, there is no reason for you to lie. That also places a burden on us: we don’t want to break this anonymity, and we don’t want to identify our users when they did not intend for that to happen.

Yes, that’s an important issue. This is extremely personal information in the hands of internet companies.

Maintaining user privacy is an important aspect. We don’t want to break the anonymity of our users. But it raises a problem too. How do you actually find a group of anonymous people who share some condition that you’re interested in?

We had to work very hard to do both of these things at once: on the one hand, not identifying the people who post messages on Twitter or ask questions on Bing; on the other, finding a sequence of queries, or single queries, that would be typical of a person who has a mood disorder or a person who has anorexia.

Can you give us an example of how internet data research can be beneficial?

Mood disorders are interesting, partly because of the way that treatment works. If somebody has a mood disorder, they may be prescribed drugs. The drugs frequently have side effects. What sometimes happens is that a patient takes the drugs and, if they work, stops experiencing mood shifts. The patient then thinks they are cured, sees no reason to keep taking drugs that now only seem to cause side effects, and stops taking them. Then they might experience a manic or a depressive episode.

So we were interested in whether we could use internet data, looking at the queries that people make on a search engine, to identify when they were likely to have a manic episode. And it turns out that we probably can. So imagine an application on your smartphone or computer which you download and install, and which monitors your queries for you.

When it identifies that a person at risk might be about to have an episode, it would tell them they might want to take some mitigating steps. So this is taking data collected from a large group of people and turning it into something very personal that a person can decide to use or not.

Your final book selection, Nudge, is about how we make choices. Perhaps you can explain how this is relevant to your topic?

I chose Nudge because I think it’s the next step. Nudge talks about the way that you present people with choices or the way that you allow people to make choices, very subtle interventions, which really have an effect on the choices people will make. So the way that you organise food on a tray or on display in a store will change the kinds of food that people select to eat. In our case, I know this is one area that we’re moving into – can we help people make better health choices?

We talked about the searches people make that convince them they have some terrible disease. That’s one area we could improve. We could present better results to tell people, ‘Actually, you’re not going to die, you’re fine.’

But, also, if we know that people are searching for things that are likely to make them unwell, we could design interventions that would get people to make better choices. In the book, I talk about anorexia, which is a huge problem. There’s a lot of content online which helps people to maintain their condition or to make it worse. We are not telling them to make certain choices and we’re not forcing behavioural change on people, but we could give these very small nudges, these small suggestions, perhaps using online advertising or other, more explicit means, to help them make better choices.

One of the examples I liked in the book was about a road along Lake Michigan which has a curve, and many times people miss that curve and fall into the lake. Lines are cleverly painted across the road at shorter and shorter intervals, which gives drivers the impression that they are driving faster and faster, so they instinctively slow down just before the curve and there are fewer accidents. You are not telling people to slow down or putting a police car there to make sure they slow down. Instead, you make these very minute changes to the environment in a way that suggests to people that they should actually slow down.

Can you apply that thinking to health choices?

A lot of people want to, say, quit smoking. They might search for ways to stop smoking, or they might search for where to buy cigarettes, or something like that. If you search for how to quit smoking, can I put an ad on the side that will actually make you more likely to quit? We have done some research to show how to identify a good ad to place for specific ages and specific demographics: genders, incomes, and so forth. The way we measure this is to see what subsequent searches people make. Over the next few weeks, do you actively go and find a way to quit smoking? Do you stop looking for where you can buy cigarettes? Things like that.

So, on the one hand, we can place these kinds of interventions. On the other, we can measure whether they were successful and tune our model, so you don’t have to run a whole ad campaign, wait for several months and then survey the population to see who has stopped smoking. Within a few days, we can know what’s effective and what’s not and make our ads better. We can use a tool that’s usually used to sell you something to help you improve your health.
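The measurement loop described above can be sketched in the same simplified spirit. This is an illustrative sketch, not the method used in the research: it assumes each anonymous user saw exactly one ad variant, treats "later searched for ways to quit and never searched for where to buy cigarettes" as the success signal, and reports a success rate per variant. The search phrases, ad identifiers and example data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical phrases used as proxies for intent; real studies would need richer signals.
QUIT_PHRASES = ("quit smoking", "stop smoking", "nicotine patch")
BUY_PHRASES = ("buy cigarettes", "cheap cigarettes")


def shows_quit_intent(later_queries: list[str]) -> bool:
    """Success proxy: searched for quitting and never searched for buying cigarettes."""
    texts = [q.lower() for q in later_queries]
    sought_quitting = any(p in t for t in texts for p in QUIT_PHRASES)
    sought_buying = any(p in t for t in texts for p in BUY_PHRASES)
    return sought_quitting and not sought_buying


def success_rate_by_ad(exposures: list[tuple[str, list[str]]]) -> dict[str, float]:
    """exposures: (ad_id, queries the same anonymous user made in the following weeks)."""
    shown: dict[str, int] = defaultdict(int)
    successes: dict[str, int] = defaultdict(int)
    for ad_id, later_queries in exposures:
        shown[ad_id] += 1
        successes[ad_id] += int(shows_quit_intent(later_queries))
    return {ad_id: successes[ad_id] / shown[ad_id] for ad_id in shown}


if __name__ == "__main__":
    exposures = [
        ("ad_A", ["how to quit smoking", "nicotine patch price"]),
        ("ad_A", ["buy cigarettes near me"]),
        ("ad_B", ["stop smoking app"]),
        ("ad_B", ["weather"]),
        ("ad_B", ["stop smoking tips"]),
    ]
    rates = success_rate_by_ad(exposures)
    print(rates)
    print("best variant so far:", max(rates, key=rates.get))
```

Tailoring ads to age or other demographic groups, as mentioned in the interview, would amount to keying the counts on (ad_id, group) instead of ad_id alone. With few users, differences like these can easily be noise, so a real comparison would also need randomised assignment of ads and a significance test.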

May 11, 2016

Elad Yom-Tov

Elad Yom-Tov is a principal researcher at Microsoft Research and a visiting scientist at the Technion, Israel, who has previously worked at Yahoo Research and IBM. He specialises in using internet data to improve health and medicine, by applying tools from machine learning and information retrieval. His new book Crowdsourced Health: How What You Do on the Internet Will Improve Medicine (2016) is published by The MIT Press.
