Seth Stephens-Davidowitz has published a new book, Everybody Lies, to widespread acclaim, one that makes explosive claims about politics, race and sexuality on the basis of search data from Google and Pornhub. Among other choice nuggets, he finds that more gay men than commonly thought are in the closet, that the American public’s sexual fantasies are far less conventional than commonly imagined, and that Barack Obama’s speech after the San Bernardino attack stoked Islamophobia rather than calming the public. In this telling, big data is a “digital truth serum” that can uncover the hidden beliefs of populations around the world in a way that traditional sources could never hope to.
He is not the only one. Hundreds of articles published in recent years have used Google Trends data to make inferences about many areas of life, including voting, racism, economic forecasting, influenza outbreaks and trade agreements.
Google Trends and similar sources seem to offer researchers a free lunch at last: easy, cost-free access to the world’s most truthful thoughts from the comfort of your browser. Unfortunately, the reality of using big data is far messier and less reliable than the book and surrounding studies suggest.
In two papers, I tested the validity of 71 promising-looking search terms against survey data tracking what respondents in Spain, the United Kingdom and the United States felt was the most important problem or issue facing their country. Of these 71 terms, just 24 passed the various validity tests. For those that did pass, the relationship is striking: the plot below, taken from the first paper, shows just how closely Google searches track specific issues in the United States. So when Google Trends does work, the results can be very impressive, but in the tests I have conducted this is only the case about a third of the time. The other two-thirds of the time, any conclusions you draw are just storytelling based on noise. If you don’t want to rely on my numbers alone, another study by Christopher Whyte finds that 6 out of 9 tested terms validate. Depending on the study, then, somewhere between a third and two-thirds of search terms validate against representative data.
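To make the idea of a validity test concrete, here is a minimal sketch of the general approach: correlate a search-volume series with a survey series measuring the same concept, and treat only sufficiently correlated terms as usable. The data below is synthetic and the correlation threshold is an illustrative assumption — this is not the actual procedure from the papers, just the shape of the idea.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic monthly series: the share of survey respondents naming some
# issue as the most important problem, plus two candidate search terms.
months = 48
survey = 10 + 5 * np.sin(np.linspace(0, 4 * np.pi, months)) + rng.normal(0, 1, months)
valid_search = 2 * survey + rng.normal(0, 3, months)  # genuinely tracks the survey
noise_search = rng.normal(50, 10, months)             # pure noise, no relationship

def validates(search, survey, threshold=0.5):
    """Illustrative validity test: Pearson correlation above a threshold."""
    r = np.corrcoef(search, survey)[0, 1]
    return r, r > threshold

r_good, ok_good = validates(valid_search, survey)
r_bad, ok_bad = validates(noise_search, survey)
print(f"tracking term: r={r_good:.2f}, passes={ok_good}")
print(f"noise term:    r={r_bad:.2f}, passes={ok_bad}")
```

The point of the filter is the second case: a noise term can still be eyeballed into a plausible story, which is exactly why unvalidated terms invite storytelling.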
This is not such a problem when there is a good source of data to validate against: you can simply use the search terms that pass and discard the rest. But the most tempting uses of Google Trends are precisely those where we suspect the traditional data sources are not very accurate either. In these cases, the temptation is to say that Google Trends data are better than nothing and to point to the limitations of the existing sources. But the poor quality of survey data does not make Google Trends data any better. As the famous John Tukey quote goes:
“The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.”
So why does Google Trends so often fail to track what we are interested in? The answers fall into two categories: the people doing the searching are not representative, and the searches do not mean what we think they do.
In rich countries, it is fair to say that almost everyone now has internet access. The worry is therefore not that Google users are unrepresentative in general, but that the people who use a particular search term may be. If only a small minority of the people who hold an opinion or attitude react by searching for it, changes in the searches may not reflect changes in the proportion of people holding the view. Stephens-Davidowitz’s analysis of porn trends among men and women runs into exactly this problem. He argues that women are more likely than men to search for coercive sex and rape fantasies. However, the proportion of women who use pornography is certainly much lower than the proportion of men (estimates vary, but female use is clearly far lower), so the male searches likely represent a larger share of the male population than the female searches do of the female population. This matters because it is perfectly plausible that similar factors predict both female use of pornography and more extreme sexual fantasies among women. In other words, if we are comparing the 40% most extreme fantasies of women to the 80% most extreme fantasies of men, the PornHub results may tell us nothing about the difference in extremity of fantasies between men and women in general.
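A small simulation makes the selection worry vivid. Assume, purely for illustration, that men and women draw fantasy “extremity” from identical distributions, but that only 80% of men and 40% of women show up in the search data, with participation correlated with extremity. The numbers and the selection mechanism here are my own assumptions, not estimates from the book.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Identical population distributions of fantasy "extremity" by assumption.
men = rng.normal(0, 1, n)
women = rng.normal(0, 1, n)

def observed(extremity, usage_rate):
    """Keep the top `usage_rate` share by a noisy propensity to use the
    site, where the propensity correlates with extremity (illustrative)."""
    propensity = extremity + rng.normal(0, 1, extremity.size)
    cutoff = np.quantile(propensity, 1 - usage_rate)
    return extremity[propensity > cutoff]

obs_men = observed(men, 0.80)      # 80% of men appear in the data
obs_women = observed(women, 0.40)  # only 40% of women do

print(f"population means: men={men.mean():.2f}, women={women.mean():.2f}")
print(f"observed means:   men={obs_men.mean():.2f}, women={obs_women.mean():.2f}")
```

Although the populations are identical by construction, the observed women look markedly more “extreme” than the observed men, purely because they are the more selected group — the pattern in the data says nothing about the population difference.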
The second problem, searches not meaning what we think they mean, plagues any interpretation of big data. Because the data was collected incidentally, we have to read meaning into it rather than capture it directly, and because it reaches us largely stripped of context, it is very easy to read the wrong things into it. For instance, Stephens-Davidowitz argues that searches for gay porn vary less across states than survey data would suggest, and therefore that there are more gay men in conservative states than is commonly thought. This is entirely possible, but an alternative explanation is that pornography and sex are substitute goods: gay men in areas with many other gay men have more sex and therefore less need for pornography. Or perhaps gay men in some areas simply type porn site addresses into their browsers directly. I am not claiming that either of these is the case, only that there is a nearly infinite number of plausible stories we can tell about any set of Google Trends data in the absence of validation.
So where does that leave us? The best evidence we have suggests that internet search data can track important trends in the population, but that a large percentage of the time it does not. There is no reason to think it performs any better in situations where there is no data to validate against, and many reasons to think it could be misleading. None of this is to say that any particular conclusion in Stephens-Davidowitz’s book is wrong, simply that the search data could very easily be misleading us. Nor do these critiques apply to every analysis in the book. Stephens-Davidowitz’s work on racial sentiment validates some of its measures against measures of support for racial policies in the General Social Survey. We can therefore probably place greater weight on these analyses (although see here), even if we could arguably have just used the General Social Survey to get the same answer.
We might want to fall back and say that at least internet searches prove that some people search for things that conflict with their public image, and this is certainly true. But if the claim is merely an existence proof that people sometimes lie, most people could establish that through introspection alone. The great promise of big data is not proving that people have secrets but being able to quantify them precisely. Ultimately, though, we just don’t know whose secrets we are measuring, and that limits what we can actually learn about the world from big data.
This article draws on the chapter “Making Inferences About Elections and Public Opinion Using Incidentally Collected Data” in The Routledge Handbook of Elections, Voting Behavior and Public Opinion (2017). A pre-print of the chapter is available for free here.