Shalizi on the confounding of contagion and homophily in social network studies

Cosma Shalizi has a post up today discussing a new paper he wrote with Andrew C. Thomas arguing that it’s pretty much impossible to distinguish the effects of social contagion from homophily in observational studies.

That’s probably pretty cryptic without context, so here’s the background. A number of high-profile studies have been published in the past few years suggesting that everything from obesity to loneliness to pot smoking is socially contagious. The basic argument is that when you look at the diffusion of certain traits within social networks, you find that having friends who are obese is more likely to make you obese, having happy friends is more likely to make you happy, and so on. These effects (it’s been argued) persist even after you control for homophily–that is, the tendency of people to know and like other people who are similar to them–and can be indirect, so that you’re more likely to be obese even if your friends’ friends (who you may not even know know) are obese.

Needless to say, the work has been controversial. A few weeks ago, Dave Johns wrote an excellent pair of articles in Slate describing the original research, as well as the recent critical backlash (see also Andrew Gelman’s post here). Much of the criticism has focused on the question of whether it’s really possible to distinguish homophily from contagion using the kind of observational data and methods that contagion researchers have relied on. That is, if the probability that you’ll become obese (or lonely, or selfish, etc.) increases as a function of the number of obese people you know, is that because your acquaintance with obese people exerts a causal influence on your own body weight (e.g., by shaping your perception of body norms, eating habits, etc.), or is it simply that people with a disposition to become obese tend to seek out other people with the same disposition, and there’s no direct causal influence at all? It’s an important question, but one that’s difficult to answer conclusively.

In their new paper, Shalizi and Thomas use an elegant combination of logical argumentation, graphical causal models, and simulation to show that, in general, contagion effects are unidentifiable: you simply can’t tell whether like begets like because of a direct causal influence (“real” contagion), or because of homophily (birds of a feather flocking together). The only way out of the bind is to make unreasonably strong assumptions–e.g., that the covariates explicitly included in one’s model capture all of the influence of latent traits on observable behaviors. In his post Shalizi sums up the conclusions of the paper this way:

What the statistician or social scientist sees is that bridge-jumping is correlated across the social network. In this it resembles many, many, many behaviors and conditions, such as prescribing new antibiotics (one of the classic examples), adopting other new products, adopting political ideologies, attaching tags to pictures on flickr, attaching mis-spelled jokes to pictures of cats, smoking, drinking, using other drugs, suicide, literary tastes, coming down with infectious diseases, becoming obese, and having bad acne or being tall for your age. For almost all of these conditions or behaviors, our data is purely observational, meaning we cannot, for one reason or another, just push Joey off the bridge and see how Irene reacts. Can we nonetheless tell whether bridge-jumping spreads by (some form) of contagion, or rather is due to homophily, or, if it is both, say how much each mechanism contributes?

A lot of people have thought so, and have tried to come at it in the usual way, by doing regression. Most readers can probably guess what I think about that, so I will just say: don’t you wish. More sophisticated ideas, like propensity score matching, have also been tried, but people have pretty much assumed that it was possible to do this sort of decomposition. What Andrew and I showed is that in fact it isn’t, unless you are willing to make very strong, and generally untestable, assumptions.

It’s a very clear and compelling paper, and definitely worth reading if you have any interest at all in the question of whether and when it’s okay to apply causal modeling techniques to observational data. The answer Shalizi’s argued for on many occasions–and an unfortunate one from many scientists’ perspective–seems to be: very rarely if ever.