[Update 12/10/2015: Lieberman & Eisenberger have now posted a lengthy response to this post here. I’ll post my own reply to their reply in the next few days.]
[Update 12/14/2015: I’ve posted an even lengthier reply to L&E’s reply here.]
[Update 12/16/2015: Alex Shackman has posted an interesting commentary of his own on the L&E paper. It focuses on anatomical concerns unrelated to the issues I raise here and in my last post.]
The anterior cingulate cortex (ACC)—located immediately above the corpus callosum on the medial surface of the brain’s frontal cortex—is an intriguing brain region. Despite decades of extensive investigation in thousands of animal and human studies, understanding the function(s) of this region has proven challenging. Neuroscientists have proposed a seemingly never-ending string of hypotheses about what role it might play in emotion and/or cognition. The field of human neuroimaging has taken a particular shine to the ACC in the past two decades; if you’ve ever overheard some nerdy-looking people talking about “conflict monitoring”, “error detection”, or “reinforcement learning” in the human brain, there’s a reasonable chance they were talking at least partly about the role of the ACC.
In a new PNAS paper, Matt Lieberman and Naomi Eisenberger wade into the debate with what is quite possibly the strongest claim yet about ACC function, arguing (and this is a verbatim quote from the paper’s title) that “the dorsal anterior cingulate cortex is selective for pain”. That conclusion rests almost entirely on inspection of meta-analytic results produced by Neurosynth, an automated framework for large-scale synthesis of results from thousands of published fMRI studies. And while I’ll be the first to admit that I know very little about the anterior cingulate cortex, I am probably the world’s foremost expert on Neurosynth*—because I created it. I also have an obvious interest in making sure that Neurosynth is used with appropriate care and caution. In what follows, I provide my HIBAR reactions to the Lieberman & Eisenberger (2015) manuscript, focusing largely on whether L&E’s bold conclusion is supported by the Neurosynth findings they review (spoiler alert: no).
Before going any further, I should clarify my role in the paper, since I’m credited in the Acknowledgments section for “providing Neurosynth assistance”. My contribution consisted entirely of sending the first author (per an email request) an aggregate list of study counts for different terms on the Neurosynth website. I didn’t ask what it was for, he didn’t say what it was for, and I had nothing to do with any other aspect of the paper—nor did PNAS ask me to review it. None of this is at all problematic, from my perspective. My policy has always been that people can do whatever they want with any of the Neurosynth data, code, or results, without having to ask me or anyone else for permission. I do encourage people to ask questions or solicit feedback (we have a mailing list), but in this case the authors didn’t contact me before this paper was published (other than to request data). So being acknowledged by name shouldn’t be taken as an endorsement of any of the results.
With that out of the way, we can move onto the paper. The basic argument L&E make is simple, and largely hangs on the following observation about Neurosynth data: when we look for activation in the dorsal ACC (dACC) in various “reverse inference” brain maps on Neurosynth, the dominant associate is the term “pain”. Other candidate functions people have considered in relation to dACC—e.g., “working memory”, “salience”, and “conflict”—show (at least according to L&E) virtually no association with dACC. L&E take this as strong evidence against various models of dACC function that propose that the dACC plays a non-pain-related role in cognition—e.g., that it monitors for conflict between cognitive representations or detects salient events. They state, in no uncertain terms, that Neurosynth results “clearly indicated that the best psychological description of dACC function was related to pain processing – not executive, conflict, or salience processing”. This is a strong claim, and would represent a major advance in our understanding of dACC function if it were borne out. Unfortunately, it isn’t.
A crash course in reverse inference
To understand why, we need to understand the nature of the Neurosynth data L&E focus on. And to do that, we need to talk about something called reverse inference. L&E begin their paper by providing an excellent explanation of why the act of inferring mental states from patterns of brain activity (i.e., reverse inference, a term popularized in a seminal 2006 article by Russ Poldrack) is a difficult business. Many experienced fMRI researchers might feel that the issue has already been beaten to death (see for instance this, this, this, or this). Those readers are invited to skip to the next section.
For everyone else, we can summarize the problem by observing that the probability of a particular pattern of brain activity conditional on a given mental state is not the same thing as the probability of a particular mental state conditional on a given pattern of observed brain activity (i.e., P(activity|mental state) != P(mental state|activity)). For example, if I know that doing a difficult working memory task produces activation in the dorsolateral prefrontal cortex (DLPFC) 80% of the time, I am not entitled to conclude that observing DLPFC activation in someone’s brain implies an 80% chance that that person is doing a working memory task.
To see why, imagine that a lot of other cognitive tasks—say, those that draw on recognition memory, emotion recognition, pain processing, etc.—also happen to produce DLPFC activation around 80% of the time. Then we would be justified in saying that all of these processes consistently produce DLPFC activity, but we would have no basis for saying that DLPFC activation is specific, or even preferential, for any one of these processes. To make the latter claim, we would need to directly estimate the probability of working memory being involved given the presence of DLPFC activation. But this is a difficult proposition, because most fMRI studies only compare a small number of experimental conditions (typically with low statistical power), and cannot really claim to demonstrate that a particular pattern of activity is specific to a given cognitive process.
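The arithmetic behind this point is just Bayes’ rule, and it’s easy to sanity-check with a toy calculation. All the numbers below are hypothetical, chosen only to make the logic visible:

```python
# Toy Bayes' rule calculation: P(activity | state) != P(state | activity).
# All probabilities here are hypothetical, for illustration only.

def p_state_given_activity(p_act_given_state, p_state, p_activity):
    """Bayes' rule: P(state | activity)."""
    return p_act_given_state * p_state / p_activity

# Working memory tasks activate DLPFC 80% of the time...
p_dlpfc_given_wm = 0.80
# ...but suppose only 1 in 10 studies involves working memory,
p_wm = 0.10
# and DLPFC activation shows up in 40% of all studies, because
# plenty of other tasks also activate it.
p_dlpfc = 0.40

posterior = p_state_given_activity(p_dlpfc_given_wm, p_wm, p_dlpfc)
print(round(posterior, 2))  # 0.2
```

On these (made-up) base rates, observing DLPFC activity raises the probability of a working memory task from 10% to just 20%, nowhere near the 80% a naive reading of the forward inference would suggest.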
Unfortunately, a huge proportion of fMRI studies continue to draw strong reverse inferences on the basis of little or no quantitative evidence. The practice is particularly common in Discussion sections, when authors often want to say something more than just “we found a bunch of differences as a result of this experimental manipulation”, and end up drawing inferences about what such-and-such activation implies about subjects’ mental states on the basis of a handful of studies that previously reported activation in the same region(s). Many of these attributions could well be correct, of course; but the point is that it’s exceedingly rare to see any quantitative evidence provided in support of claims that are often fundamental to the interpretation authors wish to draw.
Fortunately, this is where large-scale meta-analytic databases like Neurosynth can help—at least to some degree. Because Neurosynth contains results from over 11,000 fMRI studies drawn from virtually every domain of cognitive neuroscience, we can use it to produce quantitative whole-brain reverse inference maps (for more details, see Yarkoni et al. (2011)). In other words, we can estimate the relative specificity with which a particular pattern of brain activity implies that some cognitive process is in play—provided we’re willing to make some fairly strong assumptions (which we’ll return to below).
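As a rough sketch of the underlying computation (the actual Neurosynth code adds smoothing, frequency thresholds, and multiple-comparisons correction, so treat this as illustrative only), a reverse inference estimate at a single voxel can be derived from study counts like so:

```python
# Sketch of a single-voxel reverse inference estimate from study counts.
# Neurosynth assumes a uniform 50/50 prior over term presence; the counts
# below are invented for illustration.

def reverse_inference(n_term_active, n_term, n_other_active, n_other,
                      prior=0.5):
    """Estimate P(term | activation) from study counts at one voxel."""
    p_act_given_term = n_term_active / n_term      # forward inference
    p_act_given_other = n_other_active / n_other   # base rate for other studies
    p_act = prior * p_act_given_term + (1 - prior) * p_act_given_other
    return prior * p_act_given_term / p_act        # Bayes' rule

# Hypothetically: 300 'pain' studies, 120 reporting activation at this
# voxel; 10,700 other studies, 1,070 reporting activation there.
pp = reverse_inference(120, 300, 1070, 10700)
print(round(pp, 2))  # 0.8
```

The key move is the comparison against the base rate across all other studies: a voxel that activates for everything gets no credit, no matter how often it activates for the term of interest.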
The dACC, lost and found
Armed with an understanding of the forward/reverse inference distinction, we can now turn to the focus of the L&E paper: a brain region known as the dorsal anterior cingulate cortex (dACC). The first thing L&E set out to do, quite reasonably, is identify the boundaries of the dACC, so that it’s clear what constitutes the target of analysis. To this end, they compare the anatomically-defined boundaries of dACC with the boundaries found in the Neurosynth forward inference map for “dACC”. Here’s what they show us:
The blue outline in panel A is the anatomical boundary of dACC; the colorful stuff in B is the Neurosynth map for ‘dACC’. (It’s worth noting in passing that the choice to rely on anatomy as the gold standard here is not completely uncontroversial; given the distributed nature of fMRI activation and the presence of considerable registration error in most studies, another reasonable approach would have been to use a probabilistic template). As you can see, the two don’t converge all that closely. Much of the Neurosynth map sits squarely inside preSMA territory rather than in dACC proper. As L&E report:
When “dACC” is entered as a term into a Neurosynth forward inference analysis (Fig. 1B), there is substantial activity present in the anatomically defined dACC region; however, there is also substantial activity present in the SMA/preSMA region. Moreover, the location with the highest Z-score in this analysis is actually in SMA, not dACC. The same is true if the term “anterior cingulate” is used (Fig. 1C).
L&E interpret this as a sign of confusion in the literature about the localization of dACC, and suggest that this observation might explain why people have misattributed certain functions to dACC:
These findings suggest that some of the disagreement over the function of the dACC may actually apply to the SMA/pre-SMA, rather than the dACC. In fact, a previous paper reporting that a reverse inference analysis for dACC was not selective for pain, emotion, or working memory (see figure 3 in ref. 13) seems to have used coordinates for the dACC that are in fact in the SMA/pre-SMA (MNI coordinates 2, 8, 50), not in the dACC.
This is an interesting point, and clearly has a kernel of truth to it, inasmuch as some researchers undoubtedly confuse dACC with more dorsal regions. As L&E point out, I made this mistake myself in the original Neurosynth paper (that’s the ‘ref. 13’ in the above quote); specifically, here’s the figure where I clearly labeled dACC in the wrong place:
Mea culpa—I made a mistake, and I appreciate L&E pointing it out. I should have known better.
That said, L&E should also have known better, because they were among the first authors to ascribe a strong functional role to a region of dorsal ACC that wasn’t really dACC at all. I refer here to their influential 2003 Science paper on social exclusion, in which they reported that a region of dorsal ACC centered on (-6, 8, 45) was specifically associated with the feeling of social exclusion and concluded (based on the assumption that the same region was already known to be implicated in pain processing) that social pain shares core neural substrates with physical pain. Much of the ongoing debate over the putative role of the dACC traces back directly to this paper. Yet it’s quite clear that the region identified in that paper was not the same as the one L&E now argue is the pain-specific dACC. At coordinates (-6, 8, 45), the top hits in Neurosynth are “SMA”, “motor”, and “supplementary motor”. If we scan down to the first cognitive terms, we find the terms “task”, “execution”, and “orthographic”. “Pain” is not significantly associated with activation at this location at all. So, to the extent that people have mislabeled this region in the past, L&E would appear to share much of the blame. Which is fine—we all make mistakes. But given the context, I think it would behoove L&E to clarify their own role in perpetuating this confusion.
That said, even if L&E are correct that a subset of researchers have sometimes confused dACC and pre-SMA, they’re clearly wrong to suggest that the cognitive neuroscience community as a whole is guilty of the same confusion. A perplexing aspect of their argument is that they base their claim of localization confusion entirely on inspection of the forward inference Neurosynth map for “dACC”—an odd decision, coming immediately after several paragraphs in which they lucidly explain why a forward inference analysis is exactly the wrong way to determine what brain regions are specifically associated with a particular term. If you want to use Neurosynth to find out where people think dACC is, you should use the reverse inference map, not the forward inference map. All the forward inference map tells you is where studies that use the term “dACC” tend to report activation most often. But as discussed above, and in the L&E paper, that estimate will be heavily biased by differences between regions in the base rate of activation.
Perhaps in tacit recognition of this potential criticism, L&E go on to suggest that the alleged “distortion” problem isn’t ubiquitous, and doesn’t happen in regions like the amygdala, hippocampus, or posterior cingulate:
We tested several other anatomical terms including “amygdala,” “hippocampus,” “posterior cingulate,” “basal ganglia,” “thalamus,” “supplementary motor,” and “pre sma.” In each of these regions, the location with the highest Z-score was within the expected anatomical boundaries. Only within the dACC did we find this distortion. These results indicate that studies focused on the dACC are more likely to be reporting SMA/pre-SMA activations than dACC activations.
But this isn’t quite right. While it may be the case that dACC was the only brain region among the ones L&E examined that didn’t show this “distortion”, it’s certainly not the only brain region that shows this pattern. For example, the forward inference maps for “DMPFC” and “middle cingulate” (and probably others—I only spent a couple of minutes looking) show peak voxels in pre-SMA and the anterior insula, respectively, and not within the boundaries of the expected anatomical structures. If we take L&E’s “localization confusion” explanation seriously, we would be forced to conclude not only that cognitive neuroscientists generally don’t know where dACC is, but also that they don’t know DMPFC from pre-SMA or mid-cingulate from anterior insula. I don’t think this is a tenable suggestion.
For what it’s worth, Neurosynth clearly agrees with me: the “distortion” L&E point to completely vanishes as soon as one inspects the reverse inference map for “dacc” rather than the forward inference map. Here’s what the two maps look like, side-by-side (incidentally, the code and data used to generate this plot and all the others in this post can be found here):
You can see that the extent of dACC in the bottom row (reverse inference) is squarely within the area that L&E take to be the correct extent of dACC (see their Figure 1). So, when we follow L&E’s recommendations, rather than their actual practice, there’s no evidence of any spatial confusion. Researchers (collectively, at least) do know where dACC is. It’s just that, as L&E themselves argue at length earlier in the paper, you would expect to find evidence of that knowledge in the reverse inference map, and not in the forward inference map.
The unobjectionable claim: dACC is associated with pain
Localization issues aside, L&E clearly do have a point when they note that there appears to be a relatively strong association between the posterior dACC and pain. Of course, it’s not a novel point. It couldn’t be, given that L&E’s 2003 claim that social pain and physical pain share common mechanisms was already predicated on the assumption that the dACC is selectively implicated in pain (even though, as I noted above, the putative social exclusion locus reported in that paper was actually centered in preSMA and not dACC). Moreover, the Neurosynth pain meta-analysis map that L&E used has been online for nearly 5 years now. Since the reverse inference map is loaded by default on Neurosynth, and the sagittal orthview is by default centered on x = 0, one of the first things anybody sees when they visit this page is the giant pain-related blob in the anterior cingulate cortex. When I give talks on Neurosynth, the preferential activation for pain in the posterior dACC is one of the most common examples I use to illustrate the importance of reverse inference.
But you don’t have to take my word for any of this, because my co-authors and I made this exact point in the 2011 paper introducing Neurosynth, where we observed that:
For pain, the regions of maximal pain-related activation in the insula and DACC shifted from anterior foci in the forward analysis to posterior ones in the reverse analysis. This is consistent with studies of nonhuman primates that have implicated the dorsal posterior insula as a primary integration center for nociceptive afferents and with studies of humans in which anterior aspects of the so-called ‘pain matrix’ responded nonselectively to multiple modalities.
Contrary to what L&E suggest, we did not claim in our paper that reverse inference analysis demonstrates that the dACC is not preferentially associated with any cognitive function; we made the considerably weaker point that accounting for differences in the base rate of activation changes the observed pattern of association for many terms. And we explicitly noted that there is preferential activation for pain in dACC and insula—much as L&E themselves do.
The objectionable claim: dACC is selective for pain
Of course, L&E go beyond the claims made in Yarkoni et al (2011)—and what the Neurosynth page for pain reveals—in that they claim not only that pain is preferentially associated with dACC, but that “the clearest account of dACC function is that it is selectively involved in pain-related processes.” The latter is a much stronger claim, and, if anything, is directly contradicted by the very same kind of evidence (i.e., Neurosynth maps) L&E claim to marshal in its support.
Perhaps the most obvious problem with the claim is that it’s largely based on comparison of pain with just three other groups of terms, reflecting executive function, cognitive conflict, and salience**. This is, on its face, puzzling evidence for the claim that the dACC is pain-selective. By analogy, it would be like giving people a multiple choice question asking whether their favorite color is green, fuchsia, orange, or yellow, and then proclaiming, once results were in, that the evidence suggests that green is the only color people like.
Given that Neurosynth contains more than 3,000 terms, it’s not clear why L&E only compared pain to 3 other candidates. After all, it’s entirely conceivable that dACC might be much more frequently activated by pain than by conflict or executive control, and still also be strongly associated with a large number of other functions. L&E’s only justification for this narrow focus, as far as I can tell, is that they’ve decided to only consider candidate functions that have been previously proposed in the literature:
We first examined forward inference maps for many of the psychological terms that have been associated with dACC activity. These terms were in the categories of pain (“pain”, “painful”, “noxious”), executive control (“executive”, “working memory”, “effort”, “cognitive control”, “cognitive”, “control”), conflict processing (“conflict”, “error”, “inhibition”, “stop signal”, “Stroop”, “motor”), and salience (“salience”, “detection”, “task relevant”, “auditory”, “tactile”, “visual”).
This seems like an odd decision considering that one can retrieve a rank-ordered listing of 3,000+ terms from Neurosynth at the push of a button. More importantly, L&E also omit a bunch of other accounts of dACC function that don’t focus on the above categories—for example, that the dACC is involved in various aspects of value learning (e.g., Kennerley et al., 2006; Behrens et al., 2007), autonomic control (e.g., Critchley et al., 2003), or fear processing (e.g., Milad et al., 2007). In effect, L&E are not really testing whether dACC is selective for pain; what they’re doing is, at best, testing whether the dACC is preferentially associated with pain in comparison to a select number of other candidate processes.
To be fair, L&E do report inspecting the full term rankings, even if they don’t report them explicitly:
Beyond the specific terms we selected for analyses, we also identified which psychological term was associated with the highest Z-score for each of the 8 dACC locations across all the psychological terms in the NeuroSynth database. Despite the fact that there are several hundred psychological terms in the NeuroSynth database, “pain” was the top term for 6 out of 8 locations in the dACC.
This may seem compelling at face value, but there are several problems. First, z-scores don’t provide a measure of strength of effect; they provide (at best) a measure of strength of evidence. Pain has been extensively studied in the fMRI literature, so it’s not terribly surprising if z-scores for pain are larger than z-scores for many other terms in Neurosynth. Saying that dACC is specific to pain because it shows the strongest z-score is like saying that SSRIs are the only effective treatment for depression because a drug study with a sample size of 3,000 found a smaller p-value than a cognitive-behavioral therapy (CBT) study of 100 people. If we want to know whether SSRIs beat CBT as a treatment for depression, we need to directly compare effect sizes for the two treatments, not p-values or z-scores. Otherwise we’re conflating how much evidence there is for each effect with how big the effect is. At best, we might be able to claim that we’re more confident that there’s a non-zero association between dACC activation and pain than that there’s a non-zero association between dACC activation and, say, conflict monitoring. But that doesn’t constitute evidence that the dACC is more strongly associated with pain than with conflict.
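To see how sample size, rather than effect size, can drive the z-scores, consider a toy two-proportion test in the spirit of the SSRI/CBT analogy above (all numbers invented): the larger study wins on z even though its effect is a third the size.

```python
import math

def two_prop_z(p1, p2, n1, n2):
    """z-statistic for the difference between two independent proportions."""
    p = (p1 * n1 + p2 * n2) / (n1 + n2)  # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# "Drug" trial: a small 5-point improvement, but n = 3,000 per arm.
z_drug = two_prop_z(0.55, 0.50, 3000, 3000)
# "CBT" trial: a 15-point improvement, but only n = 100 per arm.
z_cbt = two_prop_z(0.65, 0.50, 100, 100)

print(round(z_drug, 2), round(z_cbt, 2))  # 3.88 2.15
```

The drug study produces the bigger z (more evidence) despite a much smaller effect; ranking the two treatments by z here would get the effect sizes exactly backwards.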
Second, if one looks at effect-size estimates rather than z-scores—which is exactly what one should do if the goal is to make claims about the relative strengths of different associations—then it’s clearly not true that dACC is specific to pain. For the vast majority of voxels within the dACC, ranking associates in descending order of posterior probability puts some term other than pain in the top spot. For example, for coordinates (0, 22, 26), we get ‘experiencing’ as the top associate (PP = 86%), then pain (82%), then ‘empathic’ (81%). These results seem to cast dACC in a very different light than simply saying that dACC is involved in pain. Don’t like (0, 22, 26)? Okay, pick a different dACC coordinate. Say (4, 10, 28). Now the top associates are ‘aversive’ (79%), ‘anxiety disorders’ (79%), and ‘conditioned’ (78%) (‘pain’ is a little ways back, hanging out with ‘heart’, ‘skin conductance’, and ‘taste’). Or maybe you’d like something more anterior. Well, at (-2, 30, 22), we have ‘abuse’ (85%), ‘incentive delay’ (84%), ‘nociceptive’ (83%), and ‘substance’ (83%). At (0, 28, 16), we have ‘dysregulation’ (84%), ‘heat’ (83%), and ‘happy faces’ (82%). And so on.
Why didn’t L&E look at the posterior probabilities, which would have been a more appropriate way to compare different terms? They justify the decision as follows:
Because Z-scores are less likely to be inflated from smaller sample sizes than the posterior probabilities, our statistical analyses were all carried out on the Z-scores associated with each posterior probability (21).
While it’s true that terms with fewer associated studies will have more variable (i.e., extreme) posterior probability estimates, this is an unavoidable problem that isn’t in any way remedied by focusing on z-scores instead of posterior probabilities. If some terms have too few studies in Neurosynth to support reliable comparisons with pain, the appropriate thing to do is to withhold judgment until more data is available. One cannot solve the problem of data insufficiency by pretending that p-values or z-scores are measures of effect size.
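A quick simulation illustrates both halves of this point: with few studies the posterior probability estimate is indeed much noisier, but that extra noise reflects a lack of data, not a defect that z-scores can repair. The activation rates used here are hypothetical:

```python
import random

random.seed(42)

# Simulate how variable the estimated P(term | activation) is when a term
# has few vs. many studies. The true activation rates are hypothetical.

def pp_variance(n_term, p_act_term=0.4, p_act_other=0.1, reps=2000):
    """Variance of the estimated posterior probability across simulated datasets."""
    estimates = []
    for _ in range(reps):
        # Number of simulated studies reporting activation at this voxel.
        k = sum(random.random() < p_act_term for _ in range(n_term))
        p1 = k / n_term  # noisy forward-inference estimate
        p_act = 0.5 * p1 + 0.5 * p_act_other
        estimates.append(0.5 * p1 / p_act if p_act > 0 else 0.0)
    mean = sum(estimates) / reps
    return sum((e - mean) ** 2 for e in estimates) / reps

# Fewer studies -> much more variable posterior probability estimates.
print(pp_variance(20) > pp_variance(500))  # True
```

The estimate based on 20 studies bounces around far more than the one based on 500; the only cure is more studies, not a different statistic computed from the same insufficient data.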
Meta-analytic contrasts in Neurosynth
It doesn’t have to be this way, mind you. If we want to directly compare effect sizes for different terms—which I think is what L&E want, even if they don’t actually do it—we can do that fairly easily using Neurosynth (though you have to use the Python core tools, rather than the website). The crux of the approach is that we need to directly compare the two conditions (or terms) using only those studies in the Neurosynth database that load on exactly one of the two target terms. This typically results in a rather underpowered test, because we end up working with only a few hundred studies, rather than the full database of 11,000+ studies. But such is our Rumsfeldian life—we do analysis with the data we have, not the data we wish we had.
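Stripped of the Neurosynth-specific machinery, the heart of such a contrast is a voxelwise comparison of activation rates between the two mutually exclusive study sets. Here is a toy single-voxel version with invented counts (the real analysis pulls the study lists from the Neurosynth dataset using the Python core tools):

```python
import math

def contrast_voxel(k_a, n_a, k_b, n_b):
    """Two-proportion z: P(activation | term A only) vs. P(activation | term B only)."""
    p = (k_a + k_b) / (n_a + n_b)  # pooled activation rate
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return ((k_a / n_a) - (k_b / n_b)) / se

# Hypothetical counts at one dACC voxel: 80 of 200 pain-only studies
# activate it, vs. 45 of 150 salience-only studies.
z = contrast_voxel(80, 200, 45, 150)
print(round(z, 2))  # 1.93
```

Even a 10-percentage-point difference in activation rates yields only a modest z with a few hundred studies per side, which is the low-power caveat in action.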
In any case, if we conduct direct meta-analytic contrasts of pain versus a bunch of other terms like salience, emotion, and cognitive control, we get results that look like this:
These maps are thresholded very liberally (p < .001, uncorrected), so we should be wary of reading too much into them. And, as noted above, power for meta-analytic contrasts in Neurosynth is typically quite low. Still, it’s pretty clear that the results don’t support L&E’s conclusion. While pain does indeed activate the dACC with significantly higher probability than some other topics (e.g., emotion or touch), it doesn’t differentiate pain from a number of other viable candidates (e.g., salience, fear, and autonomic control). Moreover, there are other contrasts not involving pain that also elicit significant differences—e.g., between autonomic control and emotion, or fear and cognitive control.
Given that this is the correct way to test for activation differences between different Neurosynth maps, if we were to take seriously the idea that more frequent dACC activation in pain studies than in other kinds of studies implies pain selectivity, the above results would seem to indicate that dACC isn’t selective to pain (or at least, that there’s no real evidence for that claim). Perhaps we could reasonably say that dACC cares more about pain than, say, emotion (though, as discussed below, even that’s not a given); but that’s hardly the same thing as saying that “the best psychological description of dACC function is related to pain processing”.
A > B does not imply ~B
Of course, we wouldn’t want to buy L&E’s claim that the dACC is selective for pain even if the dACC did show significantly more frequent activation for pain than for all other terms, because showing that dACC activation is greater for task A than task B (or even tasks B through Z) doesn’t entail that the dACC is not also important for task B. By analogy, demonstrating that people on average prefer the color blue to the color green doesn’t entitle us to conclude that nobody likes green.
In fairness, L&E do say that the other candidate terms they examined don’t show any associations with the dACC in the Neurosynth reverse inference maps. For instance, they show us this figure:
A cursory inspection indeed reveals very little going on for terms other than pain. But this is pretty woeful evidence for the claim of no effect, as it’s based on low-resolution visual inspection of just one mid-sagittal brain slice for just a handful of terms. The only quantitative support L&E marshal for their “nothing else activates dACC” claim is an inspection of activation at 8 individual voxels within dACC, which they report largely fail to activate for anything other than pain. The latter is not a very comprehensive analysis, and makes one wonder why L&E didn’t do something a little more systematic given the strength of their claim (e.g., they could have averaged over all dACC voxels and tested whether activation occurs more frequently than chance for each term).
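The more systematic test suggested above is straightforward once you have the counts: for each term, tally how many studies report activation anywhere in dACC and compare against a chance rate with a binomial test. A toy version, with all numbers hypothetical:

```python
import math

def binom_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

# Suppose 40 of 150 'conflict' studies report activation somewhere in dACC,
# against a 20% chance rate estimated from the rest of the database.
p_value = binom_tail(40, 150, 0.20)
print(p_value < 0.05)  # True
```

On these made-up counts, ‘conflict’ studies activate dACC at above-chance rates, which is the kind of whole-region evidence that an 8-voxel spot check can easily miss.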
As it turns out, when we look at the entire dACC rather than just 8 voxels, there’s plenty of evidence that the dACC does in fact care about things other than pain. You can easily see this on neurosynth.org just by browsing around for a few minutes, but to spare you the trouble, here are reverse inference maps for a bunch of terms that L&E either didn’t analyze at all, or looked at in only the 8 selected voxels (the pain map is displayed in the first row for reference):
In every single one of these cases, we see significant associations with dACC activation in the reverse inference meta-analysis. The precise location of activation varies from case to case (which might lead us to question whether it makes sense to talk about dACC as a monolithic system with a unitary function), but the point is that pain is clearly not the only process that activates dACC. So the notion that dACC is selective to pain doesn’t survive scrutiny even if you use L&E’s own criteria.
The limits of Neurosynth
All of the above problems are, in my view, already sufficient to lay the argument that dACC is pain selective to rest. But there’s another still more general problem with the L&E analysis that would, in my view, be sufficient to warrant extreme skepticism about their conclusion even if you knew nothing at all about the details of the analysis. Namely, in arguing for pain selectivity, L&E ignore many of the known limitations of Neurosynth. There are a number of reasons to think that—at least in its present state—Neurosynth simply can’t support the kind of inference that L&E are trying to draw. While L&E do acknowledge some of these limitations in their Discussion section, in my view, they don’t take them nearly as seriously as they ought to.
First, it’s important to remember that Neurosynth can’t directly tell us whether activation is specific to pain (or any other process), because terms in Neurosynth are just that—terms. They’re not carefully assigned task labels, let alone actual mental states. The strict interpretation of a posterior probability of 80% for pain in a dACC voxel is that, if we were to take 11,000 published fMRI studies and pretend that exactly 50% of them included the term ‘pain’ in their abstracts, the presence of activation in the voxel in question should increase our estimate of the likelihood of the term ‘pain’ occurring from 50% to 80%. If this seems rather weak, that’s because it is. It’s something of a leap to go from words in abstracts to processes in people’s heads.
Now, in most cases, I think it’s a perfectly defensible leap. I don’t begrudge anyone for treating Neurosynth terms as if they were decent proxies for mental states or cognitive tasks. I do it myself all the time, and I don’t feel apologetic about it. But that’s because it’s one thing to use Neurosynth to support a loose claim like “some parts of the dACC are preferentially associated with pain”, and quite another to claim that the dACC is selective for pain, that virtually nothing else activates dACC, and that “pain represents the best psychological characterization of dACC function”. The latter is an extremely strong claim that requires one to demonstrate not only that there’s a robust association between dACC and pain (which Neurosynth supports), but also that (i) the association is meaningfully stronger than every other potential candidate, and (ii) no other process activates dACC in a meaningful way independently of its association with pain. L&E have done neither of these things, and frankly, I can’t imagine how they could do such a thing—at least, not with Neurosynth.
Second, there’s the issue of bias. Terms in Neurosynth are only good proxies for mental processes to the extent that they’re accurately represented in the literature. One important source of bias many people often point to (including L&E) is that if the results researchers report are colored by their expectations—which they almost certainly are—then Neurosynth is likely to reflect that bias. So, for example, if people think dACC supports pain, and disproportionately report activation in dACC in their papers (relative to other regions), the Neurosynth estimate of the pain-dACC association is likely to be biased upwards. I think this is a legitimate concern, though (for technical reasons I won’t get into here) I also think it’s overstated. But there’s a second source of bias that I think is likely to be much more problematic in this particular case, which is that Neurosynth estimates (and, for that matter, estimates from every other large-scale meta-analysis, irrespective of database or method) are invariably biased to some degree by differences in the strength of different experimental manipulations.
To see what I mean, consider that pain is quite easy to robustly elicit in the scanner in comparison with many other processes or states. Basically, you attach some pain-inducing device to someone’s body and turn it on. If the device is calibrated properly and the subject has normal pain perception, you’re pretty much guaranteed to produce the experience of pain. In general, that effect is likely to be large, because it’s easy to induce fairly intense pain in the scanner.
Contrast that with, say, emotion tasks. It’s an open secret in much of emotion research that what passes for an “emotional” stimulus is usually pretty benign by the standards of day-to-day emotional episodes. A huge proportion of studies use affective pictures to induce emotions like fear or disgust, and while there’s no doubt that such images successfully induce some change in emotional state, there are very few subjects who report large changes in experienced emotion (if you doubt this, try replacing the “extremely disgusted” upper anchor of your rating scale with “as disgusted as I would feel if someone threw up next to me” in your next study). One underappreciated implication of this is that if we decide to meta-analytically compare brain activation during emotion with brain activation during pain, our results are necessarily going to be biased by differences in the relative strengths of the two kinds of experimental manipulation—independently of any differences in the underlying neural substrates of pain and emotion. In other words, we may be comparing apples to oranges without realizing it. If we suppose, for the sake of hypothesis, that the dACC plays the same role in pain and emotion, and then compare strong manipulations of pain with weak manipulations of emotion, we would be confounding differences in experimental strength with differences in underlying psychology and biology. And we might well conclude that dACC is more important for pain than emotion—all because we have no good way of correcting for this rather mundane bias.
In point of fact, I think something like this is almost certainly true for the pain map in Neurosynth. One way to see this is to note that when we meta-analytically compare pain with almost any other term in Neurosynth (see the figure above), there are typically a lot of brain regions (extending well outside of dACC and other putative pain regions) that show greater activation for pain than for the comparison condition, and very few brain regions that show the converse pattern. I don’t think it’s plausible that much of the brain really prizes pain representation above all else. A more sensible interpretation is that the Neurosynth posterior probability estimates for pain are inflated to some degree by the relative ease of inducing pain experimentally. I’m not sure there’s any good way to correct for this, but given that small differences in posterior probabilities (e.g., going from 80% to 75%) would probably have large effects on the rank order of different terms, I think the onus is on L&E to demonstrate why this isn’t a serious concern for their analysis.
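To make the rank-order sensitivity concrete, here’s a toy Bayes-rule sketch in the spirit of Neurosynth’s reverse-inference maps. All of the activation rates below are invented for illustration, and the `posterior` function is a deliberately simplified stand-in, not the actual Neurosynth pipeline:

```python
# Toy Bayes-rule sketch of Neurosynth-style reverse inference.
# All activation rates are invented for illustration only.

def posterior(p_act_given_term, p_act_given_not_term, p_term=0.5):
    """P(term | activation) at a voxel, assuming a uniform prior
    (Neurosynth's reverse-inference maps likewise assume P(term) = 0.5)."""
    p_act = (p_act_given_term * p_term
             + p_act_given_not_term * (1 - p_term))
    return p_act_given_term * p_term / p_act

# Hypothetical activation rates at a dACC voxel for two terms:
pain = posterior(0.80, 0.20)       # strong, easy-to-elicit manipulations
conflict = posterior(0.70, 0.20)   # weaker manipulations of a rival process

# If correcting for manipulation strength shaved the "pain" activation
# rate from 0.80 down to 0.65, the rank ordering of the terms would flip:
pain_corrected = posterior(0.65, 0.20)

print(pain, conflict, pain_corrected)
```

The point isn’t the specific numbers, which I made up; it’s that the posteriors sit close together, so even a modest bias-driven inflation of one term’s activation rate can determine which term “wins” the rank ordering.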
But it’s still good for plenty of other stuff!
Having spent a lot of time talking about Neurosynth’s limitations—and all the conclusions one can’t draw from reverse inference maps in Neurosynth—I want to make sure I don’t leave you with the wrong impression about where I see Neurosynth fitting into the cognitive neuroscience ecosystem. Despite its many weaknesses, I still feel quite strongly that Neurosynth is one of the most useful tools we have at the moment for quantifying the relative strengths of association between psychological processes and neurobiological substrates. There are all kinds of interesting uses for the data, website, and software that are completely unobjectionable. I’ve seen many published articles use Neurosynth in a variety of interesting ways, and a few studies have even used Neurosynth as their primary data source (and my colleagues and I have several more on the way). Russ Poldrack and I have a forthcoming paper in Annual Review of Psychology in which we review some of the ways databases like Neurosynth can play an invaluable role in the brain mapping enterprise. So clearly, I’m the last person who would tell anyone that Neurosynth isn’t useful for anything. It’s useful for a lot of things; but it probably shouldn’t be the primary source of evidence for very strong claims about brain-cognition or brain-behavior relationships.
What can we learn about the dACC using Neurosynth? A number of things. Here are some conclusions I think one can reasonably draw based solely on inspection of Neurosynth maps:
- There are parts of dACC (particularly the more posterior aspects) that are preferentially activated in studies involving painful stimulation.
- It’s likely that parts of dACC play a greater role in some aspect of pain processing than in many other candidate processes that at various times have been attributed to dACC (e.g., monitoring for cognitive conflict)—though we should be cautious, because in some cases some of those other functions are clearly represented in dACC, just in different sectors.
- Many of the same regions of dACC that preferentially activate during pain are also preferentially activated by other processes or tasks—e.g., fear conditioning, autonomic arousal, etc.
I think these are all interesting and potentially important observations. They’re hardly novel, of course, but it’s still nice to have convergent meta-analytic support for claims that have been made using other methods.
So what does the dACC do?
Having read this far, you might be thinking, well if dACC isn’t selective for pain, then what does it do? While I don’t pretend to have a good answer to this question, let me make three tentative observations about the potential role of dACC in cognition that may or may not be helpful.
First, there’s actually no particular reason why dACC has to play any unitary role in cognition. It may be a human conceit to think that just because we can draw some nice boundaries around a region and give it the name ‘dACC’, there must be some corresponding sensible psychological process that passably captures what all the neurons within that chunk of tissue are doing. But the dACC is a large brain region that contains hundreds of millions of neurons with enormously complex response profiles and connectivity patterns. There’s no reason why nature should respect our human desire for simple, interpretable models of brain function. To the contrary, our default assumption should probably be that there’s considerable functional heterogeneity within dACC, so that slapping a label like “pain” onto the entire dACC is almost certainly generating more heat than light.
Second, to the degree that we nevertheless insist on imposing a single unifying label on the entire dACC, it’s very unlikely that a generic characterization like “pain” is up to the job. While we can reasonably get away with loosely describing some (mostly sensory) parts of the brain as broadly supporting vision or motor function, the dACC—a frontal region located much higher in the processing hierarchy—is unlikely to submit to a similar analysis. It’s telling that most of the serious mechanistic accounts of dACC function have shied away from extensional definitions of regional function like “pain” or “emotion” and have instead focused on identifying broad computational roles that dACC might play. Thus, we have suggestions that dACC might be involved in response selection, conflict monitoring, or value learning. While these models are almost certainly wrong (or at the very least, grossly incomplete), they at least attempt to articulate some kind of computational role dACC circuits might be playing in cognition. Saying that the dACC is for “pain”, by contrast, tells us nothing about the nature of the representations in the region.
To their credit, L&E do address this issue to some extent. Specifically, they suggest that the dACC may be involved in monitoring for “survival-relevant goal conflicts”. Admittedly, it’s a bit odd that L&E make such a suggestion at all, seeing as it directly contradicts everything they argue for in the rest of the paper (i.e., if the dACC supports detection of the general class of things that are relevant for survival, then it is by definition not selective for pain, and vice versa). Contradictions aside, however, L&E’s suggestion is not completely implausible. As the Neurosynth maps above show, the dACC is clearly preferentially activated by fear conditioning, autonomic control, and reward—all of which could broadly be construed as “survival-relevant”. The main difficulty for L&E’s survival account comes from (a) the lack of evidence of dACC involvement in other clearly survival-relevant stimuli or processes—e.g., disgust, respiration, emotion, or social interaction, and (b) the availability of other much more plausible theories of dACC function (see the next point). Still, if we’re relying strictly on Neurosynth for evidence, we can give L&E the benefit of the doubt and reserve judgment on their survival-relevant account until more data becomes available. In the interim, what should not be controversial is that such an account has no business showing up in a paper titled “the dorsal anterior cingulate cortex is selective for pain”—a claim it is completely incompatible with.
Third, theories of dACC function based largely on fMRI evidence don’t (or shouldn’t) operate in a vacuum. Over the past few decades, literally thousands of animal and human studies have investigated the structure and function of the anterior cingulate cortex. Many of these studies have produced considerable insights into the role of the ACC (including dACC), and I think it’s safe to say that they collectively offer a much richer understanding than what fMRI studies—let alone a meta-analytic engine like Neurosynth—have produced to date. I’m especially partial to the work of Brent Vogt and colleagues (e.g., Vogt, 2005; Vogt & Sikes, 2009), who have suggested a division within the anterior mid-cingulate cortex (aMCC; a region roughly co-extensive with the dACC in L&E’s nomenclature) between a posterior region involved in bodily orienting, and an anterior region associated with fear and avoidance behavior (though the two functions overlap in space to a considerable degree). Schematically, their “four-region” architectural model looks like this:
While the aMCC is assumed to contain many pain-selective neurons (as do more anterior sectors of the cingulate), it’s demonstrably not pain-selective, as neurons throughout the aMCC also respond to other stimuli (e.g., non-painful touch, fear cues, etc.).
Aside from being based on an enormous amount of evidence from lesion, electrophysiology, and imaging studies, the Vogt characterization of dACC/aMCC has several other nice features. For one thing, it fits almost seamlessly with the Neurosynth results displayed above (e.g., we find MCC activation associated with pain, fear, autonomic, and sensorimotor processes, with pain and fear overlapping closely in aMCC). For another, it provides an elegant and parsimonious explanation for the broad extent of pain-related activation in anterior cingulate cortex even though no part of aMCC is selective for pain (i.e., unlike other non-physical stimuli, pain involves skeletomotor orienting, and unlike non-painful touch, it elicits avoidance behavior and subjective unpleasantness).
Perhaps most importantly, Vogt and colleagues freely acknowledge that their model—despite having a very rich neuroanatomical elaboration—is only an approximation. They don’t attempt to ascribe a unitary role to aMCC or dACC, and they explicitly recognize that there are distinct populations of neurons involved in reward processing, response selection, value learning, and other aspects of emotion and cognition all closely interdigitated with populations involved in aspects of pain, touch, and fear. Other systems-level neuroanatomical models of cingulate function share this respect for the complexity of the underlying circuitry—complexity that cannot be adequately approximated by labeling the dACC simply as a pain region (or, for that matter, a “survival-relevance” region).
Conclusion
Lieberman & Eisenberger (2015) argue, largely on the basis of evidence from my Neurosynth framework, that the dACC is selective for pain. They are wrong. Neurosynth does not—and, at present, cannot—support such a conclusion. Moreover, a more careful examination of Neurosynth results directly refutes Lieberman and Eisenberger’s claims, providing clear evidence that the dACC is associated with many other operations, and converging with extensive prior animal and human work to suggest a far more complex view of dACC function.
* This is probably the first time I’ve been able to call myself the world’s foremost expert on anything while keeping a straight face. It feels pretty good.
** L&E show meta-analysis maps for a few more terms in an online supplement, but barely discuss them, even though at least one term (fear) clearly activates very similar parts of dACC.
“Rumsfeldian life” … wonderful!
I write a blog post with the metaphors that come to mind right away, not the metaphors I wish came to mind right away.
A fantastic (and important) post, Tal. One may also wonder how many of the papers that found dACC activation and also contain the word “pain” based that relationship on reverse inference. This could potentially bias Neurosynth even further, right?
Yes, and I touch on that in the post. The reason I tend not to worry about that very much is that it’s a general issue that applies at least as much to other methods. I.e., if people think the amygdala does emotion, they’re going to look there first, and will very likely slant the paper to focus on the amygdala if anything looks promising. I would argue that Neurosynth actually reduces this bias to some extent relative to the broader literature, because the Neurosynth database only contains activations reported in tables, and ignores coordinates reported in the main text of articles. So in practice, even if people frame their article entirely around an ROI of a priori interest, Neurosynth will tend to mitigate that bias somewhat in virtue of weighting all rows in activation tables equally.
Hey Tal, that’s quite a demonstration! Also, I appreciate the selective/specific/preferential distinction, similar to the one I made for tasks (http://www.ncbi.nlm.nih.gov/pubmed/17336096); it’s worth thinking about how this applies to reverse inference.
Thanks for the link! Hadn’t seen this before, but it’s definitely relevant. I think your distinction between selective and preferential maps pretty closely onto my usage here. I’m not sure I see a need for a further specific vs. selective distinction, as I think your operationalization of specificity can probably be recast as selectivity for some non-superficial feature in most cases (i.e., if the idea is that A > B > C, then one could alternatively say that there’s some common process or processes that A recruits more than B, and B more than C, but the region is still either preferential or selective for those particular processes).
Hey Tal,
Nice post. Shouldn’t this go to PNAS as a letter? Readers who find the original paper through, e.g., PubMed, will have a link pointing to this article then.
Cheers,
Anderson
Thank you for putting into words what had been my emotional reaction to reading the paper. This post should be read very carefully by everybody (and especially by anyone who is still after selectivity rather than underlying computational mechanisms).
I echo the gratitude of the other posters for this beautiful (and civil, all things considered) response to that paper. This should definitely be published somewhere, in some form.
Hi Tal, thanks a lot for this very useful comment. You still have time to submit a letter to PNAS (within 3 months of original article I think). I’ve posted a link to your blog on PubPeer.
Hey Tal,
I noticed there were some rando structural MRI and functional connectivity analyses floating around in your Neurosynth database that don’t involve activation analyses at all… is there any way to remove them before I do a meta-analysis of “activation”? I doubt that this would greatly affect the results, but I just wanted to check.
Thanks!
John
Hi John,
There’s no easy way to do that, unfortunately. The database has become progressively cleaner over time. The latest release–which I just noticed isn’t on GitHub yet (a situation I’ll remedy momentarily)–eliminated a whole bunch of non-fMRI articles. But short of manual filtering (which isn’t anticipated any time soon), there will always be some noise. But you’re right that in practice it makes very little difference.
Thank you Tal, for a great critique that really got me thinking (I know I’m late to the party, but I only just read your article!). The ACC in general provides such a fascinating illustration of all the problems associated with reverse inference. And of using one-liners to describe the entire function of a particular structure (e.g., pain processing).
As you rightly say, even partitions like rACC/dACC etc. might be too broad. But an even bigger problem is that our conceptualisations of the ACC always turn out to be too narrow. Its role seems to be more than “conflict monitoring”, “reward-based behaviour”, or even “providing a continuously updated account of predicted demand on cognitive resources”. None of these explanations quite captures both its emotional and cognitive components, nor its involvement in pain processing.
I like Critchley et al.’s (2003) idea that the ACC, as a whole, might best be thought of as the interface between the central and autonomic nervous systems. It signals a need to modify bodily arousal states. This is a nice way to think of its role in pain – pain is obviously a hugely important signal indicating the need for some sort of autonomic response. At the same time, this arousal-modification model also captures the more cognitive functions of the ACC. Medford & Critchley (2010) is worth a look here.
Thanks again. Will now take a look at some of your other posts!
The reason why Dr. Matthew Lieberman and Dr. Naomi Eisenberger didn’t go into detail or talk about their research is because they were using unethical experimental procedures to come up with their research answers. They have conducted horrific experiments on unwilling human subjects to obtain information that they used to jump ahead of their competition, wrongfully obtain grant money, and cheat their way up the ladder of success. They deliberately ran experiments that caused their unwilling research subjects to experience extreme amounts of long-term physical, emotional and social pain and suffering. They destroyed the health and lives of these subjects and deliberately caused them to suffer extreme amounts of loss, abandonment, pain, suffering, etc. Their work needs to be investigated and they need to be held accountable for the harm and damage that they caused to these human subjects.