UPDATE: the webcast is now archived here for posterity.
This is kind of late notice and probably of interest to few people, but I’m giving the NIF webinar tomorrow (or today, depending on your time zone–either way, we’re talking about November 1st). I’ll be talking about Neurosynth, and focusing in particular on the methods and data, since that’s what NIF (which stands for Neuroscience Information Framework) is all about. Assuming all goes well, the webinar should start at 11 am PST. But since I haven’t done a webcast of any kind before, and have a surprising knack for breaking audiovisual equipment at a distance, all may not go well. Which I suppose could make for a more interesting presentation. In any case, here’s the abstract:
The explosive growth of the human neuroimaging literature has led to major advances in understanding of human brain function, but has also made aggregation and synthesis of neuroimaging findings increasingly difficult. In this webinar, I will describe a highly automated brain mapping framework called NeuroSynth that uses text mining, meta-analysis and machine learning techniques to generate a large database of mappings between neural and cognitive states. The NeuroSynth framework can be used to automatically conduct large-scale, high-quality neuroimaging meta-analyses, address long-standing inferential problems in the neuroimaging literature (e.g., how to infer cognitive states from distributed activity patterns), and support accurate ‘decoding’ of broad cognitive states from brain activity in both entire studies and individual human subjects. This webinar will focus on (a) the methods used to extract the data, (b) the structure of the resulting (publicly available) datasets, and (c) some major limitations of the current implementation. If time allows, I’ll also provide a walk-through of the associated web interface (http://neurosynth.org) and will provide concrete examples of some potential applications of the framework.
There’s some more info (including details about how to connect, which might be important) here. And now I’m off to prepare my slides. And script some evasive and totally non-committal answers to deploy in case of difficult questions from the peanut gallery respected audience.
When I review papers for journals, I often find myself facing something of a tension between two competing motives. On the one hand, I’d like to evaluate each manuscript as an independent contribution to the scientific literature–i.e., without having to worry about how the manuscript stacks up against other potential manuscripts I could be reading. The rationale being that the plausibility of the findings reported in a manuscript shouldn’t really depend on what else is being published in the same journal, or in the field as a whole: if there are methodological problems that threaten the conclusions, they shouldn’t become magically more or less problematic just because some other manuscript has (or doesn’t have) gaping holes. Reviewing should simply be a matter of documenting one’s major concerns and suggestions and sending them back to the Editor for infallible judgment.
The trouble with this idea is that if you’re of a fairly critical bent, you probably don’t believe the majority of the findings reported in the manuscripts sent to you to review. Empirically, this actually appears to be the right attitude to hold, because as a good deal of careful work by biostatisticians like John Ioannidis shows, most published research findings are false, and most true associations are inflated. So, in some ideal world, where the job of a reviewer is simply to assess the likelihood that the findings reported in a paper provide an accurate representation of reality, and/or to identify ways of bringing those findings closer in line with reality, skepticism is the appropriate default attitude. Meaning, if you keep the question “why don’t I believe these results?” firmly in mind as you read through a paper and write your review, you probably aren’t going to go wrong all that often.
The problem is that, for better or worse, one’s job as a reviewer isn’t really–or at least, solely–to evaluate the plausibility of other people’s findings. In large part, it’s to evaluate the plausibility of reported findings in relation to the other stuff that routinely gets published in the same journal. For instance, if you regularly review papers for a very low-tier journal, the editor is probably not going to be very thrilled to hear you say “well, Ms. Editor, none of the last 15 papers you’ve sent me are very good, so you should probably just shut down the journal.” So a tension arises between writing a comprehensive review that accurately captures what the reviewer really thinks about the results–which is often (at least in my case) something along the lines of “pffft, there’s no fucking way this is true”–and writing a review that weighs the merits of the reviewed manuscript relative to the other candidates for publication in the same journal.
To illustrate, suppose I review a paper and decide that, in my estimation, there’s only a 20% chance the key results reported in the paper would successfully replicate (for the sake of argument, we’ll pretend I’m capable of this level of precision). Should I recommend outright rejection? Maybe, since 1 in 5 odds of long-term replication don’t seem very good. But then again, what if 20% is actually better than average? What if I think the average article I’m sent to review only has a 10% chance of holding up over time? In that case, if I recommend rejection of the 20% article, and the editor follows my recommendation, most of the time I’ll actually be contributing to the journal publishing poorer quality articles than if I’d recommended accepting the manuscript, even if I’m pretty sure the findings reported in the manuscript are false.
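The arithmetic behind this trade-off is trivial but worth making explicit. Here's a minimal sketch, using the made-up probabilities from the thought experiment above:

```python
# Hypothetical numbers from the thought experiment above (not real data).
p_this_paper = 0.20    # estimated chance this manuscript's findings replicate
p_pool_average = 0.10  # average replication chance of competing submissions

# If the paper is rejected, its slot is (on average) filled by a
# pool-average submission, so the journal's expected quality for that
# slot falls back to the pool average.
expected_quality_if_accepted = p_this_paper
expected_quality_if_rejected = p_pool_average

# Rejecting the "20% paper" lowers the journal's expected quality,
# even though the reviewer thinks the paper is probably wrong.
print(expected_quality_if_accepted > expected_quality_if_rejected)  # True
```

The point isn't the specific numbers, which are invented; it's that a rejection recommendation is implicitly a comparison against the pool, whether the reviewer frames it that way or not.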
Lest this sound like I’m needlessly overanalyzing the review process instead of buckling down and writing my own overdue reviews (okay, you’re right, now stop being a jerk), consider what happens when you scale the problem up. When journal editors send reviewers manuscripts to look over, the question they really want an answer to is, “how good is this paper compared to everything else that crosses my desk?” But most reviewers naturally incline to answer a somewhat different–and easier–question, namely, “in the grand scheme of life, the universe, and everything, how good is this paper?” The problem, then, is that if the variance in curmudgeonliness between reviewers exceeds the (reliable) variance within reviewers, then arguably the biggest factor in determining whether or not a given paper gets rejected is simply who happens to review it. Not how much expertise the reviewer has, or even how ‘good’ they are (in the sense that some reviewers are presumably better than others at identifying serious problems and overlooking trivial ones), but simply how critical they are on average. Which is to say, if I’m Reviewer 2 on your manuscript, you’ll probably have a better chance of rejection than if Reviewer 2 is someone who characteristically writes one-paragraph reviews that begin with the words “this is an outstanding and important piece of work…”
Anyway, on some level this is a pretty trivial observation; after all, we all know that the outcome of the peer review process is, to a large extent, tantamount to a roll of the dice. We know that there are cranky reviewers and friendly reviewers, and we often even have a sense of who they are, which is why we often suggest people to include or exclude as reviewers in our cover letters. The practical question though–and the reason for bringing this up here–is this: given that we have this obvious and ubiquitous problem of reviewers having different standards for what’s publishable, and that this undeniably impacts the outcome of peer review, are there any simple steps we could take to improve the reliability of the review process?
The way I’ve personally made peace between my desire to provide the most comprehensive and accurate review I can and the pragmatic need to evaluate each manuscript in relation to other manuscripts is to use the “comments to the Editor” box to provide some additional comments about my review. Usually what I end up doing is writing my review with little or no thought for practical considerations such as “how prestigious is this journal” or “am I a particularly harsh reviewer” or “is this a better or worse paper than most others in this journal”. Instead, I just write my review, and then when I’m done, I use the comments to the editor to say things like “I’m usually a pretty critical reviewer, so don’t take the length of my review as an indication I don’t like the manuscript, because I do,” or, “this may seem like a negative review, but it’s actually more positive than most of my reviews, because I’m a huge jerk.” That way I can appease my conscience by writing the review I want to while still giving the editor some indication as to where I fit in the distribution of reviewers they’re likely to encounter.
I don’t know if this approach makes any difference at all, and maybe editors just routinely ignore this kind of thing; it’s just the best solution I’ve come up with that I can implement all by myself, without asking anyone else to change their behavior. But if we allow ourselves to contemplate alternative approaches that include changes to the review process itself (while still adhering to the standard pre-publication review model, which, like many other people, I’ve argued is fundamentally dysfunctional), then there are many other possibilities.
One idea, for instance, would be to include calibration questions that could be used to estimate (and correct for) individual differences in curmudgeonliness. For instance, in addition to questions about the merit of the manuscript itself, the review form could have a question like “what proportion of articles you review do you estimate end up being rejected?” or “do you consider yourself a more critical or less critical reviewer than most of your peers?”
Another, logistically more difficult, idea would be to develop a centralized database of review outcomes, so that editors could see what proportion of each reviewer’s assignments ultimately end up being rejected (though they couldn’t see the actual content of the reviews). I don’t know if this type of approach would improve matters at all; it’s quite possible that the review process is fundamentally so inefficient and slow that editors just don’t have the time to spend worrying about this kind of thing. But it’s hard to believe that there aren’t some simple calibration steps we could take to bring reviewers into closer alignment with one another–even if we’re confined to working within the standard pre-publication model of peer review. And given the abysmally low reliability of peer review, even small improvements could potentially produce large benefits in the aggregate.
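As a purely hypothetical sketch of what such calibration could look like in practice: if editors had access to each reviewer's rating history, a new rating could be re-expressed relative to that reviewer's own baseline, e.g. in standard-deviation units. The rating scale and all numbers below are invented for illustration.

```python
from statistics import mean, stdev

# Invented rating histories on a hypothetical 1-10 recommendation scale.
history = {
    "harsh_reviewer":   [2, 3, 3, 4, 2],
    "lenient_reviewer": [8, 9, 7, 8, 9],
}

def calibrated_rating(reviewer, rating):
    """Express a rating in standard deviations above the reviewer's own mean."""
    past = history[reviewer]
    return (rating - mean(past)) / stdev(past)

# The same raw "6" is a rave from the harsh reviewer and a pan from the
# lenient one, once each is scored against their own baseline.
print(round(calibrated_rating("harsh_reviewer", 6), 2))    # positive: well above baseline
print(round(calibrated_rating("lenient_reviewer", 6), 2))  # negative: below baseline
```

A real implementation would of course need to handle reviewers with short histories and account for the possibility that harsh reviewers are systematically assigned worse papers; the sketch only illustrates the basic normalization idea.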
Several people left enlightening comments on my last post about the ADHD-200 Global Competition results, so I thought I’d bump some of them up and save you the trip back there (though the others are worth reading too!), since they’re salient to some of the issues raised in the last post.
Matthew Brown, the project manager on the Alberta team that was disqualified on a minor technicality (they cough didn’t use any of the imaging data), pointed out that they actually did initially use the imaging data (answering Sanjay’s question in another comment)–it just didn’t work very well:
For the record, we tried a pile of imaging-based approaches. As a control, we also did classification with age, gender, etc. but no imaging data. It was actually very frustrating for us that none of our imaging-based methods did better than the no imaging results. It does raise some very interesting issues.
He also pointed out that the (relatively) poor performance of the imaging-based classifiers isn’t cause for alarm:
I second your statement that we’ve only scratched the surface with fMRI-based diagnosis (and prognosis!). There’s a lot of unexplored potential here. For example, the ADHD-200 fMRI scans are resting state scans. I suspect that fMRI using an attention task could work better for diagnosing ADHD. Resting state fMRI has also shown promise for diagnosing other conditions (eg: schizophrenia – see Shen et al.).
I think big, multi-centre datasets like the ADHD-200 are the future though the organizational and political issues with this approach are non-trivial. I’m extremely impressed with the ADHD-200 organizers and collaborators for having the guts and perseverance to put this data out there. I intend to follow their example with the MR data that we collect in future.
I couldn’t agree more!
Jesse Brown explains why his team didn’t use the demographics alone:
I was on the UCLA/Yale team, we came in 4th place with 54.87% accuracy. We did include all the demographic measures in our classifier along with a boatload of neuroimaging measures. The breakdown of demographics by site in the training data did show some pretty strong differences in ADHD vs. TD. These differences were somewhat site dependent, eg girls from OHSU with high IQ are very likely TD. We even considered using only demographics at some point (or at least one of our wise team members did) but I thought that was preposterous. I think we ultimately figured that the imaging data may generalize better to unseen examples, particularly for sites that only had data in the test dataset (Brown). I guess one lesson is to listen to the data and not to fall in love with your method. Not yet anyway.
I imagine some of the other groups probably had a similar experience of trying to use the demographic measures alone and realizing they did better than the imaging data, but sticking with the latter anyway. Seems like a reasonable decision, though ultimately, I still think it’s a good thing the Alberta team used only the demographic variables, since their results provided an excellent benchmark against which to compare the performance of the imaging-based models. Sanjay Srivastava captured this sentiment nicely:
Two words: incremental validity. This kind of contest is valuable, but I’d like to see imaging pitted against behavioral data routinely. The fact that they couldn’t beat a prediction model built on such basic information should be humbling to advocates of neurodiagnosis (and shows what a low bar “better than chance” is). The real question is how an imaging-based diagnosis compares to a clinician using standard diagnostic procedures. Both “which is better” (which accounts for more variance as a standalone prediction) and “do they contain non-overlapping information” (if you put both predictions into a regression, does one or both contribute unique variance).
And Russ Poldrack raised a related question about what it is that the imaging-based models are actually doing:
What is amazing is that the Alberta team only used age, sex, handedness, and IQ. That suggests to me that any successful imaging-based decoding could have been relying upon correlates of those variables rather than truly decoding a correlate of the disease.
This seems quite plausible inasmuch as age, sex, and IQ are pretty powerful variables, and there are enormous literatures on their structural and functional correlates. While there probably is at least some incremental information added by the imaging data (and very possibly a lot of it), it’s currently unclear just how much–and whether that incremental variance might also be picked up by (different) behavioral variables. Ultimately, time (and more competitions like this one!) will tell.
UPDATE 10/13: a number of commenters left interesting comments below addressing some of the issues raised in this post. I expand on some of them here.
The ADHD-200 Global Competition, announced earlier this year, was designed to encourage researchers to develop better tools for diagnosing mental health disorders on the basis of neuroimaging data:
The competition invited participants to develop diagnostic classification tools for ADHD diagnosis based on functional and structural magnetic resonance imaging (MRI) of the brain. Applying their tools, participants provided diagnostic labels for previously unlabeled datasets. The competition assessed diagnostic accuracy of each submission and invited research papers describing novel, neuroscientific ideas related to ADHD diagnosis. Twenty-one international teams, from a mix of disciplines, including statistics, mathematics, and computer science, submitted diagnostic labels, with some trying their hand at imaging analysis and psychiatric diagnosis for the first time.
Data for the contest came from several research labs around the world, who donated brain scans from participants with ADHD (both inattentive and hyperactive subtypes) as well as healthy controls. The data were made openly available through the International Neuroimaging Data-sharing Initiative, and nicely illustrate the growing movement towards openly sharing large neuroimaging datasets and promoting their use in applied settings. It is, in virtually every respect, a commendable project.
Well, the results of the contest are now in–and they’re quite interesting. The winning team, from Johns Hopkins, came up with a method that performed substantially above chance and showed particularly high specificity (i.e., it made few false diagnoses, though it missed a lot of true ADHD cases). And all but one team performed above chance, demonstrating that the imaging data has at least some utility (though currently not a huge amount) in diagnosing ADHD and ADHD subtype. There are some other interesting results on the page worth checking out.
But here’s hands-down the most entertaining part of the results, culled from the “Interesting Observations” section:
The team from the University of Alberta did not use imaging data for their prediction model. This was not consistent with intent of the competition. Instead they used only age, sex, handedness, and IQ. However, in doing so they obtained the most points, outscoring the team from Johns Hopkins University by 5 points, as well as obtaining the highest prediction accuracy (62.52%).
…or to put it differently, if you want to predict ADHD status using the ADHD-200 data, your best bet is to not really use the ADHD-200 data! At least, not the brain part of it.
I say this with tongue embedded firmly in cheek, of course; the fact that the Alberta team didn’t use the imaging data doesn’t mean imaging data won’t ultimately be useful for diagnosing mental health disorders. It remains quite plausible that ten or twenty years from now, structural or functional MRI scans (or some successor technology) will be the primary modality used to make such diagnoses. And the way we get from here to there is precisely by releasing these kinds of datasets and promoting this type of competition. So on the whole, I think this should actually be seen as a success story for the field of human neuroimaging–especially since virtually all of the teams performed above chance using the imaging data.
That said, there’s no question this result also serves as an important and timely reminder that we’re still in the very early days of brain-based prediction. Right now anyone who claims they can predict complex real-world behaviors better using brain imaging data than using (much cheaper) behavioral data has a lot of ‘splainin to do. And there’s a good chance that they’re trying to sell you something (like, cough, neuromarketing ‘technology’).
Over the last few days the commotion over Martin Lindstrom’s terrible New York Times iPhone-loving Op-Ed, which I wrote about in my last post, seems to have spread far and wide. Highlights include excellent posts by David Dobbs and the Neurocritic, but really there are too many to list at this point. And the verdict is overwhelmingly negative; I don’t think I’ve seen a single post in defense of Lindstrom, which is probably not a good sign (for him).
Anyway, the fact that the Times published the rebuttal letter is all well and good, but as I mentioned in my last post, the bigger problem is that since the Times doesn’t include links to related content on their articles, people who stumble across the Op-Ed aren’t going to have any way of knowing that it’s been roundly discredited by pretty much the entire web. Lindstrom’s piece was the most emailed article on the Times website for a day or two, but only a tiny fraction of those readers will ever see (or even hear about) the critical response. As far as I know, the NYT hasn’t issued an explanation or apology for publishing the Op-Ed; they’ve simply published the letter and gone on about their business (I guess I can’t fault them for this–if they had to issue a formal apology for every mistake that gets published, they’d have no time for anything else; the trick is really to catch this type of screw-up at the front end). Adding links from each article to related content wouldn’t solve the problem entirely, of course, but it would be something. The fact that the Times’ platform currently doesn’t have this capacity is kind of perplexing.
The other point worth mentioning is that, in the aftermath of the tsunami of criticism he received, Lindstrom left a comment on several blogs (Russ Poldrack and David Dobbs were lucky recipients; sadly, I wasn’t on the guest list). Here’s the full text of the comment:
My first foray into neuro-marketing research was for my New York Times bestseller Buyology: Truth and Lies about Why We Buy. For that book I teamed up with Neurosense, a leading independent neuro-marketing company that specializes in consumer research using functional magnetic resonance imaging (fMRI) headed by Oxford University trained Gemma Calvert, BSc DPhil CPsychol FRSA and Neuro-Insight, a market research company that uses unique brain-imaging technology, called Steady-State Topography (SST), to measure how the brain responds to communications which is lead by Dr. Richard Silberstein, PhD. This was the single largest neuro-marketing study ever conducted—25x larger than any such study to date and cost more than seven million dollars to run.
In the three-year effort scientists scanned the brains of over 2,000 people from all over the world as they were exposed to various marketing and advertising strategies including clever product placements, sneaky subliminal messages, iconic brand logos, shocking health and safety warnings, and provocative product packages. The purpose of all of this was to understand, quite successfully I may add, the key drivers behind why we make the purchasing decisions that we do.
For the research that my recent Op-Ed column in the New York Times was based on I turned to Dr. David Hubbard, a board-certified neurologist and his company MindSign Neuro Marketing, an independently owned fMRI neuro-marketing company. I asked Dr. Hubbard and his team a simple question, “Are we addicted to our iPhones?” After analyzing the brains of 8 men and 8 women between the ages of 18-25 using fMRI technology, MindSign answered my question using standardized answering methods and completely reproducible results. The conclusion was that we are not addicted to our iPhones, we are in love with them.
The thought provoking dialogue that has been generated from the article has been overwhelmingly positive and I look forward to the continued comments from professionals in the field, readers and fans.
As evasive responses go, this is a masterpiece; at no point does Lindstrom ever actually address any of the substantive criticisms leveled at him. He spends most of his response name dropping (the list of credentials is almost long enough to make you forget that the rebuttal letter to his Op-Ed was signed by over 40 PhDs) and rambling about previous unrelated neuromarketing work (which may as well not exist, since none of it has ever been made public), and then closes by shifting the responsibility for the study to MindSign, the company he paid to run the iPhone study. The claim that MindSign “answered [his] question using standardized answering methods and completely reproducible results” is particularly ludicrous; as I explained in my last post, there currently aren’t any standardized methods for reading addiction or love off of brain images. And ‘completely reproducible results’ implies that one has, you know, successfully reproduced the results, which is simply false unless Lindstrom is suggesting that MindSign did the same experiment twice. It’s hard to see any “thought provoking dialogue” taking place here, and the neuroimaging community’s response to the Op-Ed column has been, virtually without exception, overwhelmingly negative, not positive (as Lindstrom claims).
That all said, I do think there’s one very positive aspect to this entire saga, and that’s the amazing speed and effectiveness of the response from scientists, science journalists, and other scientifically literate folks. Ten years ago, Lindstrom’s piece might have gone completely unchallenged–and even if someone like Russ Poldrack had written a response, it would probably have appeared much later, been signed by fewer scientists (because coordination would have been much more difficult), and received much less attention. But within 48 hours of Lindstrom’s Op-Ed being published, dozens of critical blog posts had appeared, and hundreds, if not thousands, of people all over the world had tweeted or posted links to these critiques (my last post alone received over 12,000 hits). Scientific discourse, which used to be confined largely to peer-reviewed print journals and annual conferences, now takes place at a remarkable pace online, and it’s fantastic to see social media used in this way. The hope is that as these technologies develop further and scientists take on a more active role in communicating with the public (something that platforms like Twitter and Google+ seem to be facilitating amazingly well), it’ll become increasingly difficult for people like Lindstrom to make crazy pseudoscientific claims without being immediately and visibly called out–even in those rare cases when the NYT makes the mistake of leaving one of the biggest microphones on earth open and unmonitored.
The New York Times has a terrible, terrible Op-Ed piece today by Martin Lindstrom (who I’m not going to link to, because I don’t want to throw any more bones his way). If you believe Lindstrom, you don’t just like your iPhone a lot; you love it. Literally. And the reason you love it, shockingly, is your brain:
Earlier this year, I carried out an fMRI experiment to find out whether iPhones were really, truly addictive, no less so than alcohol, cocaine, shopping or video games. In conjunction with the San Diego-based firm MindSign Neuromarketing, I enlisted eight men and eight women between the ages of 18 and 25. Our 16 subjects were exposed separately to audio and to video of a ringing and vibrating iPhone.
But most striking of all was the flurry of activation in the insular cortex of the brain, which is associated with feelings of love and compassion. The subjects’ brains responded to the sound of their phones as they would respond to the presence or proximity of a girlfriend, boyfriend or family member.
In short, the subjects didn’t demonstrate the classic brain-based signs of addiction. Instead, they loved their iPhones.
There’s so much wrong with just these three short paragraphs (to say nothing of the rest of the article, which features plenty of other whoppers) that it’s hard to know where to begin. But let’s try. Take first the central premise–that an fMRI experiment could help determine whether iPhones are no less addictive than alcohol or cocaine. The tacit assumption here is that all the behavioral evidence you could muster–say, from people’s reports about how they use their iPhones, or clinicians’ observations about how iPhones affect their users–isn’t sufficient to make that determination; to “really, truly” know if something’s addictive, you need to look at what the brain is doing when people think about their iPhones. This idea is absurd inasmuch as addiction is defined on the basis of its behavioral consequences, not (right now, anyway) by the presence or absence of some biomarker. What makes someone an alcoholic is the fact that they’re dependent on alcohol, have trouble going without it, find that their alcohol use interferes with multiple aspects of their day-to-day life, and generally suffer functional impairment because of it–not the fact that their brain lights up when they look at pictures of Johnny Walker red. If someone couldn’t stop drinking–to the point where they lost their job, family, and friends–but their brain failed to display a putative biomarker for addiction, it would be strange indeed to say “well, you show all the signs, but I guess you’re not really addicted to alcohol after all.”
Now, there may come a day (and it will be a great one) when we have biomarkers sufficiently accurate that they can stand in for the much more tedious process of diagnosing someone’s addiction the conventional way. But that day is, to put it gently, a long way off. Right now, if you want to know if iPhones are addictive, the best way to do that is to, well, spend some time observing and interviewing iPhone users (and some quantitative analysis would be helpful).
Of course, it’s not clear what Lindstrom thinks an appropriate biomarker for addiction would be in any case. Presumably it would have something to do with the reward system; but what? Suppose Lindstrom had seen robust activation in the ventral striatum–a critical component of the brain’s reward system–when participants gazed upon the iPhone: what then? Would this have implied people are addicted to iPhones? But people also show striatal activity when gazing on food, money, beautiful faces, and any number of other stimuli. Does that mean the average person is addicted to all of the above? A marker of pleasure or reward, maybe (though even that’s not certain), but addiction? How could a single fMRI experiment with 16 subjects viewing pictures of iPhones confirm or disconfirm the presence of addiction? Lindstrom doesn’t say. I suppose he has good reason not to say: if he really did have access to an accurate fMRI-based biomarker for addiction, he’d be in a position to make millions (billions?) off the technology. To date, no one else has come close to identifying a clinically accurate fMRI biomarker for any kind of addiction (for more technical readers, I’m talking here about cross-validated methods that have both sensitivity and specificity comparable to traditional approaches when applied to new subjects–not individual studies that claim 90% within-sample classification accuracy based on simple regression models). So we should, to put it mildly, be very skeptical that Lindstrom’s study was ever in a position to do what he says it was designed to do.
We should also ask all sorts of salient and important questions about who the people are who are supposedly in love with their iPhones. Who’s the “You” in the “You Love Your iPhone” of the title? We don’t know, because we don’t know who the participants in Lindstrom’s sample were, aside from the fact that they were eight men and eight women aged 18 to 25. But we’d like to know some other important things. For instance, were they selected for specific characteristics? Were they, say, already avid iPhone users? Did they report loving, or being addicted to their iPhones? If so, would it surprise us that people chosen for their close attachment to their iPhones also showed brain activity patterns typical of close attachment? (Which, incidentally, they actually don’t–but more on that below.) And if not, are we to believe that the average person pulled off the street–who probably has limited experience with iPhones–really responds to the sound of their phones “as they would respond to the presence or proximity of a girlfriend, boyfriend or family member”? Is the takeaway message of Lindstrom’s Op-Ed that iPhones are actually people, as far as our brains are concerned?
In fairness, space in the Times is limited, so maybe it’s not fair to demand this level of detail in the Op-Ed itself. But the bigger problem is that we have no way of evaluating Lindstrom’s claims, period, because (as far as I can tell), his study hasn’t been published or peer-reviewed anywhere. Presumably, it’s proprietary information that belongs to the neuromarketing firm in question. Which is to say, the NYT is basically giving Lindstrom license to talk freely about scientific-sounding findings that can’t actually be independently confirmed, disputed, or critiqued by members of the scientific community with expertise in the very methods Lindstrom is applying (expertise which, one might add, he himself lacks). For all we know, he could have made everything up. To be clear, I don’t really think he did make everything up–but surely, somewhere in the editorial process someone at the NYT should have stepped in and said, “hey, these are pretty strong scientific claims; is there any way we can make your results–on which your whole article hangs–available for other experts to examine?”
This brings us to what might be the biggest whopper of all, and the real driver of the article title: the claim that “most striking of all was the flurry of activation in the insular cortex of the brain, which is associated with feelings of love and compassion”. Russ Poldrack already tore this statement to shreds earlier this morning:
Insular cortex may well be associated with feelings of love and compassion, but this hardly proves that we are in love with our iPhones. In Tal Yarkoni’s recent paper in Nature Methods, we found that the anterior insula was one of the most highly activated part of the brain, showing activation in nearly 1/3 of all imaging studies! Further, the well-known studies of love by Helen Fisher and colleagues don’t even show activation in the insula related to love, but instead in classic reward system areas. So far as I can tell, this particular reverse inference was simply fabricated from whole cloth. I would have hoped that the NY Times would have learned its lesson from the last episode.
But you don’t have to take Russ’s word for it; if you surf for a few terms on our Neurosynth website, making sure to select “forward inference” under image type, you’ll notice that the insula shows up for almost everything. That’s not an accident; it’s because the insula (or at least the anterior part of the insula) plays a very broad role in goal-directed cognition. It really is activated when you’re doing almost anything that involves, say, following instructions an experimenter gave you, or attending to external stimuli, or mulling over something salient in the environment. You can see this pretty clearly in this modified figure from our Nature Methods paper (I’ve circled the right insula):
The insula is one of a few ‘hotspots’ where activation is reported very frequently in neuroimaging articles (the other major one being the dorsal medial frontal cortex). So, by definition, there can’t be all that much specificity to what the insula is doing, since it pops up so often. To put it differently, as Russ and others have repeatedly pointed out, the fact that a given region activates when people are in a particular psychological state (e.g., love) doesn’t give you license to conclude that that state is present just because you see activity in the region in question. If language, working memory, physical pain, anger, visual perception, motor sequencing, and memory retrieval all activate the insula, then knowing that the insula is active is of very little diagnostic value. That’s not to say that some psychological states might not be more strongly associated with insula activity (again, you can see this on Neurosynth if you switch the image type to ‘reverse inference’ and browse around); it’s just that, probabilistically speaking, the mere fact that the insula is active gives you very little basis for saying anything concrete about what people are experiencing.
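The probabilistic point is easy to make concrete with Bayes’ rule. The numbers below are hypothetical (except the ~1/3 insula base rate Russ quotes from our Nature Methods paper), but they show why a high base rate of insula activation crushes the reverse inference even if love reliably activates the insula:

```python
# Toy reverse-inference calculation. All numbers are hypothetical except
# the ~1/3 base rate of insula activation across imaging studies.
p_insula_given_love = 0.80   # suppose love reliably activates the insula
p_love = 0.05                # suppose 5% of studies/states involve love
p_insula = 1 / 3             # insula is active in roughly a third of all studies

# Bayes' rule: P(love | insula) = P(insula | love) * P(love) / P(insula)
p_love_given_insula = p_insula_given_love * p_love / p_insula
print(f"P(love | insula active) = {p_love_given_insula:.2f}")  # = 0.12
```

Even under these generous assumptions, seeing insula activity raises the probability of “love” to only about 12%–hardly grounds for declaring that an entire readership is head over heels for a phone.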
In fact, to account for Lindstrom’s findings, you don’t have to appeal to love or addiction at all. There’s a much simpler way to explain why seeing or hearing an iPhone might elicit insula activation. For most people, the onset of visual or auditory stimulation is a salient event that causes redirection of attention to the stimulated channel. I’d be pretty surprised, actually, if you could present any picture or sound to participants in an fMRI scanner and not elicit robust insula activity. Orienting and sustaining attention to salient things seems to be a big part of what the anterior insula is doing (whether or not that’s ultimately its ‘core’ function). So the most appropriate conclusion to draw from the fact that viewing iPhone pictures produces increased insula activity is something vague like “people are paying more attention to iPhones”, or “iPhones are particularly salient and interesting objects to humans living in 2011.” Not something like “no, really, you love your iPhone!”
In sum, the NYT screwed up. Lindstrom appears to have a habit of making overblown claims about neuroimaging evidence, so it’s not surprising he would write this type of piece; but the NYT editorial staff is supposedly there to filter out precisely this kind of pseudoscientific advertorial. And they screwed up. It’s a particularly big screw-up given that (a) as of right now, Lindstrom’s Op-Ed is the single most emailed article on the NYT site, and (b) this incident almost perfectly recapitulates another NYT article 4 years ago in which some neuroscientists and neuromarketers wrote a grossly overblown Op-Ed claiming to be able to infer, in detail, people’s opinions about presidential candidates. That time, Russ Poldrack and a bunch of other big names in cognitive neuroscience wrote a concise rebuttal that appeared in the NYT (but unfortunately, isn’t linked to from the original Op-Ed, so anyone who stumbles across the original now has no way of knowing how ridiculous it is). One hopes the NYT follows up in similar fashion this time around. They certainly owe it to their readers–some of whom, if you believe Lindstrom, are now in danger of dumping their current partners for their iPhones.