Big Pitch or Big Lottery? The unenviable task of evaluating the grant review system

This week’s issue of Science has an interesting article on The Big Pitch–a pilot NSF initiative to determine whether anonymizing proposals and dramatically cutting down their length (from 15 pages to 2) has a substantial impact on the results of the review process. The answer appears to be an unequivocal yes. From the article:

What happens is a lot, according to the first two rounds of the Big Pitch. NSF’s grant reviewers who evaluated short, anonymized proposals picked a largely different set of projects to fund compared with those chosen by reviewers presented with standard, full-length versions of the same proposals.

Not surprisingly, the researchers who did well under the abbreviated format are pretty pleased:

Shirley Taylor, an awardee during the evolution round of the Big Pitch, says a comparison of the reviews she got on the two versions of her proposal convinced her that anonymity had worked in her favor. An associate professor of microbiology at Virginia Commonwealth University in Richmond, Taylor had failed twice to win funding from the National Institutes of Health to study the role of an enzyme in modifying mitochondrial DNA.

Both times, she says, reviewers questioned the validity of her preliminary results because she had few publications to her credit. Some reviews of her full proposal to NSF expressed the same concern. Without a biographical sketch, Taylor says, reviewers of the anonymous proposal could “focus on the novelty of the science, and this is what allowed my proposal to be funded.”

Broadly speaking, there are two ways to interpret the divergent results of the standard and abbreviated review. The charitable interpretation is that the change in format is, in fact, beneficial, inasmuch as it eliminates prior reputation as one source of bias and forces reviewers to focus on the big picture rather than on small methodological details. Of course, as Prof-Like Substance points out in an excellent post, one could mount a pretty reasonable argument that this isn’t necessarily a good thing. After all, a scientist’s past publication record is likely to be a good predictor of their future success, so it’s not clear that proposals should be anonymous when large amounts of money are on the line (and there are other ways to counteract the bias against newbies–e.g., NIH’s approach of explicitly giving New Investigators a payline boost until they get their first R01). And similarly, some scientists might be good at coming up with big ideas that sound plausible at first blush and not so good at actually carrying out the research program required to bring those big ideas to fruition. Still, at the very least, if we’re being charitable, The Big Pitch certainly does seem like a very different kind of approach to review.

The less charitable interpretation is that the reason the ratings of the standard and abbreviated proposals showed very little correlation is that the latter approach is just fundamentally unreliable. If you suppose that it’s just not possible to reliably distinguish a very good proposal from a somewhat good one on the basis of just 2 pages, it makes perfect sense that 2-page and 15-page proposal ratings don’t correlate much–since you’re basically selecting at random in the 2-page case. Understandably, researchers who happen to fare well under the 2-page format are unlikely to see it that way; they’ll probably come up with many plausible-sounding reasons why a shorter format just makes more sense (just like most researchers who tend to do well with the 15-page format probably think it’s the only sensible way for NSF to conduct its business). We humans are all very good at finding self-serving rationalizations for things, after all.

Personally I don’t have very strong feelings about the substantive merits of short versus long-format review–though I guess I do find it hard to believe that 2-page proposals could be ranked very reliably given that some very strange things seem to happen with alarming frequency even with 12- and 15-page proposals. But it’s an empirical question, and I’d love to see relevant data. In principle, the NSF could have obtained that data by having two parallel review panels rate all of the 2-page proposals (or even 4 panels, since one would also like to know how reliable the normal review process is). That would allow the agency to directly quantify the reliability of the ratings by looking at their cross-panel consistency. Absent that kind of data, it’s very hard to know whether the results Science reports on are different because 2-page review emphasizes different (but important) things, or because a rating process based on an extended 2-page abstract just amounts to a glorified lottery.
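For what it's worth, the reliability check I have in mind would be trivial to run once you had two panels' ratings of the same proposals. Here's a minimal sketch in Python; the ratings are simulated (since no such data exist yet), and the noise level is exactly the quantity you'd be trying to estimate with real data.

```python
# Minimal sketch of a cross-panel reliability check. The ratings here are simulated;
# with real data you'd plug in each panel's scores for the same set of proposals.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_proposals = 50
true_merit = rng.normal(size=n_proposals)       # latent quality of each proposal
noise_sd = 1.5                                  # reviewer noise; the unknown quantity of interest

# Two independent panels score the same 2-page proposals
panel_a = true_merit + rng.normal(scale=noise_sd, size=n_proposals)
panel_b = true_merit + rng.normal(scale=noise_sd, size=n_proposals)

r, p = stats.pearsonr(panel_a, panel_b)
print(f"Cross-panel correlation: r = {r:.2f}")

# A correlation near zero would mean the short-format ratings are mostly noise
# (i.e., a glorified lottery); a high correlation would mean they can be scored reliably.
```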

Alternatively, and perhaps more pragmatically, NSF could just wait a few years to see how the projects funded under the pilot program turn out (and I’m guessing this is part of their plan). I.e., do the researchers who do well under the 2-page format end up producing science as good as (or better than) the researchers who do well under the current system? This sounds like a reasonable approach in principle, but the major problem is that we’re only talking about a total of ~25 funded proposals (across two different review panels), so it’s unclear that there will be enough data to draw any firm conclusions. Certainly many scientists (including me) are likely to feel a bit uneasy at the thought that NSF might end up making major decisions about how to allocate billions of dollars on the basis of two dozen grants.

Anyway, skepticism aside, this isn’t really meant as a criticism of NSF so much as an acknowledgment of the fact that the problem in question is a really, really difficult one. The task of continually evaluating and improving the grant review process is not one anyone should want to take on lightly. If time and money were no object, every proposed change (like dramatically shortened proposals) would be extensively tested on a large scale and directly compared to the current approach before being implemented. Unfortunately, flying thousands of scientists to Washington D.C. is a very expensive business (to say nothing of all the surrounding costs), and I imagine that testing out a substantively different kind of review process on a large scale could easily run into the tens of millions of dollars. In a sense, the funding agencies can’t really win. On the one hand, if they only ever pilot new approaches on a small scale, they never get enough empirical data to confidently back major changes in policy. On the other hand, if they pilot new approaches on a large scale and those approaches end up failing to improve on the current system (as is the fate of most innovative new ideas), the funding agencies get hammered by politicians and scientists alike for wasting taxpayer money in an already-harsh funding climate.

I don’t know what the solution is (or if there is one), but if nothing else, I do think it’s a good thing that NSF and NIH continue to actively tinker with their various processes. After all, if there’s anything most researchers can agree on, it’s that the current system is very far from perfect.

aftermath of the NYT / Lindstrom debacle

Over the last few days the commotion over Martin Lindstrom’s terrible New York Times iPhone-loving Op-Ed, which I wrote about in my last post, seems to have spread far and wide. Highlights include excellent posts by David Dobbs and the Neurocritic, but really there are too many to list at this point. And the verdict is overwhelmingly negative; I don’t think I’ve seen a single post in defense of Lindstrom, which is probably not a good sign (for him).

In the meantime, Russ Poldrack and over 40 other neuroscientists and psychologists (including me) wrote a letter to the NYT complaining about the Lindstrom Op-Ed, which the NYT has now published. As per usual, they edited down the letter till it almost disappeared. But the original, along with a list of signees, is on Russ’s blog.

Anyway, the fact that the Times published the rebuttal letter is all well and good, but as I mentioned in my last post, the bigger problem is that since the Times doesn’t include links to related content on their articles, people who stumble across the Op-Ed aren’t going to have any way of knowing that it’s been roundly discredited by pretty much the entire web. Lindstrom’s piece was the most emailed article on the Times website for a day or two, but only a tiny fraction of those readers will ever see (or even hear about) the critical response. As far as I know, the NYT hasn’t issued an explanation or apology for publishing the Op-Ed; they’ve simply published the letter and gone about their business (I guess I can’t fault them for this–if they had to issue a formal apology for every mistake that gets published, they’d have no time for anything else; the trick is really to catch this type of screw-up at the front end). Adding links from each article to related content wouldn’t solve the problem entirely, of course, but it would be something. The fact that the Times’ platform currently doesn’t have this capacity is kind of perplexing.

The other point worth mentioning is that, in the aftermath of the tsunami of criticism he received, Lindstrom left a comment on several blogs (Russ Poldrack and David Dobbs were lucky recipients; sadly, I wasn’t on the guest list). Here’s the full text of the comment:

My first foray into neuro-marketing research was for my New York Times bestseller Buyology: Truth and Lies about Why We Buy. For that book I teamed up with Neurosense, a leading independent neuro-marketing company that specializes in consumer research using functional magnetic resonance imaging (fMRI) headed by Oxford University trained Gemma Calvert, BSc DPhil CPsychol FRSA and Neuro-Insight, a market research company that uses unique brain-imaging technology, called Steady-State Topography (SST), to measure how the brain responds to communications which is lead by Dr. Richard Silberstein, PhD. This was the single largest neuro-marketing study ever conducted—25x larger than any such study to date and cost more than seven million dollars to run.

In the three-year effort scientists scanned the brains of over 2,000 people from all over the world as they were exposed to various marketing and advertising strategies including clever product placements, sneaky subliminal messages, iconic brand logos, shocking health and safety warnings, and provocative product packages. The purpose of all of this was to understand, quite successfully I may add, the key drivers behind why we make the purchasing decisions that we do.

For the research that my recent Op-Ed column in the New York Times was based on I turned to Dr. David Hubbard, a board-certified neurologist and his company MindSign Neuro Marketing, an independently owned fMRI neuro-marketing company. I asked Dr. Hubbard and his team a simple question, “Are we addicted to our iPhones?“ After analyzing the brains of 8 men and 8 women between the ages of 18-25 using fMRI technology, MindSign answered my question using standardized answering methods and completely reproducible results. The conclusion was that we are not addicted to our iPhones, we are in love with them.

The thought provoking dialogue that has been generated from the article has been overwhelmingly positive and I look forward to the continued comments from professionals in the field, readers and fans.

Respectfully,

Martin Lindstrom

As evasive responses go, this is a masterpiece; at no point does Lindstrom ever actually address any of the substantive criticisms leveled at him. He spends most of his response name dropping (the list of credentials is almost long enough to make you forget that the rebuttal letter to his Op-Ed was signed by over 40 PhDs) and rambling about previous unrelated neuromarketing work (which may as well not exist, since none of it has ever been made public), and then closes by shifting the responsibility for the study to MindSign, the company he paid to run the iPhone study. The claim that MindSign “answered [his] question using standardized answering methods and completely reproducible results” is particularly ludicrous; as I explained in my last post, there currently aren’t any standardized methods for reading addiction or love off of brain images. And ‘completely reproducible results’ implies that one has, you know, successfully reproduced the results, which is simply false unless Lindstrom is suggesting that MindSign did the same experiment twice. It’s hard to see any “thought provoking dialogue” taking place here, and the neuroimaging community’s response to the Op-Ed column has been, virtually without exception, overwhelmingly negative, not positive (as Lindstrom claims).

That all said, I do think there’s one very positive aspect to this entire saga, and that’s the amazing speed and effectiveness of the response from scientists, science journalists, and other scientifically literate folks. Ten years ago, Lindstrom’s piece might have gone completely unchallenged–and even if someone like Russ Poldrack had written a response, it would probably have appeared much later, been signed by fewer scientists (because coordination would have been much more difficult), and received much less attention. But within 48 hours of Lindstrom’s Op-Ed being published, dozens of critical blog posts had appeared, and hundreds, if not thousands, of people all over the world had tweeted or posted links to these critiques (my last post alone received over 12,000 hits). Scientific discourse, which used to be confined largely to peer-reviewed print journals and annual conferences, now takes place at a remarkable pace online, and it’s fantastic to see social media used in this way. The hope is that as these technologies develop further and scientists take on a more active role in communicating with the public (something that platforms like Twitter and Google+ seem to be facilitating amazingly well), it’ll become increasingly difficult for people like Lindstrom to make crazy pseudoscientific claims without being immediately and visibly called out on them–even in those rare cases when the NYT makes the mistake of leaving one of the biggest microphones on earth open and unmonitored.

the New York Times blows it big time on brain imaging

The New York Times has a terrible, terrible Op-Ed piece today by Martin Lindstrom (who I’m not going to link to, because I don’t want to throw any more bones his way). If you believe Lindstrom, you don’t just like your iPhone a lot; you love it. Literally. And the reason you love it, shockingly, is your brain:

Earlier this year, I carried out an fMRI experiment to find out whether iPhones were really, truly addictive, no less so than alcohol, cocaine, shopping or video games. In conjunction with the San Diego-based firm MindSign Neuromarketing, I enlisted eight men and eight women between the ages of 18 and 25. Our 16 subjects were exposed separately to audio and to video of a ringing and vibrating iPhone.

But most striking of all was the flurry of activation in the insular cortex of the brain, which is associated with feelings of love and compassion. The subjects’ brains responded to the sound of their phones as they would respond to the presence or proximity of a girlfriend, boyfriend or family member.

In short, the subjects didn’t demonstrate the classic brain-based signs of addiction. Instead, they loved their iPhones.

There’s so much wrong with just these three short paragraphs (to say nothing of the rest of the article, which features plenty of other whoppers) that it’s hard to know where to begin. But let’s try. Take first the central premise–that an fMRI experiment could help determine whether iPhones are no less addictive than alcohol or cocaine. The tacit assumption here is that all the behavioral evidence you could muster–say, from people’s reports about how they use their iPhones, or clinicians’ observations about how iPhones affect their users–isn’t sufficient to make that determination; to “really, truly” know if something’s addictive, you need to look at what the brain is doing when people think about their iPhones. This idea is absurd inasmuch as addiction is defined on the basis of its behavioral consequences, not (right now, anyway) by the presence or absence of some biomarker. What makes someone an alcoholic is the fact that they’re dependent on alcohol, have trouble going without it, find that their alcohol use interferes with multiple aspects of their day-to-day life, and generally suffer functional impairment because of it–not the fact that their brain lights up when they look at pictures of Johnny Walker red. If someone couldn’t stop drinking–to the point where they lost their job, family, and friends–but their brain failed to display a putative biomarker for addiction, it would be strange indeed to say “well, you show all the signs, but I guess you’re not really addicted to alcohol after all.”

Now, there may come a day (and it will be a great one) when we have biomarkers sufficiently accurate that they can stand in for the much more tedious process of diagnosing someone’s addiction the conventional way. But that day is, to put it gently, a long way off. Right now, if you want to know if iPhones are addictive, the best way to do that is to, well, spend some time observing and interviewing iPhone users (and some quantitative analysis would be helpful).

Of course, it’s not clear what Lindstrom thinks an appropriate biomarker for addiction would be in any case. Presumably it would have something to do with the reward system; but what? Suppose Lindstrom had seen robust activation in the ventral striatum–a critical component of the brain’s reward system–when participants gazed upon the iPhone: what then? Would this have implied people are addicted to iPhones? But people also show striatal activity when gazing on food, money, beautiful faces, and any number of other stimuli. Does that mean the average person is addicted to all of the above? A marker of pleasure or reward, maybe (though even that’s not certain), but addiction? How could a single fMRI experiment with 16 subjects viewing pictures of iPhones confirm or disconfirm the presence of addiction? Lindstrom doesn’t say. I suppose he has good reason not to say: if he really did have access to an accurate fMRI-based biomarker for addiction, he’d be in a position to make millions (billions?) off the technology. To date, no one else has come close to identifying a clinically accurate fMRI biomarker for any kind of addiction (for more technical readers, I’m talking here about cross-validated methods that have both sensitivity and specificity comparable to traditional approaches when applied to new subjects–not individual studies that claim 90% within-sample classification accuracy based on simple regression models). So we should, to put it mildly, be very skeptical that Lindstrom’s study was ever in a position to do what he says it was designed to do.
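For the curious, the within-sample versus cross-validated distinction is easy to demonstrate with a toy example. The sketch below (simulated data; scikit-learn assumed; nothing to do with Lindstrom’s actual study) fits a classifier to 16 subjects’ worth of pure noise: scored on the same subjects it looks perfect, scored on held-out subjects it hovers around chance.

```python
# Toy demonstration (simulated data; assumes scikit-learn): within-sample accuracy
# is wildly optimistic when you have 16 subjects and hundreds of features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)

n_subjects, n_features = 16, 200
X = rng.normal(size=(n_subjects, n_features))   # pure-noise "brain features"
y = np.repeat([0, 1], n_subjects // 2)          # e.g., "addicted" vs. "not addicted"

clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)
print("Within-sample accuracy:", clf.score(X, y))           # typically 1.0

cv_accuracy = cross_val_score(clf, X, y, cv=4).mean()
print("Cross-validated accuracy:", round(cv_accuracy, 2))   # hovers around chance (0.5)
```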

We should also ask all sorts of salient and important questions about who the people are who are supposedly in love with their iPhones. Who’s the “You” in the “You Love Your iPhone” of the title? We don’t know, because we don’t know who the participants in Lindstrom’s sample were, aside from the fact that they were eight men and eight women aged 18 to 25. But we’d like to know some other important things. For instance, were they selected for specific characteristics? Were they, say, already avid iPhone users? Did they report loving, or being addicted to, their iPhones? If so, would it surprise us that people chosen for their close attachment to their iPhones also showed brain activity patterns typical of close attachment? (Which, incidentally, they actually don’t–but more on that below.) And if not, are we to believe that the average person pulled off the street–who probably has limited experience with iPhones–really responds to the sound of their phones “as they would respond to the presence or proximity of a girlfriend, boyfriend or family member”? Is the takeaway message of Lindstrom’s Op-Ed that iPhones are actually people, as far as our brains are concerned?

In fairness, space in the Times is limited, so maybe it’s not fair to demand this level of detail in the Op-Ed itself. But the bigger problem is that we have no way of evaluating Lindstrom’s claims, period, because (as far as I can tell) his study hasn’t been published or peer-reviewed anywhere. Presumably, it’s proprietary information that belongs to the neuromarketing firm in question. Which is to say, the NYT is basically giving Lindstrom license to talk freely about scientific-sounding findings that can’t actually be independently confirmed, disputed, or critiqued by members of the scientific community with expertise in the very methods Lindstrom is applying (expertise which, one might add, he himself lacks). For all we know, he could have made everything up. To be clear, I don’t really think he did make everything up–but surely, somewhere in the editorial process someone at the NYT should have stepped in and said, “hey, these are pretty strong scientific claims; is there any way we can make your results–on which your whole article hangs–available for other experts to examine?”

This brings us to what might be the biggest whopper of all, and the real driver of the article title: the claim that “most striking of all was the flurry of activation in the insular cortex of the brain, which is associated with feelings of love and compassion“. Russ Poldrack already tore this statement to shreds earlier this morning:

Insular cortex may well be associated with feelings of love and compassion, but this hardly proves that we are in love with our iPhones.  In Tal Yarkoni’s recent paper in Nature Methods, we found that the anterior insula was one of the most highly activated part of the brain, showing activation in nearly 1/3 of all imaging studies!  Further, the well-known studies of love by Helen Fisher and colleagues don’t even show activation in the insula related to love, but instead in classic reward system areas.  So far as I can tell, this particular reverse inference was simply fabricated from whole cloth.  I would have hoped that the NY Times would have learned its lesson from the last episode.

But you don’t have to take Russ’s word for it; if you surf for a few terms on our Neurosynth website, making sure to select “forward inference” under image type, you’ll notice that the insula shows up for almost everything. That’s not an accident; it’s because the insula (or at least the anterior part of the insula) plays a very broad role in goal-directed cognition. It really is activated when you’re doing almost anything that involves, say, following instructions an experimenter gave you, or attending to external stimuli, or mulling over something salient in the environment. You can see this pretty clearly in this modified figure from our Nature Methods paper (I’ve circled the right insula):

[Figure: proportion of studies reporting activation at each voxel, with the right insula circled.]

The insula is one of a few ‘hotspots’ where activation is reported very frequently in neuroimaging articles (the other major one being the dorsal medial frontal cortex). So, by definition, there can’t be all that much specificity to what the insula is doing, since it pops up so often. To put it differently, as Russ and others have repeatedly pointed out, the fact that a given region activates when people are in a particular psychological state (e.g., love) doesn’t give you license to conclude that that state is present just because you see activity in the region in question. If language, working memory, physical pain, anger, visual perception, motor sequencing, and memory retrieval all activate the insula, then knowing that the insula is active is of very little diagnostic value. That’s not to say that some psychological states might not be more strongly associated with insula activity (again, you can see this on Neurosynth if you switch the image type to ‘reverse inference’ and browse around); it’s just that, probabilistically speaking, the mere fact that the insula is active gives you very little basis for saying anything concrete about what people are experiencing.
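To put some toy numbers on the reverse inference problem: suppose, hypothetically and rather generously, that the insula lights up in 80% of studies involving love, take Russ’s point that it lights up in about a third of all studies regardless, and assume love is only at stake in a small fraction of studies to begin with. Bayes’ rule then says that insula activation tells you next to nothing:

```python
# Hypothetical numbers: the "about a third of all studies" figure is from the Poldrack
# quote above; the 80% and 2% are invented purely for illustration.
p_love = 0.02                   # base rate: fraction of studies/states actually involving love
p_insula_given_love = 0.80      # assumed: insula activates in 80% of "love" studies
p_insula_given_other = 0.33     # insula activates in ~1/3 of studies regardless

p_insula = p_insula_given_love * p_love + p_insula_given_other * (1 - p_love)
p_love_given_insula = p_insula_given_love * p_love / p_insula

print(f"P(love | insula active) = {p_love_given_insula:.2f}")   # roughly 0.05
```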

In fact, to account for Lindstrom’s findings, you don’t have to appeal to love or addiction at all. There’s a much simpler way to explain why seeing or hearing an iPhone might elicit insula activation. For most people, the onset of visual or auditory stimulation is a salient event that causes redirection of attention to the stimulated channel. I’d be pretty surprised, actually, if you could present any picture or sound to participants in an fMRI scanner and not elicit robust insula activity. Orienting and sustaining attention to salient things seems to be a big part of what the anterior insula is doing (whether or not that’s ultimately its ‘core’ function). So the most appropriate conclusion to draw from the fact that viewing iPhone pictures produces increased insula activity is something vague like “people are paying more attention to iPhones”, or “iPhones are particularly salient and interesting objects to humans living in 2011.” Not something like “no, really, you love your iPhone!”

In sum, the NYT screwed up. Lindstrom appears to have a habit of making overblown claims about neuroimaging evidence, so it’s not surprising he would write this type of piece; but the NYT editorial staff is supposedly there to filter out precisely this kind of pseudoscientific advertorial. And they screwed up. It’s a particularly big screw-up given that (a) as of right now, Lindstrom’s Op-Ed is the single most emailed article on the NYT site, and (b) this incident almost perfectly recapitulates another NYT article 4 years ago in which some neuroscientists and neuromarketers wrote a grossly overblown Op-Ed claiming to be able to infer, in detail, people’s opinions about presidential candidates. That time, Russ Poldrack and a bunch of other big names in cognitive neuroscience wrote a concise rebuttal that appeared in the NYT (but unfortunately, isn’t linked to from the original Op-Ed, so anyone who stumbles across the original now has no way of knowing how ridiculous it is). One hopes the NYT follows up in similar fashion this time around. They certainly owe it to their readers–some of whom, if you believe Lindstrom, are now in danger of dumping their current partners for their iPhones.

h/t: Molly Crockett

the APS likes me!

Somehow I wound up profiled in this month’s issue of the APS Observer as a “Rising Star“. I’d like to believe this means I’m a really big deal now, but I suspect what it actually means is that someone on the nominating committee at APS has extraordinarily bad judgment. I say this in no small part because I know some of the other people who were named Rising Stars quite well (congrats to Karl Szpunar,  Jason Chan, and Alan Castel, among many other people!), so I’m pretty sure I can distinguish people who actually deserve this from, say, me.

Of course, I’m not going to look a gift horse in the mouth. And I’m certainly thrilled to be picked for this. I know these things are kind of a crapshoot, but it still feels really nice. So while the part of my brain that understands measurement error is saying “meh, luck of the draw,” that other part of my brain that likes to be told it’s awesome is in the middle of a three day coke bender right now*. The only regret both parts of the brain have is that there isn’t any money attached to the award–or even a token prize like, say, a free statistician for a year. But I don’t think I’m going to push my luck by complaining to APS about it.

One thing I like a lot about the format of the Rising Star awards is they give you a full page to talk about yourself and your research. If there’s one thing I like to talk about, it’s myself. Usually, you can’t talk about yourself for very long before people start giving you dirty looks. But in this case, it’s sanctioned, so I guess it’s okay. In any case, the kind folks at the Observer sent me a series of seven questions to answer. And being an upstanding gentleman who likes to be given fancy awards, I promptly obliged. I figured they would just run what I sent them with minor edits… but I WAS VERY WRONG. They promptly disassembled nearly all of my brilliant observations and advice and replaced them with some very tame ramblings. So if you actually bother to read my responses, and happen to fall asleep halfway through, you’ll know who to blame. But just to set the record straight, I figured I would run through each of the boilerplate questions I was asked, and show you the answer that was printed in the Observer as compared to what I actually wrote**:

What does your research focus on?

What they printed: Most of my current research focuses on what you might call psychoinformatics: the application of information technology to psychology, with the aim of advancing our ability to study the human mind and brain. I’m interested in developing new ways to acquire, synthesize, and share data in psychology and cognitive neuroscience. Some of the projects I’ve worked on include developing new ways to measure personality more efficiently, adapting computer science metrics of string similarity to visual word recognition, modeling fMRI data on extremely short timescales, and conducting large-scale automated synthesis of published neuroimaging findings. The common theme that binds these disparate projects together is the desire to develop new ways of conceptualizing and addressing psychological problems; I believe very strongly in the transformative power of good methods.

What I actually said: I don’t know! There’s so much interesting stuff to think about! I can’t choose!

What drew you to this line of research? Why is it exciting to you?

What they printed: Technology enriches and improves our lives in every domain, and science is no exception. In the biomedical sciences in particular, many revolutionary discoveries would have been impossible without substantial advances in information technology. Entire subfields of research in molecular biology and genetics are now synonymous with bioinformatics, and neuroscience is currently also experiencing something of a neuroinformatics revolution. The same trend is only just beginning to emerge in psychology, but we’re already able to do amazing things that would have been unthinkable 10 or 20 years ago. For instance, we can now collect data from thousands of people all over the world online, sample people’s inner thoughts and feelings in real time via their phones, harness enormous datasets released by governments and corporations to study everything from how people navigate their spatial world to how they interact with their friends, and use high-performance computing platforms to solve previously intractable problems through large-scale simulation. Over the next few years, I think we’re going to see transformative changes in the way we study the human mind and brain, and I find that a tremendously exciting thing to be involved in.

What I actually said: I like psychology a lot, and I like technology a lot. Why not combine them!

Who were/are your mentors or psychological influences?

What they printed: I’ve been fortunate to have outstanding teachers and mentors at every stage of my training. I actually started my academic career quite disinterested in science and owe my career trajectory in no small part to two stellar philosophy professors (Rob Stainton and Chris Viger) who convinced me as an undergraduate that engaging with empirical data was a surprisingly good way to discover how the world really works. I can’t possibly do justice to all the valuable lessons my graduate and postdoctoral mentors have taught me, so let me just pick a few out of a hat. Among many other things, Todd Braver taught me how to talk through problems collaboratively and keep recursively questioning the answers to problems until a clear understanding materializes. Randy Larsen taught me that patience really is a virtue, despite my frequent misgivings. Tor Wager has taught me to think more programmatically about my research and to challenge myself to learn new skills. All of these people are living proof that you can be an ambitious, hard-working, and productive scientist and still be extraordinarily kind and generous with your time. I don’t think I embody those qualities myself right now, but at least I know what to shoot for.

What I actually said: Richard Feynman, Richard Hamming, and my mother. Not necessarily in that order.

To what do you attribute your success in the science?

What they printed: Mostly to blind luck. So far I’ve managed to stumble from one great research and mentoring situation to another. I’ve been fortunate to have exceptional advisors who’ve provided me with the perfect balance of freedom and guidance and amazing colleagues and friends who’ve been happy to help me out with ideas and resources whenever I’m completely out of my depth — which is most of the time.

To the extent that I can take personal credit for anything, I think I’ve been good about pursuing ideas I’m passionate about and believe in, even when they seem unlikely to pay off at first. I’m also a big proponent of exploratory research; I think pure exploration is tremendously undervalued in psychology. Many of my projects have developed serendipitously, as a result of asking, “What happens if we try doing it this way?”

What I actually said: Mostly to blind luck.

What’s your future research agenda?

What they printed: I’d like to develop technology-based research platforms that improve psychologists’ ability to answer existing questions while simultaneously opening up entirely new avenues of research. That includes things like developing ways to collect large amounts of data more efficiently, tracking research participants over time, automatically synthesizing the results of published studies, building online data repositories and collaboration tools, and more. I know that all sounds incredibly vague, and if you have some ideas about how to go about any of it, I’d love to collaborate! And by collaborate, I mean that I’ll brew the coffee and you’ll do the work.

What I actually said: Trading coffee for publications?

Any advice for even younger psychological scientists? What would you tell someone just now entering graduate school or getting their PhD?

What they printed: The responsible thing would probably be to say “Don’t go to graduate school.” But if it’s too late for that, I’d recommend finding brilliant mentors and colleagues and serving them coffee exactly the way they like it. Failing that, find projects you’re passionate about, work with people you enjoy being around, develop good technical skills, and don’t be afraid to try out crazy ideas. Leave your office door open, and talk to everyone you can about the research they’re doing, even if it doesn’t seem immediately relevant. Good ideas can come from anywhere and often do.

What I actually said: “Don’t go to graduate school.”

What publication are you most proud of, or feel has been most important to your career?

What they printed: Yarkoni, T., Poldrack, R. A., Nichols, T. E., Van Essen, D. C., & Wager, T. D. (2011). Large-scale automated synthesis of human functional neuroimaging data. Manuscript submitted for publication.

In this paper, we introduce a highly automated platform for synthesizing data from thousands of published functional neuroimaging studies. We used a combination of text mining, meta-analysis, and machine learning to automatically generate maps of brain activity for hundreds of different psychological concepts, and we showed that these results could be used to “decode” cognitive states from brain activity in individual human subjects in a relatively open-ended way. I’m very proud of this work, and I’m quite glad that my co-authors agreed to make me first author in return for getting their coffee just right. Unfortunately, the paper isn’t published yet, so you’ll just have to take my word for it that it’s really neat stuff. And if you’re thinking, “Isn’t it awfully convenient that his best paper is unpublished?”… why, yes. Yes it is.

What I actually said: …actually, that’s almost exactly what I said. Except they inserted that bit about trading coffee for co-authorship. Really all I had to do was ask my co-authors nicely.

Anyway, like I said, it’s really nice to be honored in this way, even if I don’t really deserve it (and that’s not false modesty–I’m generally the first to tell other people when I think I’ve done something awesome). But I’m a firm believer in regression to the mean, so I suspect the run of good luck won’t last. In a few years, when I’ve done almost no new original work, failed to land a tenure-track job, and dropped out of academia to ride horses around the racetrack***, you can tell people that you knew me back when I was a Rising Star. Right before you tell them you don’t know what the hell happened.

———————————-

* But not really.

** Totally lying. Pretty much every word is as I wrote it. And the Observer staff were great.

*** Hopefully none of these things will happen. Except the jockey thing; that would be awesome.

trouble with biomarkers and press releases

The latest issue of the Journal of Neuroscience contains an interesting article by Ecker et al. in which the authors attempted to classify people with autism spectrum disorder (ASD) and healthy controls based on their brain anatomy, and report achieving “a sensitivity and specificity of up to 90% and 80%, respectively.” Before unpacking what that means, and why you probably shouldn’t get too excited (about the clinical implications, at any rate; the science is pretty cool), here’s a snippet from the decidedly optimistic press release that accompanied the study:

“Scientists funded by the Medical Research Council (MRC) have developed a pioneering new method of diagnosing autism in adults. For the first time, a quick brain scan that takes just 15 minutes can identify adults with autism with over 90% accuracy. The method could lead to the screening for autism spectrum disorders in children in the future.”

If you think this sounds too good to be true, that’s because it is. Carl Heneghan explains why in an excellent article in the Guardian:

How the brain scans results are portrayed is one of the simplest mistakes in interpreting diagnostic test accuracy to make. What has happened is, the sensitivity has been taken to be the positive predictive value, which is what you want to know: if I have a positive test do I have the disease? Not, if I have the disease, do I have a positive test? It would help if the results included a measure called the likelihood ratio (LR), which is the likelihood that a given test result would be expected in a patient with the target disorder compared to the likelihood that the same result would be expected in a patient without that disorder. In this case the LR is 4.5. We’ve put up an article if you want to know more on how to calculate the LR.

In the general population the prevalence of autism is 1 in 100; the actual chances of having the disease are 4.5 times more likely given a positive test. This gives a positive predictive value of 4.5%; about 5 in every 100 with a positive test would have autism.

For those still feeling confused and not convinced, let’s think of 10,000 children. Of these 100 (1%) will have autism, 90 of these 100 would have a positive test, 10 are missed as they have a negative test: there’s the 90% reported accuracy by the media.

But what about the 9,900 who don’t have the disease? 7,920 of these will test negative (the specificity in the Ecker paper is 80%). But, the real worry though, is the numbers without the disease who test positive. This will be substantial: 1,980 of the 9,900 without the disease. This is what happens at very low prevalences, the numbers falsely misdiagnosed rockets. Alarmingly, of the 2,070 with a positive test, only 90 will have the disease, which is roughly 4.5%.

In other words, if you screened everyone in the population for autism, and assume the best about the classifier reported in the JNeuro article (e.g., that the sample of 20 ASD participants they used is perfectly representative of the broader ASD population, which seems unlikely), only about 1 in 20 people who receive a positive diagnosis would actually deserve one.
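Heneghan’s arithmetic is easy to check for yourself; here’s the same calculation as a few lines of Python, using the sensitivity, specificity, and prevalence figures from the quote above:

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disorder | positive test), via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Population-screening scenario from the quote: 90% sensitivity, 80% specificity,
# prevalence of 1 in 100.
ppv = positive_predictive_value(0.90, 0.80, 0.01)
print(f"PPV at 1% prevalence: {ppv:.3f}")   # ~0.043, i.e., roughly 1 in 20 positives is a true case
```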

Ecker et al object to this characterization, and reply to Heneghan in the comments (through the MRC PR office):

Our test was never designed to screen the entire population of the UK. This is simply not practical in terms of costs and effort, and besides totally  unjustified- why would we screen everybody in the UK for autism if there is no evidence whatsoever that an individual is affected?. The same case applies to other diagnostic tests. Not every single individual in the UK is tested for HIV. Clearly this would be too costly and unnecessary. However, in the group of individuals that are test for the virus, we can be very confident that if the test is positive that means a patient is infected. The same goes for our approach.

Essentially, the argument is that, since people would presumably be sent for an MRI scan because they were already under consideration for an ASD diagnosis, and not at random, the false positive rate would in fact be much lower than 95%, and closer to the 20% reported in the article.

One response to this reply–which is in fact Heneghan’s response in the comments–is to point out that the pre-test probability of ASD would need to be pretty high already in order for the classifier to add much. For instance, even if fully 30% of people who were sent for a scan actually had ASD, the posterior probability of ASD given a positive result would still be only 66% (Heneghan’s numbers, which I haven’t checked). Heneghan nicely contrasts these results with the standard for HIV testing, which “reports sensitivity of 99.7% and specificity of 98.5% for enzyme immunoassay.” Clearly, we have a long way to go before doctors can order MRI-based tests for ASD and feel reasonably confident that a positive result is sufficient grounds for an ASD diagnosis.
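For what it’s worth, the same Bayes’-rule arithmetic does reproduce Heneghan’s 66% figure, assuming the paper’s 90% sensitivity and 80% specificity and a hypothetical 30% pre-test probability:

```python
# Same Bayes'-rule arithmetic, but with a 30% pre-test probability instead of 1% prevalence.
sens, spec, prior = 0.90, 0.80, 0.30
ppv = sens * prior / (sens * prior + (1 - spec) * (1 - prior))
print(f"PPV at 30% pre-test probability: {ppv:.2f}")   # ~0.66
```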

Setting Heneghan’s concerns about base rates aside, there’s a more general issue that he doesn’t touch on. It’s one that’s not specific to this particular study, and applies to nearly all studies that attempt to develop “biomarkers” for existing disorders. The problem is that the sensitivity and specificity values that people report for their new diagnostic procedure in these types of studies generally aren’t the true parameters of the procedure. Rather, they’re the sensitivity and specificity under the assumption that the diagnostic procedures used to classify patients and controls in the first place are themselves correct. In other words, in order to believe the results, you have to assume that the researchers correctly classified the subjects into patient and control groups using other procedures. In cases where the gold standard test used to make the initial classification is known to have near 100% sensitivity and specificity (e.g., for the aforementioned HIV tests), one can reasonably ignore this concern. But when we’re talking about mental health disorders, where diagnoses are fuzzy and borderline cases abound, it’s very likely that the “gold standard” isn’t really all that great to begin with.

Concretely,  studies that attempt to develop biomarkers for mental health disorders face two substantial problems. One is that it’s extremely unlikely that the clinical diagnoses are ever perfect; after all, if they were perfect, there’d be little point in trying to develop other diagnostic procedures! In this particular case, the authors selected subjects into the ASD group based on standard clinical instruments and structured interviews. I don’t know that there are many clinicians who’d claim with a straight face that the current diagnostic criteria for ASD (and there are multiple sets to choose from!) are perfect. From my limited knowledge, the criteria for ASD seem to be even more controversial than those for most other mental health disorders (which is saying something, if you’ve been following the ongoing DSM-V saga). So really, the accuracy of the classifier in the present study, even if you put the best face on it and ignore the base rate issue Heneghan brings up, is undoubtedly south of the 90% sensitivity / 80% specificity the authors report. How much south, we just don’t know, because we don’t really have any independent, objective way to determine who “really” should get an ASD diagnosis and who shouldn’t (assuming you think it makes sense to make that kind of dichotomous distinction at all). But 90% accuracy is probably a pipe dream, if for no other reason than it’s hard to imagine that level of consensus about autism spectrum diagnoses.

The second problem is that, because the researchers are using the MRI-based classifier to predict the clinician-based diagnosis, it simply isn’t possible for the former to exceed the accuracy of the latter. That bears repeating, because it’s important: no matter how good the MRI-based classifier is, it can only be as good as the procedures used to make the original diagnosis, and no better. It cannot, by definition, make diagnoses that are any more accurate than the clinicians who screened the participants in the authors’ ASD sample. So when you see the press release say this:

For the first time, a quick brain scan that takes just 15 minutes can identify adults with autism with over 90% accuracy.

You should really read it as this:

The method relies on structural (MRI) brain scans and has an accuracy rate approaching that of conventional clinical diagnosis.

That’s not quite as exciting, obviously, but it’s more accurate.

To be fair, there’s something of a catch-22 here, in that the authors didn’t really have a choice about whether or not to diagnose the ASD group using conventional criteria. If they hadn’t, reviewers and other researchers would have complained that we can’t tell if the ASD group is really an ASD group, because they authors used non-standard criteria. Under the circumstances, they did the only thing they could do. But that doesn’t change the fact that it’s misleading to intimate, as the press release does, that the new procedure might be any better than the old ones. It can’t be, by definition.
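If the label-ceiling point seems abstract, a toy simulation makes it concrete. In the sketch below (every number is made up), even an oracle classifier that recovers the true diagnostic status perfectly can’t look any better than the clinicians when it’s scored against the clinicians’ own labels, which are the only labels a study like this has to work with:

```python
# Toy simulation of the label ceiling; every number here is made up for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

truth = rng.random(n) < 0.5                  # "true" diagnostic status
clinician_error = 0.15                       # suppose clinicians mislabel 15% of cases
clinician = np.where(rng.random(n) < clinician_error, ~truth, truth)

oracle = truth.copy()                        # a classifier that somehow recovers the truth perfectly

print("Oracle agreement with truth:           ", (oracle == truth).mean())      # 1.00
print("Oracle agreement with clinician labels:", (oracle == clinician).mean())  # ~0.85
# Scored against the clinician labels (the only labels available), even a perfect
# classifier can't look better than the clinicians themselves.
```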

Ultimately, if we want to develop brain-based diagnostic tools that are more accurate than conventional clinical diagnoses, we’re going to need to show that these tools are capable of predicting meaningful outcomes that clinician diagnoses can’t. This isn’t an impossible task, but it’s a very difficult one. One approach you could take, for instance, would be to compare the ability of clinician diagnosis and MRI-based diagnosis to predict functional outcomes among subjects at a later point in time. If you could show that MRI-based classification of subjects at an early age was a stronger predictor of receiving an ASD diagnosis later in life than conventional criteria, that would make a really strong case for using the former approach in the real world. Short of that type of demonstration though, the only reason I can imagine wanting to use a procedure that was developed by trying to duplicate the results of an existing procedure is in the event that the new procedure is substantially cheaper or more efficient than the old one. Meaning, it would be reasonable enough to say “well, look, we don’t do quite as well with this approach as we do with a full clinical evaluation, but at least this new approach costs much less.” Unfortunately, that’s not really true in this case, since the price of even a short MRI scan is generally going to outweigh that of a comprehensive evaluation by a psychiatrist or psychotherapist. And while it could theoretically be much faster to get an MRI scan than an appointment with a mental health professional, I suspect that that’s not generally going to be true in practice either.

Having said all that, I hasten to note that all this is really a critique of the MRC press release and subsequently lousy science reporting, and not of the science itself. I actually think the science itself is very cool (but the Neuroskeptic just wrote a great rundown of the methods and results, so there’s not much point in me describing them here). People have been doing really interesting work with pattern-based classifiers for several years now in the neuroimaging literature, but relatively few studies have applied this kind of technique to try and discriminate between different groups of individuals in a clinical setting. While I’m not really optimistic that the technique the authors introduce in this paper is going to change the way diagnosis happens any time soon (or at least, I’d argue that it shouldn’t), there’s no question that the general approach will be an important piece of future efforts to improve clinical diagnoses by integrating biological data with existing approaches. But that’s not going to happen overnight, and in the meantime, I think it’s pretty irresponsible of the MRC to be issuing press releases claiming that its researchers can diagnose autism in adults with 90% accuracy.

Ecker C, Marquand A, Mourão-Miranda J, Johnston P, Daly EM, Brammer MJ, Maltezos S, Murphy CM, Robertson D, Williams SC, & Murphy DG (2010). Describing the brain in autism in five dimensions–magnetic resonance imaging-assisted diagnosis of autism spectrum disorder using a multiparameter classification approach. The Journal of Neuroscience, 30(32), 10612-10623. PMID: 20702694

elsewhere on the net, vacation edition

I’m hanging out in Boston for a few days, so blogging will probably be sporadic or nonexistent. Which is to say, you probably won’t notice any difference.

The last post on the Dunning-Kruger effect somehow managed to rack up 10,000 hits in 48 hours; but that was last week. Today I looked at my stats again, and the blog is back to a more normal 300 hits, so I feel like it’s safe to blog again. Here are some neat (and totally unrelated) links from the past week:

  • OKCupid has another one of those nifty posts showing off all the cool things they can learn from their gigantic userbase (who else gets to say things like “this analysis includes 1.51 million users’ data”???). Apparently, tall people (claim to) have more sex, attractive photos are more likely to be out of date, and most people who claim to be bisexual aren’t really bisexual.
  • After a few months off, my department-mate Chris Chatham is posting furiously again over at Developing Intelligence, with a series of excellent posts reviewing recent work on cognitive control and the perils of fMRI research. I’m not really sure what Chris spent his blogging break doing, but given the frequency with which he’s been posting lately, my suspicion is that he spent it secretly writing blog posts.
  • Mark Liberman points out a fundamental inconsistency in the way we view attributions of authorship: we get appropriately angry at academics who pass someone else’s work off as their own, but think it’s just fine for politicians to pay speechwriters to write for them. It’s an interesting question, and leads to an intimately related, and even more important question–namely, will anyone get mad at me if I pay someone else to write a blog post for me about someone else’s blog post discussing people getting angry at people paying or not paying other people to write material for other people that they do or don’t own the copyright on?
  • I like oohing and aahing over large datasets, and the Guardian’s Data Blog provides a nice interface to some of the most ooh- and aah-able datasets out there. [via R-Chart]
  • Ed Yong has a characteristically excellent write-up about recent work on the magnetic vision of birds. Yong also does link dump posts better than anyone else, so you should probably stop reading this one right now and read his instead.
  • You’ve probably heard about this already, but some time last week, the brain trust at ScienceBlogs made the amazingly clever decision to throw away their integrity by selling PepsiCo its very own “science” blog. Predictably, a lot of the bloggers weren’t happy with the decision, and many have now moved on to greener pastures; Carl Zimmer’s keeping score. Personally, I don’t have anything intelligent to add to everything that’s already been said; I’m literally dumbfounded.
  • Andrew Gelman takes apart an obnoxious letter from pollster John Zogby to Nate Silver of fivethirtyeight.com. I guess now we know that Zogby didn’t get where he is by not being an ass to other people.
  • Vaughan Bell of Mind Hacks points out that neuroplasticity isn’t a new concept, and was discussed seriously in the literature as far back as the 1800s. Apparently our collective views about the malleability of mind are not, themselves, very plastic.
  • NPR ran a three-part story by Barbara Bradley Hagerty on the emerging and somewhat uneasy relationship between neuroscience and the law. The articles are pretty good, but much better, in my opinion, was the Talk of the Nation episode that featured Hagerty as a guest alongside Joshua Greene, Kent Kiehl, and Stephen Morse–people who’ve all contributed in various ways to the emerging discipline of NeuroLaw. It’s a really interesting set of interviews and discussions. For what it’s worth, I think I agree with just about everything Greene has to say about these issues–except that he says things much more eloquently than I think them.
  • Okay, this one’s totally frivolous, but does anyone want to buy me one of these things? I don’t even like dried food; I just think it would be fun to stick random things in there and watch them come out pale, dried husks of their former selves. Is it morbid to enjoy watching the life slowly being sucked out of apples and mushrooms?

and the runner up is…

This one’s a bit of a head-scratcher. Thomson-Reuters just released its 2009 Journal Citation Report–essentially a comprehensive ranking of scientific journals by their impact factor (IF). The odd part, as reported by Bob Grant in The Scientist, is that the journal with the second-highest IF is Acta Crystallographica – Section A–ahead of heavyweights like the New England Journal of Medicine. For perspective, the same journal had an IF of 2.051 in 2008. The reason for the jump?

A single article published in a 2008 issue of the journal seems to be responsible for the meteoric rise in the Acta Crystallographica – Section A‘s impact factor. “A short history of SHELX,” by University of Göttingen crystallographer George Sheldrick, which reviewed the development of the computer system SHELX, has been cited more than 6,600 times, according to ISI. This paper includes a sentence that essentially instructs readers to cite the paper they’re reading — “This paper could serve as a general literature citation when one or more of the open-source SHELX programs (and the Bruker AXS version SHELXTL) are employed in the course of a crystal-structure determination.” (Note: This may be a good way to boost your citations.)

Setting aside the good career advice (and yes, I’ve made a mental note to include the phrase “this paper could serve as a general literature citation…” in my next paper), it’s perplexing that Thomson-Reuters didn’t downweight Acta Crystallographica‘s IF considerably given the obvious outlier. There’s no question they would have noticed that the second-ranked journal was only there in virtue of one article, so I’m curious what the thought process was. Perhaps the deliberation went something like this:

Thomson-Reuters statistician A: We need to take it out! We can’t have a journal with an impact factor of 2 last year beat out the NEJM!

Thomson-Reuters statistician B: But if we take it out, it’ll look like we tampered with the IF!

TRS-A: But we already tamper with the IF! No one knows how we come up with these numbers! Sometimes we can’t even replicate our own results ourselves! And anyway, it’s really not a big deal if we just leave the article in; scientists know better than to think Acta Crystallographica is the second most influential science journal on the planet. They’ll figure it out.

TRS-B: But that’s like asking them to just disregard our numbers! If you’re supposed to ignore the impact factor in cases where it contradicts your perception of journal quality, what’s the point of having an impact factor at all?

TRS-A: Beats me.

So okay, I’m sure it didn’t go down quite like that. But it’s still pretty weird.
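
Weird, but also entirely predictable once you remember how the IF is computed: it’s just (citations this year to articles the journal published in the previous two years) divided by (the number of citable items it published in those two years), so a single paper with a few thousand citations can drag a small journal’s score almost anywhere. Here’s a back-of-the-envelope sketch; every number in it is made up for illustration except the roughly 6,600 citations attributed to the SHELX paper (and even those wouldn’t all fall inside the two-year window):

```python
# Back-of-the-envelope sketch of how one outlier paper can swamp a journal's
# impact factor. The 2009 IF is (citations in 2009 to items published in
# 2007-2008) / (citable items published in 2007-2008). All numbers below are
# illustrative assumptions, not figures reported by Thomson-Reuters.

citable_items_2007_2008 = 400   # assumed size of the journal's two-year window
baseline_citations = 800        # assumed citations to everything except the outlier
outlier_citations = 6600        # the SHELX paper's (approximate) citation count

if_without_outlier = baseline_citations / citable_items_2007_2008
if_with_outlier = (baseline_citations + outlier_citations) / citable_items_2007_2008

print(f"IF without the SHELX paper: {if_without_outlier:.2f}")   # ~2.0
print(f"IF with the SHELX paper:    {if_with_outlier:.2f}")      # ~18.5
```

With a denominator of only a few hundred citable items, the outlier does essentially all of the work, which is presumably why the 2008 and 2009 numbers look like they belong to two different journals.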
And now, having bitched about how arbitrary the IF is, I’m going to go off and spend the next 15 minutes perusing the psychology and neuroscience journal rankings…

elsewhere on the net

Some neat links from the past few weeks:

  • You Are Not So Smart: A celebration of self-delusion. An excellent blog by journalist David McRaney that deconstructs common myths about the way the mind works.
  • NPR has a great story by Jon Hamilton about the famous saga of Einstein’s brain and what it’s helped teach us about brain function. [via Carl Zimmer]
  • The Neuroskeptic has a characteristically excellent 1,000-word explanation of how fMRI works.
  • David Rock has an interesting post on some recent work from Baumeister’s group purportedly showing that it’s good to believe in free will (whether or not it exists). My own feeling about this is that Baumeister’s not really studying people’s philosophical views about free will, but rather a construct closely related to self-efficacy and locus of control. But it’s certainly an interesting line of research.
  • The Prodigal Academic is a great new blog about all things academic. I’ve found it particularly interesting since several of the posts so far have been about job searches and job-seeking–something I’ll be getting my fill of over the next few months.
  • Prof-like Substance has a great 5-part series (1, 2, 3, 4, 5) on how blogging helps him as an academic. My own (much less eloquent) thoughts on that are here.
  • Cameron Neylon makes a nice case for the development of social webs for data mining.
  • Speaking of data mining, Michael Driscoll of Dataspora has an interesting pair of posts extolling the virtues of Big Data.
  • And just to balance things out, there’s this article in the New York Times by John Allen Paulos that offers some cautionary words about the challenges of using empirical data to support policy decisions.
  • On a totally science-less note, some nifty drawings (or is that photos?) by Ben Heine (via Crooked Brains).

fMRI, not coming to a courtroom near you so soon after all

That’s a terribly constructed title, I know, but bear with me. A couple of weeks ago I blogged about a case in Tennessee in which the defense was trying to introduce fMRI evidence as a way of proving the defendant’s innocence (his brain, apparently, showed no signs of guilt). The judge’s verdict is now in, and… fMRI is out. In United States v. Lorne Semrau, Judge Pham recommended that the government’s motion to exclude the fMRI scans from consideration be granted. That’s the outcome I think most respectable cognitive neuroscientists were hoping for; as many people associated with the case or interviewed about it have noted (and as the judge recognized), there just isn’t a shred of evidence to suggest that fMRI has any utility as a lie detector in real-world situations.

The judge’s decision, which you can download in PDF form here (hat-tip: Thomas Nadelhoffer), is really quite elegant, and worth reading (or at least skimming through). He even manages some subtle snark in places. For instance (my italics):

Regarding the existence and maintenance of standards, Dr. Laken testified as to the protocols and controlling standards that he uses for his own exams. Because the use of fMRI-based lie detection is still in its early stages of development, standards controlling the real-life application have not yet been established. Without such standards, a court cannot adequately evaluate the reliability of a particular lie detection examination. Cordoba, 194 F.3d at 1061. Assuming, arguendo, that the standards testified to by Dr. Laken could satisfy Daubert, it appears that Dr. Laken violated his own protocols when he re-scanned Dr. Semrau on the AIMS tests SIQs, after Dr. Semrau was found “deceptive” on the first AIMS tests scan. None of the studies cited by Dr. Laken involved the subject taking a second exam after being found to have been deceptive on the first exam. His decision to conduct a third test begs the question whether a fourth scan would have revealed Dr. Semrau to be deceptive again.

The absence of real-life error rates, lack of controlling standards in the industry for real-life exams, and Dr. Laken’s apparent deviation from his own protocols are negative factors in the analysis of whether fMRI-based lie detection is scientifically valid. See Bonds, 12 F.3d at 560.

The reference here is to the fact that Laken and his company scanned Semrau (the defendant) on three separate occasions. The first two scans were planned ahead of time, but the third apparently wasn’t:

From the first scan, which included SIQs relating to defrauding the government, the results showed that Dr. Semrau was “not deceptive.” However, from the second scan, which included SIQs relating to AIMS tests, the results showed that Dr. Semrau was “being deceptive.” According to Dr. Laken, “testing indicates that a positive test result in a person purporting to tell the truth is accurate only 6% of the time.” Dr. Laken also believed that the second scan may have been affected by Dr. Semrau’s fatigue. Based on his findings on the second test, Dr. Laken suggested that Dr. Semrau be administered another fMRI test on the AIMS tests topic, but this time with shorter questions and conducted later in the day to reduce the effects of fatigue. … The third scan was conducted on January 12, 2010 at around 7:00 p.m., and according to Dr. Laken, Dr. Semrau tolerated it well and did not express any fatigue. Dr. Laken reviewed this data on January 18, 2010, and concluded that Dr. Semrau was not deceptive. He further stated that based on his prior studies, “a finding such as this is 100% accurate in determining truthfulness from a truthful person.”

I may very well be misunderstanding something here (and so might the judge), but if the positive predictive value of the test is only 6%, I’m guessing that the probability that the test is seriously miscalibrated is somewhat higher than 6%. Especially since the base rate for lying among people who are accused of committing serious fraud is probably reasonably high (this matters, because when base rates are very low, low positive predictive values are not unexpected). But then, no one really knows how to calibrate these tests properly, because the data you’d need to do that simply don’t exist. Serious validation of fMRI as a tool for lie detection would require assembling a large set of brain scans from defendants accused of various crimes (real crimes, not simulated ones) and using that data to predict whether those defendants were ultimately found guilty or not. There really isn’t any substitute for doing a serious study of that sort, but as far as I know, no one’s done it yet. Fortunately, the few judges who’ve had to rule on the courtroom use of fMRI seem to recognize that.
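
To put the base rate point in more concrete terms, here’s a minimal Bayes’ rule sketch. The sensitivity and specificity values are pure assumptions (as far as I know, nobody has credible real-world estimates of either for fMRI-based lie detection); the point is just that a positive predictive value as low as 6% is really only what you’d expect when almost nobody in the tested population is lying.

```python
# A quick Bayes' rule sketch of why the base rate matters when interpreting
# a positive "deceptive" result. Sensitivity and specificity are hypothetical;
# nobody actually knows these numbers for fMRI lie detection in the field.

def positive_predictive_value(sensitivity, specificity, base_rate):
    """P(actually lying | test says 'deceptive')."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)

sensitivity = 0.90   # assumed P(test says deceptive | person is lying)
specificity = 0.70   # assumed P(test says truthful | person is truthful)

for base_rate in (0.01, 0.10, 0.50, 0.80):
    ppv = positive_predictive_value(sensitivity, specificity, base_rate)
    print(f"base rate of lying = {base_rate:.0%}  ->  PPV = {ppv:.0%}")
```

Under these (again, made-up) assumptions, a PPV in the single digits only shows up when roughly 1% of the tested population is lying; so if the base rate among people accused of serious fraud is anywhere near the middle or high end of that range, a claimed 6% figure says more about the test’s calibration than about the population.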


elsewhere on the net

I’ve been swamped with work lately, so blogging has taken a backseat. I keep a text file on my desktop of interesting things I’d like to blog about; normally, about three-quarters of the links I paste into it go unblogged, but in the last couple of weeks it’s more like 98%. So here are some things I’ve found interesting recently, in no particular order:

It’s World Water Day 2010! Or at least it was a week ago, which is when I should have linked to these really moving photos.

Carl Zimmer has a typically brilliant (and beautifully illustrated) article in the New York Times about “Unseen Beasts, Then and Now”:

Somewhere in England, about 600 years ago, an artist sat down and tried to paint an elephant. There was just one problem: he had never seen one.

John Horgan writes a surprisingly bad guest blog post for Scientific American in which he basically accuses neuroscientists (not a neuroscientist or some neuroscientists, but all of us, collectively) of selling out by working with the US military. I’m guessing that the number of working neuroscientists who’ve ever received any sort of military funding is somewhere south of 10%, and is probably much smaller than the corresponding proportion in any number of other scientific disciplines, but why let data get in the way of a good anecdote or two? [via Peter Reiner]

Mark Liberman follows up his first critique of Louann Brizendine’s new “book” The Male Brain with a second one, now that he’s actually got his hands on a copy. Verdict: the book is still terrible. Mark was also kind enough to answer my question about what the mysterious “sexual pursuit area” is. Apparently it’s the medial preoptic area. And the claim that this area governs sexual behavior in humans and is 2.5 times larger in males is, once again, based entirely on work in the rat.

Commuting sucks. Jonah Lehrer discusses evidence from happiness studies (by way of David Brooks) suggesting that most people would be much happier living in a smaller house close to work than in a larger house that requires a lengthy commute:

According to the calculations of Frey and Stutzer, a person with a one-hour commute has to earn 40 percent more money to be as satisfied with life as someone who walks to the office.

I’ve taken these findings to heart, and whenever my wife and I move now, we prioritize location over space. We’re currently paying through the nose to live in a 750-square-foot apartment near downtown Boulder. It’s about half the size of our old place in St. Louis, but it’s close to everything, including our work, and we love living here.

The modern human brain is much bigger than it used to be, but we didn’t get that way overnight. John Hawks disputes Colin Blakemore’s claim that “the human brain got bigger by accident and not through evolution”.

Attitudes toward causal modeling of correlational (and even some experimental) data differ widely: Sanjay Srivastava leans (or maybe used to lean) toward the permissive side, while Andrew Gelman is skeptical. There’s been a flurry of recent work suggesting that causal modeling techniques like mediation analysis and SEM suffer from a number of serious and underappreciated problems, and after reading this paper by Bullock, Green and Ha, I’m inclined to agree (see the toy simulation below).
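
To see the kind of problem Bullock, Green, and Ha have in mind, here’s a toy simulation (all parameter values made up, and not taken from their paper): the treatment X is randomized, but the mediator M isn’t, and an unmeasured confounder drives both M and the outcome Y. M has no causal effect on Y at all, yet the standard two-regression estimate of the indirect effect comes out comfortably nonzero.

```python
import numpy as np

# Toy simulation of the problem Bullock, Green & Ha describe: X is randomized,
# but the mediator M is not, and an unmeasured confounder U affects both M and
# Y. M has NO causal effect on Y, yet a standard (Baron-Kenny-style) mediation
# analysis reports a sizable "indirect effect". All parameters are arbitrary.

rng = np.random.default_rng(0)
n = 100_000

X = rng.binomial(1, 0.5, n)                    # randomized treatment
U = rng.normal(size=n)                         # unmeasured confounder
M = 0.5 * X + 1.0 * U + rng.normal(size=n)     # mediator: driven by X and U
Y = 0.3 * X + 1.0 * U + rng.normal(size=n)     # outcome: driven by X and U, NOT by M

def ols(y, *predictors):
    """Return OLS coefficients (intercept first) via least squares."""
    design = np.column_stack([np.ones(len(y))] + list(predictors))
    return np.linalg.lstsq(design, y, rcond=None)[0]

a = ols(M, X)[1]       # estimated effect of X on M
b = ols(Y, X, M)[2]    # estimated "effect" of M on Y, controlling for X
print(f"estimated indirect effect a*b = {a * b:.3f}  (true indirect effect: 0)")
```

Randomizing X protects the estimate of the total effect, but it does nothing for the M-to-Y link, which is exactly the piece that mediation analysis leans on.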

A landmark ruling by a New York judge yesterday has the potential to invalidate existing patents on genes, which currently cover about 20% of the human genome in some form. Daniel MacArthur has an excellent summary.