repost: narrative tips from a grad school applicant

Since it’s grad school application season for undergraduates, I thought I’d repost some narrative tips about how to go about writing a personal statement for graduate programs in psychology. This is an old, old post from a long-deceased blog; it’s from way back in 2002 when I was applying to grad school. It’s kind of a serious piece; if I were to rewrite it today, the tone would be substantially lighter. I can’t guarantee that following these tips will get you into grad school, but I can promise that you’ll be amazed at the results.

The first draft of my personal statement was an effortful attempt to succinctly sum up my motivation for attending graduate school. I wanted to make my rationale for applying absolutely clear, so I slaved over the statement for three or four days, stopping only for the occasional bite of food and hour or two of sleep every night. I was pretty pleased with the result. For a first draft, I thought it showed great promise. Here’s how it started:

I want to go to,o grajit skool cuz my frend steve is in grajit and he says its ez and im good at ez stuff

When I showed this to my advisor he said, “I don’t know if humor is the way to go for this thing.”

I said, “What do you mean, humor?”

After that I took a three month break from writing my personal statement while I completed a grade 12 English equivalency exam and read a few of the classics to build up my vocabulary. My advisor said that even clever people like me needed help sometimes. I read Ulysses, The Odyssey, and a few other Greek sounding books, and a book called The Cat in the Hat which was by the same author as the others, but published posthumously. Satisfied that I was able to write a letter that would impress every graduate admissions committee in the world, I set about writing a second version of my personal statement. Here’s how that went:

Dear Dirty Admissions Committee,
Solemn I came forward and mounted the round gunrest. I faced about and blessed gravely thrice the Ivory Tower, the surrounding country, and all the Profs. Then catching sight of the fMRI machine, I bent towards it and made rapid crosses in the air, gurgling in my throat and shaking my head.

“Too literary,” said my advisor when I showed him.

“Mud,” I said, and went back to the drawing board.

The third effort was much better. I had weaned myself off the classics and resolved to write a personal statement that fully expressed what a unique human being I was and why I would be an asset to the program. I talked about how I could juggle three bean bags and almost four, I was working on four, and how I’d stopped biting my fingernails last year so I had lots of free time to do psychology now. To show that I was good at following through on things that I started, I said,

p.s. when I can juggle four bean bags ( any day now) I will write you to let you know so you can update your file.

Satisfied that I had written the final copy of my statement, I showed it to my advisor. He was wild-eyed about it.

“You just don’t get it, do you,” he said, ripping my statement in two and throwing it into the wastepaper basket. “Tell you what. Why don’t I write a statement for you. And then you can go through it and make small changes to personalize it. Ok?”

“Sure,“ I said. So the next day my advisor gave me a two-page personal statement he had written for me. Now I won’t bore you with all of the details, but I have to say, it was pretty bad. Here’s how it started:

After studying psychology for nearly four years at the undergraduate level, I have decided to pursue a Ph.D. in the field. I have developed a keen interest in [list your areas of interest here] and believe [university name here] will offer me outstanding opportunities.

“Now go make minor changes,” said my advisor.

“Mud,” I said, and went to make minor changes.

I came back with the final version a week later. It was truly a masterpiece; co-operating with my advisor had really helped. At first I had been skeptical because what he wrote was so bad the way he gave it to me, but with a judicious sprinkling of helpful clarifications, it turned into something really good. It was sort of like an ugly cocoon (his draft) bursting into a beautiful rainbow (my version). It went like this:

After studying psychology (and juggling!) for nearly four years at the undergraduate level (of university), I have decided to pursue a Ph.D. in the field. Cause I need it to become a Prof. I have developed a keen interest in [list your areas of interest here Vision, Language, Memory, Brain] and believe [university name hereStanford Princeton Mishigan] will offer me outstanding opportunities in psychology and for the juggling society.

“Brilliant,” said my advisor when I showed it to him. “You’ve truly outdone yourself.”

“Mud,” I said, and went to print six more copies.

what the arsenic effect means for scientific publishing

I don’t know very much about DNA (and by ‘not very much’ I sadly mean ‘next to nothing’), so when someone tells me that life as we know it generally doesn’t use arsenic to make DNA, and that it’s a big deal to find a bacterium that does, I’m willing to believe them. So too, apparently, are at least two or three reviewers for Science, which published a paper last week by a NASA group purporting to demonstrate exactly that.

Turns out the paper might have a few holes. In the last few days, the blogosphere has reached fever delirium pitch as critiques of the article have emerged from every corner; it seems like pretty much everyone with some knowledge of the science in question is unhappy about the paper. Since I’m not in any position to critique the article myself, I’ll take Carl Zimmer’s word for it in Slate yesterday:

Was this merely a case of a few isolated cranks? To find out, I reached out to a dozen experts on Monday. Almost unanimously, they think the NASA scientists have failed to make their case.  “It would be really cool if such a bug existed,” said San Diego State University’s Forest Rohwer, a microbiologist who looks for new species of bacteria and viruses in coral reefs. But, he added, “none of the arguments are very convincing on their own.” That was about as positive as the critics could get. “This paper should not have been published,” said Shelley Copley of the University of Colorado.

Zimmer then follows his Slate piece up with a blog post today in which he provides 13 experts’ unadulterated comments. While there are one or two (somewhat) positive reviews, the consensus clearly seems to be that the Science paper is (very) bad science.

Of course, scientists (yes, even Science reviewers) do occasionally make mistakes, so if we’re being charitable about it, we might chalk it up to human error (though some of the critiques suggest that these are elementary problems that could have been very easily addressed, so it’s possible there’s some disingenuousness involved). But what many bloggers (1, 2, 3, etc.) have found particularly inexcusable is the way NASA and the research team have handled the criticism. Zimmer again, in Slate:

I asked two of the authors of the study if they wanted to respond to the criticism of their paper. Both politely declined by email.

“We cannot indiscriminately wade into a media forum for debate at this time,” declared senior author Ronald Oremland of the U.S. Geological Survey. “If we are wrong, then other scientists should be motivated to reproduce our findings. If we are right (and I am strongly convinced that we are) our competitors will agree and help to advance our understanding of this phenomenon. I am eager for them to do so.”

“Any discourse will have to be peer-reviewed in the same manner as our paper was, and go through a vetting process so that all discussion is properly moderated,” wrote Felisa Wolfe-Simon of the NASA Astrobiology Institute. “The items you are presenting do not represent the proper way to engage in a scientific discourse and we will not respond in this manner.”

A NASA spokesperson basically reiterated this point of view, indicating that NASA scientists weren’t going to respond to criticism of their work unless that criticism appeared in, you know, a respectable, peer-reviewed outlet. (Fortunately, at least one of the critics already has a draft letter to Science up on her blog.)

I don’t think it’s surprising that people who spend much of their free time blogging about science, and think it’s important to discuss scientific issues in a public venue, generally aren’t going to like being told that science blogging isn’t a legitimate form of scientific discourse. Especially considering that the critics here aren’t laypeople without scientific training; they’re well-respected scientists with areas of expertise that are directly relevant to the paper. In this case, dismissing trenchant criticism because it’s on the web rather than in a peer-reviewed journal seems kind of like telling someone who’s screaming at you that your house is on fire that you’re not going to listen to them until they adopt a more polite tone. It just seems counterproductive.

That said, I personally don’t think we should take the NASA team’s statements at face value. I very much doubt that what the NASA researchers are saying really reflects any deep philosophical view about the role of blogs in scientific discourse; it’s much more likely that they’re simply trying to buy some time while they figure out how to respond. On the face of it, they have a choice between two lousy options: either ignore the criticism entirely, which would be antithetical to the scientific process and would look very bad, or address it head-on–which, judging by the vociferousness and near-unanimity of the commentators, is probably going to be a losing battle. Shifting the terms of the debate by insisting on responding only in a peer-reviewed venue doesn’t really change anything, but it does buy the authors two or three weeks. And two or three weeks is worth like, forty attentional cycles in the blogosphere.

Mind you, I’m not saying we should sympathize with the NASA researchers just because they’re in a tough position. I think one of the main reasons the story’s attracted so much attention is precisely because people see it as a case of justice being served. The NASA team called a major press conference ahead of the paper’s publication, published its results in one of the world’s most prestigious science journals, and yet apparently failed to run relatively basic experimental controls in support of its conclusions. If the critics are to be believed, the NASA researchers are either disingenuous or incompetent; either way, we shouldn’t feel sorry for them.

What I do think this episode shows is that the rules of scientific publishing have fundamentally changed in the last few years–and largely for the better. I haven’t been doing science for very long, but even in the halcyon days of 2003, when I started graduate school, science blogging was practically nonexistent, and the main way you’d find out what other people thought about an influential new paper was by talking to people you knew at conferences (which could take several months) or waiting for critiques or replication failures to emerge in other peer-reviewed journals (which could take years). That kind of delay between publication and evaluation is disastrous for science, because in the time it takes for a consensus to emerge that a paper is no good, several research teams might have already started trying to replicate and extend the reported findings, and several dozen other researchers might have uncritically cited the paper peripherally in their own work. This delay is probably why, as John Ioannidis’ work so elegantly demonstrates, major studies published in high-impact journals tend to exert a disproportionate influence on the literature long after they’ve been resoundingly discredited.

The Arsenic Effect, if we can call it that, provides a nice illustration of the impact of new media on scientific communication. It’s a safe bet that there are now very few people who do anything even vaguely related to the NASA team’s research who haven’t been made aware that the reported findings are controversial. Which means that the process of attempting to replicate (or falsify) the findings will proceed much more quickly than it might have ten or twenty years ago, and there probably won’t be very many people who cite the Science paper as compelling evidence of terrestrial arsenic-based life. Perhaps more importantly, as researchers get used to the idea that their high-profile work is going to be instantly evaluated by thousands of pairs of highly trained eyes, any of which might be attached to a highly prolific pair of typing hands, there will be an increasingly strong incentive to avoid being careless. That isn’t to say that bad science will disappear, of course; just that, in cases where the badness reflects a pressure to tell a good story at all costs, we’ll probably see less of it.

just a quick note to say…

…I’m not dead, just applying for jobs and trying to get some papers out of the door. Regular posting (meaning, weekly, as opposed to monthly) will resume soon! Until then, go read something interesting. Like this, or this, or this, or this, or this, or this.

Okay, none of those ‘this’es actually link to anything. I was going to do a quick link dump, and then realized that in my current fatigued state, digging up six interesting links would be an epic undertaking. Like, elephant-lifting epic. So no links today. But I promise I’ll make up for it when I’m less tired. We’ll go to Disney World, eat ice cream, and gossip about what all the other science bloggers are wearing. There will be blood posts!

PLoS ONE needs new subjects

I like the PLoS journals, including PLoS ONE, a lot. But it drives me a little bit crazy that the list of PLoS ONE subjects includes things like Non-Clinical Health, Nutrition, and Science Policy, while perfectly respectable subjects like Psychology, Economics, and Political Science are nowhere to be found (note: I’m not saying there’s anything wrong with Nutrition, just that there’s also nothing wrong with Psychology).

I can sort of understand the rationale; PLoS ONE is supposed to be a science journal, and I imagine the editors feel that if they opened up the door to the aforementioned categories, some of the submissions they’d start receiving would have tenuous or nonexistent relationships to anything that you could call science. But in practice, PLoS ONE already does take articles in all of those subjects–and many others. And what then happens, no doubt, is that the editorial board has epic battles over which of the 40-odd existing subjects is going to become the proud beneficiary of a completely unrelated article.

I imagine it goes down something like this:

Editor A: Look, “Patriarchal principles of pop music in a post-Jacksonian era” is clearly an Epidemiology article. It’s going under Public Health and Epidemiology.

Editor B: Don’t be a fool. There isn’t a single word in the paper about health or disease. You’d know that if you’d bothered to read it. It obviously belongs under Mental Health.

Editor A: Absolutely not. Infectious Diseases, Pediatrics and Child Health, or Anesthesiology and Pain Management. Pick one. Final offer.

Editor B: No. But I’ll tell you what. Send it back to the authors, ask them to add a section on the influence of barbiturates and opiates on modern composition, and then we’ll stick it under Pharmacology.

Editor A: Deal.

Lest you think I’m making shit up exaggerating, witness exhibit A: a paper published today by Araújo et al entitled “Tactical Voting in Plurality Elections”. To be fair, I don’t know anything about tactics, voting, plurality, or elections, so I can’t tell you if the paper is any good or not. It looks interesting, but I don’t understand much more than the abstract.

What I can tell you though with something approaching certainty is that the paper has absolutely nothing to do with Neuroscience–which is one of the categories it’s filed under (the other is Physics, which it also seems to bear no relation to, save for the fact that the authors are physicists). It doesn’t mention the words ‘brain’, ‘neuro-‘, ‘neural’, or ‘neuron’ anywhere in the text, which is pretty much a necessary condition for a neuroscience article in my book. The only conceivable link I can think of is that it’s a paper about voting, and voting is done by people, and people have brains. But that’s not very compelling. Really, it should go under Political Science, or Economics, or Applied Statistics, or even a catch-all category like Social Sciences. Except that none of those exist.

Pretty please, PLoS ONE, can we get a Social Sciences section?

the naming of things

Let’s suppose you were charged with the important task of naming all the various subdisciplines of neuroscience that have anything to do with the field of research we now know as psychology. You might come up with some or all of the following terms, in no particular order:

  • Neuropsychology
  • Biological psychology
  • Neurology
  • Cognitive neuroscience
  • Cognitive science
  • Systems neuroscience
  • Behavioral neuroscience
  • Psychiatry

That’s just a partial list; you’re resourceful, so there are probably others (biopsychology? psychobiology? psychoneuroimmunology?). But it’s a good start. Now suppose you decided to make a game out of it, and threw a dinner party where each guest received a copy of your list (discipline names only–no descriptions!) and had to guess what they thought people in that field study. If your nomenclature made any sense at all, and tried to respect the meanings of the individual words used to generate the compound words or phrases in your list, your guests might hazard something like the following guesses:

  • Neuropsychology: “That’s the intersection of neuroscience and psychology. Meaning, the study of the neural mechanisms underlying cognitive function.”
  • Biological psychology: “Similar to neuropsychology, but probably broader. Like, it includes the role of genes and hormones and kidneys in cognitive function.”
  • Neurology: “The pure study of the brain, without worrying about all of that associated psychological stuff.”
  • Cognitive neuroscience: “Well if it doesn’t mean the same thing as neuropsychology and biological psychology, then it probably refers to the branch of neuroscience that deals with how we think and reason. Kind of like cognitive psychology, only with brains!”
  • Cognitive science: “Like cognitive neuroscience, but not just for brains. It’s the study of human cognition in general.”
  • Systems neuroscience: “Mmm… I don’t really know. The study of how the brain functions as a whole system?”
  • Behavioral neuroscience: “Easy: it’s the study of the relationship between brain and behavior. For example, how we voluntarily generate actions.”
  • Psychiatry: “That’s the branch of medicine that concerns itself with handing out multicolored pills that do funny things to your thoughts and feelings. Of course.”

If this list seems sort of sensible to you, you probably live in a wonderful world where compound words mean what you intuitively think they mean, the subject matter of scientific disciplines can be transparently discerned, and everyone eats ice cream for dinner every night terms that sound extremely similar have extremely similar referents rather than referring to completely different fields of study. Unfortunately, that world is not the world we happen to actually inhabit. In our world, most of the disciplines at the intersection of psychology and neuroscience have funny names that reflect accidents of history, and tell you very little about what the people in that field actually study.

Here’s the list your guests might hand back in this world, if you ever made the terrible, terrible mistake of inviting a bunch of working scientists to dinner:

  • Neuropsychology: The study of how brain damage affects cognition and behavior. Most often focusing on the effects of brain lesions in humans, and typically relying primarily on behavioral evaluations (i.e., no large magnetic devices that take photographs of the space inside people’s skulls). People who call themselves neuropsychologists are overwhelmingly trained as clinical psychologists, and many of them work in big white buildings with a red cross on the front. Note that this isn’t the definition of neuropsychology that Wikipedia gives you; Wikipedia seems to think that neuropsychology is “the basic scientific discipline that studies the structure and function of the brain related to specific psychological processes and overt behaviors.” Nice try, Wikipedia, but that’s much too general. You didn’t even use the words ‘brain damage’, ‘lesion’, or ‘patient’ in the first sentence.
  • Biological psychology: To be perfectly honest, I’m going to have to step out of dinner-guest character for a moment and admit I don’t really have a clue what biological psychologists study. I can’t remember the last time I heard someone refer to themselves as a biological psychologist. To an approximation, I think biological psychology differs from, say, cognitive neuroscience in placing greater emphasis on everything outside of higher cognitive processes (sensory systems, autonomic processes, the four F’s, etc.). But that’s just idle speculation based largely on skimming through the chapter names of my old “Biological Psychology” textbook. What I can definitively confidently comfortably tentatively recklessly assert is that you really don’t want to trust the Wikipedia definition here, because when you type ‘biological psychology‘ into that little box that says ‘search’ on Wikipedia, it redirects you to the behavioral neuroscience entry. And that can’t be right, because, as we’ll see in a moment, behavioral neuroscience refers to something very different…
  • Neurology: Hey, look! A wikipedia entry that doesn’t lie to our face! It says neurology is “a medical specialty dealing with disorders of the nervous system. Specifically, it deals with the diagnosis and treatment of all categories of disease involving the central, peripheral, and autonomic nervous systems, including their coverings, blood vessels, and all effector tissue, such as muscle.” That’s a definition I can get behind, and I think 9 out of 10 dinner guests would probably agree (the tenth is probably drunk). But then, I’m not (that kind of) doctor, so who knows.
  • Cognitive neuroscience: In principle, cognitive neuroscience actually means more or less what it sounds like it means. It’s the study of the neural mechanisms underlying cognitive function. In practice, it all goes to hell in a handbasket when you consider that you can prefix ‘cognitive neuroscience’ with pretty much any adjective you like and end up with a valid subdiscipline. Developmental cognitive neuroscience? Check. Computational cognitive neuroscience? Check. Industrial/organizational cognitive neuroscience? Amazingly, no; until just now, that phrase did not exist on the internet. But by the time you read this, Google will probably have a record of this post, which is really all it takes to legitimate I/OCN as a valid field of inquiry. It’s just that easy to create a new scientific discipline, so be very afraid–things are only going to get messier.
  • Cognitive science: A field that, by most accounts, lives up to its name. Well, kind of. Cognitive science sounds like a blanket term for pretty much everything that has to do with cognition, and it sort of is. You have psychology and linguistics and neuroscience and philosophy and artificial intelligence all represented. I’ve never been to the annual CogSci conference, but I hear it’s a veritable orgy of interdisciplinary activity. Still, I think there’s a definite bias towards some fields at the expense of others. Neuroscientists (of any stripe), for instance, rarely call themselves cognitive scientists. Conversely, philosophers of mind or language love to call themselves cognitive scientists, and the jerk cynic in me says it’s because it means they get to call themselves scientists. Also, in terms of content and coverage, there seems to be a definite emphasis among self-professed cognitive scientists on computational and mathematical modeling, and not so much emphasis on developing neuroscience-based models (though neural network models are popular). Still, if you’re scoring terms based on clarity of usage, cognitive science should score at least an 8.5 / 10.
  • Systems neuroscience: The study of neural circuits and the dynamics of information flow in the central nervous system (note: I stole part of that definition from MIT’s BCS website, because MIT people are SMART). Systems neuroscience doesn’t overlap much with psychology; you can’t defensibly argue that the temporal dynamics of neuronal assemblies in sensory cortex have anything to do with human cognition, right? I just threw this in to make things even more confusing.
  • Behavioral neuroscience: This one’s really great, because it has almost nothing to do with what you think it does. Well, okay, it does have something to do with behavior. But it’s almost exclusively animal behavior. People who refer to themselves as behavioral neuroscientists are generally in the business of poking rats in the brain with very small, sharp, glass objects; they typically don’t care much for human beings (professionally, that is). I guess that kind of makes sense when you consider that you can have rats swim and jump and eat and run while electrodes are implanted in their heads, whereas most of the time when we study human brains, they’re sitting motionless in (a) a giant magnet, (b) a chair, or (c) a jar full of formaldehyde. So maybe you could make an argument that since humans don’t get to BEHAVE very much in our studies, people who study humans can’t call themselves behavioral neuroscientists. But that would be a very bad argument to make, and many of the people who work in the so-called “behavioral sciences” and do nothing but study human behavior would probably be waiting to thump you in the hall the next time they saw you.
  • Psychiatry: The branch of medicine that concerns itself with handing out multicolored pills that do funny things to your thoughts and feelings. Of course.

Anyway, the basic point of all this long-winded nonsense is just that, for all that stuff we tell undergraduates about how science is such a wonderful way to achieve clarity about the way the world works, scientists–or at least, neuroscientists and psychologists–tend to carve up their disciplines in pretty insensible ways. That doesn’t mean we’re dumb, of course; to the people who work in a field, the clarity (or lack thereof) of the terminology makes little difference, because you only need to acquire it once (usually in your first nine years of grad school), and after that you always know what people are talking about. Come to think of it, I’m pretty sure the whole point of learning big words is that once you’ve successfully learned them, you can stop thinking deeply about what they actually mean.

It is kind of annoying, though, to have to explain to undergraduates that, DUH, the class they really want to take given their interests is OBVIOUSLY cognitive neuroscience and NOT neuropsychology or biological psychology. I mean, can’t they read? Or to pedantically point out to someone you just met at a party that saying “the neurological mechanisms of such-and-such” makes them sound hopelessly unsophisticated, and what they should really be saying is “the neural mechanisms,” or “the neurobiological mechanisms”, or (for bonus points) “the neurophysiological substrates”. Or, you know, to try (unsuccessfully) to convince your mother on the phone that even though it’s true that you study the relationship between brains and behavior, the field you work in has very little to do with behavioral neuroscience, and so you really aren’t an expert on that new study reported in that article she just read in the paper the other day about that interesting thing that’s relevant to all that stuff we all do all the time.

The point is, the world would be a slightly better place if cognitive science, neuropsychology, and behavioral neuroscience all meant what they seem like they should mean. But only very slightly better.

Anyway, aside from my burning need to complain about trivial things, I bring these ugly terminological matters up partly out of idle curiosity. And what I’m idly curious about is this: does this kind of confusion feature prominently in other disciplines too, or is psychology-slash-neuroscience just, you know, “special”? My intuition is that it’s the latter; subdiscipline names in other areas just seem so sensible to me whenever I hear them. For instance, I’m fairly confident that organic chemists study the chemistry of Orgas, and I assume condensed matter physicists spend their days modeling the dynamics of teapots. Right? Yes? No? Perhaps my  millions thousands hundreds dozens three regular readers can enlighten me in the comments…

does functional specialization exist in the language system?

One of the central questions in cognitive neuroscience–according to some people, at least–is how selective different chunks of cortex are for specific cognitive functions. The paradigmatic examples of functional selectivity are pretty much all located in sensory cortical regions or adjacent association cortices. For instance, the fusiform face area (FFA) is so named because it (allegedly) responds selectively to faces but not to other stimuli. Other regions with varying selectivity profiles are similarly named: the visual word form area (VWFA), parahippocampal place area (PPA), extrastriate body area (EBA), and so on.

In a recent review paper, Fedorenko and Kanwisher (2009) sought to apply insights from the study of functionally selective visual regions to the study of language. They posed the following question with respect to the neuroimaging of language in the title of their paper: Why hasn’t a clearer picture emerged? And they gave the following answer: it’s because brains differ from one another, stupid.

Admittedly, I’m paraphrasing; they don’t use exactly those words. But the basic point they make is that it’s difficult to identify functionally selective regions when you’re averaging over a bunch of very different brains. And the solution they propose–again, imported from the study of visual areas–is to identify potentially selective language regions-of-interest (ROIs) on a subject-specific basis rather than relying on group-level analyses.

The Fedorenko and Kanwisher paper apparently didn’t please Greg Hickok of Talking Brains, who’s done a lot of very elegant work on the neurobiology of language.  A summary of Hickok’s take:

What I found a bit on the irritating side though was the extremely dim and distressingly myopic view of progress in the field of the neural basis of language.

He objects to Fedorenko and Kanwisher on several grounds, and the post is well worth reading. But since I’m very lazy tired, I’ll just summarize his points as follows:

  • There’s more functional specialization in the language system than F&K give the field credit for
  • The use of subject-specific analyses in the domain of language isn’t new, and many researchers (including Hickok) have used procedures similar to those F&K recommend in the past
  • Functional selectivity is not necessarily a criterion we should care about all that much anyway

As you might expect, F&K disagree with Hickok on these points, and Hickok was kind enough to post their response. He then responded to their response in the comments (which are also worth reading), which in turn spawned a back-and-forth with F&K, a cameo by Brad Buchsbaum (who posted his own excellent thoughts on the matter here), and eventually, an intervention by a team of professional arbitrators. Okay, I made that last bit up; it was a very civil disagreement, and is exactly what scientific debates on the internet should look like, in my opinion.

Anyway, rather than revisit the entire thread, which you can read for yourself, I’ll just summarize my thoughts:

  • On the whole, I think my view lines up pretty closely with Hickok’s and Buchsbaum’s. Although I’m very far from an expert on the neurobiology of language (is there a word in English for someone who’s the diametric opposite of an expert–i.e., someone who consistently and confidently asserts exactly the wrong thing? Cause that’s what I am), I agree with Hickok’s argument that the temporal poles show a response profile that looks suspiciously like sentence- or narrative-specific processing (I have a paper on the neural mechanisms of narrative comprehension that supports that claim to some extent), and think F&K’s review of the literature is probably not as balanced as it could have been.
  • More generally, I agree with Hickok that demonstrating functional specialization isn’t necessarily that important to the study of language (or most other domains). This seems to be a major point of contention for F&K, but I don’t think they make a very strong case for their view. They suggest that they “are not sure what other goals (besides understanding a region’s computations) could drive studies aimed at understanding how functionally specialized a region is,” which I think is reasonable, but affirms the consequent. Hickok isn’t saying there’s no reason to search for functional specialization in the F&K sense; as I read him, he’s simply saying that you can study the nature of neural computation in lots of interesting ways that don’t require you to demonstrate functional specialization to the degree F&K seem to require. Seems hard to disagree with that.
  • Buchsbaum points out that it’s questionable whether there are any brain regions that meet the criteria F&K set out for functional specialization–namely that “A brain region R is specialized for cognitive function x if this region (i) is engaged in tasks that rely on cognitive function x, and (ii) is not engaged in tasks that do not rely on cognitive function x.” Buchsbaum and Hickok both point out that the two examples F&K give of putatively specialized regions (the FFA and the temporo-parietal junction, which some people believe is selectively involved in theory of mind) are hardly uncontroversial. Plenty of people have argued that the FFA isn’t really selective to faces, and even more people have argued that the TPJ isn’t selective to theory of mind. As far as I can tell, F&K don’t really address this issue in the comments. They do refer to a recent paper of Kanwisher’s that discusses the evidence for functional specificity in the FFA, but I’m not sure the argument made in that paper is itself uncontroversial, and in any case, Kanwisher does concede that there’s good evidence for at least some representation of non-preferred stimuli (i.e., non-faces in the FFA). In any case, the central question here is whether or not F&K really unequivocally believe that FFA and TPJ aren’t engaged by any tasks that don’t involve face or theory of mind processing. If not, then it’s unfair to demand or expect the same of regions implicated in language.
  • Although I think there’s a good deal to be said for subject-specific analyses, I’m not as sanguine as F&K that a subject-specific approach offers a remedy to the problems that they perceive afflict the study of the neural mechanisms of language. While there’s no denying that group analyses suffer from a number of limitations, subject-specific analyses have their own weaknesses, which F&K don’t really mention in their paper. One is that such analyses typically require the assumption that two clusters located in slightly different places for different subjects must be carrying out the same cognitive operations if they respond similarly to a localizer task. That’s a very strong assumption for which there’s very little evidence (at least in the language domain)–especially because the localizer task F&K promote in this paper involves a rather strong manipulation that may confound several different aspects of language processing.
    Another problem is that it’s not at all obvious how you determine which regions are the “same” (in their 2010 paper, F&K argue for an algorithmic parcellation approach, but the fact that you get sensible-looking results is no guarantee that your parcellation actually reflects meaningful functional divisions in individual subjects). And yet another is that serious statistical problems can arise in cases where one or more subjects fail to show activation in a putative region (which is generally the norm rather than the exception). Say you have 25 subjects in your sample, and 7 don’t show activation anywhere in a region that can broadly be called Broca’s area. What do you do? You can’t just throw those subjects out of the analysis, because that would grossly and misleadingly inflate your effect sizes. Conversely, you can’t just identify any old region that does activate and lump it in with the regions identified in all the other subjects. This is a very serious problem, but it’s one that group analyses, for all their weaknesses, don’t have to contend with.

Disagreements aside, I think it’s really great to see serious scientific discussion taking place in this type of forum. In principle, this is the kind of debate that should be resolved (or not) in the peer-reviewed literature; in practice, peer review is slow, writing full-blown articles takes time, and journal space is limited. So I think blogs have a really important role to play in scientific communication, and frankly, I envy Hickok and Poeppel for the excellent discussion they consistently manage to stimulate over at Talking Brains!

the Bactrian camel and prefrontal cortex: evidence from somatosensory function

I’ve been swamped with work lately, and don’t expect to see the light at the end of the tunnel for a few more weeks, so there won’t be any serious blogging here for the foreseeable future. But on a completely frivolous note, someone reminded me the other day of a cognitive neuroscience paper title generator I wrote a few years ago and had forgotten about. So I brushed it off and added a small amount of new content, and now it’s alive again here. I think it’s good for a few moments of entertainment, and occasionally produces a rare gem–like the one in the title of this post, or my all-time favorite, Neural correlates of nicotine withdrawal in infants.

Feel free to post any other winners in the comments…

trouble with biomarkers and press releases

The latest issue of the Journal of Neuroscience contains an interesting article by Ecker et al in which the authors attempted to classify people with autism spectrum disorder (ASD) and healthy controls based on their brain anatomy, and report achieving “a sensitivity and specificity of up to 90% and 80%, respectively.” Before unpacking what that means, and why you probably shouldn’t get too excited (about the clinical implications, at any rate; the science is pretty cool), here’s a snippet from the decidedly optimistic press release that accompanied the study:

“Scientists funded by the Medical Research Council (MRC) have developed a pioneering new method of diagnosing autism in adults. For the first time, a quick brain scan that takes just 15 minutes can identify adults with autism with over 90% accuracy. The method could lead to the screening for autism spectrum disorders in children in the future.”

If you think this sounds too good to be true, that’s because it is. Carl Heneghan explains why in an excellent article in the Guardian:

How the brain scans results are portrayed is one of the simplest mistakes in interpreting diagnostic test accuracy to make. What has happened is, the sensitivity has been taken to be the positive predictive value, which is what you want to know: if I have a positive test do I have the disease? Not, if I have the disease, do I have a positive test? It would help if the results included a measure called the likelihood ratio (LR), which is the likelihood that a given test result would be expected in a patient with the target disorder compared to the likelihood that the same result would be expected in a patient without that disorder. In this case the LR is 4.5. We’ve put up an article if you want to know more on how to calculate the LR.

In the general population the prevalence of autism is 1 in 100; the actual chances of having the disease are 4.5 times more likely given a positive test. This gives a positive predictive value of 4.5%; about 5 in every 100 with a positive test would have autism.

For those still feeling confused and not convinced, let’s think of 10,000 children. Of these 100 (1%) will have autism, 90 of these 100 would have a positive test, 10 are missed as they have a negative test: there’s the 90% reported accuracy by the media.

But what about the 9,900 who don’t have the disease? 7,920 of these will test negative (the specificity in the Ecker paper is 80%). But, the real worry though, is the numbers without the disease who test positive. This will be substantial: 1,980 of the 9,900 without the disease. This is what happens at very low prevalences, the numbers falsely misdiagnosed rockets. Alarmingly, of the 2,070 with a positive test, only 90 will have the disease, which is roughly 4.5%.

In other words, if you screened everyone in the population for autism, and assume the best about the classifier reported in the JNeuro article (e.g., that the sample of 20 ASD participants they used is perfectly representative of the broader ASD population, which seems unlikely), only about 1 in 20 people who receive a positive diagnosis would actually deserve one.
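To make the arithmetic concrete, here’s a minimal sketch of that calculation in Python (the function name is mine, and the 90% sensitivity, 80% specificity, and 1% prevalence figures are simply the ones assumed above):

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(condition | positive test), via Bayes' rule."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# Positive likelihood ratio Heneghan mentions: sensitivity / (1 - specificity)
print(round(0.90 / (1 - 0.80), 2))                        # 4.5

# Population-wide screening: 90% sensitivity, 80% specificity, 1% prevalence
print(round(positive_predictive_value(0.90, 0.80, 0.01), 3))
# 0.043: roughly 1 in 20 positive tests is a true positive
# (Heneghan's ~4.5% figure comes from the likelihood-ratio shortcut, LR x prevalence)
```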

Ecker et al object to this characterization, and reply to Heneghan in the comments (through the MRC PR office):

Our test was never designed to screen the entire population of the UK. This is simply not practical in terms of costs and effort, and besides totally  unjustified- why would we screen everybody in the UK for autism if there is no evidence whatsoever that an individual is affected?. The same case applies to other diagnostic tests. Not every single individual in the UK is tested for HIV. Clearly this would be too costly and unnecessary. However, in the group of individuals that are test for the virus, we can be very confident that if the test is positive that means a patient is infected. The same goes for our approach.

Essentially, the argument is that, since people would presumably be sent for an MRI scan because they were already under consideration for an ASD diagnosis, and not at random, the proportion of positive results that are false would in fact be much lower than 95%, and closer to the 20% implied by the 80% specificity reported in the article.

One response to this reply–which is in fact Heneghan’s response in the comments–is to point out that the pre-test probability of ASD would need to be pretty high already in order for the classifier to add much. For instance, even if fully 30% of people who were sent for a scan actually had ASD, the posterior probability of ASD given a positive result would still be only 66% (Heneghan’s numbers, which I haven’t checked). Heneghan nicely contrasts these results with the standard for HIV testing, which “reports sensitivity of 99.7% and specificity of 98.5% for enzyme immunoassay.” Clearly, we have a long way to go before doctors can order MRI-based tests for ASD and feel reasonably confident that a positive result is sufficient grounds for an ASD diagnosis.
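If you want to check those numbers yourself, the same Bayes’ rule calculation applies; a quick sketch, again assuming the paper’s 90% sensitivity and 80% specificity:

```python
# Heneghan's 30% pre-test probability scenario, same assumed 90% / 80% figures
sens, spec, pretest = 0.90, 0.80, 0.30
ppv = sens * pretest / (sens * pretest + (1 - spec) * (1 - pretest))
print(round(ppv, 2))  # 0.66, i.e. about a third of positive scans would still be false positives
```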

Setting Heneghan’s concerns about base rates aside, there’s a more general issue that he doesn’t touch on. It’s one that’s not specific to this particular study, and applies to nearly all studies that attempt to develop “biomarkers” for existing disorders. The problem is that the sensitivity and specificity values that people report for their new diagnostic procedure in these types of studies generally aren’t the true parameters of the procedure. Rather, they’re the sensitivity and specificity under the assumption that the diagnostic procedures used to classify patients and controls in the first place are themselves correct. In other words, in order to believe the results, you have to assume that the researchers correctly classified the subjects into patient and control groups using other procedures. In cases where the gold standard test used to make the initial classification is known to have near 100% sensitivity and specificity (e.g., for the aforementioned HIV tests), one can reasonably ignore this concern. But when we’re talking about mental health disorders, where diagnoses are fuzzy and borderline cases abound, it’s very likely that the “gold standard” isn’t really all that great to begin with.

Concretely,  studies that attempt to develop biomarkers for mental health disorders face two substantial problems. One is that it’s extremely unlikely that the clinical diagnoses are ever perfect; after all, if they were perfect, there’d be little point in trying to develop other diagnostic procedures! In this particular case, the authors selected subjects into the ASD group based on standard clinical instruments and structured interviews. I don’t know that there are many clinicians who’d claim with a straight face that the current diagnostic criteria for ASD (and there are multiple sets to choose from!) are perfect. From my limited knowledge, the criteria for ASD seem to be even more controversial than those for most other mental health disorders (which is saying something, if you’ve been following the ongoing DSM-V saga). So really, the accuracy of the classifier in the present study, even if you put the best face on it and ignore the base rate issue Heneghan brings up, is undoubtedly south of the 90% sensitivity / 80% specificity the authors report. How much south, we just don’t know, because we don’t really have any independent, objective way to determine who “really” should get an ASD diagnosis and who shouldn’t (assuming you think it makes sense to make that kind of dichotomous distinction at all). But 90% accuracy is probably a pipe dream, if for no other reason than it’s hard to imagine that level of consensus about autism spectrum diagnoses.

The second problem is that, because the researchers are using the MRI-based classifier to predict the clinician-based diagnosis, it simply isn’t possible for the former to exceed the accuracy of the latter. That bears repeating, because it’s important: no matter how good the MRI-based classifier is, it can only be as good as the procedures used to make the original diagnosis, and no better. It cannot, by definition, make diagnoses that are any more accurate than the clinicians who screened the participants in the authors’ ASD sample. So when you see the press release say this:

For the first time, a quick brain scan that takes just 15 minutes can identify adults with autism with over 90% accuracy.

You should really read it as this:

The method relies on structural (MRI) brain scans and has an accuracy rate approaching that of conventional clinical diagnosis.

That’s not quite as exciting, obviously, but it’s more accurate.

To be fair, there’s something of a catch-22 here, in that the authors didn’t really have a choice about whether or not to diagnose the ASD group using conventional criteria. If they hadn’t, reviewers and other researchers would have complained that we can’t tell if the ASD group is really an ASD group, because the authors used non-standard criteria. Under the circumstances, they did the only thing they could do. But that doesn’t change the fact that it’s misleading to intimate, as the press release does, that the new procedure might be any better than the old ones. It can’t be, by definition.

Ultimately, if we want to develop brain-based diagnostic tools that are more accurate than conventional clinical diagnoses, we’re going to need to show that these tools are capable of predicting meaningful outcomes that clinician diagnoses can’t. This isn’t an impossible task, but it’s a very difficult one. One approach you could take, for instance, would be to compare the ability of clinician diagnosis and MRI-based diagnosis to predict functional outcomes among subjects at a later point in time. If you could show that MRI-based classification of subjects at an early age was a stronger predictor of receiving an ASD diagnosis later in life than conventional criteria, that would make a really strong case for using the former approach in the real world. Short of that type of demonstration though, the only reason I can imagine wanting to use a procedure that was developed by trying to duplicate the results of an existing procedure is in the event that the new procedure is substantially cheaper or more efficient than the old one. Meaning, it would be reasonable enough to say “well, look, we don’t do quite as well with this approach as we do with a full clinical evaluation, but at least this new approach costs much less.” Unfortunately, that’s not really true in this case, since the price of even a short MRI scan is generally going to outweigh that of a comprehensive evaluation by a psychiatrist or psychotherapist. And while it could theoretically be much faster to get an MRI scan than an appointment with a mental health professional, I suspect that that’s not generally going to be true in practice either.

Having said all that, I hasten to note that all this is really a critique of the MRC press release and subsequently lousy science reporting, and not of the science itself. I actually think the science itself is very cool (but the Neuroskeptic just wrote a great rundown of the methods and results, so there’s not much point in me describing them here). People have been doing really interesting work with pattern-based classifiers for several years now in the neuroimaging literature, but relatively few studies have applied this kind of technique to try and discriminate between different groups of individuals in a clinical setting. While I’m not really optimistic that the technique the authors introduce in this paper is going to change the way diagnosis happens any time soon (or at least, I’d argue that it shouldn’t), there’s no question that the general approach will be an important piece of future efforts to improve clinical diagnoses by integrating biological data with existing approaches. But that’s not going to happen overnight, and in the meantime, I think it’s pretty irresponsible of the MRC to be issuing press releases claiming that its researchers can diagnose autism in adults with 90% accuracy.

Ecker C, Marquand A, Mourão-Miranda J, Johnston P, Daly EM, Brammer MJ, Maltezos S, Murphy CM, Robertson D, Williams SC, & Murphy DG (2010). Describing the brain in autism in five dimensions–magnetic resonance imaging-assisted diagnosis of autism spectrum disorder using a multiparameter classification approach. Journal of Neuroscience, 30(32), 10612-10623. PMID: 20702694

elsewhere on the net, vacation edition

I’m hanging out in Boston for a few days, so blogging will probably be sporadic or nonexistent. Which is to say, you probably won’t notice any difference.

The last post on the Dunning-Kruger effect somehow managed to rack up 10,000 hits in 48 hours; but that was last week. Today I looked at my stats again, and the blog is back to a more normal 300 hits, so I feel like it’s safe to blog again. Here are some neat (and totally unrelated) links from the past week:

  • OKCupid has another one of those nifty posts showing off all the cool things they can learn from their gigantic userbase (who else gets to say things like “this analysis includes 1.51 million users’ data”???). Apparently, tall people (claim to) have more sex, attractive photos are more likely to be out of date, and most people who claim to be bisexual aren’t really bisexual.
  • After a few months off, my department-mate Chris Chatham is posting furiously again over at Developing Intelligence, with a series of excellent posts reviewing recent work on cognitive control and the perils of fMRI research. I’m not really sure what Chris spent his blogging break doing, but given the frequency with which he’s been posting lately, my suspicion is that he spent it secretly writing blog posts.
  • Mark Liberman points out a fundamental inconsistency in the way we view attributions of authorship: we get appropriately angry at academics who pass someone else’s work off as their own, but think it’s just fine for politicians to pay speechwriters to write for them. It’s an interesting question, and leads to an intimately related, and even more important question–namely, will anyone get mad at me if I pay someone else to write a blog post for me about someone else’s blog post discussing people getting angry at people paying or not paying other people to write material for other people that they do or don’t own the copyright on?
  • I like oohing and aahing over large datasets, and the Guardian’s Data Blog provides a nice interface to some of the most ooh- and aah-able datasets out there. [via R-Chart]
  • Ed Yong has a characteristically excellent write-up about recent work on the magnetic vision of birds. Yong also does link dump posts better than anyone else, so you should probably stop reading this one right now and read his instead.
  • You’ve probably heard about this already, but some time last week, the brain trust at ScienceBlogs made the amazingly clever decision to throw away their integrity by selling PepsiCo its very own “science” blog. Predictably, a lot of the bloggers weren’t happy with the decision, and many have now moved on to greener pastures; Carl Zimmer’s keeping score. Personally, I don’t have anything intelligent to add to everything that’s already been said; I’m literally dumbfounded.
  • Andrew Gelman takes apart an obnoxious letter from pollster John Zogby to Nate Silver of fivethirtyeight.com. I guess now we know that Zogby didn’t get where he is by not being an ass to other people.
  • Vaughan Bell of Mind Hacks points out that neuroplasticity isn’t a new concept, and was discussed seriously in the literature as far back as the 1800s. Apparently our collective views about the malleability of mind are not, themselves, very plastic.
  • NPR ran a three-part story by Barbara Bradley Hagerty on the emerging and somewhat uneasy relationship between neuroscience and the law. The articles are pretty good, but much better, in my opinion, was the Talk of the Nation episode that featured Hagerty as a guest alongside Joshua Greene, Kent Kiehl, and Stephen Morse–people who’ve all contributed in various ways to the emerging discipline of NeuroLaw. It’s a really interesting set of interviews and discussions. For what it’s worth, I think I agree with just about everything Greene has to say about these issues–except that he says things much more eloquently than I think them.
  • Okay, this one’s totally frivolous, but does anyone want to buy me one of these things? I don’t even like dried food; I just think it would be fun to stick random things in there and watch them come out pale, dried husks of their former selves. Is it morbid to enjoy watching the life slowly being sucked out of apples and mushrooms?

what the Dunning-Kruger effect is and isn’t

If you regularly read cognitive science or psychology blogs (or even just the lowly New York Times!), you’ve probably heard of something called the Dunning-Kruger effect. The Dunning-Kruger effect refers to the seemingly pervasive tendency of poor performers to overestimate their abilities relative to other people–and, to a lesser extent, of high performers to underestimate theirs. The explanation for this, according to Kruger and Dunning, who first reported the effect in an extremely influential 1999 article in the Journal of Personality and Social Psychology, is that incompetent people lack the skills they’d need in order to be able to distinguish good performers from bad performers:

…people who lack the knowledge or wisdom to perform well are often unaware of this fact. We attribute this lack of awareness to a deficit in metacognitive skill. That is, the same incompetence that leads them to make wrong choices also deprives them of the savvy necessary to recognize competence, be it their own or anyone else’s.

For reasons I’m not really clear on, the Dunning-Kruger effect seems to be experiencing something of a renaissance over the past few months; it’s everywhere in the blogosphere and media. For instance, here are just a few alleged Dunning-Krugerisms from the past few weeks:

So what does this mean in business? Well, it’s all over the place. Even the title of Dunning and Kruger’s paper, the part about inflated self-assessments, reminds me of a truism that was pointed out by a supervisor early in my career: The best employees will invariably be the hardest on themselves in self-evaluations, while the lowest performers can be counted on to think they are doing excellent work…

Heidi Montag and Spencer Pratt are great examples of the Dunning-Kruger effect. A whole industry of assholes are making a living off of encouraging two attractive yet untalented people to believe they are actually genius auteurs. The bubble around them is so thick, they may never escape it. At this point, all of America (at least those who know who they are) is in on the joke – yet the two people in the center of this tragedy are completely unaware…

Not so fast there — the Dunning-Kruger effect comes into play here. People in the United States do not have a high level of understanding of evolution, and this survey did not measure actual competence. I’ve found that the people most likely to declare that they have a thorough knowledge of evolution are the creationists…but that a brief conversation is always sufficient to discover that all they’ve really got is a confused welter of misinformation…

As you can see, the findings reported by Kruger and Dunning are often interpreted to suggest that the less competent people are, the more competent they think they are. People who perform worst at a task tend to think they’re god’s gift to said task, and the people who can actually do said task often display excessive modesty. I suspect we find this sort of explanation compelling because it appeals to our implicit just-world theories: we’d like to believe that people who obnoxiously proclaim their excellence at X, Y, and Z must really not be so very good at X, Y, and Z at all, and must be (over)compensating for some actual deficiency; it’s much less pleasant to imagine that people who go around shoving their (alleged) superiority in our faces might really be better than us at what they do.

Unfortunately, Kruger and Dunning never actually provided any support for this type of just-world view; their studies categorically didn’t show that incompetent people are more confident or arrogant than competent people. What they did show is this:

This is one of the key figures from Kruger and Dunning’s 1999 paper (and the basic effect has been replicated many times since). The critical point to note is that there’s a clear positive correlation between actual performance (gray line) and perceived performance (black line): the people in the top quartile for actual performance think they perform better than the people in the second quartile, who in turn think they perform better than the people in the third quartile, and so on. So the bias is definitively not that incompetent people think they’re better than competent people. Rather, it’s that incompetent people think they’re much better than they actually are. But they typically still don’t think they’re quite as good as people who, you know, actually are good. (It’s important to note that Dunning and Kruger never claimed to show that the unskilled think they’re better than the skilled; that’s just the way the finding is often interpreted by others.)

That said, it’s clear that there is a very large discrepancy between the way incompetent people actually perform and the way they perceive their own performance level, whereas the discrepancy is much smaller for highly competent individuals. So the big question is why. Kruger and Dunning’s explanation, as I mentioned above, is that incompetent people lack the skills they’d need in order to know they’re incompetent. For example, if you’re not very good at learning languages, it might be hard for you to tell that you’re not very good, because the very skills that you’d need in order to distinguish someone who’s good from someone who’s not are the ones you lack. If you can’t hear the distinction between two different phonemes, how could you ever know who has native-like pronunciation ability and who doesn’t? If you don’t understand very many words in another language, how can you evaluate the size of your own vocabulary in relation to other people’s?

This appeal to people’s meta-cognitive abilities (i.e., their knowledge about their own knowledge) has some intuitive plausibility, and Kruger, Dunning, and their colleagues have provided quite a bit of evidence for it over the past decade. That said, it’s by no means the only explanation around; over the past few years, a fairly sizeable literature criticizing or extending Kruger and Dunning’s work has developed. I’ll mention just three plausible (and mutually compatible) alternative accounts people have proposed, but there are others.

1. Regression toward the mean. Probably the most common criticism of the Dunning-Kruger effect is that it simply reflects regression to the mean–that is, it’s a statistical artifact. Regression to the mean refers to the fact that any time you select a group of individuals based on some criterion, and then measure their standing on some other, imperfectly correlated dimension, their scores on that second dimension will tend to shift (or regress) toward the mean. It’s a notoriously underappreciated problem, and probably explains many, many phenomena that people have tried to interpret substantively. For instance, in placebo-controlled clinical trials of SSRIs, depressed people tend to get better in both the drug and placebo conditions. Some of this is undoubtedly due to the placebo effect, but much of it is probably also due to what’s often referred to as “natural history”. Depression, like most things, tends to be cyclical: people get better or worse over time, often for no apparent rhyme or reason. But since people tend to seek help (and sign up for drug trials) primarily when they’re doing particularly badly, it follows that most people would get better to some extent even without any treatment. That’s regression to the mean (the Wikipedia entry has other nice examples–for example, the famous Sports Illustrated Cover Jinx).

In the context of the Dunning-Kruger effect, the argument is that incompetent people simply regress toward the mean when you ask them to evaluate their own performance. Since perceived performance is influenced not only by actual performance, but also by many other factors (e.g., one’s personality, meta-cognitive ability, measurement error, etc.), it follows that, on average, people with extreme levels of actual performance won’t be quite as extreme in terms of their perception of their performance. So, much of the Dunning-Kruger effect arguably doesn’t need to be explained at all, and in fact, it would be quite surprising if you didn’t see a pattern of results that looks at least somewhat like the figure above.
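To make the regression account concrete, here’s a minimal toy simulation (mine, not from any of the papers discussed here; the sample size, the noise level, and the 0.5 weighting on true skill are all arbitrary assumptions). Self-perception is modeled as an unbiased but noisy readout of true skill, and yet binning people by their actual standing reproduces the familiar pattern:

    # A toy illustration of regression to the mean in the Dunning-Kruger paradigm.
    # All parameter values are arbitrary assumptions, not estimates from the papers.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    def pct_rank(x):
        """Convert raw scores to percentile ranks (0-100)."""
        order = np.argsort(x)
        ranks = np.empty(len(x))
        ranks[order] = np.arange(len(x))
        return 100 * (ranks + 0.5) / len(x)

    true_skill = rng.normal(size=n)
    actual_pct = pct_rank(true_skill)                    # actual standing on the test
    # Self-perception tracks true skill imperfectly: pure noise, zero systematic bias
    perceived_pct = pct_rank(0.5 * true_skill + rng.normal(size=n))

    quartile = np.digitize(actual_pct, [25, 50, 75])     # bin people by ACTUAL standing
    for q in range(4):
        m = quartile == q
        print(f"Q{q + 1}: actual {actual_pct[m].mean():5.1f}, "
              f"perceived {perceived_pct[m].mean():5.1f}")

Nobody in this toy world has an inflated self-image; the apparent overestimation at the bottom and underestimation at the top fall out of measurement noise alone. Notice, though, that the resulting errors are symmetric–which is exactly why regression alone can’t be the whole story.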

2. Regression to the mean plus better-than-average. Having said that, it’s clear that regression to the mean can’t explain everything about the Dunning-Kruger effect. One problem is that it doesn’t explain why the effect is greater at the low end than at the high end. That is, incompetent people tend to overestimate their performance to a much greater extent than competent people underestimate their performance. This asymmetry can’t be explained solely by regression to the mean. It can, however, be explained by a combination of RTM and a “better-than-average” (or self-enhancement) heuristic, which says that, in general, most people have a tendency to view themselves excessively positively. This two-pronged explanation was proposed by Krueger and Mueller in a 2002 study (note that Krueger and Kruger are different people!), who argued that poor performers suffer from a double whammy: not only do their perceptions of their own performance regress toward the mean, but those perceptions are also further inflated by the self-enhancement bias. In contrast, for high performers, these two effects largely balance each other out: regression to the mean causes high performers to underestimate their performance, but to some extent that underestimation is offset by the self-enhancement bias. As a result, it looks as though high performers make more accurate judgments than low performers, when in reality their apparent accuracy just reflects two opposing biases that happen to cancel out.
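Krueger and Mueller’s account is easy to bolt onto the same toy simulation from above: keep the noisy self-perception and add a flat better-than-average bump to everyone’s self-estimate. The size of the bump used below (+12 percentile points) is an arbitrary, purely illustrative assumption:

    # Sketch of the regression + better-than-average account (Krueger & Mueller, 2002).
    # The +12-point self-enhancement bump is an arbitrary, illustrative assumption.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    def pct_rank(x):
        order = np.argsort(x)
        ranks = np.empty(len(x))
        ranks[order] = np.arange(len(x))
        return 100 * (ranks + 0.5) / len(x)

    true_skill = rng.normal(size=n)
    actual_pct = pct_rank(true_skill)
    noisy_perception = pct_rank(0.5 * true_skill + rng.normal(size=n))  # regression component
    perceived_pct = np.clip(noisy_perception + 12, 0, 100)              # self-enhancement component

    quartile = np.digitize(actual_pct, [25, 50, 75])
    for q in range(4):
        m = quartile == q
        gap = perceived_pct[m].mean() - actual_pct[m].mean()
        print(f"Q{q + 1}: actual {actual_pct[m].mean():5.1f}, "
              f"perceived {perceived_pct[m].mean():5.1f}, gap {gap:+6.1f}")

The bump inflates the bottom quartile’s already-regressed estimates even further, while partially cancelling the top quartile’s underestimate–reproducing the asymmetry in the figure without any difference in metacognitive ability between groups.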

3. The instrumental role of task difficulty. Consistent with the notion that the Dunning-Kruger effect is at least partly a statistical artifact, some studies have shown that the asymmetry reported by Kruger and Dunning (i.e., the smaller discrepancy for high performers than for low performers) actually goes away, and even reverses, when the ability tests given to participants are very difficult. For instance, Burson and colleagues (2006), writing in JPSP, showed that when University of Chicago undergraduates were asked moderately difficult trivia questions about their university, the subjects who performed best were just as poorly calibrated as the people who performed worst, in the sense that their estimates of how well they did relative to other people were wildly inaccurate. Here’s what that looks like:

Notice that this finding isn’t inconsistent with the Kruger and Dunning results: when participants were given easier trivia (the diamond-studded line), Burson et al observed the standard pattern, with poor performers seemingly showing worse calibration. Simply knocking about 10% off the accuracy rate on the trivia questions was enough to induce a large shift in the relative mismatch between perceptions of ability and actual ability. Burson et al then went on to replicate this pattern in two additional studies involving a number of different judgments and tasks, so this result isn’t specific to trivia questions. In fact, in the later studies, Burson et al showed that when the task was really difficult, poor performers were actually considerably better calibrated than high performers.

Looking at the figure above, it’s not hard to see why this would be. The slope of the perceived-performance line tends to be pretty constant in these kinds of experiments; what task difficulty changes is the line’s intercept on the y-axis–that is, how well people think they did on average. Lower the line (as happens when the task is hard and everyone’s self-estimates drop) and you necessarily get a larger gap between actual and perceived performance at the high end; raise the line (as happens when the task is easy) and you maximize the gap at the low end.

To get an intuitive sense of what’s happening here, just think of it this way: if you’re performing a very difficult task, you’re probably going to find the experience subjectively demanding even if you’re at the high end relative to other people. Since people’s judgments about their own relative standing depend to a substantial extent on their subjective perception of their own performance (i.e., you use your sense of how easy a task was as a proxy for how good you must be at it), high performers are going to end up systematically underestimating how well they did. When a task is difficult, most people assume they must have done relatively poorly compared to other people. Conversely, when a task is relatively easy (and the tasks Dunning and Kruger studied were on the easier side), most people assume they must be pretty good compared to others. As a result, it’s going to look like the people who perform well are well-calibrated when the task is easy and poorly-calibrated when the task is difficult; less competent people are going to show exactly the opposite pattern. And note that this doesn’t require us to assume any relationship between actual performance and perceived performance. You would expect to get the Dunning-Kruger effect for easy tasks even if there were exactly zero correlation between how good people actually are at something and how good they think they are.
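You can see the logic of the difficulty account in one more toy simulation (again mine, with arbitrary numbers): suppose people’s estimates of their relative standing are driven only by how easy the task felt–say, averaging around the 65th percentile when the task feels easy and around the 35th when it feels hard–and are completely uncorrelated with how they actually did.

    # Sketch of the task-difficulty account: perceived standing depends only on how
    # easy the task felt, and is uncorrelated with actual standing. The mean guesses
    # (65 for an easy task, 35 for a hard one) are arbitrary, illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    actual_pct = 100 * (rng.permutation(n) + 0.5) / n   # everyone's true percentile standing

    for label, mean_guess in [("easy task", 65), ("hard task", 35)]:
        # self-estimates reflect only subjective ease, not true standing
        perceived_pct = np.clip(rng.normal(mean_guess, 15, size=n), 0, 100)
        quartile = np.digitize(actual_pct, [25, 50, 75])
        print(label)
        for q in range(4):
            m = quartile == q
            gap = perceived_pct[m].mean() - actual_pct[m].mean()
            print(f"  Q{q + 1}: actual {actual_pct[m].mean():5.1f}, "
                  f"perceived {perceived_pct[m].mean():5.1f}, gap {gap:+6.1f}")

With the easy task you get the classic Dunning-Kruger picture (the bottom quartile overestimates enormously, the top quartile underestimates only modestly); with the hard task the pattern reverses, just as Burson et al report–and all of it with zero correlation between perceived and actual performance.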

Here’s how Burson et al summarized their findings:

Our studies replicate, eliminate, or reverse the association between task performance and judgment accuracy reported by Kruger and Dunning (1999) as a function of task difficulty. On easy tasks, where there is a positive bias, the best performers are also the most accurate in estimating their standing, but on difficult tasks, where there is a negative bias, the worst performers are the most accurate. This pattern is consistent with a combination of noisy estimates and overall bias, with no need to invoke differences in metacognitive abilities. In this regard, our findings support Krueger and Mueller’s (2002) reinterpretation of Kruger and Dunning’s (1999) findings. An association between task-related skills and metacognitive insight may indeed exist, and later we offer some suggestions for ways to test for it. However, our analyses indicate that the primary drivers of errors in judging relative standing are general inaccuracy and overall biases tied to task difficulty. Thus, it is important to know more about those sources of error in order to better understand and ameliorate them.

What should we conclude from these (and other) studies? I think the jury’s still out to some extent, but at minimum, I think it’s clear that much of the Dunning-Kruger effect reflects either statistical artifact (regression to the mean), or much more general cognitive biases (the tendency to self-enhance and/or to use one’s subjective experience as a guide to one’s standing in relation to others). This doesn’t mean that the meta-cognitive explanation preferred by Dunning, Kruger and colleagues can’t hold in some situations; it very well may be that in some cases, and to some extent, people’s lack of skill is really what prevents them from accurately determining their standing in relation to others. But I think our default position should be to prefer the alternative explanations I’ve discussed above, because they’re (a) simpler, (b) more general (they explain lots of other phenomena), and (c) necessary (frankly, it’d be amazing if regression to the mean didn’t explain at least part of the effect!).

We should also try to be aware of another very powerful cognitive bias whenever we use the Dunning-Kruger effect to explain the people or situations around us–namely, confirmation bias. If you believe that incompetent people don’t know enough to know they’re incompetent, it’s not hard to find anecdotal evidence for that; after all, we all know people who are both arrogant and not very good at what they do. But if you stop to look for it, it’s probably also not hard to find disconfirming evidence. After all, there are clearly plenty of people who are good at what they do, but not nearly as good as they think they are (i.e., they’re above average, and still totally miscalibrated in the positive direction). Just like there are plenty of people who are lousy at what they do and recognize their limitations (e.g., I don’t need to be a great runner in order to be able to tell that I’m not a great runner–I’m perfectly well aware that I have terrible endurance, precisely because I can’t finish runs that most other runners find trivial!). But the plural of anecdote is not data, and the data appear to be equivocal. Next time you’re inclined to chalk your obnoxious co-worker’s delusions of grandeur up to the Dunning-Kruger effect, consider the possibility that your co-worker’s simply a jerk–no meta-cognitive incompetence necessary.

Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one’s own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77(6), 1121–1134. PMID: 10626367
Krueger, J., & Mueller, R. A. (2002). Unskilled, unaware, or both? The better-than-average heuristic and statistical regression predict errors in estimates of own performance. Journal of Personality and Social Psychology, 82(2), 180–188. PMID: 11831408
Burson, K. A., Larrick, R. P., & Klayman, J. (2006). Skilled or unskilled, but still unaware of it: How perceptions of difficulty drive miscalibration in relative comparisons. Journal of Personality and Social Psychology, 90(1), 60–77. PMID: 16448310