No, it’s not The Incentives—it’s you

There’s a narrative I find kind of troubling, but that unfortunately seems to be growing more common in science. The core idea is that the mere existence of perverse incentives is a valid and sufficient reason to knowingly behave in an antisocial way, just as long as one first acknowledges the existence of those perverse incentives. The way this dynamic usually unfolds is that someone points out some fairly serious problem with the way many scientists behave—say, our collective propensity to p-hack as if it’s going out of style, or the fact that we insist on submitting our manuscripts to publishers that are actively trying to undermine our interests—and then someone else will say, “I know, right—but what are you going to do, those are the incentives.”

As best I can tell, the words “it’s the incentives” are magic. Once they’re uttered by someone, natural law demands that everyone else involved in the conversation immediately stop whatever else they were doing, solemnly nod, and mumble something to the effect that, yes, the incentives are very bad, very bad indeed, and it’s a real tragedy that so many smart, hard-working people are being crushed under the merciless, gigantic boot of The System. Then there’s usually a brief pause, and after that, everyone goes back to discussing whatever they were talking about a moment earlier.

Perhaps I’m getting senile in my early middle age, but my anecdotal perception is that it used to be that, when somebody pointed out to a researcher that they might be doing something questionable, that researcher would typically either (a) argue that they weren’t doing anything questionable (often incorrectly, because there used to be much less appreciation for some of the statistical issues involved), or (b) look uncomfortable for a little while, allow an awkward silence to bloom, and then change the subject. In the last few years, I’ve noticed that uncomfortable discussions about questionable practices disproportionately seem to end with a chuckle or shrug, followed by a comment to the effect that we are all extremely sophisticated human beings who recognize the complexity of the world we live in, and sure it would be great if we lived in a world where one didn’t have to occasionally engage in shenanigans, but that would be extremely naive, and after all, we are not naive, are we?

There is, of course, an element of truth to this kind of response. I’m not denying that perverse incentives exist; they obviously do. There’s no question that many aspects of modern scientific culture systematically incentivize antisocial behavior, and I don’t think we can or should pretend otherwise. What I do object to quite strongly is the narrative that scientists are somehow helpless in the face of all these awful incentives—that we can’t possibly be expected to take any course of action that has any potential, however small, to impede our own career development.

“I would publish in open access journals,” your friendly neighborhood scientist will say. “But those have a lower impact factor, and I’m up for tenure in three years.”

Or: “if I corrected for multiple comparisons in this situation, my effect would go away, and then the reviewers would reject the paper.”

Or: “I can’t ask my graduate students to collect an adequately-powered replication sample; they need to publish papers as quickly as they can so that they can get a job.”

There are innumerable examples of this kind, and they’ve become so routine that it appears many scientists have stopped thinking about what the words they’re saying actually mean, and instead simply glaze over and nod sagely whenever the dreaded Incentives are invoked.

A random bystander who happened to eavesdrop on a conversation between a group of scientists kvetching about The Incentives could be forgiven for thinking that maybe, just maybe, a bunch of very industrious people who generally pride themselves on their creativity, persistence, and intelligence could find some way to work around, or through, the problem. And I think they would be right. The fact that we collectively don’t see it as a colossal moral failing that we haven’t figured out a way to get our work done without having to routinely cut corners in the rush for fame and fortune is deeply troubling.

It’s also aggravating on an intellectual level, because the argument that we’re all being egregiously and continuously screwed over by The Incentives is just not that good. I think there are a lot of reasons why researchers should be very hesitant to invoke The Incentives as a justification for why any of us behave the way we do. I’ll give nine of them here, but I imagine there are probably others.

1. You can excuse anything by appealing to The Incentives

No, seriously—anything. Once you start crying that The System is Broken in order to excuse your actions (or inactions), you can absolve yourself of responsibility for all kinds of behaviors that, on paper, should raise red flags. Consider just a few behaviors that few scientists would condone:

  • Fabricating data or results
  • Regularly threatening to fire trainees in order to scare them into working harder
  • Deliberately sabotaging competitors’ papers or grants by reviewing them negatively

I think it’s safe to say most of us consider such practices to be thoroughly immoral, yet there are obviously people who engage in each of them. And when those people are caught or confronted, one of the most common justifications they fall back on is… you guessed it: The Incentives! When Diederik Stapel confessed to fabricating the data used in over 50 publications, he didn’t explain his actions by saying “oh, you know, I’m probably a bit of a psychopath”; instead, he placed much of the blame squarely on The Incentives:

I did not withstand the pressure to score, to publish, the pressure to get better in time. I wanted too much, too fast. In a system where there are few checks and balances, where people work alone, I took the wrong turn. I want to emphasize that the mistakes that I made were not born out of selfish ends.

Stapel wasn’t acting selfishly, you see… he was just subject to intense pressures. Or, you know, Incentives.

Or consider these quotes from a New York Times article describing Stapel’s unraveling:

In his early years of research — when he supposedly collected real experimental data — Stapel wrote papers laying out complicated and messy relationships between multiple variables. He soon realized that journal editors preferred simplicity. “They are actually telling you: ‘Leave out this stuff. Make it simpler,'” Stapel told me. Before long, he was striving to write elegant articles.

The experiment — and others like it — didn’t give Stapel the desired results, he said. He had the choice of abandoning the work or redoing the experiment. But he had already spent a lot of time on the research and was convinced his hypothesis was valid. “I said — you know what, I am going to create the data set,” he told me.

Reading through such accounts, it’s hard to avoid the conclusion that Stapel’s self-narrative is strikingly similar to the one that gets tossed out all the time on social media, or in conference bar conversations: here I am, a good scientist trying to do an honest job, and yet all around me is a system that incentivizes deception and corner-cutting. What do you expect me to do?

Curiously, I’ve never heard any of my peers—including many of the same people who are quick to invoke The Incentives to excuse their own imperfections—seriously endorse The Incentives as an acceptable justification for Stapel’s behavior. In Stapel’s case, the inference we overwhelmingly jump to is that there must be something deeply wrong with Stapel, seeing as the rest of us also face the same perverse incentives on a daily basis, yet we somehow manage to get by without fabricating data. But this conclusion should make us a bit uneasy, I think, because if it’s correct (and I think it is), it implies that we aren’t really such slaves to The Incentives after all. When our morals get in the way, we appear to be perfectly capable of resisting temptation. And I mean, it’s not even like it’s particularly difficult; I doubt many researchers actively have to fight the impulse to manipulate their data, despite the enormous incentives to do so. I submit that the reason many of us feel okay doing things like reporting exploratory results as confirmatory results, or failing to mention that we ran six other studies we didn’t report, is not really that The Incentives are forcing us to do things we don’t like, but that it’s easier to attribute our unsavory behaviors to unstoppable external forces than to take responsibility for them and accept the consequences.

Needless to say, I think this kind of attitude is fundamentally hypocritical. If we’re not comfortable with pariahs like Stapel blaming The Incentives for causing them to fabricate data, we shouldn’t use The Incentives as an excuse for doing things that are on the same spectrum, albeit less severe. If you think that what the words “I did not withstand the pressure to score” really mean when they fall out of Stapel’s mouth is something like “I’m basically a weak person who finds the thought of not being important so intolerable I’m willing to cheat to get ahead”, then you shouldn’t give yourself a free pass just because when you use that excuse, you’re talking about much smaller infractions. Consider the possibility that maybe, just like Stapel, you’re actually appealing to The Incentives as a crutch to avoid having to make your life very slightly more difficult.

2. It would break the world if everyone did it

When people start routinely accepting that The System is Broken and The Incentives Are Fucking Us Over, bad things tend to happen. It’s very hard to have a stable, smoothly functioning society once everyone believes (rightly or wrongly) that gaming the system is the only way to get by. Imagine if every time you went to your doctor—and I’m aware that this analogy won’t work well for people living outside the United States—she sent you to get a dozen expensive and completely unnecessary medical tests, and then, when prompted for an explanation, simply shrugged and said “I know I’m not an angel—but hey, them’s The Incentives.” You would be livid—even though it’s entirely true (at least in the United States; other developed countries seem to have figured this particular problem out) that many doctors have financial incentives to order unnecessary tests.

To be clear, I’m not saying perverse incentives never induce bad behavior in medicine or other fields. Of course they do. My point is that practitioners in other fields at least appear to have enough sense not to loudly trumpet The Incentives as a reasonable justification for their antisocial behavior—or to pat themselves on the back for being the kind of people who are clever enough to see the fiendish Incentives for exactly what they are. My sense is that when doctors, lawyers, journalists, etc. fall prey to The Incentives, they generally consider that to be a source of shame. I won’t go so far as to suggest that we scientists take pride in behaving badly—we obviously don’t—but we do seem to have collectively developed a rather powerful form of learned helplessness that doesn’t seem to be matched by other communities. Which is a fortunate thing, because if every other community also developed the same attitude, we would be in a world of trouble.

3. You are not special

Individual success in science is, to a first approximation, a zero-sum game—at least in the short term. Many scientists who appeal to The Incentives seem to genuinely believe that opting out of doing the right thing is a victimless crime. I mean, sure, it might make the system a bit less efficient overall… but that’s just life, right? It’s not like anybody’s actually suffering.

Well yeah, people actually do suffer. There are many scientists who are willing to do the right things—to preregister their analysis plans, to work hard to falsify rather than confirm their hypotheses, to diligently draw attention to potential confounds that complicate their preferred story, and so on. When you assert your right to opt out of these things because apparently your publications, your promotions, and your students are so much more important than everyone else’s, you’re cheating those people.

No, really, you are. If you don’t like to think of yourself as someone who cheats other people, don’t reflexively collapse on a crutch made out of stainless steel Incentives any time someone questions your process. You are not special. Your publications, job, and tenure are not more important than other people’s. The fact that there are other people in your position engaging in the same behaviors doesn’t mean you and your co-authors are all very sophisticated, and that the people who refuse to cut corners are naive simpletons. What it actually demonstrates is that, somewhere along the way, you developed the reflexive ability to rationalize away behavior that you would disapprove of in others and that, viewed dispassionately, is clearly damaging to science.

4. You (probably) have no data

It’s telling that appeals to The Incentives are rarely supported by any actual data. It’s simply taken for granted that engaging in the practice in question would be detrimental to one’s career. The next time you’re tempted to blame The System for making you do bad things, you might want to ask yourself this: Do you actually know that, say, publishing in PLOS ONE rather than [insert closed society journal of your choice] would hurt your career? If so, how do you know that? Do you have any good evidence for it, or have you simply accepted it as stylized fact?

Coming by the kind of data you’d need to answer this question is actually not that easy: it’s not enough to reflexively point to, say, the fact that some journals have higher impact factors than others. To identify the utility-maximizing course of action, you’d need to integrate over both benefits and costs, and the costs are not always so obvious. For example, the opportunity cost of passing up a “good” journal will be offset to some extent by the likelihood of faster publication (no need to spend two years racking up rejections at high-impact venues), by the positive signal you send to at least some of your peers that you support open scientific practices, and so on.

I’m not saying that a careful consideration of the pros and cons of doing the right thing would usually lead people to change their minds. It often won’t. What I’m saying is that people who blame The Incentives for forcing them to submit their papers to certain journals, to tell post-hoc stories about their work, or to use suboptimal analytical methods don’t generally support their decisions with data, or even with well-reasoned argument. The defense is usually completely reflexive—which should raise our suspicion that it’s also just a self-serving excuse.

5. It (probably) won’t matter anyway

This one might hurt a bit, but I think it’s important to consider—particularly for early-career researchers. Let’s suppose you’re right that doing the right thing in some particular case would hurt your career. Maybe it really is true that if you comprehensively report in your paper on all the studies you ran, and not just the ones that “worked”, your colleagues will receive your work less favorably. In such cases it may seem natural to think that there has to be a tight relationship between the current decision and the global outcome—i.e., that if you don’t drop the failed studies, you won’t get a tenure-track position three years down the road. After all, you’re focusing on that causal relationship right now, and it seems so clear in your head!

Unfortunately (or perhaps fortunately?), reality doesn’t operate that way. Outcomes in academia are multiply determined and enormously complex. You can tell yourself that getting more papers out faster will get you a job if it makes you feel better, but that doesn’t make it true. If you’re a graduate student on the job market these days, I have sad news for you: you’re probably not getting a tenure-track job no matter what you do. It doesn’t matter how many p-hacked papers you publish, or how thinly you slice your dissertation into different “studies”; there are not nearly enough jobs to go around for everyone who wants one.

Suppose you’re right, and your sustained pattern of corner-cutting is in fact helping you get ahead. How far ahead do you think it’s helping you get? Is it taking you from a 3% chance of getting a tenure-track position at an R1 university to an 80% chance? Almost certainly not. Maybe it’s increasing that probability from 7% to 11%; that would still be a non-trivial relative increase, but it doesn’t change the fact that, for the average grad student, there is no full-time faculty position waiting at the end of the road. Despite what the environment around you may make you think, the choice most graduate students and postdocs face is not actually between (a) maintaining your integrity and “failing” out of science or (b) cutting a few corners and achieving great fame and fortune as a tenured professor. The Incentives are just not that powerful. The vastly more common choice you face as a trainee is between (a) maintaining your integrity and having a pretty low chance of landing a permanent research position, or (b) cutting a bunch of corners that threaten the validity of your work and having a slightly higher (but still low in absolute terms) chance of landing a permanent research position. And even that’s hardly guaranteed, because you never know when there’s someone on a hiring committee who’s going to be turned off by the obvious p-hacking in your work.

The point is, the world is complicated, and as a general rule, very few things—including the number of publications you produce—are as important as they seem to be when you’re focusing on them in the moment. If you’re an early-career researcher and you regularly find yourself struggling between doing what’s right and doing what isn’t right but (you think) benefits your career, you may want to take a step back and dispassionately ask yourself whether this integrity versus expediency conflict is actually a productive way to frame things. Instead, consider the alternative framing I suggested above: you are most likely going to leave academia eventually, no matter what you do, so why not at least try to see the process through with some intellectual integrity? And I mean, if you’re really so convinced that The System is Broken, why would you want to stay in it anyway? Do you think standards are going to change dramatically in the next few years? Are you laboring under the impression that you, of all people, are going to somehow save science?

This brings us directly to the next point…

6. You’re (probably) not going to “change things from the inside”

Over the years, I’ve talked to quite a few early-career researchers who have told me that while they can’t really stop engaging in questionable research practices right now without hurting their career, they’re definitely going to do better once they’re in a more established position. These are almost invariably nice, well-intentioned people, and I don’t doubt that they genuinely believe what they say. Unfortunately, what they say is slippery, and has a habit of adapting to changing circumstances. As a grad student or postdoc, it’s easy to think that once you get a faculty position, you’ll be able to start doing research the “right” way. But once you get a faculty position, it then turns out you need to get papers and grants in order to get tenure (I mean, who knew?), so you decide to let the dreaded Incentives win for just a few more years. And then, once you secure tenure, well, now the problem is that your graduate students also need jobs, just like you once did, so you can’t exactly stop publishing at the same rate, can you? Plus, what would all your colleagues think if you effectively said, “oh, you should all treat the last 15 years of my work with skepticism—that was just for tenure”?

I’m not saying there aren’t exceptions. I’m sure there are. But I can think of at least a half-dozen people off-hand who’ve regaled me with some flavor of “once I’m in a better position” story, and none of them, to my knowledge, have carried through on their stated intentions in a meaningful way. And I don’t find this surprising: in most walks of life, course correction generally becomes harder, not easier, the longer you’ve been traveling on the wrong bearing. So if part of your unhealthy respect for The Incentives is rooted in an expectation that those Incentives will surely weaken their grip on you just as soon as you reach the next stage of your career, you may want to rethink your strategy. The Incentives are not going to dissipate as you move up the career ladder; if anything, you’re probably going to have an increasingly difficult time shrugging them off.

7. You’re not thinking long-term

One of the most frustrating aspects of appeals to The Incentives is that they almost invariably seem to focus exclusively on the short-to-medium term. But the long term also matters. And there, I would argue that The Incentives very much favor a radically different—and more honest—approach to scientific research. To see this, we need only consider the ongoing “replication crisis” in many fields of science. One thing that I think has been largely overlooked in discussions about the current incentive structure of science is what impact the replication crisis will have on the legacies of a huge number of presently famous scientists.

I’ll tell you what impact it will have: many of those legacies will be completely zeroed out. And this isn’t just hypothetical scaremongering. It’s happening right now to many former stars of psychology (and, I imagine, other fields I’m less familiar with). There are many researchers we can point to right now who used to be really famous (like, major-chunks-of-the-textbook famous), are currently famous-with-an-asterisk, and will, in all likelihood, be completely unknown again within a couple of decades. The unlucky ones are probably even fated to become infamous—their entire scientific legacies eventually reduced to footnotes in cautionary histories illustrating how easily entire areas of scientific research can lose their footing when practitioners allow themselves to be swept away by concerns about The Incentives.

You probably don’t want this kind of thing to happen to you. I’m guessing you would like to retire with at least some level of confidence that your work, while maybe not Earth-shattering in its implications, isn’t going to be tossed on the scrap heap of history one day by a new generation of researchers amazed at how cavalier you and your colleagues once were about silly little things like “inferential statistics” and “accurate reporting”. So if your justification for cutting corners is that you can’t otherwise survive or thrive in the present environment, you should consider the prospect—and I mean, really take some time to think about it—that any success you earn within the next 10 years by playing along with The Incentives could ultimately make your work a professional joke within the 20 years after that.

8. It achieves nothing and probably makes things worse

Hey, are you a scientist? Yes? Great, here’s a quick question for you: do you think there’s any working scientist on Planet Earth who doesn’t already know that The Incentives are fucked up? No? I didn’t think so. Which means you really don’t need to keep bemoaning The Incentives; I promise you that you’re not helping to draw much-needed attention to an important new problem nobody’s recognized before. You’re not expressing any deep insight by pointing out that hiring committees prefer applicants with lots of publications in high-impact journals to applicants with a few publications in journals no one’s ever heard of. If your complaints are achieving anything at all, they’re probably actually making things worse by constantly (and incorrectly) reminding everyone around you about just how powerful The Incentives are.

Here’s a suggestion: maybe try not talking about The Incentives for a while. You could even try, I don’t know, working against The Incentives for a change. Or, if you can’t do that, just don’t say anything at all. Probably nobody will miss anything, and the early-career researchers among us might even be grateful for a respite from their senior colleagues’ constant reminder that The System—the very same system those senior colleagues are responsible for creating!—is so fucked up.

9. It’s your job

This last one seems so obvious it should go without saying, but it does need saying, so I’ll say it: a good reason why you should avoid hanging bad behavior on The Incentives is that you’re a scientist, and trying to get closer to the truth, and not just to tenure, is in your fucking job description. Taxpayers don’t fund you because they care about your career; they fund you to learn shit, cure shit, and build shit. If you can’t do your job without having to regularly excuse sloppiness on the grounds that you have no incentive to be less sloppy, at least have the decency not to say that out loud in a crowded room or Twitter feed full of people who indirectly pay your salary. Complaining that you would surely do the right thing if only these terrible Incentives didn’t exist doesn’t make you the noble martyr you think it does; to almost anybody outside your field who has a modicum of integrity, it just makes you sound like you’re looking for an easy out. It’s not sophisticated or worldly or politically astute, it’s just dishonest and lazy. If you find yourself unable to do your job without regularly engaging in practices that clearly devalue the very science you claim to care about, and this doesn’t bother you deeply, then maybe the problem is not actually The Incentives—or at least, not The Incentives alone. Maybe the problem is You.

“Open Source, Open Science” Meeting Report – March 2015

[The report below was collectively authored by participants at the Open Source, Open Science meeting, and has been cross-posted in other places.]

On March 19th and 20th, the Center for Open Science hosted a small meeting in Charlottesville, VA, convened by COS and co-organized by Kaitlin Thaney (Mozilla Science Lab) and Titus Brown (UC Davis). People working across the open science ecosystem attended, including publishers, infrastructure non-profits, public policy experts, community builders, and academics.
Open Science has emerged into the mainstream, primarily due to concerted efforts from various individuals, institutions, and initiatives. This small, focused gathering brought together several of those community leaders. The purpose of the meeting was to define common goals, discuss common challenges, and coordinate on common efforts.

We had good discussions about several issues at the intersection of technology and social hacking including badging, improving standards for scientific APIs, and developing shared infrastructure. We also talked about coordination challenges due to the rapid growth of the open science community. At least three collaborative projects emerged from the meeting as concrete outcomes to combat the coordination challenges.

A repeated theme was how to make the value proposition of open science more explicit. Why should scientists become more open, and why should institutions and funders support open science? We agreed that incentives in science are misaligned with practices, and we identified particular pain points and opportunities to nudge incentives. We focused on providing information about the benefits of open science to researchers, funders, and administrators, and emphasized reasons aligned with each stakeholder’s interests. We also discussed industry interest in “open”, both in making good use of open data, and also in participating in the open ecosystem. One of the collaborative projects emerging from the meeting is a paper or papers to answer the question “Why go open?” for researchers.

Many groups are providing training for tools, statistics, or workflows that could improve openness and reproducibility. We discussed methods of coordinating training activities, such as a training “decision tree” defining potential entry points and next steps for researchers. For example, Center for Open Science offers statistics consulting, rOpenSci offers training on tools, and Software Carpentry, Data Carpentry, and Mozilla Science Lab offer training on workflows. A federation of training services could be mutually reinforcing and bolster collective effectiveness, and facilitate sustainable funding models.

The challenge of supporting training efforts was linked to the larger challenge of funding the so-called “glue” – the technical infrastructure that is only noticed when it fails to function. One such collaboration is the SHARE project, a partnership between the Association of Research Libraries, its academic association partners, and the Center for Open Science. There is little glory in training and infrastructure, but both are essential elements for providing knowledge to enable change, and tools to enact change.

Another repeated theme was the “open science bubble”. Many participants felt that they were failing to reach people outside of the open science community. Training in data science and software development was recognized as one way to introduce people to open science. For example, data integration and techniques for reproducible computational analysis naturally connect to discussions of data availability and open source. Re-branding was also discussed as a solution – rather than “post preprints!”, say “get more citations!” Another important realization was that researchers who engage with open practices need not, and indeed may not want to, self-identify as “open scientists” per se. The identity and behavior need not be the same.

A number of concrete actions and collaborative activities emerged at the end, including a more coordinated effort around badging, collaboration on API connections between services and producing an article on best practices for scientific APIs, and the writing of an opinion paper outlining the value proposition of open science for researchers. While several proposals were advanced for “next meetings” such as hackathons, no decision has yet been reached. But, a more important decision was clear – the open science community is emerging, strong, and ready to work in concert to help the daily scientific practice live up to core scientific values.

Authors
[Authors are listed in reverse alphabetical order; order does not denote relative contribution.]

  1. Tal Yarkoni, University of Texas at Austin
  2. Kara Woo, NCEAS
  3. Andrew Updegrove, Gesmer Updegrove and ConsortiumInfo.org
  4. Kaitlin Thaney, Mozilla Science Lab
  5. Jeffrey Spies, Center for Open Science
  6. Courtney Soderberg, Center for Open Science
  7. Elliott Shore, Association of Research Libraries
  8. Andrew Sallans, Center for Open Science
  9. Karthik Ram, rOpenSci and Berkeley Institute for Data Science
  10. Min Ragan-Kelley, IPython and UC Berkeley
  11. Brian Nosek, Center for Open Science and University of Virginia
  12. Erin C. McKiernan, Wilfrid Laurier University
  13. Jennifer Lin, PLOS
  14. Amye Kenall, BioMed Central
  15. Mark Hahnel, figshare
  16. C. Titus Brown, UC Davis
  17. Sara D. Bowman, Center for Open Science

the truth is not optional: five bad reasons (and one mediocre one) for defending the status quo

You could be forgiven for thinking that academic psychologists have all suddenly turned into professional whistleblowers. Everywhere you look, interesting new papers are cropping up purporting to describe this or that common-yet-shady methodological practice, and telling us what we can collectively do to solve the problem and improve the quality of the published literature. In just the last year or so, Uri Simonsohn introduced new techniques for detecting fraud, and used those tools to identify at least 3 cases of high-profile, unabashed data forgery. Simmons and colleagues reported simulations demonstrating that standard exploitation of research degrees of freedom in analysis can produce extremely high rates of false positive findings. Pashler and colleagues developed a “Psych file drawer” repository for tracking replication attempts. Several researchers raised trenchant questions about the veracity and/or magnitude of many high-profile psychological findings such as John Bargh’s famous social priming effects. Wicherts and colleagues showed that authors of psychology articles who are less willing to share their data upon request are more likely to make basic statistical errors in their papers. And so on and so forth. The flood shows no signs of abating; just last week, the APS journal Perspectives on Psychological Science announced that it’s introducing a new “Registered Replication Report” section that will commit to publishing pre-registered high-quality replication attempts, irrespective of their outcome.

Personally, I think these are all very welcome developments for psychological science. They’re solid indications that we psychologists are going to be able to police ourselves successfully in the face of some pretty serious problems, and they bode well for the long-term health of our discipline. My sense is that the majority of other researchers–perhaps the vast majority–share this sentiment. Still, as with any zeitgeist shift, there are always naysayers. In discussing these various developments and initiatives with other people, I’ve found myself arguing, with somewhat surprising frequency, with people who for various reasons think it’s not such a good thing that Uri Simonsohn is trying to catch fraudsters, or that social priming findings are being questioned, or that the consequences of flexible analyses are being exposed. Since many of the arguments I’ve come across tend to recur, I thought I’d summarize the most common ones here–along with the rebuttals I usually offer for why, with one possible exception, the arguments for giving a pass to sloppy-but-common methodological practices are not very compelling.

“But everyone does it, so how bad can it be?”

We typically assume that long-standing conventions must exist for some good reason, so when someone raises doubts about some widespread practice, it’s quite natural to question the person raising the doubts rather than the practice itself. Could it really, truly be (we say) that there’s something deeply strange and misguided about using p values? Is it really possible that the reporting practices converged on by thousands of researchers in tens of thousands of neuroimaging articles might leave something to be desired? Could failing to correct for the many researcher degrees of freedom associated with most datasets really inflate the false positive rate so dramatically?

The answer to all these questions, of course, is yes–or at least, we should allow that it could be yes. It is, in principle, entirely possible for an entire scientific field to regularly do things in a way that isn’t very good. There are domains where appeals to convention or consensus make perfect sense, because there are few good reasons to do things a certain way except inasmuch as other people do them the same way. If everyone else in your country drives on the right side of the road, you may want to consider driving on the right side of the road too. But science is not one of those domains. In science, there is no intrinsic benefit to doing things just for the sake of convention. In fact, almost by definition, major scientific advances are ones that tend to buck convention and suggest things that other researchers may not have considered possible or likely.

In the context of common methodological practice, it’s no defense at all to say but everyone does it this way, because there are usually relatively objective standards by which we can gauge the quality of our methods, and it’s readily apparent that there are many cases where the consensus approach leaves something to be desired. For instance, you can’t really justify failing to correct for multiple comparisons when you report a single test that’s just barely significant at p < .05 on the grounds that nobody else corrects for multiple comparisons in your field. That may be a valid explanation for why your paper successfully got published (i.e., reviewers didn’t want to hold your feet to the fire for something they themselves are guilty of in their own work), but it’s not a valid defense of the actual science. If you run a t-test on randomly generated data 20 times, you will, on average, get a significant result, p < .05, once. It does no one any good to argue that because the convention in a field is to allow multiple testing–or to ignore statistical power, or to report only p values and not effect sizes, or to omit mention of conditions that didn’t ‘work’, and so on–it’s okay to ignore the issue. There’s a perfectly reasonable question as to whether it’s a smart career move to start imposing methodological rigor on your work unilaterally (see below), but there’s no question that the mere presence of consensus or convention surrounding a methodological practice does not make that practice okay from a scientific standpoint.
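If you’d rather see the 20-tests arithmetic than take my word for it, here’s a minimal sketch in Python. A few loud caveats: the sample size, the number of simulated studies, and all of the function names are hypothetical choices of mine, and to keep the script dependent only on the standard library I’ve swapped in a two-sample z-test with a known standard deviation for the t-test mentioned above. The false positive logic is the same either way: when there’s no true effect, each test has a 5% chance of coming up significant.

```python
import random
from statistics import NormalDist, fmean

ALPHA = 0.05        # conventional significance threshold
N_TESTS = 20        # independent tests run on pure noise
N_PER_GROUP = 30    # hypothetical sample size per group
N_STUDIES = 1000    # hypothetical number of simulated "studies"


def two_sample_z_pvalue(a, b, sd=1.0):
    """Two-sided p-value for a difference in means, assuming a known sd.

    This is a z-test standing in for a t-test (an assumption of this sketch).
    """
    se = sd * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def count_false_positives(rng):
    """Run N_TESTS tests on data with no true effect; count the 'hits'."""
    hits = 0
    for _ in range(N_TESTS):
        a = [rng.gauss(0, 1) for _ in range(N_PER_GROUP)]
        b = [rng.gauss(0, 1) for _ in range(N_PER_GROUP)]
        if two_sample_z_pvalue(a, b) < ALPHA:
            hits += 1
    return hits


def simulate(seed=0):
    """Return (mean hits per study, share of studies with at least one hit)."""
    rng = random.Random(seed)
    counts = [count_false_positives(rng) for _ in range(N_STUDIES)]
    share_any = sum(c > 0 for c in counts) / N_STUDIES
    return fmean(counts), share_any


# Closed-form values for comparison with the simulation.
expected_hits = N_TESTS * ALPHA                  # 20 * 0.05 = 1 hit on average
prob_at_least_one = 1 - (1 - ALPHA) ** N_TESTS   # roughly 0.64
bonferroni_alpha = ALPHA / N_TESTS               # per-test threshold of 0.0025

if __name__ == "__main__":
    mean_hits, share_any = simulate()
    print(f"expected hits per study (analytic):  {expected_hits:.2f}")
    print(f"mean hits per study (simulated):     {mean_hits:.2f}")
    print(f"P(at least one hit) (analytic):      {prob_at_least_one:.3f}")
    print(f"share of studies with a hit (sim.):  {share_any:.3f}")
    print(f"Bonferroni per-test threshold:       {bonferroni_alpha:.4f}")
```

The number worth staring at is the second analytic one: with 20 uncorrected tests on pure noise, the chance of getting at least one “significant” result is roughly 64%, not 5%. The Bonferroni line is there only as one familiar (and conservative) example of what correcting for multiple comparisons looks like; it isn’t the only option, and I’m not suggesting it’s the right one for any particular study.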

“But psychology would break if we could only report results that were truly predicted a priori!”

This is a defense that has some plausibility at first blush. It’s certainly true that if you force researchers to correct for multiple comparisons properly, and report the many analyses they actually conducted–and not just those that “worked”–a lot of stuff that used to get through the filter will now get caught in the net. So, by definition, it would be harder to detect unexpected effects in one’s data–even when those unexpected effects are, in some sense, ‘real’. But the important thing to keep in mind is that raising the bar for what constitutes a believable finding doesn’t actually prevent researchers from discovering unexpected new effects; all it means is that it becomes harder to report post-hoc results as pre-hoc results. It’s not at all clear why forcing researchers to put in more effort validating their own unexpected finding is a bad thing.

In fact, forcing researchers to go the extra mile in this way would have one exceedingly important benefit for the field as a whole: it would shift the onus of determining whether an unexpected result is plausible enough to warrant pursuing away from the community as a whole, and towards the individual researcher who discovered the result in the first place. As it stands right now, if I discover an unexpected result (p < .05!) that I can make up a compelling story for, there’s a reasonable chance I might be able to get that single result into a short paper in, say, Psychological Science. And reap all the benefits that attend getting a paper into a “high-impact” journal. So in practice there’s very little penalty to publishing questionable results, even if I myself am not entirely (or even mostly) convinced that those results are reliable. This state of affairs is, to put it mildly, not A Good Thing.

In contrast, if you as an editor or reviewer start insisting that I run another study that directly tests and replicates my unexpected finding before you’re willing to publish my result, I now actually have something at stake. Because it takes time and money to run new studies, I’m probably not going to bother to follow up on my unexpected finding unless I really believe it. Which is exactly as it should be: I’m the guy who discovered the effect, and I know about all the corners I have or haven’t cut in order to produce it; so if anyone should make the decision about whether to spend more taxpayer money chasing the result, it should be me. You, as the reviewer, are not in a great position to know how plausible the effect truly is, because you have no idea how many different types of analyses I attempted before I got something to ‘work’, or how many failed studies I ran that I didn’t tell you about. Given the huge asymmetry in information, it seems perfectly reasonable for reviewers to say, You think you have a really cool and unexpected effect that you found a compelling story for? Great; go and directly replicate it yourself and then we’ll talk.

“But mistakes happen, and people could get falsely accused!”

Some people don’t like the idea of a guy like Simonsohn running around and busting people’s data fabrication operations for the simple reason that they worry that the kind of approach Simonsohn used to detect fraud is just not that well-tested, and that if we’re not careful, innocent people could get swept up in the net. I think this concern stems from fundamentally good intentions, but once again, I think it’s also misguided.

For one thing, it’s important to note that, despite all the press, Simonsohn hasn’t actually done anything qualitatively different from what other whistleblowers or skeptics have done in the past. He may have suggested new techniques that improve the efficiency with which cheating can be detected, but it’s not as though he invented the ability to report or investigate other researchers for suspected misconduct. Researchers suspicious of other researchers’ findings have always used qualitatively similar arguments to raise concerns. They’ve said things like, hey, look, this is a pattern of data that just couldn’t arise by chance, or, the numbers are too similar across different conditions.

More to the point, perhaps, no one is seriously suggesting that independent observers shouldn’t be allowed to raise their concerns about possible misconduct with journal editors, professional organizations, and universities. There really isn’t any viable alternative. Naysayers who worry that innocent people might end up ensnared by false accusations presumably aren’t suggesting that we do away with all of the existing mechanisms for ensuring accountability; but since the role of people like Simonsohn is only to raise suspicion and provide evidence (and not to do the actual investigating or firing), it’s clear that there’s no way to regulate this type of behavior even if we wanted to (which I would argue we don’t). If I wanted to spend the rest of my life scanning the statistical minutiae of psychology articles for evidence of misconduct and reporting it to the appropriate authorities (and I can assure you that I most certainly don’t), there would be nothing anyone could do to stop me, nor should there be. Remember that accusing someone of misconduct is something anyone can do, but establishing that misconduct has actually occurred is a serious task that requires careful internal investigation. No one–certainly not Simonsohn–is suggesting that a routine statistical test should be all it takes to end someone’s career. In fact, Simonsohn himself has noted that he identified a 4th case of likely fraud that he dutifully reported to the appropriate authorities only to be met with complete silence. Given all the incentives universities and journals have to look the other way when accusations of fraud are made, I suspect we should be much more concerned about the false negative rate than the false positive rate when it comes to fraud.

“But it hurts the public’s perception of our field!”

Sometimes people argue that even if the field does have some serious methodological problems, we still shouldn’t discuss them publicly, because doing so is likely to instill a somewhat negative view of psychological research in the public at large. The unspoken implication being that, if the public starts to lose confidence in psychology, fewer students will enroll in psychology courses, fewer faculty positions will be created to teach students, and grant funding to psychologists will decrease. So, by airing our dirty laundry in public, we’re only hurting ourselves. I had an email exchange with a well-known researcher to exactly this effect a few years back in the aftermath of the Vul et al “voodoo correlations” paper–a paper I commented on to the effect that the problem was even worse than suggested. The argument my correspondent raised was, in effect, that we (i.e., neuroimaging researchers) are all at the mercy of agencies like NIH to keep us employed, and if it starts to look like we’re clowning around, the unemployment rate for people with PhDs in cognitive neuroscience might start to rise precipitously.

While I obviously wouldn’t want anyone to lose their job or their funding solely because of a change in public perception, I can’t say I’m very sympathetic to this kind of argument. The problem is that it places short-term preservation of the status quo above both the long-term health of the field and the public’s interest. For one thing, I think you have to be quite optimistic to believe that some of the questionable methodological practices that are relatively widespread in psychology (data snooping, selective reporting, etc.) are going to sort themselves out naturally if we just look the other way and let nature run its course. The obvious reason for skepticism in this regard is that many of the same criticisms have been around for decades, and it’s not clear that anything much has improved. Maybe the best example of this is Sedlmeier and Gigerenzer’s 1989 paper entitled “Do studies of statistical power have an effect on the power of studies?”, in which the authors convincingly showed that despite three decades of work by luminaries like Jacob Cohen advocating power analyses, statistical power had not risen appreciably in psychology studies. The presence of such unwelcome demonstrations suggests that sweeping our problems under the rug in the hopes that someone (the mice?) will unobtrusively take care of them for us is wishful thinking.

In any case, even if problems did tend to solve themselves when hidden away from the prying eyes of the media and public, the bigger problem with what we might call the “saving face” defense is that it is, fundamentally, an abuse of taxpayers’ trust. As with so many other things, Richard Feynman summed up the issue eloquently in his famous Cargo Cult science commencement speech:

For example, I was a little surprised when I was talking to a friend who was going to go on the radio. He does work on cosmology and astronomy, and he wondered how he would explain what the applications of this work were. “Well,” I said, “there aren’t any.” He said, “Yes, but then we won’t get support for more research of this kind.” I think that’s kind of dishonest. If you’re representing yourself as a scientist, then you should explain to the layman what you’re doing–and if they don’t want to support you under those circumstances, then that’s their decision.

The fact of the matter is that our livelihoods as researchers depend directly on the goodwill of the public. And the taxpayers are not funding our research so that we can “discover” interesting-sounding but ultimately unreplicable effects. They’re funding our research so that we can learn more about the human mind and hopefully be able to fix it when it breaks. If a large part of the profession is routinely employing practices that are at odds with those goals, it’s not clear why taxpayers should be footing the bill. From this perspective, it might actually be a good thing for the field to revise its standards, even if (in the worst-case scenario) that causes a short-term contraction in employment.

“But unreliable effects will just fail to replicate, so what’s the big deal?”

This is a surprisingly common defense of sloppy methodology, maybe the single most common one. It’s also an enormous cop-out, since it pre-empts the need to think seriously about what you’re doing in the short term. The idea is that, since no single study is definitive, and a consensus about the reality or magnitude of most effects usually doesn’t develop until many studies have been conducted, it’s reasonable to impose a fairly low bar on initial reports and then wait and see what happens in subsequent replication efforts.

I think this is a nice ideal, but things just don’t seem to work out that way in practice. For one thing, there doesn’t seem to be much of a penalty for publishing high-profile results that later fail to replicate. The reason, I suspect, is that we’re inclined to give researchers the benefit of the doubt: surely (we say to ourselves), Jane Doe did her best, and we like Jane, so why should we question the work she produces? If we’re really so skeptical about her findings, shouldn’t we go replicate them ourselves, or wait for someone else to do it?

While this seems like an agreeable and fair-minded attitude, it isn’t actually a terribly good way to look at things. Granted, if you really did put in your best effort–dotted all your i’s and crossed all your t’s–and still ended up reporting a false result, we shouldn’t punish you for it. I don’t think anyone is seriously suggesting that researchers who inadvertently publish false findings should be ostracized or shunned. On the other hand, it’s not clear why we should continue to celebrate scientists who ‘discover’ interesting effects that later turn out not to replicate. If someone builds a career on the discovery of one or more seemingly important findings, and those findings later turn out to be wrong, the appropriate attitude is to update our beliefs about the merit of that person’s work. As it stands, we rarely seem to do this.

In any case, the bigger problem with appeals to replication is that the delay between initial publication of an exciting finding and subsequent consensus disconfirmation can be very long, and often spans entire careers. Waiting decades for history to prove an influential idea wrong is a very bad idea if the available alternative is to nip the idea in the bud by requiring stronger evidence up front.

There are many notable examples of this in the literature. A well-publicized recent one is John Bargh’s work on the motor effects of priming people with elderly stereotypes–namely, that priming people with words related to old age makes them walk away from the experiment more slowly. Bargh’s original paper was published in 1996, and according to Google Scholar, has now been cited over 2,000 times. It has undoubtedly been hugely influential in directing many psychologists’ research programs in certain directions (in many cases, in directions that are equally counterintuitive and also now seem open to question). And yet it’s taken over 15 years for a consensus to develop that the original effect is at the very least much smaller in magnitude than originally reported, and potentially so small as to be, for all intents and purposes, “not real”. I don’t know who reviewed Bargh’s paper back in 1996, but I suspect that if they ever considered the seemingly implausible size of the effect being reported, they might well have thought to themselves, well, I’m not sure I believe it, but that’s okay–time will tell. Time did tell, of course; but time is kind of lazy, so it took fifteen years for it to tell. In an alternate universe, a reviewer might have said, well, this is a striking finding, but the effect seems implausibly large; I would like you to try to directly replicate it in your lab with a much larger sample first. I recognize that this is onerous and annoying, but my primary responsibility is to ensure that only reliable findings get into the literature, and inconveniencing you seems like a small price to pay. Plus, if the effect is really what you say it is, people will be all the more likely to believe you later on.

Or take the actor-observer asymmetry, which appears in just about every introductory psychology textbook written in the last 20 – 30 years. It states that people are relatively more likely to attribute their own behavior to situational factors, and relatively more likely to attribute other agents’ behaviors to those agents’ dispositions. When I slip and fall, it’s because the floor was wet; when you slip and fall, it’s because you’re dumb and clumsy. This putative asymmetry was introduced and discussed at length in a book by Jones and Nisbett in 1971, and hundreds of studies have investigated it at this point. And yet a 2006 meta-analysis by Malle suggested that the cumulative evidence for the actor-observer asymmetry is actually very weak. There are some specific circumstances under which you might see something like the postulated effect, but what is quite clear is that it’s nowhere near strong enough an effect to justify being routinely invoked by psychologists and even laypeople to explain individual episodes of behavior. Unfortunately, at this point it’s almost impossible to dislodge the actor-observer asymmetry from the psyche of most researchers–a reality underscored by the fact that the Jones and Nisbett book has been cited nearly 3,000 times, whereas the 2006 meta-analysis has been cited only 96 times (a very low rate for an important and well-executed meta-analysis published in Psychological Bulletin).

The fact that it can take many years–whether 15 or 45–for a literature to build up to the point where we’re even in a position to suggest with any confidence that an initially exciting finding could be wrong means that we should be very hesitant to appeal to long-term replication as an arbiter of truth. Replication may be the gold standard in the very long term, but in the short and medium term, appealing to replication is a huge cop-out. If you can see problems with an analysis right now that cast aspersions on a study’s results, it’s an abdication of responsibility to downplay your concerns and wait for someone else to come along and spend a lot more time and money trying to replicate the study. You should point out now why you have concerns. If the authors can address them, the results will look all the better for it. And if the authors can’t address your concerns, well, then, you’ve just done science a service. If it helps, don’t think of it as a matter of saying mean things about someone else’s work, or of asserting your own ego; think of it as potentially preventing a lot of very smart people from wasting a lot of time chasing down garden paths–and also saving a lot of taxpayer money. Remember that our job as scientists is not to make other scientists’ lives easy in the hopes they’ll repay the favor when we submit our own papers; it’s to establish and apply standards that produce convergence on the truth in the shortest amount of time possible.

“But it would hurt my career to be meticulously honest about everything I do!”

Unlike the other considerations listed above, I think the concern that being honest carries a price when it comes to doing research has a good deal of merit to it. Given the aforementioned delay between initial publication and later disconfirmation of findings (which even in the best case is usually longer than the delay between obtaining a tenure-track position and coming up for tenure), researchers have many incentives to emphasize expediency and good story-telling over accuracy, and it would be disingenuous to suggest otherwise. No malevolence or outright fraud is implied here, mind you; the point is just that if you keep second-guessing and double-checking your analyses, or insist on routinely collecting more data than other researchers might think is necessary, you will very often find that results that could have made a bit of a splash given less rigor are actually not particularly interesting upon careful cross-examination. Which means that researchers who have, shall we say, less of a natural inclination to second-guess, double-check, and cross-examine their own work will, to some degree, be more likely to publish results that make a bit of a splash (it would be nice to believe that pre-publication peer review filters out sloppy work, but empirically, it just ain’t so). So this is a classic tragedy of the commons: what’s good for a given individual, career-wise, is clearly bad for the community as a whole.

I wish I had a good solution to this problem, but I don’t think there are any quick fixes. The long-term solution, as many people have observed, is to restructure the incentives governing scientific research in such a way that individual and communal benefits are directly aligned. Unfortunately, that’s easier said than done. I’ve written a lot both in papers (1, 2, 3) and on this blog (see posts linked here) about various ways we might achieve this kind of realignment, but what’s clear is that it will be a long and difficult process. For the foreseeable future, it will continue to be an understandable though highly lamentable defense to say that the cost of maintaining a career in science is that one sometimes has to play the game the same way everyone else plays the game, even if it’s clear that the rules everyone plays by are detrimental to the communal good.

 

Anyway, this may all sound a bit depressing, but I really don’t think it should be taken as such. Personally I’m actually very optimistic about the prospects for large-scale changes in the way we produce and evaluate science within the next few years. I do think we’re going to collectively figure out how to do science in a way that directly rewards people for employing research practices that are maximally beneficial to the scientific community as a whole. But I also think that for this kind of change to take place, we first need to accept that many of the defenses we routinely give for using iffy methodological practices are just not all that compelling.

in praise of self-policing

It’s IRB week over at The Hardest Science; Sanjay has an excellent series of posts (1, 2, 3) discussing some proposed federal rule changes to the way IRBs oversee research. The short of it is that the proposed changes are mostly good news for people who do minimal risk-type research with human subjects (i.e., stuff that doesn’t involve poking people with needles); if the changes pass as written, most of us will no longer have to file any documents with our IRBs before running our studies. We’ll just put in a short note saying we’ve determined that our studies are excused from review, and then we can start collecting data right away. It’ll work something like this*:

This doesn’t mean federal oversight of human subjects research will cease, of course. There will still be guidelines we all have to follow. But instead of making researchers jump through flaming hoops preemptively, enforcement will take place on an ad-hoc basis and via random audits. For the most part, the important decisions will be left to investigators rather than IRBs. For more details, see Sanjay’s excellent breakdown.

I also agree with Sanjay’s sentiment in his latest post that this is the right way to do things; researchers should police themselves, rather than employing an entire staff of people whose jobs it is to tell researchers how to safely and ethically do their research. In principle, the idea of having trained IRB analysts go over every study sounds nice; the problem is that it takes a very long time, generates a lot of extra work for everyone, and perhaps most problematically, sets up all sorts of perverse incentives. Namely, IRB analysts have an incentive to be pedantic (since they rarely lose their jobs if they ask for too much detail, but could be liable if they give too much leeway and something bad happens), and investigators have an incentive to off-load their conscience onto the IRB rather than actually having to think about the impact of their experiment on subjects. I catch myself doing this more often than I’d like, and I’m not really happy about it. (For instance, I recently found myself telling someone it was okay for them to present gruesome pictures to subjects “because the IRB doesn’t mind that”, and not because I thought the psychological impact was negligible. I gave myself twenty lashes for that one**.) I suspect that, aside from saving everyone a good deal of time and effort, placing the responsibility of doing research on researchers’ shoulders would actually lead them to give more, and not less, consideration to ethical issues.

Anyway, it remains to be seen whether the proposed rules actually pass in their current form. One of the interesting features of the situation is that IRBs may now perversely actually have an incentive to fight against these rules going into effect, since they’d almost certainly need to lay off staff if we move to a system where most studies are entirely excused from review. I don’t really think that this will be much of an issue, and on balance I’m sure university administrations recognize how much IRBs slow down research; but it still can’t hurt for those of us who do research with human subjects to stick our heads past the Department of Health and Human Services’ doors and affirm that excusing most non-invasive human subjects research from review is the right thing to do.


* I know, I know. I managed to go two whole years on this blog without a single lolcat appearance, and now I throw it all away for this. Sorry.

** With a feather duster.