Yes, your research is very noble. No, that’s not a reason to flout copyright law.

Scientific research is cumulative; many elements of a typical research project would not and could not exist but for the efforts of many previous researchers. This goes not only for knowledge, but also for measurement. In much of the clinical world–and also in many areas of “basic” social and life science research–people routinely save themselves inordinate amounts of work by using behavioral or self-report measures developed and validated by other researchers.

Among many researchers who work in fields heavily dependent on self-report instruments (e.g., personality psychology), there appears to be a tacit belief that, once a measure is publicly available–either because it’s reported in full in a journal article, or because all of the items and instructions be found on the web–it’s fair game for use in subsequent research. There’s a time-honored ttradition of asking one’s colleagues if they happen to “have a copy” of the NEO-PI-3, or the Narcissistic Personality Inventory, or the Hamilton Depression Rating Scale. The fact that many such measures are technically published under restrictive copyright licenses, and are often listed for sale at rather exorbitant prices (e.g., you can buy 25 paper copies of the NEO-PI-3 from the publisher for $363 US), does not seem to deter researchers much. The general understanding seems to be that if a measure is publicly available, it’s okay to use it for research purposes. I don’t think most researchers have a well-thought out, internally consistent justification for this behavior; it seems to almost invariably be an article of tacit belief that nothing bad can or should happen to someone who uses a commercially available instrument for a purpose as noble as scientific research.

The trouble with tacit beliefs is that, like all beliefs, they can sometimes be wrong–only, because they’re tacit, they’re often not evaluated openly until things go horribly wrong. Exhibit A on the frontier of horrible wrongness is a recent news article in Science that reports on a rather disconcerting case where the author of a measure (the Eight-Item Morisky Medication Adherence Scale–which also provides a clue to its author’s name) has been demanding rather large sums of money (ranging from $2000 to $6500) from the authors of hundreds of published articles that have used the MMAS-8 without explicitly requesting permission. As the article notes, there appears to be a general agreement that Morisky is within his legal rights to demand such payment; what people seem to be objecting to is the amount Morisky is requesting, and the way he’s going about the process (i.e., with lawyers):

Morisky is well within his rights to seek payment for use of his copyrighted tool. U.S. law encourages academic scientists and their universities to protect and profit from their inventions, including those developed with public funds. But observers say Morisky’s vigorous enforcement and the size of his demands stand out. “It’s unusual that he is charging as much as he is,” says Kurt Geisinger, director of the Buros Center for Testing at the University of Nebraska in Lincoln, which evaluates many kinds of research-related tests. He and others note that many scientists routinely waive payments for such tools, as long as they are used for research.

It’s a nice article, and and I think it suggests two things fairly clearly. First, Morisky is probably not a very nice man. He seems to have no compunction charging resource-strapped researchers in third-world countries licensing fees that require them to take out loans from their home universities, and he would apparently rather see dozens of published articles retracted from the literature than suffer the indignity of having someone use his measure without going through the proper channels (and paying the corresponding fees).

Second, the normative practice in many areas of science that depend on the (re)use of measures developed by other people is to essentially flout copyright law, bury one’s head in the sand, and hope for the best.

I don’t know that anything can be done about the first observation–and even if something could be done, there will always be other Moriskys. I do, however, think that we could collectively do quite a few things to change the way scientists think about, and deal with, the re-use of self-report (and other kinds of) measures. Most of these amount to providing better guidance and training. In principle, this shouldn’t be hard to do; in most disciplines, scientists are trained in all manner of research method, statistical praxis, and scientific convention. Yet I know of no graduate program in my own discipline (psychology) that provides its students with even a cursory overview of intellectual property law. This despite the fact that many scientists’ chief assets–and the things they most closely identify their career achievements with–are their intellectual products.

This is, in my view, a serious training failure. More important, it’s an unnecessary failure, because there isn’t really very much that a social scientist needs to know about copyright law in order to dramatically reduce their odds of ending up a target of legal action. The goal is not to train PhDs who can moonlight as bad attorneys; it’s to prevent behavior that flagrantly exposes one to potential Moriskying (look! I coined a verb!). For that, a single 15-minute segment of a research methods class would likely suffice. While I’m sure someone better-informed and more lawyer-like than me could come up with a more accurate precis, here’s the gist of what I think one would want to cover:

  • Just because a measure is publicly available does not mean it’s in the public domain. It’s intuitive to suppose that any measure that can be found in a publicly accessible place (e.g., on the web) is, by default, okay for public use–meaning that, unless the author of a measure has indicated that they don’t want their measure to be used by others, it can be. In fact, the opposite is true. By default, the author of a newly produced work retains all usage and distribution rights to that work. The author can, if they are so inclined, immediately place that work in the public domain. Alternatively, they could stipulate that every time someone uses their measure, that user must, within 72 hours of use, immediately send the author 22 green jelly beans in an unmarked paper bag. You don’t like those terms of use? Fine: don’t use the measure.

Importantly, an author isn’t under any obligation to say anything at all about how they wish their work to be reproduced or used. This means that when a researcher uses a measure that lacks explicit licensing information, that researcher is assuming the risk of running afoul of the measure author’s desires, whether or not those desires have been made publicly known. The fact that the measure happens to be publicly available may be a mitigating factor (e.g., one could potentially claim fair use, though as far as I know there’s little precedent for this type of thing in the scientific domain), but that’s a matter for lawyers to hash out, and I think most of us scientists would rather avoid lawyer-hashing if we can help it.

This takes us directly to the next point…

  • Don’t use a measure unless you’ve read, and agree with, its licensing terms. Of course, in practice, very few scientific measures are currently released with an explicit license–which gives rise to an important corollary injunction: don’t use a measure that doesn’t come with a license.

The latter statement may seem unfair; after all, it’s clear enough that most measures developed by social scientist are missing licenses not because their authors are intentionally trying to capitalize on ambiguity, but simply because most authors are ignorant of the fact that the lack of a license creates a significant liability for potential users. Walking away from unlicensed measures would amount to giving up on huge swaths of potential research, which surely doesn’t seem like a good idea.

Fortunately, I’m not suggesting anything nearly this drastic. Because the lack of licensing is typically unintentional, often, a simple, friendly email to an author may be sufficient to magic an explicit license into existence. While I haven’t had occasion to try this yet for self-report measures, I’ve been on both ends of such requests on multiple occasions when dealing with open-source software. In virtually every case I’ve been involved in, the response to an inquiry along the lines of “hey, I’d like to use your software, but there’s no license information attached” has been to either add a license to the repository (for example…), or provide an explicit statement to the effect of “you’re welcome to use this for the use case you describe”. Of course, if a response is not forthcoming, that too is instructive, as it suggests that perhaps steering clear of the tool (or measure) in question might be a good idea.

Of course, taking licensing seriously requires one to abide by copyright law–which, like it or not, means that there may be cases where the responsible (and legal) thing to do is to just walk away from a measure, even if it seems perfect for your use case from a research standpoint. If you’re serious about taking copyright seriously, and, upon emailing the author to inquire about the terms of use, you’re informed that the terms of use involve paying $100 per participant, you can either put up the money, or use a different measure. Burying your head in the sand and using the measure anyway, without paying for it, is not a good look.

  • Attach a license to every reusable product you release into the wild. This follows directly from the previous point: if you want responsible, informed users to feel comfortable using your measure, you should tell them what they can and can’t do with it. If you’re so inclined, you can of course write your own custom license, which can involve dollar bills, jelly beans, or anything else your heart desires. But unless you feel a strong need to depart from existing practices, it’s generally a good idea to select one of the many pre-existing licenses out there, because most of them have the helpful property of having been written by lawyers, and lawyers are people who generally know how to formulate sentiments like “you must give me heap big credit” in somewhat more precise language.

There are a lot of practical recommendations out there about what license one should or shouldn’t choose; I won’t get into those here, except to say that in general, I’m a strong proponent of using permissive licenses (e.g., MIT or CC-BY), and also, that I agree with many people’s sentiment that placing restrictions on commercial use–while intuitively appealing to scientists who value public goods–is generally counterproductive. In any case, the real point here is not to push people to use any particular license, but just to think about it for a few minutes when releasing a measure. I mean, you’re probably going to spend tens or hundreds of hours thinking about the measure itself; the least you can do is make sure you tell people what they’re allowed to do with it.

I think covering just the above three points in the context of a graduate research methods class–or at the very least, in those methods classes slanted towards measure development or evaluation (e.g., psychometrics)–would go a long way towards changing scientific norms surrounding measure use.

Most importantly, perhaps, the point of learning a little bit about copyright law is not just to reduce one’s exposure to legal action. There are also large communal benefits. If academic researchers collectively decided to stop flouting copyright law when choosing research measures, the developers of measures would face a very different–and, from a societal standpoint, much more favorable–set of incentives. The present state of affairs–where an instrument’s author is able to legally charge well-meaning researchers exorbitant fees post-hoc for use of an 8-item scale–exists largely because researchers refuse to take copyright seriously, and insist on acting as if science, being such a noble and humanitarian enterprise, is somehow exempt from legal considerations that people in other fields have to constantly worry about. Perversely, the few researchers who do the right thing by offering to pay for the scales they use then end up incurring large costs, while the majority who use the measures without permission suffer no consequences (except on the rare occasions when someone like Morisky comes knocking on the door with a lawyer).

By contrast, in an academic world that cared more about copyright law, many widely-used measures that are currently released under ambiguous or restrictive licenses (or, most commonly, no license at all) would never have attained widespread use in the first place. If, say, Costa & McCrae’s NEO measures–used by thousands of researchers every year–had been developed in a world where academics had a standing norm of avoiding restrictively licensed measures, the most likely outcome is that the NEO would have changed to accommodate the norm, and not vice versa. The net result is that we would be living in a world where the vast majority of measures–just like the vast majority of open-source software–really would be free to use in every sense of the word, without risk of lawsuits, and with the ability to redistribute, reuse, and modify freely. That, I think, is a world we should want to live in. And while the ship may have already sailed when it comes to the most widely used existing measures, it’s a world we could still have going forward. We just have to commit to not using new measures unless they have a clear license–and be prepared to follow the terms of that license to the letter.

Why I still won’t review for or publish with Elsevier–and think you shouldn’t either

In 2012, I signed the Cost of Knowledge pledge, and stopped reviewing for, and publishing in, all Elsevier journals. In the four years since, I’ve adhered closely to this policy; with a couple of exceptions (see below), I’ve turned down every review request I’ve received from an Elsevier-owned journal, and haven’t sent Elsevier journals any of my own papers for publication.

Contrary to what a couple of people I talked to at the time intimated might happen, my scientific world didn’t immediately collapse. The only real consequences I’ve experienced as a result of avoiding Elsevier are that (a) on perhaps two or three occasions, I’ve had to think a little bit longer about where to send a particular manuscript, and (b) I’ve had a few dozen conversations (all perfectly civil) about Elsevier and/or academic publishing norms that I otherwise probably wouldn’t have had. Other than that, there’s been essentially no impact on my professional life. I don’t feel that my unwillingness to publish in NeuroImage, Neuron, or Journal of Research in Personality has hurt my productivity or reputation in any meaningful way. And I continue to stand by my position that it’s a mistake for scientists to do business with a publishing company that actively lobbies against the scientific community’s best interests.

While I’ve never hidden the fact that I won’t deal with Elsevier, and am perfectly comfortable talking about the subject when it comes up, I also haven’t loudly publicized my views. Aside from a parenthetical mention of the issue in one or two (sometimes satirical) blog posts, and an occasional tweet, I’ve never written anything vocally suggesting that others adopt the same stance. The reason for this is not that I don’t believe it’s an important issue; it’s that I thought Elsevier’s persistently antagonistic behavior towards scientists’ interests was common knowledge, and that most scientists continue to provide their free expert labor to Elsevier because they’ve decided that the benefits outweigh the costs. In other words, I was under the impression that other people share my facts, just not my interpretation of the facts.

I now think I was wrong about this. A series of tweets a few months ago (yes, I know, I’m slow to get blog posts out these days) prompted my reevaluation. It began with this:

Which led a couple of people to ask why I don’t review for Elsevier. I replied:


All of this information is completely public, and much of it features prominently in Elsevier’s rather surreal Wikipedia entry–nearly two thirds of which consists of “Criticism and Controversies” (and no, I haven’t personally contributed anything to that entry). As such, I assumed Elsevier’s track record of bad behavior was public knowledge. But the responses to my tweets suggested otherwise. And in the months since, I’ve had several other twitter or real-life conversations with people where it quickly became clear that the other party was not, in fact, aware of (m)any of the scandals Elsevier has been embroiled in.

In hindsight, this shouldn’t have surprised me. There’s really no good reason why most scientists should be aware of what Elsevier’s been up to all this time. Sure, most scientists cross path with Elsevier at some point; but so what? It’s not as though I thoroughly research every company I have contractual dealings with; I usually just go about my business and assume the best about the people I’m dealing with–or at the very least, I try not to assume the worst.

Unfortunately, sometimes it turns out that that assumption is wrong. And on those occasions, I generally want to know about it. So, in that spirit, I thought I’d expand on my thoughts about Elsevier beyond the 140-character format I’ve adopted in the past, in the hopes that other people might also be swayed to at least think twice about submitting their work to Elsevier journals.

Is Elsevier really so evil?

Yeah, kinda. Here’s a list of just some of the shady things Elsevier has been previously caught doing–and none of which, as far as I know, the company contests at this point:

  • They used to organize arms trade fairs, until a bunch of academics complained that a scholarly publisher probably shouldn’t be in the arms trade, at which point they sold that division off;
  • In 2009, they were caught for having created and sold half a dozen entire fake journals to pharmaceutical companies (e.g., Merck), so that those companies could fill the pages of the journals, issue after issue, with reprinted articles that cast a positive light on their drugs;
  • They regularly sell access to articles they don’t own, including articles licensed for non-commercial use–in clear contravention of copyright law, and despite repeated observations by academics that this kind of thing should not be technically difficult to stop if Elsevier actually wanted it to stop;
  • Their pricing model is based around the concept of the “Big Deal”: Elsevier (and, to be fair, most other major publishers) forces universities to pay for huge numbers of their journals at once by pricing individual journals prohibitively, ensuring that institutions can’t order only the journals they think they’ll actually use (this practice is very much like the “bundling” exercised by the cable TV industry); they also bar customers from revealing how much they paid for access, and freedom-of-information requests reveal enormous heterogeneity across universities, often at costs that are prohibitive to libraries;
  • They recently bought the SSRN preprint repository, and after promising to uphold SSRN’s existing operating procedures, almost immediately began to remove articles that were legally deposited on the service, but competed with “official” versions published elsewhere;
  • They have repeatedly spurned requests from the editorial boards of their journals to lower journal pricing, decrease open access fees, or make journals open access; this has resulted in several editorial boards abandoning the Elsevier platform wholesale and moving their operation elsewhere (Lingua being perhaps the best-known example)–often taking large communities with them;
  • Perhaps most importantly (at least in my view), they actively lobbied the US government against open access mandates, making multiple donations to the congressional sponsors of a bill called the Research Works Act that would have resulted in the elimination of the current law mandating deposition of all US government-funded scientific works in public repositories within 12 months after publication.

The pattern in these cases is almost always the same: Elsevier does something that directly works against the scientific community’s best interests (and in some cases, also the law), and then, when it gets caught with its hand in the cookie jar, it apologizes and fixes the problem (well, at least to some degree; they somehow can’t seem to stop selling OA-licensed articles, because it is apparently very difficult for a multibillion dollar company to screen the papers that appear on its websites). A few months later, another scandal comes to light, and then the cycle repeats.

Elsevier is, of course, a large company, and one could reasonably chalk one or two of the above actions down to poor management or bad judgment. But there’s a point at which the belief that this kind of thing is just an unfortunate accident–as opposed to an integral part of the business model–becomes very difficult to sustain. In my case, I was aware of a number of the above practices before I signed The Cost of Knowledge pledge; for me, the straw that broke the camel’s back was Elsevier’s unabashed support of the Research Works Act. While I certainly don’t expect any corporation (for-profit or otherwise) to actively go out and sabotage its own financial interests, most organizations seem to know better than to publicly lobby for laws that would actively and unequivocally hurt the primary constituency they make their money off of. While Elsevier wasn’t alone in its support of the RWA, it’s notable that many for-profit (and most non-profit) publishers explicitly expressed their opposition to the bill (e.g., MIT Press, Nature Publishing Group, and the AAAS). To my mind, there wasn’t (and isn’t) any reason to support a company that, on top of arms sales, fake journals, and copyright violations, thinks it’s okay to lobby the government to make it harder for taxpayers to access the results of publicly-funded research that’s generated and reviewed at no cost to Elsevier itself. So I didn’t, and still don’t.

Objections (and counter-objections)

In the 4 years since I stopped writing or reviewing for Elsevier, I’ve had many conversations with colleagues about this issue. Since most of my colleagues don’t share my position (though there are a few exceptions), I’ve received a certain amount of pushback. While I’m always happy to engage on the issue, so far, I can’t say that I’ve found any of the arguments I’ve heard sufficiently compelling to cause me to change my position. I’m not sure if my arguments have led anyone else to change their view either, but in the interest of consolidating discussion in one place (if only so that I can point people to it in future, instead of reprising the same arguments over and over again), I thought I’d lay out all of the major objections I’ve heard to date, along with my response(s) to each one. If you have other objections you feel aren’t addressed here, please leave a comment, and I’ll do my best to address them (and perhaps add them to the list).

Without further ado, and in no particular order, here are the pro-Elsevier (or at least, anti-anti-Elsevier) arguments, as I’ve heard and understood them:

“You can’t really blame Elsevier for doing this sort of thing. Corporations exist to make money; they have a fiduciary responsibility to their shareholders to do whatever they legally can to increase revenue and decrease expenses.”

For what it’s worth, I think the “fiduciary responsibility” argument–which seemingly gets trotted out almost any time anyone calls out a publicly traded corporation for acting badly–is utterly laughable. As far as I can tell, the claim it relies on is both unverifiable and unenforceable. In practice, there is rarely any way for anyone to tell whether a particular policy will hurt or help a company’s bottom line, and virtually any action one takes can be justified post-hoc by saying that it was the decision-makers’ informed judgment that it was in the company’s best interest. Presumably part of the reason publishing groups like NPG or MIT Press don’t get caught pulling this kind of shit nearly as often as Elsevier is that part of their executives’ decision-making process includes thoughts like gee, it would be really bad for our bottom line if scientists caught wind of what we’re doing here and stopped giving us all this free labor. You can tell a story defending pretty much any policy, or its polar opposite, on grounds of fiduciary responsibility, but I think it’s very unlikely that anyone is ever going to knock on an Elsevier executive’s door threatening to call in the lawyers because Elsevier just hasn’t been working hard enough lately to sell fake journals.

That said, even if you were to disagree with my assessment, and decided to take the fiduciary responsibility argument at face value, it would still be completely and utterly irrelevant to my personal decision not to work for Elsevier any more. The fact that Elsevier is doing what it’s (allegedly) legally obligated to do doesn’t mean that I have to passively go along with it. Elsevier may be legally allowed or even obligated to try to take advantage of my labor, but I’m just as free to follow my own moral compass and refuse. I can’t imagine how my individual decision to engage in moral purchasing could possibly be more objectionable to anyone than a giant corporation’s “we’ll do anything legal to make money” policy.

“It doesn’t seem fair to single out Elsevier when all of the other for-profit publishers are just as bad.”

I have two responses to this. First, I think the record pretty clearly suggests that Elsevier does in fact behave more poorly than the vast majority of other major academic publishers (there are arguably a number of tiny predatory publishers that are worse–but of course, I don’t think anyone should review for or publish with them either!). It’s not that publishers like Springer or Wiley are without fault; but they at least don’t seem to get caught working against the scientific community’s interests nearly as often. So I think Elsevier’s particularly bad track record makes it perfectly reasonable to focus attention on Elsevier in particular.

Second, I don’t think it would, or should, make any difference to the analysis even if it turned out that Springer or Wiley were just as bad. The reason I refuse to publish with Elsevier is not that they’re the only bad apples, but that I know that they’re bad apples. The fact that there might be other bad actors we don’t know about doesn’t mean we shouldn’t take actions against the bad actors we do know about. In fact, it wouldn’t mean that even if we did know of other equally bad actors. Most people presumably think there are many charities worth giving money to, but when we learn that someone donated money to a breast cancer charity, we don’t get all indignant and say, oh sure, you give money to cancer, but you don’t think heart disease is a serious enough problem to deserve your support? Instead, we say, it’s great that you’re doing what you can–we know you don’t have unlimited resources.

Moreover, from a collective action standpoint, there’s a good deal to be said for making an example out of a single bad actor rather than trying to distribute effort across a large number of targets. The reality is that very few academics perceive themselves to be in a position to walk away from all academic publishers known to engage in questionable practices. Collective action provides a means for researchers to exercise positive force on the publishing ecosystem in a way that cannot be achieved by each individual researcher making haphazard decisions about where to send their papers. So I would argue that as long as researchers agree that (a) Elsevier’s policies hurt scientists and taxpayers, and (b) Elsevier is at the very least one of the worst actors, it makes a good deal of sense to focus our collective energy on Elsevier. I would hazard a guess that if a concerted action on the part of scientists had a significant impact on Elsevier’s bottom line, other publishers would sit up and take notice rather quickly.

“You can choose to submit your own articles wherever you like; that’s totally up to you. But when you refuse to review for all Elsevier journals, you do a disservice to your colleagues, who count on you to use your expertise to evaluate other people’s manuscripts and thereby help maintain the quality of the literature as a whole.”

I think this is a valid concern in the case of very early-career academics, who very rarely get invited to review papers, and have no good reason to turn such requests down. In such cases, refusing to review because Elsevier would indeed make everyone else’s life a little bit more difficult (even if it also helps a tiny bit to achieve the long-term goal of incentivizing Elsevier to either shape up or disappear). But I don’t think the argument carries much force with most academics, because most of us have already reached the review saturation point of our careers–i.e., the point at which we can’t possibly (or just aren’t willing to) accept all the review assignments we receive. For example, at this point, I average about 3 – 4 article reviews a month, and I typically turn down about twice that many invitations to review. If I accepted any invitations from Elsevier journals, I would simply have to turn down an equal number of invitations from non-Elsevier journals–almost invariably ones with policies that I view as more beneficial to the scientific community. So it’s not true that I’m doing the scientific community a disservice by refusing to review for Elsevier; if anything, I’m doing it a service by preferentially reviewing for journals that I believe are better aligned with the scientific community’s long-term interests.

Now, on fairly rare occasions, I do get asked to review papers focusing on issues that I think I have particularly strong expertise in. And on even rarer occasions, I have reason to think that there are very few if any other people besides me who would be able to write a review that does justice to the paper. In such cases, I willingly make an exception to my general policy. But it doesn’t happen often; in fact, it’s happened exactly twice in the past 4 years. In both cases, the paper in question was built to a very significant extent on work that I had done myself, and it seemed to me quite unlikely that the editor would be able to find another reviewer with the appropriate expertise given the particulars reported in the abstract. So I agreed to review the paper, even for an Elsevier journal, because to not do so would indeed have been a disservice to the authors. I don’t have any regrets about this, and I will do it again in future if the need arises. Exceptions are fine, and we shouldn’t let the perfect be the enemy of the good. But it simply isn’t true, in my view, that my general refusal to review for Elsevier is ever-so-slightly hurting science. On the contrary, I would argue that it’s actually ever-so-slightly helping it, by using my limited energies to support publishers and journals that work in favor of, rather than against, scientists’ interests.

“If everyone did as you do, Elsevier journals might fall apart, and that would impact many people’s careers. What about all the editors, publishing staff, proof readers, etc., who would all lose at least part of their livelihood?”

This is the universal heartstring-pulling argument, in that it can be applied to virtually any business or organization ever created that employs at least one person. For example, it’s true that if everyone stopped shopping at Wal-Mart, over a million Americans would lose their jobs. But given the externalities that Wal-Mart imposes on the American taxpayer, that hardly seems like a sufficient reason to keep shopping at Wal-Mart (note that I’m not saying you shouldn’t shop at Wal-Mart, just that you’re not under any moral obligation to view yourself as a one-person jobs program). Almost every decision that involves reallocation of finite resources hurts somebody; the salient question is whether, on balance, the benefits to the community as a whole outweigh the costs. In this case, I find it very hard to see how Elsevier’s policies benefit the scientific community as a whole when much cheaper, non-profit alternatives–to say nothing of completely different alternative models of scientific evaluation–are readily available.

It’s also worth remembering that the vast majority of the labor that goes into producing Elsevier’s journals is donated to Elsevier free of charge. Given Elsevier’s enormous profit margin (over 30% in each of the last 4 years), it strains credulity to think that other publishers couldn’t provide essentially the same services while improving the quality of life of the people who provide most of the work. For an example of such a model, take a look at Collabra, where editors receive a budget of $250 per paper (which comes out of the author publication charge) that they can divide up however they like between themselves, the reviewers, and publishing subsidies to future authors who lack funds (full disclosure: I’m an editor at Collabra). So I think an argument based on treating people well clearly weighs against supporting Elsevier, not in favor of it. If nothing else, it should perhaps lead one to question why Elsevier insists it can’t pay the academics who review its articles a nominal fee, given that paying for even a million reviews per year (surely a gross overestimate) at $200 a pop would still only eat up less than 20% of Elsevier’s profit in each of the past few years.

“Whatever you may think of Elsevier’s policies at the corporate level, the editorial boards at the vast majority of Elsevier journals function autonomously, with no top-down direction from the company. Any fall-out from a widespread boycott would hurt all of the excellent editors at Elsevier journals who function with complete independence–and by extension, the field as a whole.”

I’ve now heard this argument from at least four or five separate editors at Elsevier journals, and I don’t doubt that its premise is completely true. Meaning, I’m confident that the scientific decisions made by editors at Elsevier journals on a day-to-day basis are indeed driven entirely by scientific considerations, and aren’t influenced in any way by publishing executives. That said, I’m completely unmoved by this argument, for two reasons. First, the allocation of resources–including peer reviews, submitted manuscripts, and editorial effort–is, to a first approximation, a zero-sum game. While I’m happy to grant that editorial decisions at Elsevier journals are honest and unbiased, the same is surely true of the journals owned by virtually every other publisher. So refusing to send a paper to NeuroImage doesn’t actually hurt the field as a whole in any way, unless one thinks that there is a principled reason why the editorial process at Cerebral Cortex, Journal of Neuroscience, or Journal of Cognitive Neuroscience should be any worse. Obviously, there can be no such reason. If Elsevier went out of business, many of its current editors would simply move to other journals, where they would no doubt resume making equally independent decisions about the manuscripts they receive. As I noted above, in a number of cases, entire editorial boards at Elsevier journals have successfully moved wholesale to new platforms. So there is clearly no service Elsevier provides that can’t in principle be provided more cheaply by other publishers or plaforms that aren’t saddled with Elsevier’s moral baggage or absurd profit margins.

Second, while I don’t doubt the basic integrity of the many researchers who edit for Elsevier journals, I also don’t think they’re completely devoid of responsibility for the current state of affairs. When a really shitty company offers you a position of power, it may be true that accepting that position–in spite of the moral failings of your boss’s boss’s boss–may give you the ability to do some real good for the community you care about. But it’s also true that you’re still working for a really shitty company, and that your valiant efforts could at any moment be offset by some underhanded initiative in some other branch of the corporation. Moreover, if you’re really good at your job, your success–whatever its short-term benefits to your community–will generally serve to increase your employer’s shit-creating capacity. So while I don’t think accepting an editorial position at an Elsevier journal makes anyone a bad person (some of my best friends are editors for Elsevier!), I also see no reason for anyone to voluntarily do business with a really shitty company rather than a less shitty one. As far as I can tell, there is no service I care about that NeuroImage offers me but Cerebral Cortex or The Journal of Neuroscience don’t. As a consequence, it seems reasonable for me to submit my papers to journals owned by companies that seem somewhat less intent on screwing me and my institution out of as much money as possible. If that means that some very good editors at NeuroImage ultimately have to move to JNeuro, JCogNeuro, or (dare I say it!) PLOS ONE, I think I’m okay with that.

“It’s fine for you to decide not to deal with Elsevier, but you don’t have a right to make that decision for your colleagues or trainees when they’re co-authors on your papers.”

This is probably the only criticism I hear regularly that I completely agree with. Which is why I’ve always been explicit that I can and will make exceptions when required. Here’s what I said when I originally signed The Cost of Knowledge years ago:

costofknowledge

Basically, my position is that I’ll still submit a manuscript to an Elsevier journal if either (a) I think a trainee’s career would be significantly disadvantaged by not doing so, or (b) I’m not in charge of a project, and have no right to expect to exercise control over where a paper is submitted. The former has thankfully never happened so far (though I’m always careful to make it clear to trainees that if they really believe that it’s important to submit to a particular Elsevier journal, I’m okay with it). As for the latter, in the past 4 years, I’ve been a co-author on two Elsevier papers (1, 2). In both cases, I argued against submitting the paper to those journals, but was ultimately overruled. I don’t have any problem with either of those decisions, and remain on good terms with both lead authors. If I collaborate with you on a project, you can expect to receive an email from me suggesting in fairly strong terms that we should consider submitting to a non-Elsevier-owned journal, but I certainly won’t presume to think that what makes sense to me must also make sense to you.

“Isn’t it a bit silly to think that your one-person boycott of Elsevier is going to have any meaningful impact?”

No, because it isn’t a one-person boycott. So far, over 16,000 researchers have signed The Cost of Knowledge pledge. And there are very good reasons to think that the 16,000-strong (and growing!) boycott has already had important impacts. For one thing, Elsevier withdrew its support of the RWA in 2012 shortly after The Cost of Knowledge was announced (and several thousand researchers quickly signed on). The bill itself was withdrawn shortly after that. That seems like a pretty big deal to me, and frankly I find it hard to imagine that Elsevier would have voluntarily stopped lobbying Congress this way if not for thousands of researchers putting their money where their mouth is.

Beyond that clear example, it’s hard to imagine that 16,000 researchers walking away from a single publisher wouldn’t have a significant impact on the publishing landscape. Of course, there’s no clear way to measure that impact. But consider just a few points that seem difficult to argue against:

  • All of the articles that would have been submitted to Elsevier journals presumably ended up in other publishers’ journals (many undoubtedly run by OA publishers). There has been continual growth in the number of publishers and journals; some proportion of that seems almost guaranteed to reflect the diversion of papers away from Elsevier.

  • Similarly, all of the extra time spent reviewing non-Elsevier articles instead of Elsevier articles presumably meant that other journals received better scrutiny and faster turnaround times than they would have otherwise.

  • A number of high-profile initiatives–for example, the journal Glossa–arose directly out of researchers’ refusal to keep working with Elsevier (and many others are likely to have arisen indirectly, in part). These are not insignificant. Aside from their immediate impact on the journal landscape, the involvement of leading figures like Timothy Gowers in the movement to develop better publishing and evaluation options is likely to have a beneficial long-term impact.

All told, it seems to me that, far from being ineffectual, the Elsevier boycott–consisting of nothing more than individual researchers cutting ties with the publisher–has actually achieved a considerable amount in the past 4 years. Of course, Elsevier continues to bring in huge profits, so it’s not like it’s in any danger of imminent collapse (nor should that be anyone’s goal). But I think it’s clear that, on balance, the scientific publishing ecosystem is healthier for having the boycott in place, and I see much more reason to push for even greater adoption of the policy than to reconsider it.

More importantly, I think the criticism that individual action has limited efficacy overlooks what is probably the single biggest advantage the boycott has in this case: it costs a researcher essentially nothing. If I were to boycott, say, Trader Joe’s, on the grounds that it mistreats its employees (for the record, I don’t think it does), my quality of life would go down measurably, as I would have to (a) pay more for my groceries, and (b) travel longer distances to get them (there’s a store just down the street from my apartment, so I shop there a lot). By contrast, cutting ties with Elsevier has cost me virtually nothing so far. So even if the marginal benefit to the scientific community of each additional individual boycotting Elsevier is very low, the cost to that individual will typically be still much lower. Which, in principle, makes it very easy to organize and maintain a collective action of this sort on a very large scale (and is probably a lot of what explains why over 16,000 researchers have already signed on).

What you can do

Let’s say you’ve read this far and find yourself thinking, okay, that all kind of makes sense. Maybe you agree with me that Elsevier is an amazingly shitty company whose business practices actively bite the hand that feeds it. But maybe you’re also thinking, well, the thing is, I almost exclusively publish primary articles in the field of neuroimaging [or insert your favorite Elsevier-dominated discipline here], and there’s just no way I can survive without publishing in Elsevier journals. So what can I do?

The first thing to point out is that there’s a good chance your fears are at least somewhat (and possibly greatly) exaggerated. As I noted at the outset of this post, I was initially a bit apprehensive about the impact that taking a principled stand would have on my own career, but I can’t say that I perceive any real cost to my decision, nearly five years on. One way you can easily see this is to observe that most people are surprised when I first tell them I haven’t published in Elsevier journals in five years. It’s not like the absence would ever jump out at anyone who looked at my publication list, so it’s unclear how it could hurt me. Now, I’m not saying that everyone is in a position to sign on to a complete boycott without experiencing some bumps in the road. But I do think many more people could do so than might be willing to admit it at first. There are very few fields that are completely dominated by Elsevier journals. Neuroimaging is probably one of the fields where Elsevier’s grip is strongest, but I publish several neuroimaging-focused papers a year, and have never had to work very hard to decide where to submit my papers next.

That said, the good news is that you can still do a lot to actively work towards an Elsevier-free world even if you’re unable or unwilling to completely part ways with the publisher. Here are a number of things you can do that take virtually no work, are very unlikely to harm your career in any meaningful way, and are likely to have nearly the same collective benefit as a total boycott:

  • Reduce or completely eliminate your Elsevier reviewing and/or editorial load. Even if you still plan to submit your papers to Elsevier journals, nothing compels you to review or edit for them. You should, of course, consider the pros and cons of turning down any review request; and, as I noted above, it’s fine to make occasional exceptions in cases where you think declining to review a particular paper would be a significant disservice to your peers. But such occasions are–at least in my own experience–quite rare. As I noted above, one of the reasons I’ve had no real compunction about rejecting Elsevier review requests is that I already receive many more requests than I can handle, so declining Elsevier reviews just means I review more for other (better) publishers. If you’re at an early stage of your career, and don’t get asked to review very often, the considerations may be different–though of course, you could still consider turning down the review and doing something nice for the scientific community with the time you’ve saved (e.g., reviewing openly on site like PubPeer or PubMed Commons, or spend some time making all the data, code, and materials from your previous work openly available).

  • Make your acceptance of a review assignment conditional on some other prosocial perk. As a twist on simply refusing Elsevier review invitations, you can always ask the publisher for some reciprocal favor. You could try asking for monetary compensation, of course–and in the extremely unlikely event that Elsevier obliges, you could (if needed) soothe your guilty conscience by donating your earnings to a charity of your choice. Alternatively, you could try to extract some concession from the journal that would help counteract your general aversion to reviewing for Elsevier. Chris Gorgolewski provided one example in this tweet:

Mandating open science practices (e.g., public deposition of data and code) as a requirement for review is something that many people strongly favor completely independently of commercial publishers’ shenanigans (see my own take here). Making one’s review conditional on an Elsevier journal following best practices is a perfectly fair and even-handed approach, since there are other journals that either already mandate such standards (e.g., PLOS ONE), or are likely to be able to oblige you. So if you get an affirmative response from an Elsevier journal, then great–it’s still Elsevier, but at least you’ve done something useful to improve their practices. If you get a negative review, well, again, you can simply reallocate your energy somewhere else.

  • Submit fewer papers to Elsevier journals. If you publish, say, 5 – 10 fMRI articles a year, it’s completely understandable if you might not feel quite ready to completely give up on NeuroImage and the other three million neuroimaging journals in Elsevier’s stable. Fortunately, you don’t have to. This is a nice example of the Pareto principle in action: 20% of the effort goes maybe 80% of the way in this case. All you have to do to exert almost exactly the same impact as a total boycott of Elsevier is drop NeuroImage (or whatever other journal you routinely submit to) to the bottom of the queue of whatever journals you perceive as being in the same class. So, for example, instead of reflexively thinking, “oh, I should send this to NeuroImage–it’s not good enough for Nature Neuroscience, but I don’t want to send it to just any dump journal”, you can decide to submit it to Cerebral Cortex or The Journal of Neuroscience first, and only go to NeuroImage if the first two journals reject it. Given that most Elsevier journals have a fairly large equivalence class of non-Elsevier journals, a policy like this one would almost certainly cut submissions to Elsevier journals significantly if widely implemented by authors–which would presumably reduce the perceived prestige of those journals still further, potentially precipitating a death spiral.

  • Go cold turkey. Lastly, you could always just bite the bullet and cut all ties with Elsevier. Honestly, it really isn’t that bad. As I’ve already said, the fall-out in my case has been considerably smaller than I thought it would be when I signed The Cost of Knowledge pledge as a post-doc (i.e., I expected it to have some noticeable impact, but in hindsight I think it’s had essentially none). Again, I recognize that not everyone is in a position to do this. But I do think that the reflexive “that’s a crazy thing to do” reaction that some people seem to have when The Cost of Knowledge boycott is brought up isn’t really grounded in a careful consideration of the actual risks to one’s career. I don’t know how many of the 16,000 signatories to the boycott have had to drop out of science as a direct result of their decision to walk away from Elsevier, but I’ve never heard anyone suggest this happened to them, and I suspect the number is very, very small.

The best thing about all of the above action items–with the possible exception of the last–is that they require virtually no effort, and incur virtually no risk. In fact, you don’t even have to tell anyone you’re doing any of them. Let’s say you’re a graduate student, and your advisor asks you where you want to submit your next fMRI paper. You don’t have to say “well, on principle, anywhere but an Elsevier journal” and risk getting into a long argument about the issue; you can just say “I think I’d like to try Cerebral Cortex.” Nobody has to know that you’re engaging in moral purchasing, and your actions are still almost exactly as effective. You don’t have to march down the street holding signs and chanting loudly; you don’t have to show up in front of anyone’s office to picket. You can do your part to improve the scientific publishing ecosystem just by making a few tiny decisions here and there–and if enough other people do the same thing, Elsevier and its peers will eventually be left with a stark choice: shape up, or crumble.

Neurosynth is joining the Elsevier family

[Editorial note: this was originally posted on April 1, 2016. April 1 is a day marked by a general lack of seriousness. Interpret this post accordingly.]

As many people who follow this blog will be aware, much of my research effort over the past few years has been dedicated to developing Neurosynth—a framework for large-scale, automated meta-analysis of neuroimaging data. Neurosynth has expanded steadily over time, with an ever-increasing database of studies, and a host of new features in the pipeline. I’m very grateful to NIMH for the funding that allows me to keep working on the project, and also to the hundreds (thousands?) of active Neurosynth users who keep finding novel applications for the data and tools we’re generating.

That said, I have to confess that, over the past year or so, I’ve gradually grown dissatisfied at my inability to scale up the Neurosynth operation in a way that would take the platform to the next level . My colleagues and I have come up with, and in some cases even prototyped, a number of really exciting ideas that we think would substantially advance the state of the art in neuroimaging. But we find ourselves spending an ever-increasing chunk of our time applying for the grants we need to support the work, and having little time left to over to actually do the work. Given the current funding climate and other logistical challenges (e.g., it’s hard to hire professional software developers on postdoc budgets), it’s become increasingly clear to me that the Neurosynth platform will be hard to sustain in an academic environment over the long term. So, for the past few months, I’ve been quietly exploring opportunities to help Neurosynth ladder up via collaborations with suitable industry partners.

Initially, my plan was simply to license the Neurosynth IP and use the proceeds to fund further development of Neurosynth out of my lab at UT-Austin. But as I started talking to folks in industry, I realized that there were opportunities available outside of academia that would allow me to take Neurosynth in directions that the academic environment would never allow. After a lot of negotiation, consultation, and soul-searching, I’m happy (though also a little sad) to announce that I’ll be leaving my position at the University of Texas at Austin later this year and assuming a new role as Senior Technical Fellow at Elsevier Open Science (EOS). EOS is a brand new division of Elsevier that seeks to amplify and improve scientific communication and evaluation by developing cutting-edge open science tools. The initial emphasis will be on the neurosciences, but other divisions are expected to come online in the next few years (and we’ll be hiring soon!). EOS will be building out a sizable insight-as-a-service operation that focuses on delivering real value to scientists—no p-hacking, no gimmicks, just actionable scientific information. The platforms we build will seek to replace flawed citation-based metrics with more accurate real-time measures that quantify how researchers actually use one another’s data, ideas, and tools—ultimately paving the way to a new suite of microneuroservices that reward researchers both professionally and financially for doing high-quality science.

On a personal level, I’m thrilled to be in a position to help launch an initiative like this. Having spent my entire career in an academic environment, I was initially a bit apprehensive at the thought of venturing into industry. But the move to Elsevier ended up feeling very natural. I’ve always seen Elsevier as a forward-thinking company at the cutting edge of scientific publishing, so I wasn’t shocked to hear about the EOS initiative. But as I’ve visited a number of Elsevier offices over the past few weeks (in the process of helping to decide where to locate EOS), I’ve been continually struck at how open and energetic—almost frenetic—the company is. It’s the kind of environment that combines many of the best elements of the tech world and academia, but without a lot of the administrative bureaucracy of the latter. At the end of the day, it was an opportunity I couldn’t pass up.

It will, of course, be a bittersweet transition for me; I’ve really enjoyed my 3 years in Austin, both professionally and personally. While I’m sure I’ll enjoy Norwich, CT (where EOS will be based), I’m going to really miss Austin. The good news is, I won’t be making the move alone! A big part of what sold me on Elsevier’s proposal was their commitment to developing an entire open science research operation; over the next five years, the goal is to make Elsevier the premier place to work for anyone interested in advancing open science. I’m delighted to say that Chris Gorgolewski (Stanford), Satrajit Ghosh (MIT), and Daniel Margulies (Max Planck Institute for Human Cognitive and Brain Sciences) have all also been recruited to Elsevier, and will be joining EOS at (or in Satra’s case, shortly after) launch. I expect that they’ll make their own announcements shortly, so I won’t steal their thunder much. But the short of it is that Chris, Satra, and I will be jointly spearheading the technical operation. Daniel will be working on other things, and is getting the fancy title of “Director of Interactive Neuroscience”; I think this means he’ll get to travel a lot and buy expensive pictures of brains to put on his office walls. So really, it’s a lot like his current job.

It goes without saying that Neurosynth isn’t making the jump to Elsevier all alone; NeuroVault—a whole-brain image repository developed by Chris—will also be joining the Elsevier family. We have some exciting plans in the works for much closer NeuroVault-Neurosynth integration, and we think the neuroimaging community is going to really like the products we develop. We’ll also be bringing with us the OpenfMRI platform created by Russ Poldrack. While Russ wasn’t interested in leaving Stanford (as I recall, his exact words were “over all of your dead bodies”), he did agree to release the OpenfMRI IP to Elsevier (and in return, Elsevier is endowing a permanent Open Science fellowship at Stanford). Russ will, of course, continue to actively collaborate on OpenfMRI, and all data currently in the OpenfMRI database will remain where it is (though all original contributors will be given the opportunity to withdraw their datasets if they choose). We also have some new Nipype-based tools rolling out over the coming months that will allow researchers to conduct state-of-the-art neuroimaging analyses in the cloud (for a small fee)–but I’ll have much more to say about that in a later post.

Naturally, a transition like this one can’t be completed without hitting a few speed bumps along the way. The most notable one is that the current version of Neurosynth will be retired permanently in mid-April (so grab any maps you need right now!). A new and much-improved version will be released in September, coinciding with the official launch of EOS. One of the things I’m most excited about is that the new version will support an “Enhanced Usage” tier. The vertical integration of Neurosynth with the rest of the Elsevier ecosystem will be a real game-changer; for example, authors submitting papers to NeuroImage will automatically be able to push their content into NeuroVault and Neurosynth upon acceptance, and readers will be able to instantly visualize and cognitively decode any activation map in the Elsevier system (for a nominal fee handled via an innovative new micropayment system). Users will, of course, retain full control over their content, ensuring that only readers who have the appropriate permissions (and a valid micropayment account of their own) can access other people’s data. We’re even drawing up plans to return a portion of the revenues earned through the system to the content creators (i.e., article authors)—meaning that for the first time, neuroimaging researchers will be able to easily monetize their research.

As you might expect, the Neurosynth brand will be undergoing some changes to reflect the new ownership. While Chris and I initially fought hard to preserve the names Neurosynth and NeuroVault, Elsevier ultimately convinced us that using a consistent name for all of our platforms would reduce confusion, improve branding, and make for a much more streamlined user experience*. There’s also a silver lining to the name we ended up with: Chris, Russ, and I have joked in the past that we should unite our various projects into a single “NeuroStuff” website—effectively the Voltron of neuroimaging tools—and I even went so far as to register neurostuff.org a while back. When we mentioned this to the Elsevier execs (intending it as a joke), we were surprised at their positive response! The end result (after a lot of discussion) is that Neurosynth, NeuroVault, and OpenfMRI will be merging into The NeuroStuff Collection, by Elsevier (or just NeuroStuff for short)–all coming in late 2016!

Admittedly, right now we don’t have a whole lot to show for all these plans, except for a nifty logo created by Daniel (and reluctantly approved by Elsevier—I think they might already be rethinking this whole enterprise). But we’ll be rolling out some amazing new services in the very near future. We also have some amazing collaborative projects that will be announced in the next few weeks, well ahead of the full launch. A particularly exciting one that I’m at liberty to mention** is that next year, EOS will be teaming up with Brian Nosek and folks at the Center for Open Science (COS) in Charlottesville to create a new preregistration publication stream. All successful preregistered projects uploaded to the COS’s flagship Open Science Framework (OSF) will be eligible, at the push of a button, for publication in EOS’s new online-only journal Preregistrations. Submission fees will be competitive with the very cheapest OA journals (think along the lines of PeerJ’s $99 lifetime subscription model).

It’s been a great ride working on Neurosynth for the past 5 years, and I hope you’ll all keep using (and contributing to) Neurosynth in its new incarnation as Elsevier NeuroStuff!

* Okay, there’s no point in denying it—there was also some money involved.

** See? Money can’t get in the way of open science—I can talk about whatever I want!

Now I am become DOI, destroyer of gatekeeping worlds

Digital object identifiers (DOIs) are much sought-after commodities in the world of academic publishing. If you’ve never seen one, a DOI is a unique string associated with a particular digital object (most commonly a publication of some kind) that lets the internet know where to find the stuff you’ve written. For example, say you want to know where you can get a hold of an article titled, oh, say, Designing next-generation platforms for evaluating scientific output: what scientists can learn from the social web. In the real world, you’d probably go to Google, type that title in, and within three or four clicks, you’d arrive at the document you’re looking for. As it turns out, the world of formal resource location is fairly similar to the real world, except that instead of using Google, you go to a website called dx.DOI.org, and then you plug in the string ‘10.3389/fncom.2012.00072’, which is the DOI associated with the aforementioned article. And then, poof, you’re automagically linked directly to the original document, upon which you can gaze in great awe for as long as you feel comfortable.

Historically, DOIs have almost exclusively been issued by official-type publishers: Elsevier, Wiley, PLoS and such. Consequently, DOIs have had a reputation as a minor badge of distinction–probably because you’d traditionally only get one if your work was perceived to be important enough for publication in a journal that was (at least nominally) peer-reviewed. And perhaps because of this tendency to view the presence of a DOIs as something like an implicit seal of approval from the Great Sky Guild of Academic Publishing, many journals impose official or unofficial commandments to the effect that, when writing a paper, one shalt only citeth that which hath been DOI-ified. For example, here’s a boilerplate Elsevier statement regarding references (in this case, taken from the Neuron author guidelines):

References should include only articles that are published or in press. For references to in press articles, please confirm with the cited journal that the article is in fact accepted and in press and include a DOI number and online publication date. Unpublished data, submitted manuscripts, abstracts, and personal communications should be cited within the text only.

This seems reasonable enough until you realize that citations that occur “within the text only” aren’t very useful, because they’re ignored by virtually all formal citation indices. You want to cite a blog post in your Neuron paper and make sure it counts? Well, you can’t! Blog posts don’t have DOIs! You want to cite a what? A tweet? That’s just crazy talk! Tweets are 140 characters! You can’t possibly cite a tweet; the citation would be longer than the tweet itself!

The injunction against citing DOI-less documents is unfortunate, because people deserve to get credit for the interesting things they say–and it turns out that they have, on rare occasion, been known to say interesting things in formats other than the traditional peer-reviewed journal article. I’m pretty sure if Mark Twain were alive today, he’d write the best tweets EVER. Well, maybe it would be a tie between Mark Twain and the NIH Bear. But Mark Twain would definitely be up there. And he’d probably write some insightful blog posts too. And then, one imagines that other people would probably want to cite this brilliant 21st-century man of letters named @MarkTwain in their work. Only they wouldn’t be allowed to, you see, because 21st-century Mark Twain doesn’t publish all, or even most, of his work in traditional pre-publication peer-reviewed journals. He’s too impatient to rinse-and-repeat his way through the revise-and-resubmit process every time he wants to share a new idea with the world, even when those ideas are valuable. 21st-century @MarkTwain just wants his stuff out there already where people can see it.

Why does Elsevier hate 21st-century Mark Twain, you ask? I don’t know. But in general, I think there are two main reasons for the disdain many people seem to feel at the thought of allowing authors to freely cite DOI-less objects in academic papers. The first reason has to do with permanence—or lack thereof. The concern here is that if we allowed everyone to cite just any old web page, blog post, or tweet in academic articles, there would be no guarantee that those objects would still be around by the time the citing work was published, let alone several years hence. Which means that readers might be faced with a bunch of dead links. And dead links are not very good at backing up scientific arguments. In principle, the DOI requirement is supposed to act like some kind of safety word that protects a citation from the ravages of time—presumably because having a DOI means the cited work is important enough for the watchful eye of Sauron Elsevier to periodically scan across it and verify that it hasn’t yet fallen off of the internet’s cliffside.

The second reason has to do with quality. Here, the worry is that we can’t just have authors citing any old opinion someone else published somewhere on the web, because, well, think of the children! Terrible things would surely happen if we allowed authors to link to unverified and unreviewed works. What would stop me from, say, writing a paper criticizing the idea that human activity is contributing to climate change, and supporting my argument with “citations” to random pages I’ve found via creative Google searches? For that matter, what safeguard would prevent a brazen act of sockpuppetry in which I cite a bunch of pages that I myself have (anonymously) written? Loosening the injunction against formally citing non-peer-reviewed work seems tantamount to inviting every troll on the internet to a formal academic dinner.

To be fair, I think there’s some merit to both of these concerns. Or at least, I think there used to be some merit to these concerns. Back when the internet was a wee nascent flaky thing winking in and out of existence every time a dial-up modem connection went down, it made sense to worry about permanence (I mean, just think: if we had allowed people to cite GeoCities webpages in published articles, every last one of those citations links would now be dead!) And similarly, back in the days when peer review was an elite sort of activity that could only be practiced by dignified gentlepersons at the cordial behest of a right honorable journal editor, it probably made good sense to worry about quality control. But the merits of such concerns have now largely disappeared, because we now live in a world of marvelous technology, where bits of information cost virtually nothing to preserve forever, and a new post-publication platform that allows anyone to review just about any academic work in existence seems to pop up every other week (cf. PubPeer, PubMed Commons, Publons, etc.). In the modern world, nothing ever goes out of print, and if you want to know what a whole bunch of experts think about something, you just have to ask them about it on Twitter.

Which brings me to this blog post. Or paper. Whatever you want to call it. It was first published on my blog. You can find it–or at least, you could find it at one point in time–at the following URL: http://www.talyarkoni.org/blog/2015/03/04/now-i-am-become-doi-destroyer-of-gates.

Unfortunately, there’s a small problem with this URL: it contains nary a DOI in sight. Really. None of the eleventy billion possible substrings in it look anything like a DOI. You can even scramble the characters if you like; I don’t care. You’re still not going to find one. Which means that most journals won’t allow you to officially cite this blog post in your academic writing. Or any other post, for that matter. You can’t cite my post about statistical power and magical sample sizes; you can’t cite Joe Simmons’ Data Colada post about Mturk and effect sizes; you can’t cite Sanjay Srivastava’s discussion of replication and falsifiability; and so on ad infinitum. Which is a shame, because it’s a reasonably safe bet that there are at least one or two citation-worthy nuggets of information trapped in some of those blog posts (or millions of others), and there’s no reason to believe that these nuggets must all have readily-discoverable analogs somewhere in the “formal” scientific literature. As the Elsevier author guidelines would have it, the appropriate course of action in such cases is to acknowledge the source of an idea or finding in the text of the article, but not to grant any other kind of formal credit.

Now, typically, this is where the story would end. The URL can’t be formally cited in an Elsevier article; end of story. BUT! In this case, the story doesn’t quite end there. A strange thing happens! A short time after it appears on my blog, this post also appears–in virtually identical form–on something called The Winnower, which isn’t a blog at all, but rather, a respectable-looking alternative platform for scientific publication and evaluation.

Even more strangely, on The Winnower, a mysterious-looking set of characters appear alongside the text. For technical reasons, I can’t tell you what the set of characters actually is (because it isn’t assigned until this piece is published!). But I can tell you that it starts with “10.15200/winn”. And I can also tell you what it is: It’s a DOI! It’s one bona fide free DOI, courtesy of The Winnower. I didn’t have to pay for it, or barter any of my services for it, or sign away any little pieces of my soul to get it*. I just installed a WordPress plugin, pressed a few buttons, and… poof, instant DOI. So now this is, proudly, one of the world’s first N (where N is some smallish number probably below 1000) blog posts to dress itself up in a nice DOI (Figure 1). Presumably because it’s getting ready for a wild night out on the academic town.

sticks and stones may break my bones, but DOIs make me feel pretty
Figure 1. Effects of assigning DOIs to blog posts: an anthropomorphic depiction. (A) A DOI-less blog post feels exposed and inadequate; it envies its more reputable counterparts and languishes in a state of torpor and existential disarray. (B) Freshly clothed in a newly-minted DOI, the same blog post feels confident, charismatic, and alert. Brimming with energy, it eagerly awaits the opportunity to move mountains and reshape scientific discourse. Also, it has longer arms.

Does the mere fact that my blog post now has a DOI actually change anything, as far as the citation rules go? I don’t know. I have no idea if publishers like Elsevier will let you officially cite this piece in an article in one of their journals. I would guess not, but I strongly encourage you to try it anyway (in fact, I’m willing to let you try to cite this piece in every paper you write for the next year or so—that’s the kind of big-hearted sacrifice I’m willing to make in the name of science). But I do think it solves both the permanence and quality control issues that are, in theory, the whole reason for journals having a no-DOI-no-shoes-no-service policy in the first place.

How? Well, it solves the permanence problem because The Winnower is a participant in the CLOCKSS archive, which means that if The Winnower ever goes out of business (a prospect that, let’s face it, became a little bit more likely the moment this piece appeared on their site), this piece will be immediately, freely, and automatically made available to the worldwide community in perpetuity via the associated DOI. So you don’t need to trust the safety of my blog—or even The Winnower—any more. This piece is here to stay forever! Rejoice in the cheapness of digital information and librarians’ obsession with archiving everything!

As for the quality argument, well, clearly, this here is not what you would call a high-quality academic work. But I still think you should be allowed to cite it wherever and whenever you want. Why? For several reasons. First, it’s not exactly difficult to determine whether or not it’s a high-quality academic work—even if you’re not willing to exercise your own judgment. When you link to a publication on The Winnower, you aren’t just linking to a paper; you’re also linking to a review platform. And the reviews are very prominently associated with the paper. If you dislike this piece, you can use the comment form to indicate exactly why you dislike it (if you like it, you don’t need to write a comment; instead, send an envelope stuffed with money to my home address).

Second, it’s not at all clear that banning citations to non-prepublication-reviewed materials accomplishes anything useful in the way of quality control. The reliability of the peer-review process is sufficiently low that there is simply no way for it to consistently sort the good from the bad. The problem is compounded by the fact that rejected manuscripts are rarely discarded forever; typically, they’re quickly resubmitted to another journal. The bibliometric literature shows that it’s possible to publish almost anything in the peer-reviewed literature given enough persistence.

Third, I suspect—though I have no data to support this claim—that a worldview that treats having passed peer review and/or receiving a DOI as markers of scientific quality is actually counterproductive to scientific progress, because it promotes a lackadaisical attitude on the part of researchers. A reader who believes that a claim is significantly more likely to be true in virtue of having a DOI is a reader who is slightly less likely to take the extra time to directly evaluate the evidence for that claim. The reality, unfortunately, is that most scientific claims are wrong, because the world is complicated and science is hard. Pretending that there is some reasonably accurate mechanism that can sort all possible sources into reliable and unreliable buckets—even to a first order of approximation—is misleading at best and dangerous at worst. Of course, I’m not suggesting that you can’t trust a paper’s conclusions unless you’ve read every work it cites in detail (I don’t believe I’ve ever done that for any paper!). I’m just saying that you can’t abdicate the responsibility of evaluating the evidence to some shapeless, anonymous mass of “reviewers”. If I decide not to chase down the Smith & Smith (2007) paper that Jones & Jones (2008) cite as critical support for their argument, I shouldn’t be able to turn around later and say something like “hey, Smith & Smith (2007) was peer reviewed, so it’s not my fault for not bothering to read it!”

So where does that leave us? Well, if you’ve read this far, and agree with most or all of the above arguments, I hope I can convince you of one more tiny claim. Namely, that this piece represents (a big part of) the future of academic publishing. Not this particular piece, of course; I mean the general practice of (a) assigning unique identifiers to digital objects, (b) preserving those objects for all posterity in a centralized archive, and (c) allowing researchers to cite any and all such objects in their work however they like. (We could perhaps also add (d) working very hard to promote centralized “post-publication” peer review of all of those objects–but that’s a story for another day.)

These are not new ideas, mind you. People have been calling for a long time for a move away from a traditional gatekeeping-oriented model of pre-publication review and towards more open publication and evaluation models. These calls have intensified in recent years; for instance, in 2012, a special topic in Frontiers in Computational Neuroscience featured 18 different papers that all independently advocated for very similar post-publication review models. Even the actual attachment of DOIs to blog posts isn’t new; as a case in point, consider that C. Titus Brown—in typical pioneering form—was already experimenting with ways to automatically DOIfy his blog posts via FigShare way back in the same dark ages of 2012. What is new, though, is the emergence and widespread adoption of platforms like The Winnower, FigShare, or Research Gate that make it increasingly easy to assign a DOI to academically-relevant works other than traditional journal articles. Thanks to such services, you can now quickly and effortlessly attach a DOI to your open-source software packages, technical manuals and white papers, conference posters, or virtually any other kind of digital document.

Once such efforts really start to pick up steam—perhaps even in the next two or three years—I think there’s a good chance we’ll fall into a positive feedback loop, because it will become increasingly clear that for many kinds of scientific findings or observations, there’s simply nothing to be gained by going through the cumbersome, time-consuming conventional peer review process. To the contrary, there will be all kinds of incentives for researchers to publish their work as soon as they feel it’s ready to share. I mean, look, I can write blog posts a lot faster than I can write traditional academic papers. Which means that if I write, say, one DOI-adorned blog post a month, my Google Scholar profile is going to look a lot bulkier a year from now, at essentially no extra effort or cost (since I’m going to write those blog posts anyway!). In fact, since services like The Winnower and FigShare can assign DOIs to documents retroactively, you might not even have to wait that long. Check back this time next week, and I might have a dozen new indexed publications! And if some of these get cited—whether in “real” journals or on other indexed blog posts—they’ll then be contributing to my citation count and h-index too (at least on Google Scholar). What are you going to do to keep up?

Now, this may all seem a bit off-putting if you’re used to thinking of scientific publication as a relatively formal, laborious process, where two or three experts have to sign off on what you’ve written before it gets to count for anything. If you’ve grown comfortable with the idea that there are “real” scientific contributions on the one hand, and a blooming, buzzing confusion of second-rate opinions on the other, you might find the move to suddenly make everything part of the formal record somewhat disorienting. It might even feel like some people (like, say, me) are actively trying to game the very system that separates science from tabloid news. But I think that’s the wrong perspective. I don’t think anybody—certainly not me—is looking to get rid of peer review. What many people are actively working towards are alternative models of peer review that will almost certainly work better.

The right perspective, I would argue, is to embrace the benefits of technology and seek out new evaluation models that emphasize open, collaborative review by the community as a whole instead of closed pro forma review by two or three semi-randomly selected experts. We now live in an era where new scientific results can be instantly shared at essentially no cost, and where sophisticated collaborative filtering algorithms and carefully constructed reputation systems can potentially support truly community-driven, quantitatively-grounded open peer review on a massive scale. In such an environment, there are few legitimate excuses for sticking with archaic publication and evaluation models—only the familiar, comforting pull of the status quo. Viewed in this light, using technology to get around the limitations of old gatekeeper-based models of scientific publication isn’t gaming the system; it’s actively changing the system—in ways that will ultimately benefit us all. And in that context, the humble self-assigned DOI may ultimately become—to liberally paraphrase Robert Oppenheimer and the Bhagavad Gita—one of the destroyers of the old gatekeeping world.

strong opinions about data sharing mandates–mine included

Apparently, many scientists have rather strong feelings about data sharing mandates. In the wake of PLOS’s recent announcement–which says that, effective now, all papers published in PLOS journals must deposit their data in a publicly accessible location–a veritable gaggle of scientists have taken to their blogs to voice their outrage and/or support for the policy. The nays have posts like DrugMonkey’s complaint that the inmates are running the asylum at PLOS (more choice posts are here, here, here, and here); the yays have Edmund Hart telling the nays to get over themselves and share their data (more posts here, here, and here). While I’m a bit late to the party (mostly because I’ve been traveling and otherwise indisposed), I guess I’ll go ahead and throw my hat into the ring in support of data sharing mandates. For a number of reasons outlined below, I think time will show the anti-PLOS folks to very clearly be on the wrong side of this issue.

Mandatory public deposition is like, totally way better than a “share-upon-request” approach

You might think that proactive data deposition has little incremental utility over a philosophy of sharing one’s data upon request, since emails are these wordy little things that only take a few minutes of a data-seeker’s time to write. But it’s not just the time and effort that matter. It’s also the psychology and technology. Psychology, because if you don’t know the person on the other end, or if the data is potentially useful but not essential to you, or if you’re the agreeable sort who doesn’t like to bother other people, it’s very easy to just say, “nah, I’ll just go do something else”. Scientists are busy people. If a dataset is a click away, many people will be happy to download that dataset and play with it who wouldn’t feel comfortable emailing the author to ask for it. Technology, because data that isn’t publicly available is data that isn’t publicly indexed. It’s all well and good to say that if someone really wants a dataset, they can email you to ask for it, but if someone doesn’t know about your dataset in the first place–because it isn’t in the first three pages of Google results–they’re going to have a hard time asking.

People don’t actually share on request

Much of the criticism of the PLoS data sharing policy rests on the notion that the policy is unnecessary, because in practice most journals already mandate that authors must share their data upon request. One point that defenders of the PLOS mandate haven’t stressed enough is that such “soft” mandates are largely meaningless. Empirical studies have repeatedly demonstrated  that it’s actually very difficult  to get authors to share their data upon request —even when they’re obligated to do so by the contractual agreement they’ve signed with a publisher. And when researchers do fulfill data sharing requests, they often take inordinately long to do so, and the data often don’t line up properly with what was reported in the paper (as the PLOS editors noted in their explanation for introducing the policy), or reveal potentially serious errors.

Personally, I have to confess that I often haven’t fulfilled other researchers’ requests for my data–and in at least two cases, I never even responded to the request. These failures to share didn’t reflect my desire to hide anything; they occurred largely because I knew it would be a lot of work, and/or the data were no longer readily accessible to me, and/or I was too busy to take care of the request right when it came in. I think I’m sufficiently aware of my own character flaws to know that good intentions are no match for time pressure and divided attention–and that’s precisely why I’d rather submit my work to journals that force me to do the tedious curation work up front, when I have a strong incentive to do it, rather than later, when I don’t.

Comprehensive evaluation requires access to the data

It’s hard to escape the feeling that some of the push-back against the policy is actually rooted in the fear that other researchers will find mistakes in one’s work by going through one’s data. In some cases, this fear is made explicit. For example, DrugMonkey suggested that:

There will be efforts to say that the way lab X deals with their, e.g., fear conditioning trials, is not acceptable and they MUST do it the way lab Y does it. Keep in mind that this is never going to be single labs but rather clusters of lab methods traditions. So we’ll have PLoS inserting itself in the role of how experiments are to be conducted and interpreted!

This rather dire premonition prompted a commenter to ask if it’s possible that DM might ever be wrong about what his data means–necessitating other pairs of eyes and/or opinions. DM’s response was, in essence, “No.”. But clearly, this is wishful thinking: we have plenty of reasons to think that everyone in science–even the luminaries among us–make mistakes all the time. Science is hard. In the fields I’m most familiar with, I rarely read a paper that I don’t feel has some serious flaws–even though nearly all of these papers were written by people who have, in DM’s words, “been at this for a while”. By the same token, I’m certain that other people read each of my papers and feel exactly the same way. Of course, it’s not pleasant to confront our mistakes by putting everything out into the open, and I don’t doubt that one consequence of sharing data proactively is that error-finding will indeed become much more common. At least initially (i.e., until we develop an appreciation for the true rate of error in the average dataset, and become more tolerant of minor problems), this will probably cause everyone some discomfort. But temporary discomfort surely isn’t a good excuse to continue to support practices that clearly impede scientific progress.

Part of the problem, I suspect, is that scientists have collectively internalized as acceptable many practices that are on some level clearly not good for the community as a whole. To take just one example, it’s an open secret in biomedical science that so-called “representative figures” (of spiking neurons, Western blots, or whatever else you like) are rarely truly representative. Frequently, they’re actually among the best examples the authors of a paper were able to find. The communal wink-and-shake agreement to ignore this kind of problem is deeply problematic, in that it likely allows many claims to go unchallenged that are actually not strongly supported by the data. In a world where other researchers could easily go through my dataset and show that the “representative” raster plot I presented in Figure 2C was actually the best case rather than the norm, I would probably have to be more careful about making that kind of claim up front–and someone else might not waste a lot of their time chasing results that can’t possibly be as good as my figures make them look.

Figure 1.  A representative planet.

The Data are a part of the Methods

If you still don’t find this convincing, consider that one could easily have applied nearly all of the arguments people having been making in the blogosphere these past two weeks to that dastardly scientific timesink that is the common Methods sections. Imagine that we lived in a culture where scientists always reported their Results telegraphically–that is, with the brevity of a typical Nature or Science paper, but without the accompanying novel’s worth of Supplementary Methods. Then, when someone first suggested that it might perhaps be a good idea to introduce a separate section that describes in dry, technical language how authors actually produced all those exciting results, we would presumably see many people in the community saying something like the following:

Why should I bother to tell you in excruciating detail what software, reagents, and stimuli I used in my study? The vast majority of readers will never try to directly replicate my experiment, and those who do want to can just email me to get the information they need–which of course I’m always happy to provide in a timely and completely disinterested fashion. Asking me to proactively lay out every little methodological step I took is really unreasonable; it would take a very long time to write a clear “Methods” section of the kind you propose, and the benefits seem very dubious. I mean, the only thing that will happen if I adopt this new policy is that half of my competitors will start going through this new section with a fine-toothed comb in order to find problems, and the other half will now be able to scoop me by repeating the exact procedures I used before I have a chance to follow them up myself! And for what? Why do I need to tell everyone exactly what I did? I’m an expert with many years of experience in this field! I know what I’m doing, and I don’t appreciate your casting aspersions on my work and implying that my conclusions might not always be 100% sound!

As far as I can see, there isn’t any qualitative difference between reporting detailed Methods and providing comprehensive Data. In point of fact, many decisions about which methods one should use depend entirely on the nature of the data, so it’s often actually impossible to evaluate the methodological choices the authors made without seeing their data. If DrugMonkey et al think it’s crazy for one researcher to want access to another researcher’s data in order to determine whether the distribution of some variable looks normal, they should also think it’s crazy for researchers to have to report their reasoning for choosing a particular transformation in the first place. Or for using a particular reagent. Or animal strain. Or learning algorithm, or… you get the idea. But as Bjorn Brembs succinctly put it, in the digital age, this is silly: for all intents and purposes, there’s no longer any difference between text and data.

The data are funded by the taxpayers, and (in some sense) belong to the taxpayers

People vary widely in the extent to which they feel the public deserves to have access to the products of the work it funds. I don’t think I hold a particularly extreme position in this regard, in the sense that I don’t think the mere fact that someone’s effort is funded by the public automatically means any of their products should be publicly available for anyone’s perusal or use. However, when we’re talking about scientific data–where the explicit rationale for funding the work is to produce new generalizable knowledge, and where the marginal cost of replicating digital data is close to zero–I really don’t see any reason not to push very strongly to force scientists to share their data. I’m sympathetic to claims about scooping and credit assignment, but as a number of other folks have pointed out in comment threads, these are fundamentally arguments in favor of better credit assignment, and not arguments against sharing data. The fear some people have of being scooped is not sufficient justification for impeding our collective scientific progress.

It’s also worth noting that, in principle, PLOS’s new data sharing policy shouldn’t actually make it any easier for someone else to scoop you. Remember that under PLOS’s current data sharing mandate–as well as the equivalent policies at most other scientific journals–authors are already required to provide their data to anyone else upon request. Critics who argue that the new public archiving mandate opens the door to being scooped are in effect admitting that the old mandate to share upon request doesn’t work, because in theory there already shouldn’t really be anything preventing me from scooping you with your data simply by asking you for it (other than social norms–but then, the people who are actively out to usurp others’ ideas are the least likely to abide by those norms anyway). It’s striking to see how many of the posts defending the “share-upon-request” approach have no compunction in saying that they’re currently only willing to share their data after determining what the person on the other end wants to use it for–in clear violation of most journals’ existing policy.

It’s really not that hard

Organizing one’s data or code in a form minimally suitable for public consumption isn’t much fun. I do it fairly regularly; I know it sucks. It takes some time out of your day, and requires you to allocate resources to the problem that could otherwise be directed elsewhere. That said, a lot of the posts complaining about how much effort the new policy requires seem absurdly overwrought. There seems to be a widespread belief–which, as far as I can tell, isn’t supported by a careful reading of the actual PLOS policy–that there’s some incredibly strict standard that datasets have to live up to before pulic release. I don’t really understand where this concern comes from. Personally, I spend much of my time analyzing data other people have collected. I’ve worked with many other people’s data, and rarely is it in exactly the form I would like. Often times it’s not even in the ballpark of what I’d like. And I’ve had to invest a considerable amount of my time understanding what columns and rows mean, and scrounging for morsels of (poor) documentation. My working assumption when I do this–and, I think, most other people’s–is that the onus is on me to expend some effort figuring out what’s in a dataset I wish to use, and not on the author to release that dataset in a form that a completely naive person could understand without any effort. Of course it would be nice if everyone put their data up on the web in a form that maximized accessibility, but it certainly isn’t expected*. In asking authors to deposit their data publicly, PLOS isn’t asserting that there’s a specific format or standard that all data must meet; they’re just saying data must meet accepted norms. Since those norms depend on one’s field, it stands to reason that expectations will be lower for a 10-TB fMRI dataset than for an 800-row spreadsheet of behavioral data.

There are some valid concerns, but…

I don’t want to sound too Pollyannaish about all this. I’m not suggesting that the PLOS policy is perfect, or that issues won’t arise in the course of its implementation and enforcement. It’s very clear that there are some domains in which data sharing is a hassle, and I sympathize with the people who’ve pointed out that it’s not really clear what “all” the data means–is it the raw data, which aren’t likely to be very useful to anyone, or the post-processed data, which may be too close to the results reported in the paper? But such domain- or case-specific concerns are grossly outweighed by the very general observation that it’s often impossible to evaluate previous findings adequately, or to build a truly replicable science, if you don’t have access to other scientists’ data. There’s no doubt that edge cases will arise in the course of enforcing the new policy. But they’ll be dealt with on a case-by-case basis, exactly as the PLOS policy indicates. In the meantime, our default assumption should be that editors at PLOS–who are, after all, also working scientists–will behave reasonably, since they face many of the same considerations in their own research. When a researcher tells an editor that she doesn’t have anywhere to put the 50 TB of raw data for her imaging study, I expect that that editor will typically respond by saying, “fine, but surely you can drag and drop a directory full of the first- and second-level beta images, along with a basic description, into NeuroVault, right?”, and not “Whut!? No raw DICOM images, no publication!”

As for the people who worry that by sharing their data, they’ll be giving away a competitive advantage… to be honest, I think many of these folks are mistaken about the dire consequences that would ensue if they shared their data publicly. I suspect that many of the researchers in question would be pleasantly surprised at the benefits of data sharing (increased citation rates, new offers of collaboration, etc.) Still, it’s clear enough that some of the people who’ve done very well for themselves in the current scientific system–typically by leveraging some incredibly difficult-to-acquire dataset into a cottage industry of derivative studies–would indeed do much less well in a world where open data sharing was mandatory. What I fail to see, though, is why PLOS, or the scientific community as a whole, should care very much about this latter group’s concerns. As far as I can tell, PLOS’s new policy is a significant net positive for the scientific community as a whole, even if it hurts one segment of that community in the short term. For the moment, scientists who harbor proprietary attitudes towards their data can vote with their feet by submitting their papers somewhere other than PLOS. Contrary to the dire premonitions floating around, I very much doubt any potential drop in submissions is going to deliver a terminal blow to PLOS (and the upside is that the articles that do get published in PLOS will arguably be of higher quality). In the medium-to-long term, I suspect that cultural norms surrounding who gets credit for acquiring and sharing data vs. analyzing and reporting new findings based on those data are are going to undergo a sea change–to the point where in the not-too-distant future, the scoopophobia that currently drives many people to privately hoard their data is a complete non-factor. At that point, it’ll be seen as just plain common sense that if you want your scientific assertions to be taken seriously, you need to make the data used to support those assertions available for public scrutiny, re-analysis, and re-use.

 

* As a case in point, just yesterday I came across a publicly accessible dataset I really wanted to use, but that was in SPSS format. I don’t own a copy of SPSS, so I spent about an hour trying to get various third-party libraries to extract the data appropriately, without any luck. So eventually I sent the file to a colleague who was helpful enough to convert it. My first thought when I received the tab-delimited file in my mailbox this morning was not “ugh, I can’t believe they released the file in SPSS”, it was “how amazing is it that I can download this gigantic dataset acquired half the world away instantly, and with just one minor hiccup, be able to test a novel hypothesis in a high-powered way without needing to spend months of time collecting data?”

the truth is not optional: five bad reasons (and one mediocre one) for defending the status quo

You could be forgiven for thinking that academic psychologists have all suddenly turned into professional whistleblowers. Everywhere you look, interesting new papers are cropping up purporting to describe this or that common-yet-shady methodological practice, and telling us what we can collectively do to solve the problem and improve the quality of the published literature. In just the last year or so, Uri Simonsohn introduced new techniques for detecting fraud, and used those tools to identify at least 3 cases of high-profile, unabashed data forgery. Simmons and colleagues reported simulations demonstrating that standard exploitation of research degrees of freedom in analysis can produce extremely high rates of false positive findings. Pashler and colleagues developed a “Psych file drawer” repository for tracking replication attempts. Several researchers raised trenchant questions about the veracity and/or magnitude of many high-profile psychological findings such as John Bargh’s famous social priming effects. Wicherts and colleagues showed that authors of psychology articles who are less willing to share their data upon request are more likely to make basic statistical errors in their papers. And so on and so forth. The flood shows no signs of abating; just last week, the APS journal Perspectives in Psychological Science announced that it’s introducing a new “Registered Replication Report” section that will commit to publishing pre-registered high-quality replication attempts, irrespective of their outcome.

Personally, I think these are all very welcome developments for psychological science. They’re solid indications that we psychologists are going to be able to police ourselves successfully in the face of some pretty serious problems, and they bode well for the long-term health of our discipline. My sense is that the majority of other researchers–perhaps the vast majority–share this sentiment. Still, as with any zeitgeist shift, there are always naysayers. In discussing these various developments and initiatives with other people, I’ve found myself arguing, with somewhat surprising frequency, with people who for various reasons think it’s not such a good thing that Uri Simonsohn is trying to catch fraudsters, or that social priming findings are being questioned, or that the consequences of flexible analyses are being exposed. Since many of the arguments I’ve come across tend to recur, I thought I’d summarize the most common ones here–along with the rebuttals I usually offer for why, with one possible exception, the arguments for giving a pass to sloppy-but-common methodological practices are not very compelling.

“But everyone does it, so how bad can it be?”

We typically assume that long-standing conventions must exist for some good reason, so when someone raises doubts about some widespread practice, it’s quite natural to question the person raising the doubts rather than the practice itself. Could it really, truly be (we say) that there’s something deeply strange and misguided about using p values? Is it really possible that the reporting practices converged on by thousands of researchers in tens of thousands of neuroimaging articles might leave something to be desired? Could failing to correct for the many researcher degrees of freedom associated with most datasets really inflate the false positive rate so dramatically?

The answer to all these questions, of course, is yes–or at least, we should allow that it could be yes. It is, in principle, entirely possible for an entire scientific field to regularly do things in a way that isn’t very good. There are domains where appeals to convention or consensus make perfect sense, because there are few good reasons to do things a certain way except inasmuch as other people do them the same way. If everyone else in your country drives on the right side of the road, you may want to consider driving on the right side of the road too. But science is not one of those domains. In science, there is no intrinsic benefit to doing things just for the sake of convention. In fact, almost by definition, major scientific advances are ones that tend to buck convention and suggest things that other researchers may not have considered possible or likely.

In the context of common methodological practice, it’s no defense at all to say but everyone does it this way, because there are usually relatively objective standards by which we can gauge the quality of our methods, and it’s readily apparent that there are many cases where the consensus approach leave something to be desired. For instance, you can’t really justify failing to correct for multiple comparisons when you report a single test that’s just barely significant at p < .05 on the grounds that nobody else corrects for multiple comparisons in your field. That may be a valid explanation for why your paper successfully got published (i.e., reviewers didn’t want to hold your feet to the fire for something they themselves are guilty of in their own work), but it’s not a valid defense of the actual science. If you run a t-test on randomly generated data 20 times, you will, on average, get a significant result, p < .05, once. It does no one any good to argue that because the convention in a field is to allow multiple testing–or to ignore statistical power, or to report only p values and not effect sizes, or to omit mention of conditions that didn’t ‘work’, and so on–it’s okay to ignore the issue. There’s a perfectly reasonable question as to whether it’s a smart career move to start imposing methodological rigor on your work unilaterally (see below), but there’s no question that the mere presence of consensus or convention surrounding a methodological practice does not make that practice okay from a scientific standpoint.

“But psychology would break if we could only report results that were truly predicted a priori!”

This is a defense that has some plausibility at first blush. It’s certainly true that if you force researchers to correct for multiple comparisons properly, and report the many analyses they actually conducted–and not just those that “worked”–a lot of stuff that used to get through the filter will now get caught in the net. So, by definition, it would be harder to detect unexpected effects in one’s data–even when those unexpected effects are, in some sense, ‘real’. But the important thing to keep in mind is that raising the bar for what constitutes a believable finding doesn’t actually prevent researchers from discovering unexpected new effects; all it means is that it becomes harder to report post-hoc results as pre-hoc results. It’s not at all clear why forcing researchers to put in more effort validating their own unexpected finding is a bad thing.

In fact, forcing researchers to go the extra mile in this way would have one exceedingly important benefit for the field as a whole: it would shift the onus of determining whether an unexpected result is plausible enough to warrant pursuing away from the community as a whole, and towards the individual researcher who discovered the result in the first place. As it stands right now, if I discover an unexpected result (p < .05!) that I can make up a compelling story for, there’s a reasonable chance I might be able to get that single result into a short paper in, say, Psychological Science. And reap all the benefits that attend getting a paper into a “high-impact” journal. So in practice there’s very little penalty to publishing questionable results, even if I myself am not entirely (or even mostly) convinced that those results are reliable. This state of affairs is, to put it mildly, not A Good Thing.

In contrast, if you as an editor or reviewer start insisting that I run another study that directly tests and replicates my unexpected finding before you’re willing to publish my result, I now actually have something at stake. Because it takes time and money to run new studies, I’m probably not going to bother to follow up on my unexpected finding unless I really believe it. Which is exactly as it should be: I’m the guy who discovered the effect, and I know about all the corners I have or haven’t cut in order to produce it; so if anyone should make the decision about whether to spend more taxpayer money chasing the result, it should be me. You, as the reviewer, are not in a great position to know how plausible the effect truly is, because you have no idea how many different types of analyses I attempted before I got something to ‘work’, or how many failed studies I ran that I didn’t tell you about. Given the huge asymmetry in information, it seems perfectly reasonable for reviewers to say, You think you have a really cool and unexpected effect that you found a compelling story for? Great; go and directly replicate it yourself and then we’ll talk.

“But mistakes happen, and people could get falsely accused!”

Some people don’t like the idea of a guy like Simonsohn running around and busting people’s data fabrication operations for the simple reason that they worry that the kind of approach Simonsohn used to detect fraud is just not that well-tested, and that if we’re not careful, innocent people could get swept up in the net. I think this concern stems from fundamentally good intentions, but once again, I think it’s also misguided.

For one thing, it’s important to note that, despite all the press, Simonsohn hasn’t actually done anything qualitatively different from what other whistleblowers or skeptics have done in the past. He may have suggested new techniques that improve the efficiency with which cheating can be detected, but it’s not as though he invented the ability to report or investigate other researchers for suspected misconduct. Researchers suspicious of other researchers’ findings have always used qualitatively similar arguments to raise concerns. They’ve said things like, hey, look, this is a pattern of data that just couldn’t arise by chance, or, the numbers are too similar across different conditions.

More to the point, perhaps, no one is seriously suggesting that independent observers shouldn’t be allowed to raise their concerns about possible misconduct with journal editors, professional organizations, and universities. There really isn’t any viable alternative. Naysayers who worry that innocent people might end up ensnared by false accusations presumably aren’t suggesting that we do away with all of the existing mechanisms for ensuring accountability; but since the role of people like Simonsohn is only to raise suspicion and provide evidence (and not to do the actual investigating or firing), it’s clear that there’s no way to regulate this type of behavior even if we wanted to (which I would argue we don’t). If I wanted to spend the rest of my life scanning the statistical minutiae of psychology articles for evidence of misconduct and reporting it to the appropriate authorities (and I can assure you that I most certainly don’t), there would be nothing anyone could do to stop me, nor should there be. Remember that accusing someone of misconduct is something anyone can do, but establishing that misconduct has actually occurred is a serious task that requires careful internal investigation. No one–certainly not Simonsohn–is suggesting that a routine statistical test should be all it takes to end someone’s career. In fact, Simonsohn himself has noted that he identified a 4th case of likely fraud that he dutifully reported to the appropriate authorities only to be met with complete silence. Given all the incentives universities and journals have to look the other way when accusations of fraud are made, I suspect we should be much more concerned about the false negative rate than the false positive rate when it comes to fraud.

“But it hurts the public’s perception of our field!”

Sometimes people argue that even if the field does have some serious methodological problems, we still shouldn’t discuss them publicly, because doing so is likely to instill a somewhat negative view of psychological research in the public at large. The unspoken implication being that, if the public starts to lose confidence in psychology, fewer students will enroll in psychology courses, fewer faculty positions will be created to teach students, and grant funding to psychologists will decrease. So, by airing our dirty laundry in public, we’re only hurting ourselves. I had an email exchange with a well-known researcher to exactly this effect a few years back in the aftermath of the Vul et al “voodoo correlations” paper–a paper I commented on to the effect that the problem was even worse than suggested. The argument my correspondent raised was, in effect, that we (i.e., neuroimaging researchers) are all at the mercy of agencies like NIH to keep us employed, and if it starts to look like we’re clowning around, the unemployment rate for people with PhDs in cognitive neuroscience might start to rise precipitously.

While I obviously wouldn’t want anyone to lose their job or their funding solely because of a change in public perception, I can’t say I’m very sympathetic to this kind of argument. The problem is that it places short-term preservation of the status quo above both the long-term health of the field and the public’s interest. For one thing, I think you have to be quite optimistic to believe that some of the questionable methodological practices that are relatively widespread in psychology (data snooping, selective reporting, etc.) are going to sort themselves out naturally if we just look the other way and let nature run its course. The obvious reason for skepticism in this regard is that many of the same criticisms have been around for decades, and it’s not clear that anything much has improved. Maybe the best example of this is Gigerenzer and Sedlmeier’s 1989 paper entitled “Do studies of statistical power have an effect on the power of studies?“, in which the authors convincingly showed that despite three decades of work by luminaries like Jacob Cohen advocating power analyses, statistical power had not risen appreciably in psychology studies. The presence of such unwelcome demonstrations suggests that sweeping our problems under the rug in the hopes that someone (the mice?) will unobtrusively take care of them for us is wishful thinking.

In any case, even if problems did tend to solve themselves when hidden away from the prying eyes of the media and public, the bigger problem with what we might call the “saving face” defense is that it is, fundamentally, an abuse of taxypayers’ trust. As with so many other things, Richard Feynman summed up the issue eloquently in his famous Cargo Cult science commencement speech:

For example, I was a little surprised when I was talking to a friend who was going to go on the radio. He does work on cosmology and astronomy, and he wondered how he would explain what the applications of this work were. “Well,” I said, “there aren’t any.” He said, “Yes, but then we won’t get support for more research of this kind.” I think that’s kind of dishonest. If you’re representing yourself as a scientist, then you should explain to the layman what you’re doing–and if they don’t want to support you under those circumstances, then that’s their decision.

The fact of the matter is that our livelihoods as researchers depend directly on the goodwill of the public. And the taxpayers are not funding our research so that we can “discover” interesting-sounding but ultimately unreplicable effects. They’re funding our research so that we can learn more about the human mind and hopefully be able to fix it when it breaks. If a large part of the profession is routinely employing practices that are at odds with those goals, it’s not clear why taxpayers should be footing the bill. From this perspective, it might actually be a good thing for the field to revise its standards, even if (in the worst-case scenario) that causes a short-term contraction in employment.

“But unreliable effects will just fail to replicate, so what’s the big deal?”

This is a surprisingly common defense of sloppy methodology, maybe the single most common one. It’s also an enormous cop-out, since it pre-empts the need to think seriously about what you’re doing in the short term. The idea is that, since no single study is definitive, and a consensus about the reality or magnitude of most effects usually doesn’t develop until many studies have been conducted, it’s reasonable to impose a fairly low bar on initial reports and then wait and see what happens in subsequent replication efforts.

I think this is a nice ideal, but things just don’t seem to work out that way in practice. For one thing, there doesn’t seem to be much of a penalty for publishing high-profile results that later fail to replicate. The reason, I suspect, is that we incline to give researchers the benefit of the doubt: surely (we say to ourselves), Jane Doe did her best, and we like Jane, so why should we question the work she produces? If we’re really so skeptical about her findings, shouldn’t we go replicate them ourselves, or wait for someone else to do it?

While this seems like an agreeable and fair-minded attitude, it isn’t actually a terribly good way to look at things. Granted, if you really did put in your best effort–dotted all your i’s and crossed all your t’s–and still ended up reporting a false result, we shouldn’t punish you for it. I don’t think anyone is seriously suggesting that researchers who inadvertently publish false findings should be ostracized or shunned. On the other hand, it’s not clear why we should continue to celebrate scientists who ‘discover’ interesting effects that later turn out not to replicate. If someone builds a career on the discovery of one or more seemingly important findings, and those findings later turn out to be wrong, the appropriate attitude is to update our beliefs about the merit of that person’s work. As it stands, we rarely seem to do this.

In any case, the bigger problem with appeals to replication is that the delay between initial publication of an exciting finding and subsequent consensus disconfirmation can be very long, and often spans entire careers. Waiting decades for history to prove an influential idea wrong is a very bad idea if the available alternative is to nip the idea in the bud by requiring stronger evidence up front.

There are many notable examples of this in the literature. A well-publicized recent one is John Bargh’s work on the motor effects of priming people with elderly stereotypes–namely, that priming people with words related to old age makes them walk away from the experiment more slowly. Bargh’s original paper was published in 1996, and according to Google Scholar, has now been cited over 2,000 times. It has undoubtedly been hugely influential in directing many psychologists’ research programs in certain directions (in many cases, in directions that are equally counterintuitive and also now seem open to question). And yet it’s taken over 15 years for a consensus to develop that the original effect is at the very least much smaller in magnitude than originally reported, and potentially so small as to be, for all intents and purposes, “not real”. I don’t know who reviewed Bargh’s paper back in 1996, but I suspect that if they ever considered the seemingly implausible size of the effect being reported, they might have well thought to themselves, well, I’m not sure I believe it, but that’s okay–time will tell. Time did tell, of course; but time is kind of lazy, so it took fifteen years for it to tell. In an alternate universe, a reviewer might have said, well, this is a striking finding, but the effect seems implausibly large; I would like you to try to directly replicate it in your lab with a much larger sample first. I recognize that this is onerous and annoying, but my primary responsibility is to ensure that only reliable findings get into the literature, and inconveniencing you seems like a small price to pay. Plus, if the effect is really what you say it is, people will be all the more likely to believe you later on.

Or take the actor-observer asymmetry, which appears in just about every introductory psychology textbook written in the last 20 – 30 years. It states that people are relatively more likely to attribute their own behavior to situational factors, and relatively more likely to attribute other agents’ behaviors to those agents’ dispositions. When I slip and fall, it’s because the floor was wet; when you slip and fall, it’s because you’re dumb and clumsy. This putative asymmetry was introduced and discussed at length in a book by Jones and Nisbett in 1971, and hundreds of studies have investigated it at this point. And yet a 2006 meta-analysis by Malle suggested that the cumulative evidence for the actor-observer asymmetry is actually very weak. There are some specific circumstances under which you might see something like the postulated effect, but what is quite clear is that it’s nowhere near strong enough an effect to justify being routinely invoked by psychologists and even laypeople to explain individual episodes of behavior. Unfortunately, at this point it’s almost impossible to dislodge the actor-observer asymmetry from the psyche of most researchers–a reality underscored by the fact that the Jones and Nisbett book has been cited nearly 3,000 times, whereas the 1996 meta-analysis has been cited only 96 times (a very low rate for an important and well-executed meta-analysis published in Psychological Bulletin).

The fact that it can take many years–whether 15 or 45–for a literature to build up to the point where we’re even in a position to suggest with any confidence that an initially exciting finding could be wrong means that we should be very hesitant to appeal to long-term replication as an arbiter of truth. Replication may be the gold standard in the very long term, but in the short and medium term, appealing to replication is a huge cop-out. If you can see problems with an analysis right now that cast aspersions on a study’s results, it’s an abdication of responsibility to downplay your concerns and wait for someone else to come along and spend a lot more time and money trying to replicate the study. You should point out now why you have concerns. If the authors can address them, the results will look all the better for it. And if the authors can’t address your concerns, well, then, you’ve just done science a service. If it helps, don’t think of it as a matter of saying mean things about someone else’s work, or of asserting your own ego; think of it as potentially preventing a lot of very smart people from wasting a lot of time chasing down garden paths–and also saving a lot of taxpayer money. Remember that our job as scientists is not to make other scientists’ lives easy in the hopes they’ll repay the favor when we submit our own papers; it’s to establish and apply standards that produce convergence on the truth in the shortest amount of time possible.

“But it would hurt my career to be meticulously honest about everything I do!”

Unlike the other considerations listed above, I think the concern that being honest carries a price when it comes to do doing research has a good deal of merit to it. Given the aforementioned delay between initial publication and later disconfirmation of findings (which even in the best case is usually longer than the delay between obtaining a tenure-track position and coming up for tenure), researchers have many incentives to emphasize expediency and good story-telling over accuracy, and it would be disingenuous to suggest otherwise. No malevolence or outright fraud is implied here, mind you; the point is just that if you keep second-guessing and double-checking your analyses, or insist on routinely collecting more data than other researchers might think is necessary, you will very often find that results that could have made a bit of a splash given less rigor are actually not particularly interesting upon careful cross-examination. Which means that researchers who have, shall we say, less of a natural inclination to second-guess, double-check, and cross-examine their own work will, to some degree, be more likely to publish results that make a bit of a splash (it would be nice to believe that pre-publication peer review filters out sloppy work, but empirically, it just ain’t so). So this is a classic tragedy of the commons: what’s good for a given individual, career-wise, is clearly bad for the community as a whole.

I wish I had a good solution to this problem, but I don’t think there are any quick fixes. The long-term solution, as many people have observed, is to restructure the incentives governing scientific research in such a way that individual and communal benefits are directly aligned. Unfortunately, that’s easier said than done. I’ve written a lot both in papers (1, 2, 3) and on this blog (see posts linked here) about various ways we might achieve this kind of realignment, but what’s clear is that it will be a long and difficult process. For the foreseeable future, it will continue to be an understandable though highly lamentable defense to say that the cost of maintaining a career in science is that one sometimes has to play the game the same way everyone else plays the game, even if it’s clear that the rules everyone plays by are detrimental to the communal good.

 

Anyway, this may all sound a bit depressing, but I really don’t think it should be taken as such. Personally I’m actually very optimistic about the prospects for large-scale changes in the way we produce and evaluate science within the next few years. I do think we’re going to collectively figure out how to do science in a way that directly rewards people for employing research practices that are maximally beneficial to the scientific community as a whole. But I also think that for this kind of change to take place, we first need to accept that many of the defenses we routinely give for using iffy methodological practices are just not all that compelling.

Attention publishers: the data in your tables want to be free! Free!

The Neurosynth database is getting an upgrade over the next couple of weeks; it’s going to go from 4,393 neuroimaging studies to around 5,800. Unfortunately, updating the database is kind of a pain, because academic publishers like to change the format of their full-text HTML articles, which has a nasty habit of breaking the publisher-specific HTML parsers I’ve written. When you expect ScienceDirect to give you <table cellspacing=10>, but you get <table> with no cellspacing attribute (the horror!), bad things happen in XPath land. And then those bad things need to be repaired. And I hate repairing stuff! So I don’t do it very often. Like, once every 6 to 9 months.

In an ideal world, there would be no need to write (and fix) custom filters for different publishers, because the publishers would all simultaneously make XML representations of their articles available (in addition to HTML, PDF, etc.), and then people who have legitimate data mining reasons for regularly downloading hundreds of articles at a time wouldn’t have to cry themselves to sleep every night. But as it stands, only one major publisher of neuroimaging articles (PLoS) provides XML versions of all articles. A minority of articles from other publishers are available in XML from BioMed Central, but that’s still just a fraction of the existing literature.

Anyway, the HTML thing is annoying, but it’s possible to work around it. What’s much more problematic is that some publishers lock up the data in the tables of their articles. To make Neurosynth work, I have to be able to identify rows in tables that look like brain activations. That is, things that look roughly like this:

Most publishers are nice enough to format article tables as HTML tables; which is to say, I can look for tags like <table> and then work down the XPath tree to identify all the the rows, and then scan each rows for values that look activation-like. Then those values go into the database, and poof, next thing you know, you have meta-analytic brain activation maps from hundreds of studies. But some publishers–most notably, Frontiers–throw a wrench in the works by failing to format tables in HTML; instead, they present the tables as images (see for instance this JPEG table, pulled from this article). Which means I can’t really extract any data from them, and as a result, you’re not going to see activations from articles published in Frontiers journals in Neurosynth any time soon. So if you publish fMRI articles in Frontiers in Human Neuroscience regularly, and are wondering why I’ve been ignoring you (I like you! I promise!), now you know.

Anyway, on the remote chance that anyone reading this has any sway with people high up at Frontiers, could you please ask them to release their data? Pretty please? Lack of access to data in tables seems to be a pretty common complaint in the data mining community; I’ve talked to other people in the neuroinformatics world who’ve also expressed frustration about it, and I imagine the same is true of people in other disciplines. It’s particularly surprising given that Frontiers is, in theory, an open access publisher. I can see the data in your tables, Frontiers; why won’t you also let me read it?

Okay, I know this kind of stuff doesn’t really interest anyone; I’m just venting. The main point is, Neurosynth is going to be bigger and (very slightly) better in the near future.

tracking replication attempts in psychology–for real this time

I’ve written a few posts on this blog about how the development of better online infrastructure could help address and even solve many of the problems psychologists and other scientists face (e.g., the low reliability of peer review, the ‘fudge factor’ in statistical reporting, the sheer size of the scientific literature, etc.). Actually, that general question–how we can use technology to do better science–occupies a good chunk of my research these days (see e.g., Neurosynth). One question I’ve been interested in for a long time is how to keep track not only of ‘successful’ studies (i.e., those that produce sufficiently interesting effects to make it into the published literature), but also replication failures (or successes of limited interest) that wind up in researchers’ file drawers. A couple of years ago I went so far as to build a prototype website for tracking replication attempts in psychology. Unfortunately, it never went anywhere, partly (okay, mostly) because the site really sucked, and partly because I didn’t really invest much effort in drumming up interest (mostly due to lack of time). But I still think the idea is a valuable one in principle, and a lot of other people have independently had the same idea (which means it must be right, right?).

Anyway, it looks like someone finally had the cleverness, time, and money to get this right. Hal Pashler, Sean Kang*, and colleagues at UCSD have been developing an online database for tracking attempted replications of psychology studies for a while now, and it looks like it’s now in beta. PsychFileDrawer is a very slick, full-featured platform that really should–if there’s any justice in the world–provide the kind of service everyone’s been saying we need for a long time now. If it doesn’t work, I think we’ll have some collective soul-searching to do, because I don’t think it’s going to get any easier than this to add and track attempted replications. So go use it!

 

*Full disclosure: Sean Kang is a good friend of mine, so I’m not completely impartial in plugging this (though I’d do it anyway). Sean also happens to be amazingly smart and in search of a faculty job right now. If I were you, I’d hire him.

The reviewer’s dilemma, or why you shouldn’t get too meta when you’re supposed to be writing a review that’s already overdue

When I review papers for journals, I often find myself facing something of a tension between two competing motives. On the one hand, I’d like to evaluate each manuscript as an independent contribution to the scientific literature–i.e., without having to worry about how the manuscript stacks up against other potential manuscripts I could be reading. The rationale being that the plausibility of the findings reported in a manuscript shouldn’t really depend on what else is being published in the same journal, or in the field as a whole: if there are methodological problems that threaten the conclusions, they shouldn’t become magically more or less problematic just because some other manuscript has (or doesn’t have) gaping holes. Reviewing should simply be a matter of documenting one’s major concerns and suggestions and sending them back to the Editor for infallible judgment.

The trouble with this idea is that if you’re of a fairly critical bent, you probably don’t believe the majority of the findings reported in the manuscripts sent to you to review. Empirically, this actually appears to be the right attitude to hold, because as a good deal of careful work by biostatisticians like John Ioannidis shows, most published research findings are false, and most true associations are inflated. So, in some ideal world, where the job of a reviewer is simply to assess the likelihood that the findings reported in a paper provide an accurate representation of reality, and/or to identify ways of bringing those findings closer in line with reality, skepticism is the appropriate default attitude. Meaning, if you keep the question “why don’t I believe these results?” firmly in mind as you read through a paper and write your review, you probably aren’t going to go wrong all that often.

The problem is that, for better or worse, one’s job as a reviewer isn’t really–or at least, solely–to evaluate the plausibility of other people’s findings. In large part, it’s to evaluate the plausibility of reported findings in relation to the other stuff that routinely gets published in the same journal. For instance, if you regularly reviewing papers for a very low-tier journal, the editor is probably not going to be very thrilled to hear you say “well, Ms. Editor, none of the last 15 papers you’ve sent me are very good, so you should probably just shut down the journal.” So a tension arises between writing a comprehensive review that accurately captures what the reviewer really thinks about the results–which is often (at least in my case) something along the lines of “pffft, there’s no fucking way this is true”–and writing a review that weighs the merits of the reviewed manuscript relative to the other candidates for publication in the same journal.

To illustrate, suppose I review a paper and decide that, in my estimation, there’s only a 20% chance the key results reported in the paper would successfully replicate (for the sake of argument, we’ll pretend I’m capable of this level of precision). Should I recommend outright rejection? Maybe, since 1 in 5 odds of long-term replication don’t seem very good. But then again, what if 20% is actually better than average? What if I think the average article I’m sent to review only has a 10% chance of holding up over time? In that case, if I recommend rejection of the 20% article, and the editor follows my recommendation, most of the time I’ll actually be contributing to the journal publishing poorer quality articles than if I’d recommended accepting the manuscript, even if I’m pretty sure the findings reported in the manuscript are false.

Lest this sound like I’m needlessly overanalyzing the review process instead of buckling down and writing my own overdue reviews (okay, you’re right, now stop being a jerk), consider what happens when you scale the problem up. When journal editors send reviewers manuscripts to look over, the question they really want an answer to is, “how good is this paper compared to everything else that crosses my desk?” But most reviewers naturally incline to answer a somewhat different–and easier–question, namely, “in the grand scheme of life, the universe, and everything, how good is this paper?” The problem, then, is that if the variance in curmudgeonliness between reviewers exceeds the (reliable) variance within reviewers, then arguably the biggest factor in determining whether or not a given paper gets rejected is simply who happens to review it. Not how much expertise the reviewer has, or even how ‘good’ they are (in the sense that some reviewers are presumably better than others at identifying serious problems and overlooking trivial ones), but simply how critical they are on average. Which is to say, if I’m Reviewer 2 on your manuscript, you’ll probably have a better chance of rejection than if Reviewer 2 is someone who characteristically writes one-paragraph reviews that begin with the words “this is an outstanding and important piece of work…”

Anyway, on some level this is a pretty trivial observation; after all, we all know that the outcome of the peer review process is, to a large extent, tantamount to a roll of the dice. We know that there are cranky reviewers and friendly reviewers, and we often even have a sense of who they are, which is why we often suggest people to include or exclude as reviewers in our cover letters. The practical question though–and the reason for bringing this up here–is this: given that we have this obvious and ubiquitous problem of reviewers having different standards for what’s publishable, and that this undeniably impacts the outcome of peer review, are there any simple steps we could take to improve the reliability of the review process?

The way I’ve personally made peace between my desire to provide the most comprehensive and accurate review I can and the pragmatic need to evaluate each manuscript in relation to other manuscripts is to use the “comments to the Editor” box to provide some additional comments about my review. Usually what I end up doing is writing my review with little or no thought for practical considerations such as “how prestigious is this journal” or “am I a particularly harsh reviewer” or “is this a better or worse paper than most others in this journal”. Instead, I just write my review, and then when I’m done, I use the comments to the editor to say things like “I’m usually a pretty critical reviewer, so don’t take the length of my review as an indication I don’t like the manuscript, because I do,” or, “this may seem like a negative review, but it’s actually more positive than most of my reviews, because I’m a huge jerk.” That way I can appease my conscience by writing the review I want to while still giving the editor some indication as to where I fit in the distribution of reviewers they’re likely to encounter.

I don’t know if this approach makes any difference at all, and maybe editors just routinely ignore this kind of thing; it’s just the best solution I’ve come up with that I can implement all by myself, without asking anyone else to change their behavior. But if we allow ourselves to contemplate alternative approaches that include changes to the review process itself (while still adhering to the standard pre-publication review model, which, like many other people, I’ve argued is fundamentally dysfunctional), then there are many other possibilities.

One idea, for instance, would be to include calibration questions that could be used to estimate (and correct for) individual differences in curmudgeonliness. For instance, in addition to questions about the merit of the manuscript itself, the review form could have a question like “what proportion of articles you review do you estimate end up being rejected?” or “do you consider yourself a more critical or less critical reviewer than most of your peers?”

Another, logistically more difficult, idea would be to develop a centralized database of review outcomes, so that editors could see what proportion of each reviewer’s assignments ultimately end up being rejected (though they couldn’t see the actual content of the reviews). I don’t know if this type of approach would improve matters at all; it’s quite possible that the review process is fundamentally so inefficient and slow that editors just don’t have the time to spend worrying about this kind of thing. But it’s hard to believe that there aren’t some simple calibration steps we could take to bring reviewers into closer alignment with one another–even if we’re confined to working within the standard pre-publication model of peer review. And given the abysmally low reliability of peer review, even small improvements could potentially produce large benefits in the aggregate.

building better platforms for evaluating science: a request for feedback

UPDATE 4/20/2012: a revised version of the paper mentioned below is now available here.

A couple of months ago I wrote about a call for papers for a special issue of Frontiers in Computational Neuroscience focusing on “Visions for Open Evaluation of Scientific Papers by Post-Publication Peer Review“. I wrote a paper for the issue, the gist of which is that many of the features scientists should want out of a next-generation open evaluation platform are already implemented all over the place in social web applications, so that building platforms for evaluating scientific output should be more a matter of adapting existing techniques than having to come up with brilliant new approaches. I’m talking about features like recommendation engines, APIs, and reputation systems, which you can find everywhere from Netflix to Pandora to Stack Overflow to Amazon, but (unfortunately) virtually nowhere in the world of scientific publishing.

Since the official deadline for submission is two months away (no, I’m not so conscientious that I habitually finish my writing assignments two months ahead of time–I just failed to notice that the deadline had been pushed way back), I figured I may as well use the opportunity to make the paper openly accessible right now in the hopes of soliciting some constructive feedback. This is a topic that’s kind of off the beaten path for me, and I’m not convinced I really know what I’m talking about (well, fine, I’m actually pretty sure I don’t know what I’m talking about), so I’d love to get some constructive criticism from people before I submit a final version of the manuscript. Not only from scientists, but ideally also from people with experience developing social web applications–or actually, just about anyone with good ideas about how to implement and promote next-generation evaluation platforms. I mean, if you use Netflix or reddit regularly, you’re pretty much a de facto expert on collaborative filtering and recommendation systems, right?

Anyway, here’s the abstract:

Traditional pre-publication peer review of scientific output is a slow, inefficient, and unreliable process. Efforts to replace or supplement traditional evaluation models with open evaluation platforms that leverage advances in information technology are slowly gaining traction, but remain in the early stages of design and implementation. Here I discuss a number of considerations relevant to the development of such platforms. I focus particular attention on three core elements that next-generation evaluation platforms should strive to emphasize, including (a) open and transparent access to accumulated evaluation data, (b) personalized and highly customizable performance metrics, and (c) appropriate short-term incentivization of the userbase. Because all of these elements have already been successfully implemented on a large scale in hundreds of existing social web applications, I argue that development of new scientific evaluation platforms should proceed largely by adapting existing techniques rather than engineering entirely new evaluation mechanisms. Successful implementation of open evaluation platforms has the potential to substantially advance both the pace and the quality of scientific publication and evaluation, and the scientific community has a vested interest in shifting towards such models as soon as possible.

You can download the PDF here (or grab it from SSRN here). It features a cameo by Archimedes and borrows concepts liberally from sites like reddit, Netflix, and Stack Overflow (with attribution, of course). I’d love to hear your comments; you can either leave them below or email me directly. Depending on what kind of feedback I get (if any), I’ll try to post a revised version of the paper here in a month or so that works in people’s comments and suggestions.

(fanciful depiction of) Archimedes, renowned ancient Greek mathematician and co-inventor (with Al Gore) of the open access internet repository