what aspirin can tell us about the value of antidepressants

There’s a nice post on Science-Based Medicine by Harriet Hall pushing back (kind of) against the increasingly popular idea that antidepressants don’t work. For context, a couple of recent large meta-analyses have used comprehensive FDA data on clinical trials of antidepressants (rather than only published studies, which are biased toward larger, statistically significant effects) to argue that antidepressants are of little or no use in mildly or moderately depressed people, and achieve a clinically meaningful benefit only in the severely depressed.

Hall points out that whether you think antidepressants have a clinically meaningful benefit depends on how you define clinically meaningful (okay, this sounds vacuous, but bear with me). Most meta-analyses of antidepressant efficacy reveal an effect size of somewhere between 0.3 and 0.5 standard deviations. Psychologists have historically considered effect sizes of 0.2, 0.5, and 0.8 standard deviations to be small, medium, and large, respectively. But as Hall points out:

The psychologist who proposed these landmarks [Jacob Cohen] admitted that he had picked them arbitrarily and that they had “no more reliable a basis than my own intuition.” Later, without providing any justification, the UK’s National Institute for Health and Clinical Excellence (NICE) decided to turn the 0.5 landmark (why not the 0.2 or the 0.8 value?) into a one-size-fits-all cut-off for clinical significance.

She goes on to explain why this ultimately leaves the efficacy of antidepressants open to interpretation:

In an editorial published in the British Medical Journal (BMJ), Turner explains with an elegant metaphor: journal articles had sold us a glass of juice advertised to contain 0.41 liters (0.41 being the effect size Turner et al. derived from the journal articles); but the truth was that the “glass” of efficacy contained only 0.31 liters. Because these amounts were lower than the (arbitrary) 0.5 liter cut-off, NICE standards (and Kirsch) consider the glass to be empty. Turner correctly concludes that the glass is far from full, but it is also far from empty. He also points out that patients’ responses are not all-or-none and that partial responses can be meaningful.

I think this pretty much hits the nail on the head: no one really doubts that antidepressants work at this point; the question is whether they work well enough to justify their side effects and the social and economic costs they impose. I don’t have much to add to Hall’s argument, except that I think she doesn’t sufficiently emphasize how big a role scale plays when trying to evaluate the utility of antidepressants (or any other treatment). At the level of a single individual, a change of one-third of a standard deviation may not seem very big (then again, if you’re currently depressed, it might!). But on a societal scale, even canonically ‘small’ effects can have very large consequences in the aggregate.

The example I’m most fond of here is Robert Rosenthal’s famous illustration of the effect of aspirin on heart attacks. The correlation between taking aspirin daily and decreased risk of heart attack is, at best, .03 (I say at best because the estimate is based on a large 1988 study, and my understanding is that more recent studies have moderated even this small effect). In most domains of psychology, a correlation of .03 is so small as to be completely uninteresting; most psychologists would never seriously contemplate running a study to try to detect an effect of that size. And yet, at a population level, even an r of .03 can have serious implications. Cast in a different light, what this effect means is that about 3 in every 100 people who would have had a heart attack without aspirin are spared one by a daily aspirin regimen. Needless to say, this isn’t trivial: it amounts to a potentially life-saving intervention for 30 out of every 1,000 such people. At a public policy level, you’d be crazy to ignore something like that (which is why, for a long time, many doctors recommended that people take an aspirin a day). And yet, by the standards of experimental psychology, this is a tiny, tiny effect that probably isn’t worth getting out of bed for.
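
Rosenthal’s trick for making a correlation like this concrete is the Binomial Effect Size Display (BESD), which recasts a correlation r as the difference between two hypothetical outcome rates, 0.5 + r/2 and 0.5 − r/2. A minimal sketch (the `besd` function name is mine, not from any library):

```python
# Rosenthal's Binomial Effect Size Display (BESD): recast a correlation r
# as the gap between two "success" rates centered on 50%.

def besd(r):
    """Return the (treatment, control) outcome rates implied by correlation r."""
    return 0.5 + r / 2, 0.5 - r / 2

treatment, control = besd(0.03)  # aspirin vs. no aspirin, r = .03
print(treatment, control)                   # roughly 0.515 vs. 0.485
print(round((treatment - control) * 1000))  # a 30-per-1,000 difference in outcomes
```

The point of the display is exactly the one made above: a correlation that looks negligible on a scatterplot translates into a 3-percentage-point swing in outcomes, which is anything but negligible across a population.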

The point, of course, is that when you consider how many people are currently on antidepressants (millions), even small effects–and certainly an effect of one-third of a standard deviation–are going to be compounded many times over. Given that antidepressants demonstrably reduce the risk of suicide (according to Hall, by about 20%), there’s little doubt that tens of thousands of lives have been saved by antidepressants. That doesn’t necessarily justify their routine use, of course, because the side effects and costs also scale up to the societal level (just imagine how many millions of bouts of nausea could be prevented by eliminating antidepressants from the market!). The point is just that, if you think the benefits of antidepressants outweigh their costs even slightly at the level of the average depressed individual, you’re probably committing yourself to thinking that they have a hugely beneficial impact at a societal level–and that holds true irrespective of whether the effects are ‘clinically meaningful’ by conventional standards.

on the limitations of psychiatry, or why bad drugs can be good too

The Neuroskeptic offers a scathing indictment of the notion, editorialized in Nature this week, that the next decade is going to revolutionize the understanding and treatment of psychiatric disorders:

The 2010s is not the decade for psychiatric disorders. Clinically, that decade was the 1950s. The 50s was when the first generation of psychiatric drugs were discovered – neuroleptics for psychosis (1952), MAOIs (1952) and tricyclics (1957) for depression, and lithium for mania (1949, although it took a while to catch on).

Since then, there have been plenty of new drugs invented, but not a single one has proven more effective than those available in 1959. New antidepressants like Prozac are safer in overdose, and have milder side effects, than older ones. New “atypical” antipsychotics have different side effects to older ones. But they work no better. Compared to lithium, newer “mood stabilizers” probably aren’t even as good. (The only exception is clozapine, a powerful antipsychotic, but dangerous side-effects limit its use.)

Those are pretty strong claims–especially the assertion that not a single psychiatric drug has proven more effective than those available in 1959. Are they true? I’m not in a position to know for certain, having had only fleeting contact here and there with psychiatric research. But I’d be surprised if many basic researchers in psychiatry concurred with that assessment. (I’m sure many clinicians wouldn’t, but that wouldn’t be very surprising.) Still, even if you suppose that present-day drugs are no more effective on average than those available in 1959 (which may or may not be true), it doesn’t follow that there haven’t been major advances in psychiatric treatment. For one thing, the side effects of many modern drugs do tend to be less severe. The Neuroskeptic is right that atypical antipsychotics aren’t as free of side effects as was once hoped; but consider, in contrast, drugs like lamotrigine or valproate–anticonvulsants nowadays widely prescribed for bipolar disorder–which are undeniably less toxic than lithium (though also no more, and possibly less, effective). If you’re diagnosed with bipolar disorder in 2010, there’s still a good chance that you’ll eventually end up being prescribed lithium, but (in most cases) it’s unlikely that that’ll be the first line of treatment. And on the bright side, you could end up with a well-managed case of bipolar disorder that never requires you to take drugs with frequent and severe side effects–something that frankly wouldn’t have been an option for almost anyone in 1959.

That last point gets to what I think is the bigger reason for optimism: choice. Even if new drugs aren’t any better than old drugs on average, they’re probably going to work for different groups of people. One of the things that’s problematic about the way the results of clinical trials are typically interpreted is that if a new drug doesn’t outperform an old one, it’s often dismissed as unhelpful. The trouble with this worldview is that even if drug A helps 60% of people on average and drug B helps only 54% (and the difference is statistically and clinically significant), it may well be that drug B helps people who don’t benefit from drug A. The unfortunate reality is that even relatively stable psychiatric patients usually take a while to find an effective treatment regimen; most patients try several treatments before settling on one that works. Simply in virtue of there being dozens more drugs available in 2009 than in 1959, psychiatric patients are much better off living today than fifty years ago. If an atypical antipsychotic controls your schizophrenia without causing motor symptoms or metabolic syndrome, you never have to try a typical antipsychotic; if valproate works well for your bipolar disorder, there’s no reason for you to ever go on lithium. These aren’t small advances; when you’re talking about millions of people who suffer from each of these disorders worldwide, the introduction of any drug that helps even just a fraction of the patients who weren’t helped by older medication is a big deal, translating into huge improvements in quality of life and many tens of thousands of lives saved. That’s not to say we shouldn’t strive to develop drugs that are also better on average than the older treatments; it’s just that average superiority shouldn’t be the only (and perhaps not even the main) criterion we use to gauge progress.
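
To make the drug A/drug B arithmetic concrete, here’s a toy simulation under the purely illustrative assumption that responses to the two drugs are statistically independent (real response profiles are surely correlated, so treat the numbers as a sketch, not a clinical estimate):

```python
import random

random.seed(1)

# Hypothetical scenario: drug A helps 60% of patients, drug B only 54%.
# If responses are independent, B still rescues a large share of A's
# non-responders, and having both drugs beats having A alone.
N = 100_000
a = [random.random() < 0.60 for _ in range(N)]
b = [random.random() < 0.54 for _ in range(N)]

rescued = sum(1 for ra, rb in zip(a, b) if not ra and rb)  # helped by B but not A
either = sum(1 for ra, rb in zip(a, b) if ra or rb)        # helped by at least one

print(rescued / N)  # around .22 of all patients are rescued by the "worse" drug
print(either / N)   # around .82 helped overall, vs. .60 with drug A alone
```

Even under less generous assumptions about the overlap, the qualitative conclusion holds: an on-average-inferior drug can still expand the fraction of patients who respond to something.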

Having said that, I do agree with the Neuroskeptic’s assessment as to why psychiatric research and treatment seems to proceed more slowly than research in other areas of neuroscience or medicine:

Why? That’s an excellent question. But if you ask me, and judging by the academic literature I’m not alone, the answer is: diagnosis. The weak link in psychiatry research is the diagnoses we are forced to use: “major depressive disorder”, “schizophrenia”, etc.

There are all sorts of methodological reasons why it’s not a great idea to use discrete diagnostic categories when studying (or developing treatments for) mental health disorders. But perhaps the biggest one is that, in cases where a disorder has multiple contributing factors (which is to say, virtually always), drawing a distinction between people with the disorder and those without it severely restricts the range of expression of various related phenotypes, and may even assign people with positive symptomatology to the wrong half of the divide simply because they don’t have some other (relatively) arbitrary symptoms.

For example, take bipolar disorder. If you classify the population into people with bipolar disorder and people without it, you’re doing two rather unfortunate things. One is that you’re lumping together a group of people who have only a partial overlap of symptomatology, and treating them as though they have identical status. One person’s disorder might be characterized by persistent severe depression punctuated by short-lived bouts of mania every few months; another person might cycle rapidly between a variety of moods multiple times per month, week, or even day. Assigning both people the same diagnosis in a clinical study is potentially problematic in that the two may have very different underlying organic disorders, which means you’re effectively averaging over multiple discrete mechanisms in your analysis, resulting in a loss of both sensitivity and specificity.

The other problem, which I think is less widely appreciated, is that you’ll invariably have many “control” subjects who don’t receive the diagnosis but share many features with people who do. This problem is analogous to the injunction against using median splits: you almost never want to turn an interval-level variable into an ordinal one if you don’t have to, because you lose a tremendous amount of information. When you contrast a sample of people with a bipolar diagnosis with a group of “healthy” controls, you’re inadvertently weakening your comparison by including in the control group people who would be best characterized as falling somewhere in between the extremes of pathological and healthy. For example, most of us probably know people who we would characterize as “functionally manic” (sometimes also known as “extraverts”)–that is, people who seem to reap the benefits of the stereotypical manic phase of the bipolar syndrome (high energy, confidence, and activity level) but have none of the downside of the depressive phase. And we certainly know people who seem to have trouble regulating their moods, and oscillate between periods of highs and lows–but perhaps just not to quite the extent necessary to obtain a DSM-IV diagnosis. We do ourselves a tremendous disservice if we call these people “controls”. Sure, they might be controls for some aspects of bipolar symptomatology (e.g., people who are consistently energetic serve as a good contrast to the dysphoria of the depressive phase); but in other respects, they may actually be closer to the prototypical patient than to most other people.
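
The information cost of dichotomizing is easy to demonstrate. Here’s a small simulation (with an assumed true trait–outcome correlation of about .6, chosen arbitrarily for illustration) showing how a median split attenuates the observed correlation:

```python
import statistics
import random

random.seed(0)

# Sketch: a continuous latent trait (e.g., a symptom severity score), an
# outcome correlated with it, and the correlation you see with the full
# score vs. with a crude case/control median split.
N = 50_000
trait = [random.gauss(0, 1) for _ in range(N)]
# Outcome = trait plus noise; coefficients chosen so the true r is ~.6.
outcome = [t * 0.6 + random.gauss(0, 1) * 0.8 for t in trait]

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sx, sy = statistics.pstdev(x), statistics.pstdev(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * sx * sy)

median = statistics.median(trait)
split = [1.0 if t > median else 0.0 for t in trait]  # "case" vs. "control"

print(corr(trait, outcome))  # close to .6 with the continuous score
print(corr(split, outcome))  # only about .48 after the median split
```

The dichotomized correlation shrinks by a factor of roughly 0.8 (the classic sqrt(2/pi) attenuation for a normal variable split at its median), which is exactly the information you throw away when you sort everyone into “patients” and “controls”.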

From a methodological standpoint, there’s no question we’d be much better off focusing on symptoms rather than classifications. If you want to understand the many different factors that contribute to bipolar disorder or schizophrenia, you shouldn’t start from the diagnosis and work backwards; you should start by asking what symptom constellations are associated with specific mechanisms. And those symptoms may well be present (to varying extents) both in people with and without the disorder in question. That’s precisely the motivation behind the current “endophenotype” movement, where the rationale is that you’re better off trying to figure out what biological and (eventually) behavioral changes a given genetic polymorphism is associated with, and then using that information to reshape taxonomies of mental health disorders, than trying to go directly from diagnosis to genetic mechanisms.

Of course, it’s easy to talk about the problems associated with the way psychiatric diagnoses are applied, and not so easy to fix them. Part of the problem is that, while researchers in the lab have the luxury of using large samples defined on the basis of symptomatology rather than classification (a luxury that, as the Neuroskeptic and others have astutely observed, many researchers fail to take advantage of), clinicians generally don’t. When a patient comes in complaining of dysphoria and mood swings, it’s not particularly useful to say “you seem to be in the 96th percentile for negative affect, and have unusual trouble controlling your mood; let’s study this some more, mmmkay?” What you need is some systematic way of going from symptoms to treatment, and the DSM-IV offers a relatively straightforward (though wildly imperfect) way to do that. And then too, the reality is that most clinicians (at least, the ones I’ve talked to) don’t just rely on some algorithmic scheme for picking out drugs; they rely instead on a mix of professional guidelines, implicit theories, and (occasionally) the scientific literature when deciding which symptom constellations have, in their experience, benefited more or less from specific drugs. The problem is that those decisions often fail to achieve their intended goal, and so you end up with a process of trial and error, where patients might try half a dozen medications before they find one that works (if they’re lucky). But that only takes us back to why it’s actually a good thing that we have so many more medications in 2009 than in 1959, even if they’re not necessarily individually more effective. So, yes, psychiatric research has some major failings compared to other areas of biomedical research–though I do think that’s partly (though certainly not entirely) because the problems are harder.
But I don’t think it’s fair to suggest we haven’t made any solid advances in the treatment or understanding of psychiatric disorders in the last half-century. We have; it’s just that we could do much better.