the mysterious inefficacy of weather

I like to think of myself as a data-respecting guy–by which I mean that I try to follow the data wherever it leads, and work hard to suppress my intuitions in cases where those intuitions are convincingly refuted by the empirical evidence. Over the years, I’ve managed to argue myself into believing many things that I would have once found ludicrous–for instance, that parents have very little influence on their children’s personalities, or that in many fields, the judgments of acclaimed experts with decades of training are only marginally better than those of people selected at random, and often considerably worse than simple actuarial models. I believe these things not because I want to or like to, but because I think a dispassionate reading of the available evidence suggests that that’s just how the world works, whether I like it or not.

Still, for all of my efforts, there are times when I find myself unable to set aside my intuitions in the face of what would otherwise be pretty compelling evidence. A case in point is the putative relationship between weather and mood. I think most people–including me–take it as a self-evident fact that weather exerts a strong effect on mood. Climate is one of the first things people bring up when discussing places they’ve lived or visited. When I visit other cities and talk to people about what Austin, Texas (my current home) is like, my description usually amounts to something like it’s an amazing place to live so long as you don’t mind the heat. When people talk about Seattle, they bitch about the rain and the clouds; when people rave about living in California, they’re often thinking in no small part about the constant sunshine that pervades most of the state. When someone comments on the absurdly high rate of death metal bands in Finland, our first reaction is to chuckle and think well, what the hell else is there to do that far up north in the winter?–a reaction promptly followed by a twinge of guilt, because Seasonal Affective Disorder is no laughing matter.

And yet… and yet, the empirical evidence linking variations in the weather to variations in human mood is surprisingly scant. There are a few published reports of very large effects of weather on mood going back several decades, but these are invariably from very small samples–and we know that big correlations tend to occur in little studies. By contrast, large-scale studies with hundreds or thousands of subjects have found very little evidence of a relationship between mood and weather–and the effects identified are not necessarily consistent across studies.

For example, Denissen and colleagues (2008) fit a series of multilevel models of the relationship between objective weather parameters and self-reported mood in 1,233 German subjects, and found only very small associations between weather variables and negative (but not positive) affect. Klimstra et al (2011) found similarly negligible main effects in another sample of ~500 subjects. The state of the empirical literature on weather and mood was nicely summed up by Denissen et al in their Discussion:

As indicated by the relatively small regression weights, weather fluctuations accounted for very little variance in people’s day-to-day mood. This result may be unexpected given the existence of commonly held conceptions that weather exerts a strong influence on mood (Watson, 2000), though it replicates findings by Watson (2000) and Keller et al. (2005), who also failed to report main effects. –Denissen et al (2008)

With the advent of social media and that whole Big Data thing, we can now conduct analyses on a scale that makes the Denissen or Klimstra studies look almost like case studies. In particular, the availability of hundreds of millions of tweets and Facebook posts, coupled with comprehensive weather records from every part of the planet, means that we can now investigate the effects of almost every kind of weather pattern (cloud cover, temperature, humidity, barometric pressure, etc.) on many different indices of mood. And yet, here again, the evidence is not very kind to our intuitive notion of a strong association between weather and mood.

For example, in a study of 10 million Facebook users in 100 US cities, Coviello et al (2014) found that the incidence of positive posts decreased by approximately 1%, and that of negative posts increased by 1%, on days when rain fell compared to days without rain. While that finding is certainly informative (and served as a starting point for other much more impressive analyses of network contagion), it’s not a terribly impressive demonstration of weather’s supposedly robust impact on mood. I mean, a 1% increase in rain-induced negative affect is probably not what’s really keeping anyone from moving to Seattle. Yet if anyone’s managed to detect a much bigger effect of weather on mood in a large-sample study, I’m not aware of it.

I’ve also had the pleasure of experiencing the mysterious absence of weather effects firsthand: as a graduate student, I once spent nearly two weeks trying to find effects of weather on mood in a large dataset (thousands of users from over twenty cities worldwide) culled from LiveJournal, taking advantage of users’ ability to indicate their mood in a status field via an emoticon (a feat of modern technology that’s now become nearly universal thanks to the introduction of those 4-byte UTF-8 emoji monstrosities 🙀👻🍧😻). I stratified my data eleventy different ways; I tried kneading it into infinity-hundred pleasant geometric shapes; I sang to it in the shower and brought it ice cream in bed. But nothing worked. And I’m pretty sure it wasn’t that my analysis pipeline was fundamentally broken, because I did manage (as a sanity check) to successfully establish that LiveJournal users are more likely to report feeling “cold” when the temperature outside is lower (❄️😢). So it’s not like physical conditions have no effect on people’s internal states. It’s just that the obvious weather variables (temperature, rain, humidity, etc.) don’t seem to shift our mood very much, despite our persistent convictions.

Needless to say, that project is currently languishing quite comfortably in the seventh level of file drawer hell (i.e., that bottom drawer that I locked then somehow lost the key to).

Anyway, the question I’ve been mulling over on and off for several years now–though, two-week data-mining binge aside, never for long enough to actually arrive at a satisfactory answer–is why empirical studies have been largely unable to detect an effect of weather on mood. Here are some of the potential answers I’ve come up with:

  • There really isn’t a strong effect of weather on mood, and the intuition that there is one stems from a perverse kind of cultural belief or confirmation bias that leads us all to behave in very strange, and often life-changing, ways–for example, to insist on moving to Miami instead of Seattle (which, climate aside, would be a crazy move, right?). This certainly allows for the possibility that there are weak effects on mood–which plenty of data already support–but then, that’s not so exciting, and doesn’t explain why so many people are so eager to move to Hawaii or California for the great weather.

  • Weather does exert a big effect on mood, but it does so in a highly idiosyncratic way that largely averages out across individuals. On this view, while most people’s mood might be sensitive to weather to some degree, the precise manifestation differs across individuals, so that some people would rather shoot themselves in the face than spend a week in an Edmonton winter, while others will swear up and down that it really is possible (no, literally!) to melt in the heat of a Texas summer. From a modeling standpoint, if the effects of weather on mood are reliable but extremely idiosyncratic, identifying consistent patterns could be a very difficult proposition, as it would potentially require us to model some pretty complex higher-order interactions. And the difficulty is further compounded by strong geographic selection biases: since people tend to move to places where they like the climate, the variance in mood attributable to weather changes is probably much smaller than it would be under random dispersal. (For a toy simulation of how strong individual-level effects can wash out at the group level, see the sketch just after this list.)

  • People’s mood is heavily influenced by the weather when they first spend time somewhere new, but then they get used to it. We habituate to almost everything else, so why not weather? Maybe people who live in California don’t really benefit from living in constant sunshine. Maybe they only enjoyed the sun for their first two weeks in California, and the problem is that now, whenever they travel somewhere else, the rain/snow/heat of other places makes them feel worse than their baseline (habituated) state. In other words, maybe Californians have been snorting sunshine for so long that they now need a hit of clarified sunbeams three times a day just to feel normal.

  • The relationship between objective weather variables and subjective emotional states is highly non-linear. Maybe we can’t consistently detect a relationship between high temperatures and anger because the perception of temperature is highly dependent on a range of other variables (e.g., 30 degrees Celsius can feel quite pleasant on a cloudy day in a dry climate, but intolerable if it’s humid and the sun is out). This would make the modeling challenge more difficult, but certainly not insurmountable.

  • Our measures of mood are not very reliable, and since reliability limits validity, it’s no surprise if we can’t detect consistent effects of weather on mood. Personally I’m actually very skeptical about this one, since there’s plenty of evidence that self-reports of emotion are more than adequate in any number of other situations (e.g., it’s not at all hard to detect strong trait effects of personality on reported mood states). But it’s still not entirely crazy to suggest that maybe what we’re looking at is at least partly a measurement problem—especially once we start talking about algorithmically extracting sentiment from Twitter or Facebook posts, which is a notoriously difficult problem.

  • The effects of weather on mood are strong, but very transient, and we’re simply not very good at computing mental integrals over all of our moment-by-moment experiences. That is, we tend to overestimate the impact of weather on our mood because we find it easy to remember instances when the weather affected our mood, and not so easy to track all of the other background factors that might influence our mood more deeply but less perceptibly. There are many heuristics and biases you could attribute this to (e.g., the peak-end rule, the availability heuristic, etc.), but the basic point is that, on this view, the belief that the weather robustly influences our mood is a kind of mnemonic illusion attributable to well-known bugs in (or, more charitably, features of) our cognitive architecture.
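To make the “idiosyncratic effects” possibility a little more concrete, here’s a minimal simulation sketch in Python. All of the parameters (the number of subjects, the number of days, the spread of the per-person sensitivities) are made up purely for illustration; the point is just that sensitivities that are sizable for each individual, but centered on zero across people, yield a pooled weather–mood correlation close to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_days = 1000, 90  # arbitrary illustrative sizes

# Each person gets a substantial weather sensitivity, but the sensitivities
# are centered on zero across the population (the "idiosyncratic" scenario).
personal_slopes = rng.normal(loc=0.0, scale=0.8, size=n_subjects)

temp = rng.normal(size=(n_subjects, n_days))  # standardized daily temperature
mood = personal_slopes[:, None] * temp + rng.normal(size=(n_subjects, n_days))

# Within-person effects are typically sizable...
within_r = [np.corrcoef(temp[i], mood[i])[0, 1] for i in range(n_subjects)]
print(f"mean |within-person r|: {np.mean(np.abs(within_r)):.2f}")

# ...but the pooled, population-level effect is close to zero.
pooled_r = np.corrcoef(temp.ravel(), mood.ravel())[0, 1]
print(f"pooled r: {pooled_r:.3f}")
```

A multilevel model with random slopes would recover the large slope variance in data like these even though the average (fixed) effect is near zero, which is one reason the modeling point in the second bullet matters.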

Anyway, as far as I can tell, none of the above explanations fully account for the available data. And, to be fair, there’s no reason to think any of them should: if I had to guess, I would put money on the true explanation being a convoluted mosaic of some or all of the above factors (plus others I haven’t considered, no doubt). But the proximal problem is that there just doesn’t seem to be much data to speak to the question one way or the other. And this annoys me more than I would like. I won’t go so far as to say I spend a lot of time thinking about the problem, because I don’t. But I think about it often enough that writing a 2,000-word blog post in the hopes that other folks will provide some compelling input seems like a very reasonable time investment.

And so, having read this far—which must mean you’re at least vaguely entertained, right?—it’s your turn to help me out. Please tell me: Why is it so damn hard to detect the effects of weather on mood? Make it rain comments! It will probably cheer me up. Slightly.

☀️🌞😎😅

To increase sustainability, NIH should yoke success rates to budgets

There’s a general consensus among biomedical scientists working in the United States that the NIH funding system is in a state of serious debilitation, if not yet on life support. After years of flat budgets and an ever-increasing number of PIs, success rates for R01s (the primary research grant mechanism at NIH) are at an all-time low, even as the average annual budget of awards has decreased in real dollars. Unfortunately, there doesn’t appear to be an easy way to fix this. As many commentators have noted, there are some very deep-rooted, systemic incentives that favor perpetuating, and even exacerbating, the current problems.

Last month, NIH released an RFI asking for suggestions for strategies to improve the impact and sustainability of biomedical research. This isn’t a formal program announcement, and doesn’t carry any real force at the moment, but it does at least signal some interest in making policy changes that could help prevent serious problems from getting worse.

Here’s my suggestion, which I’m also dutifully sending in to NIH in much-abridged form. The basic idea I’ll explore in this post is very simple: NIH should start yoking the success rates of proposals to the amount of money they request. The proposal is not meant to be a long-term solution, and is in some ways just a stopgap measure until more serious policy changes take place. But it’s a stopgap measure that could conceivably increase success rates by a few points for at least a few years, with relatively little implementation cost and few obvious downsides. So I think it’s at least worth considering.

The problem

At the moment, the NIH funding system arguably incentivizes PIs to ask for as much money as they think they can responsibly handle. To see why, let’s forget about NIH for the moment and consider, in day-to-day life, the typical relationship between investment cost and probability of investment (holding constant expected returns, which I’ll address later). Generally speaking, the two are inversely related. If a friend asks you to lend them $10, you might lend it without even asking them what they need it for. If, instead, your friend asks you for $100, you might want to know what it’s for, and you might also ask for some indication of how soon you’ll be paid back. But if your friend asks you for $10,000… well, you’re probably going to want to see a business plan and a legally-binding contract laying out a repayment schedule. There is a general understanding in most walks of life that if someone asks you to invest in them more heavily, they expect to see more evidence that you can deliver on whatever it is that you’re promising to do.

At NIH, things don’t work exactly that way. In many ways, there’s actually a positive incentive to ask for more money when writing a grant application. The perverse incentives play out at multiple levels–both across different grant mechanisms, and within the workhorse R01 mechanism. In the former case, a glance at the success rates for different R mechanisms reveals something that many PIs are, in my experience, completely unaware of: “small”-grant mechanisms like the R03 and R21 have lower–in some cases much lower–success rates than R01s at nearly all NIH institutes. This despite the fact that R21s and R03s are advertised as requiring little or no pilot data, and have low budget caps and short award durations (e.g., a maximum of $275,000 over two years for the R21).

Now you might say: well, sure, if you have a grant program expressly designed for exploratory projects, it’s not surprising if the funding rate is much lower, because you’re probably getting an obscene number of applications from people who aren’t in a position to compete for a full-blown R01. But that’s not really it, because the number of R21 and R03 submissions is also much lower than the number of R01 submissions (e.g., in 2013, NCI funded 14.7% of 4,170 R01 applications, but only 10.6% of 2,557 R21 applications). In the grand scheme of things, the amount of money allocated to “small” grants at NIH pales in comparison to the amount allocated to R01s.

The reason that R21s and R03s aren’t much more common is… well, I actually don’t know. But the point is that the data suggest that, in general (though there are of course exceptions), it’s empirically a pretty bad idea to submit R03s and R21s (particularly if you’re an Early Stage Investigator). The success rates for R01s are higher, you can ask for a lot more money, the project periods are longer, and the amount of work involved in writing the proposal is not dramatically higher. When you look at it that way, it’s not so surprising that PIs don’t submit that many R21/R03 applications: on average, they’re a bad time investment.

The same perverse incentives apply even if you focus on only R01 submissions. You might think that, other things being equal, NIH would prioritize proposals that ask for less money. That may well be true from an administrative standpoint, in the sense that, if two applications receive exactly the same score from a review panel, and are pretty similar in most respects, one imagines that most program officers would prefer to fund the proposal with the smaller budget. But the problem is that, in the grand scheme of things, discretionary awards (i.e., where the PO has the power to choose which award to fund) are a relatively small proportion of the total budget. The majority of proposals get funded because they receive very good scores at review. And it turns out that, at review, asking for more money can actually work in a PI’s favor.

To see why, consider the official NIH guidelines for reviewing budgets. Reviewers are explicitly instructed not to judge a proposal’s merit based on its budget:

Unless specified otherwise in the Funding Opportunity Announcement, consideration of the budget and project period should not affect the overall impact score.

What should the reviewer do with regard to the budget? Well, not much:

The reviewer should determine whether the requested budget is realistic for the conduct of the project proposed.

The explicit decoupling of budget from merit sets up a very serious problem, because if you allow yourself to ask for more money, you can also propose correspondingly grander work. By the time reviewers see your proposal, they have no real way of knowing whether you first decided on the minimum viable research program you want to run and then came up with an appropriate budget, or if you instead picked a largish number out of a hat and then proposed a perfectly reasonable (but large) amount of science you could do in order to fit that budget.

At the risk of making my own life a little bit more difficult, I’m willing to put my money where my mouth is on this point. For just about every proposal I’ve sent to NIH so far, I’ve asked for more money than I strictly need. Now, “need” is a tricky word in this context. I emphatically am not suggesting that I routinely ask NIH for more money just for the sake of having more money. I can honestly say that I’ve never asked for any funds that I didn’t think I could use responsibly in the pursuit of what I consider to be good science. But the trouble is, virtually every PI who’s ever applied for government funding will happily tell you that they could always do more good science if they just had more money. And, to a first approximation, they’re right. Unless a PI already has multiple major grants (and only a very small proportion of NIH PIs do), she or he probably could do more good work if given more money. There might be diminishing returns at some point, but for the most part it should not be terribly surprising if the average PI could increase her or his productivity level somewhat if given the money to hire more personnel, buy better equipment, run more experiments, and so on.

Unfortunately, the NIH budget is a zero-sum game. Every grant dollar I get is a grant dollar some other PI doesn’t get. So, when I go out and ask for a large-but-not-unreasonable amount of money, knowing full well that I could still run a research lab and get at least some good science done with less money, I am, in a sense, screwing everyone else over. Except that I’m not really screwing everyone else over, because everyone else is doing exactly the same thing I am. And the result is that we end up with a lot of PIs proposing a lot of very large projects. The PIs who win the grant lottery (because, increasingly, that’s what it is) will, generally, do a lot of good science with it. So it’s not so much that money is wasted; it’s more that it’s not distributed optimally, because the current system incentivizes people to ask for as much money as they think they can responsibly manage, rather than asking for the minimum amount they need to actually sustain a viable research enterprise.

The fix

The solution to this problem is, on paper, quite simple (which is probably why it’s only on paper). The way to induce PIs to ask for the minimum amount they think they can do their research with–thereby freeing up money for everyone else–is to explicitly yoke risk to reward, so that there’s a clearly discernible cost to asking for every increment in funding. You want $50,000 a year? Okay, that’s pretty easy to fund, so we’re not going to ask you a lot of questions. You want $500k/year? Well, hey, look, there are 10 people out in the hallway who each claim they can produce two papers a year on just $50k. So you’re going to have to explain why we should fund one of you instead of ten of them.

How would this proposal be implemented? There are many ways one could go about it, but here’s one that makes sense to me. First, we get rid of all of the research grant (R-type) mechanisms–except maybe for those that have some clearly differentiated purpose (e.g., R25s for training courses). Second, we introduce new R grant programs defined only by their budget caps and durations. For example, we might have R50s (max $50k/year for 2 years), R150s (max $150k/year for 3 years), R300s (max $300k/year for 5 years), and so on. The top tier (call it the R1000) would have no explicit cap, just like the current R01s. Third, we explicitly tie success rates to budget caps by deciding (and publicly disclosing) how much money we’re allocating to each tier. Each NIH institute would have to decide approximately what its payline for each tier would be for the next year–with the general constraint that the money would be allocated in such a way as to produce a strong inverse correlation between success rate and budget amount. So we might see, for instance, NIMH funding R50s at 50%, R150s at 28%, R300s at 22%, and R1000s at 8%. There would presumably be an initial period of fine-tuning, but over four or five award cycles, the system would almost certainly settle into a fairly stable equilibrium. Paylines would necessarily rise, because PIs would be incentivized to ask for only as much money as they truly need.
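To make the bookkeeping concrete, here’s a toy Python sketch of how an institute might derive tier paylines from a fixed pot of money. Every number in it (the total budget, the tier caps, the budget shares, the expected application counts) is invented purely for illustration; the only point is that once the allocation per tier is fixed, success rates fall straight out of the arithmetic, and a steep inverse relationship between budget cap and payline is easy to engineer.

```python
# Hypothetical illustration only: all figures below are invented.
total_budget = 1_000_000_000  # annual dollars available for new awards

# tier name: (annual budget cap, share of total budget, expected applications)
tiers = {
    "R50":   (50_000,    0.04, 1_500),
    "R150":  (150_000,   0.12, 2_500),
    "R300":  (300_000,   0.34, 4_000),
    "R1000": (1_000_000, 0.50, 4_000),  # top tier, effectively uncapped
}

for name, (cap, share, n_apps) in tiers.items():
    n_awards = (total_budget * share) / cap  # assume awards come in at the cap
    success_rate = n_awards / n_apps
    print(f"{name}: ~{n_awards:.0f} awards, success rate ~{success_rate:.0%}")
```

With these made-up figures the paylines come out at roughly 53%, 32%, 28%, and 13% from cheapest to most expensive tier; the exact numbers don’t matter, only that the gradient is under the institute’s direct control.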

The objection(s)

Are there objections to the approach I’ve suggested above? Sure. Perhaps the most obvious concern will come from people who do genuinely “big” science–i.e., who work in fields where simply keeping a small lab running can cost hundreds of thousands of dollars a year. Researchers in such fields might complain that yoking success rates to budgets would mean that their colleagues who work on less expensive scientific problems have a major advantage when it comes to securing funding, and that Big Science types would consequently find it harder to survive.

There are several things to note about this objection. First, there’s actually no necessary reason why yoking success rates to budgets has to hurt larger applications. The only assumption this proposal depends on is that, at the moment, some proportion of budgets are inflated–i.e., there are many researchers who could operate successfully (if less comfortably) on smaller budgets than they currently do. The fact that many other investigators couldn’t operate on smaller budgets is immaterial. If 25% of NIH PIs voluntarily opt into a research grant program that guarantees higher success rates in return for smaller budgets, the other 75% of PIs could potentially benefit even if they do nothing at all (depending on how success rates are set). So if you currently run a lab that can’t possibly run on less than $500k/year, you don’t necessarily lose anything if one of your colleagues who was previously submitting grants with $250k annual budgets decides to start writing grants with $125k caps in return for, say, a 10% increase in funding likelihood. On the contrary, it could actually mean that there’s more money left over at the end of the day to fund your own big grants.
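As a back-of-the-envelope illustration of why the 75% who do nothing could still come out ahead, here’s the arithmetic with invented numbers: if a quarter of the PIs in a hypothetical cohort of $250k awards opted down to $125k caps, the freed-up money alone would cover a non-trivial number of additional full-size awards.

```python
# Toy arithmetic for the opt-in scenario; every figure here is invented.
pool = 250_000_000     # dollars currently spent on a cohort of full-size awards
full_budget = 250_000  # current typical annual budget per award
half_budget = 125_000  # smaller cap a PI might opt into for a better payline

n_current = pool // full_budget      # 1,000 awards under the status quo
n_switchers = int(0.25 * n_current)  # suppose 25% opt into the smaller cap

freed = n_switchers * (full_budget - half_budget)
extra_full_awards = freed // full_budget
print(f"money freed: ${freed:,}; enough for {extra_full_awards} more full-budget awards")
```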

Now, it’s certainly true that NIH PIs who work in cheaper domains would have an easier time staying afloat than ones who work in expensive domains. And it’s also true that NIH could explicitly bias the system in favor of small grants by raising the success rates for small grants disproportionately. But that isn’t necessarily a problem. Personally, I would argue that a moderate bias towards small grants is actually a very good thing. Remember: funding is a zero-sum game. It may seem egalitarian to make success rates independent of operating costs, because it feels like we’re giving everyone a roughly equal shot at a career in biomedical science, no matter what science they like to do. But in another sense, we aren’t being egalitarian at all, because what we’re actually saying is that a scientist who likes to work on $500k problems is worth five times as much to the taxpayer as one who likes to work on $100k problems. That seems unlikely to be true in the general case (though it may certainly be true in a minority of cases), because it’s hard to believe that the cost of doing scientific research is very closely linked to the potential benefits to people’s health (i.e., there are almost certainly many very expensive scientific disciplines that don’t necessarily produce very big benefits to taxpayers). Personally, I don’t see anything wrong with setting a higher bar for research programs that cost more taxpayer money to fund. And note that I’m arguing against my own self-interest here, because my own research is relatively expensive (most of it involves software development, and the average developer salary is roughly double the average postdoc salary).

Lastly, it’s important to keep in mind that this proposal doesn’t in any way preclude the use of other, complementary funding mechanisms. At present, NIH already routinely issues PAs and RFAs for proposals in areas of particular interest, or which for various reasons (including budget-related considerations) need to be considered separately from other applications. This wouldn’t change in any way under the proposed system. So, for example, if NIH officials decided that it was in the nation’s best interest to fund a round of $10 million grants to develop new heart transplant techniques, they could still issue a special call for such proposals. The plan I’ve sketched above would apply only to “normal” grants.

Okay, so that’s all I have. I was initially going to list a few other potential objections (and rebuttals), but decided to leave that for discussion. Please use the comments to tell me (and perhaps NIH) why this proposal would or wouldn’t work.