the capricious nature of p < .05, or why data peeking is evil

There’s a time-honored tradition in the social sciences–or at least psychology–that goes something like this. You decide on some provisional number of subjects you’d like to run in your study; usually it’s a nice round number like twenty or sixty, or some number that just happens to coincide with the sample size of the last successful study you ran. Or maybe it just happens to be your favorite number (which of course is forty-four). You get your graduate student to start running the study, and promptly forget about it for a couple of weeks while you go about writing up journal reviews that are three weeks overdue and chapters that are six months overdue.

A few weeks later, you decide you’d like to know how that Amazing New Experiment you’re running is going. You summon your RA and ask him, in magisterial tones, “how’s that Amazing New Experiment we’re running going?” To which he falteringly replies that he’s been very busy with all the other data entry and analysis chores you assigned him, so he’s only managed to collect data from eighteen subjects so far. But he promises to have the other eighty-two subjects done any day now.

“Not to worry,” you say. “We’ll just take a peek at the data now and see what it looks like; with any luck, you won’t even need to run any more subjects! By the way, here are my car keys; see if you can’t have it washed by 5 pm. Your job depends on it. Ha ha.”

Once your RA’s gone to soil himself somewhere, you gleefully plunge into the task of peeking at your data. You pivot your tables, plyr your data frame, and bravely sort your columns. Then you extract two of the more juicy variables for analysis, and after some careful surgery (and a t-test or six), you arrive at the conclusion that your hypothesis is… “marginally” supported. Which is to say, the magical p value is somewhere north of .05 and somewhere south of .10, and now it’s just parked by the curb waiting for you to give it better directions.

You briefly contemplate reporting your result as a one-tailed test–since it’s in the direction you predicted, right?–but ultimately decide against that. You recall the way your old Research Methods professor used to rail at length against the evils of one-tailed tests, and even if you don’t remember exactly why they’re so evil, you’re not willing to take any chances. So you decide it can’t be helped; you need to collect some more data.

You summon your RA again. “Is my car washed yet?” you ask.

“No,” says your RA in a squeaky voice. “You just asked me to do that fifteen minutes ago.”

“Right, right,” you say. “I knew that.”

You then explain to your RA that he should suspend all other assigned duties for the next few days and prioritize running subjects in the Amazing New Experiment. “Abandon all other tasks!” you decree. “If it doesn’t involve collecting new data, it’s unimportant! Your job is to eat, sleep, and breathe new subjects! But not literally!”

Being quite clever, your RA sees an opening. “I guess you’ll want your car keys back, then,” he suggests.

“Nice try, Poindexter,” you say. “Abandon all other tasks… starting tomorrow.”

You also give your RA very careful instructions to email you the new data after every single subject, so that you can toss it into your spreadsheet and inspect the p value at every step. After all, there’s no sense in wasting perfectly good data; once your p value is below .05, you can just funnel the rest of the participants over to the Equally Amazing And Even Newer Experiment you’ve been planning to run as a follow-up. It’s a win-win proposition for everyone involved. Except maybe your RA, who’s still expected to return triumphant with a squeaky clean vehicle by 5 pm.

Twenty-six months and four rounds of review later, you publish the results of the Amazing New Experiment as Study 2 in a six-study paper in the Journal of Ambiguous Results. The reviewers raked you over the coals for everything from the suggested running head of the paper to the ratio between the abscissa and the ordinate in Figure 3. But what they couldn’t argue with was the p value in Study 2, which clocked in just under .05, with only 21 subjects’ worth of data (compare that to the 80 you had to run in Study 4 to get a statistically significant result!). “Suck on that, Reviewers!” you think to yourself pleasantly while driving yourself home from work in your shiny, shiny Honda Civic.

So ends our short parable, which has at least two subtle points to teach us. One is that it takes a really long time to publish anything; who has time to wait twenty-six months and go through four rounds of review?

The other, more important point is that the desire to peek at one’s data, which often seems innocuous enough–and possibly even advisable (quality control is important, right?)–can actually be quite harmful. At least if you believe that the goal of doing research is to arrive at the truth, and not necessarily to publish statistically significant results.

The basic problem is that peeking at your data is rarely a passive process; most often, it’s done in the context of a decision-making process, where the goal is to determine whether or not you need to keep collecting data. There are two possible peeking outcomes that might lead you to decide to halt data collection: a very low p value (i.e., p < .05), in which case your hypothesis is supported and you may as well stop gathering evidence; or a very high p value, in which case you might decide that it’s unlikely you’re ever going to successfully reject the null, so you may as well throw in the towel. Either way, you’re making the decision to terminate the study based on the results you find in a provisional sample.

A complementary situation, which also happens not infrequently, occurs when you collect data from exactly as many participants as you decided ahead of time, only to find that your results aren’t quite what you’d like them to be (e.g., a marginally significant hypothesis test). In that case, it may be quite tempting to keep collecting data even though you’ve already hit your predetermined target. I can count on more than one hand the number of times I’ve overheard people say (often without any hint of guilt) something to the effect of “my p value’s at .06 right now, so I just need to collect data from a few more subjects.”

Here’s the problem with either (a) collecting more data in an effort to turn p < .06 into p < .05, or (b) ceasing data collection because you’ve already hit p < .05: any time you add another subject to your sample, there’s a fairly large probability the p value will go down purely by chance, even if there’s no effect. So there you are sitting at p < .06 with twenty-four subjects, and you decide to run a twenty-fifth subject. Well, let’s suppose that there actually isn’t a meaningful effect in the population, and that p < .06 value you’ve got is a (near) false positive. Adding that twenty-fifth subject can only do one of two things: it can raise your p value, or it can lower it. The exact probabilities of these two outcomes depend on the current effect size in your sample before adding the new subject; but generally speaking, they’ll rarely be very far from 50-50. So now you can see the problem: if you stop collecting data as soon as you get a significant result, you may well be capitalizing on chance. It could be that if you’d collected data from a twenty-sixth and twenty-seventh subject, the p value would reverse its trajectory and start rising. It could even be that if you’d collected data from two hundred subjects, the effect size would stabilize near zero. But you’d never know that if you stopped the study as soon as you got the results you were looking for.
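
If you want to check that claim for yourself, it only takes a few lines of R. Here’s a toy simulation (my own made-up example, nothing more) that repeatedly draws twenty-four subjects from a population with no effect, adds a twenty-fifth, and tallies how often the p value of a one-sample t-test goes down rather than up when the sample is already sitting in the “marginal” zone:

```r
# Toy simulation (hypothetical numbers): under the null, track what adding a
# 25th subject does to the p value of a one-sample t-test.
set.seed(1)
sims <- replicate(20000, {
  x <- rnorm(24)                               # 24 subjects, true effect = 0
  p24 <- t.test(x)$p.value
  p25 <- t.test(c(x, rnorm(1)))$p.value        # add a 25th subject
  c(p24, p25)
})
marginal <- sims[1, ] > .05 & sims[1, ] < .10  # samples currently "marginally" significant
mean(sims[2, marginal] < sims[1, marginal])    # how often the new subject lowers p
```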

Lest you think I’m exaggerating, and think that this problem falls into the famous class of things-statisticians-and-methodologists-get-all-anal-about-but-that-don’t-really-matter-in-the-real-world, here’s a sobering figure (taken from this chapter):

[Figure: simulated false positive rates under different data peeking regimes]

The figure shows the results of a simulation quantifying the increase in false positives associated with data peeking. The assumptions here are that (a) data peeking begins after about 10 subjects (starting earlier would further increase false positives, and starting later would decrease false positives somewhat), (b) the researcher stops as soon as a peek at the data reveals a result significant at p < .05, and (c) data peeking occurs at incremental steps of either 1 or 5 subjects. Given these assumptions, you can see that there’s a fairly monstrous rise in the actual Type I error rate (relative to the nominal rate of 5%). For instance, if the researcher initially plans to collect 60 subjects, but peeks at the data after every 5 subjects, there’s approximately a 17% chance that the threshold of p < .05 will be reached before the full sample of 60 subjects is collected. When data peeking occurs even more frequently (as might happen if a researcher is actively trying to turn p < .07 into p < .05, and is monitoring the results after each incremental participant), Type I error inflation is even worse. So unless you think there’s no practical difference between a 5% false positive rate and a 15 – 20% false positive rate, you should be concerned about data peeking; it’s not the kind of thing you just brush off as needless pedantry.
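
If you’d rather not take the figure’s word for it, here’s a rough sketch of the same kind of simulation in R–my own quick approximation of the setup described above, not the actual code behind the figure:

```r
# Simulate a researcher who, under the null, peeks at a one-sample t-test at
# regular intervals and stops the first time p < .05.
set.seed(1)
peeking_fpr <- function(n_max = 60, first_peek = 10, step = 5, n_sims = 2000) {
  mean(replicate(n_sims, {
    x <- rnorm(n_max)  # the true effect is exactly zero
    any(sapply(seq(first_peek, n_max, by = step),
               function(n) t.test(x[1:n])$p.value < .05))
  }))
}
peeking_fpr(step = 5)  # should land in the same ballpark as the ~17% quoted above
peeking_fpr(step = 1)  # peeking after every single subject inflates it further
```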

How do we stop ourselves from capitalizing on chance by looking at the data? Broadly speaking, there are two reasonable solutions. One is to just pick a number up front and stick with it. If you commit yourself to collecting data from exactly as many subjects as you said you would (you can proclaim the exact number loudly to anyone who’ll listen, if you find it helps), you’re then free to peek at the data all you want. After all, it’s not the act of observing the data that creates the problem; it’s the decision to terminate data collection based on your observation that matters.

The other alternative is to explicitly correct for data peeking. This is a common approach in large clinical trials, where data peeking is often ethically mandated, because you don’t want to either (a) harm people in the treatment group if the treatment turns out to have clear and dangerous side effects, or (b) prevent the control group from capitalizing on the treatment too if it seems very efficacious. In either event, you’d want to terminate the trial early. What researchers often do, then, is pick predetermined intervals at which to peek at the data, and then apply a correction to the p values that takes into account the number of, and interval between, peeking occasions. Provided you do things systematically in that way, peeking then becomes perfectly legitimate. Of course, the downside is that having to account for those extra inspections of the data makes your statistical tests more conservative. So if there aren’t any ethical issues that necessitate peeking, and you’re not worried about quality control issues that might be revealed by eyeballing the data, your best bet is usually to just pick a reasonable sample size (ideally, one based on power calculations) and stick with it.
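
If you’re curious what the correction looks like in practice, here’s a bare-bones illustration: a Pocock-style constant threshold applied at every peek, with the threshold itself found by brute-force simulation. This is a toy version of the idea, not a substitute for a properly designed group-sequential analysis.

```r
# Overall false positive rate when a one-sample t-test is checked at several
# predetermined looks, each using the same per-look alpha.
set.seed(1)
overall_fpr <- function(alpha_per_look, looks = seq(10, 60, by = 10), n_sims = 2000) {
  mean(replicate(n_sims, {
    x <- rnorm(max(looks))  # the null is true
    any(sapply(looks, function(n) t.test(x[1:n])$p.value < alpha_per_look))
  }))
}
overall_fpr(.05)   # uncorrected peeking: well above the nominal .05
overall_fpr(.015)  # a stricter per-look threshold pulls the overall rate back near .05
```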

Oh, and also, don’t make your RAs wash your car for you; that’s not their job.

undergraduates are WEIRD

This month’s issue of Nature Neuroscience contains an editorial lambasting the excessive reliance of psychologists on undergraduate college samples, which, it turns out, are pretty unrepresentative of humanity at large. The impetus for the editorial is a mammoth in-press review of cross-cultural studies by Joseph Henrich and colleagues, which, the authors suggest, collectively indicate that “samples drawn from Western, Educated, Industrialized, Rich and Democratic (WEIRD) societies … are among the least representative populations one could find for generalizing about humans.” I’ve only skimmed the article, but aside from the clever acronym, you could do a lot worse than these (rather graphic) opening paragraphs:

In the tropical forests of New Guinea the Etoro believe that for a boy to achieve manhood he must ingest the semen of his elders. This is accomplished through ritualized rites of passage that require young male initiates to fellate a senior member (Herdt, 1984; Kelley, 1980). In contrast, the nearby Kaluli maintain that male initiation is only properly done by ritually delivering the semen through the initiate’s anus, not his mouth. The Etoro revile these Kaluli practices, finding them disgusting. To become a man in these societies, and eventually take a wife, every boy undergoes these initiations. Such boy-inseminating practices, which are enmeshed in rich systems of meaning and imbued with local cultural values, were not uncommon among the traditional societies of Melanesia and Aboriginal Australia (Herdt, 1993), as well as in Ancient Greece and Tokugawa Japan.

Such in-depth studies of seemingly “exotic” societies, historically the province of anthropology, are crucial for understanding human behavioral and psychological variation. However, this paper is not about these peoples. It’s about a truly unusual group: people from Western, Educated, Industrialized, Rich, and Democratic (WEIRD) societies. In particular, it’s about the Western, and more specifically American, undergraduates who form the bulk of the database in the experimental branches of psychology, cognitive science, and economics, as well as allied fields (hereafter collectively labeled the “behavioral sciences”). Given that scientific knowledge about human psychology is largely based on findings from this subpopulation, we ask just how representative are these typical subjects in light of the available comparative database. How justified are researchers in assuming a species-level generality for their findings? Here, we review the evidence regarding how WEIRD people compare to other populations.

Anyway, it looks like a good paper. Based on a cursory read, the conclusions the authors draw seem pretty reasonable, if a bit strong. I think most researchers do already recognize that our dependence on undergraduates is unhealthy in many respects; it’s just that it’s difficult to break the habit, because the alternative is to spend a lot more time and money chasing down participants (and there are limits to that too; it just isn’t feasible for most researchers to conduct research with Etoro populations in New Guinea). Then again, just because it’s hard to do science the right way doesn’t really make it OK to do it the wrong way. So, to the extent that we care about our results generalizing across the entire human species (which, in many cases, we don’t), we should probably be investing more energy in weaning ourselves off undergraduates and trying to recruit more diverse samples.

cognitive training doesn’t work (much, if at all)

There’s a beautiful paper in Nature this week by Adrian Owen and colleagues that provides what’s probably as close to definitive evidence as you can get in any single study that “brain training” programs don’t work. Or at least, to the extent that they do work, the effects are so weak they’re probably not worth caring about.

Owen et al used a very clever approach to demonstrate their point. Rather than spending their time running small-sample studies that require people to come into the lab over multiple sessions (an expensive and very time-intensive effort that’s ultimately still usually underpowered), they teamed up with the BBC program ‘Bang Goes The Theory’. Participants were recruited via the TV show, and were directed to an experimental website where they created accounts, engaged in “pre-training” cognitive testing, and then could repeatedly log on over the course of six weeks to perform a series of cognitive tasks supposedly capable of training executive abilities. After the training period, participants again performed the same battery of cognitive tests, enabling the researchers to compare performance pre- and post-training.

Of course, you expect robust practice effects with this kind of thing (i.e., participants would almost certainly do better on the post-training battery than on the pre-training battery solely because they’d been exposed to the tasks and had some practice). So Owen et al randomly assigned participants logging on to the website to one of two different training programs (involving different types of training tasks) or to a control condition in which participants answered obscure trivia questions rather than doing any sort of intensive cognitive training per se. The beauty of doing this all online was that the authors were able to obtain gargantuan sample sizes (several thousand in each condition), ensuring that statistical power wasn’t going to be an issue. Indeed, Owen et al focus almost exclusively on effect sizes rather than p values, because, as they point out, once you have several thousand participants in each group, almost everything is going to be statistically significant, so it’s really the effect sizes that matter.
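
To see why effect sizes have to do the heavy lifting at those sample sizes, consider a toy example (hypothetical numbers, not the study’s data): even a trivially small group difference tends to come out “statistically significant” once you have a few thousand people per condition.

```r
# With several thousand participants per group, a tiny true effect (d = .05)
# will usually produce p < .05, so the effect size is the number worth watching.
set.seed(1)
n <- 7000
control  <- rnorm(n, mean = 0)
training <- rnorm(n, mean = 0.05)
t.test(training, control)$p.value                            # usually below .05
(mean(training) - mean(control)) / sd(c(training, control))  # but the effect stays tiny
```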

The critical comparison was whether the experimental groups showed greater improvements in performance post-training than the control group did. And the answer, generally speaking, was no. Across four different tasks, the differences in training-related gains in the experimental groups relative to the control group were always either very small (no larger than about a fifth of a standard deviation), or even nonexistent (to the extent that for some comparisons, the control group improved more than the experimental groups!). So the upshot is that if there is any benefit of cognitive training (and it’s not at all clear that there is, based on the data), it’s so small that it’s probably not worth caring about. Here’s the key figure:

[Figure from Owen et al: pre- vs post-training performance for the two experimental groups and the control group]

You could argue that the fact that the y-axis spans the full range of possible values (rather than fitting the range of observed variation) is a bit misleading, since it’s only going to make any effects seem even smaller. But even so, it’s pretty clear these are not exactly large effects (and note that the key comparison is not the difference between light and dark bars, but the relative change from light to dark across the different groups).

Now, people who are invested (either intellectually or financially) in the efficacy of cognitive training programs might disagree, arguing that an effect of one-fifth of a standard deviation isn’t actually a tiny effect, and that there are arguably many situations in which that would be a meaningful boost in performance. But that’s the best possible estimate, and probably overstates the actual benefit. And there’s also the opportunity cost to consider: the average participant completed 20 – 30 training sessions, which, even at just 20 minutes a session (an estimate based on the description of the length of each of the training tasks), would take about 8 – 10 hours to complete (and some participants no doubt spent many more hours in training).  That’s a lot of time that could have been invested in other much more pleasant things, some of which might also conceivably improve cognitive ability (e.g., doing Sudoku puzzles, which many people actually seem to enjoy). Owen et al put it nicely:

To illustrate the size of the transfer effects observed in this study, consider the following representative example from the data. The increase in the number of digits that could be remembered following training on tests designed, at least in part, to improve memory (for example, in experimental group 2) was three-hundredths of a digit. Assuming a linear relationship between time spent training and improvement, it would take almost four years of training to remember one extra digit. Moreover, the control group improved by two-tenths of a digit, with no formal memory training at all.
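
Just to make the arithmetic in that quote concrete (using the six-week training period described earlier and the 0.03-digit gain from the quote):

```r
# Back-of-the-envelope: extrapolate a 0.03-digit gain per six weeks of training
# linearly out to one full extra digit.
gain_per_six_weeks <- 0.03
weeks_needed <- 6 / gain_per_six_weeks   # 200 weeks
weeks_needed / 52                        # a bit under 4 years, as the quote says
```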

If someone asked you if you wanted to spend six weeks doing a “brain training” program that would provide those kinds of returns, you’d probably politely (or impolitely) refuse. Especially since it’s not like most of us spend much of our time doing digit span tasks anyway; odds are that the kinds of real-world problems we’d like to perform a little better at (say, something trivial like figuring out what to buy or not to buy at the grocery store) are even further removed from the tasks Owen et al (and other groups) have used to test for transfer, so any observable benefits in the real world would presumably be even smaller.

Of course, no study is perfect, and there are three potential concerns I can see. The first is that it’s possible that there are subgroups within the tested population who do benefit much more from the cognitive training. That is, the minuscule overall effect could be masking heterogeneity within the sample, such that some people (say, maybe men above 60 with poor diets who don’t like intellectual activities) benefit much more. The trouble with this line of reasoning, though, is that the overall effects in the entire sample are so small that you’re pretty much forced to conclude that either (a) any group that benefits substantially from the training is a very small proportion of the total sample, or (b) that there are actually some people who suffer as a result of cognitive training, effectively balancing out the gains seen by other people. Neither of these possibilities seems particularly attractive.

The second concern is that it’s conceivable that the control group isn’t perfectly matched to the experimental group, because, by the authors’ own admission, the retention rate was much lower in the control group. Participants were randomly assigned to the three groups, but only about two-thirds as many control participants completed the study. The higher drop-out rate was apparently due to the fact that the obscure trivia questions used as a control task were pretty boring. The reason that’s a potential problem is that attrition wasn’t random, so there may be a systematic difference between participants in the experimental conditions and those in the control conditions. In particular, it’s possible that the remaining control participants had a higher tolerance for boredom and/or were somewhat smarter or more intellectual on average (answering obscure trivia questions clearly isn’t everyone’s cup of tea). If that were true, the lack of any difference between experimental and control conditions might be due to participant differences rather than an absence of a true training effect. Unfortunately, it’s hard to determine whether this might be true, because (as far as I can tell) Owen et al don’t provide the raw mean performance scores on the pre- and post-training testing for each group, but only report the changes in performance. What you’d want to know is that the control participants didn’t do substantially better or worse on the pre-training testing than the experimental participants (due to selective attrition of low-performing subjects), which might make changes in performance difficult to interpret. But at face value, it doesn’t seem very plausible that this would be a serious issue.

Lastly, Owen et al do report a small positive correlation between number of training sessions performed (which was under participants’ control) and gains in performance on the post-training test. Now, this effect was, as the authors note, very small (a maximal Spearman’s rho of .06), so that it’s also not really likely to have practical implications. Still, it does suggest that performance increases as a function of practice. So if we’re being pedantic, we should say that intensive cognitive training may improve cognitive performance in a generalized way, but that the effect is really minuscule and probably not worth the time and effort required to do the training in the first place. Which isn’t exactly the type of careful and measured claim that the people who sell brain training programs are generally interested in making.

At any rate, setting aside the debate over whether cognitive training works or not, one thing that’s perplexed me for a long time about the training literature is why people focus to such an extent on cognitive training rather than other training regimens that produce demonstrably larger transfer effects. I’m thinking in particular of aerobic exercise, which produces much more robust and replicable effects on cognitive performance. There’s a nice meta-analysis by Colcombe and colleagues that found effect sizes on the order of half a standard deviation and up for physical exercise in older adults–and effects were particularly large for the most heavily g-loaded tasks. Now, even if you allow for publication bias and other manifestations of the fudge factor, it’s almost certain that the true effect of physical exercise on cognitive performance is substantially larger than the (very small) effects of cognitive training as reported by Owen et al and others.

The bottom line is that, based on everything we know at the moment, the evidence seems to pretty strongly suggest that if your goal is to improve cognitive function, you’re more likely to see meaningful results by jogging or swimming regularly than by doing crossword puzzles or N-back tasks–particularly if you’re older. And of course, a pleasant side effect is that exercise also improves your health and (for at least some people) mood, which I don’t think N-back tasks do. Actually, many of the participants I’ve tested will tell you that doing the N-back is a distinctly dysphoric experience.

On a completely unrelated note, it’s kind of neat to see a journal like Nature publish what is essentially a null result. It goes to show that people do care about replication failures in some cases–namely, in those cases when the replication failure contradicts a relatively large existing literature, and is sufficiently highly powered to actually say something interesting about the likely effect sizes in question.

Owen AM, Hampshire A, Grahn JA, Stenton R, Dajani S, Burns AS, Howard RJ, & Ballard CG (2010). Putting brain training to the test. Nature PMID: 20407435

Kahneman on happiness

The latest TED talk is an instant favorite of mine. Daniel Kahneman talks about the striking differences in the way we experience versus remember events.

It’s an entertaining and profoundly insightful 20-minute talk, and worth watching even if you think you’ve heard these ideas before.

The fundamental problem Kahneman discusses is that we all experience our lives on a moment-by-moment basis, and yet we make decisions based on our memories of the past. Unfortunately, it turns out that the experiencing self and the remembering self don’t necessarily agree about what things make us happy, and so we often end up in situations where we voluntarily make choices that actually substantially reduce our experienced utility. I won’t give away the examples Kahneman talks about, other than to say that they beautifully illustrate the relevance of psychology (or at least some branches of psychology) to the real-world decisions we all make–both the trivial, day-to-day variety, and the rarer, life-or-death kind.

As an aside, Kahneman gave a talk at Brain Camp (or, officially, the annual Summer Institute in Cognitive Neuroscience, which may now be defunct–or perhaps only on hiatus?) the year I attended. There were a lot of great talks that year, but Kahneman’s really stood out for me, despite the fact that he hardly talked about research at all. It was more of a meditation on the scientific method–how to go about building and testing new theories. You don’t often hear a Nobel Prize winner tell an audience that the work that won the Nobel Prize was completely wrong, but that’s essentially what Kahneman claimed. Of course, his point wasn’t that Prospect Theory was useless, but rather, that many of the holes and limitations of the theory that people have gleefully pointed out over the last three decades were already well-recognized at the time the original findings were published. Kahneman and Tversky’s goal wasn’t to produce a perfect description or explanation of the mechanisms underlying human decision-making, but rather, an approximation that made certain important facts about human decision-making clear (e.g., the fact that people simply don’t follow the theory of Expected Utility), and opened the door to entirely new avenues of research. Kahneman seemed to think that ultimately what we really want isn’t a protracted series of incremental updates to Prospect Theory, but a more radical paradigm shift, and that in that sense, clinging to Prospect Theory might now actually be impeding progress.

You might think that’s a pretty pessimistic message–“hey, you can win a Nobel Prize for being completely wrong!”–but it really wasn’t; I actually found it quite uplifting (if Daniel Kahneman feels comfortable being mostly wrong about his ideas, why should the rest of us get attached to ours?). At least, that’s the way I remember it now. But that talk was nearly three years ago, you see, so my actual experience at the time may have been quite different. Turns out you can’t really trust my remembering self; it’ll tell you anything it thinks you want to hear.

in praise of (lab) rotation

I did my PhD in psychology, but in a department that had close ties and collaborations with neuroscience. One of the interesting things about psychology and neuroscience programs is that they seem to have quite different graduate training models, even in cases where the area of research substantively overlaps (e.g., in cognitive neuroscience). In psychology, there seem to be two general models (at least, at American and Canadian universities; I’m not really familiar with other systems). One is that graduate students are accepted into a specific lab and have ties to a specific advisor (or advisors); the other, more common at large state schools, is that graduate students are accepted into the program (or an area within the program) as a whole, and are then given the (relative) freedom to find an advisor they want to work with. There are pros and cons to either model: the former ensures that every student has a place in someone’s lab from the very beginning of training, so that no one falls through the cracks; but the downside is that beginning students often aren’t sure exactly what they want to work on, and there are occasional (and sometimes acrimonious) mentor-mentee divorces. The latter gives students more freedom to explore their research interests, but can make it more difficult for students to secure funding, and has more of a sink-or-swim flavor (i.e., there’s less institutional support for students).

Both of these models differ quite a bit from what I take to be the most common neuroscience model, which is that students spend all or part of their first year doing a series of rotations through various labs–usually for about 2 months at a time. The idea is to expose students to a variety of different lines of research so that they get a better sense of what people in different areas are doing, and can make a more informed judgment about what research they’d like to pursue. And there are obviously other benefits too: faculty get to evaluate students on a trial basis before making a long-term commitment, and conversely, students get to see the internal workings of the lab and have more contact with the lab head before signing on.

I’ve always thought the rotation model makes a lot of sense, and wonder why more psychology programs don’t try to implement one. I can’t complain about my own training, in that I had a really great experience on both personal and professional levels in the labs I worked in; but I recognize that this was almost entirely due to dumb luck. I didn’t really do my homework very well before entering graduate school, and I could easily have landed in a department or lab I didn’t mesh well with, and spent the next few years miserable and unproductive. I’ll freely admit that I was unusually clueless going into grad school (that’s a post for another time), but I think no matter how much research you do, there’s just no way to know for sure how well you’ll do in a particular lab until you’ve spent some time in it. And most first-year graduate students have kind of fickle interests anyway; it’s hard to know when you’re 22 or 23 exactly what problem you want to spend the rest of your life (or at least the next 4 – 7 years) working on. Having people do rotations in multiple labs seems like an ideal way to maximize the odds of students (and faculty) ending up in happy, productive working relationships.

A question, then, for people who’ve had experience on the administrative side of psychology (or neuroscience) departments: what keeps us from applying a rotation model in psychology too? Are there major disadvantages I’m missing? Is the problem one of financial support? Do we think that psychology students come into graduate programs with more focused interests? Or is it just a matter of convention? Inquiring minds (or at least one of them) want to know…

what do personality psychology and social psychology actually have in common?

Is there a valid (i.e., non-historical) reason why personality psychology and social psychology are so often lumped together as one branch of psychology? There are PSP journals, PSP conferences, PSP brownbags… the list goes on. It all seems kind of odd considering that, in some ways, personality psychologists and social psychologists have completely opposite focuses (foci?). Personality psychologists are all about the consistencies in people’s behavior, and classify situational variables under “measurement error”; social psychologists care not one whit for traits, and are all about how behavior is influenced by the situation. Also, aside from the conceptual tension, I’ve often gotten the sense that personality psychologists and social psychologists often just don’t like each other very much. Which I guess would make sense if you think these are two relatively distinct branches of psychology that, for whatever reason, have been lumped together inextricably for several decades. It’s kind of like being randomly assigned a roommate in college, except that you have to live with that roommate for the rest of your life.

I’m not saying there aren’t ways in which the two disciplines overlap. There are plenty of similarities; for example, they both tend to heavily feature self-report, and both often involve the study of social behavior. But that’s not really a good enough reason to lump them together. You can take almost any two branches of psychology and find a healthy intersection. For example, the interface between social psychology and cognitive psychology is one of the hottest areas of research in psychology at the moment. There’s a journal called Social Cognition–which, not coincidentally, is published by the International Social Cognition Network. Lots of people are interested in applying cognitive psychology models to social psychological issues. But you’d probably be taking bullets from both sides of the hallway if you ever suggested that your department should combine their social psychology and cognitive psychology brown bag series. Sure, there’s an overlap, but there’s also far more content that’s unique to each discipline.

The same is true for personality psychology and social psychology, I’d argue. Many (most?) personality psychologists aren’t intrinsically interested in social aspects of personality (at least, no more so than in other, non-social aspects), and many social psychologists couldn’t give a rat’s ass about the individual differences that make each of us a unique and special flower. And yet there we sit, week after week, all together in the same seminar room, as one half of the audience experiences rapture at the speaker’s words, and the other half wishes they could be slicing blades of grass off their lawn with dental floss. What gives?

what’s the point of intro psych?

Sanjay Srivastava comments on an article in Inside Higher Ed about the limitations of traditional introductory science courses, which (according to the IHE article) focus too much on rote memorization of facts and too little on the big questions central to scientific understanding. The IHE article is somewhat predictable in its suggestion that students should be engaged with key scientific concepts at an earlier stage:

One approach to breaking out of this pattern, [Shirley Tilghman] said, is to create seminars in which first-year students dive right into science — without spending years memorizing facts. She described a seminar — “The Role of Asymmetry in Development” — that she led for Princeton freshmen in her pre-presidential days.

Beyond the idea of seminars, Tilghman also outlined a more transformative approach to teaching introductory science material. David Botstein, a professor at the university, has developed the Integrated Science Curriculum, a two-year course that exposes students to the ideas they need to take advanced courses in several science disciplines. Botstein created the course with other faculty members and they found that they value many of the same scientific ideas, so an integrated approach could work.

Sanjay points out an interesting issue in translating this type of approach to psychology:

Would this work in psychology? I honestly don’t know. One of the big challenges in learning psychology — which generally isn’t an issue for biology or physics or chemistry — is the curse of prior knowledge. Students come to the class with an entire lifetime’s worth of naive theories about human behavior. Intro students wouldn’t invent hypotheses out of nowhere — they’d almost certainly recapitulate cultural wisdom, introspective projections, stereotypes, etc. Maybe that would be a problem. Or maybe it would be a tremendous benefit — what better way to start off learning psychology than to have some of your preconceptions shattered by data that you’ve collected yourself?

Prior knowledge certainly does seem to play a huge role in the study of psychology; there are some worldviews that are flatly incompatible with certain areas of psychological inquiry. So when some students encounter certain ideas in psychology classes–even introductory ones–they’re forced to either change their views about the way the world works, or (perhaps more commonly?) to discount those areas of psychology and/or the discipline as a whole.

One example of this is the aversion many people have to a reductionist, materialist worldview. If you really can’t abide by the idea that all of human experience ultimately derives from the machinations of dumb cells, with no ghost to be found anywhere in the machine, you’re probably not going to want to study the neural bases of romantic love. Similarly, if you can’t swallow the notion that our personalities appear to be shaped largely by our genes and random environmental influences–and show virtually no discernible influence of parental environment–you’re unlikely to want to be a behavioral geneticist when you grow up. More so than most other fields, psychology is full of ideas that turn our intuitions on their head. For many Intro Psych students who go on to study the mind professionally, that’s one of the things that makes the field so fascinating. But other students are probably turned off for the very same reason.

Taking a step back though, I think before you can evaluate how introductory classes ought to be taught, it’s important to ask what goal introductory courses are ultimately supposed to serve. Implicit in the views discussed in the IHE article is the idea that introductory science classes should basically serve as a jumping-off point for young scientists. The idea is that if you’re immersed in deep scientific ideas in your first year of university rather than your third or fourth, you’ll be that much better prepared for a career in science by the time you graduate. That’s certainly a valid view, but it’s far from the only one. Another perfectly legitimate view is that the primary purpose of an introductory science class isn’t really to serve the eventual practitioners of that science, who, after all, form a very small fraction of students in the class. Rather, it’s to provide a very large number of students with varying degrees of interest in science with a very cursory survey of the field. After all, the vast majority of students who sit through Intro Psych classes would never go on to careers in psychology no matter how the course was taught. You could mount a reasonable argument that exposing most students to “the ideas they need to take advanced courses in several science disciplines” would be a kind of academic malpractice, because most students who take intro science classes (or at least, intro psychology) probably have no real interest in taking advanced courses in the topic, and simply want to fill a distribution requirement or get a cursory overview of what the field is about.

The question of who intro classes should be designed for isn’t the only one that needs to be answered. Even if you feel quite certain that introductory science classes should always be taught with an eye to producing scientists, and you don’t care at all for the more populist idea of catering to the non-major masses, you still have to make other hard choices. For example, you need to decide whether you value breadth over depth, or information retention over enthusiasm for the course material. Say you’re determined to teach Intro Psych in such a way as to maximize the production of good psychologists. Do you pick just a few core topics that you think students will find most interesting, or most conducive to understanding key research concepts, and abandon those topics that turn people off? Such an approach might well encourage more students to take further psychology classes; but it does so at the risk of providing an unrepresentative view of the field, and failing to expose some students to ideas they might have benefited more from. Many Intro Psych students seem to really resent the lone week or two of the course when the lecturer covers neurons, action potentials and very basic neuroanatomy. For reasons that are quite inscrutable to me, many people just don’t like brainzzz. But I don’t think that common sentiment is sufficient grounds for cutting biology out of intro psychology entirely; you simply wouldn’t be getting an accurate picture of our current understanding of the mind without knowing at least something about the way the brain operates.

Of course, the trouble is that the way that people like me feel about the brain-related parts of intro psych is exactly the way other people feel about the social parts of intro psych, or the developmental parts, or the clown parts, and so on. Cut social psych out of intro psych so that you can focus on deep methodological issues in studying the brain, and you may well create students more likely to go on to a career in cognitive neuroscience. But you’re probably reducing the number of students who’ll end up going into social psychology. More generally, you’re turning Intro Psychology into Intro to Cognitive Neuroscience, which sort of defeats the point of it being an introductory course in the first place; after all, they’re called survey courses for a reason!

In an ideal world, we wouldn’t have to make these choices; we’d just make sure that all of our intro courses were always engaging and educational and promoted a deeper understanding of how to do science. But in the real world, it’s rarely possible to pull that off, and we’re typically forced to make trade-offs. You could probably promote student interest in psychology pretty easily by showing videos of agnosic patients all day long, but you’d be sacrificing breadth and depth of understanding. Conversely, you could maximize the amount of knowledge students retain from a class by hitting them over the head with information and testing them in every class, but then you shouldn’t be surprised if some students find the course unpleasant and lose interest in the subject matter. The balance between the depth, breadth, and entertainment value of introductory science classes is a delicate one, but it’s one that’s essential to consider before we can fairly evaluate different proposals as to how such classes ought to be structured.

got R? get social science for R!

Drew Conway has a great list of 10 must-have R packages for social scientists. If you’re a social scientist (or really, any kind of scientist) who doesn’t use R, now is a great time to dive in and learn; there are tons of tutorials and guides out there (my favorite is Quick-R, which is incredibly useful incredibly often), and packages are available for just about any application you can think of. Best of all, R is completely free, and is available for just about every platform. Admittedly, there’s a fairly steep learning curve if you’re used to GUI-based packages like SPSS (R’s syntax can be pretty idiosyncratic), but it’s totally worth the time investment, and once you’re comfortable with R you’ll never look back.

Anyway, Drew’s list contains a number of packages I’ve found invaluable in my work, as well as several packages I haven’t used before and am pretty eager to try. I don’t have much to add to his excellent summaries, but I’ll gladly second the inclusion of ggplot2 (the easiest way in the world to make beautiful graphs?) and plyr and sqldf (great for sanitizing, organizing, and manipulating large data sets, which are often a source of frustration in R). Most of the other packages I haven’t had any reason to use personally, though a few seem really cool, and worth finding an excuse to play around with (e.g., Statnet and igraph).
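
If you’ve never seen ggplot2 in action, here’s about the smallest possible taste (completely made-up data, just to show the flavor of the syntax):

```r
library(ggplot2)

# Toy data: two groups of fake test scores.
df <- data.frame(condition = rep(c("control", "training"), each = 50),
                 score = c(rnorm(50, mean = 100, sd = 15),
                           rnorm(50, mean = 103, sd = 15)))

# One layer of geometry plus labels gets you a presentable figure.
ggplot(df, aes(x = condition, y = score)) +
  geom_boxplot() +
  labs(x = "Condition", y = "Test score")
```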

Since Drew’s list focuses on packages useful to social scientists in general, I thought I’d mention a couple of others that I’ve found particularly useful for psychological applications. The most obvious one is William Revelle’s awesome psych package, which contains tons of useful functions for descriptive statistics, data reduction, simulation, and psychometrics. It’s saved me tons of time validating and scoring personality measures, though it probably isn’t quite as useful if you don’t deal with individual difference measures regularly. Other packages I’ve found useful are sem for structural equation modeling (which interfaces nicely with GraphViz to easily produce clean-looking path diagrams), genalg for genetic algorithms, MASS (mostly for sampling from multivariate distributions), reshape (similar functionality to plyr), and car, which contains a bunch of useful regression-related functions (e.g., for my dissertation, I needed to run SPSS-like repeated measures ANOVAs in R, which turns out to be a more difficult proposition than you’d imagine, but was handled by the Anova function in car–there’s a quick sketch of what that looks like below). I’m sure there are others I’m forgetting, but those are the ones that I’ve relied on most heavily in recent work. No doubt there are tons of other packages out there that are handy for common psychology applications, so if there are any you use regularly, I’d love to hear about them in the comments!
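
And since the repeated measures ANOVA issue seems to trip up nearly everyone coming from SPSS, here’s roughly what that car-based approach looks like (toy data and made-up variable names, just to show the shape of the call):

```r
library(car)

# Fake within-subject data: 30 people measured at three time points, split
# into two groups.
set.seed(1)
scores <- cbind(t1 = rnorm(30, 10), t2 = rnorm(30, 11), t3 = rnorm(30, 12))
group  <- factor(rep(c("control", "training"), length.out = 30))

# Fit a multivariate linear model (one response column per time point), then
# let Anova() handle the within-subject design.
mod   <- lm(scores ~ group)
idata <- data.frame(time = factor(c("t1", "t2", "t3")))
summary(Anova(mod, idata = idata, idesign = ~time, type = 3),
        multivariate = FALSE)   # SPSS-style univariate tests with sphericity corrections
```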