what Paul Meehl might say about graduate school admissions

Sanjay Srivastava has an excellent post up today discussing the common belief among many academics (or at least psychologists) that graduate school admission interviews aren’t very predictive of actual success, and should be assigned little or no weight when making admissions decisions:

The argument usually goes something like this: “All the evidence from personnel selection studies says that interviews don’t predict anything. We are wasting people’s time and money by interviewing grad students, and we are possibly making our decisions worse by substituting bad information for good.”

I have been hearing more or less that same thing for years, starting when I was in grad school myself. In fact, I have heard it often enough that, not being familiar with the literature myself, I accepted what people were saying at face value. But I finally got curious about what the literature actually says, so I looked it up.

I confess that I must have been drinking from the kool-aid spigot, because until I read Sanjay’s post, I’d long believed something very much like this myself, and for much the same reason. I’d never bothered to actually, you know, look at the data myself. Turns out the evidence and the kool-aid are not compatible:

A little Google Scholaring for terms like “employment interviews” and “incremental validity” led me to a bunch of meta-analyses that concluded that in fact interviews can and do provide useful information above and beyond other valid sources of information (like cognitive ability tests, work sample tests, conscientiousness, etc.). One of the most heavily cited is a 1998 Psych Bulletin paper by Schmidt and Hunter (link is a pdf; it’s also discussed in this blog post). Another was this paper by Cortina et al., which makes finer distinctions among different kinds of interviews. The meta-analyses generally seem to agree that (a) interviews correlate with job performance assessments and other criterion measures, (b) interviews aren’t as strong predictors as cognitive ability, (c) but they do provide incremental (non-overlapping) information, and (d) in those meta-analyses that make distinctions between different kinds of interviews, structured interviews are better than unstructured interviews.

This seems entirely reasonable, and I agree with Sanjay that it clearly shows that admissions interviews aren’t useless, at least in an actuarial sense. That said, after thinking about it for a while, I’m not sure these findings really address the central question admissions committees care about. When deciding which candidates to admit as students, the relevant question isn’t really “what factors predict success in graduate school?”; it’s “what factors should the admissions committee attend to when making a decision?” These may seem like the same thing, but they’re not. And the reason they’re not is that knowing which factors are predictive of success is no guarantee that faculty are actually going to be able to use that information in an appropriate way. Knowing what predicts performance is only half the story, as it were; you also need to know how to weight the different factors appropriately in order to generate an optimal prediction.

In practice, humans turn out to be incredibly bad at predicting outcomes based on multiple factors. An enormous literature on mechanical (or actuarial) prediction, which Sanjay mentions in his post, has repeatedly demonstrated that in many domains, human judgments are consistently and often substantially outperformed by simple regression equations. There are several reasons for this gap, but one of the biggest ones is that people are just shitty at quantitatively integrating multiple continuous variables. When you visit a car dealership, you may very well be aware that your long-term satisfaction with any purchase is likely to depend on some combination of horsepower, handling, gas mileage, seating comfort, number of cupholders, and so on. But the odds that you’ll actually be able to combine that information in an optimal way are essentially nil. Our brains are simply not designed to work that way; you can’t internally compute the value you’d get out of a car using an equation like 1.03*cupholders + 0.021*horsepower + 0.3*mileage. Some of us try to do it that way–e.g., by making very long pro and con lists detailing all the relevant factors we can possibly think of–but it tends not to work out very well (e.g., you total up the numbers and realize, hey, that’s not the answer I wanted! And then you go buy that antique ’68 Cadillac you had your eye on the whole time you were pretending to count cupholders in the Nissan Maxima).
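To make that concrete, here’s a minimal sketch (in Python) of what the mechanical version of the car decision would look like. The weights are the made-up ones from the equation above, and the cars and their attribute values are equally made up; the point is just that the weighted sum is trivial for a formula to compute and nearly impossible for a brain to compute on the fly:

```python
# A toy "satisfaction equation" for the car example above.
# The weights and the cars are invented purely for illustration.
WEIGHTS = {"cupholders": 1.03, "horsepower": 0.021, "mileage": 0.3}

def predicted_satisfaction(car):
    """Mechanical combination: a plain weighted sum of the attributes."""
    return sum(WEIGHTS[attr] * value for attr, value in car.items())

cars = {
    "Nissan Maxima": {"cupholders": 6, "horsepower": 290, "mileage": 30},
    "'68 Cadillac":  {"cupholders": 2, "horsepower": 340, "mileage": 12},
}

for name, attributes in cars.items():
    print(f"{name}: predicted satisfaction = {predicted_satisfaction(attributes):.2f}")
```

The formula will happily tell you the Maxima wins; whether you listen to it is another matter entirely.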

Admissions committees face much the same problem. The trouble lies not so much in determining which factors predict graduate school success (or, for that matter, many other outcomes we care about in daily life), but in determining how to best combine them. Knowing that interview performance incrementally improves predictions is only useful if you can actually trust decision-makers to weight that variable very lightly relative to other more meaningful predictors like GREs and GPAs. And that’s a difficult proposition, because I suspect that admissions discussions rarely go like this:

Faculty Member 1: I think we should accept Candidate X. Her GREs are off the chart, great GPA, already has two publications.
Faculty Member 2: I didn’t like X at all. She didn’t seem very excited to be here.
FM1: Well, that doesn’t matter so much. Unless you really got a strong feeling that she wouldn’t stick it out in the program, it probably won’t make much of a difference, performance-wise.
FM2: Okay, fine, we’ll accept her.

And more often go like this:

FM1: Let’s take Candidate X. Her GREs are off the chart, great GPA, already has two publications.
FM2: I didn’t like X at all. She didn’t seem very excited to be here.
FM1: Oh, you thought so too? That’s kind of how I felt too, but I didn’t want to say anything.
FM2: Okay, we won’t accept X. We have plenty of other candidates whose numbers are nearly as good and who seemed more pleasant.

Admittedly, I don’t have any direct evidence to back up this conjecture, except that I think it would be pretty remarkable if academic faculty departed from experts in pretty much every other domain that’s been tested (clinical practice, medical diagnosis, criminal recidivism, etc.) and were actually able to do as well (or even close to as well) as a simple regression equation.

For what it’s worth, in many of the studies of mechanical prediction, the human experts are explicitly given all of the information that goes into the prediction equation, and they still do relatively poorly. In other words, you can hand a clinical psychologist a folder full of quantitative information about a patient, tell them to weight it however they want, and even the best clinicians are still going to be outperformed by a mechanical prediction rule (if you doubt this, I second Sanjay in directing you to Paul Meehl’s seminal body of work, truly some of the most important and elegant work ever done in psychology; if you haven’t read it, you’re missing out). And in some sense, faculty members aren’t really even experts at admissions, since they only do it once a year. So I’m pretty skeptical that admissions committees actually manage to weight their firsthand personal experience with candidates appropriately when making their final decisions. It seems much more likely that any personality impressions they come away with will just tend to drown out prior assessments based on (relatively) objective data.
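If it helps to see what “mechanical prediction” actually means here, below is a minimal sketch in Python. Everything in it is hypothetical (the predictors, the synthetic data, and the resulting coefficients), but it illustrates the basic Meehlian move: estimate a simple regression equation from past cases, then apply the same fixed weights to every new candidate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, purely illustrative "past student" data: four predictors
# (GRE percentile, GPA, publication count, interview rating) and a
# made-up outcome standing in for later success in the program.
n = 200
X = np.column_stack([
    rng.uniform(50, 99, n),     # GRE percentile
    rng.uniform(2.5, 4.0, n),   # GPA
    rng.poisson(1.0, n),        # publications
    rng.uniform(1, 5, n),       # interview rating
])
assumed_weights = np.array([0.04, 1.0, 0.5, 0.1])   # an assumption, not real data
y = X @ assumed_weights + rng.normal(0, 1, n)

# Ordinary least squares: the "simple regression equation" of the
# clinical-versus-actuarial literature.
X1 = np.column_stack([np.ones(n), X])                # add an intercept
coefs, *_ = np.linalg.lstsq(X1, y, rcond=None)

def mechanical_prediction(gre, gpa, pubs, interview):
    """Apply the fixed equation; the weights never change mid-meeting."""
    return float(coefs @ np.array([1.0, gre, gpa, pubs, interview]))

print("estimated weights (intercept, GRE, GPA, pubs, interview):", np.round(coefs, 3))
print("Candidate X:", round(mechanical_prediction(gre=97, gpa=3.9, pubs=2, interview=2.0), 2))
```

The equation, of course, has no idea whether the candidate “seemed excited to be here”; whatever weight the interview rating earns, it gets exactly that weight for every applicant, which is precisely what the committee in the second dialogue fails to do.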

That all said, I couldn’t agree more with Sanjay’s ultimate conclusion, so I’ll just end with this quote:

That, of course, is a testable question. So if you are an evidence-based curmudgeon, you should probably want some relevant data. I was not able to find any studies that specifically addressed the importance of rapport and interest-matching as predictors of later performance in a doctoral program. (Indeed, validity studies of graduate admissions are few and far between, and the ones I could find were mostly for medical school and MBA programs, which are very different from research-oriented Ph.D. programs.) It would be worth doing such studies, but not easy.

Oh, except that I do want to add that I really like the phrase “evidence-based curmudgeon”, and I’m totally stealing it.