better tools for mining the scientific literature

Freethinker’s Asylum has a great post reviewing a number of tools designed to help researchers mine the scientific literature–an increasingly daunting task. The impetus for the post is this article in the latest issue of Nature (note: restricted access), but the FA post discusses a lot of tools that the Nature article doesn’t, and focuses in particular on websites that are currently active and publicly accessible, rather than on proprietary tools currently under development in dark basement labs and warehouses. I hadn’t seen most of these before, but am looking forward to trying them out–e.g., pubget:

When you create an account, pubget signs in to your institution and allows you to search the subscribed resources. When you find a reference you want, just click the pdf icon and there it is. No clicking through to content provider websites. You can tag references as “keepers” to come back to them later, or search for the newest articles from a particular journal.

Sounds pretty handy…

Many of the other sites–as well as most of those discussed in the Nature article–focus on data and literature mining in specific fields, e.g., PubGene and PubAnatomy. These services, which allow you to use specific keywords or topics (e.g., specific genes) to constrain literature searches, aren’t very useful to me personally. But it’s worth pointing out that there are some emerging services that fill much the same niche in the world of cognitive neuroscience that I’m more familiar with. The one that currently looks most promising, in my opinion, is the Cognitive Atlas project led by Russ Poldrack, which is “a collaborative knowledge building project that aims to develop a knowledge base (or ontology) that characterizes the state of current thought in cognitive science. … The Cognitive Atlas aims to capture knowledge from users with expertise in psychology, cognitive science, and neuroscience.”

The Cognitive Atlas is officially still in beta, and you need to have a background in cognitive neuroscience in order to sign up to contribute. But there’s already some content you can navigate, and the site, despite being in the early stages of development, is already pretty impressive. In the interest of full disclosure, as well as shameless plugging, I should note that Russ will be giving a talk about the Cognitive Atlas project as part of a symposium I’m chairing at CNS in Montreal this year. So if you want to learn more about it, stop by! Meantime, check out the Freethinker’s Asylum post for links to all sorts of other interesting tools…

one possible future of scientific publishing

Like many (most?) scientists, I’ve often wondered what a world without Elsevier would look like. Not just Elsevier, mind you; really, the entire current structure of academic publishing, which revolves around a gatekeeper model where decisions about what gets published where are concentrated in the hands of a very few people (typically, an editor and two or three reviewers). The way scientists publish papers really hasn’t kept up with the pace of technology; the tools we have these days allow us, in theory, to build systems that support the immediate and open publication of scientific findings, which could then be publicly reviewed, collaboratively filtered, and quantitatively evaluated using all sorts of metrics that just aren’t available in a closed system.

One particularly compelling vision is articulated by Niko Kriegeskorte, who presents a beautiful schematic of one potential approach to the future of academic publishing. I’m a big fan of Niko’s work (see e.g., this, this, or this)–almost everything he publishes is great, and his articles consistently feature absolutely stunning figures–and these ideas are no exception. The central motif, which I’m wholly sympathetic to, is to eliminate gatekeepers and promote open review and evaluation. Instead of a secret cabal–sorry, a small group–of other researchers (and potential competitors) making behind-the-scenes decisions about whether to accept or reject your paper, you’d publish your findings online in a centralized repository as soon as you felt it was ready for prime time. At that point, the broader community of researchers would set about evaluating, rating, and commenting on your work. Crucially, all of the reviews would also be made public (either in signed or anonymous form), so that other researchers could evaluate not only the work itself, but also the responses to it. Reviews would therefore count as a form of publication, and one can then imagine all sorts of sophisticated metrics that could take into account not only the reception of one’s publications, but also the quality and nature of the reviews themselves, the quality of one’s own ratings of others’ work, and so on. Schematically, it looks like this:

review review review!

Anyway, that’s just a cursory overview; Niko’s clearly put a lot of thought into developing a publishing architecture that overcomes the drawbacks of the current system while providing researchers with an incentive to participate (the sociological obstacles are arguably greater than the technical ones in this case). Well, at least in theory. Admittedly, it’s always easier to design a complex system on paper than to actually build it and make it work. But you have to start somewhere, and this seems like a pretty good place.

what do turtles, sea slugs, religion, and TED all have in common?

…absolutely nothing, actually, except that they’re all mentioned in this post. I’m feeling lazy–er, very busy–this week, so instead of writing a long and boring diatribe about clowns, ROIs, or personality measures, I’ll just link to a few interesting pieces elsewhere:

Razib of Gene Expression has an interesting post on the rapid secularization of America, and the relation of religious affiliation to political party identification. You wouldn’t know it from the increasing political clout of the religious right, but Americans are substantially more likely to report having no religious affiliation today than they were 20 years ago. I mean a lot more likely. In Vermont, over a third of the population now reports having no religion. Here’s an idea, Vermont: want to generate more tourism? I present your new slogan: Vermont, America’s Europe.

Sea slugs are awesome. If you doubt this, consider Exhibit A: a sea slug found off the East Coast that lives off photosynthesis:

The slugs look just like a leaf, green and about three centimetres long, and are found off the east coast of North America from Nova Scotia to Florida.

They acquire the ability to photosynthesize by eating algae and incorporating the plants’ tiny chlorophyll-containing structures, called chloroplasts, into their own cells.

You can’t make this stuff up! It’s a slug! That eats algae! And then turns into a leaf!

I’m a big fan of TED, and there’s a great interview with its curator, Chris Anderson, conducted by reddit. Reddit interviews are usually pretty good (see, e.g., Barney Frank and Christopher Hitchens); who knew the internet had the makings of a great journalist?!?

Ok, now for the turtles. According to PalMD, they cause salmonella. So much so that the CDC banned the sale of turtles under 4 inches in length in 1975. Apparently children just loved to smooch those cute little turtles. And the turtles, being evil, loved to give children a cute little case of salmonella. Result: ban small turtles and prevent 200,000 infections. Next up: frog-banning and salami-banning! Both are currently also suspected of causing salmonella outbreaks. Is there any species those bacteria can’t corrupt?

sea slug or leaf?

elsewhere on the internets…

The good people over at OKCupid, the best dating site on Earth (their words, not mine! I’m happily married!), just released a new slew of data on their OKTrends blog. Apparently men like women with smiley, flirty profile photos, and women like dismissive, unsmiling men. It’s pretty neat stuff, and definitely worth a read. Mating rituals aside, though, what I really like to think about whenever I see a new OKTrends post is how many people I’d be willing to kill to get my hands on their data.

Genetic Future covers the emergence of Counsyl, a new player in the field of personal genomics. Unlike existing outfits like 23andme, Counsyl focuses on rare Mendelian disorders, with an eye to helping prospective parents evaluate their genetic liabilities. What’s really interesting about Counsyl is its business model; if you have health insurance provided by Aetna or Blue Cross, you could potentially get a free test. Of course, the catch is that Aetna or Blue Cross get access to your results. In theory, this shouldn’t matter, since health insurers can’t use genetic information as grounds for discrimination. But then, on paper, employers can’t use race, gender, or sexual orientation as grounds for discrimination either, and yet we know it’s easier to get hired if your name is John than Jamal. That said, I’d probably go ahead and take Aetna up on its generous offer, except that my wife and I have no plans for kids, and the Counsyl test looks like it stays away from the garden-variety SNPs the other services cover…

The UK has banned the export of dowsing rods. In 2010! This would be kind of funny if not for the fact that dozens if not hundreds of Iraqis have probably died horrible deaths as a result of the Iraqi police force trying to detect roadside bombs using magic. [via Why Evolution is True].

Over at Freakonomics, regular contributor Ryan Hagen interviews psychologist, magician, and author Richard Wiseman, who just published a new empirically-based self-help book (can such a thing exist?). I haven’t read the book, but the interview is pretty good. Favorite quote:

What would I want to do? I quite like the idea of the random giving of animals. There’s a study where they took two groups of people and randomly gave people in one group a dog. But I’d quite like to replicate that with a much wider range of animals — including those that should be in zoos. I like the idea of signing up for a study, and you get home and find you’ve got to look after a wolf…

On a professional note, Professor in Training has a really great two part series (1, 2) on what new tenure-track faculty need to know before starting the job. I’ve placed both posts inside Google Reader’s golden-starred vault, and fully expect to come back to them next Fall when I’m on the job market. Which means if you’re reading this and you’re thinking of hiring me, be warned: I will demand that a life-size bobble-head doll of Hans Eysenck be installed in my office, and thanks to PiT, I do now have the awesome negotiating powers needed to make it happen.

why do we sing up?

While singing loudly to myself in the car the other day (windows safely rolled up, of course–I don’t want no funny looks from pedestrians), I noticed that the first few notes of the vocal melody of most songs seem to go up rather than down. That is to say, the first pitch change in most songs seems to be from a lower note to a higher note; there don’t seem to be very many that show the opposite pattern. It actually took me a while to find a song that goes down at the beginning (Elliott Smith’s Angeles–incidentally also my favorite song–which hangs on B for the first few bars before dropping to A); the first eight or nine I tried all went up. After carefully inspecting millions (ok, thousands… hundreds… fine, several) more songs on my drive home, I established that only around 10% of vocal melodies dip down initially (95% confidence interval = 0 – 80%); the rest all go up.

When I got home I did a slightly more systematic but still totally ascientific analysis. I write songs occasionally, so I went through them all, and found that only three or four go down; the rest all go up. But that could just be me. So then I went through a few of my favorite albums (not a random sample–I picked ones I knew well enough to rehearse mentally) and found the same pattern. I don’t know if this is a function of the genre of music I listen to (which I’d charitably describe as wuss-rock) or a general feature of most music, but it seems odd. Not having any musical training or talent, I’m not really sure why that would be, but I’d like to know. Does it have anything to do with the fact that most common chord progressions go up the scale initially? Is there some biological reason we find ascending notes more pleasant at the beginning of a melody? Is it a function of our speech production system? Does Broca’s Area just like going up more than going down? Is it just an arbitrary matter of convention, instilled in songwriters everywhere by all the upward-bound music that came before? And is the initial rise specific to English, or does it happen when people sing in other languages as well? Could it be something about the emotional connotation of rises versus drops? Do drops seem too depressing to kick off a song with? Are evil clowns behind it all? Or am I just imagining the whole thing, and there isn’t actually any bias toward initial upness?

Can we get an update on this? Are there any musicians/musicologists in the house?

how to measure 200 personality scales in 200 items

One of the frustrating things about personality research–for both researchers and participants–is that personality is usually measured using self-report questionnaires, and filling out self-report questionnaires can take a very long time. It doesn’t have to take a very long time, mind you; some questionnaires are very short, like the widely-used Ten-Item Personality Inventory (TIPI), which might take you a whole 2 minutes to fill out on a bad day. So you can measure personality quickly if you have to. But more often than not, researchers want to reliably measure a broad range of different personality traits, and that typically requires administering one or more long-ish questionnaires. For example, in my studies, I often give participants a battery of measures to fill out that includes some combination of the NEO-PI-R, EPQ-R, BIS/BAS scales, UPPS, GRAPES, BDI, TMAS, STAI, and a number of others. That’s a large set of acronyms, and yet it’s just a small fraction of what’s out there; every personality psychologist has his or her own set of favorite measures, and at personality conferences, duels-to-the-death often break out over silly things like whether measure X is better than measure Y, or whether measures A and B can be used interchangeably when no one’s looking. Personality measurement is a pretty intense sport.

The trouble with the way we usually measure personality is that it’s wildly inefficient, for two reasons. One is that many measures are much longer than they need to be. It’s not uncommon to see measures that score each personality trait using a dozen or more different items. In theory, the benefit of this type of redundancy is that you get a more reliable measure, because the error terms associated with individual items tend to cancel out. For example, if you want to know if I’m a depressive kind of guy, you shouldn’t just ask me, “hey, are you depressed?”, because lots of random factors could influence my answer to that one question. Instead, you should ask me a bunch of different questions, like: “hey, are you depressed?” and “why so glum, chum?”, and “does somebody need a hug?”. Adding up responses from multiple items is generally going to give you a more reliable measure. But in practice, it turns out that you typically don’t need more than a handful of items to measure most traits reliably. When people develop “short forms” of measures, the abbreviated scales often have just 4 – 5 items per trait, usually with relatively little loss of reliability and validity. So the fact that most of the measures we use have so many items on them is sort of a waste of both researchers’ and participants’ time.
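The diminishing returns here are easy to see in a quick simulation (this is my own toy illustration, not anything from a published measure; the noise level is arbitrary). Each simulated item is the respondent’s true trait level plus independent measurement error, and the scale score is the mean across items:

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects = 10_000
trait = rng.normal(size=n_subjects)  # each person's true standing on the latent trait

def score_reliability(n_items, noise_sd=1.5):
    # each item = true trait + independent measurement noise;
    # the scale score is just the mean across items
    items = trait[:, None] + rng.normal(scale=noise_sd, size=(n_subjects, n_items))
    scale = items.mean(axis=1)
    return np.corrcoef(scale, trait)[0, 1] ** 2  # share of true-trait variance captured

for k in (1, 4, 12):
    print(f"{k:2d} items: {score_reliability(k):.2f}")
```

The jump from 1 item to 4 items is large; the jump from 4 to 12 is much smaller, which is exactly why short forms with 4 – 5 items per trait give up so little.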

The other reason personality measurement is inefficient is that most researchers recognize that different personality measures tend to measure related aspects of personality, and yet we persist in administering a whole bunch of questionnaires with similar content to our participants. If you’ve ever participated in a psychology experiment that involved filling out personality questionnaires, there’s a good chance you’ve wondered whether you’re just filling out the same questionnaire over and over. Well you are–kind of. Because the space of personality variation is limited (people can only differ from one another in so many ways), and because many personality constructs have complex interrelationships with one another, personality measures usually end up asking similarly-worded questions. So for example, one measure might give you Extraversion and Agreeableness scores whereas another gives you Dominance and Affiliation scores. But then it turns out that the former pair of dimensions can be “rotated” into the latter two; it’s just a matter of how you partition (or label) the variance. So really, when a researcher gives his or her participants a dozen measures to fill out, that’s not because anyone thinks that there are really a dozen completely different sets of traits to measure; it’s more because we recognize that each instrument gives you a slightly different take on personality, and we tend to think that having multiple potential viewpoints is generally a good thing.
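That “rotation” is literal, not a metaphor: the same scores expressed on one pair of dimensions can be re-expressed on another pair via an orthogonal transformation, with no information gained or lost. A minimal sketch (my own; the dimension labels and the 45-degree angle are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# scores for 500 simulated people on two dimensions,
# e.g. "Extraversion" and "Agreeableness"
ext_agr = rng.normal(size=(500, 2))

# a 45-degree rotation maps them onto a different pair of axes,
# e.g. "Dominance" and "Affiliation"
theta = np.deg2rad(45)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
dom_aff = ext_agr @ R.T

# same people, same information: total variance is unchanged,
# and the original scores are fully recoverable
assert np.allclose(dom_aff.var(axis=0).sum(), ext_agr.var(axis=0).sum())
assert np.allclose(dom_aff @ R, ext_agr)
```

Both descriptions carve up exactly the same variance; only the labels on the axes change.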

Inefficient personality measurement isn’t inevitable; as I’ve already alluded to above, a number of researchers have developed abbreviated versions of common inventories that capture most of the same variance as much longer instruments. Probably the best-known example is the aforementioned TIPI, developed by Sam Gosling and colleagues, which gives you a workable index of people’s relative standing on the so-called Big Five dimensions of personality. But there are relatively few such abbreviated measures. And to the best of my knowledge, the ones that do exist are all focused on abbreviating a single personality measure. That’s unfortunate, because if you believe that most personality inventories have a substantial amount of overlap, it follows that you should be able to recapture scores on multiple different personality inventories using just one set of (non-redundant) items.

That’s exactly what I try to demonstrate in a paper to be published in the Journal of Research in Personality. The article’s entitled “The abbreviation of personality: How to measure 200 personality scales in 200 items“, which is a pretty accurate, if admittedly somewhat grandiose, description of the contents. The basic goal of the paper is two-fold. First, I develop an automated method for abbreviating personality inventories (or really, any kind of measure with multiple items and/or dimensions). The idea here is to shorten the time and effort required in order to generate shorter versions of existing measures, which should hopefully encourage more researchers to create such short forms. The approach I develop relies heavily on genetic algorithms, which are tools for programmatically obtaining high-quality solutions to high-dimensional problems using simple evolutionary principles. I won’t go into the details (read the paper if you want them!), but I think it works quite well. In the first two studies reported in the paper (data for which were very generously provided by Sam Gosling and Lew Goldberg, respectively), I show that you can reduce the length of existing measures (using the Big Five Inventory and the NEO-PI-R as two examples) quite dramatically with minimal loss of validity. It only takes a few minutes to generate the abbreviated measures, so in theory, it should be possible to build up a database of abbreviated versions of many different measures. I’ve started to put together a site that might eventually serve that purpose, but it’s still in the preliminary stages of development, and may or may not get off the ground.
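To give a flavor of the general idea, here’s a bare-bones sketch: evolve binary “keep this item” masks so that the abbreviated score tracks the full-scale score, while penalizing every item retained. (This is my own simplified illustration with simulated data, not the algorithm or code from the paper; the population size, penalty, and mutation rate are arbitrary.)

```python
import numpy as np

rng = np.random.default_rng(2)

# toy data: 500 "subjects" answering 20 items that all tap one trait
n_sub, n_items = 500, 20
trait = rng.normal(size=n_sub)
loadings = rng.uniform(0.2, 0.9, size=n_items)
items = trait[:, None] * loadings + rng.normal(size=(n_sub, n_items))
full_score = items.mean(axis=1)  # score on the full-length "inventory"

def fitness(mask):
    # reward agreement with the full-scale score, penalize every item kept
    if mask.sum() == 0:
        return -np.inf
    short = items[:, mask.astype(bool)].mean(axis=1)
    return np.corrcoef(short, full_score)[0, 1] - 0.01 * mask.sum()

# a bare-bones genetic algorithm over binary item-inclusion masks
pop = rng.integers(0, 2, size=(60, n_items))
for generation in range(100):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-30:]]              # keep the fittest half
    cuts = rng.integers(1, n_items, size=30)
    kids = np.array([np.r_[a[:c], b[c:]]                 # one-point crossover
                     for a, b, c in zip(parents, parents[::-1], cuts)])
    kids ^= (rng.random(kids.shape) < 0.05)              # occasional bit-flip mutation
    pop = np.vstack([parents, kids])

best = pop[np.argmax([fitness(m) for m in pop])]
short_score = items[:, best.astype(bool)].mean(axis=1)
print(best.sum(), "of", n_items, "items kept; r with full scale =",
      round(np.corrcoef(short_score, full_score)[0, 1], 2))
```

The same selection-crossover-mutation loop scales up naturally to many scales and thousands of items; the fitness function is where all the real work (balancing validity against length) happens.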

The other main goal of the paper is to show that the same general approach can be applied to simultaneously abbreviate more than one different measure. To make the strongest case I could think of, I took 8 different broadband personality inventories (“broadband” here just means they each measure a relatively large number of personality traits) that collectively comprise 203 different personality scales and 2,091 different items. Using the same genetic algorithm-based approach, I then reduce these 8 measures down to a single inventory that contains only 181 items (hence the title of the paper). I named the inventory the AMBI (Analog to Multiple Broadband Inventories), and it’s now freely available for use (items and scoring keys are provided in the paper and online). It’s certainly not perfect–it does a much better job capturing some scales than others–but if you have limited time available for personality measures, and still want a reasonably comprehensive survey of different traits, I think it does a really nice job. Certainly, I’d argue it’s better than having to administer many hundreds (if not thousands) of different items to achieve the same effect. So if you have about 15 – 20 minutes to spare in a study and want some personality data, please consider trying out the AMBI!

Yarkoni, T. (2010). The abbreviation of personality, or how to measure 200 personality scales with 200 items. Journal of Research in Personality. DOI: 10.1016/j.jrp.2010.01.002

on the limitations of psychiatry, or why bad drugs can be good too

The Neuroskeptic offers a scathing indictment of the notion, editorialized in Nature this week, that the next decade is going to revolutionize the understanding and treatment of psychiatric disorders:

The 2010s is not the decade for psychiatric disorders. Clinically, that decade was the 1950s. The 50s was when the first generation of psychiatric drugs were discovered – neuroleptics for psychosis (1952), MAOis (1952) and tricyclics (1957) for depression, and lithium for mania (1949, although it took a while to catch on).

Since then, there have been plenty of new drugs invented, but not a single one has proven more effective than those available in 1959. New antidepressants like Prozac are safer in overdose, and have milder side effects, than older ones. New “atypical” antipsychotics have different side effects to older ones. But they work no better. Compared to lithium, newer “mood stabilizers” probably aren’t even as good. (The only exception is clozapine, a powerful antipsychotic, but dangerous side-effects limit its use.)

Those are pretty strong claims–especially the assertion that not a single psychiatric drug has proven more effective than those available in 1959. Are they true? I’m not in a position to know for certain, having had only fleeting contacts here and there with psychiatric research. But I guess I’d be surprised if many basic researchers in psychiatry concurred with that assessment. (I’m sure many clinicians wouldn’t, but that wouldn’t be very surprising.) Still, even if you suppose that present-day drugs are no more effective than those available in 1959 on the average (which may or may not be true), it doesn’t follow that there haven’t been major advances in psychiatric treatment. For one thing, the side effects of many modern drugs do tend to be less severe. The Neuroskeptic is right that atypical antipsychotics aren’t as side effect-free as was once hoped; but consider, in contrast, drugs like lamotrigine or valproate–anticonvulsants nowadays widely prescribed for bipolar disorder–which are undeniably less toxic than lithium (though also no more, and possibly less, effective). If you’re diagnosed with bipolar disorder in 2010, there’s still a good chance that you’ll eventually end up being prescribed lithium, but (in most cases) it’s unlikely that that’ll be the first line of treatment. And on the bright side, you could end up with a well-managed case of bipolar disorder that never requires you to take drugs with frequent and severe side effects–something that frankly wouldn’t have been an option for almost anyone in 1959.

That last point gets to what I think is the bigger reason for optimism: choice. Even if new drugs aren’t any better than old drugs on average, they’re probably going to work for different groups of people. One of the things that’s problematic about the way the results of clinical trials are typically interpreted is that if a new drug doesn’t outperform an old one, it’s often dismissed as unhelpful. The trouble with this worldview is that even if drug A helps 60% of people on average and drug B helps 54% of people on average (and the difference is statistically and clinically significant), it may well be that drug B helps people who don’t benefit from drug A. The unfortunate reality is that even relatively stable psychiatric patients usually take a while to find an effective treatment regime; most patients try several treatments before settling on one that works. Simply in virtue of there being dozens more drugs available in 2009 than in 1959, it follows that psychiatric patients are much better off living today than fifty years ago. If an atypical antipsychotic controls your schizophrenia without causing motor symptoms or metabolic syndrome, you never have to try a typical antipsychotic; if valproate works well for your bipolar disorder, there’s no reason for you to ever go on lithium. These aren’t small advances; when you’re talking about millions of people who suffer from each of these disorders worldwide, the introduction of any drug that might help even just a fraction of patients who weren’t helped by older medication is a big deal, translating into huge improvements in quality of life and many tens of thousands of lives saved. That’s not to say we shouldn’t strive to develop drugs that aren’t also better on average than the older treatments; it’s just that it shouldn’t be the only (and perhaps not even the main) criterion we use to gauge efficacy.

Having said that, I do agree with the Neuroskeptic’s assessment as to why psychiatric research and treatment seems to proceed more slowly than research in other areas of neuroscience or medicine:

Why? That’s an excellent question. But if you ask me, and judging by the academic literature I’m not alone, the answer is: diagnosis. The weak link in psychiatry research is the diagnoses we are forced to use: “major depressive disorder”, “schizophrenia”, etc.

There are all sorts of methodological reasons why it’s not a great idea to use discrete diagnostic categories when studying (or developing treatments for) mental health disorders. But perhaps the biggest one is that, in cases where a disorder has multiple contributing factors (which is to say, virtually always), drawing a distinction between people with the disorder and those without it severely restricts the range of expression of various related phenotypes, and may even assign people with positive symptomatology to the wrong half of the divide simply because they don’t have some other (relatively) arbitrary symptoms.

For example, take bipolar disorder. If you classify the population into people with bipolar disorder and people without it, you’re doing two rather unfortunate things. One is that you’re lumping together a group of people who have only a partial overlap of symptomatology, and treating them as though they have identical status. One person’s disorder might be characterized by persistent severe depression punctuated by short-lived bouts of mania every few months; another person might cycle rapidly between a variety of moods multiple times per month, week, or even day. Assigning both people the same diagnosis in a clinical study is potentially problematic in that there may be very different underlying organic disorders, which means you’re basically averaging over multiple discrete mechanisms in your analysis, resulting in a loss of both sensitivity and specificity.

The other problem, which I think is less widely appreciated, is that you’ll invariably have many “control” subjects who don’t receive the diagnosis but share many features with people who do. This problem is analogous to the injunction against using median splits: you almost never want to turn an interval-level variable into an ordinal one if you don’t have to, because you lose a tremendous amount of information. When you contrast a sample of people with a bipolar diagnosis with a group of “healthy” controls, you’re inadvertently weakening your comparison by including in the control group people who would be best characterized as falling somewhere in between the extremes of pathological and healthy. For example, most of us probably know people who we would characterize as “functionally manic” (sometimes also known as “extraverts”)–that is, people who seem to reap the benefits of the stereotypical bipolar syndrome in the manic phase (high energy, confidence, and activity level) but have none of the downside of the depressive phase. And we certainly know people who seem to have trouble regulating their moods, and oscillate between periods of highs and lows–but perhaps just not to quite the extent necessary to obtain a DSM-IV diagnosis. We do ourselves a tremendous disservice if we call these people “controls”. Sure, they might be controls for some aspects of bipolar symptomatology (e.g., people who are consistently energetic serve as a good contrast to the dysphoria of the depressive phase); but in other respects, they may actually be closer to the prototypical patient than to most other people.
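The information loss is easy to quantify with a toy simulation (mine; the effect size and the diagnostic cutoff are arbitrary): correlate a continuous symptom dimension with some outcome it predicts, then see how much weaker the relationship looks once the same dimension is collapsed into a case/control split:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
severity = rng.normal(size=n)                   # continuous symptom dimension
outcome = 0.5 * severity + rng.normal(size=n)   # anything severity genuinely predicts

r_continuous = np.corrcoef(severity, outcome)[0, 1]

# now diagnose only the top 10% as "cases" and call everyone else a "control"
diagnosis = (severity > np.quantile(severity, 0.9)).astype(float)
r_dichotomized = np.corrcoef(diagnosis, outcome)[0, 1]

print(f"continuous: r = {r_continuous:.2f}; case/control: r = {r_dichotomized:.2f}")
```

The dichotomized correlation is substantially attenuated, because all the real variation among “controls” (and among “cases”) has been thrown away.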

From a methodological standpoint, there’s no question we’d be much better off focusing on symptoms rather than classifications. If you want to understand the many different factors that contribute to bipolar disorder or schizophrenia, you shouldn’t start from the diagnosis and work backwards; you should start by asking what symptom constellations are associated with specific mechanisms. And those symptoms may well be present (to varying extents) both in people with and without the disorder in question. That’s precisely the motivation behind the current “endophenotype” movement, where the rationale is that you’re better off trying to figure out what biological and (eventually) behavioral changes a given genetic polymorphism is associated with, and then using that information to reshape taxonomies of mental health disorders, than trying to go directly from diagnosis to genetic mechanisms.

Of course, it’s easy to talk about the problems associated with the way psychiatric diagnoses are applied, and not so easy to fix them. Part of the problem is that, while researchers in the lab have the luxury of using large samples that are defined on the basis of symptomatology rather than classification (a luxury that, as the Neuroskeptic and others have astutely observed, many researchers fail to take advantage of), clinicians generally don’t. When you see a patient come in complaining of dysphoria and mood swings, it’s not particularly useful to say “you seem to be in the 96th percentile for negative affect, and have unusual trouble controlling your mood; let’s study this some more, mmmkay?” What you need is some systematic way of going from symptoms to treatment, and the DSM-IV offers a relatively straightforward (though wildly imperfect) way to do that. And then too, the reality is that most clinicians (at least, the ones I’ve talked to) don’t just rely on some algorithmic scheme for picking out drugs; they instead rely on a mix of professional guidelines, implicit theories, and (occasionally) scientific literature when making decisions about what types of symptom constellations have, in their experience, benefited more or less from specific drugs. The problem is that those decisions often fail to achieve their intended goal, and so you end up with a process of trial-and-error, where most patients might try half a dozen medications before they find one that works (if they’re lucky). But that only takes us back to why it’s actually a good thing that we have so many more medications in 2009 than 1959, even if they’re not necessarily individually more effective. So, yes, psychiatric research has some major failings compared to other areas of biomedical research–though I do think that’s partly (though certainly not entirely) because the problems are harder. But I don’t think it’s fair to suggest we haven’t made any solid advances in the treatment or understanding of psychiatric disorders in the last half-century. We have; it’s just that we could do much better.

what’s the point of intro psych?

Sanjay Srivastava comments on an article in Inside Higher Ed about the limitations of traditional introductory science courses, which (according to the IHE article) focus too much on rote memorization of facts and too little on the big questions central to scientific understanding. The IHE article is somewhat predictable in its suggestion that students should be engaged with key scientific concepts at an earlier stage:

One approach to breaking out of this pattern, [Shirley Tilghman] said, is to create seminars in which first-year students dive right into science — without spending years memorizing facts. She described a seminar — “The Role of Asymmetry in Development” — that she led for Princeton freshmen in her pre-presidential days.

Beyond the idea of seminars, Tilghman also outlined a more transformative approach to teaching introductory science material. David Botstein, a professor at the university, has developed the Integrated Science Curriculum, a two-year course that exposes students to the ideas they need to take advanced courses in several science disciplines. Botstein created the course with other faculty members and they found that they value many of the same scientific ideas, so an integrated approach could work.

Sanjay points out an interesting issue in translating this type of approach to psychology:

Would this work in psychology? I honestly don’t know. One of the big challenges in learning psychology — which generally isn’t an issue for biology or physics or chemistry — is the curse of prior knowledge. Students come to the class with an entire lifetime’s worth of naive theories about human behavior. Intro students wouldn’t invent hypotheses out of nowhere — they’d almost certainly recapitulate cultural wisdom, introspective projections, stereotypes, etc. Maybe that would be a problem. Or maybe it would be a tremendous benefit — what better way to start off learning psychology than to have some of your preconceptions shattered by data that you’ve collected yourself?

Prior knowledge certainly does seem to play a huge role in the study of psychology; there are some worldviews that are flatly incompatible with certain areas of psychological inquiry. So when some students encounter certain ideas in psychology classes–even introductory ones–they’re forced to either change their views about the way the world works, or (perhaps more commonly?) to discount those areas of psychology and/or the discipline as a whole.

One example of this is the aversion many people have to a reductionist, materialist worldview. If you really can’t abide by the idea that all of human experience ultimately derives from the machinations of dumb cells, with no ghost to be found anywhere in the machine, you’re probably not going to want to study the neural bases of romantic love. Similarly, if you can’t swallow the notion that our personalities appear to be shaped largely by our genes and random environmental influences–and show virtually no discernible influence of parental environment–you’re unlikely to want to be a behavioral geneticist when you grow up. More so than most other fields, psychology is full of ideas that turn our intuitions on their heads. For many Intro Psych students who go on to study the mind professionally, that’s one of the things that makes the field so fascinating. But other students are probably turned off for the very same reason.

Taking a step back though, I think before you can evaluate how introductory classes ought to be taught, it’s important to ask what goal introductory courses are ultimately supposed to serve. Implicit in the views discussed in the IHE article is the idea that introductory science classes should basically serve as a jumping-off point for young scientists. The idea is that if you’re immersed in deep scientific ideas in your first year of university rather than your third or fourth, you’ll be that much better prepared for a career in science by the time you graduate. That’s certainly a valid view, but it’s far from the only one. Another perfectly legitimate view is that the primary purpose of an introductory science class isn’t really to serve the eventual practitioners of that science, who, after all, form a very small fraction of students in the class. Rather, it’s to provide a very large number of students with varying degrees of interest in science with a very cursory survey of the field. After all, the vast majority of students who sit through Intro Psych classes would never go on to careers in psychology no matter how the course was taught. You could mount a reasonable argument that exposing most students to “the ideas they need to take advanced courses in several science disciplines” would be a kind of academic malpractice, because most students who take intro science classes (or at least, intro psychology) probably have no real interest in taking advanced courses in the topic, and simply want to fill a distribution requirement or get a cursory overview of what the field is about.

The question of who intro classes should be designed for isn’t the only one that needs to be answered. Even if you feel quite certain that introductory science classes should always be taught with an eye to producing scientists, and you don’t care at all for the more populist idea of catering to the non-major masses, you still have to make other hard choices. For example, you need to decide whether you value breadth over depth, or information retention over enthusiasm for the course material. Say you’re determined to teach Intro Psych in such a way as to maximize the production of good psychologists. Do you pick just a few core topics that you think students will find most interesting, or most conducive to understanding key research concepts, and abandon those topics that turn people off? Such an approach might well encourage more students to take further psychology classes; but it does so at the risk of providing an unrepresentative view of the field, and failing to expose some students to ideas they might have benefited more from. Many Intro Psych students seem to really resent the lone week or two of the course when the lecturer covers neurons, action potentials and very basic neuroanatomy. For reasons that are quite inscrutable to me, many people just don’t like brainzzz. But I don’t think that common sentiment is sufficient grounds for cutting biology out of intro psychology entirely; you simply wouldn’t be getting an accurate picture of our current understanding of the mind without knowing at least something about the way the brain operates.

Of course, the trouble is that the way that people like me feel about the brain-related parts of intro psych is exactly the way other people feel about the social parts of intro psych, or the developmental parts, or the clown parts, and so on. Cut social psych out of intro psych so that you can focus on deep methodological issues in studying the brain, and you may well create students more likely to go on to a career in cognitive neuroscience. But you’re probably reducing the number of students who’ll end up going into social psychology. More generally, you’re turning Intro Psychology into Intro to Cognitive Neuroscience, which sort of defeats the point of it being an introductory course in the first place; after all, they’re called survey courses for a reason!

In an ideal world, we wouldn’t have to make these choices; we’d just make sure that all of our intro courses were always engaging and educational and promoted a deeper understanding of how to do science. But in the real world, it’s rarely possible to pull that off, and we’re typically forced to make trade-offs. You could probably promote student interest in psychology pretty easily by showing videos of agnosic patients all day long, but you’d be sacrificing breadth and depth of understanding. Conversely, you could maximize the amount of knowledge students retain from a class by hitting them over the head with information and testing them in every class, but then you shouldn’t be surprised if some students find the course unpleasant and lose interest in the subject matter. The balance between the depth, breadth, and entertainment value of introductory science classes is a delicate one, but it’s one that’s essential to consider before we can fairly evaluate different proposals as to how such classes ought to be structured.

the parable of zoltan and his twelve sheep, or why a little skepticism goes a long way

What follows is a fictional piece about sheep and statistics. I wrote it about two years ago, intending it to serve as a preface to an article on the dangers of inadvertent data fudging. But then I decided that no journal editor in his or her right mind would accept an article that started out talking about thinking sheep. And anyway, the rest of the article wasn’t very good. So instead, I post this parable here for your ovine amusement. There’s a moral to the story, but I’m too lazy to write about it at the moment.

A shepherd named Zoltan lived in a small village in the foothills of the Carpathian Mountains. He tended to a flock of twelve sheep: Soffia, Krystyna, Anastasia, Orsolya, Marianna, Zigana, Julinka, Rozalia, Zsa Zsa, Franciska, Erzsebet, and Agi. Zoltan was a keen observer of animal nature, and would often point out the idiosyncrasies of his sheep’s behavior to other shepherds whenever they got together.

“Anastasia and Orsolya are BFFs. Whatever one does, the other one does too. If Anastasia starts licking her face, Orsolya will too; if Orsolya starts bleating, Anastasia will start harmonizing along with her.”

“Julinka has a limp in her left leg that makes her ornery. She doesn’t want your pity, only your delicious clovers.”

“Agi is stubborn but logical. You know that old saying, spare the rod and spoil the sheep? Well, it doesn’t work for Agi. You need calculus and rhetoric with Agi.”

Zoltan’s colleagues were so impressed by these insights that they began to encourage him to record his observations for posterity.

“Just think, Zoltan,” young Gergely once confided. “If something bad happened to you, the world would lose all of your knowledge. You should write a book about sheep and give it to the rest of us. I hear you only need to know six or seven related things to publish a book.”

On such occasions, Zoltan would hem and haw solemnly, mumbling that he didn’t know enough to write a book, and that anyway, nothing he said was really very important. It was false modesty, of course; in reality, he was deeply flattered, and very much concerned that his vast body of sheep knowledge would disappear along with him one day. So one day, Zoltan packed up his knapsack, asked Gergely to look after his sheep for the day, and went off to consult with the wise old woman who lived in the next village.

The old woman listened to Zoltan’s story with a good deal of interest, nodding sagely at all the right moments. When Zoltan was done, the old woman mulled her thoughts over for a while.

“If you want to be taken seriously, you must publish your findings in a peer-reviewed journal,” she said finally.

“What’s Pier Evew?” asked Zoltan.

“One moment,” said the old woman, disappearing into her bedroom. She returned clutching a dusty magazine. “Here,” she said, handing the magazine to Zoltan. “This is peer review.”

That night, after his sheep had gone to bed, Zoltan stayed up late poring over Vol. IV, Issue 5 of Domesticated Animal Behavior Quarterly. Since he couldn’t understand the figures in the magazine, he read it purely for the articles. By the time he put the magazine down and leaned over to turn off the light, the first glimmerings of an empirical research program had begun to dance around in his head. Just like fireflies, he thought. No, wait, those really were fireflies. He swatted them away.

“I like this… science,” he mumbled to himself as he fell asleep.

In the morning, Zoltan went down to the local library to find a book or two about science. He checked out a volume entitled Principia Scientifica Buccolica—a masterful derivation from first principles of all of the most common research methods, with special applications to animal behavior. By lunchtime, Zoltan had covered t-tests, and by bedtime, he had mastered Mordenkainen’s correction for inestimable herds.

In the morning, Zoltan made his first real scientific decision.

“Today I’ll collect some pilot data,” he thought to himself, “and tomorrow I’ll apply for an R01.”

His first set of studies tested the provocative hypothesis that sheep communicate with one another by moving their ears back and forth in Morse code. Study 1 tested the idea observationally. Zoltan and two other raters (his younger cousins), both blind to the hypothesis, studied sheep in pairs, coding one sheep’s ear movements and the other sheep’s behavioral responses. Studies 2 through 4 manipulated the sheep’s behavior experimentally. In Study 2, Zoltan taped the sheep’s ears to their head; in Study 3, he covered their eyes with opaque goggles so that they couldn’t see each other’s ears moving. In Study 4, he split the twelve sheep into three groups of four in order to determine whether smaller groups might promote increased sociability.

That night, Zoltan minded the data. “It’s a lot like minding sheep,” Zoltan explained to his cousin Griga the next day. “You need to always be vigilant, so that a significant result doesn’t get away from you.”

Zoltan had been vigilant, and the first four studies produced a number of significant results. In Study 1, Zoltan found that sheep appeared to coordinate ear twitches: if one sheep twitched an ear several times in a row, it was a safe bet that other sheep would start to do the same shortly thereafter (p < .01). There was, however, no coordination of licking, headbutting, stamping, or bleating behaviors, no matter how you sliced and diced it. “It’s a highly selective effect,” Zoltan concluded happily. After all, when you thought about it, it made sense. If you were going to pick just one channel for sheep to communicate through, ear twitching was surely a good one. One could make a very good evolutionary argument that more obvious methods of communication (e.g., bleating loudly) would have been detected by humans long ago, and that would be no good at all for the sheep.

Studies 2 and 3 further supported Zoltan’s story. Study 2 demonstrated that when you taped sheep’s ears to their heads, they ceased to communicate entirely. You could put Rozalia and Erzsebet in adjacent enclosures and show Rozalia the Jack of Spades for three or four minutes at a time, and when you went to test Erzsebet, she still wouldn’t know the Jack of Spades from the Three of Diamonds. It was as if the sheep were blind! Except they weren’t blind, they were dumb. Zoltan knew; he had made them that way by taping their ears to their heads.

In Study 3, Zoltan found that when the sheep’s eyes were covered, they no longer coordinated ear twitching. Instead, they now coordinated their bleating—but only if you excluded bleats that were produced when the sheep’s heads were oriented downwards. “Fantastic,” he thought. “When you cover their eyes, they can’t see each other’s ears any more. So they use a vocal channel. This, again, makes good adaptive sense: communication is too important to eliminate entirely just because your eyes happen to be covered. Much better to incur a small risk of being detected and make yourself known in other, less subtle, ways.”

But the real clincher was Study 4, which confirmed that ear twitching occurred at a higher rate in smaller groups than larger groups, and was particularly common in dyads of well-adjusted sheep (like Anastasia and Orsolya, and definitely not like Zsa Zsa and Marianna).

“Sheep are like everyday people,” Zoltan told his sister on the phone. “They won’t say anything to your face in public, but get them one-on-one, and they won’t stop gossiping about each other.”

It was a compelling story, Zoltan conceded to himself. The only problem was the F test. The difference in twitch rates as a function of group size wasn’t quite statistically significant. Instead, it hovered around p = .07, which the textbooks told Zoltan meant that he was almost right. Almost right was the same thing as potentially wrong, which wasn’t good enough. So the next morning, Zoltan asked Gergely to lend him four sheep so he could increase his sample size.

“Absolutely not,” said Gergely. “I don’t want your sheep filling my sheep’s heads with all of your crazy new ideas.”

“Look,” said Zoltan. “If you lend me four sheep, I’ll let you drive my Cadillac down to the village on weekends after I get famous.”

“Deal,” said Gergely.

So Zoltan borrowed the sheep. But it turned out that four sheep weren’t quite enough; after adding Gergely’s sheep to the sample, the effect only went from p = .07 to p = .06. So Zoltan cut a deal with his other neighbor, Yuri: four of Yuri’s sheep for two days, in return for three days with Zoltan’s new Lexus (once he bought it). That did the trick. Once Zoltan repeated the experiment with Yuri’s sheep, the p-value for Study 4 now came to .046, which the textbooks assured Zoltan meant he was going to be famous.

Data in hand, Zoltan spent the next two weeks writing up his very first journal article. He titled it “Baa baa baa, or not: Sheep communicate via non-verbal channels”—a decidedly modest title for the first empirical work to demonstrate that sheep are capable of sophisticated propositional thought. The article was published to widespread media attention and scientific acclaim, and Zoltan went on to have a productive few years in animal behavioral research, studying topics as interesting and varied as giraffe calisthenics and displays of affection in the common leech.

Much later, it turned out that no one was able to directly replicate his original findings with sheep (though some other researchers did manage to come up with conceptual replications). But that didn’t really matter to Zoltan, because by then he’d decided science was too demanding a career anyway; it was way more fun to lie under trees counting his sheep. Counting sheep, and occasionally, on Saturdays, driving down to the village in his new Lexus, just to impress all the young cowgirls.
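Okay, I lied; here’s the moral after all, in a few lines of R. Zoltan’s strategy–peek at the p-value, and keep adding sheep until it dips below .05–inflates the false-positive rate well above the nominal 5%, even when there’s no effect at all. This is just an illustrative sketch (the sample sizes and stopping rule are made up, not taken from the story’s exact numbers):

```r
set.seed(1)
n_sims <- 2000
false_pos <- 0

for (i in 1:n_sims) {
  x <- rnorm(12)  # start with 12 "sheep"; the true effect is exactly zero
  repeat {
    p <- t.test(x)$p.value
    if (p < .05) {            # a "significant result didn't get away"
      false_pos <- false_pos + 1
      break
    }
    if (length(x) >= 24) break  # give up after doubling the sample
    x <- c(x, rnorm(4))         # borrow four more sheep and test again
  }
}

false_pos / n_sims  # noticeably above the nominal .05
```

Testing after every borrowed batch of sheep gives chance four opportunities to hand you a spurious “discovery,” which is why the observed rate ends up well north of 5%.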

got R? get social science for R!

Drew Conway has a great list of 10 must-have R packages for social scientists. If you’re a social scientist (or really, any kind of scientist) who doesn’t use R, now is a great time to dive in and learn; there are tons of tutorials and guides out there (my favorite is Quick-R, which is incredibly useful incredibly often), and packages are available for just about any application you can think of. Best of all, R is completely free, and is available for just about every platform. Admittedly, there’s a fairly steep learning curve if you’re used to GUI-based packages like SPSS (R’s syntax can be pretty idiosyncratic), but it’s totally worth the time investment, and once you’re comfortable with R you’ll never look back.

Anyway, Drew’s list contains a number of packages I’ve found invaluable in my work, as well as several packages I haven’t used before and am pretty eager to try. I don’t have much to add to his excellent summaries, but I’ll gladly second the inclusion of ggplot2 (the easiest way in the world to make beautiful graphs?) and plyr and sqldf (great for sanitizing, organizing, and manipulating large data sets, which are often a source of frustration in R). Most of the other packages I haven’t had any reason to use personally, though a few seem really cool, and worth finding an excuse to play around with (e.g., Statnet and igraph).

Since Drew’s list focuses on packages useful to social scientists in general, I thought I’d mention a couple of others that I’ve found particularly useful for psychological applications. The most obvious one is William Revelle‘s awesome psych package, which contains tons of useful functions for descriptive statistics, data reduction, simulation, and psychometrics. It’s saved me tons of time validating and scoring personality measures, though it probably isn’t quite as useful if you don’t deal with individual difference measures regularly. Other packages I’ve found useful are sem for structural equation modeling (which interfaces nicely with GraphViz to easily produce clean-looking path diagrams), genalg for genetic algorithms, MASS (mostly for sampling from multivariate distributions), reshape (similar functionality to plyr), and car, which contains a bunch of useful regression-related functions (e.g., for my dissertation, I needed to run SPSS-like repeated measures ANOVAs in R, which turns out to be a more difficult proposition than you’d imagine, but was handled by the Anova function in car). I’m sure there are others I’m forgetting, but those are the ones that I’ve relied on most heavily in recent work. No doubt there are tons of other packages out there that are handy for common psychology applications, so if there are any you use regularly, I’d love to hear about them in the comments!
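To give a flavor of the psych and car workflow I’m describing, here’s a minimal sketch–the simulated data, variable names, and design are all hypothetical, but the function calls (psych’s alpha and describe, car’s Anova with idata/idesign for repeated measures) are the real ones:

```r
library(psych)
library(car)

# Fake questionnaire data: 50 participants answering 10 Likert-type items
set.seed(42)
items <- as.data.frame(matrix(sample(1:5, 50 * 10, replace = TRUE), ncol = 10))

# Scale validation in one call each: internal consistency, then
# per-item means, SDs, skew, and kurtosis
scale_alpha <- alpha(items)   # Cronbach's alpha and item statistics
item_stats  <- describe(items)

# SPSS-style repeated-measures ANOVA: one within-subject factor
# with three levels, fit as a multivariate linear model
dv   <- matrix(rnorm(50 * 3), ncol = 3)   # one column per condition
mlm  <- lm(dv ~ 1)
idata <- data.frame(condition = factor(c("a", "b", "c")))
Anova(mlm, idata = idata, idesign = ~condition, type = "III")
```

The idata/idesign arguments are what let Anova treat the columns of the multivariate response as levels of a within-subject factor, which is the piece that’s surprisingly hard to do with base R alone.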