in which I apologize for my laziness, but not really

I got back from the Cognitive Neuroscience Society meeting last week. I was planning to write a post-CNS wrap-up thing like I did last year and the year before that, but I seem to have misplaced the energy that’s supposed to fuel such an exercise. So instead I’ll just say I had a great time and leave it at that. What happens in Chicago stays in Chicago, etc. etc.

Also, I really appreciate all the people who came up to me at CNS and said nice things about this blog–it’s nice to know that someone actually reads this (puzzling, mind you, because I’m not sure why anyone reads this, but nice nonetheless). A couple of people encouraged me to blog more often, so I’m making an effort to do that, though the most likely outcome will be miserable failure. Either that or I’ll just start pasting random YouTube videos in this space. Like this one:

p.s. on re-reading that, it kind of makes it sound like I was swarmed by adoring fans at CNS. To clarify: “all the people” means, like, four people, and the “nice things” were really more like lukewarm “oh yeah, your blog’s not totally awful” sentiments.

p.p.s. I’ve noticed that a lot of my shorter posts take the form of “I was going to write about X, but I’m not actually going to write about X.” I think this is because I’m very lazy but still want partial credit for having good intentions. Which is kind of ridiculous.

on writing: some anecdotal observations, in no particular order

  • Early on in graduate school, I invested in the book “How to Write a Lot”. I enjoyed reading it–mostly because I (mistakenly) enjoyed thinking to myself, “hey, I bet as soon as I finish this book, I’m going to start being super productive!” But I can save you the $9 and tell you there’s really only one take-home point: schedule writing like any other activity, and stick to your schedule no matter what. Though, having said that, I don’t really do that myself. I find I tend to write about 20 hours a week on average. On a very good day, I manage to get a couple of thousand words written, but much more often, I get 200 words written that I then proceed to rewrite furiously and finally trash in frustration. But it all adds up in the long run, I guess.
  • Some people are good at writing one thing at a time; they can sit down for a week and crank out a solid draft of a paper without ever looking sideways at another project. Personally, unless I have a looming deadline (and I mean a real deadline–more on that below), I find that impossible to do; my general tendency is to work on one writing project for an hour or two, and then switch to something else. Otherwise I pretty much lose my mind. I also find it helps to reward myself–i.e., I’ll work on something I really don’t want to do for an hour, and then play video games for a while… er, I mean, switch to writing something more pleasant.
  • I can rarely get any ‘real’ writing (i.e., stuff that leads to publications) done after around 6 pm; late mornings (i.e., right after I wake up) are usually my most productive writing time. And I generally only write for fun (blogging, writing fiction, etc.) after 9 pm. There are exceptions, but by and large that’s my system.
  • I don’t write many drafts. I don’t mean that I never revise papers, because I do–obsessively. But I don’t sit down thinking “I’m going to write a very rough draft, and then I’ll go back and clean up the language.” I sit down thinking “I’m going to write a perfect paper the first time around,” and then I very slowly crank out a draft that’s remarkably far from being perfect. I suspect the former approach is actually the more efficient one, but I can’t bring myself to do it. I hate seeing malformed sentences on the page, even if I know I’m only going to delete them later. It always amazes and impresses me when I get Word documents from collaborators with titles like “AmazingNatureSubmissionVersion18”. I just give all my documents the title “paper_draft”. There might be a V2 or a V3, but there will never, ever be a V18.
  • Papers are not meant to be written linearly. I don’t know anyone who starts with the Introduction, then does the Methods and Results, and then finishes with the Discussion. Personally I don’t even write papers one section at a time. I usually start out by frantically writing down ideas as they pop into my head, and jumping around the document as I think of other things I want to say. I frequently write half a sentence down and then finish it with a bunch of question marks (like so: ???) to indicate I need to come back later and patch it up. Incidentally, this is also why I’m terrified to ever show anyone any of my unfinished paper drafts: an unsuspecting reader would surely come away thinking I suffer from a serious thought disorder. (I suppose they might be right.)
  • Okay, that last point is not entirely true. I don’t write papers completely haphazardly; I do tend to write Methods and Results before Intro and Discussion. I gather that this is a pretty common approach. On the rare occasions when I’ve started writing the Introduction first, I’ve invariably ended up having to completely rewrite it, because it usually turns out the results aren’t actually what I thought they were.
  • My sense is that most academics get more comfortable writing as time goes on. Relatively few grad students have the perseverance to rapidly crank out publication-worthy papers from day 1 (I was definitely not one of them). I don’t think this is just a matter of practice; I suspect part of it is a natural maturation process. People generally get more conscientious as they age; it stands to reason that writing (as an activity most people find unpleasant) should get easier too. I’m better at motivating myself to write papers now, but I’m also much better about doing the dishes and laundry–and I’m pretty sure that’s not because practice makes dishwashing perfect.
  • When I started grad school, I was pretty sure I’d never publish anything, let alone graduate, because I’d never handed in a paper as an undergraduate that wasn’t written at the last minute, whereas in academia, there are virtually no hard deadlines (see below). I’m not sure exactly what changed. I’m still continually surprised every time something I wrote gets published. And I often catch myself telling myself, “hey, self, how the hell did you ever manage to pay attention long enough to write 5,000 words?” And then I reply to myself, “well, self, since you ask, I took a lot of stimulants.”
  • I pace around a lot when I write. A lot. To the point where my labmates–who are all uncommonly nice people–start shooting death glares my way. It’s a heritable tendency, I guess (the pacing, not the death glare attraction); my father also used to pace obsessively. I’m not sure what the biological explanation for it is. My best guess is it’s an arousal-mediated effect: I can think pretty well when I’m around other people, or when I’m in motion, but if I’m sitting at a desk and I don’t already know exactly what I want to say, I can’t get anything done. I generally pace around the lab or house for a while figuring out what I want to say, and then I sit down and write until I’ve forgotten what I want to say, or decide I didn’t really want to say that after all. In practice this usually works out to 10 minutes of pacing for every 5 minutes of writing. I envy people who can just sit down and calmly write for two or three hours without interruption (though I don’t think there are that many of them). At the same time, I’m pretty sure I burn a lot of calories this way.
  • I’ve been pleasantly surprised to discover that I much prefer writing grant proposals to writing papers–to the point where I actually enjoy writing grant proposals. I suspect the main reason for this is that grant proposals have a kind of openness that papers don’t; with a paper, you’re constrained to telling the story the data actually support, whereas a grant proposal is as good as your vision of what’s possible (okay, and plausible). A second part of it is probably the novelty of discovery: once you conduct your analyses, all that’s left is to tell other people what you found, which (to me) isn’t so exciting. I mean, I already think I know what’s going on; what do I care if you know? Whereas when writing a grant, a big part of the appeal for me is that I could actually go out and discover new stuff–just as long as I can convince someone to give me some money first.
  • At a departmental seminar attended by about 30 people, I once heard a student express concern about an in-progress review article that he and several of the other people at the seminar were collaboratively working on. The concern was that if all of the collaborators couldn’t agree on what was going to go in the paper (and they didn’t seem to be able to at that point), the paper wouldn’t get written in time to make the rapidly approaching deadline dictated by the journal editor. A senior and very brilliant professor responded to the student’s concern by pointing out that this couldn’t possibly be a real problem seeing as in reality there is actually no such thing as a hard writing deadline. This observation didn’t go over so well with some of the other senior professors, who weren’t thrilled that their students were being handed the key to the kingdom of academic procrastination so early in their careers. But it was true, of course: with the major exception of grant proposals (EDIT: and as Garrett points out in the comments below, conference publications in disciplines like Computer Science), most of the things academics write (journal articles, reviews, commentaries, book chapters, etc.) operate on a very flexible schedule. Usually when someone asks you to write something for them, there is some vague mention somewhere of some theoretical deadline, which is typically a date that seems so amazingly far off into the future that you wonder if you’ll even be the same person when it rolls around. And then, much to your surprise, the deadline rolls around and you realize that you must in fact really be a different person, because you don’t seem to have any real desire to work on this thing you signed up for, and instead of writing it, why don’t you just ask the editor for an extension while you go rustle up some motivation. So you send a polite email, and the editor grudgingly says, “well, hmm, okay, you can have another two weeks,” to which you smile and nod sagely, and then, two weeks later, you send another similarly worded but even more obsequious email that starts with the words “so, about that extension…”

    The basic point here is that there’s an interesting dilemma: even though there rarely are any strict writing deadlines, it’s to almost everyone’s benefit to pretend they exist. If I ever find out that the true deadline (insofar as such a thing exists) for the chapter I’m working on right now is 6 months from now and not 3 months ago (which is what they told me), I’ll probably relax and stop working on it for, say, the next 5 and a half months. I sometimes think that the most productive academics are the ones who are just really really good at repeatedly lying to themselves.

  • I’m a big believer in structured procrastination when it comes to writing. I try to always have a really unpleasant but not-so-important task in the background, which then forces me to work on only-slightly-unpleasant-but-often-more-important tasks. Except it often turns out that the unpleasant-but-not-so-important task is actually an unpleasant-but-really-important task after all, and then I wake up in a cold sweat in the middle of the night thinking of all the ways I’ve screwed myself over. No, just kidding. I just bitch about it to my wife for a while and then drown my sorrows in an extra helping of ice cream.
  • I’m really, really bad at restarting projects I’ve put on the back burner for a while. Right now there are 3 or 4 papers I’ve been working on on-and-off for 3 or 4 years, and every time I pick them up, I write a couple of hundred words and then put them away for a couple of months. I guess what I’m saying is that if you ever have the misfortune of collaborating on a paper with me, you should make sure to nag me several times a week until I get so fed up with you I sit down and write the damn paper. Otherwise it may never see the light of day.
  • I like writing fiction in my spare time. I also occasionally write whiny songs. I’m pretty terrible at both of these things, but I enjoy them, and I’m told (though I don’t believe it for a second) that that’s the important thing.

bio-, chemo-, neuro-, eco-informatics… why no psycho-?

The latest issue of the APS Observer features a special section on methods. I contributed a piece discussing the need for a full-fledged discipline of psychoinformatics:

Scientific progress depends on our ability to harness and apply modern information technology. Many advances in the biological and social sciences now emerge directly from advances in the large-scale acquisition, management, and synthesis of scientific data. The application of information technology to science isn’t just a happy accident; it’s also a field in its own right — one commonly referred to as informatics. Prefix that term with a Greek root or two and you get other terms like bioinformatics, neuroinformatics, and ecoinformatics — all well-established fields responsible for many of the most exciting recent discoveries in their parent disciplines.

Curiously, following the same convention also gives us a field called psychoinformatics — which, if you believe Google, doesn’t exist at all (a search for the term returns only 500 hits as of this writing; Figure 1). The discrepancy is surprising, because labels aside, it’s clear that psychological scientists are already harnessing information technology in powerful and creative ways — often reshaping the very way we collect, organize, and synthesize our data.

Here’s the picture that’s worth, oh, at least ten or fifteen words:

Figure 1. Number of Google search hits for informatics-related terms, by prefix.

You can read the rest of the piece here if you’re so inclined. Check out some of the other articles too; I particularly like Denny Borsboom’s piece on network analysis. EDIT: and Anna Mikulak’s piece on optogenetics! I forgot the piece on optogenetics! How can you not love optogenetics!

deconstructing the turducken

This is fiction. Which means it’s entirely made up, and definitely not at all based on any real people or events.

 

Cornelius Kipling came over to our house for Thanksgiving. I didn’t invite him; I would never, ever invite him. He was guaranteed to show up slightly drunk and very belligerent, carrying a two-thirds empty bottle of cheap wine, which he’d then hand to us as if it had arrived unopened from some fancy French cellar.

Cornelius Kipling was never invited; he invited himself.

“Good to see you,” he said to me when we let him in. “Thanks for inviting me over. It’s very kind of you, seeing as how my other plans fell through at the last minute.”

“Hi Kip,” I said, knowing full well he’d never had any other plans.

“Ella,” Kip nodded in my wife’s general direction, taking care not to make direct eye contact. He’d learned from extended experience that once he made eye contact with people, it became much harder to ignore social cues.

“Cornelius,” she said, through a mouth as thin as a zipper.

“Just Kip is fine,” said Kip.

“Cornelius,” my wife repeated, louder this time.

“What are we having for dinner,” Kip asked, handing me a two-thirds empty bottle of Zinfandel.

“Well,” said Ella, “I was going to make a turducken. But now that you’re here, I figure I should make something special. So we’re having frozen chicken nuggets and mashed potatoes.”

“We spare no expense!” I added cheerfully.

“Funny you should mention turducken,” Kip said, ignoring our jabs. “My new business plan is based on the turducken.”

“Oh really,” I said. “Do pray tell.”

I wasn’t surprised Kip had a new business plan. If anything, I was surprised he’d managed to get as far as exchanging pleasantries before launching into a graphic description of his latest scheme.

“Well,” he said, “it’s not really based on the turducken. The turducken is more of an analogy. To illustrate what it is that my new startup does.”

“And what is it that your new startup does,” Ella’s mouth asked, though the rest of her face very clearly did not care to hear the answer.

“We miniaturize data,” Kip said. He waved his hands in the air with a flourish and looked at us expectantly. It made me think back to something my wife had said about Kip after the first time she ever met him: He thinks he’s a magician, and he acts like he’s a magician, but none of his tricks ever work.

“Prithee, do continue,” I said.

“We take big datasets,” he said. “Large datasets. Enormous datasets. Doesn’t matter what kind of data. You give it to us, and we miniaturize it. We give you back a much smaller dataset. And then you carry on your work with your wonderfully shrunken new spreadsheet, which keeps only the important trends and throws out all of the unnecessary details.”

“Interesting,” I nodded. On a scale of one-to-Kipsanity, this one was a solid five. “And the turducken figures into this how?”

“Weeeeeell, imagine someone hands you a turducken and asks you to figure out what’s in it,” said Kip. “I grant that this may not happen to you very often, but it happens all the time in KipLand. So, you know there’s a bunch of birds in there, all stuffed into each other’s–well, you know–but you don’t know which birds. All you see is this giant deep-fried bird collage, and you want to disassemble it into a set of discrete, identifiable fowls. Now, you hear a lot about how to construct a turducken. But if you think about it, deconstructing a turducken is a much more interesting engineering problem. And that’s what my new venture is all about. We take a complicated mass of data and pick out all the key elements that went into it. Deconstructing the turducken.”

He did the little flourish with his hands again. Again, Ella’s words rang out in my head. None of his tricks ever work.

“That’s quite possibly the craziest thing I’ve ever heard,” I observed. “This whole turducken analogy isn’t working so well for me. I hope you haven’t put it in your promotional materials.”

Kip stared at me unpleasantly for a good ten or fifteen seconds.

“Actually, I take that back,” I said. “That conversation we had about the shinbones on Isaac Newton’s coat of arms that time I ran into you at the dry cleaner’s… that was an order of magnitude more ridiculous.”

Maybe it was a mean thing to say, but you have to understand: my friendship with Kip is built entirely on mutual abuse. And he who flinches first, loses.

“Whatever,” Kip said. He looked annoyed, which filled me with schadenfreude. It wasn’t often he got to experience the full range of emotions he routinely visited on others.

“I didn’t come here to talk about turducken,” he continued. “You brought up the turducken, not me. I just wanted to get your opinion on something…”

Again the hand flourish. Again the voice.

“I’m trying to figure out what to call my new startup,” he said. “Which do you like better: ‘Small data’ or ‘little data’? Neither has the ring of ‘big data’, but I think both sound better than ‘Kipling Data Miniaturization Services’.”

“How about MiniData,” Ella offered. I noticed she was hitting the wine pretty hard, though we both knew it would do nothing to blunt the Kipling trauma.

“Or maybe NanoData,” I offered. “If you can make the data small enough. What level of compression are you aiming for?”

“Oh, sky’s the limit. Actually, that’s one of the unique features of my service. Most compression schemes have a fixed limit. Take a standard algorithm like bzip2. You compress text, you might get a file 10% of the size if you’re lucky. But binary data? You’ll be lucky if you shrink it by a factor of three. Now, with my NanoData compression service, you as the customer get to choose how much or how little you want. And you select the output format. You can hand me a terabyte of data and say, ‘Dr. Kipling, sir, I want you to distill this eight-dimensional MATLAB array down to a single Excel spreadsheet, no more than 10 rows by 10 columns.’ And that’s exactly what you’ll get.”

“And this miraculously distilled dataset that you give me… will it, by chance, have any passing resemblance to the original dataset I gave you?”

“Oh, sure, if you want it to,” said Kip. “But the fidelity service costs double.”

I resisted the overpowering urge to facepalm.

“Well, it’s certainly not the worst idea you’ve ever had,” I said diplomatically. “But I have to say, I’m amazed you keep launching new startups. A lesser man would have given up ten or twelve bankruptcies ago.”

“I guess I just have an uncanny sense for ideas ten years ahead of their time,” Kip smiled.

“Ten years ahead of anyone’s time,” Ella muttered.

“Right,” I said. “You’re a visionary. You have… the visions. Hey, what happened to that deli you were going to open? The one that was going to sell premium hay sandwiches? I thought that one was going to make it for sure.”

“Terrible shame. Turns out it’s very difficult to get sandwich-grade hay in Colorado. So, you know, it didn’t pan out. Very sad; I even had a name picked out: Hay Day Sandwiches. Get it?”

I didn’t really get it, but still nodded in mock sympathy.

“Anyway, since you brought up my new startup,” Kip said, oblivious to the death rays radiating towards him from Ella’s head, “let me take this opportunity to give the both of you the opportunity of your lifetime. I like you guys, so I’m going to cut you in as my very first angel investors. All I’m asking…”

And here he paused, looking at us. I knew what he was doing; he was trying to gauge our level of displeasure with him so he could pick a number that was sufficiently high, but not completely ridiculous.

“…is fifteen thousand,” he finished. “You get 5% of equity, and I’ll even throw in some nice swag. I’m having mugs and frisbees printed up as we speak.”

Around this time, Ella put her head down on her arms; she may or may not have been softly sobbing, I couldn’t really tell.

“That’s quite an offer, Kip,” I said. “And I’m really glad you like me enough to make it. It’s not like I’ve ever bought into your ideas before, but then, the thing I like best about you is how you never take repeated failure for an answer. Unfortunately, I just don’t have fifteen thousand right now. I just spent my last fifteen thousand souping up an old John Deere lawnmower so I can drive around the bike path blaring Ridin’ Dirty from three hundred watt speakers while glowing pink neon lights presage my arrival by five hundred feet. You should see it, it’s beautiful. But I swear, if I hadn’t done that, I’d be ready to sign on the dotted line right now.”

“That’s quite alright,” Kip said. “No harm, no foul. Your loss, my gain. It’s probably crazy of me to give up that much equity for so little anyway; this idea is going to make millions. No. Billions.”

He paused just long enough for some of the delusion to drip off; then I watched in real time as yet another unwise idea corkscrewed through his ear and crawled into his brain.

“Hey,” he said. “I’ve never thought of pimping out a John Deere lawnmower, but that’s a pretty good idea too. You sound like you have some experience with this now; want to go fifty-fifty on a startup? I’ll provide the salesmanship and take advantage of my many business contacts. You provide the technical knowledge. Ella, you can get in on this too; we’ll throw in a free turducken with every purchase.”

This time I definitely heard my wife sobbing, and just like that, it was time for Cornelius Kipling to leave.

a human and a monkey walk into an fMRI scanner…

Tor Wager and I have a “news and views” piece in Nature Methods this week; we discuss a paper by Mantini and colleagues (in the same issue) introducing a new method for identifying functional brain homologies across different species–essentially, identifying brain regions in humans and monkeys that seem to do roughly the same thing even if they’re not located in the same place anatomically. Mantini et al make some fairly strong claims about what their approach tells us about the evolution of the human brain (namely, that some cortical regions have undergone expansion relative to monkeys, while others have adapted substantively new functions). For reasons we articulate in our commentary, I’m personally not so convinced by the substantive conclusions, but I do think the core idea underlying the method is a very clever and potentially useful one:

Their technique, interspecies activity correlation (ISAC), uses functional magnetic resonance imaging (fMRI) to identify brain regions in which humans and monkeys exposed to the same dynamic stimulus—a 30-minute clip from the movie The Good, the Bad and the Ugly—show correlated patterns of activity (Fig. 1). The premise is that homologous regions should have similar patterns of activity across species. For example, a brain region sensitive to a particular configuration of features, including visual motion, hands, faces, object and others, should show a similar time course of activity in both species—even if its anatomical location differs across species and even if the precise features that drive the area’s neurons have not yet been specified.

Mo Costandi has more on the paper in an excellent Guardian piece (and I’m not just saying that because he quoted me a few times). All in all, I think it’s a very exciting method, and it’ll be interesting to see how it’s applied in future studies. I think there’s a fairly broad class of potential applications based loosely around the same idea of searching for correlated patterns. It’s an idea that’s already been used by Uri Hasson (an author on the Mantini et al paper) and others fairly widely in the fMRI literature to identify functional correspondences across different subjects; but you can easily imagine conceptually similar applications in other fields too–e.g., correlating gene expression profiles across species in order to identify structural homologies (actually, one could probably try this out pretty easily using the mouse and human data available in the Allen Brain Atlas).
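To make the "search for correlated time courses" idea a bit more concrete, here's a toy sketch in Python. To be clear, this is not the ISAC pipeline from Mantini et al.; it's a deliberately simplified illustration with simulated data. The region names, the `simulate` and `best_matches` helpers, and the "motion" and "faces" signals are all made up for the example, and the noise level is an arbitrary assumption.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def best_matches(human, monkey):
    """For each human region, find the monkey region whose time course
    correlates most strongly with it, regardless of anatomical location."""
    matches = {}
    for h_name, h_series in human.items():
        r, m_name = max(
            (pearson(h_series, m_series), m_name)
            for m_name, m_series in monkey.items()
        )
        matches[h_name] = (m_name, r)
    return matches


def noisy(signal, rng, sd=0.5):
    """Add independent Gaussian measurement noise to a latent signal."""
    return [s + rng.gauss(0, sd) for s in signal]


def simulate(n_timepoints=300, seed=1):
    """Hypothetical data: two stimulus-driven signals, each measured once
    in a 'human' region and once in a 'monkey' region."""
    rng = random.Random(seed)
    motion = [rng.gauss(0, 1) for _ in range(n_timepoints)]
    faces = [rng.gauss(0, 1) for _ in range(n_timepoints)]
    human = {
        "human_region_1": noisy(motion, rng),
        "human_region_2": noisy(faces, rng),
    }
    monkey = {
        "monkey_region_a": noisy(faces, rng),
        "monkey_region_b": noisy(motion, rng),
    }
    return human, monkey


human, monkey = simulate()
for name, (match, r) in best_matches(human, monkey).items():
    print(f"{name} -> {match} (r = {r:.2f})")
```

The matching uses only the time courses, so the labels (standing in for anatomical position) play no role. That's the whole appeal of the approach, and also why real data, with shared noise and many more regions, is a much harder problem than this toy suggests.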

Mantini D, Hasson U, Betti V, Perrucci MG, Romani GL, Corbetta M, Orban GA, & Vanduffel W (2012). Interspecies activity correlations reveal functional correspondence between monkey and human brain areas. Nature Methods. PMID: 22306809

Wager, T., & Yarkoni, T. (2012). Establishing homology between monkey and human brains. Nature Methods. DOI: 10.1038/nmeth.1869

no free lunch in statistics

Simon and Tibshirani recently posted a short comment on the Reshef et al MIC data mining paper I blogged about a while back:

The proposal of Reshef et. al. (“MIC”) is an interesting new approach for discovering non-linear dependencies among pairs of measurements in exploratory data mining. However, it has a potentially serious drawback. The authors laud the fact that MIC has no preference for some alternatives over others, but as the authors know, there is no free lunch in Statistics: tests which strive to have high power against all alternatives can have low power in many important situations.

They then report some simulation results clearly demonstrating that MIC is (very) underpowered relative to Pearson correlation in most situations, and performs even worse relative to Székely & Rizzo’s distance correlation (which I hadn’t heard about, but will have to look into now). I mentioned low power as a potential concern in my own post, but figured it would be an issue under relatively specific circumstances (i.e., only for certain kinds of associations in relatively small samples). Simon & Tibshirani’s simulations pretty clearly demonstrate that isn’t so. Which, needless to say, rather dampens the enthusiasm for the MIC statistic.
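For anyone else meeting distance correlation for the first time, here's a rough pure-Python sketch of the sample statistic, computed from double-centered pairwise distance matrices. It doesn't reproduce Simon & Tibshirani's power simulations (and it doesn't implement MIC at all); it just shows what the statistic computes and how it differs from Pearson on a noisy quadratic relationship. The `quadratic_demo` helper, the sample size, and the noise level are assumptions chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def centered_distances(values):
    """Pairwise |a - b| distances, double-centered so that every row and
    column of the resulting matrix has mean zero."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in d]  # equals column means (symmetry)
    grand = sum(row_means) / n
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(xs, ys):
    """Sample distance correlation for two univariate samples (O(n^2))."""
    n = len(xs)
    a = centered_distances(xs)
    b = centered_distances(ys)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2 = mean_product(a, b)    # squared distance covariance
    dvar2_x = mean_product(a, a)  # squared distance variance of x
    dvar2_y = mean_product(b, b)  # squared distance variance of y
    denom = math.sqrt(dvar2_x * dvar2_y)
    if denom == 0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / denom)


def quadratic_demo(n=300, seed=0):
    """y depends strongly on x, but not linearly."""
    rng = random.Random(seed)
    x = [rng.uniform(-1, 1) for _ in range(n)]
    y = [v * v + rng.gauss(0, 0.05) for v in x]
    return pearson(x, y), distance_correlation(x, y)


r, dcor = quadratic_demo()
print(f"Pearson r:            {r:+.2f}")
print(f"distance correlation: {dcor:.2f}")
```

On a symmetric parabola like this, Pearson's r should hover near zero while the distance correlation should come out clearly larger. The quadratic-time loops are fine for a few hundred points but would need a smarter implementation for anything big.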

the neuroinformatics of Neopets

In the process of writing a short piece for the APS Observer, I was fiddling around with Google Correlate earlier this evening. It’s a very neat toy, but if you think neuroimaging or genetics have a big multiple comparisons problem, playing with Google Correlate for a few minutes will put things in perspective. Here’s a line graph displaying the search term most strongly correlated (over time) with searches for “neuroinformatics”:

That’s right, the search term that covaries most strongly with “neuroinformatics” is none other than “Illinois film office” (which, to be fair, has a pretty appealing website). Other top matches include “wma support”, “sim codes”, “bed-in-a-bag”, “neopets secret”, “neopets guild”, and “neopets secret avatars”.

I may not have learned much about neuroinformatics from this exercise, but I did get a pretty good sense of how neuroinformaticians like to spend their free time…

 

p.s. I was pretty surprised to find that normalized search volume for just about every informatics-related term has fallen sharply in the last 10 years. I went in expecting the opposite! Maybe all the informaticians were early search adopters, and the rest of the world caught up? No, probably not. Anyway, enough of this; Neopia is calling me!

p.p.s. Seriously though, this is why data fishing expeditions are dangerous. Any one of these correlations is significant at p-less-than-point-whatever-you-like. And if your publication record depended on it, you could probably tell yourself a convincing story about why neuroinformaticians need to look up Garmin eMaps…
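If you want to convince yourself of this without involving Neopets, here's a minimal simulation sketch. It generates one "target" series and then goes fishing through a couple of thousand completely unrelated series for the best match. The random-walk model of search volume, the series length, and the number of candidates are all assumptions picked for illustration; Google Correlate obviously searches a vastly larger space.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def random_walk(n, rng):
    """A drifting series with no relationship to anything else."""
    level, out = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        out.append(level)
    return out


def best_spurious_match(n_points=100, n_candidates=2000, seed=0):
    """Return the strongest correlation found between a target series and
    n_candidates independent series that are unrelated by construction."""
    rng = random.Random(seed)
    target = random_walk(n_points, rng)
    best = 0.0
    for _ in range(n_candidates):
        r = pearson(target, random_walk(n_points, rng))
        if abs(r) > abs(best):
            best = r
    return best


print(f"best match among unrelated series: r = {best_spurious_match():+.2f}")
```

Every candidate here is noise by construction, so whatever the winning correlation turns out to be, it means nothing. The more candidates you search through, the more impressive the meaningless winner looks.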

Attention publishers: the data in your tables want to be free! Free!

The Neurosynth database is getting an upgrade over the next couple of weeks; it’s going to go from 4,393 neuroimaging studies to around 5,800. Unfortunately, updating the database is kind of a pain, because academic publishers like to change the format of their full-text HTML articles, which has a nasty habit of breaking the publisher-specific HTML parsers I’ve written. When you expect ScienceDirect to give you <table cellspacing=10>, but you get <table> with no cellspacing attribute (the horror!), bad things happen in XPath land. And then those bad things need to be repaired. And I hate repairing stuff! So I don’t do it very often. Like, once every 6 to 9 months.

In an ideal world, there would be no need to write (and fix) custom filters for different publishers, because the publishers would all simultaneously make XML representations of their articles available (in addition to HTML, PDF, etc.), and then people who have legitimate data mining reasons for regularly downloading hundreds of articles at a time wouldn’t have to cry themselves to sleep every night. But as it stands, only one major publisher of neuroimaging articles (PLoS) provides XML versions of all articles. A minority of articles from other publishers are available in XML from BioMed Central, but that’s still just a fraction of the existing literature.

Anyway, the HTML thing is annoying, but it’s possible to work around it. What’s much more problematic is that some publishers lock up the data in the tables of their articles. To make Neurosynth work, I have to be able to identify rows in tables that look like brain activations. That is, things that look roughly like this:

Most publishers are nice enough to format article tables as HTML tables; which is to say, I can look for tags like <table> and then work down the XPath tree to identify all the rows, and then scan each row for values that look activation-like. Then those values go into the database, and poof, next thing you know, you have meta-analytic brain activation maps from hundreds of studies. But some publishers–most notably, Frontiers–throw a wrench in the works by failing to format tables in HTML; instead, they present the tables as images (see for instance this JPEG table, pulled from this article). Which means I can’t really extract any data from them, and as a result, you’re not going to see activations from articles published in Frontiers journals in Neurosynth any time soon. So if you publish fMRI articles in Frontiers in Human Neuroscience regularly, and are wondering why I’ve been ignoring you (I like you! I promise!), now you know.
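For the curious, the row-scanning step can be sketched roughly like this in Python, using only the standard library. This is not the actual Neurosynth code (which, as described above, relies on publisher-specific XPath parsers); the `ActivationTableParser` class, the `looks_like_activation` heuristic, and the sample table are all hypothetical. The rule used here, "three consecutive numeric cells small enough to be x/y/z coordinates in millimeters", is an assumption for illustration and would produce false positives on real articles.

```python
import re
from html.parser import HTMLParser

NUMBER = re.compile(r"^[-+]?\d+(\.\d+)?$")


def to_number(cell):
    """Return the cell text as a float, or None if it isn't a plain number."""
    text = cell.strip().replace("\u2212", "-")  # normalize the Unicode minus sign
    return float(text) if NUMBER.match(text) else None


def looks_like_activation(cells, limit=120.0):
    """Hypothetical heuristic: return the first run of three consecutive
    numeric cells with plausible coordinate magnitudes, else None."""
    values = [to_number(c) for c in cells]
    for i in range(len(values) - 2):
        triple = values[i:i + 3]
        if all(v is not None and abs(v) <= limit for v in triple):
            return tuple(triple)
    return None


class ActivationTableParser(HTMLParser):
    """Collects the cell text of every table row. Tag attributes are
    ignored, so a missing cellspacing attribute changes nothing."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_activations(html):
    """Return a list of (x, y, z) tuples for rows that look like activations."""
    parser = ActivationTableParser()
    parser.feed(html)
    found = []
    for row in parser.rows:
        coords = looks_like_activation(row)
        if coords is not None:
            found.append(coords)
    return found


SAMPLE = """
<table>
  <tr><th>Region</th><th>x</th><th>y</th><th>z</th><th>t</th></tr>
  <tr><td>L amygdala</td><td>-22</td><td>-4</td><td>-18</td><td>5.31</td></tr>
  <tr><td>R insula</td><td>38</td><td>20</td><td>2</td><td>4.87</td></tr>
</table>
"""

print(extract_activations(SAMPLE))
```

Note what happens when the table is an image: there are no rows and no cells, so the function returns an empty list, which is exactly the Frontiers problem.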

Anyway, on the remote chance that anyone reading this has any sway with people high up at Frontiers, could you please ask them to release their data? Pretty please? Lack of access to data in tables seems to be a pretty common complaint in the data mining community; I’ve talked to other people in the neuroinformatics world who’ve also expressed frustration about it, and I imagine the same is true of people in other disciplines. It’s particularly surprising given that Frontiers is, in theory, an open access publisher. I can see the data in your tables, Frontiers; why won’t you also let me read it?

Okay, I know this kind of stuff doesn’t really interest anyone; I’m just venting. The main point is, Neurosynth is going to be bigger and (very slightly) better in the near future.

in which Discover Card decides that my wife is also my daughter

Ever since I opted out of receiving preapproved credit card offers, I’ve stopped getting credit card spam in the mail (yay!). But companies I have an existing relationship with still have the right to send me various offers and updates, and there’s nothing I can do about that (except throw said offers in the trash after inspecting them and deciding that, no, I do not want to purchase the premium yacht travel insurance policy that comes with a bonus free set of matching lawn gnomes and a voucher for a buy-one-get-one-free meal at the Olive Garden). Discover Card is one of these companies, and the clever devils regularly take advantage of my amicable nature by sending me all kinds of wonderful offers. Take for instance the one I received yesterday, which starts like this:

Dear Tal,

You’ve worked for years to provide a better life for your children and prepare them for a successful future. Now that they’re in college, the overwhelming cost of higher education shouldn’t stand in the way of their success. We’re ready to help.

This is undoubtedly a very generous offer, but it comes at an inconvenient time for me, because, as it so happens, I don’t have any children right now–let alone college-aged children who need their father to front them some money. Somewhere, somehow, it seems Discover Card took a left turn at Albuquerque, when all along they were trying to get to Pismo Beach:

http://www.youtube.com/watch?v=v-s-_ME8Qns#t=1m24s

Of course, this isn’t a case of human error; I very much doubt that an overworked analyst is putting in long nights at Discover combing through random customers’ accounts looking for purchases diagnostic of college attendance (you know, like Ritalin receipts). The blame almost certainly rests with an over-inclusive algorithm that combed through my purchase history and automagically decided that I fit the profile of a middle-aged man who’s worked hard for years to provide a better life for his children. (I suppose I can take solace in the fact that while Discover probably knows what brand of toothpaste I like, it must not know my age, given that there aren’t many 31-year-old men with college-aged children.)

Anyway, I spent some time pondering what purchases I’ve made that could have tripped up Discover’s parental alarm system. And after scanning several months of statements, I’m proud to report it almost certainly has something to do with the giant monthly rent charge from “CU Residence Halls” (my wife and I live in on-campus housing). Either that or the many book-and-coffee-related charges from places with names like “University of Colorado Bookstore” and “Pretentious Coffeehouse on CU Campus”.

So that’s easy enough, right? It’s the on-campus purchases, stupid! Ah, but wait! That’s only one part of the mystery! The other, perhaps more interesting, part is this: who exactly does Discover think my college-aged child is, seeing as they clearly think I’m not the one caffeinating myself at the altar of higher education? Well, after thinking about that for a while, another clear answer emerges: it’s my wife! Discover thinks I have a college-aged daughter who also happens to be my wife! There’s no other explanation; to my knowledge, I don’t live with anyone else besides my wife (though, admittedly, I don’t check the storage closet very often).

Now, setting aside the fact that such a thing would be illegal in all fifty states, my wife and I are not very amused by this. We’re mildly amused, but we’re not very amused. But we’re refraining from making too big a fuss about it, because we’re still hoping we can get our hands on some of those sweet, sweet college loans.

In the interim, here are some questions I find myself pondering:

  • Who writes the logic that does this kind of thing? I’m not asking for names; no need to rat out your best friend who works in Discover’s data mining department. I’m just curious to know what kind of background the people who come up with these things have. Artificial intelligence? Marketing research? Dental surgery?
  • How sophisticated are the rules used to screen customers for these mailings? Is there some serious business logic operating behind the scenes that happened to go wrong here, or is a well-meaning Discover employee just running SQL queries like “SELECT name, address FROM members WHERE description LIKE ‘%residence hall%’” on their lunch break?
  • Do credit card companies that do this kind of thing (which I imagine is pretty much all of them) actually validate their logic against test datasets (in this case, a large group of Discover members whose parental status has been independently verified), or do they just pick some criteria that seem to make sense and immediately start blanketing the United States with flyers?
  • What proportion of false positives is considered reasonable? Clearly, with any kind of program like this, some small number of customers is almost invariably going to get a letter that makes some very bad lifestyle assumptions. At what point does the risk of a backlash start to outweigh the potential for increased revenue? Obviously, the vast majority of people are probably going to chalk this type of thing up to a harmless error, but I imagine some small proportion of people are going to get upset and call up Discover to rant and rave about how they don’t have any children at all, and how dare Discover mine their records like this, and doesn’t Discover have any respect for them as loyal long-standing cardholders, and what’s that, why yes, of course, they’d be quite happy to accept Discover’s apology for this tragic error if it came with a two-for-one gift certificate to the Olive Garden.
  • Most importantly: is it considered fraud if I knowingly fill out an application for student loans in my lovely wife-daughter’s name?
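
For the validation and false-positive questions above, here is a minimal sketch of what checking such a rule against verified data might look like. Everything in it is hypothetical: the `charges` table, the member names, the set of verified parents, and the rule itself are invented for illustration and say nothing about what Discover actually does.

```python
import sqlite3


def flag_probable_parents(conn):
    """Naive rule: anyone with a residence-hall charge gets flagged as a parent."""
    rows = conn.execute(
        "SELECT DISTINCT name FROM charges WHERE description LIKE '%residence hall%'"
    )
    return {name for (name,) in rows}


def false_positive_share(flagged, verified_parents):
    """Fraction of flagged members who are not in the verified-parent set."""
    if not flagged:
        return 0.0
    return len(flagged - verified_parents) / len(flagged)


# Tiny made-up dataset standing in for a real purchase history.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE charges (name TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO charges VALUES (?, ?)",
    [
        ("member_a", "CU Residence Halls"),      # lives on campus, no children
        ("member_b", "State U Residence Hall"),  # pays for a child's dorm
        ("member_c", "Grocery store"),
    ],
)

flagged = flag_probable_parents(conn)
verified_parents = {"member_b"}  # pretend this was independently verified
print(sorted(flagged), false_positive_share(flagged, verified_parents))
```

With this toy data the rule flags two members and is wrong about one of them. Whether a rate like that is acceptable depends entirely on what a wrong letter costs compared with what a right one earns.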

what Ben Parker wants you to know about neuroimaging

I have a short opinion piece in the latest issue of The European Health Psychologist that discusses some of the caveats and limits of functional MRI. It’s a short and (I think) pretty readable piece; I touch on a couple of issues I’ve discussed frequently in other papers as well as here on the blog–namely, the relatively low power of most fMRI analyses and the difficulties inherent in drawing causal inferences from neuroimaging results.

More importantly, though, I’ve finally fulfilled my long-held goal of sneaking a Spider-Man reference into an academic article (though, granted, one that wasn’t peer-reviewed). It would be going too far to say I can die happy now, but at least I can have an extra large serving of ice cream for dessert tonight without feeling guilty*. And no, I’m not going to spoil the surprise by revealing what Spidey has to do with fMRI. Though I will say that if you actually fall for the hook and go read the article just to find that out, you’re likely to be sorely disappointed.

 

* So okay, the truth is, I never, ever feel guilty for eating ice cream, no matter the serving size.