deconstructing the turducken

This is fiction. Which means it’s entirely made up, and definitely not at all based on any real people or events.

 

Cornelius Kipling came over to our house for Thanksgiving. I didn’t invite him; I would never, ever invite him. He was guaranteed to show up slightly drunk and very belligerent, carrying a two-thirds empty bottle of cheap wine, which he’d then hand to us as if it had arrived unopened from some fancy French cellar.

Cornelius Kipling was never invited; he invited himself.

“Good to see you,” he said to me when we let him in. “Thanks for inviting me over. It’s very kind of you, seeing as how my other plans fell through at the last minute.”

“Hi Kip,” I said, knowing full well he’d never had any other plans.

“Ella,” Kip nodded in my wife’s general direction, taking care not to make direct eye contact. He’d learned from extended experience that once he made eye contact with people, it became much harder to ignore social cues.

“Cornelius,” she said, through a mouth as thin as a zipper.

“Just Kip is fine,” said Kip.

“Cornelius,” my wife repeated, louder this time.

“What are we having for dinner,” Kip asked, handing me a two-thirds empty bottle of Zinfandel.

“Well,” said Ella, “I was going to make a turducken. But now that you’re here, I figure I should make something special. So we’re having frozen chicken nuggets and mashed potatoes.”

“We spare no expense!” I added cheerfully.

“Funny you should mention turducken,” Kip said, ignoring our jabs. “My new business plan is based on the turducken.”

“Oh really,” I said. “Do pray tell.”

I wasn’t surprised Kip had a new business plan. If anything, I was surprised he’d managed to get as far as exchanging pleasantries before launching into a graphic description of his latest scheme.

“Well,” he said, “it’s not really based on the turducken. The turducken is more of an analogy. To illustrate what it is that my new startup does.”

“And what is it that your new startup does,” Ella’s mouth asked, though the rest of her face very clearly did not care to hear the answer.

“We miniaturize data,” Kip said. He waved his hands in the air with a flourish and looked at us expectantly. It made me think back to something my wife had said about Kip after the first time she ever met him: He thinks he’s a magician, and he acts like he’s a magician, but none of his tricks ever work.

“Prithee, do continue,” I said.

“We take big datasets,” he said. “Large datasets. Enormous datasets. Doesn’t matter what kind of data. You give it to us, and we miniaturize it. We give you back a much smaller dataset. And then you carry on your work with your wonderfully shrunken new spreadsheet, which keeps only the important trends and throws out all of the unnecessary details.”

“Interesting,” I nodded. On a scale of one-to-Kipsanity, this one was a solid five. “And the turducken figures into this how?”

“Weeeeeell, imagine someone hands you a turducken and asks you to figure out what’s in it,” said Kip. “I grant that this may not happen to you very often, but it happens all the time in KipLand. So, you know there’s a bunch of birds in there, all stuffed into each other’s–well, you know–but you don’t know which birds. All you see is this giant deep-fried bird collage, and you want to disassemble it into a set of discrete, identifiable fowls. Now, you hear a lot about how to construct a turducken. But if you think about it, deconstructing a turducken is a much more interesting engineering problem. And that’s what my new venture is all about. We take a complicated mass of data and pick out all the key elements that went into it. Deconstructing the turducken.”

He did the little flourish with his hands again. Again, Ella’s words rang out in my head. None of his tricks ever work.

“That’s quite possibly the craziest thing I’ve ever heard,” I observed. “This whole turducken analogy isn’t working so well for me. I hope you haven’t put it in your promotional materials.”

Kip stared at me unpleasantly for a good ten or fifteen seconds.

“Actually, I take that back,” I said. “That conversation we had about the shinbones on Isaac Newton’s coat of arms that time I ran into you at the dry cleaner’s… that was an order of magnitude more ridiculous.”

Maybe it was a mean thing to say, but you have to understand: my friendship with Kip is built entirely on mutual abuse. And he who flinches first, loses.

“Whatever,” Kip said. He looked annoyed, which filled me with schadenfreude. It wasn’t often he got to experience the full range of emotions he routinely visited on others.

“I didn’t come here to talk about turducken,” he continued. “You brought up the turducken, not me. I just wanted to get your opinion on something…”

Again the hand flourish. Again the voice.

“I’m trying to figure out what to call my new startup,” he said. “Which do you like better: ‘Small data’ or ‘little data’? Neither has the ring of ‘big data’, but I think both sound better than ‘Kipling Data Miniaturization Services’.”

“How about MiniData,” Ella offered. I noticed she was hitting the wine pretty hard, though we both knew it would do nothing to blunt the Kipling trauma.

“Or maybe NanoData,” I offered. “If you can make the data small enough. What level of compression are you aiming for?”

“Oh, sky’s the limit. Actually, that’s one of the unique features of my service. Most compression schemes have a fixed limit. Take a standard algorithm like bzip2. You compress text, you might get a file 10% of the size if you’re lucky. But binary data? You’ll be lucky if you shrink it by a factor of three. Now, with my NanoData compression service, you as the customer get to choose how much or how little you want. And you select the output format. You can hand me a terabyte of data and say, ‘Dr. Kipling, sir, I want you to distill this eight-dimensional MATLAB array down to a single Excel spreadsheet, no more than 10 rows by 10 columns.’ And that’s exactly what you’ll get.”

“And this miraculously distilled dataset that you give me… will it, by chance, have any passing resemblance to the original dataset I gave you?”

“Oh, sure, if you want it to,” said Kip. “But the fidelity service costs double.”

I resisted the overpowering urge to facepalm.

“Well, it’s certainly not the worst idea you’ve ever had,” I said diplomatically. “But I have to say, I’m amazed you keep launching new startups. A lesser man would have given up ten or twelve bankruptcies ago.”

“I guess I just have an uncanny sense for ideas ten years ahead of their time,” Kip smiled.

“Ten years ahead of anyone’s time,” Ella muttered.

“Right,” I said. “You’re a visionary. You have… the visions. Hey, what happened to that deli you were going to open? The one that was going to sell premium hay sandwiches? I thought that one was going to make it for sure.”

“Terrible shame. Turns out it’s very difficult to get sandwich-grade hay in Colorado. So, you know, it didn’t pan out. Very sad; I even had a name picked out: Hay Day Sandwiches. Get it?”

I didn’t really get it, but still nodded in mock sympathy.

“Anyway, since you brought up my new startup,” Kip said, oblivious to the death rays radiating towards him from Ella’s head, “let me take this opportunity to give the both of you the opportunity of your lifetime. I like you guys, so I’m going to cut you in as my very first angel investors. All I’m asking…”

And here he paused, looking at us. I knew what he was doing; he was trying to gauge our level of displeasure with him so he could pick a number that was sufficiently high, but not completely ridiculous.

“…is fifteen thousand,” he finished. “You get 5% of equity, and I’ll even throw in some nice swag. I’m having mugs and frisbees printed up as we speak.”

Around this time, Ella put her head down on her arms; she may or may not have been softly sobbing, I couldn’t really tell.

“That’s quite an offer, Kip,” I said. “And I’m really glad you like me enough to make it. It’s not like I’ve ever bought into your ideas before, but then, the thing I like best about you is how you never take repeated failure for an answer. Unfortunately, I just don’t have fifteen thousand right now. I just spent my last fifteen thousand souping up an old John Deere lawnmower so I can drive around the bike path blaring Ridin’ Dirty from three hundred watt speakers while glowing pink neon lights presage my arrival by five hundred feet. You should see it, it’s beautiful. But I swear, if I hadn’t done that, I’d be ready to sign on the dotted line right now.”

“That’s quite alright,” Kip said. “No harm, no foul. Your loss, my gain. It’s probably crazy of me to give up that much equity for so little anyway; this idea is going to make millions. No. Billions.”

He paused just long enough for some of the delusion to drip off; then I watched in real time as yet another unwise idea corkscrewed through his ear and crawled into his brain.

“Hey,” he said. “I’ve never thought of pimping out a John Deere lawnmower, but that’s a pretty good idea too. You sound like you have some experience with this now; want to go fifty-fifty on a startup? I’ll provide the salesmanship and take advantage of my many business contacts. You provide the technical knowledge. Ella, you can get in on this too; we’ll throw in a free turducken with every purchase.”

This time I definitely heard my wife sobbing, and just like that, it was time for Cornelius Kipling to leave.

a human and a monkey walk into an fMRI scanner…

Tor Wager and I have a “news and views” piece in Nature Methods this week; we discuss a paper by Mantini and colleagues (in the same issue) introducing a new method for identifying functional brain homologies across different species–essentially, identifying brain regions in humans and monkeys that seem to do roughly the same thing even if they’re not located in the same place anatomically. Mantini et al make some fairly strong claims about what their approach tells us about the evolution of the human brain (namely, that some cortical regions have undergone expansion relative to monkeys, while others have adapted substantively new functions). For reasons we articulate in our commentary, I’m personally not so convinced by the substantive conclusions, but I do think the core idea underlying the method is a very clever and potentially useful one:

Their technique, interspecies activity correlation (ISAC), uses functional magnetic resonance imaging (fMRI) to identify brain regions in which humans and monkeys exposed to the same dynamic stimulus—a 30-minute clip from the movie The Good, the Bad and the Ugly—show correlated patterns of activity (Fig. 1). The premise is that homologous regions should have similar patterns of activity across species. For example, a brain region sensitive to a particular configuration of features, including visual motion, hands, faces, object and others, should show a similar time course of activity in both species—even if its anatomical location differs across species and even if the precise features that drive the area’s neurons have not yet been specified.

Mo Costandi has more on the paper in an excellent Guardian piece (and I’m not just saying that because he quoted me a few times). All in all, I think it’s a very exciting method, and it’ll be interesting to see how it’s applied in future studies. I think there’s a fairly broad class of potential applications based loosely around the same idea of searching for correlated patterns. It’s an idea that’s already been used by Uri Hasson (an author on the Mantini et al paper) and others fairly widely in the fMRI literature to identify functional correspondences across different subjects; but you can easily imagine conceptually similar applications in other fields too–e.g., correlating gene expression profiles across species in order to identify structural homologies (actually, one could probably try this out pretty easily using the mouse and human data available in the Allen Brain Atlas).
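To make the core idea concrete, here is a minimal sketch of the "search for correlated time courses" logic. It is not the pipeline from the Mantini et al paper: it assumes you already have one time course per region for each species, sampled at the same time points of the same stimulus, and it ignores preprocessing and any species differences in the hemodynamic response. The function names `cross_species_correlation` and `best_matches`, and the time-by-region array layout, are hypothetical choices made for illustration.

```python
import numpy as np


def cross_species_correlation(human_ts, monkey_ts):
    """Pearson correlation between every human and every monkey time course.

    Both inputs are (time points x regions) arrays recorded during the same
    stimulus. Returns an (n_human_regions x n_monkey_regions) matrix.
    """
    h = np.asarray(human_ts, dtype=float)
    m = np.asarray(monkey_ts, dtype=float)
    if h.shape[0] != m.shape[0]:
        raise ValueError("both species need the same number of time points")
    # z-score each region's time course (population standard deviation)
    hz = (h - h.mean(axis=0)) / h.std(axis=0)
    mz = (m - m.mean(axis=0)) / m.std(axis=0)
    # the mean product of z-scores is the Pearson correlation
    return hz.T @ mz / h.shape[0]


def best_matches(corr):
    """For each human region, index of the most correlated monkey region."""
    return corr.argmax(axis=1)


# Toy illustration with made-up data: three shared "functional" signals that
# sit in a different region order in each species, plus independent noise.
rng = np.random.default_rng(42)
shared = rng.standard_normal((300, 3))
human = shared + 0.5 * rng.standard_normal((300, 3))
monkey = shared[:, [2, 0, 1]] + 0.5 * rng.standard_normal((300, 3))

corr = cross_species_correlation(human, monkey)
print(np.round(corr, 2))
print(best_matches(corr))
```

The point of the toy example is that the matching depends only on the time courses, not on where a region sits in the array, which is the analogue of finding functional correspondences that don't respect anatomical location.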

Mantini D, Hasson U, Betti V, Perrucci MG, Romani GL, Corbetta M, Orban GA, & Vanduffel W (2012). Interspecies activity correlations reveal functional correspondence between monkey and human brain areas. Nature Methods. PMID: 22306809

Wager, T., & Yarkoni, T. (2012). Establishing homology between monkey and human brains. Nature Methods. DOI: 10.1038/nmeth.1869

no free lunch in statistics

Simon and Tibshirani recently posted a short comment on the Reshef et al MIC data mining paper I blogged about a while back:

The proposal of Reshef et. al. (“MIC”) is an interesting new approach for discovering non-linear dependencies among pairs of measurements in exploratory data mining. However, it has a potentially serious drawback. The authors laud the fact that MIC has no preference for some alternatives over others, but as the authors know, there is no free lunch in Statistics: tests which strive to have high power against all alternatives can have low power in many important situations.

They then report some simulation results clearly demonstrating that MIC is (very) underpowered relative to Pearson correlation in most situations, and performs even worse relative to Székely & Rizzo’s distance correlation (which I hadn’t heard about, but will have to look into now). I mentioned low power as a potential concern in my own post, but figured it would be an issue under relatively specific circumstances (i.e., only for certain kinds of associations in relatively small samples). Simon & Tibshirani’s simulations pretty clearly demonstrate that isn’t so. Which, needless to say, rather dampens the enthusiasm for the MIC statistic.
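For anyone else who hadn't run into distance correlation before, here is a minimal sketch of the sample statistic for two one-dimensional variables, built from double-centered pairwise distance matrices. It is a plain NumPy illustration, not Simon & Tibshirani's simulation code, and it does not compute MIC or run any power analysis; the function name `distance_correlation` is my own label. Turning the statistic into a test would additionally need something like a permutation procedure, which is left out here.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation between two 1-D arrays of equal length."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("x and y must be 1-D arrays of the same length")

    # pairwise absolute differences within each variable
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])

    # double-center: subtract row and column means, add back the grand mean
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()

    dcov2 = (A * B).mean()   # squared sample distance covariance
    dvar_x = (A * A).mean()  # squared sample distance variance of x
    dvar_y = (B * B).mean()  # squared sample distance variance of y

    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0  # one of the variables is constant
    return float(np.sqrt(max(dcov2, 0.0) / denom))


# A symmetric quadratic relationship: Pearson's r is essentially zero here,
# while the distance correlation is clearly above zero.
x = np.linspace(-1, 1, 201)
y = x ** 2
print("Pearson r:           ", np.corrcoef(x, y)[0, 1])
print("distance correlation:", distance_correlation(x, y))
```

The pairwise distance matrices make this O(n²) in memory, so the sketch is only meant for small samples.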