estimating the influence of a tweet–now with 33% more causal inference!

Twitter is kind of a big deal. Not just out there in the world at large, but also in the research community, which loves the kind of structured metadata you can retrieve for every tweet. A lot of researchers rely heavily on Twitter to model social networks, information propagation, persuasion, and all kinds of interesting things. For example, here’s the abstract of a nice recent paper on arXiv that aims to predict successful memes using network and community structure:

We investigate the predictability of successful memes using their early spreading patterns in the underlying social networks. We propose and analyze a comprehensive set of features and develop an accurate model to predict future popularity of a meme given its early spreading patterns. Our paper provides the first comprehensive comparison of existing predictive frameworks. We categorize our features into three groups: influence of early adopters, community concentration, and characteristics of adoption time series. We find that features based on community structure are the most powerful predictors of future success. We also find that early popularity of a meme is not a good predictor of its future popularity, contrary to common belief. Our methods outperform other approaches, particularly in the task of detecting very popular or unpopular memes.

One limitation of much of this body of research is that the data are almost invariably observational. We can build sophisticated models that do a good job predicting some future outcome (like meme success), but we don’t necessarily know that the “important” features we identify carry any causal influence. In principle, they could be completely epiphenomenal–for example, in the study I linked to, maybe the community structure features are just a proxy for some other, causally important, factor (e.g., whether the content of a meme has sufficiently broad appeal to attract attention from many different kinds of people). From a predictive standpoint, this may not matter much; if your goal is just to passively predict whether a meme is going to be successful or not, it’s irrelevant whether or not the features you’re using are doing causal work. On the other hand, if you want to actively design memes in such a way as to maximize their spread, the ability to get a handle on causation starts to look pretty important.

How can we estimate the direct causal influence of a tweet on the downstream popularity of a meme? Here’s a simple and (I suspect) very feasible idea in two steps:

  1. Create a small web app that allows any existing Twitter user to register via Twitter authentication. On signing up, a user has to specify just one (optional) setting: the proportion of their intended retweets they’re willing to withhold. Let’s call this the Withholding Fraction (WF).
  2. Every time (or at least some of the time) a registered user wants to retweet a particular tweet*, they do so via the new web app’s interface (which has permission to post to the user’s Twitter account) instead of whatever interface they’re currently using. The key is that the retweet isn’t just obediently passed along; instead, the target tweet is retweeted successfully with probability (1 – WF), and randomly suppressed from the user’s stream with probability WF.
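The randomization in step 2 amounts to an independent coin flip per intended retweet. Here is a minimal sketch, assuming a hypothetical `post_retweet` callback that stands in for whatever authorized call the app would actually use to publish the retweet. No real Twitter API call is shown, and the class and function names are illustrative only. Every intended retweet gets logged, posted or not, because the withheld ones are the control group.

```python
import random
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class WithholdingLog:
    """In-memory record of every intended retweet and its random assignment (illustrative only)."""
    records: list = field(default_factory=list)

    def add(self, user_id: str, tweet_id: str, wf: float, posted: bool) -> None:
        self.records.append(
            {"user_id": user_id, "tweet_id": tweet_id, "wf": wf, "posted": posted}
        )


def decide_retweet(wf: float, rng: random.Random) -> bool:
    """Return True (post) with probability 1 - wf and False (withhold) with probability wf."""
    if not 0.0 <= wf <= 1.0:
        raise ValueError("withholding fraction must be between 0 and 1")
    # rng.random() is uniform on [0, 1), so P(rng.random() >= wf) = 1 - wf.
    return rng.random() >= wf


def handle_retweet_request(
    user_id: str,
    tweet_id: str,
    wf: float,
    rng: random.Random,
    log: WithholdingLog,
    post_retweet: Callable[[str, str], None],
) -> bool:
    """Pass along or suppress one intended retweet at random, and log the outcome.

    `post_retweet` is a hypothetical placeholder for whatever authorized call the app
    would use to publish the retweet; it is not a real Twitter API function.
    """
    posted = decide_retweet(wf, rng)
    if posted:
        post_retweet(user_id, tweet_id)
    # Withheld retweets are logged too: they are the randomized control group.
    log.add(user_id, tweet_id, wf, posted)
    return posted
```

Storing `wf` alongside each decision is the important design choice, because it is the known assignment probability that any later analysis depends on. A fixed seed (e.g. `random.Random(0)`) makes simulations reproducible. A live deployment would presumably use an unseeded `random.Random()`, which seeds itself from system entropy.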

Doing this would allow the community to very quickly (assuming rapid adoption, which seems reasonably likely) build up an enormous database of tweets that were targeted for retweeting by an active user, but randomly assigned to fail with some known probability. Researchers would then be able to directly quantify the causal impact of individual retweets on downstream popularity–and to estimate that influence conditional on all of the other standard variables, like the retweeter’s number of followers, the content of the tweet, etc. Of course, this still wouldn’t get us to true experimental manipulation of such features (i.e., we wouldn’t be manipulating users’ follower networks, just randomly omitting tweets from users with different followers), but it seems like a step in the right direction**.
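Because each retweet was withheld with a known probability, the comparison described above reduces to a standard randomized-experiment estimate. The post doesn’t specify an analysis, so what follows is one conventional option and should be read as a sketch. It assumes each logged record has been joined with some downstream popularity measure. The `outcome` field is hypothetical, for instance the number of further retweets the original tweet received within a fixed window. `ipw_effect` is a Horvitz–Thompson (inverse-probability-weighted) estimator. It weights each record by the inverse of its known assignment probability, so it tolerates users picking different WFs. `stratified_effect` compares posted and withheld retweets within strata of a covariate, such as a follower-count bucket. It assumes the assignment probability is constant within each stratum, so including WF in the stratum key is the safe choice.

```python
from collections import defaultdict
from statistics import mean


def ipw_effect(records):
    """Horvitz-Thompson estimate of the average effect of a retweet being posted.

    Each record needs 'posted' (bool), 'wf' (so P(posted) = 1 - wf) and a hypothetical
    'outcome' field (e.g. downstream retweet count). Records with wf of exactly 0 or 1
    were never randomized, so they are skipped.
    """
    usable = [r for r in records if 0.0 < r["wf"] < 1.0]
    if not usable:
        raise ValueError("no randomized records")
    n = len(usable)
    # Posted records are weighted by 1 / P(posted), withheld ones by 1 / P(withheld).
    treated = sum(r["outcome"] / (1.0 - r["wf"]) for r in usable if r["posted"])
    control = sum(r["outcome"] / r["wf"] for r in usable if not r["posted"])
    return (treated - control) / n


def stratified_effect(records, stratum_of):
    """Posted-minus-withheld difference in mean outcome within each stratum,
    averaged across strata and weighted by stratum size.

    `stratum_of` maps a record to a stratum label. Strata that lack either
    posted or withheld records are dropped.
    """
    groups = defaultdict(lambda: {True: [], False: []})
    for r in records:
        groups[stratum_of(r)][r["posted"]].append(r["outcome"])
    total, weighted = 0, 0.0
    for arms in groups.values():
        if arms[True] and arms[False]:
            size = len(arms[True]) + len(arms[False])
            weighted += size * (mean(arms[True]) - mean(arms[False]))
            total += size
    if total == 0:
        raise ValueError("no stratum contains both posted and withheld records")
    return weighted / total
```

For example, `stratified_effect(records, lambda r: (r["wf"], bucket))` would stratify on WF together with some follower-count bucket. Here `bucket` is whatever binning the analyst chooses, not something the post defines. Either estimate describes the average effect of a retweet from this pool of volunteers, relative to withholding it.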

I figure building a barebones app like this would take an experienced developer familiar with the Twitter OAuth API just a day or two. And I suspect many people (myself included!) would be happy to contribute to this kind of experiment, provided that all of the resulting data were made public. (I’m aware that there are all kinds of restrictions on sharing assembled Twitter datasets, but we’re not talking about sharing firehose dumps here, just a restricted set of retweets from users who’ve explicitly given their consent to have the data used in this way.)

Has this kind of thing already been done? If not, does anyone want to build it?

 

* It doesn’t just have to be retweets, of course; the same principle would work just as well for withholding a random fraction of original tweets. But I suspect not many users would be willing to randomly eliminate a proportion of their original content from the firehose.

** If we really wanted to get close to true random assignment, we could potentially inject chosen tweets into random users’ streams based on specified criteria. But I’m not sure how many tweeps would consent to have entirely random retweets published in their name (I probably wouldn’t), so this probably isn’t viable.

then gravity let go

This is fiction.


My grandmother’s stroke destroyed most of Nuremberg and all of Wurzburg. She was sailing down the Danube on a boat when it happened. I won’t tell you who she was with and what they were doing at the time, because you’ll think less of her for it, and anyway it’s not relevant to the story. But she was in the boat, and she was alive and happy, and then the next thing you know, she was unhappy and barely breathing. They were so far out in the water that she would have been dead if the other person she was with had had to row all the way back. So a medical helicopter was sent out, and they strapped her to the sky with hooks and carried her to the hospital dangling sixty feet below a tangle of blades.

All of her life, my grandmother was afraid of heights. She never got on a plane; never even went up a high-rise viewing deck to see the city unfold below her like a tourist map. “No amount of money or gratitude you could give me is worth the vertigo that I’d get when I felt my life rushing away below me,” she told me once. She was very melodramatic, my grandmother. It figures that the one time her feet actually refused gravity long enough for it to count, she was out like a light. That her life started to rush away from her not in an airplane over the sea, as she’d always feared, but in a boat on the water. That it took a trip into the same sky she loathed so much just to keep her alive.

*          *          *

Bavaria occupies the southeast corner of Germany; by area, it makes up one-fifth of the country. It’s the largest state, and pretty densely populated, but for all that, I don’t remember there being very much to do there. When I was a child, we used to visit my grandmother in Nuremberg in the summers. I remember the front of her brown and white house, coated in green vines, gently hugging the street the way the houses do in Europe. In America, we place our homes a modest distance away from the road, safely detached in their own little fiefdoms. I’ll just be back here, doing my own thing, our houses say. You just keep walking along there, sir, and don’t try to look through my windows. When Columbus discovered all that land, what he was really discovering was the driveway.

When we visited my grandmother, I’d slam the car door shut, run up the steps, and knock repeatedly until she answered. She’d open the door, look all around, and then, finally seeing me, ask, “Who is this? Who are you?” That was the joke when I was very young. Who Are You was the joke, and after I yelled “Grandma, it’s me!” several times, she’d always suddenly remember me, and invite me in to feed me schnitzel. “Why didn’t you say it was you,” she’d say. “Are you trying to give an old lady a heart attack? Do you think that’s funny?”

After her stroke, Who Are You was no longer funny. The words had a different meaning, and when I said, “Grandma, it’s me,” she’d look at me sadly, with no recognition, as if she were wondering what could have happened to her beloved Bavaria; how the world could have gotten so bad that every person who knocked on her door now was a scoundrel claiming to be her grandson, lying to an old lady just so he could get inside and steal all of her valuable belongings.

Not that she really had any. Those last few years of her life, the inside of her house changed, until it was all newspapers and gift wrap, wooden soldiers and plastic souvenir cups, spent batteries and change from other countries. She never threw anything away, but there was nothing in there you would have wanted except memories. And by the end, I couldn’t even find the memories for all of the junk. So I just stopped going. Eventually, all of the burglars stopped coming by.

*          *          *

When my grandfather got to the hospital, he was beside himself. He kept running from doctor to doctor, asking them all the same two questions:

“Who was she with,” he asked, “and what were they doing on that boat?”

The doctors all calmly told him the same thing: it’s not really relevant to her condition, and anyway, you’d think less of her. Just go sit in the waiting room. We’ll tell you when you can see her.

Inside the operating room, they weren’t so calm.

“She’s still hemorrhaging,” a doctor said over the din of scalpels and foam alcohol. They unfolded her cortex like a map, laid tangles of blood clot and old memories down to soak against fresh bandages. But there was no stopping the flood.

“We need to save Wurzburg,” said another doctor, tracing his cold finger through the cortical geography on the table. He moved delicately, as if folding and unfolding a series of very small, very fragile secrets; a surgical scalpel carefully traced a path through gyri and sulci, the hills and valleys of my grandmother’s mnemonic Bavaria. Behind it, red blood crashed through arteries to fill new cavities, like flood water racing through inundated forest spillways, desperately looking for some exit, any exit, its urgent crossing shattering windows and homes, obliterating impressions of people and towns that took decades to form, entire histories vanishing from memory in a single cataclysmic moment on the river.

*          *          *

They moved around a lot. My grandfather had trouble holding down a job. The Wurzburg years were the hardest. We stopped visiting my grandmother for a while; she wouldn’t let anyone see her. My grandfather had started out a decent man, but he drank frequently. He suffered his alcohol poorly, and when he became violent, he wouldn’t stop until everyone around him suffered with him. Often, my grandmother was the only person around him.

I remember once—I think it was the only time we saw them in Wurzburg—when we visited, and my grandmother was sporting a black eye she’d inherited from somewhere. “I got it playing tennis,” she said, winking at me. “Your grandfather went for the ball, and accidentally threw the racket. Went right over the net; hit me right in the eye. Tach, just like that.”

My grandmother could always make the best of the worst situation. I used to think that kind of optimism was a good trait—as long as she had a twinkle in her eye, how bad could things be? But after her stroke, I decided that maybe that was exactly the thing that had kept her from leaving him for so many years. A less optimistic person would have long ago lost hope that he would ever change; a less happy person might have run down to the courthouse and annulled him forever. But not her; she kept her good humor, a racket on the wall, and always had that long-running excuse for the black eyes and bruised arms.

Years later, I found out from my mother that she’d never even played tennis.

*          *          *

My grandfather never found out who my grandmother was with on the boat, or what they were doing out on the river. A week after she was admitted, a doctor finally offered to tell him—if you think it’ll make you feel better to have closure. But by then, my grandfather had decided he didn’t want to know. What was the point? There was no one to blame any more, nowhere to point the finger. He wouldn’t be able to yell at her and make her feel guilty about what she’d done, yell at her until she agreed she’d do better next time, and then they could get into bed and read newspapers together, pretending it was all suddenly alright. After my grandmother came home, my grandfather stopped talking to anyone at all, including my grandmother.

I never told my grandfather that I knew what had happened on the boat. I’d found out almost immediately. A friend of mine from the army was a paramedic, and he knew the guy on the chopper who strapped my grandmother to the sky that night. He said the circumstances were such that the chopper had had to come down much closer to the water than it was supposed to, and even then, there was some uncertainty about whether they’d actually be able to lift my grandmother out of the boat. The guy who was on the chopper had been scared. “It was like she had an anvil in her chest,” he told my friend. “And for a moment, I thought it would take us all down into the water with it. But then gravity let go, and we lifted her up above the river.”

*          *          *

In the winter, parts of the Danube freeze, but the current keeps most of the water going. It rushes from the Black Forest in the west to Ukraine in the east, with temporary stops in Vienna, Budapest, and Belgrade. If the waters ever rise too high, they’ll flood a large part of Europe, a large part of Germany. Ingolstadt, Regensburg, Passau; they’d all be underwater. It would be St. Mary Magdalene all over again, and it would tear away beautiful places, places full of memories and laughter. All the places that I visited as a kid, where my grandmother lived, before the stroke that took away her Bavaria.