The bots of the future are going to use our own metadata to seem more human

November 17, 2018
Today the internet is a quagmire of capital-c Content, made navigable by retweets, likes, and favorites; everything posted can be quantified by its corresponding reactions. Though in aggregate it may seem like noise, to people in the business of disinformation there’s a valuable signal to be picked apart and studied. Our activity on social platforms — those favorites and likes and retweets — is a form of metadata that can help manipulators and their bots appear human to the algorithms that police social networks. And that problem is about to get a lot worse: bots are starting to mimic your social media activity in order to look more human.
“For users and platforms alike, it is getting harder to discern ‘real’ users and authentic account activities from fake, spammy, and malicious manipulations,” writes the researcher Amelia Acker in a recent Data & Society report that explores how metadata — your likes, comments, reactions — is being used to hoodwink the public in new and increasingly lifelike ways. “Manipulators are getting craftier at faking what looks like authentic behavior on social media.”
Those manipulators are manifold: services like Devumi, which sells followers to celebrities, businesses, and people who aspire to be influencers; political meddlers, like Russia’s Internet Research Agency, which attempt to influence elections using social media tools; or, even more seriously, repressive governments that want to gain support for unethical or otherwise unsavory policies, like the one in Myanmar that set up a host of sockpuppet Facebook pages to make a genocide more palatable to the public. Because more than half of Americans get their news primarily from social media, the Data & Society report concludes, these manipulations are becoming ever more harmful.
People using metadata to create disinformation bots, Acker tells The Verge, are getting it from “three stages of accessibility: there’s the stuff on the user interface that users like you and me can read with our eyes and can be scraped and read by machines,” meaning the metadata you generate when you’re using social media — comments, retweets, reactions, and the like. “Then,” she says, “there’s stuff that you can get access to through the API, which may be a little bit more precise, or a little bit more specific about the account setup” — say, the places you’re sending tweets from, or the date and time you created your account, right down to the second.
Then there’s what Acker calls the “macro layer”: the level that the platforms have exclusive access to. That’s the most important stuff — it’s the data we don’t see unless platforms like Facebook and Twitter get caught doing something they shouldn’t be (as in Twitter’s recent data dump of accounts and tweets related to Russia’s Internet Research Agency and Iran).
“But we can imagine what those dossiers look like,” says Acker, because researchers who investigate disinformation campaigns are using them already in their work. Those dossiers she mentions are composed of all the information that “everyone’s creating when they’re using social media.”
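The first two layers Acker describes — metadata that’s visible in the UI and scrapeable, and the more precise account details exposed through an API — are the raw material researchers turn into account-level signals. A minimal sketch of that kind of feature extraction, using an entirely hypothetical account record (the field names and values below are illustrative, not any platform’s real schema):

```python
from datetime import datetime, timezone

# Hypothetical account record, illustrating Acker's first two layers.
account = {
    # Layer 1: visible in the UI and scrapeable by machines
    "display_name": "Jane Q. Public",
    "followers": 120,
    "following": 4800,
    "recent_posts": 950,
    # Layer 2: available through an API — more precise, e.g. the
    # account-creation timestamp right down to the second
    "created_at": "2018-10-01T03:14:07+00:00",
}

def metadata_features(acct, now):
    """Derive simple account-level signals from metadata alone."""
    created = datetime.fromisoformat(acct["created_at"])
    age_days = (now - created).total_seconds() / 86400
    return {
        "age_days": round(age_days, 1),
        "posts_per_day": round(acct["recent_posts"] / max(age_days, 1), 2),
        "follow_ratio": round(acct["following"] / max(acct["followers"], 1), 2),
    }

now = datetime(2018, 11, 17, tzinfo=timezone.utc)
print(metadata_features(account, now))
```

Nothing here requires the platform’s cooperation — which is exactly why the same signals are available both to researchers and to manipulators trying to stay under thresholds like these.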
In the report, Acker dubs the manipulation of this information data craft: “a collection of practices that create, rely on, or even play with the proliferation of data on social media by engaging with new computational and algorithmic mechanisms of organization and classification.” Manipulators and bot-makers are in the business of data craft, optimizing for maximum impact.
Bots are increasingly good at mimicking people, even if only to fool the algorithms that are the first line of defense against spam and influence operations. “When I’m looking at a disinformation bot, it’s not always clear to me whether that person is trying to appear to be a real person to me, a human reader, or if they’re just trying to appear to be a real person to the automated limits,” says Acker. They also might exist, she adds, simply to find the limits of these auto-filters.
The overall point of these bots, she explains, is to fly just under the radar. And while that radar is pretty good — both Facebook and Twitter have made large strides in using machine learning to combat malicious spam — it’s still difficult to catch inauthentic accounts at scale. That’s partly because doing so might sweep up legitimate accounts that only appear to be fake, and partly because the real manipulators are getting smarter.
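That tradeoff is easy to see with a toy example (not any platform’s real detector): a blunt rate threshold catches an obvious spam bot, but it also flags a human who happens to post in bursts — and a smarter bot simply posts just under the cutoff.

```python
# Toy illustration of threshold-based detection and its failure modes.
# The accounts and the cutoff are invented for this sketch.
POSTS_PER_HOUR_LIMIT = 30

accounts = [
    {"name": "spam_bot",     "posts_last_hour": 240, "is_human": False},
    {"name": "live_tweeter", "posts_last_hour": 45,  "is_human": True},
    {"name": "sly_bot",      "posts_last_hour": 28,  "is_human": False},
]

def flag_inauthentic(acct):
    return acct["posts_last_hour"] > POSTS_PER_HOUR_LIMIT

flagged = [a["name"] for a in accounts if flag_inauthentic(a)]
false_positives = [a["name"] for a in accounts
                   if flag_inauthentic(a) and a["is_human"]]
missed = [a["name"] for a in accounts
          if not flag_inauthentic(a) and not a["is_human"]]

print(flagged)          # catches the spam bot — and the human live-tweeter
print(false_positives)  # the legitimate account swept up by the rule
print(missed)           # the bot flying "just under the radar"
```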
To appear human to a human reader, meanwhile, is a totally different task that’s much more labor intensive; it’s a lot harder to fool a person than it is to fool a program. Even so, there’s a template for it.
“The idea is that you have to sort of create some astroturfing across platforms or across sites to signal that you are real, that you’re not just like an avatar or username on one platform,” Acker says. “Deep cover” — hiding a fake account in plain sight — “comes from early accounts of sockpuppets online, and the idea that you have to have at least one or two or three different channels in order to be a reliable fake person online.”
If you’ve ever Googled yourself, you know that being alive means generating data. There are reams and reams of it now, more than ever before, because of how thoroughly computers can quantify a life. That information — which now includes everything from your heart rate to your recent purchases — is mostly used to advertise to us, which is why your Amazon recommendations look the way they do. But it’s also something that could eventually be spoofed, so that bots might appear just as alive.
“I suppose that future bots will be, like, better at creating activity data, as opposed to just like sort of flat messages,” Acker says, although she did preface her predictions by saying she’s not really in the business of making them. “They’ll be better at interacting, better at creating check-ins and tags, better at incorporating, you know, more informal ways that people communicate, like with emoji.” Acker also thinks that different kinds of bots will spring up: bots focused on pushing messages through filter bubbles; bots intended to sow doubt in processes or people; and deep cover bots, which wouldn’t be noticeable until they were activated, like proverbial sleeper agents. All of which means that understanding metadata and how it’s used will only become more important.
“Probably, future bots will be better at jumping across platforms,” she says, which is a problem because it’s currently very difficult to moderate across platforms. “Like the hopscotching from chans to Reddit to Facebook,” Acker continues. “More coordinated efforts across platforms I think is probably where we’ll see… I don’t know, I don’t want to say innovation.”