Man vs. machines: Research aims to nix bot content online

People have been battling robots for decades in the movies and, with the proliferation of "bots" online, we're now fighting them for real. Scholars estimate that as much as 8 percent of content on social media platforms comes from bots.

Among Facebook's 2.27 billion active users, that 8 percent would equal roughly 182 million nonhuman accounts. If those code-created Facebook posters were the citizens of a nation, they would rank among the ten most populous countries in the world. Those 182 million bots would outnumber the populations of France, Italy, and Spain combined.

Over the past two years, bot interference has earned enough media attention that users are becoming a little savvier about identifying bots, says Assistant Professor of Information Systems Victor Benjamin. That's why he teamed up with his departmental colleague Professor and Chair of Information Systems Raghu Santanam to see if users can help platform owners detect bot activity online. Using analysis based partially on user reactions to bots, the researchers uncovered an approach that holds promise for faster and more accurate bot detection.

I, robot

What, exactly, are bots? "In the Reddit space or on Facebook or Instagram, bots are virtual robots," says Santanam. "They're programs that are designed to react in a certain way based on the stimulus they receive, and the stimulus, in this case, would be social media interactions."

He adds that social bots have been used for multiple reasons. In politics, they may spread false news or propaganda designed to exaggerate existing biases. "They increase the polarization," he says.

Bots also have been used to sway policy. When the Federal Communications Commission invited public comment on plans to repeal net neutrality protections, bots chimed in, generally with comments in support of the repeal. Eric Schneiderman, New York's attorney general at the time, estimated that hundreds of thousands of stolen identities masked the bots. "There were comments being made by dead people, which is a huge tell that names were stolen to propagate fake comments," Benjamin says.

Bots also have been used in advertising campaigns to simulate interest in products or messages, falsify online "likes" and "votes," and swell the ranks of followers a celebrity can boast, which means advertisers may be paying more than they should for celebrity endorsements. In addition, bots are used to lure unsuspecting social media users into scams. "Often the phishing schemes we see are asking for donations to a cause, but the actual donation page is not legitimate," Benjamin notes. "Or bots spread hyperlinks leading users to pages with malware" that empowers hackers to steal data.

Machine-readable

To help platform providers like Facebook and Reddit identify bots, Benjamin and Santanam performed a linguistic analysis of both known bot accounts and nonbot accounts on Reddit. The researchers used analytic methods rooted in artificial intelligence (AI) to parse through bot- and human-generated messages while comparing features of the content, topics discussed, and language used. In addition, the researchers looked specifically at the human reaction to the bot posting. That means instead of using AI to augment human decision-making, they used human comments to augment AI.

"What kind of linguistic patterns did we look at?" Benjamin asks. "There may be specific ideas that the bots are trying to propagate, and you may see the same message pushed repeatedly by a few different accounts."
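The idea of the same message being pushed by different accounts can be illustrated with a small sketch. This is not the researchers' method, which the article does not detail. It assumes a simple word-overlap (Jaccard) score, a hypothetical 0.8 threshold, and made-up account names and posts.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two messages, from 0.0 to 1.0."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def repeated_messages(posts, threshold=0.8):
    """Return pairs of different accounts that posted near-identical text.

    `threshold` is an illustrative assumption, not a value from the study.
    """
    flagged = []
    for (acct_a, msg_a), (acct_b, msg_b) in combinations(posts, 2):
        if acct_a != acct_b and jaccard(msg_a, msg_b) >= threshold:
            flagged.append((acct_a, acct_b))
    return flagged


# Hypothetical sample data for illustration only.
posts = [
    ("acct_1", "Repeal net neutrality now and free the internet"),
    ("acct_2", "repeal net neutrality now and free the internet"),
    ("acct_3", "Has anyone tried the new hiking trail by the lake"),
]
flagged_pairs = repeated_messages(posts)
```

Here the first two accounts are paired because their messages match word for word once capitalization is ignored, while the third account's unrelated post is left alone.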

Topic persistence can also flag bot interference. "If there is a six-month-old account and it only posts about one topic, that's suspicious," he continues. Along with topic and message similarity, the researchers evaluated user comments identifying bots, as well as the confidence levels expressed by the users. "If someone says, 'I think maybe I have found a bot,' that's a very different confidence level than someone saying, 'I know for sure this post is from a bot,'" Benjamin explains.
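Both signals described here, single-topic posting and the confidence of a user's accusation, can be sketched in a few lines. The phrase lists, labels, and topic tags below are hypothetical stand-ins chosen for illustration. The article does not say how the researchers actually scored either signal.

```python
from collections import Counter

# Illustrative phrase lists (assumptions, not taken from the study).
HEDGED = ("i think", "maybe", "might be", "possibly", "not sure")
CERTAIN = ("i know", "for sure", "definitely", "certainly")


def topic_persistence(topics):
    """Share of an account's posts that fall in its most common topic."""
    if not topics:
        return 0.0
    top_count = Counter(topics).most_common(1)[0][1]
    return top_count / len(topics)


def accusation_confidence(comment):
    """Label a bot accusation 'low', 'high', or 'unknown' by phrase matching.

    Hedging phrases are checked first, so a mixed comment counts as 'low'.
    """
    text = comment.lower()
    if any(phrase in text for phrase in HEDGED):
        return "low"
    if any(phrase in text for phrase in CERTAIN):
        return "high"
    return "unknown"


single_topic = topic_persistence(["politics"] * 5)
mixed_topics = topic_persistence(["politics", "politics", "sports", "music"])
hedged = accusation_confidence("I think maybe I have found a bot")
certain = accusation_confidence("I know for sure this post is from a bot")
```

An account that only ever posts about one topic scores 1.0 on persistence, and Benjamin's two example comments land in the "low" and "high" buckets respectively.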

He adds that the researchers also have been trying to evaluate the influence of bots. The scholars use "thumbs up" and "thumbs down" votes on comments — what Reddit calls a posting's karma — as one signifier. They also look at how many comments a bot posting receives. "In this study, we're measuring how influential the bot was in generating interaction with humans," Benjamin says.
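As a rough sketch of that influence measure, one could average the net votes and the reply counts across a bot's postings. The `Posting` fields and the sample numbers are hypothetical, and treating karma as upvotes minus downvotes is a simplification made here, not a description of how Reddit or the researchers compute it.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Posting:
    upvotes: int
    downvotes: int
    replies: int  # number of comments the posting received


def influence_summary(postings):
    """Average net votes and average replies across a set of postings."""
    if not postings:
        return {"mean_net_votes": 0, "mean_replies": 0}
    return {
        "mean_net_votes": mean(p.upvotes - p.downvotes for p in postings),
        "mean_replies": mean(p.replies for p in postings),
    }


# Hypothetical postings from one suspected bot account.
bot_postings = [
    Posting(upvotes=10, downvotes=2, replies=4),
    Posting(upvotes=3, downvotes=5, replies=0),
]
summary = influence_summary(bot_postings)
```

Higher averages would suggest a bot that draws more human interaction, which is the kind of influence Benjamin describes measuring.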

Results from this ongoing research confirm prior studies and show that message similarity is indeed an indication that bots maintain their activity for long periods of time. This research also shows that bot detection improves with additional inputs, such as human identification and interaction.

"We're helping platform owners take full benefit of how humans react to the type of messages that bots send out," Santanam says. "The more you are able to leverage human intelligence, the better you get at detecting bots before they create too much damage."



Research by Raghu Santanam, Professor and Chair of Information Systems, and Victor Benjamin, Assistant Professor of Information Systems

Betsy Loeff