Software vs. software: How AI could find and fight video bots online

Information systems professors Victor Benjamin and Raghu Santanam are pitting software against software and hoping to help platform providers like YouTube and Instagram fight fake videos with insights from artificial intelligence.

By Betsy Loeff

Video-based social media like YouTube or Instagram are usually good for a few minutes of harmless fun or helpful how-to presentations. They’re a great place to hear a favorite singer, follow your city’s sports team, preview movies, see products in action, and more.

Of course, anyone who’s spent time on video-based social media has also encountered heart-tugging videos about things like baby deer rescues, or slideshows of a sporting event’s exciting highlights, both narrated in an incongruous, machine-generated monotone.

That’s because videos like these can easily be assembled by bots, shorthand for internet robots. Bots are little more than software: a collection of algorithms that performs automated tasks much faster than humans could. With that computer-powered speed, bots are dragging down the quality of video-sharing platforms for users and content creators alike.

That’s why Victor Benjamin and Raghu Santanam, two professors of information systems, are looking for ways to detect bots with artificial intelligence (AI). By pitting software against software, the researchers hope to give platform providers like YouTube and Instagram AI-driven insights for fighting fake videos.

Getting an eyeful

What makes video bots, the ones that post videos on social media, a target for detection? “Some of the video-related bots we see are propaganda-based,” says Benjamin. One example he points to played out on Twitter, when the Islamic State in Iraq and Syria (ISIS) hijacked the #JustinBieber hashtag and linked it to graphic footage showing a brutal execution of four men.

Even if bot videos aren’t linking people to horrific footage and jolting propaganda, they’re troublesome because these videos are generally of poor quality and derivative.

On the quality end, bot-generated videos rarely feature real people, or even real video footage, Benjamin says. Instead, they are mostly slideshows of not-quite-appropriate photos that may or may not relate to the narrative, and the narrative itself is delivered by text-to-speech software as an emotionless, robotic voiceover. The only motion in these videos is usually the cheesy special effects applied to the photos, which may float, zoom in, zoom out, rotate, gyrate, or too slowly fade to black.

What’s more, bots are often built on intellectual theft. “Bots are plagiarizing other people’s articles and photography and just doing a readout of the article using a robotic voice,” says Santanam.

They may be pulling human interest stories — like those baby deer rescues — or things like new cell phone reviews, or even medical pointers, which could be especially harmful, Benjamin notes.

By posting low-quality videos online, bot producers “start harming the content base in the platform,” Santanam says. “Now you have a lot of noise on top of or along with the good videos. It brings down the quality of the platform when you have malicious, bot-generated videos that may or may not make much sense.”

It also impacts monetization of video on the platform. For one thing, bots create confusion over viewership. “Some bots randomly click on videos,” Santanam notes, making it hard for platform providers to determine appropriate advertising rates because views equate to currency on YouTube and other video sites. “If there is no credibility in the metrics on the social media, it brings down the value of the platform for advertisers.”

Plus, bot content is frustrating for users. “It increases the user’s search cost if you’re trying to find a particular topic and you have to go through a lot of garbage videos to find it,” Benjamin says.

Ultimately, bots can destabilize the platforms themselves. “You have content creators and content consumers,” Santanam notes. “The advertisers see value with more consumers, and the consumers come because of the content creators. The content creators are motivated because they get viewership and money from the advertisers. It’s a multisided platform. Any noise brings down the overall value for everyone.”

In fact, some platforms are already losing content creators to other media, such as streaming services like Twitch. “A lot of people got bit by the automated detection methods some platforms employed, and their videos were de-monetized,” Benjamin says. “Instead of pre-recording videos and posting them, some content creators simply do a live stream a couple of hours a day.”

Putting bots in their sights

To combat these problems, Santanam and Benjamin are taking aim at bots, and they’re using artificial intelligence tools to do it. “The problem with identifying the bot videos is that you have to actually go and click on one to see it,” Santanam says. “We are trying to write a machine-learning algorithm that can automatically detect bot videos by seeing all their features.”

The features he’s talking about include the absence of humans, the absence of actual video footage, the robotic voice, the chaotic slideshow, the kinds of comments the video generates, and miscellaneous data and metadata, such as who posted the video and how many pixels change from one frame to the next, which may indicate changes in camera angle.

The researchers are leveraging some of the newest tools in AI, including what’s called deep learning. Part of the work involves what Benjamin calls feature engineering: extracting features common to many bot-generated videos so the system can tell postings made by human hands from those produced by bot algorithms.
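To make feature engineering a little more concrete, here is a minimal Python sketch. It is not the researchers’ code: the `Video` record, its fields, and the `mean_frame_change` and `extract_features` functions are hypothetical, and the sketch assumes upstream tools (a face detector, a pitch tracker) have already run. It turns a few of the signals described above, such as the absence of people, little real motion and a flat synthetic voice, into numbers a classifier could learn from.

```python
from dataclasses import dataclass


@dataclass
class Video:
    # Hypothetical, simplified record of a posted video.
    frames: list[list[int]]     # each frame is a flat list of grayscale pixel values (0-255)
    faces_detected: int         # frames where a face detector fired (assumed precomputed)
    voice_pitch_std_hz: float   # spread of narrator pitch; assumed flatter for text-to-speech
    uploader_video_count: int   # metadata about who posted it


def mean_frame_change(frames: list[list[int]]) -> float:
    """Average absolute per-pixel change between consecutive frames.

    A slideshow of still photos barely changes between most frames,
    while real footage (moving camera, moving people) changes a lot.
    """
    if len(frames) < 2:
        return 0.0
    total = 0.0
    for prev, curr in zip(frames, frames[1:]):
        total += sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)
    return total / (len(frames) - 1)


def extract_features(video: Video) -> dict[str, float]:
    """Turn raw video data into a small, hand-engineered feature vector."""
    return {
        "face_frame_ratio": video.faces_detected / max(len(video.frames), 1),
        "mean_frame_change": mean_frame_change(video.frames),
        "voice_pitch_std_hz": video.voice_pitch_std_hz,
        "uploader_video_count": float(video.uploader_video_count),
    }


# Example: a 10-frame "slideshow" that shows one photo, then cuts to another.
slideshow = Video(
    frames=[[10, 10, 10, 10]] * 5 + [[200, 200, 200, 200]] * 5,
    faces_detected=0,
    voice_pitch_std_hz=2.0,
    uploader_video_count=900,
)
features = extract_features(slideshow)
```

Here the slideshow shows no change at all except at its single cut, so its average frame change stays low, which is the kind of pattern a detector might associate with bot videos. In a real system a trained model, not hand-set thresholds, would decide how much each feature matters.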

Another part leverages neural networks, models built from many simple processing units arranged in layers. Each unit processes just a bit of data and passes its output on to the next layer, rather than doing all the processing in one monolithic program. “It’s called the neural network because conceptually, it’s tied to how our brains work,” Benjamin adds. “It’s a good way for machine learning algorithms to learn, especially when working with sequential data.”
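The layer-by-layer idea can be illustrated with a tiny feed-forward network in plain Python. This is a hypothetical toy: the three input features, the layer sizes and every weight are made up for illustration rather than learned from data, and the researchers’ actual models (which, per Benjamin, handle sequential data well) would be far larger. It only shows how each layer’s units compute a weighted sum, apply an activation and hand their outputs to the next layer.

```python
import math
from typing import Callable

Activation = Callable[[float], float]


def relu(x: float) -> float:
    return max(0.0, x)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(inputs: list[float], weights: list[list[float]],
                biases: list[float], activation: Activation) -> list[float]:
    """One layer: each unit takes a weighted sum of all its inputs, adds a bias,
    applies an activation function, and outputs a single number."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


def forward(features: list[float], layers) -> list[float]:
    """Pass the data through each layer in turn; each layer's outputs feed the next."""
    activations = features
    for weights, biases, activation in layers:
        activations = dense_layer(activations, weights, biases, activation)
    return activations


# Illustrative, untrained weights: 3 inputs -> 2 hidden units -> 1 output.
# Hypothetical inputs, each scaled to 0..1: [share of frames with a face,
# frame-to-frame motion, flatness of the narrator's voice].
LAYERS = [
    ([[-2.0, -1.5, 0.5], [1.0, 0.5, -1.0]], [0.5, 0.0], relu),
    ([[1.5, -1.0]], [-0.2], sigmoid),
]

# No faces, little motion, a very flat voice.
bot_score = forward([0.0, 0.1, 0.9], LAYERS)[0]  # sigmoid output in (0, 1)
```

With these made-up weights the example input scores above 0.5, leaning toward “bot,” but that reflects only the illustrative numbers. In practice the weights would be learned by training on labeled examples of bot and human videos.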

A final thing the researchers are doing is employing a flavor of augmented intelligence. “When people think of augmented intelligence, they’re thinking of computers augmenting humans,” Benjamin continues. “We are looking at it from a backward approach: How can human cognitive power augment the AI?”

The team is combing through user comments to tap human perception, using the rules of speech act theory to gauge how confident each commenter is and feeding those findings into their bot-detection algorithms. According to speech act theory, different acts of speech, such as assertions of fact or belief and commitments to do something later, tend to come with characteristic words attached. For instance, an assertive statement might be accompanied by words such as “I know,” “I believe,” or “I conclude.” A commissive speech act, a commitment to future action, might come with words like “I promise to …” or “I guarantee I will …”

“One user might comment, ‘Hey, everyone: I think I may have found a bot.’ Another might say, ‘This is definitely a bot. We need to report this to the platform,’ ” Benjamin says. “There are different levels of confidence in these comments. One is posing a question. The other is declaring something.” Benjamin and Santanam are examining which speech act each comment expresses and using it to weight how much confidence to place in that human judgment, so that it can be factored into their algorithms.
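One simple way to turn comments like these into a signal is sketched below in Python. It is a stand-in, not the researchers’ method: instead of a trained speech-act classifier, it matches a hypothetical list of cue phrases and assigns made-up confidence weights (hedged remarks low, assertive ones high, calls to report a small bonus). It then combines comments with a noisy-OR, a swapped-in aggregation choice that treats each comment as independent evidence.

```python
import re

# Hypothetical cue phrases; a real system would learn speech-act categories from labeled comments.
HEDGED_CUES = ("i think", "i may", "maybe", "might", "not sure")
ASSERTIVE_CUES = ("definitely", "i know", "clearly", "obviously")
DIRECTIVE_CUES = ("report this", "flag this", "we need to")

# Made-up confidence weights for each kind of comment.
HEDGED_WEIGHT = 0.3
DEFAULT_WEIGHT = 0.5
ASSERTIVE_WEIGHT = 0.8
DIRECTIVE_BONUS = 0.1
MAX_WEIGHT = 0.95


def comment_confidence(comment: str) -> float:
    """Estimate how confidently a comment claims the video is a bot (0 if it doesn't)."""
    text = comment.lower()
    if not re.search(r"\bbots?\b", text):  # whole-word match, so "robotic" doesn't count
        return 0.0
    if any(cue in text for cue in ASSERTIVE_CUES):
        weight = ASSERTIVE_WEIGHT
    elif any(cue in text for cue in HEDGED_CUES):
        weight = HEDGED_WEIGHT
    else:
        weight = DEFAULT_WEIGHT
    if any(cue in text for cue in DIRECTIVE_CUES):
        weight = min(MAX_WEIGHT, weight + DIRECTIVE_BONUS)
    return weight


def crowd_bot_signal(comments: list[str]) -> float:
    """Noisy-OR: chance that at least one comment is right, if each were independent."""
    p_all_wrong = 1.0
    for comment in comments:
        p_all_wrong *= 1.0 - comment_confidence(comment)
    return 1.0 - p_all_wrong


comments = [
    "Hey, everyone: I think I may have found a bot.",
    "This is definitely a bot. We need to report this to the platform",
    "Great video, thanks!",
]
signal = crowd_bot_signal(comments)
```

With these sample comments, the hedged remark contributes little, the assertive call to report contributes most, and the unrelated comment is ignored. The combined score could then be fed to a detector as one more feature alongside the video’s own.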

Ultimately, the goal of this research is to detect bots early and help platform providers remove poor-quality postings faster.

Given that research indicates somewhere between 5 percent and 8 percent of social media content comes from bots, platform providers are actively seeking new ways to detect the prolific machines.

“If we write algorithms to detect the bot videos, the bot video creators can come up with technology that detects those algorithms,” Santanam says. “This is like an arms race now. It’s going to be a continuous cycle of algorithms learning from each other.”
