In 1985, the TV film Max Headroom: 20 Minutes into the Future presented a science fictional cyberpunk world where an evil media company tried to create an artificial intelligence based on a reporter’s brain to generate content to fill airtime. There were somewhat unintended results. Replace “reporter” with “redditors,” “evil media company” with “well meaning artificial intelligence researchers,” and “airtime” with “a very concerned blog post,” and you’ve got what Ars reported about last week: Generative Pre-trained Transformer-2 (GPT-2), a Franken-creation from researchers at the non-profit research organization OpenAI.
Unlike some earlier text-generation systems based on a statistical analysis of text (like those using Markov chains), GPT-2 is a text-generating bot based on a model with 1.5 billion parameters. (Editor’s note: We recognize the headline here, but please don’t call it an “AI”—it’s a machine-learning algorithm, not an android). With or without guidance, GPT-2 can create blocks of text that look like they were written by humans. With written prompts for guidance and some fine tuning, the tool could be theoretically used to post fake reviews on Amazon, fake news articles on social media, fake outrage to generate real outrage, or even fake fiction, forever ruining online content for everyone. All of this comes from a model created by sucking in 40 gigabytes of text retrieved from sources linked by high-ranking Reddit posts. You can only imagine how bad it would have been if the researchers had used 40 gigabytes of text from 4chan posts.