Can journalism resist a chatbot-fueled race to the bottom?
In November 2022, tech media website CNET began publishing pieces generated by artificial intelligence tools. By January, when Futurism reported on it, CNET had already published more than 70 articles under the “CNET Money Staff” byline. CNET’s move, and the errors later discovered in 41 of those stories, prompted Wired’s global editorial director Gideon Lichfield to publish policies on how his magazine will and will not use AI tools.
“I decided we needed guidelines, both to give our own writers and editors clarity on what was an allowable use of AI, as well as for transparency so our readers would know what they were getting from us,” Lichfield says.
The guidelines, for example, explicitly state that the magazine will not publish articles written or edited by AI tools. However, editorial staff may use language generators, or image generators such as DALL-E 2 and Midjourney, for brainstorming.
“I was impressed by Wired magazine; those [guidelines] are the best in class that I’ve seen so far,” says David Karpf, an associate professor in the School of Media and Public Affairs at the George Washington University. “They sent a strong line that they are not going to publish anything written by generative AI—they would treat that as akin to plagiarism—but that what they will use it for is idea creation.”
For instance, instead of having a team of editors come up with several potential headlines, and then pare those down to three that might work, they could use generative AI tools to produce dozens of headlines. The editors would then figure out which, if any, of those options are appropriate.
In essence, much like Wikipedia, AI tools could be a place for “getting started, but never the place to finish,” Karpf says of media outlets that, like Wired, want to uphold their reputations.
Since the launch of OpenAI’s ChatGPT (Generative Pretrained Transformer) late last year and Microsoft’s Bing AI earlier this year, there’s been concern over plagiarism and the future of content creation, be it in academic settings, publishing, or journalism. ChatGPT has since been banned in several schools and colleges and even blocked in Italy due to privacy concerns. Last month, more than 1,000 researchers and tech leaders, including Bulletin of the Atomic Scientists President Rachel Bronson, signed an open letter urging AI labs to pause the training of systems more powerful than GPT-4 (OpenAI’s latest model) for six months, citing “profound risks to society and humanity.” Despite the bans and a potential pause, the reality is that large language and image models are already here and affecting all areas of human life, including journalism, a profession defined by reporting facts, upholding truth, and holding those in power to account.
Human versus machine. AI language models collect and combine patterns in language by combing the internet. When given a prompt, the applications they power take the information a person is seeking and make predictions based on pattern analysis. They then collate those patterns and produce text. To come up with answers, the models draw on information from encyclopedias, forum posts, personal websites, and articles, among other sources (some of them copyrighted material), and essentially jumble it all together.
“In many respects, it [a generative AI tool] doesn’t have any way to tell the difference between true and false information,” says Joan Donovan, research director of the Shorenstein Center on Media, Politics and Public Policy at the Harvard Kennedy School. “That’s what a human does. It does all those things: It reads, it collates, it sorts. But the journalist does so with quality in mind, by trying to understand what it is about a subject that’s important to the audience.”
Without the intelligence that can only come from humans, AI-generated text would require fact-checking before it’s published. Such text might work for certain types of evergreen or service pieces, or for articles on subjects like housing, sports, and the stock market, where datasets are available to analyze.
“Where it doesn’t work, of course, is in our most contentious issues, which have to do with inequality, racism, crime, places where context and integrity are going to matter,” Donovan says. “And so that’s the moment that we’re in with ChatGPT. But unfortunately, it’s going to be useful to journalism outlets that don’t want to pay humans, that don’t necessarily care about quality.”
A news website whose articles are generated entirely by AI has already launched: NewsGPT went live in March. Its CEO claims the site provides news without the bias and agendas that plague human reporting.
Many news outlets, especially print and local media, have seen readership and revenue decline. For under-resourced newsrooms, these “cliché generators,” as Karpf calls them, might feel like a saving grace, producing content at almost lightning speed, sometimes in artful combinations. For example, one can ask ChatGPT or Bing AI to rewrite the Declaration of Independence in the style of William Shakespeare. The tools can quickly identify those two forms and predict words suited to that very specific combination.
“That’s cool; that’s fun; but that is still clichés,” Karpf says. “That is still giving to us the most likely next word given the way people in those genres usually talk. So, they’re going to be most useful if what you need is a pretty standard, average answer to a question; they will be amazingly fast at producing that for you. If you want to get some genuinely new insights or thoughts, you’re still going to need humans, because what chatbots do is not akin to what humans do in terms of actual thinking and analysis.”
Even so, the marketing of chatbots as content generators seems aimed at struggling publications. Judging from the downsizing already carried out by local newspaper chains, it’s reasonable to expect that some newsrooms will start replacing humans with technology, a move that experts say will not be without consequence.
“I think that that kind of marketing is essentially profiting off of already under-resourced newsrooms, and I think that newsrooms that start to employ this technology to replace journalists [will] eventually run into massive quality-control issues,” Donovan says.
A flood of misinformation. One major issue will be the sheer scale at which misinformation and disinformation spread. Generative AI tools are themselves trained on substantial quantities of information drawn from internet sources, both reliable and unreliable, with no capability to gauge its quality.
When Donovan’s research team was experimenting with an early version of ChatGPT, they were trying to assess where it was pulling content from. In addition to Reddit and Wikipedia, they found the tool was using data from 4chan, an online anonymous message board known for hoaxes and pranks as well as extremist antisemitic and racist content.
“Websites using AI text generators [will] flood the web with misinformation or poor-quality information, drowning out reliable sources and polluting web search, in an accelerated version of the problem that already exists with junk pumped out by content farms,” Wired’s Lichfield says, adding that there will also likely be an explosion of AI-generated deepfake images and videos, such as the image of Pope Francis in a puffer jacket that went viral in late March. This, Lichfield explains, will create “both an epidemic of misinformation and a breakdown in trust, as people come to assume any image or video they’re seeing could be fake.”
Others, who see false information as a fact of the world we have long been living in, believe we need to find ways to combat this new challenge. For Alexander McNamara, editor in chief of Live Science, misinformation has always existed alongside the truth.
“It always has been there; it’s always going to be there, [no] matter the medium—whether it’s the written word, whether it’s books, whether it’s newspapers, whether it’s television, whether it’s the internet,” he says. “For every piece of truthful information, there will be misinformation out there as well.”
The challenge will be to find a way to separate what is true from what isn’t. Perhaps AI itself is the answer: the same tools could also be used to assess and label whether a piece of text or art is AI-generated.
“If there were AI that would look at the picture of the Pope in the puffer jacket coat and say this is an AI-generated image and put that as a warning, then we’re using it to help fight the spread of misinformation at the same time as potentially causing it,” McNamara says.
Homogenization of content and ripples in the industry. It’s already well known that social media algorithms tailor the news and personalize searches to drive users to certain websites, often homogenizing the content one sees. Additionally, newsrooms can analyze social media data to determine what readers are most interested in. This became especially pronounced when Twitter came on the scene in the mid-2000s; assigning editors started relying on Twitter trends to decide what to cover, a move that led to gaps in the content readers wanted to see.
“Twitter laid it out for them so clearly that we ended up seeing this homogenizing effect across all of journalism, where it seemed like every outlet was writing the same article, using the same tweets and sources and debates that were generated online,” Donovan says. “That became a serious problem, because it didn’t reflect the broader audience of any particular newsroom; it reflected the audience of Twitter. And we saw, then, media manipulators be able to use and manipulate algorithms in order to set the agenda of mainstream media, day in and day out. We saw politicians especially get adept at doing this just by tweeting, and I think that with ChatGPT we’re entering into another cycle.”
The cycle could mean that if articles are generated by the same AI tool, it won’t matter which publication readers choose, because the pieces will be composed from the same information pool. But readers often choose a specific publication because they like the quality or voice of a particular writer or outlet. Writing from a chatbot, which essentially amalgamates text, would lack the stylistic elements audiences seek.
“There are some journalists we read just because we love to read that journalist,” Donovan says. “We don’t necessarily search for a topic so much as we love their writing, and we love their style, we love the flourish of their vocabulary, and so I don’t think that any kind of technology would be able to replace that.”
But some experts worry that in the near term, the use of language and image generators will lead to a resurgence of the trend that took over journalism from around 2009 to 2011, when content farms produced low-quality, low-cost content designed to satisfy algorithms and capture advertising money.
“That worked until Google stepped in and changed their algorithm in order to kind of shut that down,” Karpf says. “I think it is likely that over the next year or so we’re going to see these chatbots used to produce answers that may or may not be true to pretty much every question under the sun [and] scoop up all of the advertising money. And that’s going to lead to a torrent of information out there of at best medium quality and of questionable veracity. That’s going to happen until we figure out ways to put a stop to it.”
Another concern will arise if chatbots begin to replace search engines for a good portion of internet queries. If that happens, users might be satisfied with the narrative answers the chatbots provide and be less likely to click through to websites, a shift that, Lichfield worries, might reduce traffic to news websites and make their business models harder to sustain.
An ethical paradox and race to the bottom. The Society of Professional Journalists’ code of ethics states that journalists should “always attribute.” This code, followed by human journalists for nearly a century, will be put to a serious test by tools that don’t (at least not yet) attribute information to a particular source. As Donovan says, “if we can’t identify our sources, then we have a very serious ethical paradox to resolve.”
Identifying the sources of text produced by chatbots is especially challenging, because the large language models behind them mine data from hundreds of thousands of sources. Further, even if a newsroom were not using generative AI, the source material for a story might have been the product of a chatbot. For example, new findings are often first reported in academic journals, a resource that many journalists rely on. If academic journals allow their authors to use generative AI, then a journalist might unwittingly base a story, or a portion of one, on a research paper written entirely or partly by a text generator. Recently, Science, a highly respected academic publisher, updated its editorial policies to indicate that “text generated from AI, machine learning, or similar algorithmic tools cannot be used in papers published” in its journals.
“We decided to take a pretty tough stance on this, which is that using ChatGPT to generate any text on a research paper without telling the editor was scientific misconduct,” says H. Holden Thorp, editor-in-chief of the Science family of journals. Other well-regarded journals, including Nature and PNAS, have also outlined policies on generative AI. But given that there are at least 28,000 peer-reviewed journals, it’s unlikely all of them will update their guidelines, and those that do are bound to vary considerably; many of the new guidelines may not match the strictness of Science’s.
Further, many magazines, newspapers and websites use freelancers to generate content. Unless publications employ only freelancers they know and trust, or add legal language to their contracts regarding the use of large language models, it will become tricky to identify chatbot-generated text. And for freelancers, who depend on productivity to make a living, there may be a case for using AI to sift through mass quantities of information to aid their work.
“No journalist could ever read everything in order to produce that kind of synthesis,” Donovan says. “What we have to ask then is: What is journalism? And what is it about the human judgment and creativity that is lost by relying on these technological applications? And so, newsrooms have a duty to report when they’re using these tools. But it’s still a very open question as to how.”
The how becomes an important consideration for the industry. When CNET began publishing articles using AI tools, an average reader looking at the “CNET Money Staff” byline would assume the pieces were written by journalists. It was only if they clicked on the byline that they would be taken to an “about me” section indicating “This article was assisted by an AI engine and reviewed, fact-checked and edited by our editorial staff.”
“Industry wide, we’re going to have a race to the bottom,” Karpf says. “The best publications are going to stand against that tide [and] are going to be rewarded by producing better work than the folks who are going fast and cheap with chatbots. And I’m sure that won’t be industry-wide; there will be plenty of publications that act now and apologize later.”
In January, CNET removed the word “staff” from some of its AI-generated pieces to read “CNET Money,” a change that likely went undetected by many readers. Last month, CNET laid off some of its news and video staff, a move that, according to a spokesperson, “was not a reflection of the value or performance of our team members, the use of emerging technologies, or our confidence in the CNET Group’s future.”
Source: The Bulletin