Detecting machine-written content in scientific articles

UChicago researchers evaluated thousands of scientific abstracts and found that authors are increasingly using AI tools in scientific writing.

The recent surge in popularity of AI tools such as ChatGPT is forcing the scientific community to reckon with the place of these tools in scientific literature. Prestigious journals such as Science and Nature have attempted to restrict or prohibit AI use in submissions, but detecting machine-generated language has proved difficult.

As AI becomes better at mimicking human language, researchers at the University of Chicago wanted to learn how frequently authors are using AI and how well it can produce convincing scientific articles. In a study published in JCO Clinical Cancer Informatics, Alexander Pearson, MD, PhD, Frederick Howard, MD, and colleagues used several commercial AI content detectors to evaluate text from more than 15,000 abstracts published at the American Society of Clinical Oncology (ASCO) Annual Meeting from 2021 to 2023. They found approximately twice as many abstracts containing AI content in 2023 as in 2021 and 2022, a clear signal that researchers are using AI tools in scientific writing.

Interestingly, the content detectors were much better at distinguishing text generated by older versions of AI chatbots from human-written text, but they were less accurate at identifying text from the newer, more accurate AI models or mixtures of human-written and AI-generated text.

Howard and colleagues warned that while AI can be used as a tool to aid in scientific writing, the author is ultimately responsible for all aspects of the submission, which requires due diligence to ensure that the content is factual and accurate and contains no misleading information.

They also concluded that because AI content detectors will never reach perfect accuracy, they should not be used as the sole means of assessing AI content in scientific writing. Instead, they could be used as a screening tool to indicate that the presented content requires additional scrutiny from reviewers.
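
As a rough illustration of the screening idea, the sketch below flags abstracts for extra reviewer attention rather than rejecting them. It is not the study's method: the `Abstract` record, the `detector_score` field (assumed to run from 0.0 for likely human-written to 1.0 for likely AI-generated), the `screen_for_review` function, and the 0.5 threshold are all hypothetical, and the sample scores are invented for demonstration.

```python
from dataclasses import dataclass


@dataclass
class Abstract:
    abstract_id: str
    # Hypothetical detector output: 0.0 (likely human) to 1.0 (likely AI).
    detector_score: float


def screen_for_review(abstracts, threshold=0.5):
    """Return IDs of abstracts whose score meets or exceeds the threshold.

    Flagged abstracts are only routed to reviewers for closer scrutiny;
    the detector score alone never accepts or rejects a submission.
    """
    return [a.abstract_id for a in abstracts if a.detector_score >= threshold]


if __name__ == "__main__":
    # Invented example values, not data from the study.
    sample = [
        Abstract("A-001", 0.12),
        Abstract("A-002", 0.87),
        Abstract("A-003", 0.50),
    ]
    print(screen_for_review(sample))
```

Where the threshold is set is a judgment call: a lower value sends more abstracts to reviewers, while a higher one lets more AI-assisted text pass without a second look.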

“AI models are prone to errors – such as referencing a scientific study which does not exist, or confidently stating an incorrect fact,” Howard said. “Although most scientists may use these models responsibly and rigorously review AI-generated text included in their studies, we need to define standards to identify and verify the accuracy of any AI content included in oncology literature to ensure that readers (including patients and their care teams) are not misled by inaccurate statements from AI models.”   

Howard presented the results of the study in a poster and live discussion at the 2024 ASCO Annual Meeting in Chicago, with simultaneous publication of the research paper. The editor-in-chief of JCO Clinical Cancer Informatics commented on the study’s relevance.

“Scientific meetings are one of the most important venues for conveying results including those with practice-changing implications,” Jeremy L. Warner, MD, MS, FAMIA, FASCO, wrote. “In the past several years, large language models (LLMs) have become pervasive and as such it is somewhat expected that they are being used to generate some portion of the scientific discourse. It will be important to recognize and monitor this LLM-generated content, given the potential issues with accuracy, originality, and trustworthiness.”

The study, “Characterizing the Increase in Artificial Intelligence Content Detection in Oncology Scientific Abstracts From 2021 to 2023,” was supported by ASCO’s Center for Research and Analytics. Additional authors included Anran Li, MS from the University of Chicago; and Mark F. Riffon, MPH, and Elizabeth Garrett-Mayer, PhD, from the Center for Research and Analytics at the American Society of Clinical Oncology, Alexandria, VA.
