TL;DR
ArXiv will ban researchers for one year if they submit papers with obvious signs of unchecked AI generation, such as hallucinated references or leftover chatbot instructions. The policy, announced by computer science section chair Thomas Dietterich, is the first formal penalty by a major preprint platform for AI-generated slop.
ArXiv, the open-access repository that has served as the primary distribution channel for preprint research in computer science, mathematics, and physics for more than three decades, will ban authors for one year if they submit papers containing obvious signs of unchecked AI generation. Thomas Dietterich, chair of arXiv’s computer science section, announced the policy on Thursday, writing that when a submission shows “incontrovertible evidence” of unvetted large language model output, “we can’t trust anything in the paper.”
The rule is not a blanket prohibition on using AI tools. Researchers can still use language models for drafting, editing, or analysis. What triggers the penalty is evidence that an author pasted LLM output into a paper without checking it: the kind of carelessness that produces hallucinated references, placeholder instructions from the chatbot, or fabricated data tables with notes reading “fill in with the real numbers from your experiments.” If moderators find such evidence and a section chair confirms it, the author faces a one-year ban from arXiv, after which all subsequent submissions must first be accepted by a peer-reviewed journal before they can appear on the platform.
Why it matters
ArXiv is not a journal. It does not peer-review papers. But it has become the de facto way that research circulates in several of the fastest-moving fields in science, particularly machine learning and artificial intelligence. Papers posted to arXiv are read, cited, and built upon long before they appear in formal publications, if they ever do. That makes the platform’s quality standards unusually consequential: a hallucinated citation on arXiv can propagate through the research literature just as effectively as one in a peer-reviewed journal, and often faster.
The scale of the problem is significant. A study published in The Lancet in May 2026 by researchers at Columbia University audited 2.5 million biomedical papers and 126 million references indexed on PubMed Central. It found that fabricated citations have risen twelvefold since 2023. In that year, roughly one in 2,828 papers contained at least one fake reference. By 2025, the rate had climbed to one in 458. In the first seven weeks of 2026, it was one in 277. The researchers attributed the surge to the proliferation of AI writing tools, noting that previous studies estimate 30 to 69 per cent of LLM-generated references in biomedical contexts are fabricated.
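For a sense of what this kind of audit involves, the sketch below checks whether a cited DOI resolves to any real record in the Crossref index. It is an illustration only, not the Lancet study’s methodology or anything arXiv runs: the choice of Crossref, the `requests` library, and the helper names are all assumptions made here for the example.

```python
# Illustrative only: one way to flag citations whose DOIs resolve to no
# real record. Not the Lancet study's methodology or arXiv tooling;
# the Crossref lookup and the helper names are assumptions.
import requests

CROSSREF_WORKS = "https://api.crossref.org/works/"

def doi_exists(doi: str) -> bool:
    """Return True if Crossref holds a record for this DOI."""
    resp = requests.get(CROSSREF_WORKS + doi, timeout=10)
    return resp.status_code == 200

def flag_suspect_references(dois: list[str]) -> list[str]:
    """Return the DOIs with no Crossref record, i.e. candidates for fabrication."""
    return [doi for doi in dois if not doi_exists(doi)]

if __name__ == "__main__":
    sample = [
        "10.1038/nature14539",       # real: LeCun, Bengio & Hinton, Nature 2015
        "10.0000/made-up.2026.001",  # invented DOI, standing in for a fabricated citation
    ]
    print(flag_suspect_references(sample))  # expected: only the invented DOI
```

A check like this catches only one class of fabrication. A DOI registered with a different agency, or a reference cited without any DOI, would slip through, which is part of why the problem is hard to police at scale.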
ArXiv has reason to take the threat seriously. The platform receives thousands of submissions each month, and its volunteer moderation system was not designed to screen for machine-generated content at scale. Dietterich’s announcement described the new penalty as a “one-strike” rule, though decisions are subject to appeal and require confirmation by a section chair before being imposed.
What counts as evidence
The policy is deliberately narrow in what it targets. Dietterich listed specific examples of “incontrovertible evidence”: hallucinated references that do not correspond to any real publication, meta-comments from the language model left in the text (such as “here is a 200-word summary; would you like me to make any changes?”), and placeholder data with instructions to the author that were never removed. These are not subtle quality failures. They are signs that the author did not read the paper before submitting it.
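To make that distinction concrete, here is a minimal sketch of how such residue could be flagged automatically. It is purely illustrative: the phrase list, the function name, and the premise that a regex scan would be used at all are assumptions for the example, not arXiv’s moderation tooling.

```python
# Minimal sketch of the textual tells the policy targets: leftover chatbot
# meta-comments and placeholder instructions. The phrase list and function
# name are illustrative assumptions, not arXiv's actual moderation system.
import re

TELLTALE_PATTERNS = [
    r"as a large language model",
    r"would you like me to (?:make any changes|continue|expand)",
    r"here is a \d+-word (?:summary|abstract)",
    r"fill in with the real (?:numbers|data|values)",
    r"\[insert (?:citation|reference|figure) here\]",
]

def find_llm_residue(text: str) -> list[str]:
    """Return any telltale phrases found in a submission's plain text."""
    hits = []
    for pattern in TELLTALE_PATTERNS:
        hits.extend(m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE))
    return hits

if __name__ == "__main__":
    draft = "Table 2: fill in with the real numbers from your experiments."
    print(find_llm_residue(draft))  # ['fill in with the real numbers']
```

A scan like this would only ever surface the crudest cases, which is consistent with the policy’s narrow scope: it targets text that no author who had read their own paper could have missed.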
The distinction matters because it avoids the far more difficult question of whether AI-assisted writing should be permitted at all. ArXiv’s existing policy already states that authors bear “full responsibility” for their content “irrespective of how the contents are generated.” The new penalty enforces that principle by targeting the most egregious violations, cases where the author’s failure to exercise any oversight is provable from the text itself.
That approach has practical advantages. Detecting whether a well-edited paper was drafted with the help of an LLM is unreliable with current detection tools, and attempting to enforce a broader ban would be both technically difficult and potentially punitive toward researchers who use AI tools responsibly. By focusing on obvious slop, arXiv can enforce the rule without needing to build or buy an AI-detection system, a technology that remains prone to its own errors.
A broader problem
ArXiv is not the only institution struggling with the issue. Academic conferences in computer science, including NeurIPS and ICML, have reported surges in submissions that appear to be generated with minimal human oversight. Nature published a feature in late 2025 describing how AI slop is creating a crisis in computer science, where the volume of low-quality submissions is overwhelming reviewers and diluting the signal-to-noise ratio of the field’s output.
Peer-reviewed journals face the same problem. The Lancet study found that fabricated citations appeared in papers that had already passed peer review, suggesting that reviewers are either not checking references or are unable to identify fabrications at the rate they are now appearing. Lead author Maxim Topaz, of Columbia University’s School of Nursing, warned that clinicians and guideline developers have no way of knowing when the evidence they rely on does not exist, a gap that efforts to reduce AI hallucinations in scientific research have not yet closed.
ArXiv itself is undergoing structural changes that may help it address the challenge. After more than 20 years as a project hosted by Cornell University, the platform is becoming an independent nonprofit, a move that should give it greater autonomy over its moderation policies and the ability to raise funds specifically to combat quality problems. It has also introduced a requirement for first-time submitters to obtain an endorsement from an established author, a gatekeeping measure aimed at reducing the volume of submissions from accounts created solely to publish AI-generated material.
The limits of enforcement
The new rule will catch the most careless offenders: researchers who submit papers they have not read. It will not catch researchers who use language models to generate plausible but incorrect claims, fabricate data, or produce papers that are fluent but scientifically vacuous. Those problems require peer review, institutional oversight, and a willingness within the research community to treat AI-assisted misconduct with the same seriousness as traditional forms of fabrication.
What arXiv’s policy does establish is a principle: if you submit a paper, you are responsible for every word in it. That has always been true in theory. The difference now is that language models have made it trivially easy to produce text that reads like science but contains nothing of substance. ArXiv’s one-year ban is a modest penalty for a serious offence, but it is also the first formal acknowledgement by a major research platform that the problem is no longer one of occasional carelessness. It is structural, it is growing, and it requires dedicated infrastructure to combat.


