ChatGPT, Claude, Gemini and Grok are not ready to brief American voters

A new generation of voters will ask ChatGPT, Claude, Gemini, and Grok how to vote, where the polling station is, and who is telling the truth. The published research is consistent: the models cannot reliably answer those questions. The election will arrive anyway.

In early 2025, researchers at the Tow Center for Digital Journalism at Columbia Journalism School ran a controlled experiment that should, in retrospect, have settled an industry argument.

The team fed eight AI search products, including ChatGPT Search, Perplexity, Gemini, Copilot, and the Grok-2 and Grok-3 search modes, a set of 200 news articles drawn evenly from twenty publishers, then asked each tool to identify the article and credit its source. Across 1,600 queries, the models returned the wrong answer more than 60% of the time.

ChatGPT Search, the only tool that consented to answer all 200 queries, was completely accurate on 28% of them and completely wrong on 57%. Perplexity, marketed as the research-grade option, was wrong 37% of the time, the lowest failure rate in the cohort.

Those numbers were published over a year ago. They have not improved. A Bloomberg study summary published on 20 May confirmed that ChatGPT, Claude, Gemini, and Grok remain unreliable when asked about news, including election news.

Nieman Lab’s read of the same data set found ChatGPT continues to be the worst of the four at crediting the news outlets it draws from. A separate NewsGuard False Claims monitor has the top ten generative-AI chatbots returning false claims to news prompts 35% of the time in August 2025, up from 18% the year before.

The 2026 US midterms are 167 days away from the date of this writing. The first cohort of American voters who will, plausibly, use a chatbot as their primary news interface will go to the polls in November.

NOTUS’s reporting on the campaigns has been blunt: ChatGPT and Claude will be a force in this election, and nobody, including the labs that built them, has a defensible plan for what happens when those forces produce confident, eloquent, well-cited answers that are also wrong.

What the published research shows, taken together, is not that chatbots occasionally hallucinate. The hallucination framing is a category error inherited from the early-2024 discourse. The research shows something more specific and more dangerous for information integrity.

Chatbots misattribute quotes systematically. They fabricate links that resolve to nothing. They cite syndicated or AI-summarised copies of articles in preference to the originals, severing the chain back to the journalists who produced the reporting.

They cannot reliably distinguish between a Reuters wire, a content-farm rewrite, and a Russian disinformation site dressed up in the same syndication wrappers. NewsGuard’s tracking of Moscow-seeded fake-news sites found the top ten generative-AI models mimicking Russian disinformation claims roughly a third of the time, citing the seeded sites as authoritative sources.

The structural reason for this is not a mystery, and the labs do not pretend it is. The training-data pipelines that produce the current generation of frontier models have ingested the open web at a scale that includes both the New York Times and the laundered output of disinformation operations.

The retrieval-augmented-generation systems that sit on top of those models, the ones meant to ground answers in current sources, are running over a search index whose top results in many news queries are AI-generated rewrites of AI-generated rewrites.

The ‘data voids’ analysis in Lawfare from earlier this year describes the mechanism: where a real story has thin original-source coverage, propaganda fills the gap, and the chatbot, on the cleanest read of its retrieval logs, treats the propaganda as the substantive source.

This is the position from which the labs are now negotiating publisher-licensing deals. OpenAI has signed agreements with the Financial Times, Axel Springer, News Corp, Le Monde and a roster of others; Google has done the same; Anthropic and Perplexity have built out their own publisher partnerships.

The argument for the deals, made by both sides, is that licensed-content access will produce better citations, more accurate summarisation, and a healthier traffic relationship between chatbot and publisher. The argument is plausible. The published evidence, as of May 2026, does not yet support it.

ChatGPT Search’s 57% complete-failure rate was measured on a corpus that included articles from publishers with which OpenAI had licensing relationships. The licensing did not produce accurate retrieval. It produced the appearance of legitimacy around inaccurate retrieval.

The midterm-specific problem is that the failure modes of the current generation of chatbots are calibrated almost perfectly to election misinformation. A voter who asks ChatGPT ‘where is my polling place’ will get a confident answer with a plausible-looking citation; whether the answer is correct depends on whether the model’s most recently cached source for that address is correct.

A voter who asks Gemini ‘has the Republican candidate in my district been charged with any crimes’ will get an answer whose accuracy depends on which version of which news report the retrieval layer surfaces, and on whether that version is the AP wire or a syndicated rewrite that quietly omits the contested clause.

A voter who asks Grok ‘who is winning this race’ will get an answer shaped by the underlying model’s training cut-off and by the proportion of pollster-aggregator sites in the retrieval index.

None of these failure modes looks like a hallucination to the user. They look like authoritative information, delivered fluently, with citations.

The lab-side response has been to position the chatbot products as auxiliary, not primary, sources. Sam Altman, Dario Amodei, Sundar Pichai, and Elon Musk have all, at various points across the past eighteen months, made some version of the ‘always verify against the primary source’ argument.

The argument is technically correct and operationally useless. A voter who would have read the primary source before asking the chatbot was never the population at risk.

The voters at risk are the ones for whom the chatbot is the primary source, the way Google Search was the primary source for an earlier cohort, and the network evening news was the primary source for the cohort before that.

CJR’s running coverage of newsroom-AI experiments has been unsparing on this point: the trade-off being made is accuracy for convenience, and the publishers are increasingly willing to make it.

There is a parallel arc that makes the midterm exposure sharper. China’s regulatory crackdown on AI misuse came online in April 2026 with mandatory labelling and personality-simulation rules.

The European Commission is running its Digital Services Act enforcement track in parallel. Both regimes are calibrated to require chatbot operators to surface provenance, label outputs, and accept liability for misinformation produced inside their products.

The US has nothing comparable on the federal books. OpenAI’s adoption of the C2PA-and-SynthID provenance stack is the lab’s answer to part of this question, applied to AI-generated images. There is no equivalent provenance layer for chatbot text output.

A factual claim made in confident prose by ChatGPT or Grok carries no machine-readable signal of where it came from, how the retrieval was scored, or whether the underlying source was a wire report or a content farm.

What the labs are betting on, on the available evidence, is that the November result will be unambiguous enough that no chatbot can plausibly be blamed for it. That bet may be correct. It is also a bet that no honest information-integrity policy can rest on.

Stanford’s FSI research group has been clear that curated evidence layers can materially reduce the false-citation rate in chatbots, but that they require the kind of editorial infrastructure no current chatbot interface ships with at scale.

The mid-2026 question is not whether the labs can build that infrastructure. It is whether they will build it before the first Tuesday in November.

The temptation, sitting at this distance from the midterms, is to write a column urging voters to verify, urging publishers to litigate, urging regulators to act, and urging the labs to ship better citations.

All of those urgings are correct, and all of them ask the wrong actors to absorb the cost of a problem the labs created and continue to ship.

The labs have shipped news-mode products into the most consequential US election since 2020 with documented 35% misinformation rates, 60% citation-failure rates, and a retrieval architecture they themselves admit they cannot fully audit.

The same labs negotiating regulatory carve-outs in the UK and Europe for the energy and copyright costs of running frontier training are, in the same week, telling US journalists that the midterm exposure is overstated.

The exposure is not overstated. The pattern from healthcare is the closest available parallel: confident AI outputs deployed into a high-stakes domain, regulators slow to require provenance, and an ECRI patient-safety ranking putting AI-chatbot misuse at the top of the 2026 health-tech hazard list.

The election domain is structurally more exposed than healthcare because the failure mode is not a single bad clinical answer but a cumulative drift in what an entire voter cohort believes the news is. By the time the post-mortem researchers measure that drift, the votes will already have been counted.

The midterms will arrive in 167 days. The chatbots will not be ready. The voters who use them as their primary news interface will go to the polls anyway.

What the labs do between now and November is a test of whether they understand the difference between shipping a product and shipping a piece of the information infrastructure of a democracy.

The published evidence so far is that they understand the first thing and have not yet been required to understand the second.
