Google’s AI Overviews are about 90 per cent accurate, which still means millions of errors per day.
According to The New York Times, a recent analysis found that Google’s AI Overviews are correct about 90 per cent of the time. That figure sounds respectable, but given Google’s enormous search volume, it means thousands of wrong answers go out every minute.
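
The scale argument is plain arithmetic: an error rate multiplied by a daily volume. As a rough sketch, the Python snippet below turns an accuracy rate into wrong answers per day and per minute. The function name and the volume of 100 million overviews per day are hypothetical placeholders chosen for illustration; they are not figures reported by Google, Oumi, or The New York Times.

```python
MINUTES_PER_DAY = 24 * 60


def estimate_wrong_answers(overviews_per_day: int, accuracy: float) -> dict:
    """Estimate wrong answers per day and per minute for a given accuracy.

    overviews_per_day is an assumed volume, not a reported figure.
    accuracy is the fraction of answers that are correct, from 0.0 to 1.0.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0.0 and 1.0")
    wrong_per_day = overviews_per_day * (1.0 - accuracy)
    return {
        "per_day": wrong_per_day,
        "per_minute": wrong_per_day / MINUTES_PER_DAY,
    }


if __name__ == "__main__":
    # Hypothetical volume, used only to show the order of magnitude.
    assumed_volume = 100_000_000
    estimate = estimate_wrong_answers(assumed_volume, accuracy=0.90)
    print(f"wrong per day:    {estimate['per_day']:,.0f}")
    print(f"wrong per minute: {estimate['per_minute']:,.0f}")
```

Under that assumed volume, a 10 per cent error rate works out to millions of wrong answers a day and thousands a minute. A larger real-world volume would push the totals higher still.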

Oumi, a company that develops AI models, used AI tools to test Google’s AI Overviews against the SimpleQA benchmark. SimpleQA, a widely used test published by OpenAI in 2024, is essentially a list of more than 4,000 questions with verifiable answers that are fed to an AI model for evaluation. Oumi began testing last year, when Gemini 2.5 was Google’s best AI model. That baseline run showed an accuracy rate of 85 per cent. After the update to Gemini 3, a retest put accuracy at 91 per cent. If that rate is extrapolated to all Google searches, AI Overviews produce tens of millions of wrong answers every day.

Google, however, does not accept the test. Google spokesman Ned Adriance told The New York Times that the company believes SimpleQA itself contains incorrect information. Google’s own model evaluations usually rely on a similar test called SimpleQA Verified, which uses a smaller set of more carefully vetted questions. He said: “The study has serious flaws that do not reflect what users actually search for on Google.”
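
To make the idea of a question-list benchmark concrete, here is a minimal Python sketch of how an accuracy figure can be tallied from questions with known answers. It is not Oumi’s or OpenAI’s actual harness: the function names, the toy questions, and the stand-in model are all hypothetical, and the normalized exact-match comparison is a simplifying assumption, since real evaluations of this kind typically rely on a more lenient grader.

```python
from typing import Callable, Iterable, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def benchmark_accuracy(
    items: Iterable[Tuple[str, str]],
    answer_fn: Callable[[str], str],
) -> float:
    """Return the fraction of questions answered correctly.

    items is a sequence of (question, reference_answer) pairs.
    answer_fn is a hypothetical stand-in for the system under test.
    Grading is normalized exact match, a simplification for illustration.
    """
    pairs = list(items)
    if not pairs:
        raise ValueError("benchmark needs at least one question")
    correct = sum(
        1
        for question, reference in pairs
        if normalize(answer_fn(question)) == normalize(reference)
    )
    return correct / len(pairs)


if __name__ == "__main__":
    # Toy questions for illustration only; not taken from SimpleQA.
    toy_items = [
        ("What is the chemical symbol for gold?", "Au"),
        ("How many sides does a hexagon have?", "6"),
    ]
    # Stand-in "model" that gets one answer right and one wrong.
    canned = {
        "What is the chemical symbol for gold?": " au ",
        "How many sides does a hexagon have?": "8",
    }
    print(benchmark_accuracy(toy_items, lambda q: canned[q]))
```

The choice of grader matters: a stricter or looser matching rule, or disputed reference answers, shifts the headline accuracy figure, which is part of what Google’s objection turns on.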

Assessing new AI models can feel more like art than science, which is part of the problem. Each company can present its models in whatever way it prefers, and the non-deterministic nature of generative AI makes it hard to verify anything. These chatbots can answer a factual question correctly, then give the wrong answer if the user immediately asks again. Oumi itself uses AI tools for its assessment, and those models can hallucinate too.

Another complication is that AI Overviews are not powered by a single large model. Google says it uses “the right model” for each query. Running Gemini 3.1 Pro all the time would give AI Overviews the best answers, but it is too slow and costly. To load content quickly on the search page, Overviews use a faster Gemini Flash model where possible, which appears to be the majority of cases.

It is worth noting that a 90 per cent accuracy rate is not bad. Google’s recently published benchmarks for its new models show factuality scores of between 60 and 80 per cent, measured without tools such as web search. Grounding an AI in additional data makes it more accurate than a bare model. However, the truth sits behind a link, and AI Overviews encourage users to accept their sometimes inaccurate summaries rather than check the sources themselves.
