
Google DeepMind makes AI history with gold medal win at world’s toughest math competition

Google DeepMind announced Monday that an advanced version of its Gemini artificial intelligence model has officially achieved gold medal-level performance at the International Mathematical Olympiad, solving five of six exceptionally difficult problems and earning recognition as the first AI system to receive official gold-level grading from competition organizers.

The victory advances the field of AI reasoning and puts Google ahead in the intensifying battle between tech giants building next-generation artificial intelligence. More importantly, it demonstrates that AI can now tackle complex mathematical problems using natural language understanding rather than requiring specialized programming languages.

“Official results are in — Gemini achieved gold-medal level in the International Mathematical Olympiad!” Demis Hassabis, CEO of Google DeepMind, wrote on social media platform X Monday morning. “An advanced version was able to solve 5 out of 6 problems. Incredible progress.”

The International Mathematical Olympiad, held annually since 1959, is widely considered the world’s most prestigious mathematics competition for pre-university students. Each participating country sends six elite young mathematicians to compete in solving six exceptionally challenging problems spanning algebra, combinatorics, geometry, and number theory. Only about 8% of human participants typically earn gold medals.


How Google DeepMind’s Gemini Deep Think cracked math’s toughest problems

Google’s latest success far exceeds its 2024 performance, when the company’s combined AlphaProof and AlphaGeometry systems earned silver medal status by solving four of six problems. That earlier system required human experts to first translate natural language problems into domain-specific programming languages and then interpret the AI’s mathematical output.

This year’s breakthrough came through Gemini Deep Think, an enhanced reasoning system that employs what researchers call “parallel thinking.” Unlike traditional AI models that follow a single chain of reasoning, Deep Think simultaneously explores multiple possible solutions before arriving at a final answer.
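DeepMind has not published the internals of Deep Think, but the general idea behind "parallel thinking" can be illustrated with a minimal sketch: sample several independent reasoning paths, score each one, and keep the strongest. The function names below (generate_candidate_proof, score_proof, parallel_think) and the scoring logic are hypothetical placeholders, not DeepMind's actual method.

```python
# Illustrative sketch only -- not DeepMind's implementation.
# Shows "parallel thinking" in the loosest sense: explore several
# candidate reasoning paths concurrently, then converge on one answer.
from concurrent.futures import ThreadPoolExecutor

def generate_candidate_proof(problem: str, seed: int) -> str:
    # Placeholder: a real system would sample a full chain of reasoning here.
    return f"candidate proof {seed} for: {problem}"

def score_proof(proof: str) -> float:
    # Placeholder: a real system would use a learned verifier or self-check.
    return float(len(proof) % 7)

def parallel_think(problem: str, num_paths: int = 8) -> str:
    # Explore several reasoning paths at once instead of committing to one.
    with ThreadPoolExecutor(max_workers=num_paths) as pool:
        candidates = list(pool.map(
            lambda s: generate_candidate_proof(problem, s), range(num_paths)))
    # Keep the highest-scoring candidate as the final answer.
    return max(candidates, key=score_proof)

if __name__ == "__main__":
    print(parallel_think("IMO-style number theory problem"))
```

In practice the expensive parts are the generation and verification steps; the selection logic itself is simple, which is why approaches like this scale with the amount of compute spent exploring alternative solution paths.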

“Our model operated end-to-end in natural language, producing rigorous mathematical proofs directly from the official problem descriptions,” Hassabis explained in a follow-up post, emphasizing that the system completed its work within the competition’s standard 4.5-hour time limit.

The model achieved 35 out of a possible 42 points, comfortably exceeding the gold medal threshold. According to IMO President Prof. Dr. Gregor Dolinar, the solutions were “astonishing in many respects” and found to be “clear, precise and most of them easy to follow” by competition graders.

OpenAI faces backlash for bypassing official competition rules

The announcement comes amid growing tension in the AI industry over competitive practices and transparency. Google DeepMind’s measured approach to releasing its results has drawn praise from the AI community, particularly in contrast to rival OpenAI’s handling of similar achievements.

“We didn’t announce on Friday because we respected the IMO Board’s original request that all AI labs share their results only after the official results had been verified by independent experts & the students had rightly received the acclamation they deserved,” Hassabis wrote, appearing to reference OpenAI’s earlier announcement of its own olympiad performance.

Social media users were quick to note the distinction. “You see? OpenAI ignored the IMO request. Shame. No class. Straight up disrespect,” wrote one user. “Google DeepMind acted with integrity, aligned with humanity.”

The criticism stems from OpenAI’s decision to announce its own mathematical olympiad results without participating in the official IMO evaluation process. Instead, OpenAI had a panel of former IMO participants grade its AI’s performance, an approach that some in the community view as lacking credibility.

“OpenAI is quite possibly the worst company on the planet right now,” wrote one critic, while others suggested the company needs to “take things seriously” and “be more credible.”

Inside the training methods that powered Gemini’s mathematical mastery

Google DeepMind’s success appears to stem from novel training techniques that go beyond traditional approaches. The team used advanced reinforcement learning methods designed to leverage multi-step reasoning, problem-solving, and theorem-proving data. The model was also provided access to a curated collection of high-quality mathematical solutions and received specific guidance on approaching IMO-style problems.

The technical achievement impressed AI researchers who noted its broader implications. “Not just solving math… but understanding language-described problems and applying abstract logic to novel cases,” wrote AI observer Elyss Wren. “This isn’t rote memory — this is emergent cognition in motion.”

Ethan Mollick, a professor at the Wharton School who studies AI, emphasized the significance of using a general-purpose model rather than specialized tools. “Increasing evidence of the ability of LLMs to generalize to novel problem solving,” he wrote, highlighting how this differs from previous approaches that required specialized mathematical software.

The model demonstrated particularly impressive reasoning in one problem where many human competitors applied graduate-level mathematical concepts. According to DeepMind researcher Junehyuk Jung, Gemini “made a brilliant observation and used only elementary number theory to create a self-contained proof,” finding a more elegant solution than many human participants.

What Google DeepMind’s victory means for the $200 billion AI race

The breakthrough comes at a critical moment in the AI industry, where companies are racing to demonstrate superior reasoning capabilities. The success has immediate practical implications: Google plans to make a version of this Deep Think model available to mathematicians for testing before rolling it out to Google AI Ultra subscribers, who pay $250 monthly for access to the company’s most advanced AI models.

The timing also highlights the intensifying competition between major AI laboratories. While Google celebrated its methodical, officially verified approach, the controversy surrounding OpenAI’s announcement reflects broader tensions about transparency and credibility in AI development.

This competitive dynamic extends beyond just mathematical reasoning. Recent weeks have seen various AI companies announce breakthrough capabilities, though not all have been received positively. Elon Musk’s xAI recently launched Grok 4, which the company claimed was the “smartest AI in the world,” though leaderboard scores showed it trailing behind models from Google and OpenAI. Additionally, Grok has faced criticism for controversial features including sexualized AI companions and episodes of generating antisemitic content.

The dawn of AI that thinks like humans—with real-world consequences

The mathematical olympiad victory goes beyond competitive bragging rights. Gemini’s performance demonstrates that AI systems can now match human-level reasoning in complex tasks requiring creativity, abstract thinking, and the ability to synthesize insights across multiple domains.

“This is a significant advance over last year’s breakthrough result,” the DeepMind team noted in their technical announcement. The progression from requiring specialized formal languages to operating entirely in natural language suggests that AI systems are becoming more intuitive and accessible.

For businesses, this development signals that AI may soon tackle complex analytical problems across various industries without requiring specialized programming or domain expertise. The ability to reason through intricate challenges using everyday language could democratize sophisticated analytical capabilities across organizations.

However, questions persist about whether these reasoning capabilities will translate effectively to messier real-world challenges. The mathematical olympiad provides well-defined problems with clear success criteria — a far cry from the ambiguous, multifaceted decisions that define most business and scientific endeavors.

Google DeepMind plans to return to next year’s competition “in search of a perfect score.” The company believes AI systems combining natural language fluency with rigorous reasoning “will become invaluable tools for mathematicians, scientists, engineers, and researchers, helping us advance human knowledge on the path to AGI.”

But perhaps the most telling detail emerged from the competition itself: when faced with the contest’s most difficult problem, Gemini started from an incorrect hypothesis and never recovered. Only five human students solved that problem correctly. In the end, it seems, even gold medal-winning AI still has something to learn from teenage mathematicians.


