Google DeepMind’s new AI models, AlphaProof and AlphaGeometry 2, demonstrate advanced mathematical reasoning, solving complex problems from the International Mathematical Olympiad.
Google DeepMind, Google LLC’s artificial intelligence research unit, has unveiled two new AI models capable of the advanced mathematical reasoning needed to solve complex math problems that current models struggle with.
The company presented AlphaProof, a reinforcement-learning model capable of formal math reasoning, and AlphaGeometry 2, an improved version of the company’s existing geometry-solving system.
DeepMind said advanced mathematical reasoning will be required to develop artificial general intelligence, or AGI, the “holy grail” of AI development: computer systems capable of teaching themselves, understanding, and applying knowledge the way humans do.
To test both models, DeepMind pitted them against problems from this year’s International Mathematical Olympiad, the oldest and most prestigious competition for young mathematicians, held annually since 1959. Each edition comprises six tough questions spanning algebra, combinatorics, geometry and number theory.
The research team applied the two models to the six problems and, together, they solved four of them, matching the proficiency of a silver medalist. AlphaProof solved two algebra problems and one number theory problem, while AlphaGeometry 2 solved the geometry problem. The two combinatorics problems went unsolved.
AlphaProof works by proving mathematical statements in the formal language Lean. It couples a pre-trained language model with AlphaZero, a reinforcement-learning algorithm that previously taught itself to master the games of chess, shogi, and Go. Formal languages allow mathematical statements to be represented in a way that can be mechanically verified for correctness, but only a limited amount of human-written formal data is available to train AI algorithms on.
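To give a sense of what a machine-checkable statement looks like, here is a toy Lean 4 theorem (an illustrative sketch only, not drawn from AlphaProof’s training data):

    -- Toy example: addition of natural numbers is commutative.
    -- Lean checks the proof mechanically; it compiles only if the argument is sound.
    theorem add_comm_example (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b

Because every step is machine-verified, any proof Lean accepts is guaranteed to be correct, which is what makes formal languages attractive for generating and checking training data.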
Large language models that work in natural language, by contrast, have access to vastly more human-generated data but tend to hallucinate, producing plausible yet erroneous answers.
“We established a bridge between these two complementary spheres by fine-tuning a Gemini model to automatically translate natural language problem statements into formal statements, creating a large library of formal problems of varying difficulty,” the DeepMind researchers said.
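As a hypothetical illustration of what such a translation might yield (DeepMind has not published its formal library, so the statement and names below are invented for this example), the informal claim “the sum of two even numbers is even” could be formalized in Lean 4 roughly as:

    -- Informal: "the sum of two even numbers is even"
    -- Hypothetical formalization; `omega` is Lean's built-in linear-arithmetic tactic.
    theorem even_add_even (a b : Nat)
        (ha : ∃ k, a = 2 * k) (hb : ∃ k, b = 2 * k) :
        ∃ k, a + b = 2 * k :=
      match ha, hb with
      | ⟨m, hm⟩, ⟨n, hn⟩ => ⟨m + n, by omega⟩

Once a problem is in this form, candidate proofs can be generated and automatically checked, and verified solutions can be fed back to strengthen the model.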
Gemini is Google’s largest and most capable large language model and the LLM behind the company’s Gemini chatbot. It can hold conversations, answer questions, perform advanced text generation, summarize text and generate code.
The researchers trained AlphaProof by having it prove or disprove millions of problems spanning a broad range of mathematical topics and difficulties. The same training loop was applied during the competition, with the model self-generating variations of the contest problems until a solution could be found.
AlphaGeometry 2, like its predecessor AlphaGeometry, is a neuro-symbolic hybrid system; its language model is based on Gemini and was trained from scratch on more synthetic data than the original.
“This helped the model tackle much more challenging geometry problems, including problems about object movements and equations of angles, ratios, or distances,” the researchers said.
The team said the symbolic engine within the model is also two orders of magnitude faster than its predecessor’s, and that when presented with a new problem, a knowledge-sharing mechanism enables combinations of different search trees to tackle more complex problems.
Thanks to these upgrades, AlphaGeometry 2 solved 83% of all historical IMO geometry problems from the past 25 years, up from the 53% solve rate of the previous model. According to the researchers, it solved this year’s Problem 4 within 19 seconds of receiving its formalization.
The team also experimented with a natural-language reasoning system built with Gemini. It doesn’t require problems to be translated into a formal language and could therefore be combined with other AI systems. The researchers said it was tested on this year’s IMO problems and “the results showed great promise.”