
Google Unveils Gemini: A Challenger to GPT-4


Google has been an ‘AI-first company’ for nearly a decade. Now, a year into the AI era brought on by ChatGPT, it’s finally making a big move.

It’s the beginning of a new era of AI at Google, says CEO Sundar Pichai: the Gemini era. Gemini is Google’s latest large language model, which Pichai first teased at the I/O developer conference in May and is now launching to the public. To hear Pichai and Google DeepMind CEO Demis Hassabis describe it, Gemini is a huge leap forward in an AI model, one that will affect practically all of Google’s products. “One of the powerful things about this moment,” Pichai says, “is you can work on one underlying technology and make it better and it immediately flows across our products.”

Gemini is more than a single AI model. A lighter version called Gemini Nano is meant to be run natively and offline on Android devices. A beefier version called Gemini Pro will soon power lots of Google AI services and is the backbone of Bard starting today. And there’s an even more capable model called Gemini Ultra, the most powerful LLM Google has yet created. It seems to be mostly designed for data centers and enterprise applications. 

Google is launching the model in a few ways: Bard is now powered by Gemini Pro, and Pixel 8 Pro users will get a few new features thanks to Gemini Nano. (Gemini Ultra is coming next year.) Developers and enterprise customers can access Gemini Pro through Google Generative AI Studio or Vertex AI in Google Cloud starting on December 13th. For now, Gemini is available only in English, with other languages coming soon. But Pichai says the model will eventually be integrated into Google’s search engine, its ad products, the Chrome browser, and more worldwide. It is the future of Google, and it’s here not a moment too soon.

OpenAI launched ChatGPT a year and a week ago, and the company and product immediately became the biggest things in AI. Now, Google — the company that created much of the foundational technology behind the current AI boom, that has called itself an “AI-first” organization for nearly a decade, and that was clearly and embarrassingly caught off guard by how good ChatGPT was and how fast OpenAI’s tech has taken over the industry — is finally ready to fight back.

So, let’s just get to the important question, shall we? OpenAI’s GPT-4 versus Google’s Gemini: ready, go. This has very clearly been on Google’s mind for a while. “We’ve done a thorough analysis of the systems side by side and the benchmarking,” Hassabis says. Google ran 32 well-established benchmarks comparing the two models, from broad overall tests like the Massive Multitask Language Understanding (MMLU) benchmark to one that measures the two models’ ability to generate Python code. “I think we’re substantially ahead on 30 out of 32” of those benchmarks, Hassabis says, with a bit of a smile on his face. “Some of them are very narrow. Some of them are larger.”

In those benchmarks (most of which are very close), Gemini’s clearest advantage comes from its ability to understand and interact with video and audio. This is very much by design: multimodality has been part of the Gemini plan from the start. Google hasn’t trained separate models for images and voice, the way OpenAI did with DALL-E and Whisper; it built one multisensory model from the beginning. “We’ve always been interested in very general systems,” Hassabis says. He’s especially interested in mixing all those modes: collecting as much data as possible from any number of inputs and senses and then giving responses with just as much variety.

Currently, Gemini’s most basic models are text in and text out, but more powerful models like Gemini Ultra can work with images, video, and audio. “It’s going to get even more general than that,” Hassabis says. “There’s still things like action and touch, more like robotics.” Over time, he says, Gemini will get more senses, become more aware, and become more accurate and grounded in the process. “These models just sort of understand better about the world around them.” Of course, these models still hallucinate and have biases and other problems. But the more they know, Hassabis says, the better they’ll get.

Benchmarks are just benchmarks, though, and ultimately, the true test of Gemini’s capability will come from everyday users who want to use it to brainstorm ideas, look up information, write code, and much more. Google seems to see coding as a killer app for Gemini; it uses a new code-generating system called AlphaCode 2 that performs better than 85 percent of coding competition participants, up from 50 percent for the original AlphaCode. But Pichai says that users will notice an improvement in almost everything the model touches.

Equally important to Google is that Gemini is a far more efficient model. It was trained on Google’s Tensor Processing Units and is faster and cheaper to run than Google’s previous models like PaLM. Alongside the new model, Google is also launching a new version of its TPU system, the TPU v5p, a computing system designed for use in data centers for training and running large-scale models.

Talking to Pichai and Hassabis, it’s clear that they see the Gemini launch both as the beginning of a larger project and as a step change in itself. Gemini is the model Google has been waiting for, the one it has been building toward for years, maybe even the one it should have had before OpenAI and ChatGPT took over the world. 

Google, which declared a “code red” after ChatGPT’s launch and has been perceived to be playing catch-up ever since, seems to still be trying to hold fast to its “bold and responsible” mantra. Hassabis and Pichai say they’re not willing to move too fast just to keep up, especially as we get closer to the ultimate AI dream: artificial general intelligence, an AI that is self-improving, smarter than humans, and poised to change the world. “As we approach AGI, things will be different,” Hassabis says. “It’s kind of an active technology, so we must approach that cautiously. Cautiously, but optimistically.”

Google says it has worked hard to ensure Gemini’s safety and responsibility through internal and external testing and red-teaming. Pichai points out that ensuring data security and reliability is particularly important for enterprise-first products, which is where most generative AI makes its money. However, Hassabis acknowledges that one of the risks of launching a state-of-the-art AI system is that it will have issues and attack vectors no one could have predicted. “That’s why you have to release things,” he says, “to see and learn.” Google is taking the Ultra release particularly slowly; Hassabis compares it to a controlled beta, with a “safer experimentation zone” for Google’s most capable and unrestrained model. Basically, if there’s a marriage-ruining alternate personality inside Gemini, Google is trying to find it before you do.

For years, Pichai and other Google executives have waxed poetic about the potential for AI. Pichai has said more than once that AI will be more transformative to humanity than fire or electricity. In this first generation, the Gemini model may not change the world. Best-case scenario: it helps Google catch up to OpenAI in the race to build great generative AI. (In the worst-case scenario, Bard stays boring and mediocre, and ChatGPT keeps winning.) But Pichai, Hassabis, and everyone else at Google seem to think this is the beginning of something huge. The web made Google a tech giant; Gemini could be even bigger.
