Gemini is a large language model that draws on training techniques pioneered by AlphaGo, the same computer program that defeated a professional human Go player. Unlike Google's previous models, such as LaMDA, PaLM, and PaLM 2, Gemini is designed to be multimodal, meaning it can process not only text but also images, videos, and audio.
ChatGPT also launched as a text-only model and only recently gained the ability to accept image and voice inputs.
In the announcement, Pichai mentioned that Google brought its two AI research teams, Google Brain and DeepMind, together to develop Gemini, making it a next-generation model.
He further said it is highly efficient at tool and API integrations and built to enable future innovations, such as memory and planning. While it is still early, he noted, the team is already seeing impressive multimodal capabilities not seen in prior models.
Google also said that once Gemini has been rigorously tested for safety, it will be made available in a range of sizes and capabilities, just like PaLM 2.