Google launches native multimodal AI model Gemini, challenging GPT-4

Google has announced the launch of its natively multimodal AI model Gemini. Gemini is Google's most powerful and versatile AI model to date, capable of understanding, operating across, and combining different types of information, including text, code, audio, images, and video.

Google Unveils Native Multimodal AI Model Gemini

Google has introduced Gemini as a natively multimodal model, built from the ground up to handle multiple modalities, much as humans use several senses at once to perceive and understand the world. This means Gemini can understand, operate across, and combine different types of information seamlessly, including text, code, audio, images, and video. Google says this approach produces better results than connecting separately built text or voice models after the fact.

Google says it rigorously tested Gemini and evaluated its performance across a wide range of tasks. From natural image, audio, and video understanding to mathematical reasoning, Gemini Ultra exceeded current state-of-the-art results on 30 of the 32 widely used academic benchmarks employed in large language model (LLM) research and development.

The top-tier Gemini Ultra scored 90.0% on MMLU (Massive Multitask Language Understanding), making it the first model to outperform human experts on that benchmark.

How strong is its understanding? In one demonstration, Google showed Gemini two very simple hand-drawn cars and asked which one would go faster. Gemini replied, "The one on the right is faster because it is more aerodynamic."
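For developers, a multimodal question along these lines could be expressed with Google's generative AI Python SDK (google-generativeai). The snippet below is only a minimal sketch, not Google's own demo: the model name, API key placeholder, and image file are assumptions, since the article does not describe developer access.

```python
# Minimal sketch: sending an image-plus-text question to a vision-capable Gemini model
# via Google's generative AI Python SDK (google-generativeai).
# The model name and file path below are illustrative assumptions.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # key obtained from Google AI Studio

# Vision-capable Gemini models accept a mixed list of images and text.
model = genai.GenerativeModel("gemini-pro-vision")
drawing = Image.open("two_hand_drawn_cars.png")  # hypothetical sketch of the two cars

response = model.generate_content(
    [drawing, "Which of these two cars would go faster, and why?"]
)
print(response.text)
```

The same generate_content call also accepts plain text prompts, so a text-only tier such as Gemini Pro would be used the same way, just without the image input.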

Gemini Comes in Three Versions, Including One for Mobile Devices

To cover usage environments ranging from data centers to mobile devices, Google is releasing Gemini in three versions:

  • Ultra: The largest and most capable model, intended for highly complex tasks. Google is still running safety tests and plans to offer a limited preview to enterprise customers and developers, with a broader release expected next year.
  • Pro: The best model for scaling across a wide range of tasks; it already powers the English-language version of the Bard chatbot.
  • Nano: The most efficient model for on-device tasks, built to run on the Pixel 8 Pro smartphone.

Gemini Will Be Fully Integrated Across Google's Services

Google's AI chatbot Bard has begun using a fine-tuned version of Gemini Pro for more advanced reasoning, planning, and understanding, the biggest upgrade to Bard since its launch. The upgraded Bard is available in English in more than 170 countries and regions, with support for additional modalities, languages, and locations planned for the near future.

Google is also bringing Gemini to Pixel. The Pixel 8 Pro is the first smartphone to run Gemini Nano, which powers new features such as Summarize in the Recorder app, letting users get summaries of recorded conversations even without an internet connection. Gemini Nano also drives Smart Reply in Gboard, starting with WhatsApp, with more messaging apps to follow next year.

In the coming months, Gemini will feature in more products and services, including Search, Ads, Chrome, and Duet AI.

Google and Alphabet CEO Sundar Pichai stated:

This is our most powerful and versatile model to date, and I am genuinely excited about the future and the opportunities Gemini will bring to people around the world.
