OpenAI's latest model GPT-4o: Is the era of "ultra-realistic chat" and humans falling in love with robots upon us?
At yesterday's press conference, OpenAI introduced its new language model GPT-4o, which can take in text, voice, and images — including a user's laughter and emotional cues — to offer a more human-like chat experience.
The Real Chatbot GPT-4o
Advantages of the GPT-4o Model
According to the team, GPT-4o moves toward more natural human-machine interaction, accepting any combination of text, audio, and images as input and generating any combination of text, audio, and images as output. Compared with existing models, GPT-4o is faster and more accurate at understanding visual and audio information.
GPT-4o performs on par with GPT-4 Turbo on English text and code, and responds to audio in an average of 320 milliseconds — close to human conversational response time. By comparison, the previous voice mode averaged 2.8 seconds of delay with GPT-3.5 and 5.4 seconds with GPT-4.
But what do these numbers mean in practice?
Can Serve as a Real-time Chatbot
The GPT-4o model achieves more lifelike interactions by analyzing speech and real-time images: users simply open their phone camera or speak to it to get started.
It can, for example, translate in real time, sing a birthday song, act as a customized language tutor, analyze the surrounding environment, understand human jokes and respond with happy emotion and laughter, or pick up the sarcastic undertones behind what is said.
GPT-4o is a single new model trained end-to-end across text, vision, and audio, so it automatically takes in a user's expressions, laughter, and surroundings alongside the primary voice or text input, making its responses more natural and precise. If a user interrupts it mid-sentence, GPT-4o also knows how to react.
The 'o' in GPT-4o stands for "omni", meaning all-encompassing: the team's goal is a model that can respond to anything, rather than only text input or single-modality questions.
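As a rough illustration of the "any combination of inputs" described above, here is a minimal sketch of how a combined text-plus-image request to GPT-4o could be assembled using OpenAI's public chat completions message format. The model name "gpt-4o" and the content-part structure follow OpenAI's API; the prompt text and image are illustrative placeholders, and actually sending the request would require an API key.

```python
import base64

def build_multimodal_request(prompt: str, image_bytes: bytes) -> dict:
    """Combine a text prompt and an image into one chat completion request body."""
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                "content": [
                    # Text and image travel together as parts of a single message.
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                    },
                ],
            }
        ],
    }

# Sending it would look roughly like:
#   from openai import OpenAI
#   client = OpenAI()  # reads OPENAI_API_KEY from the environment
#   response = client.chat.completions.create(
#       **build_multimodal_request("What is in this picture?", image_bytes)
#   )
```

Audio input and output are not yet exposed in this request shape; the sketch only covers the text-plus-image combination.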
Aiming to Open to All Users for Free
Currently, GPT-4o is available to paid users, though it appears that only text and voice input are enabled so far, with official real-time image input still pending. OpenAI's stated goal is to open the model to all users for free.
Judging from early user experience, many of the features the team demonstrated are not yet fully mature: it performs poorly when listening to Chinese jokes, its "real chat" responses can feel hollow, and actual response speed is relatively slow. Further updates from the team are worth watching.
Ongoing Competition Between OpenAI and Google
OpenAI chose to release its new product just before the Google I/O developer conference, a sign of how fierce the rivalry has become. Earlier, both ChatGPT and Google's Gemini were rumored to be candidates for a partnership with Apple to bring AI features to iOS 18.