ChatGPT introduces voice conversation and image upload features, making it adept at tasks like car repairs and reading reports!

2023/9/26

OpenAI announced yesterday on the 25th that its AI language model, ChatGPT, is set to introduce new voice and image capabilities. Users will be able to describe their problems through voice conversations, and to upload images and doodle on them to highlight key points and help ChatGPT understand their questions. The features will roll out over the next two weeks to the paid Plus and Enterprise versions.

OpenAI Introduces Voice and Image Functionality

OpenAI has announced the addition of voice and image capabilities to ChatGPT, making the product, which was originally based on text conversations, more powerful and interactive.

This move is seen as part of the "generative AI" war among global tech giants, whose competing products include Google's chatbot Bard and Apple's reportedly in-development Apple GPT.

ChatGPT Storytelling

First, ChatGPT combines its own large language model (LLM) and voice assistant technology, allowing users to engage in simple verbal conversations and ask questions without having to type, saving time and enhancing efficiency.

As an example from the press release, users can ask ChatGPT aloud to create a bedtime story and guide it with spoken prompts, and ChatGPT replies in one of five voices that the user selects.

OpenAI adds:

This new voice technology will be able to synthesize highly realistic voices from a few seconds of human speech, opening doors for many creative applications.

Image Upload for Questions

On the image side, users will be able to take a photo, upload it, and ask ChatGPT to explain what the pictured object is, what it does, or how to use it.

Users can also doodle on the image to highlight key areas, helping ChatGPT better understand what the question is about.

The press release also mentions that ChatGPT can assist users in identifying reasons for bike damage, checking fridge contents to plan today's menu, and even analyzing complex chart data for work purposes.

How Can Users Access the Features?

Reportedly, the voice functionality will initially be available in the ChatGPT mobile app for Android and iOS, while the image functionality will be accessible on all platforms.

Users can go to the "Settings" menu in the app, then proceed to "New Features" and choose to enable voice conversations to start using it.

The aforementioned features will first be rolled out to Plus and Enterprise users within the next two weeks and will gradually be made available to other users and developers.

Concerns and Risks

The press release also notes that ChatGPT has limitations and advises users not to rely on the product for research or professional technical applications. Users are likewise cautioned against following its instructions for high-risk activities without thorough verification.

Furthermore, concerning the voice functionality, OpenAI states:

This also brings new risks, including criminals impersonating public figures for fraudulent activities.

Before this announcement, the GPT-3.5 and GPT-4 models had already drawn skepticism. A research paper that circulated widely on major social platforms pointed to a rapid decline in capability and quality after the June update, including in the accuracy of responses, a drop that users had also noticed.
