Can Text or Images Create Games? Google Launches Generative Interactive Environment AI Model "Genie"

Google DeepMind recently introduced Genie, a generative interactive environment AI model that can turn text or image prompts into interactive, playable game-like environments without being trained on action labels or on data from any specific game.

Google DeepMind Unveils Generative Interactive Environment Model "Genie"

Google DeepMind, the AI lab built around the company Google acquired in 2014, introduced the generative interactive environment model "Genie" in a paper submitted on the 23rd. The model can create controllable, interactive virtual environments from nothing more than text, images, or sketches.

According to the paper, Genie was trained on a large volume of publicly available internet videos rather than on data from any specific game or scenario, which makes it broadly applicable to fields such as game development and creative entertainment. The paper states:

We introduce a new paradigm for generative AI, generative interactive environments (Genie), whereby interactive, playable environments can be generated from a single image prompt.

What is Genie?

Three-Component Architecture

The paper describes Genie as an 11-billion-parameter foundation world model made up of three components: a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple, scalable latent action model.

This design lets Genie learn from internet videos of 2D platform games and robotics in a fully unsupervised way, with no action labels or other explicit annotations. It can also generate controllable, interactive virtual environments from images it is given, including real-world photos and sketches.
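To make the division of labour between the three components more concrete, here is a deliberately tiny Python sketch of how they might fit together when someone plays a generated world. Everything in it is a hypothetical stand-in, not Genie's actual code: a "frame" is just a short list of pixel intensities, the "tokenizer" simply buckets each pixel into one of 16 codes, the action space is assumed to hold eight discrete latent actions, and the "dynamics model" merely rotates the previous frame's tokens instead of running a learned transformer.

```python
from typing import List

# Hypothetical sizes chosen for illustration only.
VOCAB_SIZE = 16          # number of discrete codes the toy tokenizer can emit
NUM_LATENT_ACTIONS = 8   # assumed size of the discrete latent action set

Frame = List[float]      # toy "frame": pixel intensities in [0, 1]
Tokens = List[int]       # one discrete code per pixel


def tokenize(frame: Frame) -> Tokens:
    """Stand-in for the video tokenizer: quantize each pixel into a discrete code."""
    return [min(int(p * VOCAB_SIZE), VOCAB_SIZE - 1) for p in frame]


def detokenize(tokens: Tokens) -> Frame:
    """Stand-in for the tokenizer's decoder: map each code to the centre of its bin."""
    return [(t + 0.5) / VOCAB_SIZE for t in tokens]


def dynamics(history: List[Tokens], action: int) -> Tokens:
    """Stand-in for the autoregressive dynamics model.

    A real model would condition on the whole token history; this toy only
    looks at the latest frame and rotates its tokens left or right by an
    amount determined by the chosen latent action.
    """
    if not 0 <= action < NUM_LATENT_ACTIONS:
        raise ValueError(f"action must be in [0, {NUM_LATENT_ACTIONS})")
    last = history[-1]
    shift = action - NUM_LATENT_ACTIONS // 2   # actions 0..7 -> shifts -4..3
    n = len(last)
    return [last[(i - shift) % n] for i in range(n)]


def play(prompt_frame: Frame, actions: List[int]) -> List[Frame]:
    """Start from one prompt image, then generate one new frame per player action."""
    history = [tokenize(prompt_frame)]
    for action in actions:
        history.append(dynamics(history, action))   # predict the next frame's tokens
    return [detokenize(tokens) for tokens in history]
```

For example, `play([0.0, 0.5, 0.99, 0.25], [5])` returns two frames, the second being the first with its contents moved one position to the right. The point is only the control flow: the player never supplies pixels or labelled moves, just an index into a small action set, and each new frame is predicted from what came before.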

Learning to Reproduce Actions and Identify Controllable Parts

What sets Genie apart is that it learns how game characters are controlled purely from internet videos, even though those videos carry no labels describing the actions being performed. It infers a set of latent actions that behave consistently across the different environments it generates.

Genie learns and identifies controllable parts through action reproduction

Genie also learns to identify which parts of a scene are controllable, such as a game's main character, which is what makes the scenes it generates interactive.
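The paper's latent action model works, broadly, by summarising the change between two consecutive frames and snapping that summary to the nearest entry of a small, discrete set of latent actions (vector quantization). The sketch below illustrates only that snapping step, under loudly simplified assumptions: instead of a learned encoder over raw pixels, the hypothetical `encode_transition` just reads off a character's (x, y) displacement, and the four-entry `CODEBOOK` is fixed by hand rather than learned.

```python
from typing import List, Tuple

Vec = Tuple[float, float]

# Hypothetical, hand-picked codebook; in a real model these vectors are learned.
CODEBOOK: List[Vec] = [
    (0.0, 0.0),    # 0: stay still
    (1.0, 0.0),    # 1: move right
    (-1.0, 0.0),   # 2: move left
    (0.0, 1.0),    # 3: jump / move up
]


def encode_transition(prev_pos: Vec, next_pos: Vec) -> Vec:
    """Toy encoder: describe the change between two frames as a continuous vector.

    Assumption: each 'frame' is reduced to the character's (x, y) position.
    """
    return (next_pos[0] - prev_pos[0], next_pos[1] - prev_pos[1])


def quantize(z: Vec) -> int:
    """Vector quantization: index of the codebook entry closest to z (squared distance)."""
    def sq_dist(k: int) -> float:
        cx, cy = CODEBOOK[k]
        return (z[0] - cx) ** 2 + (z[1] - cy) ** 2
    return min(range(len(CODEBOOK)), key=sq_dist)


def infer_latent_actions(positions: List[Vec]) -> List[int]:
    """Label every consecutive pair of frames with a discrete latent action."""
    return [quantize(encode_transition(a, b)) for a, b in zip(positions, positions[1:])]
```

Given the noisy trajectory `[(0, 0), (0.9, 0.1), (0.9, 1.0), (-0.2, 1.1)]`, `infer_latent_actions` returns `[1, 3, 2]`: right, jump, left. Nobody labelled those moves; each label comes from whichever codebook entry the transition lands nearest to.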

Synthesizing Games from Real or Synthetic Images

Genie can also build an entirely new interactive environment from a single image. In the demonstrations, a text prompt is first turned into a starting image by Google's Imagen 2 text-to-image model, and Genie then animates that image into a playable scene.

Genie generates interactive animated environments from synthesized images

Genie also accepts image prompts it has never seen before, including real-world photographs and simple hand-drawn sketches, so people can interact with scenes that were previously static.

Genie generates interactive animated environments from real photos and drawn sketches

A blog post about Genie notes:

Genie's capabilities allow anyone, even children, to create and step into their own controllable simulated environments and interactive generated worlds.

The post closes by describing Genie's broader goal:

Genie's applications are not limited to entertainment or creative development; it can also serve as an excellent training ground for intelligent agents, driving further advances in AI.

Intelligent agents are autonomous systems that observe their surroundings and take actions to achieve goals. They are a core concept in AI, and building more capable ones is a major aim of current research.
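The observe-then-act cycle in that definition can be written as a short loop. The Python sketch below is purely illustrative: `CorridorEnv` is a hypothetical one-dimensional world standing in for the kind of environment a model like Genie might generate, and `greedy_agent` is a hand-written policy, not a trained one.

```python
from dataclasses import dataclass


@dataclass
class CorridorEnv:
    """Hypothetical toy environment: a 1-D corridor with a goal cell."""
    length: int = 10
    goal: int = 7
    position: int = 0

    def observe(self) -> int:
        """What the agent can perceive: its current cell."""
        return self.position

    def step(self, action: int) -> bool:
        """Move by action (-1 = left, +1 = right), stay inside the corridor,
        and report whether the goal has been reached."""
        self.position = max(0, min(self.length - 1, self.position + action))
        return self.position == self.goal


def greedy_agent(observation: int, goal: int) -> int:
    """A hand-written policy: step toward the goal."""
    return 1 if observation < goal else -1


def run_episode(env: CorridorEnv, max_steps: int = 50) -> int:
    """Observe, act, repeat; return the number of steps taken, or -1 on failure."""
    for step in range(1, max_steps + 1):
        action = greedy_agent(env.observe(), env.goal)
        if env.step(action):
            return step
    return -1
```

With the defaults, `run_episode(CorridorEnv())` returns 7. In research settings the policy would be learned from experience rather than written by hand, which is where a large supply of generated environments could help.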

Google and OpenAI Engage in Intense Competition

Over the past few months, Google has released several generative AI models and products that have drawn wide public attention, including the Gemini AI models and chatbot, the text-to-video model "Lumiere," and the text-to-image tool "ImageFX."

Meanwhile, OpenAI's text-to-video model Sora set off a wave of AI excitement when it was unveiled a few weeks ago.

However, a recent controversy over racial bias in Gemini's image generation led to a single-day drop of more than 4% in the stock price of Google's parent company, Alphabet.

Demis Hassabis, CEO of Google DeepMind, said at MWC Barcelona 2024 yesterday:

We have disabled that feature of Gemini and will fix the issue and restore it in the coming weeks.