

Google DeepMind has announced SIMA 2, the next-generation version of its Scalable Instructable Multiworld Agent. This updated model, built on Gemini technology, brings major improvements over the original SIMA launched in 2024. The new version can analyze its actions, reason through situations, and even communicate with users through a text interface.
At its core, SIMA 2 is still designed to operate in 3D open-world video games. But now it performs tasks more efficiently, adapts to unfamiliar environments, and learns new skills faster. The agent processes visual input from the game world alongside user instructions such as “build a shelter” or “find the red house,” then breaks them down into smaller steps, which it executes using keyboard and mouse actions.
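The perceive-plan-act loop described above can be sketched roughly as follows. This is a minimal illustration only; the function names, plan table, and action types are assumptions, not DeepMind's actual API:

```python
# Hypothetical sketch of a SIMA-style control loop: a planner decomposes
# a high-level instruction into sub-goals, and a policy turns each
# sub-goal plus the current game frame into keyboard/mouse events.

from dataclasses import dataclass

@dataclass
class Action:
    kind: str      # "key" or "mouse"
    payload: str   # e.g. "W" or "click(410, 220)"

def decompose(instruction: str) -> list[str]:
    # Stand-in for the Gemini-based planner: map a high-level goal
    # to ordered sub-goals. Hard-coded here purely for illustration.
    plans = {
        "build a shelter": ["gather wood", "craft planks", "place walls"],
        "find the red house": ["scan surroundings", "walk toward it"],
    }
    return plans.get(instruction, [instruction])

def act(sub_goal: str, frame: bytes) -> Action:
    # Stand-in for the learned policy: one sub-goal plus one visual
    # frame in, one low-level input event out.
    return Action(kind="key", payload="W")  # e.g. move forward

def run(instruction: str, frames: list[bytes]) -> list[Action]:
    actions = []
    for sub_goal in decompose(instruction):
        for frame in frames:
            actions.append(act(sub_goal, frame))
    return actions
```

The key structural point is the separation of concerns: a language-driven planner produces sub-goals, while a lower-level policy grounds each sub-goal in raw game frames.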
One of the standout upgrades is its ability to handle games it has never been trained on. When tested on new environments like MineDojo and the Viking survival game ASKA, SIMA 2 outperformed its predecessor. It also supports multimodal prompts, including sketches, emojis, and multiple languages. The model can even transfer concepts: for example, taking what it learned about “mining” in one game and applying it as “harvesting” in another.
SIMA 2 is trained using a combination of human-provided demonstrations and auto-labeled data from Gemini. Whenever the agent learns new skills in fresh environments, that data is added back to the training set, reducing the need for manual human labeling.
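That self-improvement cycle can be sketched in a few lines. The function names and data shapes below are assumptions for illustration, not DeepMind's actual pipeline:

```python
# Hypothetical sketch of the data flywheel described above: human
# demonstrations seed the training set, and trajectories the agent
# collects in new environments are auto-labeled by a teacher model
# (Gemini, in SIMA 2's case) and folded back in.

def auto_label(trajectory: list[str]) -> list[tuple[str, str]]:
    # Stand-in for teacher-model labeling: pair each observation in a
    # trajectory with a generated instruction label.
    return [(obs, f"generated label for {obs}") for obs in trajectory]

def training_round(train_set: list[tuple[str, str]],
                   new_trajectories: list[list[str]]) -> list[tuple[str, str]]:
    # Append auto-labeled experience to the existing training set,
    # reducing the need for manual human annotation over time.
    for traj in new_trajectories:
        train_set.extend(auto_label(traj))
    return train_set

# Usage: one round with one human demo and one agent-collected trajectory.
demos = [("obs_0", "human label")]
updated = training_round(list(demos), [["obs_1", "obs_2"]])
```

The design choice worth noting is that the loop is additive: auto-labeled data augments the human demonstrations rather than replacing them.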
Despite its improvements, SIMA 2 still has challenges. Long-term memory remains limited, complex long-horizon planning is difficult, and precise low-level motor control is outside its current scope.
DeepMind clarifies that SIMA 2 is not being built as a gaming assistant. Instead, its gaming skills are a stepping stone toward building advanced real-world robots that follow natural language instructions and perform multiple tasks autonomously.
















