Can robots understand the real world? SCIoI researchers featured in Germany’s ZEIT WISSEN podcast on the future of humanoid AI

A robot folds laundry in a viral video. Another one walks across a stage at the White House. Online, it suddenly feels as if humanoid robots are everywhere. But what can these machines actually understand about the world around them? And how far are we really from robots that can help us in everyday life?

In the new ZEIT WISSEN podcast episode “Hey Robot, Make the Beds and Clean the Apartment!” („Hey Roboter, mach die Betten und räum die Wohnung auf!“), researchers from Science of Intelligence (SCIoI) offer a look into the reality behind today’s AI and robotics boom and into the difficult path toward machines that can truly navigate the real world. For the episode, journalist Max Rauner visited the labs of SCIoI, speaking with researchers across robotics, AI, psychology and philosophy about one of the biggest technological promises of our time: intelligent humanoid robots for everyday life.

Listen to the German-language podcast here:
ZEIT WISSEN Podcast Episode „Hey Roboter, mach die Betten und räum die Wohnung auf!“

ZEIT WISSEN is one of Germany’s most prominent science media platforms and part of the weekly newspaper DIE ZEIT. Its podcast “How Do You Know That?” is widely regarded as one of the leading German-language science podcasts and is known for its long-form reporting and immersive storytelling. It reaches a large audience far beyond academia, bringing complex scientific questions into public conversation, and regularly features cutting-edge research topics from across the sciences.

From vacuum robots to humanoid helpers

The episode opens with a familiar situation: the quiet helper already living in many homes, the vacuum robot. But from there, the discussion quickly moves toward a bigger question currently driving global robotics research. Could large language models eventually become the brains of humanoid household robots? Are we approaching a “ChatGPT moment” for robotics?

Inside SCIoI’s labs, the answer becomes far more complicated and much more interesting.

Rauner meets Oliver Brock, Oussama Zenkri, Alexander Koenig and Vito Mengers, who guide him through robotic test scenarios involving lockboxes, mechanical puzzles and practical challenges. Some of the robots use chatbot systems to help interpret tasks and generate possible actions. But the experiments reveal a gap between fluent language and real-world understanding.

Why robots still get stuck

In one demonstration, Oussama Zenkri shows how he uses large language models in embodied problem-solving tasks involving lockboxes. He tested how the systems behave when given different kinds of information about the world, from raw camera images to ground-truth symbolic descriptions that should, in theory, be easier for AI models to reason with. With visual information, the models often misunderstood the outcomes of their own actions, sometimes repeatedly claiming success while failing the task. Surprisingly, they performed better with messy visual input than with ground-truth information. The experiments revealed how fragile current AI systems’ perception and reasoning remain when faced with real-world problems. Oussama also observed that these models showed little ability to learn from experience, unlike human participants solving the same lockboxes, who continuously improved through repeated interaction. This highlights the gap between fluent language generation and genuine problem-solving skills.

©SCIoI/Kevin Fuchs

Intelligence is more than words

As Oliver Brock explains in the podcast, today’s AI systems are undeniably impressive when it comes to text and speech. But solving practical tasks in the physical world is something entirely different. A chatbot may explain how to change an air filter in a car, but it cannot actually feel resistance, estimate weight, notice texture or improvise when reality does not behave exactly as expected.

For Oliver, this difference points toward a deeper misunderstanding of intelligence itself. Animals solve many everyday problems with comparatively little computational effort because their intelligence is inseparable from their bodies and sensory experience. A cockatoo manipulating objects does not calculate every movement symbolically. It learns through physical interaction with the world.

This idea of embodied intelligence runs through the entire episode.

Learning the world like a child

SCIoI principal investigator Verena Hafner describes how her lab studies robot learning inspired by infant development. Human babies spend countless hours exploring their surroundings while lying on a play mat, slowly discovering how their bodies relate to the world around them. At first, they miss objects, move awkwardly or accidentally bring their hands toward their faces instead of the toy they are trying to grasp. But through endless experimentation, they gradually learn coordination, movement and prediction.

Verena’s team transferred this developmental principle to humanoid robots. In their experiments, robots initially move their limbs seemingly aimlessly while observing themselves through cameras. The software continuously predicts what the robot expects to see and compares this prediction with reality. If both match, the system receives positive feedback. Over time, this exploratory behavior allows the robot to develop an internal understanding of its own movements and body structure. Eventually, the robot becomes capable of intentionally positioning its limbs because it has learned the relationships between movement, vision and physical space through experience.
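
For readers who think in code, the loop described above (move, predict, compare, learn) can be illustrated with a small sketch. It is not the software used in Verena’s lab: the two-joint arm, the `observe_hand` stand-in for the camera, the `BodyModel` class and its nearest-neighbour memory are all assumptions chosen for brevity, and the sketch tracks prediction error rather than an explicit feedback signal.

```python
import math
import random

LINK_1, LINK_2 = 1.0, 0.8  # hypothetical arm segment lengths


def observe_hand(shoulder, elbow):
    """Stand-in for the camera: where the hand appears for given joint angles."""
    x = LINK_1 * math.cos(shoulder) + LINK_2 * math.cos(shoulder + elbow)
    y = LINK_1 * math.sin(shoulder) + LINK_2 * math.sin(shoulder + elbow)
    return (x, y)


class BodyModel:
    """Remembers (joint angles, seen hand position) pairs from exploration."""

    def __init__(self):
        self.memory = []

    def predict(self, angles):
        # Expect to see what the most similar past movement produced.
        if not self.memory:
            return (0.0, 0.0)
        _, seen = min(self.memory, key=lambda pair: math.dist(pair[0], angles))
        return seen

    def update(self, angles, seen):
        self.memory.append((angles, seen))

    def angles_for(self, target):
        # Intentional movement: reuse the angles whose outcome was closest to the target.
        angles, _ = min(self.memory, key=lambda pair: math.dist(pair[1], target))
        return angles


def babble(model, steps, rng):
    """Move at random, compare prediction with observation, then learn from it."""
    errors = []
    for _ in range(steps):
        angles = (rng.uniform(0, math.pi), rng.uniform(0, math.pi))
        predicted = model.predict(angles)
        seen = observe_hand(*angles)
        errors.append(math.dist(predicted, seen))
        model.update(angles, seen)
    return errors


rng = random.Random(0)
model = BodyModel()
errors = babble(model, 1000, rng)
early_error = sum(errors[:100]) / 100
late_error = sum(errors[-100:]) / 100

target = observe_hand(0.7, 1.1)  # a hand position known to be reachable
reach_error = math.dist(observe_hand(*model.angles_for(target)), target)
print(f"early {early_error:.3f}, late {late_error:.3f}, reach {reach_error:.3f}")
```

In this toy setup the average prediction error over the last 100 movements should be smaller than over the first 100, and `reach_error` indicates how closely the arm can approach a new target using only what it learned while moving at random.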

©Elliot Walker

The hand thinks too

Still, understanding the physical world requires more than abstract reasoning alone.

That becomes especially visible in the work surrounding the soft robotic RBO Hand presented by Alexander Koenig and Oliver Brock. Humans effortlessly pick up a phone, notice it is upside down and rotate it correctly within seconds. For robots, tasks like these remain enormously difficult, a classic example of Moravec’s paradox: what feels trivial to humans is often extraordinarily hard for machines.

The RBO Hand approaches the problem differently. Because the hand itself is soft and adaptive, intelligent behavior partly emerges directly from its physical structure. The robot does not need to calculate every tiny motion centrally. The material properties and sensory feedback of the hand itself help stabilize and guide movement. In this sense, part of the “knowledge” needed to manipulate objects is already built into the body.

“In order to be intelligent, you need a body,” Brock says during the episode. The body itself becomes the training ground for intelligent behavior.

©SCIoI/Kevin Fuchs

At another point, he remarks with dry humor: “At the moment, we are still more like the robots of the large language models.” Humans ask ChatGPT how to repair something, then go and physically perform the task themselves.

Beyond the robot lab

Alongside the robotics demonstrations, the journalist also spoke with researchers Anna Lange and Helene Ackermann, whose work explores how humans and robots develop mutual understanding during teaching and learning interactions. Their research investigates how artificial agents can adapt to human learners, respond to individual differences and build trust in social interaction settings. Rauner also met SCIoI principal investigator Dimitri Coelho Mollo, who contributes the philosophical perspective on intelligence, embodiment and artificial agents.

The result is a grounded exploration of what robots can already do, where they continue to struggle and why the path toward real-world intelligence may look very different from the futuristic visions often dominating public debate.

Title image ©SCIoI/Kevin Fuchs

