SCIoI’s David Mezey’s work on the front page of Nature Portfolio’s npj Robotics


In a sunlit robotics lab at Science of Intelligence (SCIoI) in Berlin, ten little robots are moving with quiet purpose (but not-so-quiet buzzing) across the floor. There’s no central computer telling them where to go. They don’t talk to each other. They don’t even know where they or their neighbors are located in space. And yet, they move as one. This study advances a long-term goal shared by biologists, engineers, and roboticists: a robotic swarm that, like a flock of birds or a school of fish, is decentralized and fully autonomous, coordinating through local vision alone.

In this experimental setup, the robots orchestrate their movement using only their “eyes” (i.e., cameras). The study, led by SCIoI researcher David Mezey together with Renaud Bastien, Yating Zheng, Neal McKee, David Stoll, Heiko Hamann, and Pawel Romanczuk, was recently published under the title “Purely vision-based collective movement of robots” and has now been featured on the front page of Nature Portfolio’s npj Robotics, a journal at the forefront of the field.

See more? Say less!

Unlike most robotic swarms, which rely on GPS signals, central servers, or wireless communication to coordinate their movement, the method used in David’s work lets the robots operate solely on what they see through their individual cameras. They don’t need to exchange data. They don’t even “know” how fast or in which direction they or their peers are moving. All they need is the silhouette of another robot passing through their field of view. “It’s a design inspired by fish schools and bird flocks, where vision is the primary modality shaping collective motion,” David explains. “Fish don’t have GPS, and birds can’t send their exact positions to each other either. They perceive the world, and they respond to it. That’s exactly what we wanted our robots to do, too.”

Using a purely vision-based model first proposed in theoretical biology by some of the team members, David and his team developed a minimalist algorithm that turns camera frames into simple motion-control commands. No maps of the environment, no memory, no orientation or position detection, no distance estimation, no tags or markers: just raw visual input, processed onboard and turned into physical movement. As a result, when one robot sees another, it reflexively adjusts its own speed and direction.
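
To make this concrete, here is a minimal Python sketch of one way a camera frame could be reduced to the kind of raw visual input described above. The dark-silhouette threshold and all parameter values are illustrative assumptions, not the paper’s actual onboard detector:

```python
import numpy as np

def frame_to_band(frame, dark_threshold=60, min_coverage=0.05):
    """Collapse a grayscale camera frame (H x W array) into a
    one-dimensional binary band over viewing directions.

    A column is set to 1 if enough of its pixels look like part of a
    robot silhouette (here, simply "dark enough"), and 0 otherwise.
    The silhouette detector and thresholds are assumptions made for
    illustration, not the published pipeline.
    """
    dark = frame < dark_threshold      # candidate silhouette pixels
    coverage = dark.mean(axis=0)       # fraction of dark pixels per column
    return (coverage > min_coverage).astype(float)

# Example: a synthetic 120 x 640 frame with one dark blob left of center
frame = np.full((120, 640), 255, dtype=np.uint8)
frame[30:90, 180:240] = 10
band = frame_to_band(frame)            # 1s roughly in columns 180-240
```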

The result is cohesive, smooth group motion, complete with turns, regrouping events, and formations similar to those of real animal collectives.

A change of direction in swarm design

For years, scientists have imagined teams of small, autonomous robots helping out in places where humans can’t easily go: supporting environmental research, aiding in search-and-rescue missions, or assisting in the exploration of remote or dangerous terrain. But most existing robot swarms rely on infrastructure that can easily become fragile: they depend on constant communication, centralized coordination, or a steady stream of external data. These dependencies are a serious liability, especially in unpredictable environments. A broken link, a blocked signal, or missing information can bring an entire system to a halt.

David’s approach offers an alternative: “By removing communication and central control, we remove single points of failure. And that makes the whole system more resilient. It’s similar to how animal collectives, by overcoming the challenge of orchestrating movement with local information only, become extremely robust to perturbations.”

This resilience, which comes from decentralized, vision-based control, was tested across more than 30 hours of real-world experiments in a large, enclosed arena. Despite the limitations of a camera with only a 175-degree field of view (meaning the robots have large “blind spots”), the swarm adapted, flowed, and recovered from occasional fragmentation. If one robot lost sight of the group, it could find its way back later using only its camera feed.

This behavior, known as “fission-fusion,” is common in natural animal groups but has rarely been achieved in robots, especially without communication or memory.

Simple rules, complex results

The elegance of the system, as shown in this work, lies in its simplicity. Each robot perceives the world as a simple, one-dimensional band of ones and zeros around it. A “blob” of ones means another robot is visible in that direction; zeros mean empty space. Based on how much of its field of vision is occupied, and where, the robot adjusts its speed and direction. There’s no GPS, no knowledge of where the walls are, and no awareness of its own orientation. And yet, from this minimal information, complex group behavior can emerge. A similar mechanism has been suggested for multiple animal species, including fish, birds, and locusts, indicating that relying primarily on simple retinal occupancy might be sufficient to move in groups with high precision.
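
As a rough illustration of how occupancy-based control of this kind could work, the sketch below maps such a one-dimensional band to speed and turn-rate commands. It is a hedged approximation in the spirit of the underlying model, not the published controller; the cosine/sine weighting and every gain value are assumptions:

```python
import numpy as np

FOV = np.deg2rad(175)  # the robots' camera field of view, per the article

def visual_control(band, v0=0.12, speed_gain=0.08, turn_gain=1.5):
    """Map a one-dimensional binary visual band to (speed, turn_rate).

    band[i] == 1 means another robot's silhouette occupies viewing
    direction phi[i]. The cosine/sine weighting and all gain values
    are illustrative assumptions, not the published controller.
    """
    n = band.size
    phi = np.linspace(-FOV / 2, FOV / 2, n)  # bearing of each column
    dphi = FOV / n                           # angular width of one column

    # Frontal occupancy (cosine-weighted) modulates speed: the robot
    # backs off when a neighbor fills the view straight ahead.
    speed = v0 - speed_gain * np.sum(np.cos(phi) * band) * dphi

    # Lateral occupancy (sine-weighted, positive = right) steers the
    # robot toward its neighbors, keeping the group cohesive.
    turn_rate = turn_gain * np.sum(np.sin(phi) * band) * dphi
    return speed, turn_rate

# Example: a neighbor visible slightly to the left of center
band = np.zeros(640)
band[240:300] = 1.0
speed, turn_rate = visual_control(band)  # turn_rate < 0: turn left, toward it
```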

The robots avoid collisions, stick together, and align their movement, much like natural collectives.

And because the entire control loop runs onboard each robot, the system is fully decentralized. There’s no need for a cloud server or even an internet connection. The experiments in the paper demonstrated this by turning individual robots in the swarm on or off on the fly: new members could easily catch up with the others, and removing robots did not perturb the state of the group either.


Why it matters

In many real-world scenarios, like natural-disaster response, environmental monitoring, or remote fieldwork, reliable communication networks or GPS systems aren’t always available. Traditional robotic systems, which often rely on centralized information sharing and processing or on continuous data streams, can struggle under such conditions. This work’s vision-based approach offers a promising alternative: because each robot makes decisions based only on what it sees, without needing to talk to others or access a central plan, the system is naturally resilient and adaptable.

This research also points toward hybrid societies in which animals and robots coexist and interact in a shared space. Given the impossibility of direct communication between animals and robots in such settings, vision- and perception-based behavior might be key for practical applications, from wildlife rescue to mitigating the effects of global environmental change.

Beyond the practical benefits, the project raises deeper questions about how complex behavior can emerge from simple rules. It invites us to rethink what collective intelligence looks like: something that can arise from, and be shaped by, simple local perception alone.
