Science of Intelligence at ICCV 2025: How event cameras are teaching machines to see motion like humans
This week, SCIoI PI Guillermo Gallego’s Robotic Interactive Perception (RIP) Lab attended the International Conference on Computer Vision (ICCV) in Honolulu, Hawai’i, to showcase work that redefines how machines interpret visual information: not as a static series of frames, but as a continuous flow of change. Shuang Guo and SCIoI member Friedhelm Hamann presented a highlight paper titled Unsupervised Joint Learning of Optical Flow and Intensity with Event Cameras. It tackles a central challenge in computer vision: how to teach neural networks to extract both motion and appearance from a type of camera that doesn’t record images at all. Shintaro Shiba contributed a second paper, Simultaneous Motion And Noise Estimation with Event Cameras, which improves motion estimation by identifying the noise in the same type of camera.
A new way of seeing
Event cameras, inspired by biological vision, work differently from conventional ones. Instead of capturing full images at fixed intervals, they continuously record changes in brightness at each pixel: tiny “events” triggered whenever the light intensity at that pixel changes. The result is an extraordinarily fast and efficient stream of information, and a sensor that can operate in extreme conditions: dazzling sunlight, near-darkness, or high-speed motion that would blur any normal sensor.
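To make the idea concrete, here is a minimal, purely illustrative sketch in Python (not code from the paper or from any sensor driver) of how such an event stream can be modeled: each event is a tuple (x, y, t, polarity), emitted when the log brightness at a pixel changes by more than a contrast threshold since the last event fired at that pixel.

```python
import numpy as np

def simulate_events(frames, timestamps, threshold=0.2):
    """Toy event-camera model (illustrative assumption, not a real driver):
    emit an event whenever the log brightness at a pixel changes by more
    than a contrast threshold since the last event at that pixel.

    frames:     (T, H, W) array of intensity images
    timestamps: (T,) array of frame times in seconds
    Returns a list of events (x, y, t, polarity).
    """
    log_ref = np.log(frames[0] + 1e-6)          # per-pixel brightness memory
    events = []
    for frame, t in zip(frames[1:], timestamps[1:]):
        log_now = np.log(frame + 1e-6)
        diff = log_now - log_ref
        ys, xs = np.nonzero(np.abs(diff) >= threshold)
        for x, y in zip(xs, ys):
            events.append((x, y, t, 1 if diff[y, x] > 0 else -1))
            log_ref[y, x] = log_now[y, x]       # reset memory where an event fired
    return events
```

Because only changing pixels produce output, a static scene generates almost no data, while fast motion yields a dense, finely time-stamped stream.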
But this efficiency comes at a cost. Because event data doesn’t contain traditional images, it’s difficult to reconstruct what a scene actually looks like while simultaneously tracking how it moves. The new study offers a solution: a single deep neural network that learns both tasks at once, without supervision, by combining two mathematical frameworks, contrast maximization and a newly derived event-based photometric error. This approach allows the network to infer motion and appearance in a way that naturally reflects how the sensor perceives the world.
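Contrast maximization, one of the two ingredients, has an intuitive core: if the estimated optical flow is correct, warping all events along that flow to a common reference time stacks them into a sharp “image of warped events.” The sketch below illustrates such a loss in PyTorch; the function name, the per-event flow interface, and the variance objective are simplifying assumptions for illustration, not the authors’ released implementation.

```python
import torch

def contrast_maximization_loss(events, flow, height, width, t_ref=0.0):
    """Illustrative contrast-maximization loss (hypothetical helper).

    events: (N, 4) tensor with columns (x, y, t, polarity)
    flow:   (N, 2) per-event optical flow in pixels per second
    Returns the negative variance of the image of warped events (IWE),
    so that minimizing the loss sharpens the IWE.
    """
    x, y, t = events[:, 0], events[:, 1], events[:, 2]

    # Transport every event to the reference time along its flow vector.
    x_w = x + (t_ref - t) * flow[:, 0]
    y_w = y + (t_ref - t) * flow[:, 1]

    # Accumulate warped events with bilinear voting, which keeps the image
    # differentiable with respect to the estimated flow.
    iwe = torch.zeros(height, width, device=events.device)
    x0, y0 = x_w.floor(), y_w.floor()
    for dx in (0.0, 1.0):
        for dy in (0.0, 1.0):
            xi, yi = x0 + dx, y0 + dy
            w = (1 - (x_w - xi).abs()) * (1 - (y_w - yi).abs())
            inside = (xi >= 0) & (xi < width) & (yi >= 0) & (yi < height)
            iwe.index_put_((yi[inside].long(), xi[inside].long()),
                           w[inside], accumulate=True)

    # A correct flow aligns events along their trajectories, producing a
    # sharp, high-contrast IWE; the variance rewards that sharpness.
    return -iwe.var()
```

In the paper, this geometric cue is combined with an event-based photometric error that ties the reconstructed intensity image to the same events, so that motion and appearance constrain each other during training.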
The results are remarkable: the model achieves state-of-the-art accuracy among all unsupervised methods, improving motion estimates by up to 20 percent, while delivering sharp, high-dynamic-range reconstructed images in milliseconds. Beyond robotics, the method could inform the next generation of smartphone and automotive cameras, enabling clear imaging even under rapid movement or extreme lighting.
From theory to community
Besides the highlight paper presentation, Friedhelm was invited to speak at the Neuromorphic Vision Workshop at ICCV, where he presented data-scaling strategies for event cameras, exploring how models built on such biologically inspired sensors can be trained at scale for more complex cognitive tasks. This is also the main topic of his upcoming PhD dissertation. The workshop, which brings together academic and industrial leaders, reflects a growing interest in neuromorphic vision as a cornerstone of next-generation AI systems: ones that are not only faster and more efficient, but also more intelligent in how they process visual change.
Finally, Friedhelm co-led a community effort: the Spatio-temporal Instance Segmentation (SIS) Challenge, based on the MouseSIS dataset. Organized earlier this year and now continued as an open benchmark, the challenge invited teams worldwide to develop algorithms capable of identifying and tracking individual objects over time, using either event-only or combined event-and-frame data. The dataset used in the challenge was recorded at SCIoI in collaboration with the team of researchers Lars Lewejohann and Paul Mieske.
The intelligence behind perception
From theoretical insights to open scientific infrastructure, Friedhelm’s activities at ICCV embody the spirit of Science of Intelligence: understanding intelligence by building it. The joint learning framework for event data shows not only how to make cameras faster, but also how to make perception itself more adaptive and robust, an essential step toward artificial systems that see and act in the world as flexibly as living beings do.




