SCIoI joining CVPR 2024 with one main conference and one workshop paper

A paper about the potential of event cameras to revolutionize the field of animal behavior studies by SCIoI members Friedhelm Hamann, Alex Kacelnik, Guillermo Gallego and collaborators will be presented at the main IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024 (CVPR). Another SCIoI paper, by members Marah Halawa and Florian Blume et al., has been accepted at the CVPR workshop “6th Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW)”.

Main Conference Paper “Low-power, continuous remote behavioral localization with event cameras” (Friedhelm Hamann et al.)


Studying animal behavior in the wild is important for many reasons, from understanding how animals interact with each other to figuring out how they find food. Traditionally, scientists have relied on camera traps that take pictures or videos at regular intervals. However, these methods have limitations. For instance, they may miss fast movements, or they may use too much battery power to be practical for long-term studies.

The paper describes a new approach to studying animal behavior using event cameras. Event cameras are different from conventional cameras in that they don’t record entire images at a fixed rate. Instead, they only record changes in brightness with microsecond resolution. This makes them well-suited for capturing motion while using less power.
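To make the principle concrete, here is a minimal Python sketch of an idealized event-generation model. It is an illustration only, not the sensor's actual circuitry or the authors' code: real event cameras work asynchronously per pixel rather than on frames, and the function name `generate_events` and the threshold value are assumptions chosen for this example.

```python
import numpy as np

def generate_events(frames, timestamps, threshold=0.2):
    """Idealized event model (assumption, for illustration only).

    A pixel emits an event (t, x, y, polarity) whenever its log-brightness
    has changed by at least `threshold` since that pixel's last event.
    """
    # Reference log-brightness per pixel, updated only where an event fires.
    log_ref = np.log(frames[0].astype(np.float64) + 1e-3)
    events = []
    for frame, t in zip(frames[1:], timestamps[1:]):
        log_now = np.log(frame.astype(np.float64) + 1e-3)
        diff = log_now - log_ref
        ys, xs = np.nonzero(np.abs(diff) >= threshold)
        for y, x in zip(ys, xs):
            polarity = 1 if diff[y, x] > 0 else -1
            events.append((t, int(x), int(y), polarity))
            log_ref[y, x] = log_now[y, x]
    return events

# A static 4x4 scene in which a single pixel (x=1, y=2) gets brighter.
frames = np.full((3, 4, 4), 100.0)
frames[1:, 2, 1] = 200.0
events = generate_events(frames, timestamps=[0.0, 1e-3, 2e-3])
print(events)
```

Only the one pixel that changes produces output; the fifteen static pixels produce nothing, which is where the savings in data and power come from.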

The researchers used event cameras to study a behavior in chinstrap penguins called the ecstatic display (ED). During an ED, a penguin will stand up, point its head up, flap its wings, and make a loud call. The reason penguins perform this behavior is not well understood, but the researchers believe that studying it in more detail could help us learn more about these birds.

The scientists set up event cameras to observe a colony of chinstrap penguins in Antarctica for a month. They developed a new method to analyze the event camera data to identify instances of EDs. Their method involves two steps: first, it proposes possible start and end times for the behavior, and then it classifies those time intervals as either an ED or not an ED.
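The two-step structure can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the method from the paper: the activity signal (events counted per time bin), the threshold-based proposal step, and the duration rule standing in for a learned classifier are all hypothetical.

```python
import numpy as np

def propose_intervals(activity, threshold):
    """Step 1: propose (start, end) bin indices (end exclusive) for every
    run of consecutive bins whose activity exceeds `threshold`."""
    active = np.asarray(activity) > threshold
    padded = np.concatenate(([False], active, [False]))
    changes = np.diff(padded.astype(int))
    starts = np.nonzero(changes == 1)[0]
    ends = np.nonzero(changes == -1)[0]
    return list(zip(starts.tolist(), ends.tolist()))

def classify_interval(start, end, min_bins=3):
    """Step 2: placeholder classifier (assumption). A real system would use
    a trained model here; this rule only keeps sufficiently long intervals."""
    return (end - start) >= min_bins

# Hypothetical activity signal: number of events in each time bin.
activity = [0, 0, 5, 6, 7, 6, 0, 0, 9, 0, 0]
proposals = propose_intervals(activity, threshold=2)
detections = [(s, e) for s, e in proposals if classify_interval(s, e)]
print(proposals, detections)
```

In this toy run, the first step proposes two candidate intervals and the second step rejects the short burst, keeping only the longer one.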

The team found that their system was effective at identifying EDs, even in challenging conditions like nighttime or snow. They are making their data and method publicly available so that other scientists can use them to study animal behavior.

This new approach to studying animal behavior using event cameras has the potential to revolutionize the field. “Event cameras are more energy-efficient than traditional cameras, which means they can be used for longer-term studies. They are also better at capturing fast movements, which could allow scientists to study behaviors that have been difficult to study in the past,” says Friedhelm Hamann.

Overall, this research is a significant step forward in the field of animal behavior study. By using event cameras, scientists can now study animals in more detail and for longer periods than before. This could lead to discoveries about how animals behave and interact with their environment.

Workshop Paper “Multi-Task Multi-Modal Self-Supervised Learning for Facial Expression Recognition” (Marah Halawa, Florian Blume et al.)


Human communication is multi-modal; e.g., face-to-face interaction involves auditory signals (speech) and visual signals (face movements and hand gestures). Hence, it is essential to exploit multiple modalities when designing machine learning-based facial expression recognition systems. In addition, given the ever-growing quantities of video data that capture human facial expressions, such systems should utilize raw unlabeled videos without requiring expensive annotations.

Therefore, in this paper, the researchers employ a multi-task multi-modal self-supervised learning method for facial expression recognition from in-the-wild video data. The model combines three self-supervised objective functions: first, a multi-modal contrastive loss that pulls diverse data modalities of the same video together in the representation space; second, a multi-modal clustering loss that preserves the semantic structure of the input data in the representation space; and finally, a multi-modal data reconstruction loss.
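To illustrate what “pulling modalities of the same video together” means, here is a minimal NumPy sketch of a one-directional InfoNCE-style contrastive loss. InfoNCE is a common choice for such objectives, but whether the paper uses exactly this form is an assumption; the function name `info_nce`, the temperature, and the random “video” and “audio” embeddings are all hypothetical.

```python
import numpy as np

def info_nce(a, b, temperature=0.1):
    """InfoNCE-style loss (assumed form). Row i of `a` and row i of `b` are
    embeddings of two modalities of the same video and form the positive
    pair; all other rows of `b` act as negatives."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
video = rng.normal(size=(4, 8))                          # 4 clips, 8-dim embeddings
aligned_audio = video + 0.01 * rng.normal(size=(4, 8))   # matches its clip
shuffled_audio = aligned_audio[[1, 2, 3, 0]]             # paired with the wrong clip

loss_aligned = info_nce(video, aligned_audio)
loss_shuffled = info_nce(video, shuffled_audio)
print(loss_aligned, loss_shuffled)
```

The loss is low when each clip's embeddings agree across modalities and high when they are mismatched, so minimizing it pushes matching modalities closer in the representation space.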

The team conducts a comprehensive study of this multi-task multi-modal self-supervised learning method on three facial expression recognition benchmarks. To that end, they examine how different combinations of self-supervised tasks affect performance on the downstream facial expression recognition task.

The results generally show that multi-modal self-supervision tasks offer large performance gains for challenging tasks such as facial expression recognition, while also reducing the number of manual annotations required. The scientists are releasing the pre-trained models and the source code publicly.

The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) is a leading annual conference in the field of computer vision. It’s where top researchers and innovators from academia, industry, and government come together to showcase their latest breakthroughs, exchange ideas, and push the boundaries of what’s possible. The conference covers everything from image and video analysis to object recognition, 3D computer vision, and beyond, and it is an important platform for advancing the field and promoting collaboration between researchers and industry. This year, it will take place on 17 – 21 June at the Seattle Convention Center in Seattle, USA.


