Jennifer Groh (Duke University)
Hearing works in concert with vision, such as when we watch someone’s lips move to help us understand what they are saying. But bridging these two senses poses computational challenges for the brain. One such challenge involves movements of the eyes – every time the eyes move with respect to the head, the relationship between visual spatial input (anchored to the retina) and auditory spatial input (sound localization cues anchored to the head) changes. I will describe this problem, beginning with early computational and experimental work showing how and where signals regarding eye movements are incorporated into auditory processing, and closing with a recent discovery from our group that the brain sends a signal regarding eye movements to the ears themselves. This signal causes the eardrum to oscillate in conjunction with eye movements (Gruters et al., PNAS, 2018) and carries detailed spatial information about the direction and amplitude of the eye movement (Lovich et al., PNAS, 2023). I will also present new findings concerning the underlying mechanism of this effect, involving the contributions of the middle ear muscles and outer hair cells, and its potential impact on sound transduction.
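To make the reference-frame problem concrete, here is a minimal illustrative sketch (not from the talk, and not the speaker's model): sound direction arrives in head-centered coordinates, while visual location is eye-centered, so aligning the two requires subtracting the current eye position. The function name and degree convention below are assumptions for illustration only.

```python
# Illustrative sketch: why eye movements complicate audio-visual alignment.
# Sound azimuth is head-centered (from interaural localization cues);
# visual location is eye-centered (retinal). A simple alignment needs
# the current eye position. Positive values = rightward, in degrees.

def head_to_eye_centered(sound_azimuth_deg: float, eye_azimuth_deg: float) -> float:
    """Convert a head-centered sound azimuth to eye-centered coordinates
    by subtracting the current horizontal eye position."""
    return sound_azimuth_deg - eye_azimuth_deg

# The same sound, fixed 10 degrees right of the head, lands at different
# eye-centered locations depending on where the eyes are pointing:
for eye_pos in (-15.0, 0.0, 15.0):
    print(f"eye at {eye_pos:+.0f} deg -> sound at "
          f"{head_to_eye_centered(10.0, eye_pos):+.0f} deg on the retinal map")
```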
Kristen Grauman (University of Texas), “Audio-visual learning in 3D environments”
Perception systems that can both see and hear have great potential to unlock problems in video understanding, augmented reality, and embodied AI. I will present our recent work in egocentric audio-visual (AV) perception. First, we explore how audio’s spatial signals can augment visual understanding of 3D environments. This includes ideas for self-supervised feature learning from echoes, AV floorplan reconstruction, and active source separation, where an agent intelligently moves to hear things better in a busy environment. Throughout this line of work, we leverage our open-source SoundSpaces platform, which allows state-of-the-art rendering of highly realistic audio in real-world scanned environments. Next, building on these spatial AV and scene acoustics ideas, we introduce new ways to enhance the audio stream – making it possible to transport a sound to a new physical environment observed in a photo, or to dereverberate speech so it is intelligible for machine and human ears alike.
This talk is part of Olga Shurygina's course “Active Sensing,” a seminar on cutting-edge research on active sensory perception in humans and other mammals, and related advances in artificial agents’ abilities such as seeing, grasping, and navigating in space.
Photo created with DALL-E by Maria Ott.