
Course Title

Active Sensing

 

Course Description

Sensing — whether it be through vision, touch, hearing, or smell — is an active process. It involves continuous sampling and processing of external information, as well as its selection and prioritization in space and time. In this course, we will (1) delve into cutting-edge research on active sensory perception in humans and other mammals, and (2) explore how an active sensing approach advances artificial agents, allowing robots to see, grasp, and navigate in space.

 

Course Organizer

Olga Shurygina

 

Course Format

Hybrid format: The course will take place in person, but students can also join remotely via Zoom. Sessions will be recorded (provided the speakers agree) for potential publication on SCIoI’s media channels.

 

Target Group

TBD

 

Course Structure

This course explores active vision, touch, and hearing in three distinct blocks. The course begins with an introductory session where students receive essential information about the course objectives, grading, and key literature. Following the introduction, the three main blocks unfold monthly. Within each block, students engage in (1) a seminar designed like a journal club to discuss papers in preparation for meetings with invited speakers; and (2) talk sessions featuring live discussions with two invited speakers, encompassing both analytic and synthetic approaches. Speakers will present individually (60 min each, including discussion), followed by a 60 min workshop.

Course introduction session [90 minutes in total]: Introduction to active sensing

In the first session, we will provide the necessary background for research on active sensing and how it relates to the objectives of Science of Intelligence. We will define active sensing in robotics and cognitive neuroscience and compare it with traditional passive-sensing perspectives. We will also cover organizational topics such as the schedule, requirements, and preparation for the upcoming seminars, and address students’ questions.

 

Learning outcomes

  • Describe the goals of studying active sensing
  • Explain different modes of sampling and their constraints on sensing
  • Name what sampling methods exist for the different senses
  • Identify the challenges that an active sensing framework poses for sensory research in humans/mammals and artificial agents

 

Block 1 [90 + 180 minutes; 270 minutes in total]: Active vision

Human vision is an active process: we constantly shift our gaze in order to sample the visual scene. Even when fixating, the eyes are not still, and these microscopic eye movements play an important role in visual perception. In robotics, researchers enhance the performance of artificial visual systems by drawing inspiration from biological mechanisms. For instance, they develop algorithms that leverage fixational movements to estimate depth and the distances between objects with a monocular camera (Duran & del Pobil, 2020; Battaje & Brock, 2022). In the first block, we will explore the latest research on active vision in both humans and robots.
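
To make the geometric idea concrete ahead of the readings, here is a minimal sketch in Python (all parameter values are made up for illustration; this is not code from any of the cited papers). Under a pinhole camera model, a small self-generated translation shifts each point’s image position in inverse proportion to its depth, so an agent that knows its own movement can read depth off the resulting parallax:

    # Toy depth-from-parallax example; focal length, baseline, and depths
    # are illustrative values, not taken from the cited work.
    import numpy as np

    f = 500.0         # focal length in pixels (assumed)
    baseline = 0.002  # 2 mm camera shift, fixational in scale (assumed)

    Z_true = np.array([0.5, 1.0, 2.0])  # true depths of three points (m)

    # A horizontal translation b shifts each image point by f * b / Z.
    disparity = f * baseline / Z_true

    # Because the agent generated the movement, it can invert the relation
    # and recover depth from the image shifts it measures.
    Z_est = f * baseline / disparity
    print(Z_est)  # [0.5 1.  2. ]

The “active” ingredient is that the disparity is only interpretable because the agent knows the movement that produced it; this is the logic behind the fixational-movement depth estimation discussed in this block.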

Preparation seminar with discussion of the relevant papers [90 minutes]

During this session, students will present and discuss two papers on active vision recommended by the invited speakers. The papers listed here are examples; our guests may replace them based on the focus of their talks.

Analytic approach:

  • Mostofi, N., Zhao, Z., Intoy, J., Boi, M., Victor, J. D., & Rucci, M. (2020). Spatiotemporal content of saccade transients. Current Biology, 30(20), 3999-4008.
  • Shelchkova, N., Tang, C., & Poletti, M. (2019). Task-driven visual exploration at the foveal scale. Proceedings of the National Academy of Sciences, 116(12), 5811-5818.
  • Matthis, J. S., Yates, J. L., & Hayhoe, M. M. (2018). Gaze and the control of foot placement when walking in natural terrain. Current Biology, 28(8), 1224-1233.

Synthetic approach:

  • Antonelli, M., Rucci, M., & Shi, B. (2016, October). Unsupervised learning of depth during coordinated head/eye movements. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 5199-5204). IEEE.
  • Zhang, Z., Yezzi, A., & Gallego, G. (2021). Formulating event-based image reconstruction as a linear inverse problem with deep regularization using optical flow. arXiv preprint arXiv:2112.06242.
  • Battaje, A., & Brock, O. (2022). One Object at a Time: Accurate and Robust Structure from Motion for Robots. 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 3598–3603.

Block seminar with two invited speakers from the analytic and synthetic approaches to active vision [60 + 60 + 60 minutes], organized the week after the Preparation seminar.

Potential speakers:

  • Analytic: Martina Poletti (University of Rochester)
  • Synthetic: Michele Rucci (University of Rochester)

 

Block 2 [90 + 180 minutes; 270 minutes in total]: Active touch

Animals gather a wealth of information about the external world through touch. Common examples of active touch are finger movements during object manipulation and whisking in rodents. Both of these examples involve motor behavior that enables optimal sampling of information about an object or area of interest. In robotics, touch sensors play a crucial role in enabling machines to interact with the environment. To boost the precision and speed of these interactions, numerous researchers take cues from how active touch works in biological systems (e.g., distributed sensing, soft flexible structures, feedback mechanisms). In this block, we will gather insights from analytic and synthetic approaches to active touch and explore how biological sciences and engineering of intelligent systems can mutually benefit from incorporating each other’s findings.
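
As a deliberately simplified illustration of such a feedback mechanism, the Python sketch below lets a point “fingertip” follow the contour of a circular object by servoing on sensed contact depth. The object shape, sensor model, and controller gains are all assumptions made for this example, not a description of any cited system:

    # Toy active-touch contour following: slide along a surface while a
    # feedback loop holds a light, constant contact depth. Illustrative.
    import numpy as np

    center, radius = np.array([0.0, 0.0]), 1.0  # circular object (assumed)
    target_depth = 0.01                         # desired indentation
    gain, step = 0.5, 0.05                      # controller gain, slide step

    pos = np.array([1.005, 0.0])  # fingertip starts just at the surface
    for _ in range(200):
        r_vec = pos - center
        dist = np.linalg.norm(r_vec)
        normal = r_vec / dist            # sensed surface normal
        depth = radius - dist            # sensed indentation (contact depth)
        tangent = np.array([-normal[1], normal[0]])
        # Slide along the surface and correct radially to hold contact.
        pos = pos + step * tangent - gain * (target_depth - depth) * normal
    print(np.linalg.norm(pos - center))  # settles near radius - target_depth

The sketch captures the core point of active touch: sensing (contact depth) and movement (the tangential step) are coupled in a loop, and neither alone would trace the object’s shape.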

 

Preparation seminar with discussion of the relevant papers [90 minutes]

In this session, students will present and discuss two papers on active touch. The papers will be recommended by the invited speakers. Here, we list exemplar papers related to active touch.

Analytic approach:

  • Brecht, M., Naumann, R., Anjum, F., Wolfe, J., Munz, M., Mende, C., & Roth-Alpermann, C. (2011). The neurobiology of Etruscan shrew active touch. Philosophical Transactions of the Royal Society B: Biological Sciences, 366(1581), 3026-3036.
  • Wallach, A., Deutsch, D., Oram, T. B., & Ahissar, E. (2020). Predictive whisker kinematics reveal context-dependent sensorimotor strategies. PLoS Biology, 18(5), e3000571.

Synthetic approach:

  • Lepora, N. F., Church, A., De Kerckhove, C., Hadsell, R., & Lloyd, J. (2019). From pixels to percepts: Highly robust edge perception and contour following using deep learning and an optical biomimetic tactile sensor. IEEE Robotics and Automation Letters, 4(2), 2101-2107.
  • Pacchierotti, C., Prattichizzo, D., & Kuchenbecker, K. J. (2015). Cutaneous feedback of fingertip deformation and vibration for palpation in robotic surgery. IEEE Transactions on Biomedical Engineering, 63(2), 278-287.

 

Block seminar with two invited speakers from the analytic and synthetic approaches to active touch [60 + 60 + 60 minutes], organized the week after the Preparation seminar.

Potential speakers:

  • Analytic: Michael Brecht (Humboldt-Universität zu Berlin)
  • Synthetic: Katherine J. Kuchenbecker (MPI for Intelligent Systems Stuttgart)

Block 3 [90 + 180 minutes; 270 minutes in total]: Active hearing

Hearing serves multiple functions, including communication, sound localization, and scene understanding. In audition, the role of movement may be less obvious, but echolocation and active sound localization demonstrate that mammals purposefully shape their auditory input. For instance, movements of the auditory sensors (e.g., the ears and head in mammals, including humans) relative to the environment actively restructure the auditory input and disambiguate the localization of auditory events. In this block, we will discuss how the auditory system is integrated with other sensory modalities and learn how a biologically inspired active hearing approach holds immense promise for robotic systems, significantly enhancing both locomotion and navigation.
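
To illustrate the disambiguation argument with numbers, here is a small Python sketch using a standard far-field interaural time difference (ITD) model; the geometry and values are assumptions for illustration only. A motionless listener cannot distinguish a source at 30° in front from its mirror image at 150° behind, because both produce the same ITD, but an active 10° head turn breaks the symmetry:

    # Toy demonstration that ITD alone is front-back ambiguous and that
    # a head rotation resolves it. Far-field model: ITD = (d / c) * sin(az).
    import numpy as np

    c, d = 343.0, 0.18  # speed of sound (m/s), inter-ear distance (m), assumed

    def itd(azimuth_deg):
        return d / c * np.sin(np.radians(azimuth_deg))

    front, back = 30.0, 150.0                 # mirror pair: equal sines
    print(np.isclose(itd(front), itd(back)))  # True -> ambiguous at rest

    turn = 10.0  # rotating the head shifts both azimuths in the head frame
    print(itd(front - turn), itd(back - turn))  # unequal -> ambiguity resolved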

Preparation seminar with discussion of the relevant papers [90 minutes]

In this session, students will present and discuss two papers on active hearing. The papers will be recommended by the invited speakers. Here, we list exemplar papers related to active hearing.

Analytic approach:

  • Caruso, V. C., Pages, D. S., Sommer, M. A., & Groh, J. M. (2021). Compensating for a shifting world: evolving reference frames of visual and auditory signals across three multimodal brain areas. Journal of Neurophysiology, 126(1), 82-94.
  • Woods, K. J., & McDermott, J. H. (2015). Attentive tracking of sound sources. Current Biology, 25(17), 2238-2246.

Synthetic approach:

  • Chen, C., Jain, U., Schissler, C., Gari, S. V. A., Al-Halah, Z., Ithapu, V. K., … & Grauman, K. (2020). SoundSpaces: Audio-visual navigation in 3D environments. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VI 16 (pp. 17-36). Springer International Publishing.
  • Majumder, S., Al-Halah, Z., & Grauman, K. (2021). Move2hear: Active audio-visual source separation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 275-285).

 

Block seminar with two invited speakers from the analytic and synthetic approaches to active hearing [60 + 60 + 60 minutes], organized the week after the Preparation seminar.

Potential speakers:

  • Analytic: Jennifer M. Groh (Duke University)
  • Synthetic: Kristen Grauman (University of Texas at Austin)

 

Course closing session [90 minutes in total]: Discussion of essays & Feedback

Students will submit essays within two weeks after Block 3. In a final session at the end of the semester, they will read each other’s essays in groups (one group per block), discuss them, and present a synthesis of their respective work to the rest of the course. The instructor will provide systematic feedback to each group. Finally, students will have the opportunity to give feedback on the course.