Efficient Multi-task Deep Learning

Principal Investigators:

Klaus Obermayer

Team members:

Heiner Spieß (Doctoral researcher)

Developing deep-learning methods

Research Unit 3, SCIoI Project 15

Deep learning excels at constructing hierarchical representations from raw data for robustly solving machine learning tasks, provided that sufficient data is available. It is common practice to transfer learned models to new datasets and fine-tune their representations for a new task for which data is scarce. This suggests that a specialized representation can be modified to perform other tasks, and it raises the question of whether general-purpose deep representations exist.

  1. Can one find representations that are optimal for multiple objectives (Yao et al. 2012, Eigen et al. 2015, Kokkinos 2017)?
  2. Does deep learning even construct representations where multi-optimal solutions outperform specialized representations, thus mitigating the need for large datasets before adding new tasks?
  3. Can the presence of multiple tasks be used to isolate portions of the representation, assign roles to them, and identify conditions under which the network makes decisions?

Our objective is to develop deep learning methods for learning general-purpose representations in the visual domain by training multi-task networks with the help of transfer learning techniques. First, we will develop a measure of “transferability” between task pairs that can be used to guide hyper-parameter exploration in multi-task networks. Combining this measure with the results obtained from multi-task representations enables us to identify the advantages and limitations of multi-task learning and thus to answer the first and third questions above. Second, we will develop sequential training approaches (“life-long learning”) that enable a single-task network to acquire additional tasks with only a minor drop in performance. Third, we will investigate whether unsupervised cues and noisy (“weak”) labels support the learning process. Incorporating this large source of data enables us to answer the second question.

 

Related Publications