Learning from noisy labels
Research Unit 3, SCIoI Project 44
All intelligent systems that learn by example must be able to deal with incorrectly provided examples, i.e., data points whose label is wrong. Consider a child (or robotic agent) pointing at a potted plant to ask for its name – but the parent mistakenly believes the child is pointing at the window behind the plant, and thus provides the wrong name. Ideally, the child should notice the misunderstanding and conclude that the provided label actually belongs to a different type of object.
Learning from noisy labels is an important research area in machine learning that distinguishes between instance-independent and instance-dependent noise. For the former, a multitude of strategies have been proposed, including noise adaptation layers, loss correction, and reweighting, whereas the latter is comparatively less researched despite its higher relevance for real-world applications. In particular, existing strategies for instance-dependent noise build on complex multi-model architectures to counter confirmation bias and semantic drift, and thus do not scale to applications such as online learning.
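To illustrate one of the strategies mentioned above, the following is a minimal sketch of forward loss correction, assuming the noise transition matrix T is known (in practice it must be estimated). The idea is to push the model's clean-label probabilities through T before computing cross-entropy against the noisy labels, so that minimizing the corrected loss trains the model to predict clean labels. All names here are illustrative, not part of this project.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward_corrected_loss(logits, noisy_labels, T):
    """Forward loss correction with a known noise transition
    matrix T, where T[i, j] = P(noisy label = j | clean label = i).
    The model's clean-label posterior is mapped to a noisy-label
    distribution before cross-entropy is taken against the
    (possibly corrupted) observed labels."""
    p_clean = softmax(logits)        # model's belief over clean labels
    p_noisy = p_clean @ T            # implied distribution over noisy labels
    n = len(noisy_labels)
    return -np.log(p_noisy[np.arange(n), noisy_labels] + 1e-12).mean()

# Toy example: 3 classes with 20% symmetric label noise.
eps = 0.2
T = np.full((3, 3), eps / 2)
np.fill_diagonal(T, 1 - eps)

logits = np.array([[4.0, 0.0, 0.0],   # confident in class 0
                   [0.0, 4.0, 0.0]])  # confident in class 1
labels = np.array([0, 1])
loss = forward_corrected_loss(logits, labels, T)
```

Note that the corrected loss stays bounded away from zero even for perfectly confident, correct predictions, since the noise model says a fraction of labels will disagree with the clean class.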
With this project, we aim to devise strategies for intelligent systems to deal with such incorrectly labeled data points while maintaining a single model of the world. This entails (1) identifying possibly incorrect data points based on the model’s current understanding of the world, (2) learning to ignore incorrectly labeled data points, and (3) potentially overwriting incorrectly labeled data points with pseudo-labels and using these as supervision. We argue that such a mechanism will enable learning in the presence of noisy labels and scale more effectively to realistic scenarios involving large amounts of streaming and partially labeled data.
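Steps (1)-(3) above can be sketched as a single filtering pass, here using the common small-loss heuristic (high-loss samples are treated as possibly mislabeled) together with confidence-thresholded pseudo-labeling. This is a toy illustration of the general mechanism, not the project's actual method; the function name, thresholds, and heuristic are assumptions for the sake of the example.

```python
import numpy as np

def select_and_pseudo_label(losses, probs, loss_quantile=0.7, conf_thresh=0.9):
    """One filtering step over a batch:
    (1) flag samples whose per-sample loss falls above the
        loss_quantile cutoff as possibly mislabeled,
    (2) set their supervision weight to 0 so they are ignored,
    (3) where the model is confident enough, overwrite the flagged
        label with the model's own prediction (pseudo-label)."""
    cutoff = np.quantile(losses, loss_quantile)
    suspect = losses > cutoff                      # (1) identify
    weights = np.where(suspect, 0.0, 1.0)          # (2) ignore
    confident = probs.max(axis=1) >= conf_thresh
    relabel = suspect & confident                  # (3) overwrite
    pseudo = probs.argmax(axis=1)
    return weights, relabel, pseudo

# Toy batch of 4 samples, 2 classes: sample 2 has an outlier loss
# but a confident prediction, so it gets pseudo-labeled.
losses = np.array([0.10, 0.20, 2.50, 0.15])
probs = np.array([[0.90, 0.10],
                  [0.80, 0.20],
                  [0.05, 0.95],
                  [0.70, 0.30]])
weights, relabel, pseudo = select_and_pseudo_label(losses, probs)
```

In an online-learning setting, a pass like this would run per batch against the single maintained model, avoiding the multi-model architectures criticized above; countering the resulting confirmation bias is precisely the open challenge the project addresses.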