Abstract:
The opacity of datasets poses a significant challenge to creating inclusive and intelligible machine learning (ML) systems. Various AI ethics initiatives have addressed this issue by proposing standardized dataset documentation frameworks based on the value of transparency. In this talk, I propose a shift of perspective: from documenting for transparency to documenting for reflexivity. Drawing on a long-term project with outsourced data workers in Argentina, Bulgaria, and Syria, I argue that documentation should be designed starting from the needs and experiences of the workers who collect, sort, and label the data used to train ML models. This requires considering the historical inequalities, working conditions, and epistemological standpoints that shape both data work and datasets.
This talk will take place in person at SCIoI.