Elena Simperl (King’s College London), “Open Data Infrastructure in the Age of Generative AI”

15 September @ 10:30 am - 11:30 am

Details

Date:
15 September
Time:
10:30 am - 11:30 am
Event Category:
Website:
https://www.bifold.berlin/education/summerschool

Venue

TU-Campus EUREF, EUREF-Campus 9, 10829 Berlin-Schöneberg, Germany

Open data infrastructure refers to the systems, frameworks, and processes put in place to collect, store, manage, and share data generated or held by government, science, and other public institutions. It is meant to ensure that public data is accessible, high-quality, secure, and usable by a wide range of stakeholders, including the public.

For more than a decade, we have witnessed millions of datasets made available via such infrastructure, advancing research, policymaking, and innovation. However, open data infrastructure is still far from realising its potential; non-technical users face significant barriers in navigating complex datasets and extracting meaningful information to support their decisions.

Furthermore, the global AI race has put substantial strains on this infrastructure, with data holders forced to re-examine their ability to sustain critical public services.

In this talk, I will walk through some of my recent research into addressing these challenges. I will start with a series of user studies that explore how professionals in various data-related roles engage with chatbots to find, make sense of, and use open data.

Diving deeper into the accuracy issues suggested by these studies, I will then describe two experiments that use machine unlearning and information leakage methods to understand whether existing public authoritative sources of data are used by widely accessible generative AI tools.

Informed by the findings, my team developed PortalGPT, a series of AI prototypes leveraging knowledge graphs, large language models, and retrieval-augmented generation to make open data more accessible and actionable for people with varying levels of data literacy.

PortalGPT enhances dataset discovery by bridging the gap between user information needs and structured data queries and enables dataset exploration through interactive analysis tools. Through conversational natural language interactions, users can seamlessly search, analyse, and explore knowledge from open data portals, redefining the traditional methods of navigating and utilizing open datasets.

The talk is part of the 2025 Berlin Summer School on Artificial Intelligence and Society. Jointly organized by the Berlin Institute for the Foundations of Learning and Data (BIFOLD), the Weizenbaum Institute for the Networked Society, and Science of Intelligence (SCIoI), the Summer School will focus on a timely and critical topic: “Open Science and AI – Shaping the Future of Responsible Research.”

Taking place from 15 to 18 September 2025, this year’s Summer School invites early-career researchers and advanced Master’s students to explore how open data, open collaboration, and responsible practices can help build more transparent, fair, and trustworthy AI systems.

Image generated with DALL-E by Maria Ott.
