In a world where so much data is generated every moment, it is tempting to keep collecting more and to work only with the most up-to-date sources. However, it is also important to find ways to reuse existing data and build on it. This philosophy underpins the REIMAGINE ADM meta-project, which revisits existing qualitative, mostly ethnographic case studies to derive new insights and consolidate findings across cases and countries.
Besides building on existing research, we also collected new open data to explore public perceptions of algorithmic systems and the values that matter in these contexts. This allowed us to ask new questions about our past cases and sparked conversations within our team about the similarities and differences among them.
We examined three public sources: policy documents, social media, and parliamentary debates. First, we accessed policy documents via Overton, a platform that indexes and structures such documents. Although all of these documents are publicly available, a structured database helps researchers derive insights much faster. However, the Overton data proved difficult to work with because of the diversity of text types and the lack of robust metadata for searching.
Next, we turned to Twitter to gauge public opinion on algorithmic systems. We collected 614,169 tweets between January 22, 2023, and May 18, 2023. Although Twitter officially revoked free API access on February 9, 2023, our retrieval script kept running until mid-May, when it stopped working. This unforeseen loss of API access prevented us from observing attitudes over a longer period.
Finally, we examined parliamentary debates using the ParlaMint 4.0 dataset, specifically the subsets for Belgium, Britain, Denmark, Finland, Sweden, and Slovenia. These data are publicly available via the CLARIN.SI repository and can be searched in the NoSketch Engine.
Building FAIR and reusable resources for public values and ethnographic insight
Within the scope of the project, we developed three main resources for exploring public values and algorithmic systems through text. The first is a list of public values
Our project emphasizes the vital role of the FAIR principles in science (Wilkinson et al. 2016). These principles ensure that data are findable, meaning they are deposited in public repositories, often with persistent identifiers such as DOIs; accessible, supported by accompanying metadata; interoperable, provided in standard formats such as .csv or .pdf; and reusable, supported by open licenses and documentation. Research does not end with the final analysis or the publication of a paper. Rather, it continues through dissemination and data management practices that form the foundation of robust and reliable science.
However, it is more difficult to apply these principles to qualitative ethnographic data, much of which is sensitive or inaccessible to outsiders. Therefore, it is crucial to develop mechanisms that facilitate the reanalysis of ethnographic cases. Reanalysis helps bring some of the nuanced experiential insights from researchers’ heads and fieldnotes into public discussions.
References
Ajda Pretnar Žagar is a researcher at the Faculty of Computer and Information Science, University of Ljubljana. She also works on the REIMAGINE ADM project led by Professor Minna Ruckenstein. In this project, she contributes to mapping values, applying circular mixed methods, data visualisation and quantitative analysis, and fostering interaction with stakeholders.