Data from the internet can help address the biodiversity crisis

Digital platforms provide a wealth of data that conservation scientists can use to help address the biodiversity crisis and other sustainability challenges, including climate change. However, the use of these data is still limited in conservation science, and sometimes it is unclear how these data can be used in compliance with data privacy regulations.

Scientists from the University of Helsinki have addressed these concerns in two new articles published as part of a special section on ‘Advancing Conservation Culturomics’ in the journal Conservation Biology.

Using digital data for conservation

“Conservation culturomics focuses on the quantitative analysis of digital data to study human-nature interactions,” says the lead author of one of the manuscripts, Dr Ricardo Correia, a postdoctoral researcher at the Helsinki Lab of Interdisciplinary Conservation Science (HELICS). He is one of the guest editors of the special section in Conservation Biology.

“Conservation scientists and practitioners often find it challenging to access and analyse digital data. Our manuscript provides advice on how to overcome these challenges and shines a light on new potential applications that remain unexplored,” adds Dr Correia. The manuscript, which is the product of an international collaboration, introduces a framework for carrying out culturomics analyses in conservation. This framework includes suggestions on how to collect and process the data.

“Digital data have enormous potential to help researchers explore the human side of conservation problems, such as the introduction of invasive species or the consumption of wildlife products,” explains Dr Andrea Soriano-Redondo, one of the co-authors of the study. Dr Soriano-Redondo recently secured a Marie Skłodowska-Curie fellowship from the European Union to join HELICS to explore issues related to the illegal trade in wildlife using data from digital platforms.

Data, methods, and privacy

Working with digital data is not always straightforward. “Online data often contain personal information. This entails dealing with strict ethical and legal requirements. We have learnt a lot about careful and ethical data processing over the years and wanted to share our experiences with other researchers,” reports Christoph Fink, co-lead author of a second manuscript and a doctoral researcher at HELICS.

This manuscript highlights the importance of data minimisation and of removing identifiers from data collected from social media or other online sources in order to comply with the EU’s General Data Protection Regulation (GDPR). “Our framework provides helpful guidelines to ensure that users’ privacy is respected and preserved at all stages,” adds Dr Anna Hausmann, co-author and a postdoctoral researcher at HELICS. “We also stress how important it is to identify all possible risks to people’s privacy so that mitigation strategies can be found,” she continues.
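
As a rough illustration of what data minimisation and identifier removal can look like in practice, the following Python sketch keeps only the fields needed for an analysis and replaces the account identifier with a keyed hash. It is not taken from the manuscript: the field names, the `minimise` and `pseudonymise` helpers, and the example record are all hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical field names; real platform exports will differ.
FIELDS_NEEDED = {"text", "timestamp", "country"}


def pseudonymise(user_id: str, key: bytes) -> str:
    """Replace a user identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def minimise(record: dict, key: bytes) -> dict:
    """Keep only the fields needed for analysis and pseudonymise the author."""
    reduced = {k: v for k, v in record.items() if k in FIELDS_NEEDED}
    reduced["author"] = pseudonymise(record["user_id"], key)
    return reduced


# The key is generated once and kept apart from the data (assumption).
key = secrets.token_bytes(32)

# Made-up example record, not real data.
post = {
    "user_id": "123456789",
    "username": "example_user",
    "text": "Saw a pangolin for sale at the market today",
    "timestamp": "2021-03-01T10:15:00Z",
    "country": "FI",
    "profile_url": "https://example.org/example_user",
}

clean = minimise(post, key)
```

Keyed hashing of this kind is pseudonymisation rather than anonymisation: anyone holding the key can link records back to accounts, so the key has to be stored separately or discarded once linking is no longer needed, and free-text fields may still contain personal information.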

The team also uses machine learning and natural language processing to analyse the deluge of digital data. “When we train machine learning models, we want to have as much data as possible, since even for computers – just like humans – practice makes perfect. Essentially, machine learning is fine-tuning from many examples,” explains Dr Ritwik Kulkarni, a co-author of the manuscript and a postdoctoral researcher at HELICS. “This, of course, can conflict with data minimisation. Finding solutions that respect people’s privacy while allowing for research to continue can be challenging, but we show that it is possible,” he continues.

Future outlook

“Thus far, we have been able to carry out research using digital data to investigate both threats to, and opportunities for supporting, biodiversity conservation. We carry out research that meets strict ethical, legal, and data privacy requirements when using digital data sources,” summarises Associate Professor Enrico Di Minin, who leads HELICS and is a co-lead author and the senior author of these contributions. “However, it is of foremost importance that free access for research purposes to these rich data sources of human behaviour is ensured so that they can continue being used to help address the biggest sustainability challenge for humans, the loss of biodiversity.”


Correia RA, et al. 2021. Digital data sources and methods for conservation culturomics. Conservation Biology.

Di Minin E, Fink C, Hausmann A, Kremer J, Kulkarni R. 2021. How to address data privacy concerns when using social media data in conservation science. Conservation Biology.

Contact information:

Dr Ricardo Correia, postdoctoral researcher, Helsinki Lab of Interdisciplinary Conservation Science
Twitter: @rahcorreia85

Christoph Fink, doctoral candidate, Helsinki Lab of Interdisciplinary Conservation Science
Twitter: @chrxf

Dr Enrico Di Minin, Associate Professor, Helsinki Lab of Interdisciplinary Conservation Science
Phone: +358 45 8413206
Twitter: @EnTembo