Picture a world where technology can be used to track people’s thoughts, hopes and anxieties. It would be possible to draft an image of the psychological state of a particular area or even an entire nation, to be read like a sort of weather chart: What are Finns talking about right now? Where in the country are people the happiest? Which politician is likely to win in the coming elections and why?
To realise such a scenario, we need not just the technology and the skills, but also a place from which to make the relevant observations. This place already exists: it is social media, and one of the most promising social media platforms in Finland is the Suomi24 discussion forum.
For many, Suomi24 represents a forum for gossiping about your neighbours, ragging on celebrities and spreading hate speech. But for researchers, Suomi24 is a veritable treasure trove, with 1.7 million registered users and more than 70 million archived messages spanning over a decade. The messages are signals that say something about us, and they are being used to find answers to questions in several disciplines.
“As a research environment, Suomi24 is exceptional, because it is so extensive and provides a view so far into the past. It’s likely that the only better resource in the world is a Twitter database held exclusively by MIT,” says Salla-Maaria Laaksonen, a researcher at the University of Helsinki’s Centre for Consumer Society Research.
Laaksonen is one of several researchers who have been studying the Suomi24 material that Aller Media released for research purposes in 2015. Beyond their futuristic potential, social media datasets pose a number of research ethics challenges and require a balancing act between, for example, open science and privacy.
Teaching an algorithm to recognise hate speech on social media
As of now, it is impossible to generate a map of people’s mental states based on Suomi24, or any other platform, in the style of science-fiction movies such as Minority Report. However, social media can already answer questions which researchers were previously unable to address.
“Social media data has been used to predict the vegetarian boom, the success of specific politicians in elections as well as the spread of flu epidemics. But it’s often difficult to repeat such predictions,” Laaksonen explains.
Laaksonen has studied topics such as how social media was used during the 2015 Finnish parliamentary election, and participated in a project in which an algorithm was taught to identify hate speech on social media during the municipal election of spring 2017.
“In the 2015 Cyber-elections project, we created an influencer index to show whether candidates could use social media to influence the topics in traditional media. Some politicians, such as Ville Niinistö, stood out as being able to anticipate topics in traditional media or to bring their own topics into the debate.”
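What does “teaching an algorithm to identify hate speech” involve in practice? One common approach is a supervised text classifier trained on manually labelled messages, as in the sketch below. The file name and column names are hypothetical, and this illustrates the general technique rather than the project’s own pipeline.

```python
# Minimal sketch of a supervised hate speech classifier, assuming a
# hypothetical CSV of manually labelled messages (columns "text" and "label").
# This illustrates the general technique only, not the project's own pipeline.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

df = pd.read_csv("labelled_messages.csv")  # hypothetical labelled sample

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42
)

# Character n-grams cope reasonably well with heavily inflected Finnish text
# and with deliberate misspellings used to evade word filters.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5), min_df=2),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```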
Laaksonen is also involved in the Citizen Mindscapes research collective, which studies the psychological state of citizens using massive online datasets, such as Suomi24.
“For example, the Suomi24 data has shown us that there is more talk about concerns at night, and that people’s biggest worries have to do with their own health. There’s also more cursing at night,” Laaksonen laughs.
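Findings like these come from aggregating messages over time. As a rough illustration of the kind of analysis involved, the sketch below computes the share of messages containing worry-related words for each hour of the day; the file name, column names and keyword list are assumptions made for the example, not the collective’s actual code.

```python
# Sketch of how a "more worry talk at night" pattern could be surfaced from a
# forum dump. The file name, column names and keyword list are illustrative
# assumptions, not the Citizen Mindscapes pipeline.
import pandas as pd

df = pd.read_csv("suomi24_messages.csv", parse_dates=["timestamp"])

# A few Finnish worry-related stems, purely for illustration.
worry_pattern = "|".join(["huoli", "pelkää", "ahdista", "sairau"])
df["mentions_worry"] = df["body"].str.contains(worry_pattern, case=False, na=False)

# Share of messages mentioning a worry term, broken down by hour of posting.
hourly_share = df.groupby(df["timestamp"].dt.hour)["mentions_worry"].mean()
print(hourly_share.round(3))
```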
Social media scandals eliminate research opportunities
The use of social media data in scholarly research is only just getting started, but these historically vast datasets may dry up before their full potential can be realised. Recent scandals and data leaks, such as those surrounding Facebook, have made both service providers and users wary.
“From the technological perspective, the situation is worse because programming interfaces are being shut down. On the other hand, for users this might not be the worst thing, considering the Cambridge Analytica case,” Laaksonen says. She cites Brexit and the election of US President Donald Trump as cases in which Cambridge Analytica’s machine-learning-based propaganda influenced election results through social media.
After the Cambridge Analytica scandal, Facebook made major changes to its programming interface and blocked researchers from accessing most of the previously available data. A student writing a master’s thesis, for example, could once use data about public Facebook groups and their members. Now such data is restricted primarily to well-known researchers whose proposals succeed in Facebook’s funding initiative. Researchers have criticised the change as undemocratic and destructive for academic research.
“Twitter is also about to change its data release policy. In the future, anyone wanting to access Twitter’s data will have to register as a developer and submit an application explaining what kind of research they intend to do and what they want to do with the data,” Laaksonen adds.
Information security and privacy must be considered in research
So how safe is our data in the hands of major corporations, or even researchers? Laaksonen has comforting words for us: no matter the type of data, researchers are obliged to handle it ethically.
“Social media data consists of personal information, meaning that it must be processed with the same precautions we would employ for any personal data. Researchers must be able to consider the potential impact that the processing of the data may have on the person to whom it pertains.
“With all personal data, we must comply with the GDPR, the EU’s new General Data Protection Regulation. If you want to apply for funding from the Academy of Finland, for example, you have to have a data management plan indicating what type of data you are collecting, how you intend to store and use the information as well as what you plan to do with it after the research is done,” Laaksonen continues.
Research data can cause trouble, even when seemingly anonymous. For this reason, the raw data from Laaksonen’s hate speech project cannot be released.
“If we were to release the hate speech data we used to train the algorithm, someone might be able to track down the people who wrote the original messages,” Laaksonen explains.
Even though open science is a founding principle for many researchers, Laaksonen included, scenarios such as this force them to restrict access to their data.
Research imagination is the most important characteristic of a social media researcher
Researchers in the humanities and social sciences have for centuries dreamed of introducing the precision and explanatory power of the hard sciences into their disciplines. Economics has tried the hardest, but when the main variable is what humans are thinking about and how they express themselves, even the most educated assumptions about our behaviour tend to fall wide of the mark.
During the era of social media and big data, some, such as MIT Media Lab’s data scientist Alex Pentland, have started to talk about social physics: when we have millions or even billions of observations from a social environment, we can use these observations to generate a new, more exact understanding of the environment.
Pentland himself is one of the authors of a research article reminiscent of the world of Minority Report, where governments prevent crime before it happens thanks to the vast amounts of data they hold on each individual. In the study, Pentland and his colleagues examined whether the site of a future crime in London could be predicted using census data together with aggregated, anonymised behavioural data from the mobile network. Chillingly, they found that they could do so with an accuracy of 70%.
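The published study’s methodology is considerably more involved, but the general setup can be pictured as a straightforward classification problem: describe each map cell with aggregated census and mobility features, label whether a crime occurs there, and cross-validate a classifier. The sketch below assumes a hypothetical feature table and is not the study’s pipeline.

```python
# Toy sketch of the prediction setup: each row describes one map cell, with
# aggregated census and mobile-network features and a label recording whether
# a crime was registered there in the following period. All names are
# hypothetical; this is not the published study's pipeline.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

cells = pd.read_csv("london_grid_cells.csv")
features = cells[["population_density", "median_income",
                  "footfall_day", "footfall_night"]]
labels = cells["crime_next_month"]  # 1 if a crime was recorded, else 0

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, features, labels, cv=5, scoring="accuracy")
print(f"Cross-validated accuracy: {scores.mean():.2f}")
```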
We have yet to see what significance social physics and these new research methods will have. According to Laaksonen, much depends on the ‘research imagination’ of scholars, which can take them down very peculiar paths of inquiry, as it did with Pentland and his colleagues.
“Pentland et al. are a fantastic example of how researchers have an ethical responsibility as the people who are writing the future, much in the same way as science-fiction authors. When we tackle big data, we must consider not just what we could do, but whether what we’re doing is right,” states Laaksonen.