A total of 30,000 users and 38 million posts helped investigate Finnish Twitter users’ language choices and geographical location

A multidisciplinary research group at the University of Helsinki explored the language choices made by Finnish Twitter users and the distribution of languages and users within Finland.

A total of 38 million Twitter messages were collected from 30,000 users, with geographic information included in 2 million of the posts. Users’ languages were identified with the help of automated language detection, and their place of residence was determined at the level of municipality and region on the basis of their geographical information. The researchers also examined temporal and geographical variation in the diversity of the linguistic landscape.
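As a rough illustration of how a place of residence might be derived from geotagged posts, the Python sketch below assigns each user the municipality that appears most often among their posts. This is an assumed heuristic, not the method reported by the researchers; the function name, the input format and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def infer_home_municipality(posts):
    """Map each user to the municipality seen most often in their geotagged posts.

    `posts` is an iterable of (user_id, municipality) pairs, where municipality
    is None for posts without geographic information. Assumption: the most
    frequent municipality is a reasonable proxy for the place of residence.
    """
    counts = defaultdict(Counter)
    for user_id, municipality in posts:
        if municipality is not None:  # skip posts without a geotag
            counts[user_id][municipality] += 1
    # Users with no geotagged posts never enter `counts` and are left out.
    return {user: c.most_common(1)[0][0] for user, c in counts.items()}


# Hypothetical sample data for illustration only.
sample_posts = [
    ("user_a", "Helsinki"),
    ("user_a", "Espoo"),
    ("user_a", "Helsinki"),
    ("user_b", None),
    ("user_c", "Joensuu"),
]
homes = infer_home_municipality(sample_posts)
```

Users without any geotagged posts are simply omitted from the result, which mirrors the fact that only a fraction of the collected posts carried geographic information.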

Finnish and English as the dominant languages 

“As expected, the dominant languages on the Finnish Twitter are Finnish and English,” says Assistant Professor Tuomo Hiippala.

“In rural areas, Finnish is, on average, more and English less prevalent. Other than that, the distribution between the two languages is fairly equal.” 

Hiippala points out that Twitter messages contain geographical information less often than posts published on Instagram. Even so, the researchers’ observations largely match Finland’s linguistic realities, demonstrating that the algorithms used for detecting domiciles and languages are relatively reliable.

“Still, it’s important to keep in mind that Twitter users are not representative of the entire population,” Hiippala notes. “The next natural step is to investigate the use of individual languages and their internal variation. For instance, the distribution of Finnish dialects on Twitter would make an interesting research topic.” 

According to doctoral researcher Tuomas Väisänen, the historical and geographical dimension is strongly in evidence in the regional distribution of languages: on average, Swedish is used more in the Swedish-speaking coastal regions, while Russian is more prevalent near the eastern border. In the case of Estonian, most observations were made in southern Finland. 

“From a temporal and geographical viewpoint, Twitter’s digital linguistic landscape illustrates how the digital and physical worlds are interconnected throughout Finland, regardless of location,” Väisänen says. 

Linguistics can benefit from a geographical perspective

Most Finnish Twitter users use several languages on the platform, though not in equal proportions. Only 18% of users rely on a single language.

“The study shows that a geographical perspective can benefit linguistics,” says Academy Research Fellow Olle Järv.

“As for temporal observations, they supplement the geographical picture by illustrating how linguistic diversity varies by daily, weekly and seasonal rhythms. Twitter messages and other sources of big data open up new avenues for linguistics.”

Users favouring a single language are more often located in rural areas, while people who actively use more than one language are located on the coast and in the Helsinki Metropolitan Area. The most diverse linguistic landscape can be found in the coastal Swedish-speaking areas and the Uusimaa region. Even in North Karelia, the region with the fewest languages, no fewer than 19 languages are used.
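To make the regional comparison concrete, the Python sketch below computes two common measures for each region: richness (the number of distinct languages observed) and Shannon entropy as a diversity index. The choice of Shannon entropy is an assumption made for illustration, as this text does not specify which indices the study used; the function name and sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def language_richness_and_diversity(posts):
    """Return {region: (richness, shannon_entropy)} from (region, language) pairs.

    Richness is the number of distinct languages observed in a region.
    Shannon entropy (natural log) is used here as an assumed diversity index.
    """
    by_region = defaultdict(Counter)
    for region, language in posts:
        by_region[region][language] += 1

    summary = {}
    for region, counts in by_region.items():
        total = sum(counts.values())
        entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
        summary[region] = (len(counts), entropy)
    return summary


# Hypothetical sample data for illustration only.
sample_posts = [
    ("Region X", "fi"),
    ("Region X", "en"),
    ("Region Y", "fi"),
    ("Region Y", "fi"),
]
summary = language_richness_and_diversity(sample_posts)
```

Richness only counts languages, whereas entropy also reflects how evenly they are used, so a region can have many languages and still score low on diversity if one of them dominates.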

The study was carried out as part of a project investigating the linguistic landscape of the Helsinki Metropolitan Area, with Tuomo Hiippala from the Faculty of Arts, University of Helsinki as the principal investigator. The project is funded by the Emil Aaltonen Foundation.

The other authors of the research article are Tuomas Väisänen, Tuuli Toivonen and Olle Järv from the Faculty of Science. The English-language article is entitled ‘Mapping the languages of Twitter in Finland: richness and diversity in space and time’.

Further information: 

Tuomo Hiippala, assistant professor, +358 50 377 3366, tuomo.hiippala@helsinki.fi, @tuomo_h