Use of digital methods must be grounded in theory

Kaius Sinnemäki, a researcher in linguistic typology, uses statistical methods to analyse Open Access databases and language catalogues. His aim is to discover how the different ways of indicating noun case and gender interact and what kinds of linguistic universals arise from such interactions.

Grammatical gender is not indicated in Finnish. Indo-European languages typically have two or three genders, but many African languages can have five or more. However, a large number of the world’s languages are like Finnish in that they do not indicate gender.

These are the kinds of typological characteristics that Academy of Finland Postdoctoral Researcher Kaius Sinnemäki studies.

“I study linguistic universals, the shared tendencies in the world’s languages. One area where such tendencies appear is grammatical gender and the noun classifiers that exist in many Southeast Asian and South American languages. The intention is to use dozens or hundreds of languages to make generalisations. This is the kind of humanities research that must be done with a computer, not in the field. The methods are taken from statistics.”

Databases and language catalogues

When a researcher wants to study the thousands of languages in the world, digital Open Access databases and language catalogues are invaluable tools.

“The World Atlas of Language Structures (WALS) is a crucial resource for linguistic typology. It’s been online since 2008 and contains information about more than 2,600 languages. A group of approximately 200 languages has listings of almost 200 characteristics, while others have fewer, but there is some information about every single language in the database. The database can yield many different kinds of information, and it also displays the sources,” Kaius Sinnemäki explains.

“Language catalogues, such as Glottolog and Ethnologue, have a tremendous amount of metadata on languages: where they are spoken and what their genealogies or related languages are. The catalogues also list their sources, and often even have a link to the original source, if it is available online. But sometimes it’s still necessary to go to the library if there is a particular source that is needed.”

Comparing languages through digital resources is not without its problems, however. Different databases can use different names and codes for the same language, and even the criteria for distinguishing languages from regional varieties vary.
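The mismatch in names and codes can be illustrated with a small sketch. The Python example below assumes two hypothetical tables, one keyed by a database's own language codes and one keyed by a separate three-letter code, together with a hand-made crosswalk between the two. All codes, names and values are invented placeholders, not real entries from WALS, Glottolog or Ethnologue.

```python
# Hypothetical records from two sources that identify languages differently.
# All codes and feature values are made-up placeholders for illustration.
features_by_db_code = {
    "lgA": {"name": "Language A", "genders": 0},
    "lgB": {"name": "Language B", "genders": 3},
    "lgC": {"name": "Language C", "genders": 5},
}
metadata_by_other_code = {
    "aaa": {"family": "Family X"},
    "bbb": {"family": "Family Y"},
}
# Hand-made crosswalk from the first source's codes to the second's.
crosswalk = {"lgA": "aaa", "lgB": "bbb"}


def merge_sources(features, metadata, mapping):
    """Join two tables via a crosswalk and report codes that cannot be matched."""
    merged, unmatched = {}, []
    for db_code, record in features.items():
        other_code = mapping.get(db_code)
        if other_code is None or other_code not in metadata:
            # No reliable match: keep the code for manual checking.
            unmatched.append(db_code)
            continue
        merged[db_code] = {**record, **metadata[other_code], "other_code": other_code}
    return merged, unmatched


merged, unmatched = merge_sources(features_by_db_code, metadata_by_other_code, crosswalk)
print(sorted(merged))
print(unmatched)
```

Listing the unmatched codes separately, rather than dropping them silently, makes it visible which languages still need to be reconciled by hand.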

“Parallel texts, or versions of one text translated into several languages, are an increasingly important resource for linguistic typology researchers. For example, the New Testament is available in more than 1,000 languages.”

Mastering statistical methods

When Kaius Sinnemäki started his dissertation research in general linguistics, his statistical knowledge was based on a single introductory course in statistics that he had taken as part of his elective studies. That level of competence did not get him very far when, as a doctoral student, he had to draw a scientifically sound sample of languages that are as independent of one another as possible.
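One simple way to reduce dependence between sampled languages is to pick at most one language per language family. The Python sketch below illustrates that idea only. It is not a description of the sampling procedure Sinnemäki used, the language and family names are invented placeholders, and the function name `sample_one_per_family` is hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical (language, family) pairs; placeholders, not real data.
languages = [
    ("Language A", "Family X"),
    ("Language B", "Family X"),
    ("Language C", "Family Y"),
    ("Language D", "Family Z"),
    ("Language E", "Family Z"),
]


def sample_one_per_family(pairs, seed=0):
    """Pick one language at random from each family (fixed seed for repeatability)."""
    by_family = defaultdict(list)
    for language, family in pairs:
        by_family[family].append(language)
    rng = random.Random(seed)
    return {
        family: rng.choice(sorted(members))
        for family, members in sorted(by_family.items())
    }


sample = sample_one_per_family(languages)
print(sample)
```

Stratifying by family addresses genealogical relatedness only. Languages spoken in the same region can also influence one another, so a real sample would typically need to take geography into account as well.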

“I spent about a year learning statistical methods and related skills, such as editing databases. I primarily used books and the internet to study independently. My supervisors were a big help. After many twists and turns, and much trial and error, I prevailed, and methodology became my strong suit. My time was far from wasted.”

“These days I even teach others to use the R programming environment. It is perfectly suited for manipulating data and conducting statistical tests.”

Sign language universals

For his latest linguistic typology project, Kaius Sinnemäki intends to include sign languages.

“Linguistic typology often ignores sign languages. However, this means that the tendencies we think of as ‘universal’ may just be features of spoken language, not language itself. Sign language is pivotal in making this distinction.”

Sign language research has become more popular over the past 15 years, and researchers at the University of Jyväskylä have conducted language technology analyses of sign language. Together with these researchers and other partners, Sinnemäki’s research project also intends to generate a database of spoken and sign languages, which is planned for open release.

“With all these digital resources and methods, researchers must remember that their work must be guided by a research problem that is grounded in theory. In this study, my research problems arise from the tradition of linguistic typology and hypotheses that have not been previously tested with extensive linguistic data.”
