Human raters facing automatic speaking assessment
In their new study, researchers of the DigiTala project analyzed human raters’ views on and experiences with a digitally implemented rating process for language learners’ speech. The majority of raters were satisfied with the online rating process.

Remote studies, online language learning and digital language tests are part of contemporary education. Despite this, fully automated tests of oral language skills are still rare. In their study, the researchers of the DigiTala project analyzed human raters’ views on and experiences with a digitally implemented evaluation process. To this end, the project designed Moodle-based online rating environments for Finnish and Swedish as second languages, along with several scales for speaking assessment. Three rating rounds with a total of 37 expert raters were organized between winter 2020 and summer 2021.

A clear rating process

The evaluators were trained remotely on Zoom, and ratings were collected in Moodle. The majority of evaluators found the rating instructions clear. Moodle and the analytic rating criteria were new to some of the raters, which slowed down the rating.

Moodle was generally considered an easy platform to use despite some technical issues. The navigation view of the Moodle exam was found particularly useful, since it made moving between recordings easier.

Challenges and opportunities in automatic assessment

The majority of raters had a positive or neutral attitude towards automatic speaking assessment. Automated assessment was considered useful for supporting or supplementing human assessment, and it was also seen as an opportunity to improve the reliability of assessment. The raters considered reliability and usability to be the strengths of automatic evaluation.

The raters thought the limitations of automatic assessment lie in how the machine deals with diverse speakers and spontaneous speech samples. Assessment bias concerning dialects and language variants, as well as personal speaking styles, should be taken into account.

Success in defining dimensions

Raters assessed speech samples from both high school students and adult language learners. According to the feedback, the different dimensions of spoken language skills were easier to assess from the longer speech samples than from the short ones. Some of the evaluators would have liked clearer instructions for using the different criteria, but overall they managed well in assessing the speaking dimensions.

The raters considered the most important dimensions of speech to be interaction, fulfillment of the assignment, extent of expression and speech fluency. These dimensions – apart from interaction – were also included in the dimensions defined in the DigiTala project.

Read more on the subject here: Machine learns to assess speech with the help of 18 human raters