M.Sc. Juho Leinonen defends his doctoral thesis Keystroke Data in Programming Courses on Wednesday the 20th of November 2019 at 12 o'clock noon in the University of Helsinki Athena building, Room 167 (Siltavuorenpenger 3 A, 1st floor). His opponent is Associate Professor Nickolas Falkner (University of Adelaide, Australia) and custos Professor Petri Myllymäki (University of Helsinki). The defence will be held in English.
Juho Leinonen's thesis is part of research conducted in the Department of Computer Science and in the Agile Education Research (RAGE) group at the University of Helsinki. His supervisors have been University Lecturer Arto Hellas, Associate Professor Petri Ihantola, Professor Tommi Mikkonen, Assistant Professor Arto Klami, and Professor Petri Myllymäki (University of Helsinki).
Keystroke Data in Programming Courses
Data collected from students' learning process can be used to improve education in many ways, and it can benefit multiple stakeholders of a programming course. Data about students' performance can be used to detect struggling students, who can then be given additional support. If data shows that students have to read a certain section of the material multiple times, the section may be more important than others, or it may be unclear and in need of revision; either way, this information benefits the teacher. Data collected through surveys can yield insight into students' motivations for studying. Ultimately, data can increase our knowledge of how students learn, which benefits educational researchers.
Different kinds of data can be collected in online courses. In programming courses, data is typically collected from tools made specifically for learning programming. These tools include Integrated Development Environments (IDEs), program visualization tools, automatic assessment tools, and online learning materials. The granularity of data collected from such tools varies: fine-grained data is collected frequently, while coarse-grained data is collected less frequently. In a programming course, coarse-grained data might include students' submissions to exercises, whereas fine-grained data might include students' actions within the IDE, such as editing source code. An example of extremely fine-grained data is keystroke data, which typically records each key pressed together with a timestamp indicating exactly when it was pressed.
In this work, we study the benefits of collecting keystroke data in programming courses. We explore different aspects of keystroke data that could be useful for research, for students, and for educators. We do this through multiple quantitative experiments in which information about students' learning, or about the students themselves, is inferred from keystroke data. Most of the experiments are based on examining how fast students type specific character pairs.
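As an illustration only, the following minimal Python sketch shows one way to turn keystroke data into per-character-pair typing speeds. It assumes that each keystroke is a hypothetical `(character, timestamp_ms)` tuple in typing order. The function names and the optional `max_latency_ms` cut-off for long pauses are also assumptions for this example and are not taken from the thesis.

```python
from collections import defaultdict
from statistics import mean


def digraph_latencies(events, max_latency_ms=None):
    """Group the time between consecutive keystrokes by character pair.

    Assumed (hypothetical) input format: a list of (character, timestamp_ms)
    tuples in the order the keys were pressed.
    """
    latencies = defaultdict(list)
    for (prev_key, prev_t), (key, t) in zip(events, events[1:]):
        gap = t - prev_t
        # Hypothetical cut-off: treat long pauses as thinking time, not typing.
        if max_latency_ms is not None and gap > max_latency_ms:
            continue
        latencies[prev_key + key].append(gap)
    return dict(latencies)


def digraph_profile(events, max_latency_ms=None):
    """Average latency (in ms) for each character pair typed."""
    return {
        pair: mean(times)
        for pair, times in digraph_latencies(events, max_latency_ms).items()
    }


# Usage: a student types "int" and then, after a space, "in".
events = [("i", 0), ("n", 100), ("t", 220), (" ", 400), ("i", 600), ("n", 690)]
profile = digraph_profile(events)
print(profile["in"])  # mean of 100 ms and 90 ms
```

A profile of this kind, computed per student, could then be compared across sessions or used as input features for a classifier. How such profiles are actually built and compared in the thesis may differ.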
The results of this thesis show that students can be uniquely identified solely based on their typing whilst they are programming. This information could be used in online courses to verify that the same student completes all the assignments. Excessive collaboration can also be detected automatically based on the processes students take to reach a solution. Additionally, students’ programming experience and future performance in an exam can be inferred from typing, which could be used to detect struggling students. Inferring students’ programming experience is possible even when data is made less accurate so that identifying individuals is no longer feasible.
Availability of the dissertation
An electronic version of the doctoral dissertation will be available on the e-thesis site of the University of Helsinki at http://urn.fi/URN:ISBN:978-951-51-5604-4.
Printed copies will be available on request from Juho Leinonen: juho.leinonen@helsinki.fi.