The ELFA project

The SciELF corpus

The SciELF corpus consists of research papers that have not been professionally proofread or checked by a native speaker of English. All the papers are written by L2 users of English, and most are final drafts of unpublished manuscripts. It is thus a corpus of second-language use (SLU) in written scientific communication. Several international partners have contributed material to the corpus, which comprises 150 papers (759,300 words) by authors with ten different L1 backgrounds. The breakdown of these L1s is as follows:

In addition, we attempted to balance the sample of papers between the sciences (labelled ‘Sci’) and the social sciences and humanities (labelled ‘SSH’). However, the texts categorised as SSH turned out to be much longer on average than those labelled Sci, so the broad division of the corpus is as follows:

Among the 326,463 words in the Sci category, most are drawn from the natural sciences (79%) and medicine (18%). The 432,837 words in SSH are drawn from the social sciences (45%), humanities (34%), and behavioural sciences (21%). The academic roles of the first authors in SciELF are distributed as follows:

The SciELF corpus would not have been possible without the generous contribution of our international partners, who obtained texts and author permissions in their respective home countries. We gratefully acknowledge the contribution of the following researchers:

  • Marina Bondi and Anna Stermieri, University of Modena and Reggio Emilia
  • Maria Kuteeva and Lisa McGrath, Stockholm University
  • Pilar Mur-Dueñas, University of Zaragoza
  • Laura Muresan and Mirela Bardi, Bucharest University of Economic Studies
  • Lene Nordrum, Lund University
  • Wei Ren, Guangdong University of Foreign Studies
  • Elizabeth Rowley-Jolivet, Université d’Orléans
  • Tony Berber Sardinha, Catholic University of São Paulo
  • Irina Shchemeleva, St. Petersburg Higher School of Economics
  • Renáta Tomášková, University of Ostrava
  • Ying Wang, China Three Gorges University

Suggested citation

SciELF 2015. The SciELF Corpus. Director: Anna Mauranen. Compilation manager: Ray Carey. (last access).

