Writing and publishing in English is all-important in the making of academic careers. Many academics for whom English is not a first language worry about their language – often about its correctness and 'native-likeness'. Yet the majority of the readers of academic research publications are not native speakers of English, and an effective academic text hinges more on the quality of its contents, the strength of its argument, and the coherence of its rhetorical organisation than on the details of correctness relative to Standard English. Reader responses to such matters are culturally variable, as was shown in Contrastive Rhetoric research in the 1990s.

The current world of academic writing and publishing is far more globalised than it was a decade or two ago. Yet we have no research evidence on the determinants of effectiveness in academic rhetoric in a world that is permeated by English as a lingua franca and by a constant flow of cultural influences from a variety of sources. Speed in publishing findings has also become a major issue in academia. New ways of making findings public are developing in a variety of online forms, which know no national or local boundaries.

Project WrELFA collects and analyses academic texts written in English as a lingua franca. The texts cover high-stakes genres in different fields, both published and unpublished. Among our target text types are evaluative reports, such as examiners' and peer reviewers' reports, and digital media such as research blogs. Our aim is to cover academic writing practices on a broad scale, ranging from texts circulated within academia to texts that reach the wider public.

Professor Anna Mauranen
University of Helsinki

The Corpus of Written English as a Lingua Franca in Academic Settings (WrELFA)

Corpus compilation began in late 2011 and was completed in early 2015. The WrELFA corpus consists of 1.5 million words drawn from three academic text types – unedited research papers (SciELF corpus, 759k words, 50% of total), PhD examiner reports (402k words, 26%), and research blogs (372k words, 24%). The target author is the academic user of English as a lingua franca (ELF), and the texts must not have undergone professional proofreading or checking by a native speaker of English. It is thus a corpus of second-language use (SLU) in written scientific communication.
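As a quick check on the composition figures above, the following minimal Python sketch recomputes each component's share from the reported word counts. The variable and function names (`component_kwords`, `component_shares`) are illustrative assumptions and not part of any WrELFA tooling; only the word counts come from the corpus description.

```python
# Word counts in thousands, as reported in the corpus description.
# The dictionary keys are descriptive labels chosen for this sketch.
component_kwords = {
    "SciELF research papers": 759,
    "PhD examiner reports": 402,
    "Research blogs": 372,
}


def component_shares(kwords):
    """Return each component's share of the total, rounded to a whole percent."""
    total = sum(kwords.values())
    return {name: round(100 * count / total) for name, count in kwords.items()}


total_kwords = sum(component_kwords.values())
shares = component_shares(component_kwords)

print(f"Total: {total_kwords}k words")
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The three components sum to 1,533k words, which matches the rounded figure of 1.5 million, and the rounded shares come out at 50%, 26% and 24%.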

The corpus has been designed as a written complement to the spoken ELFA corpus with similar markup and metadata. WrELFA employs a broad binary categorisation of texts into the sciences (category “Sci”) and disciplines in social sciences and humanities (category “SSH”). Overall, the distribution of these categories is as follows:

Among the Sci texts, natural sciences are the best represented (63% of words) followed by medicine (22%) and agriculture & forestry (11%). The SSH texts are divided between social sciences (44%), humanities (36%) and behavioural sciences (18%).

Concerning first languages of the authors, at least 35 unique L1s are represented in the corpus, along with an undetermined number of blog commenters whose identities cannot be verified. As with other ELF corpora, English native speakers are included within the texts, in this case among the blog commenters and PhD examiners. Finnish is the largest L1, but with only 14% of total words, and the top 10 L1 categories (including unidentified blog commenters) make up 76% of the corpus:

Finally, the authors in the corpus represent different stages of an academic career. Following the ELFA corpus categories, we distinguish between research students (completed master’s degree, but not yet a PhD), junior staff (post-doctoral researchers and early career academics) and senior staff (professors and senior scholars). In WrELFA as a whole, junior staff are best represented with 42% of total words. Senior staff contribute 30% of words, followed by research students with 11%. The remaining 17% include unknown roles (including blog commenters) and bloggers or PhD examiners who are employed outside the university sector.

  • SciELF corpus – a stand-alone subcorpus of research papers that have not undergone professional proofreading or checking by a native speaker of English.
  • PhD examiner reports – reports on submitted doctoral theses, collected from six faculties at the University of Helsinki over a two-year period.
  • Academic research blogging – a sample of posts and discussions from 40 different research blogs, all of which are maintained by L2 users of English.

WrELFA 2015. The Corpus of Written English as a Lingua Franca in Academic Settings. Director: Anna Mauranen. Compilation manager: Ray Carey. (last access).


Our first and foremost thanks are due to the authors of the examiner reports and SciELF papers, who generously gave us permission to use their texts as research materials. We also thank the deans of the six faculties at the University of Helsinki who helped us obtain the examiner reports from the Faculty Offices. Thanks are also due to the Language Services at the University of Helsinki for their help in finding articles at their pre-language revision stage. The contribution of our international partners to the SciELF corpus is also gratefully acknowledged.

The WrELFA corpus was financed by the GlobE Helsinki project (see the Global English consortium) and the ChangE Helsinki project (see the Changing English consortium), both of which have been funded by the Academy of Finland. Special thanks go to research assistants Ruut Kosonen and Jani Ahtiainen, who made a major contribution to collecting the data, processing the texts and ensuring the quality of the corpus.

The WrELFA corpus will be freely distributed for research purposes one year from the April 2015 completion of the corpus. For questions about the corpus compilation, please contact ray.carey (at)


