The ELFA project

First languages represented in the ELFA corpus

The ELFA corpus includes roughly 650 speakers representing 51 first languages. The distribution of tokens among speakers of various first languages is as follows:

Language             Tokens   % of tokens
Finnish              301632   28.5
German                85996   8.1
Russian               69905   6.6
Swedish               67485   6.4
Dutch                 58823   5.6
English               53609   5.1
Danish                39957   3.8
French                37918   3.6
Italian               31124   2.9
Romanian              21420   2.0
Spanish               20984   2.0
Portuguese            19533   1.8
Polish                19134   1.8
Lithuanian            18215   1.7
Norwegian             14984   1.4
Catalan               14512   1.4
Bengali               13722   1.3
Croatian              13674   1.3
Czech                 13384   1.3
Akan/Twi              12515   1.2
Somali                12194   1.2
unknown               11779   1.1
Swahili               10910   1.0
Dagbani               10237   1.0
Arabic                 9243   0.9
Persian/Farsi          9242   0.9
Hindi                  8299   0.8
Chinese/Cantonese      7667   0.7
Japanese               6720   0.6
Kikuyu                 6324   0.6
Bulgarian              5459   0.5
Hungarian              4053   0.4
Estonian               3193   0.3
Igbo                   3150   0.3
Greek                  2486   0.2
Dangme                 2364   0.2
Kihaya                 1936   0.2
Urdu                   1846   0.2
Uzbek                  1726   0.2
Nepali                 1705   0.2
Turkish                1590   0.2
Efilo                   989   0.1
Yoruba                  989   0.1
Hausa                   989   0.1
Oromo                   940   0.1
Amharic                 749   0.07
Latvian                 666   0.06
Slovakian               548   0.05
Icelandic               337   0.03
Hebrew                  260   0.02
Berber                  113   0.01
Welsh                   102   0.01

The above figures were derived from the XML version of the ELFA corpus. Word tokens have been counted independently of the header metadata and all XML mark-up, with the exception of anonymised names, which have been counted as tokens. When a speaker has reported more than one first language, that speaker's tokens have been counted under each of those languages. Thus, the total number of tokens presented here is greater than the number of tokens in the corpus itself.
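The counting rules described above can be illustrated with a minimal Python sketch. The XML layout used here is an assumption made purely for illustration: the element and attribute names (corpus, header, speaker, l1, body, u, who, name) and the sample utterances are hypothetical and are not taken from the actual ELFA files. The sketch skips header metadata and mark-up, counts each anonymised name as one token, and credits a speaker's tokens to every first language that speaker reported.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical miniature corpus. The real ELFA XML schema may differ;
# every tag and attribute name below is an assumption for this sketch.
SAMPLE_XML = """<corpus>
  <header>
    <speaker id="S1" l1="Finnish"/>
    <speaker id="S2" l1="Swedish Finnish"/>
  </header>
  <body>
    <u who="S1">well i think <name/> said so</u>
    <u who="S2">yes that is right</u>
  </body>
</corpus>"""


def count_tokens(utterance):
    """Count word tokens in one flat utterance element.

    Mark-up itself is not counted, except that each anonymised
    <name/> element counts as exactly one token.
    """
    count = len((utterance.text or "").split())
    for child in utterance:
        if child.tag == "name":
            count += 1
        # Words that follow the child element inside the utterance.
        count += len((child.tail or "").split())
    return count


def tokens_by_first_language(xml_text):
    """Return (tokens per first language, tokens in the corpus)."""
    root = ET.fromstring(xml_text)
    # Map each speaker id to the list of first languages reported.
    first_languages = {
        speaker.get("id"): speaker.get("l1").split()
        for speaker in root.find("header").iter("speaker")
    }
    per_language = Counter()
    corpus_total = 0
    for utterance in root.find("body").iter("u"):
        n = count_tokens(utterance)
        corpus_total += n
        # A speaker with several first languages is counted under each.
        for language in first_languages[utterance.get("who")]:
            per_language[language] += n
    return per_language, corpus_total


per_language, corpus_total = tokens_by_first_language(SAMPLE_XML)
table_total = sum(per_language.values())
print(dict(per_language), corpus_total, table_total)
```

In this toy example the second speaker reports two first languages, so that speaker's four tokens appear under both Finnish and Swedish, and the per-language figures sum to more than the ten tokens in the sample itself. The sketch only handles flat utterances; nested mark-up inside an utterance would need additional handling.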

The proportion of speech by Finnish native speakers was kept to 28.5%. The proportion of native/bilingual English speakers amounts to 5.1% of speech in the corpus. Among English speakers, several regional varieties of English are represented:

  • Australia
  • Bangladesh
  • Canada
  • Cameroon
  • Ghana
  • Hong Kong
  • India
  • Ireland
  • Jamaica
  • Lebanon
  • Nigeria
  • New Zealand
  • Trinidad and Tobago
  • UK
  • USA

News

  • For research blogging on ELF, see the ELFA project blog.
  • Anna Mauranen has published a chapter on academic ELF in New Frontiers in Teaching and Learning English, edited by Paola Vettorel (Cambridge Scholars).
  • An intensive course on ELF is offered by researchers from the ELFA project in the Helsinki Summer School, Aug. 4–20, 2015. For a description of the course, see the ELFA blog.
  • Niina Hynninen has published an article in the Journal of English as a Lingua Franca 3(2) entitled "The Common European Framework of Reference from the perspective of English as a lingua franca: What we can learn from a focus on language regulation".
  • Svetlana Vetchinnikova has defended her PhD thesis, Second language lexis and the idiom principle. Read the abstract and download the full text from Helsinki's E-thesis service.
  • Maria Kuteeva & Anna Mauranen have edited a special issue of the Journal of English for Academic Purposes 13: Writing for publication in multilingual contexts. Find their introduction here.
  • Kaisa Pietikäinen has published an article in the Journal of English as a Lingua Franca 3(1) entitled "ELF couples and automatic code-switching".