All the papers are written by L2 users of English, and most are final drafts of unpublished manuscripts. The corpus thus represents second-language use (SLU) in written scientific communication. Several international partners contributed material, resulting in 150 papers (759,300 words) by authors with ten different L1 backgrounds. The breakdown by first author's L1 is as follows:
rank | first author's L1 | no. of articles | no. of words | % of words | avg. words/article |
---|---|---|---|---|---|
1 | Finnish | 25 | 123153 | 16% | 4926 |
2 | Czech | 22 | 109173 | 14% | 4962 |
3 | French | 16 | 91186 | 12% | 5699 |
4 | Chinese | 21 | 84807 | 11% | 4038 |
5 | Spanish | 13 | 79038 | 10% | 6080 |
6 | Russian | 13 | 71376 | 9% | 5490 |
7 | Swedish | 13 | 60060 | 8% | 4620 |
8 | Italian | 11 | 58685 | 8% | 5335 |
9 | Portuguese (Brazil) | 12 | 56625 | 7% | 4719 |
10 | Romanian | 4 | 25197 | 3% | 6299 |
Total | | 150 | 759300 | 100% | 5062 |
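The two derived columns in the table above (percentage of words and average words per article) follow directly from the article and word counts. The sketch below recomputes them from the published counts. The names `counts` and `derived_columns` are illustrative only and are not part of any SciELF tooling, and rounding to the nearest whole number is an assumption about how the table was produced.

```python
# Article and word counts per first-author L1, copied from the table above.
counts = {
    "Finnish": (25, 123153),
    "Czech": (22, 109173),
    "French": (16, 91186),
    "Chinese": (21, 84807),
    "Spanish": (13, 79038),
    "Russian": (13, 71376),
    "Swedish": (13, 60060),
    "Italian": (11, 58685),
    "Portuguese (Brazil)": (12, 56625),
    "Romanian": (4, 25197),
}

total_articles = sum(articles for articles, _ in counts.values())
total_words = sum(words for _, words in counts.values())


def derived_columns(articles, words, corpus_words):
    """Return (% of corpus words, average words per article), both rounded.

    Rounding to whole numbers is an assumption made for this sketch.
    """
    share = round(100 * words / corpus_words)
    average = round(words / articles)
    return share, average


# Map each L1 to its (percentage of words, average words per article).
rows = {
    l1: derived_columns(articles, words, total_words)
    for l1, (articles, words) in counts.items()
}

for l1, (share, average) in rows.items():
    print(f"{l1}: {share}% of words, {average} words/article")
```

Summing the two count columns also gives a quick check of the corpus totals of 150 papers and 759,300 words.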
In addition, we attempted to compile a sample of papers balanced between the sciences (labelled ‘Sci’) and the social sciences and humanities (labelled ‘SSH’). However, the SSH texts turned out to be much longer on average than the Sci texts, so the two categories are close in article count but not in word count:
category | no. of articles | no. of words | % of total words | avg. words/article |
---|---|---|---|---|
Sci | 78 | 326463 | 43% | 4185 |
SSH | 72 | 432837 | 57% | 6012 |
Total | 150 | 759300 | 100% | 5062 |
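The imbalance can be made concrete by comparing each category's share of articles with its share of words. The following is a minimal sketch using only the counts in the table above. The variable names are hypothetical and chosen for illustration.

```python
# Article and word counts per category, copied from the table above.
categories = {"Sci": (78, 326463), "SSH": (72, 432837)}

total_articles = sum(articles for articles, _ in categories.values())
total_words = sum(words for _, words in categories.values())

# Share of articles vs. share of words for each category, in percent.
shares = {
    name: (100 * articles / total_articles, 100 * words / total_words)
    for name, (articles, words) in categories.items()
}

# Average article length per category, and how much longer SSH papers are.
avg_length = {name: words / articles for name, (articles, words) in categories.items()}
length_ratio = avg_length["SSH"] / avg_length["Sci"]

for name, (article_share, word_share) in shares.items():
    print(f"{name}: {article_share:.0f}% of articles, {word_share:.0f}% of words")
print(f"SSH papers are about {length_ratio:.2f} times as long as Sci papers on average")
```

On these figures the article counts are nearly even (78 against 72), while the average SSH paper is a little over 40% longer than the average Sci paper, which accounts for the 43% to 57% split in words.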
Among the 326,463 words in the Sci category, most are drawn from the natural sciences (79%) and medicine (18%). The 432,837 words in SSH are drawn from social sciences (45%), humanities (34%), and behavioural sciences (21%). The academic roles of the first authors in SciELF are distributed as follows:
first author role | no. of articles | no. of words | % of words |
---|---|---|---|
Junior staff | 86 | 418366 | 55% |
Senior staff | 34 | 172075 | 23% |
Research student | 17 | 107998 | 14% |
Unknown | 11 | 41116 | 5% |
Master's student | 2 | 19745 | 3% |
Total | 150 | 759300 | 100% |
The SciELF corpus would not have been possible without the generous contribution of our international partners, who obtained texts and author permissions in their respective home countries. We gratefully acknowledge the contribution of the following researchers:
SciELF 2015. The SciELF Corpus. Director: Anna Mauranen. Compilation manager: Ray Carey. http://www.helsinki.fi/elfa/ (last access).