British Academic Written English Corpus (BAWE)
The British Academic Written English Corpus (BAWE) is a record of proficient university-level student writing at the turn of the 21st century. It contains just under 3,000 good-standard student assignments (6,506,995 words). Holdings are fairly evenly distributed across four broad disciplinary areas (Arts and Humanities, Social Sciences, Life Sciences and Physical Sciences) and across four levels of study (undergraduate and taught masters level). Thirty main disciplines are represented.
The corpus contains 2,858 texts categorised into 13 broad genre families: Case Study, Critique, Design Specification, Empathy Writing, Essay, Exercise, Explanation, Literature Survey, Methodology Recount, Narrative Recount, Problem Question, Proposal and Research Report. Information about genre family, discipline and level is provided in the header for each assignment file.
The categorisation system is described in:
Gardner, S. & H. Nesi (2013). A classification of genre families in university student writing. Applied Linguistics 34 (1) 1-29
Nesi, H. & S. Gardner (2012). Genres across the Disciplines: Student writing in Higher Education. Cambridge: Cambridge University Press
All the assignments in the corpus were written for assessment purposes as part of coursework for British university degrees. They were all judged proficient by subject lecturers: 1,251 at distinction and 1,402 at merit grade. 1,953 were written by L1 speakers of English. The remainder were written by proficient users of English as an academic lingua franca. File headers include information about the writer (age, L1, gender, schooling), the module (title, department, disciplinary group), and the assignment (title, genre family, level, production date, grade >60%).
The BAWE corpus was preceded by a pilot corpus, described in:
Nesi, H., G. Sharpling & L. Ganobcsik-Williams (2004) Student papers across the curriculum: Designing and developing a corpus of British student writing. Computers and Composition 21 (4) 401-503
The history of the compilation of the BAWE corpus is outlined in:
Alsop, S. & H. Nesi (2009) Issues in the development of the British Academic Written English (BAWE) corpus. Corpora 4 (1) 71-83 (available from http://www.coventry.ac.uk/research/research-directory/art-design/british-academic-written-english-corpus-bawe/research-/publications/)
Encoding format: TEI XML
The assignments have been annotated using a system devised in accordance with the TEI guidelines. The transcription and mark-up conventions are described in the BAWE manual document. A DTD file, tei_bawe.dtd, is provided. The holdings are described in an Excel spreadsheet, 'BAWE.xls': http://www.coventry.ac.uk/research/research-directory/art-design/british-academic-written-english-corpus-bawe/contents-of-the-bawe-corpus/
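Because each assignment file carries its metadata in a TEI header, fields such as genre family, discipline and level can be read programmatically. The following is a minimal Python sketch using only the standard library. The sample header, the root element name, and the use of <p n="..."> elements to label fields are assumptions made for illustration; the actual element layout should be checked against tei_bawe.dtd and the BAWE manual. The function name read_header_fields is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical header fragment. The element layout and the n="..." labels
# are assumptions for illustration only, not taken from tei_bawe.dtd.
SAMPLE = """<TEI.2>
  <teiHeader>
    <fileDesc>
      <sourceDesc>
        <p n="level">1</p>
        <p n="genre family">Essay</p>
        <p n="discipline">History</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>Assignment text goes here.</p></body></text>
</TEI.2>"""


def read_header_fields(xml_text):
    """Return a dict mapping each labelled header field to its text value."""
    root = ET.fromstring(xml_text)
    header = root.find("teiHeader")
    fields = {}
    if header is None:
        return fields
    # Only look inside the header, so body paragraphs are ignored.
    for p in header.iter("p"):
        label = p.get("n")
        if label:
            fields[label] = (p.text or "").strip()
    return fields


if __name__ == "__main__":
    for name, value in read_header_fields(SAMPLE).items():
        print(f"{name}: {value}")
```

For a real corpus file, the same function could be given the file's contents as a string; a file that declares an external DTD or custom entities may need a different parser configuration.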
Project leader: Hilary Nesi
Time of compilation: 2004-2008
Size: 6,506,995 words, 2,761 texts
Period: 2000-2007
Project home page: www.coventry.ac.uk/bawe
Funding: The British Academic Written English Corpus (BAWE) was collected as part of the project 'An Investigation of Genres of Assessed Writing in British Higher Education'. The project was funded by the Economic and Social Research Council (2004-2007, project number RES-000-23-0800).
Corpus manual: Heuboeck, A., Holmes, J. & Nesi, H. (2010) The BAWE Corpus Manual
Research assistants: Siân Alsop, Signe Ebeling, Richard Forsyth, Alois Heuboeck, Dawn Hindle, Jasper Holmes, Maria Leedham
Collaborating researchers: Douglas Biber
Registered users can download the corpus from the Oxford Text Archive: http://ota.ahds.ac.uk/headers/2539.xml
It is also available via the Sketch Engine corpus query tool (http://www.sketchengine.co.uk/), either by subscription or through open access (https://ca.sketchengine.co.uk/open/).
The Wordtree provides an open-access visualisation tool.
Reference line and copyright
Use of the corpus is acknowledged using the following form of words:
The data in this study come from the British Academic Written English (BAWE) corpus, which was developed at the Universities of Warwick, Reading and Oxford Brookes under the directorship of Hilary Nesi and Sheena Gardner (formerly of the Centre for Applied Linguistics [previously called CELTE], Warwick), Paul Thompson (Department of Applied Linguistics, Reading) and Paul Wickens (Westminster Institute of Education, Oxford Brookes), with funding from the ESRC (RES-000-23-0800).
When referring to the BAWE corpus in your presentations and publications, it is easiest to cite an original publication which describes the project. We recommend:
Gardner, S. & Nesi, H. (2013) A classification of genre families in university student writing. Applied Linguistics 34 (1) 1-29
or
Nesi, H. & Gardner, S. (2012) Genres across the Disciplines: Student writing in higher education. Cambridge University Press