Thinking back, 1993 was a special year for me: I obtained my first professorial position at the University of Helsinki, and the Academy of Finland decided to fund a project entitled Sociolinguistics and Language History. In our project proposal, Helena Raumolin-Brunberg and I had set ourselves the goal of compiling a corpus that would enable sociolinguistic research into the history of English. Little did we know at the time that this project would continue to prosper well into the 21st century.
On three separate occasions in the early to mid-1990s I was given the opportunity to carry out research at the University of Cambridge. My explorations in the open stacks of the University Library at that time added a great deal to the correspondence corpus. The advice of historians like Jeremy Maule and discussions about the project with colleagues in the Faculty of English, in particular, contributed to my awareness and appreciation of the period we had planned to cover: from the first preserved personal letters written in English at the beginning of the fifteenth century to the aftermath of the Civil War.
Over the years, the CEEC has amply rewarded our efforts. Helena and I gave our first full-length paper using the corpus data – on four changes in Early Modern English – at the 8th International Conference on English Historical Linguistics in Edinburgh in 1994. Having no idea how each of these changes would progress over time, we simply decided to analyse them in the work of 44 writers from two 30-year periods of the corpus, 1520–1550 and 1590–1620. Our results showed that all the changes had advanced from one period to the next – we had hit upon two periods that illustrated their diffusion.
The subsequent work of the CEEC team on diverse research topics has borne out the relevance of flexible periodization of the corpus to analysing language variation and change. Collaboration with data-mining specialists, notably Professor Heikki Mannila from the Helsinki Institute for Information Technology from the mid-2000s on, has opened up a whole new world of exploratory data-mining – and of potential research avenues offered by the CEEC that I, for one, had no inkling of in the early 1990s.
My interest in historical sociolinguistics grew from a combination of circumstances in the 1980s. As a graduate student I came to participate in the compilation of the Helsinki Corpus and met Terttu Nevalainen in the team. My studies in the Department of General Linguistics included sociolinguistics. In addition, I had always been concerned with societal and economic issues.
The work on the Helsinki Corpus gave rise to questions about the sociolinguistic representativeness of the material: who, and what kind of people, were the writers of the texts to be included in the corpus? This led to extensive reading of works on early modern social and economic history and a co-authored article with Terttu on the topic in 1989 in Neuphilologische Mitteilungen.
The sociolinguistic studies of language change made us wonder why the same methods could not be used in historical linguistics, not only in terms of genres but also going beyond texts to individual informants. The knowledge of Early Modern England we had acquired for the Helsinki Corpus project would be useful in this research. The development of this idea and the subsequent funding by the Academy of Finland in 1993 prompted the creation of the CEEC, as we needed appropriate data for our sociolinguistic research into Renaissance English.
I worked as a full-time researcher in our project 'Sociolinguistics and Language History' from 1993 to 1997. It was a time of great enthusiasm, with Arja Nurmi, Minna Palander-Collin, Jukka Keränen and Minna Nevala joining our team. A book of pilot studies, Sociolinguistics and Language History, came out in 1996, followed by a large number of articles. In 2003, Terttu and I published the results of our joint research in a book entitled Historical Sociolinguistics: Language Change in Tudor and Stuart England. My most recent articles have dealt with the behaviour of individuals under ongoing language change.
Arja Nurmi has been involved in the compilation of the CEEC corpora since the beginning of September 1993 in multiple guises (research assistant, post-graduate student, post-doctoral researcher and senior researcher). She has collaborated with and trained several generations of research assistants, with the judicious application of bribery and threats and a constant thread of micromanaging.
Nurmi is the only person in the known universe to have read all the texts included in the various versions of the corpus family, many of them more than once. The reading has been occasioned by letter selection, editing letters from manuscript, editing post-scan texts, proofreading, checking editions against original manuscripts and POS-tagging, but occasionally also research.
Much to the dismay of defenceless research assistants, she has provided an unquenchable fountain of CEEC-compiling arcana at every (in)opportune moment, and will divulge details of the private lives and personal characteristics of corpus informants at the drop of a hat. She is in the process of compiling a manual of texts for the PCEEC, to be made available online in the unforeseeable future.
Minna has been a member of the CEEC project since the beginning. You can read more about her on her home page.
Minna has been in the CEEC team since 1996. You can read more about her on her home page.
I started working for the Historical Sociolinguistics project in August 2000 when I was granted a six-month VARIENG scholarship for my Master's thesis work. This grant enabled me to work independently for 50 per cent of the time and contribute to the corpus compilation for the rest. After finishing my thesis on the corpus-based description of generic pronouns (singular they and generic he) in the British National Corpus and ARCHER, and after scanning and proofreading the letter material to be included in the CEEC Extension, I realized that I wanted to know more about the methods of historical sociolinguistics and corpus compilation in general. So I wanted to stay on board, and Terttu and Helena hired me as a research assistant in spring 2001. I stayed in this position until I graduated with a Master's degree in late 2002, after which I received 4-year funding for my doctoral work from LANGNET (the Finnish Graduate School in Language Studies). My dissertation, published in Mémoires de la Société Néophilologique (Vol. 71), explores the sociolinguistic variation of common-number expressions in Modern and Present-day English. The modern period material consists of the Corpus of Early English Correspondence and its Extension. Today (from late 2008 onwards), I try to make the best use of my knowledge of corpus-driven variationist sociolinguistics, learned while working with the Historical Sociolinguistics project compiling the CEECE, and am working on compiling an electronic Corpus of English in Finland (FIN-CE). The objective of this post-doctoral research, funded by VARIENG, is to investigate the synchronic sociolinguistic variation of English used in Finland and explore ways of modelling the outcome of language contact in early 21st century Finland.
I have mostly worked on the compilation of the CEEC Extension and the writer database of the corpus. I started as a research assistant in the CEEC team in the autumn of 2000, and spent the next couple of years selecting and proofreading letters, gathering data on writers and recipients, and coding the material for the CEEC Extension. At the same time I wrote my MA thesis, which dealt with the letters of Queen Elizabeth I, and in 2002 I received my MA degree in English philology.
I quite liked the glimpses of eighteenth-century life witnessed in the letters I had read over the years, and in the biographies of the period which I had also begun to read; hence the choice of my PhD topic, the eighteenth-century Bluestocking circle and their correspondence. I defended my dissertation in the spring of 2009. For my PhD thesis I edited a selection of manuscript letters from the correspondence of Elizabeth Montagu and her social network, compiled into the Bluestocking Corpus. The studies concern morphosyntactic and spelling variation with regard to the influence of social networks and sociolinguistic factors.
I am currently (in 2009) on family leave. I will begin my post-doctoral research in the project Language and Identity: Variation and Change in Patterns of Interaction in the History of English with Minna Nevala and Minna Palander-Collin at the University of Helsinki.
I was first hired as a research assistant for the CEEC team in January 2003 to replace Mikko Laitinen, who had just received PhD funding from elsewhere. For the next three years I compiled the CEEC Extension and Supplement and built their sender and letter databases – somewhere along the way deciding to upgrade the latter from paper forms to computerised spreadsheets. Going digital with the databases led to the CEECer project in 2006, which I worked on together with Tanja Säily.
Working in the CEEC team became a formative experience for my academic career. Having to proofread something like 3,000 letters over three years meant that I was immersed in historical varieties of English. This strengthened my interest in the Early Modern period – and I even grew to like Late Modern English. The CEEC also led me to English historical manuscripts, as in 2004 I visited the U.K. National Archives in London to check an edited volume against the original manuscript letters. But perhaps most importantly, my current interest in the early English East India Company arose from a chance discovery, while looking for material for the CEEC Supplement, of an edition of the correspondence of the English East India Company trading post in Japan (The English Factory in Japan, 1613–1623, ed. by Anthony Farrington. London: British Library, 1991). I did my MA thesis on incidental knowledge of Japan in the letters from the East India Company merchants in Japan to their employers in England, and have since worked on loanwords from Asian languages in early East India Company correspondence.
I received my MA in 2005, and in 2006 started working on my PhD project, which is a study and a digital edition of a collection of intelligence (or 'spy') letters by Richard Cocks, later head of the East India Company trading post in Japan.
My association with the Corpus of Early English Correspondence began in the academic year 2004–2005, when I took Terttu's seminar course on sociolinguistics and did a small-scale study of the suffixes -ness and -ity in the CEECS. Having observed an intriguing difference between men and women in the use of -ity in letters of the 17th century, I continued on this topic in my MA thesis, this time using the full CEEC (1998 version). Meanwhile, I was hired as Rod McConchie's research assistant at VARIENG in autumn 2005, moving on to assisting the CEEC team in spring 2006 – a position in which I was to remain until the end of 2008.
I started out by digitising material for the CEECE and CEECSU and updating the letter and sender databases; after learning about the material and procedures, I was able to choose texts for the corpora by myself. In autumn 2006, I coordinated a joint software engineering project between the CEEC team and students in the Department of Computer Science to create an interface that would combine the two databases and enable searching for texts by the social variables recorded in the databases. By the end of 2006, CEECer (pronounced 'seeker') had been born, and we continued developing the software until spring 2008.
I graduated with an MA in summer 2008. In 2008, I also worked full time as a research assistant (finalising the letter and sender databases) and as the web editor of the VARIENG e-series, Studies in Variation, Contacts and Change in English. In 2009, with 4-year funding from LANGNET, I'll start working on my PhD thesis, tentatively entitled Sociolinguistic Variation in Derivational Morphology: Studies and Methods in Diachronic Corpus Linguistics. My plans include using the CEECE to study variation in affixation across different social groups in the long 18th century.