Tanja Säily

MA, Postgraduate Student
Room B622, Unioninkatu 40
Tel: +358 (0)9 191 23531
E-mail: tanja.saily(at)helsinki.fi


Research Interests

I am currently working on my PhD thesis, tentatively entitled "Sociolinguistic Variation in Derivational Morphology: Studies and Methods in Diachronic Corpus Linguistics". I plan to study sociolinguistic variation and change in the morphological productivity of certain derivational affixes from Early Modern English to present-day English, using materials such as the Corpus of Early English Correspondence (CEEC) and the British National Corpus (BNC). I will also consider methodological issues in the study of language variation and change, such as the bad-data problem. To overcome some of these problems, I aim to develop new tools and methods in collaboration with computer scientists (I am a member of the multidisciplinary DAMMOC project). An example of the kinds of questions to be dealt with is provided by my MA thesis, which used the statistical method of permutation testing to solve the problem of comparing samples of different sizes.

The main purpose of my MA thesis (Säily 2008) was to find out whether there is sociolinguistic variation in the morphological productivity of the roughly synonymous nominal suffixes -ness and -ity in personal letters of the 17th century. As hypothesised, such variation is indeed observable in the productivity of the 'learned' suffix -ity, while the default suffix for forming abstract nouns from adjectives, -ness, shows no significant variation. The productivity of -ity is found to be significantly low in letters written by women, as well as in letters written during the period 1600-1639.

Women's lower productivity is explained by their restricted access to education, which was then necessary for a full command of the intricacies of -ity; it is probable that the lowest ranks would also exhibit a lower productivity for the same reason if there were enough data from them. The variation over time can be interpreted as linguistic change in progress: perhaps -ity spreads from the literate registers in which it first appeared to the more speech-like letters during the 17th century, or an increase in its use in literate registers shows up with a delay in more speech-like registers. The change may have been accelerated in the 1640s by the Civil-War effect, as there was much more contact between different kinds of people and an increase in weak social ties during the war.

The second focus of the study was on methodology. In collaboration with researcher Jukka Suomela of the Helsinki Institute for Information Technology HIIT, a little-known solution was presented to the problem of comparing type counts obtained from (sub)corpora of varying sizes (see Säily and Suomela 2009). Based on type accumulation curves and the statistical technique of permutation testing, the method is an assumption-free, highly visual way of determining whether a subcorpus is significantly different from the corpus as a whole, in terms of either the number of types or the number of hapax legomena. The latter measure, however, was shown to be impractical, at least in the corpus used in this study (the CEEC), as the upper and lower bounds for hapaxes turned out to be too wide for any significant differences to emerge. Therefore, the results mentioned above were obtained with the measure of type counts. With the help of this method, it was possible to gain linguistically interesting and statistically significant results even though the amount of data was relatively small, c. 1.4 million words divided into various subcorpora.
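
The general idea of comparing a subcorpus with the whole corpus by permutation testing can be illustrated with a minimal Python sketch. It is not the implementation used in the thesis or in Säily and Suomela (2009). The data layout (each text as a pair of word count and set of types), the function names and the simplified handling of text boundaries (only whole texts that fit within the subcorpus size are counted, which gives a lower bound) are all assumptions made for illustration.

```python
import random

def types_at_size(texts, target_words):
    """Count distinct types in the longest prefix of `texts` whose total
    word count does not exceed `target_words` (a lower-bound simplification)."""
    seen = set()
    words = 0
    for word_count, types in texts:
        if words + word_count > target_words:
            break
        words += word_count
        seen.update(types)
    return len(seen)

def permutation_p_value(corpus, subcorpus, rounds=10000, seed=0):
    """Estimate how often a random ordering of the corpus yields, at the
    size of the subcorpus, no more types than the subcorpus itself has.

    `corpus` and `subcorpus` are lists of (word_count, set_of_types) pairs;
    this layout is hypothetical and chosen only for the sketch."""
    rng = random.Random(seed)
    target_words = sum(word_count for word_count, _ in subcorpus)
    observed = len(set().union(*(types for _, types in subcorpus)))
    shuffled = list(corpus)
    at_most = 0
    for _ in range(rounds):
        rng.shuffle(shuffled)  # one random permutation of the texts
        if types_at_size(shuffled, target_words) <= observed:
            at_most += 1
    return at_most / rounds
```

A small estimated proportion suggests that the subcorpus has fewer types than would be expected from a random sample of the same size drawn from the whole corpus. Shuffling whole texts rather than individual words keeps the words of each text together, so the test does not assume that words occur independently of one another.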

Teaching

Spring 2012: ENG114a Text Analysis from English into Finnish I, group 5

Publications

(2011) Harri Siirtola, Terttu Nevalainen, Tanja Säily and Kari-Jouko Räihä. "Visualisation of text corpora: A case study of the PCEEC". How to Deal with Data: Problems and Approaches to the Investigation of the English Language over Time and Space (Studies in Variation, Contacts and Change in English 7), ed. by Terttu Nevalainen & Susan M. Fitzmaurice. Helsinki: VARIENG.

(2011) Tanja Säily, Terttu Nevalainen and Harri Siirtola. "Variation in noun and pronoun frequencies in a sociohistorical corpus of English". Literary and Linguistic Computing 26(2): 167-188.

(2011) Olga Timofeeva and Tanja Säily (eds.) Words in Dictionaries and History: Essays in Honour of R.W. McConchie. Terminology and Lexicography Research and Practice 14. Amsterdam/Philadelphia: John Benjamins.

(2011) "Variation in morphological productivity in the BNC: Sociolinguistic and methodological considerations". Corpus Linguistics and Linguistic Theory 7(1): 119-141. (Special Issue: Corpus Linguistics and Sociolinguistic Inquiry, ed. by Tyler Kendall & Gerard Van Herk.)

(2010) Harri Siirtola, Kari-Jouko Räihä, Tanja Säily and Terttu Nevalainen. "Information visualization for corpus linguistics: Towards interactive tools". Proceedings of the First International Workshop on Intelligent Visual Interfaces for Text Analysis, ed. by Shixia Liu, Michelle X. Zhou, Giuseppe Carenini & Huamin Qu. New York: ACM, 33-36.

(2009) Tanja Säily and Jukka Suomela. "Comparing type counts: The case of women, men and -ity in early English letters". Corpus Linguistics: Refinements and Reassessments (Language and Computers: Studies in Practical Linguistics 69), ed. by Antoinette Renouf & Andrew Kehoe. Amsterdam: Rodopi, 87-109.

(2008) Productivity of the Suffixes -ness and -ity in 17th-Century English Letters: A Sociolinguistic Approach. Unpublished MA thesis, Department of English, University of Helsinki.

(2006) R.W. McConchie, Olga Timofeeva, Heli Tissari and Tanja Säily (eds.) Selected Proceedings of the 2005 Symposium on New Approaches in English Historical Lexis (HEL-LEX). Somerville, MA: Cascadilla Proceedings Project.

Presentations

(2011) Panagiotis Papapetrou, Jefrey Lijffijt, Tanja Säily, Kai Puolamäki, Terttu Nevalainen and Heikki Mannila. "Are you talking Bernoulli to me? Comparing methods of assessing word frequencies". Helsinki Corpus Festival, Helsinki, Finland, September 2011.

(2011) Harri Siirtola, Terttu Nevalainen and Tanja Säily. "Tools for comparing corpora". Software demonstration, Helsinki Corpus Festival, Helsinki, Finland, September 2011.

(2011) Tanja Säily, Turo Vartiainen, Terttu Nevalainen, Jefrey Lijffijt, Harri Siirtola, Panagiotis Papapetrou, Kai Puolamäki, Kari-Jouko Räihä and Heikki Mannila. "DAMMOC: Towards interactive visual analysis of corpora". Poster, Helsinki Corpus Festival, Helsinki, Finland, September 2011.

(2011) "Sociolinguistic variation in morphological productivity in 18th-century English". International Society for the Linguistics of English (ISLE 2011), Boston, USA, June 2011.

(2011) Terttu Nevalainen, Tanja Säily and Harri Siirtola. "Tools for comparing corpora: Text Variation Explorer (TVE)". International Society for the Linguistics of English (ISLE 2011), Boston, USA, June 2011.

(2010) "Variation in noun and pronoun frequencies: Gendered drift or a corpus artefact?" 31st Annual Conference of the International Computer Archive of Modern and Medieval English (ICAME 31), Gießen, Germany, May 2010.

(2010) Jefrey Lijffijt, Harri Siirtola, Tanja Säily, Turo Vartiainen, Terttu Nevalainen and Heikki Mannila. "Towards interactive visual analysis of corpora". Poster, 31st Annual Conference of the International Computer Archive of Modern and Medieval English (ICAME 31), Gießen, Germany, May 2010.

(2010) "Substantiivi- ja pronominimäärien vaihtelu historiallisessa korpuksessa: Sosiolingvistinen muutosprosessi vai korpuksen epätasaisuutta?" XXXVII Kielitieteen päivät, Helsinki, Finland, May 2010.

(2009) "Variation in morphological productivity in the BNC: Sociolinguistic and methodological considerations". American Association for Corpus Linguistics (AACL 2009), Edmonton, Alberta, Canada, October 2009.

(2009) "The DAMMOC project: Data mining tools for changing modalities of communication". MOTIVE opening seminar, Helsinki, Finland, May 2009.