The corpus was compiled as part of coursework in four successive advanced-level Corpus Methodology courses taught at the Department of English at the University of Helsinki. Each class contributed a set of texts focusing on a particular time period, gender or nationality. The choice of political speeches as the topic domain was more or less arbitrary, although arguably political language is of some interest to everyone and thus engenders ready discussion and investment of personal interest. I am grateful to the students for participating in this endeavour and for the interesting discussions and papers that the corpus has already spawned.

Part of the reason for compiling the corpus has been to use the compilation work as material for discussion of the principles of corpus compilation and annotation as each new batch of texts is prepared.

Online speech repositories are a readily available source of vast amounts of socially and politically significant language. The texts are openly available and usually well documented. Naturally, some attention must be paid to the transcription process, i.e., to questions such as who transcribed the text and what conventions were followed.

The nature of political speaking

In compiling a diachronic corpus of a domain like political speeches, we must pay some attention to the kind of language such a corpus can reliably represent. Although spoken out loud, political speeches are (in most cases) far from spontaneous spoken language. In fact, political speeches are arguably one of the most important examples of texts written-to-be-spoken.

As such, they represent not only the individual developing the speech, but also, and arguably even more so, the opinions of a whole political party, class, or ideology. Today especially, political speeches are drafted and redrafted by teams of professional speechwriters who deliberate on everything from rhetorical structures to word choices. The corpus linguistic implication is that the biographical features of the speaker, such as gender, age, background, and nationality, might best be seen as peripheral, while his or her political faction and the nature of the event come across as ultimately more important.

Usefulness of SCPS

Political language has been one of the foci of the discipline of Critical Discourse Analysis (CDA). This corpus is marginally useful for the study of CDA topics, though it must be noted that the corpus is at present too small for well-argued analysis of political views as such. On the other hand, the corpus may be used to study questions such as the extent to which political speeches resemble or differ from other domains of language use, how political speeches on the whole make use of affective features such as pronouns and modal verbs, and how repetitive and formulaic political speeches are.
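By way of illustration, the last two questions can be approached with very simple frequency counts. The following is only a minimal sketch in Python and not part of the corpus or its tools: the word lists, the function names and the sample sentence are assumptions made for the example, and the share of repeated three-word sequences is merely one possible rough indicator of formulaic language.

```python
import re
from collections import Counter

# Illustrative word lists (assumptions for this sketch, not corpus annotation).
PRONOUNS = {"i", "we", "you", "he", "she", "they", "me", "us", "him", "her", "them"}
MODALS = {"can", "could", "may", "might", "must", "shall", "should", "will", "would"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def rate_per_thousand(tokens, wordlist):
    """Frequency of the listed words, normalised per 1,000 tokens."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in wordlist)
    return 1000 * hits / len(tokens)


def repeated_trigram_share(tokens):
    """Proportion of three-word sequences that occur more than once."""
    trigrams = list(zip(tokens, tokens[1:], tokens[2:]))
    if not trigrams:
        return 0.0
    counts = Counter(trigrams)
    repeated = sum(count for count in counts.values() if count > 1)
    return repeated / len(trigrams)


# Invented sample sentence, not taken from the corpus.
sample = "We must act now. We must act together, and we will not fail."
tokens = tokenize(sample)

print("pronouns per 1,000 words:", rate_per_thousand(tokens, PRONOUNS))
print("modals per 1,000 words:", rate_per_thousand(tokens, MODALS))
print("share of repeated trigrams:", repeated_trigram_share(tokens))
```

Normalised rates of this kind allow speeches of different lengths to be compared with one another and with other text domains, although with a corpus of this size any such figures should be read as indicative only.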

Given the small size and heterogeneous composition of the corpus, it is best not used at present for the study of political ideologies as such. The speeches range from those delivered at major international venues on globally important topics to relatively minor ones given at constituency-level events. At this point in time the major research potential of the corpus lies in the study of speech structure and the grammatical features of political speeches.