Corpus of Singaporean Blogs (CoSiB)

The corpus was compiled for two main reasons: to supplement the data available in ICE-Singapore, and to include computer-mediated communication (CMC), a written genre that appears to attract features of spoken usage. The sociolinguistic parameters recorded for the corpus are place of birth, age group (under 16, 16–25, 26–35, 36–45, over 45), gender, linguistic background (mother tongue, other languages spoken at home, longer stays abroad), and level of education.

Project leader: Andrea Sand, Universität Trier
Time of compilation: 2009–2012
Size: 200,000 words
Language: English
Number of texts/samples: 100
Period: 2006–2010
Released: 2010
Funding: Universität Trier

Reference line and Copyright

"Corpus of Singaporean Blogs (CoSiB) 2006-2010"

Copyright by Universität Trier, English Department, Andrea Sand. Only for nonprofit linguistic research.


Distributed with the corpus.


Project leader: Andrea Sand
Compilers: Bernd Elzer, Franziska Hackhausen

File format

TXT text files (bundled in a ZIP archive).

The mark-up is based on the ICE Markup Manual for Written Text, including subtext, extra-corpus data, editorial comments, untranscribed text, deleted text, foreign and indigenous words, quotations, headings, boldface, italics, typeface, underlining. In addition, the date of publication is provided for each subtext (blog post).
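
As a rough illustration of working with this file format, the following Python sketch reads every .txt file from a ZIP archive and strips the markup symbols. It assumes that the markup consists of angle-bracketed symbols in the style of the ICE manual and that the files are UTF-8 encoded; neither detail is confirmed in this description. The function names are hypothetical and not part of the corpus distribution.

```python
import re
import zipfile

# Assumption: markup symbols are angle-bracketed tokens such as <h> ... </h>.
TAG = re.compile(r"<[^<>\n]*>")


def strip_markup(text: str) -> str:
    """Remove angle-bracketed markup symbols and collapse leftover spaces."""
    without_tags = TAG.sub(" ", text)
    return re.sub(r"[ \t]+", " ", without_tags).strip()


def read_texts(archive) -> dict[str, str]:
    """Return {member name: raw text} for every .txt file in a ZIP archive."""
    texts = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".txt"):
                # Assumption: UTF-8; undecodable bytes are replaced, not fatal.
                texts[name] = zf.read(name).decode("utf-8", errors="replace")
    return texts
```

Calling `read_texts("cosib.zip")` (a hypothetical archive name) would return the raw text of each file, which can then be passed to `strip_markup`. This sketch removes only the markup symbols themselves; passages marked as extra-corpus data or deleted text remain in the output and would need to be filtered separately before counting words.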


Available upon request; please e-mail Andrea Sand.