Yahoo-based Contrastive Corpus of Questions and Answers (YCCQA)
The Yahoo-based Contrastive Corpus of Questions and Answers (YCCQA) is a contrastive corpus of English, French, German and Spanish, based on the questions
and answers submitted by users of the Yahoo Answers website. It therefore consists of
question-answer interactions between internet users, produced under near-identical
circumstances in all four languages. These comparable production contexts make the corpus
suitable for contrastive analysis, but each sub-corpus can also be used independently for
language-specific research. The language represented in the corpus is characteristically
informal and unmonitored, illustrating the casual writing style typical of internet postings.
Project leader: Hendrik De Smet, Katholieke Universiteit Leuven / Research Foundation Flanders
Time of compilation: 2008–2009
Size: 29,400,000 words
Languages: English, French, German, Spanish
Number of texts/samples: about 90,000 questions and 575,000 answers
Funding: Research Foundation Flanders
Reference lines and copyright
Contrastive Corpus of Questions and Answers. 2009. Compiled by Hendrik De Smet. Department of Linguistics, University of Leuven.
Hendrik De Smet
The corpus is available free of charge to anyone interested, subject to agreement
with the terms and conditions of use. To obtain a copy, please
contact the compiler (http://www.arts.kuleuven.be/ling/func/members/hendrik-desmet).
The corpus consists of .txt files that can be searched using any concordancer.
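For readers without a concordancer at hand, a minimal keyword-in-context (KWIC) search over plain-text files can be sketched in a few lines of Python. This is an illustrative sketch only. It assumes the corpus files are UTF-8 encoded and stored under a single directory. Neither the encoding nor the directory layout is documented in this entry. The helper names `kwic` and `search_corpus` are hypothetical and are not part of the corpus distribution.

```python
from pathlib import Path
import re


def kwic(text: str, pattern: str, width: int = 30) -> list[str]:
    """Return keyword-in-context lines for every case-insensitive regex match in text."""
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        # Take up to `width` characters of context on each side; flatten newlines.
        left = text[max(0, m.start() - width):m.start()].replace("\n", " ")
        right = text[m.end():m.end() + width].replace("\n", " ")
        # Right-align the left context so the matched keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines


def search_corpus(corpus_dir: str, pattern: str, width: int = 30):
    """Yield (file name, KWIC line) pairs for matches in every .txt file under corpus_dir.

    Assumption: files are UTF-8; undecodable bytes are replaced rather than raising.
    """
    for path in sorted(Path(corpus_dir).rglob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for line in kwic(text, pattern, width):
            yield path.name, line
```

For example, `for name, line in search_corpus("yccqa", r"\bgonna\b"): print(name, line)` would list every occurrence of *gonna* with its context. Here `yccqa` is a hypothetical path to wherever the files have been unpacked. A dedicated concordancer offers sorting, collocation statistics and more, so this sketch is only meant for quick checks.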
CoRD Entry submitted on November 13, 2009 by Hendrik De Smet.
Information for the entry was edited by Hendrik De Smet.