This is the timetable of the general research seminar in language technology. Presentations will be given primarily by doctoral students, but also by other researchers in language technology, including external guests.
Title: Reports from conferences and on-going/coming projects
September, 21: Anssi Yli-Jyrä
Title: A High Coverage (99.994%) Limit on Two-Sided Embedding in Universal Dependencies Treebanks
Abstract: A recently proposed encoding for noncrossing digraphs can be used to implement
generic inference over families of these digraphs and to carry out first-order
factored dependency parsing. It is now shown that the recent proposal can be
substantially streamlined without information loss. The improved encoding is
less dependent on hierarchical processing.
The encoding gives rise to a high-coverage bounded-depth approximation of the
space of noncrossing digraphs. This subset is presented elegantly by a
finite-state machine that recognizes an infinite set of encoded graphs. The set
includes more than 99.99% of the 0.6 million noncrossing graphs obtained from
the UDv2 treebanks through planarisation.
Rather than taking the low probability of the residual as a flat rate, it can be
modelled with a joint probability distribution that is factorised into two
underlying stochastic processes – the sentence length distribution and the
related conditional distribution for deep nesting. This model points out that
deep nesting in the streamlined code requires extreme sentence lengths. High
depth is categorically out in common sentence lengths but emerges slowly at
infrequent lengths that prompt further inquiry.
September, 28: Mika Hämäläinen
October, 5: Seppo Nyrkkö
Title: Ontology-related, learning tagging model on parsed text
October, 19: Niklas Laxström
November, 1: FinMT (NOTE: Wednesday!)
Title: Workshop on Machine Translation
November, 16: Mark Granroth-Wilding
Title: Unsupervised learning of cross-lingual representations with no prior linguistic knowledge
November, 23: Aarne Talman