M.Sc. Lidia Pivovarova defends her doctoral thesis "Classification and Clustering in Media Monitoring: from Knowledge Engineering to Deep Learning" on Friday the 21st of December 2018 at 12 o'clock noon in the University of Helsinki Exactum Building, Room D122 (Pietari Kalmin katu 5, 1st floor). Her supervisor is University Researcher Roman Yangarber (University of Helsinki). Her opponent is Professor Heng Ji (Rensselaer Polytechnic Institute, USA), and the custos is Professor Jyrki Kivinen (University of Helsinki). The defence will be held in English.
Classification and Clustering in Media Monitoring: from Knowledge Engineering to Deep Learning
This thesis addresses information extraction from financial news for decision support in the business domain. News is an important source of information for business decision makers: it reflects investors’ expectations and affects companies’ reputations. The vast number and variety of news sources calls for text mining algorithms that collect the most crucial information and present it to the user in a condensed form.
The thesis presents the PULS media monitoring system and describes several news mining tasks, namely document clustering, multi-label news classification and text polarity detection. For each task, we present an end-to-end processing pipeline, starting from data preprocessing and clean-up. Particular attention is given to named entities (NEs), which are used as one of the inputs for all the presented algorithms.
Chapter 1 gives an overview of the PULS news monitoring system and its niche within text mining for business intelligence.
In Chapter 2 we propose a novel algorithm for news grouping, which uses NE salience and exploits the specific structure of news articles.
In Chapter 3 we use automatically extracted NEs and entity descriptors in combination with keywords to improve SVM classifiers for large-scale multi-label text classification. Then, we propose a convolutional neural network (CNN) architecture that outperforms an ensemble of SVM classifiers on two different datasets. We also compare various ways to represent NEs for CNN classifiers.
In Chapter 4 we use a CNN classifier for entity-level business polarity detection. We compare three methods of re-using data annotated for a different, though remotely related, task and demonstrate that unsupervised knowledge transfer works better than other techniques that involve manual mapping.
Availability of the dissertation
An electronic version of the doctoral dissertation is available on the e-thesis site of the University of Helsinki at http://urn.fi/URN:ISBN:978-951-51-4701-1.
Printed copies will be available on request from Lidia Pivovarova: firstname.lastname@example.org.