Topi Talvitie defends his PhD thesis on Counting and Sampling Directed Acyclic Graphs for Learning Bayesian Networks

8.11.2019
On Thursday the 21st of November 2019, M.Sc. Topi Talvitie will defend his doctoral thesis on Counting and Sampling Directed Acyclic Graphs for Learning Bayesian Networks. The thesis is a part of research done in the Department of Computer Science and in the Sums of Products research group at the University of Helsinki.

M.Sc. Topi Talvitie defends his doctoral thesis Counting and Sampling Directed Acyclic Graphs for Learning Bayesian Networks on Thursday the 21st of November 2019 at 12 o'clock noon in the University of Helsinki Exactum building, Auditorium CK112 (Pietari Kalmin katu 5, Basement). His opponent is Assistant Professor Kuldeep Meel (National University of Singapore, Singapore) and custos Professor Mikko Koivisto (University of Helsinki). The defence will be held in English.

The thesis of Topi Talvitie is a part of research done in the Department of Computer Science and in the Sums of Products research group at the University of Helsinki. His supervisors have been Professor Mikko Koivisto (University of Helsinki) and Associate Professor Valentin Polishchuk (Linköping University, Sweden).

Counting and Sampling Directed Acyclic Graphs for Learning Bayesian Networks

Bayesian networks are probabilistic models that represent dependencies between random variables via directed acyclic graphs (DAGs). They provide a succinct representation for the joint distribution in cases where the dependency structure is sparse. Specifying the network by hand is often unfeasible, and thus it would be desirable to learn the model from observed data over the variables. In this thesis, we study computational problems encountered in different approaches to learning Bayesian networks. All of the problems involve counting or sampling DAGs under various constraints.

One important computational problem in the fully Bayesian approach to structure learning is the problem of sampling DAGs from the posterior distribution over all the possible structures for the Bayesian network. From the typical modeling assumptions it follows that the distribution is modular, which means that the probability of each DAG factorizes into per-node weights, each of which depends only on the parent set of the node. For this problem, we give the first exact algorithm with a time complexity bound exponential in the number of nodes, and thus polynomial in the size of the input, which consists of all the possible per-node weights. We also adapt the algorithm such that it outperforms the previous methods in the special case of sampling DAGs from the uniform distribution.

We also study the problem of counting the linear extensions of a given partial order, which appears as a subroutine in some importance sampling methods for modular distributions. This problem is a classic example of a #P-complete problem that can be approximately solved in polynomial time by reduction to sampling linear extensions uniformly at random. We present two new randomized approximation algorithms for the problem. The first algorithm extends the applicable range of an exact dynamic programming algorithm by using sampling to reduce the given instance into an easier instance. The second algorithm is obtained by combining a novel, Markov chain-based exact sampler with the Tootsie Pop algorithm, a recent generic scheme for reducing counting into sampling. Together, these two algorithms speed up approximate linear extension counting by multiple orders of magnitude in practice.

Finally, we investigate the problem of counting and sampling DAGs that are Markov equivalent to a given DAG. This problem is important in learning causal Bayesian networks, because distinct Markov equivalent DAGs cannot be distinguished only based on observational data, yet they are different from the causal viewpoint. We speed up the state-of-the-art recursive algorithm for the problem by using dynamic programming. We also present a new, tree decomposition-based algorithm, which runs in linear time if the size of the maximum clique is bounded.

Availability of the dissertation

An electronic version of the doctoral dissertation will be available on the e-thesis site of the University of Helsinki at http://urn.fi/URN:ISBN:978-951-51-5610-5.

Printed copies will be available on request from Topi Talvitie: topi.talvitie@helsinki.fi.