Master's Thesis of Tuomo Samuli Hartonen

The novel experimental methods ChIP-exo and ChIP-Nexus allow transcription factor binding to be studied accurately in vivo. Only a fraction of transcription factor binding mechanisms is fully understood and can be explained with simple position weight matrix (PWM) models. Accurate knowledge of the binding locations and binding patterns of transcription factors is key to understanding binding that the current models do not explain. ChIP-exo and ChIP-Nexus experiments can also offer insight into how single nucleotide polymorphisms (SNPs) at transcription factor binding sites affect the expression of the target genes. This is an important mechanism of action for disease-causing SNPs in non-coding genomic regions. In this thesis I describe PeakXus, a transcription factor binding site discovery tool specifically designed to leverage the increased resolution of ChIP-exo and ChIP-Nexus experiments. The key development principle of PeakXus is to make a minimal number of assumptions about the data, so that novel binding patterns and mechanisms can be discovered. PeakXus is tested with ChIP-Nexus and ChIP-exo experiments performed in both Homo sapiens and Drosophila melanogaster cell lines. PeakXus is shown to consistently find more peaks overlapping a transcription factor-specific recognition sequence than published methods. As an application example, I demonstrate how PeakXus can be coupled with Unique Molecular Identifiers (UMIs) to measure the effect of a SNP overlapping a transcription factor binding site on in vivo transcription factor binding. The allele-specific binding pipeline presented in this thesis accounts better for read duplication bias and for the varying coverage of the sequencing experiments than previous methods.
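
The idea behind UMI-based allele-specific analysis can be illustrated with a minimal sketch. This is not the pipeline implemented in this thesis: the function names, the read representation as (position, strand, UMI, allele) tuples, and the choice of an exact binomial test against an expected 50/50 allele ratio are all assumptions made for illustration. Reads that share position, strand, UMI and allele are treated as PCR duplicates and counted once, and the deduplicated allele counts are then tested for imbalance.

```python
from collections import Counter
from math import comb


def dedup_allele_counts(reads):
    """Count alleles after collapsing presumed PCR duplicates.

    `reads` is an iterable of (pos, strand, umi, allele) tuples
    (a hypothetical representation). Reads identical in all four
    fields are assumed to be duplicates of one original molecule.
    """
    unique = {(pos, strand, umi, allele) for pos, strand, umi, allele in reads}
    return Counter(allele for _, _, _, allele in unique)


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than observing k successes out of n trials.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


# Toy input: the first two reads share position, strand, UMI and allele,
# so they collapse into a single observation.
reads = [
    (100, "+", "ACGT", "A"),
    (100, "+", "ACGT", "A"),
    (100, "+", "TTGA", "A"),
    (101, "-", "ACGT", "G"),
    (100, "+", "CCAT", "A"),
]
counts = dedup_allele_counts(reads)
n_total = sum(counts.values())
p_value = binomial_two_sided_p(counts["A"], n_total)
```

Without the UMI in the deduplication key, the three distinct reads at position 100 on the plus strand would collapse into one, which is the loss of information that UMIs are meant to avoid at high-coverage sites.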