
Statistical Significance Analysis of Motif Discovery


If you have a question about this talk, please contact Microsoft Research Cambridge Talks Admins.

The identification of transcription factor binding sites, and of cis-regulatory elements in general, is an important step in understanding the regulation of gene expression. To address this need, many motif-finding tools have been described that can find short sequence motifs given only an input set of sequences. In the first part of the talk, I will discuss why a reliable significance evaluation should be considered an essential component of any motif finder. I will then introduce a biologically realistic method for estimating the statistical significance of a reported motif, based on a novel 3-Gamma approximation scheme, and show how its reliability can be further improved by incorporating local base composition information. Finally, I will present GIMSAN, a tool for de novo motif finding that incorporates this significance evaluation technique.

In the second part of the talk, I will present the ALICO (Alignment Constrained) null set generator, a framework for generating randomized versions of an input multiple sequence alignment that preserve some of its crucial features, including its dependence structure. In particular, I will show that, on average, ALICO samples approximately preserve the percent identities (PIDs) between every pair of input sequences, as well as the average Markov model composition. I will demonstrate its utility in phylogenetic motif finders, that is, finders that leverage conservation information.
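
For readers unfamiliar with the PID statistic mentioned above, the following is a minimal sketch of how pairwise percent identities could be computed for a small alignment. It is not ALICO and not taken from the talk. The function names `percent_identity` and `pairwise_pids`, the toy alignment, and the choice to skip any column where either sequence has a gap are all assumptions made for illustration; other PID definitions use a different denominator.

```python
from itertools import combinations


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two aligned rows.

    Assumption: columns where either row has a gap are skipped, so the
    denominator is the number of gap-free columns.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def pairwise_pids(alignment: list[str]) -> dict[tuple[int, int], float]:
    """Map each pair of row indices (i, j), i < j, to its percent identity."""
    return {
        (i, j): percent_identity(alignment[i], alignment[j])
        for i, j in combinations(range(len(alignment)), 2)
    }


# Hypothetical toy alignment, for illustration only.
alignment = ["ACGTACGT", "ACGTTCGT", "AC-TACGA"]
pids = pairwise_pids(alignment)
for pair, pid in sorted(pids.items()):
    print(pair, round(pid, 2))
```

Comparing such a table for the input alignment with the same table averaged over randomized alignments is one way to check whether a null generator approximately preserves pairwise PIDs.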

This talk is part of the Microsoft Research Cambridge, public talks series.

