Speaker: Jane Marie Lin, Bioinformatics and Integrative Biology Program, Univ. of Massachusetts Medical School & Boston Univ.

When: Mon, June 2, 2008

Where: CNSI Auditorium

Abstract:
Promoter sequences contain the necessary and sufficient information for cells to initiate transcription. This information is encoded in the form of 6-10 base-pair motifs that are bound by transcription factors (TFs) to influence the transcription of downstream genes. If one simply scans promoter regions for these motifs, however, most predictions are either false positives, i.e., they are not bound in vivo, or non-functional, i.e., they are bound but have no transcriptional consequence. To address this challenge, computational scientists have developed a myriad of TF binding site prediction algorithms of varying accuracy. I will describe a novel modeling strategy that overcomes computational pitfalls to predict and annotate general and tissue-specific TF binding sites that are also functional. A major advance is training our models on the largest-ever screen of luciferase transfection assays, as this assay determines the effect of DNA sequence alone, without the influence of epigenetic or long-range elements. We designed support vector machine models and supplied sequence information to discover TF binding motifs that are enriched in ubiquitously active as well as tissue-specific promoters. Averaged across 5-fold cross-validated trials, the performance AUC was greater than 92% for the ubiquitous models and greater than 79% for the tissue-specific models. Half of the motif predictions were supported by the literature and half were novel. We are in the process of testing our predicted binding sites by site-directed mutagenesis. Preliminary tests of a general activating TF called GABP yielded an impressive 3-fold average knockdown of promoter activity compared to wild type.
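
The false-positive problem mentioned in the abstract can be illustrated with a rough calculation. The sketch below is not the speaker's method; it only estimates how often a short motif would match by chance. It assumes exact string matching, equal frequencies of the four bases, and a single strand. The motif `ACGTGA` and the function names are hypothetical and chosen purely for illustration.

```python
from typing import List


def expected_chance_hits(motif_len: int, region_len: int) -> float:
    """Expected number of exact matches of one fixed motif in a random
    sequence, assuming each base is equally likely (an idealization)."""
    positions = max(region_len - motif_len + 1, 0)  # possible start sites
    return positions * 0.25 ** motif_len            # chance of a match per site


def scan(sequence: str, motif: str) -> List[int]:
    """Return 0-based start positions of exact motif matches (one strand)."""
    k = len(motif)
    return [i for i in range(len(sequence) - k + 1) if sequence[i:i + k] == motif]


if __name__ == "__main__":
    # Hypothetical 6 bp motif; shorter motifs match by chance far more often.
    print(scan("TTACGTGAACGTGA", "ACGTGA"))
    for k in (6, 8, 10):
        print(k, expected_chance_hits(k, region_len=1000))
```

Under these assumptions, the expected number of chance matches falls by a factor of 16 for every two bases added to the motif, so motifs at the short end of the 6-10 bp range recur by chance across a large set of promoters whether or not a TF binds them.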
