Publications - Published papers
Please find below the publications of our group. Currently, we list 565 papers. Some of the publications are in collaboration with the group of Sonja Prohaska and are also listed in the publication list for her individual group. Access to published papers is restricted to our local network and chosen collaborators.
If you have problems accessing electronic information, please let us know.

© NOTICE: All papers are copyrighted by the authors. If you would like to use all or a portion of any paper, please contact the author.
Computational discovery of human coding and non-coding transcripts with conserved splice sites
Dominic Rose, Michael Hiller, Katharina Schutt, Jörg Hackermüller, Rolf Backofen, Peter F. Stadler
Status: Published
Bioinformatics
Abstract
<p><b>Motivation:</b>
Long non-coding RNAs (lncRNAs) resemble protein-coding mRNAs
but do not encode proteins. Most lncRNAs are
under lower sequence constraints than protein-coding genes and
lack conserved secondary structures, making it hard to predict them
computationally.
</p>
<p><b>Results:</b>
We introduce an approach to predict spliced lncRNAs in
vertebrate genomes combining comparative genomics and machine
learning. It is based on detecting signatures of characteristic
splice site evolution in vertebrate whole genome alignments. First,
we predict individual splice sites, then assemble compatible sites
into exon candidates, and finally predict multi-exon transcripts.
Using a novel method to evaluate typical splice site substitution
patterns that explicitly takes the species phylogeny into account,
we show that individual splice sites can be accurately predicted.
Since our approach relies only on predicted splice sites, it can
uncover both coding and non-coding exons. We show that our
predicted exons and partial transcripts are mostly non-coding
and lack conserved secondary structures. These exons are of
particular interest, since existing computational approaches cannot
detect them. Transcriptome sequencing data indicate tissue-specific
expression patterns of predicted exons and there is evidence that
increasing sequencing depth and breadth will validate additional
predictions. We also found a significant enrichment of predicted exons
that form multi-exon transcript parts, and we experimentally validate
such a novel multi-exon gene. Overall, we obtain 336 novel multi-exon
transcript predictions from human intergenic regions.
Our results indicate the existence of novel human transcripts that
are conserved in evolution and our approach contributes to the
completion of the human transcript catalog.
</p>
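The middle step of the pipeline described above, assembling compatible splice sites into exon candidates, can be illustrated with a minimal sketch. This is not the authors' implementation (which the abstract says is written in Perl). The `SpliceSite` record, the `assemble_exons` function, the length bounds, and the score threshold are all hypothetical choices made for illustration. The sketch assumes that each predicted site already carries a score, and that an internal exon runs from an acceptor site to a donor site on the same strand.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SpliceSite:
    """A predicted splice site (hypothetical record layout)."""
    pos: int      # genomic coordinate of the site
    kind: str     # "acceptor" (exon start) or "donor" (exon end)
    strand: str   # "+" or "-"
    score: float  # prediction score, e.g. a log-odds value


def assemble_exons(
    sites: List[SpliceSite],
    min_len: int = 30,       # assumed lower bound on exon length
    max_len: int = 500,      # assumed upper bound on exon length
    min_score: float = 0.0,  # assumed per-site score threshold
) -> List[Tuple[SpliceSite, SpliceSite]]:
    """Pair acceptor and donor sites into (acceptor, donor) exon candidates."""
    # Discard sites whose score falls below the threshold.
    kept = [s for s in sites if s.score >= min_score]
    exons = []
    for strand in ("+", "-"):
        acceptors = [s for s in kept if s.strand == strand and s.kind == "acceptor"]
        donors = [s for s in kept if s.strand == strand and s.kind == "donor"]
        for a in sorted(acceptors, key=lambda s: s.pos):
            for d in sorted(donors, key=lambda s: s.pos):
                # On the plus strand the donor lies at a higher coordinate
                # than the acceptor; on the minus strand it is the reverse.
                length = d.pos - a.pos if strand == "+" else a.pos - d.pos
                if min_len <= length <= max_len:
                    exons.append((a, d))
    return exons


sites = [
    SpliceSite(100, "acceptor", "+", 2.0),
    SpliceSite(250, "donor", "+", 1.5),
    SpliceSite(5000, "donor", "+", 3.0),   # too far from the acceptor
    SpliceSite(120, "donor", "+", -1.0),   # below the score threshold
    SpliceSite(900, "acceptor", "-", 1.0),
    SpliceSite(800, "donor", "-", 1.2),
]
candidates = assemble_exons(sites)
for acceptor, donor in candidates:
    print(acceptor.strand, acceptor.pos, donor.pos)
```

A real pipeline would additionally need to chain such candidates into multi-exon transcripts. The quadratic pairing loop here is chosen for readability rather than speed.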
<p>
<b>Availability and Implementation:</b>
A Perl implementation of the tree-based log-odds scoring
is available online (see supplement).
</p>
Keywords
Splicing, splice site prediction, long non-coding RNA, lncRNA, log-odds substitution scores, human genome, ncRNA
Note
doi: 10.1093/bioinformatics/btr314