Publications - Published papers

Please find below the publications of our group. Currently, we list 565 papers. Some of the publications are in collaboration with the group of Sonja Prohaska and are also listed in the publication list for her individual group. Access to published papers is restricted to our local network and chosen collaborators. If you have problems accessing electronic information, please let us know.

©NOTICE: All papers are copyrighted by the authors. If you would like to use all or a portion of any paper, please contact the author.

Phylogenetics from Paralogs

Marc Hellmuth, Nicolas Wieseke, Markus Lechner, Hans-Peter Lenhof, Martin Middendorf, Peter F. Stadler

Download


PREPRINT 14-004: [ PDF ]  [ Software ]

Status: Published


Proc. Natl. Acad. Sci. USA 112:2058-2063

Abstract


Motivation: Sequence-based phylogenetic approaches heavily rely on initial data sets to be composed of orthologous sequences only. Paralogs are treated as a dangerous nuisance that has to be detected and removed. Recent advances in mathematical phylogenetics, however, have indicated that gene duplications can also convey meaningful phylogenetic information provided orthologs and paralogs can be distinguished with a degree of certainty.

Results: We demonstrate that plausible phylogenetic trees can be inferred from paralogy information only. To this end, tree-free estimates of orthology, the complement of paralogy, are first corrected to conform to cographs and then translated into equivalent event-labeled gene phylogenies. A certain subset of the triples displayed by these trees translates into constraints on the species trees. While the resolution is very poor for individual gene families, we observe that genome-wide data sets are sufficient to generate fully resolved phylogenetic trees of several groups of eubacteria. The novel method introduced here relies on solving three intertwined NP-hard optimization problems: the cograph editing problem, the maximum consistent triple set problem, and the least resolved tree problem. Implemented as an Integer Linear Program, paralogy-based phylogenies can be computed exactly for up to some twenty species and their complete protein complements.
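
To illustrate the cograph condition mentioned above, the following minimal Python sketch tests whether an estimated orthology graph is already a cograph, that is, whether it contains no induced path on four vertices. It uses the standard characterization that every induced subgraph with more than one vertex must be disconnected either itself or in its complement. This is not the ILP-based cograph editing from the paper, and it only detects violations without correcting them. The function names and the toy gene labels are assumptions made for this example.

```python
def build_adjacency(vertices, edges):
    """Build a dict mapping each vertex to the set of its neighbours."""
    adj = {v: set() for v in vertices}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    return adj


def components(vertices, adj):
    """Connected components of the subgraph induced by `vertices`."""
    vertices = set(vertices)
    seen, comps = set(), []
    for start in vertices:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for w in adj[v] & vertices:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        comps.append(comp)
    return comps


def is_cograph(vertices, adj):
    """Return True if the subgraph induced by `vertices` is a cograph."""
    vertices = set(vertices)
    if len(vertices) <= 1:
        return True
    comps = components(vertices, adj)
    if len(comps) == 1:
        # The graph is connected, so its complement must be disconnected.
        co_adj = {v: vertices - adj[v] - {v} for v in vertices}
        comps = components(vertices, co_adj)
        if len(comps) == 1:
            return False  # both connected: an induced P4 must exist
    return all(is_cograph(c, adj) for c in comps)


# Hypothetical orthology estimates for four genes (toy data).
genes = ["a1", "b1", "c1", "d1"]
path = build_adjacency(genes, [("a1", "b1"), ("b1", "c1"), ("c1", "d1")])
clique = build_adjacency(
    genes, [(u, w) for i, u in enumerate(genes) for w in genes[i + 1:]]
)
print(is_cograph(genes, path))    # induced P4, so not a cograph
print(is_cograph(genes, clique))  # complete graph, a cograph
```

An estimate that fails this test cannot be represented by an event-labeled gene tree as it stands, which is why a correction step such as cograph editing is needed before trees are built.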

Availability: The ILP formulation is implemented in the software ParaPhylo using IBM ILOG CPLEX® Optimizer 12.6 and is freely available from http://pacosy.informatik.uni-leipzig.de/paraphylo/

Keywords


orthology, paralogs, phylogenetic tree, triple sets, maximum consistent triple set, NP-complete, Integer Linear Program