Fortgeschrittene Methoden in der Bioinformatik (Modul 10-INF-BIO4)
Contents
The topic for this semester is the
automatic layout of RNA secondary
structures. Our goal is to understand, slightly modify, and (re)-implement
RNA layout algorithms for the explicit use in the
Vienna RNA Package.
See, in particular, the current
ONLINE MANUAL
and the
RNAlib Reference Manual.
The Vienna RNA Package already contains several drawing routines,
which we will NOT be concerned with. Our task will be exclusively
to produce layouts for the RNA secondary structures given only the
base pairing information. The only input is
an array that contains, for each sequence position
i, the pairing
partner
j or
0 for unpaired positions:
(((...))).. ->  
[ 9, 8, 7, 0, 0, 0, 3, 2, 1, 0, 0 ]
The pairtable can also hold information on so-called
G-quadruplexes. These extensions of the normal secondary
structure consist of 4 runs of G's of the same length
that separated from each other by at least unpaired
nucleotides only, while two quadruplexes are separated
from each other by at least one intervening paired or
unpaired position. Bases envolved in G-quadruplexes
are denoted by -1 in the pair_table.
The layouting algorithms should return only the x and y
coordinates and possible an error code, i.e.,
int cool_layout( short *pairtable, float *X, float *X )
Pack any optional parameters for the layout into global variables for the
time being.
Currently two "ancient" algorithm are used.
A description of the simples radial layouter
B.A.
Shapiro et al.1982
A simple radial layout that draws every each helix and base
in a multiloop at regular angular distances.
B.A. Shapiro et al.1984
The naview algorithm (Bruccoleri
and Heinrich, 1988)
A more "recent" algorithm that aims to optimize spatial requirements of
the plot can be found in
Auber et al., 2006.
Layout Requirements
Since RNA secondary structures can be regarded essentially as rooted ordered
trees, it is not surprising that the layouting is related to simple tree
layouts. There are, however, serveral specific properties of a good layout
the two bases involved in a base pair should have the same distance
for all pairs.
stacks should linear, with base pairs orthogonal to the backbone
all pairs of adjacent base pairs should have the same distance between
the pairs.
the distance of consecutive base pairs should be approximately the
same
loops should be drawn convex, and for larger loops at least
approximately circular
each base should have enough space to display its base
stem connected by very short loops should preferrably be drawn
colinear, large angles between stems in interior loops should
be formed preferentially in large interior loops
stacks/co-linearly drawn regions should preferentially be oriented
in the major compass directions N E S W NE SE SW NW and if necessary
their subdivisions
For the "exterior loop", i.e., the one containing the begin and the end
of the molecule either a circular or a linear layout should be
possible.
... and of course the layout should not overlap anywhere :-)
a deterministic layout would be most useful
Tasks
First we analyse the two (rather simple) layout algorithm to determine
how to re-implmement them in such a way as to allow easy modification
in the layout. Can we, in fact, find a common discription for them?
Identify which parameters are actually required to customize them.
Next we deal with the linear version of the exterior loop. Idea:
(1) layout the components of the structure first, determine how much
space they need (2) decide which ones to draw above/below the backbone
(x-axis) so that the backbone has to be "stretched" as
little possible.
At this point we should have an implementation to play with ...
How to modify the layout of individual loops to position the stems
in the preferred direction without "overstretching" the loop?
After having delt with these purely local constraints,
we go after global overlaps.
Here ideas from RNAfdl maybe useful
Survey the literatur for papers on RNA drawing.
A useful extension: layout for a set of structures with a common
consensus substructure should be layed out so that the consensus
structure has identical coordinates for all individual layouts.
Timetable