Fortgeschrittene Methoden in der Bioinformatik (Modul 10-INF-BIO4)

Contents

The topic for this semester is the automatic layout of RNA secondary structures. Our goal is to understand, slightly modify, and (re)-implement RNA layout algorithms for the explicit use in the Vienna RNA Package. See, in particular, the current ONLINE MANUAL and the RNAlib Reference Manual.

The Vienna RNA Package already contains several drawing routines, which we will NOT be concerned with. Our task will be exclusively to produce layouts for the RNA secondary structures given only the base pairing information. The only input is

short *pair_table
an array that contains, for each sequence position i, the pairing partner j or 0 for unpaired positions:
(((...)))..   ->   [ 9, 8, 7, 0, 0, 0, 3, 2, 1, 0, 0 ]

The pairtable can also hold information on so-called G-quadruplexes. These extensions of the normal secondary structure consist of 4 runs of G's of the same length that separated from each other by at least unpaired nucleotides only, while two quadruplexes are separated from each other by at least one intervening paired or unpaired position. Bases envolved in G-quadruplexes are denoted by -1 in the pair_table.
The layouting algorithms should return only the x and y coordinates and possible an error code, i.e.,
int cool_layout( short *pairtable, float *X, float *X )
Pack any optional parameters for the layout into global variables for the time being.

Currently two "ancient" algorithm are used.

  • A description of the simples radial layouter B.A. Shapiro et al.1982
  • A simple radial layout that draws every each helix and base in a multiloop at regular angular distances. B.A. Shapiro et al.1984
  • The naview algorithm (Bruccoleri and Heinrich, 1988)
  • A more "recent" algorithm that aims to optimize spatial requirements of the plot can be found in Auber et al., 2006.

    Layout Requirements

    Since RNA secondary structures can be regarded essentially as rooted ordered trees, it is not surprising that the layouting is related to simple tree layouts. There are, however, serveral specific properties of a good layout
  • the two bases involved in a base pair should have the same distance for all pairs.
  • stacks should linear, with base pairs orthogonal to the backbone
  • all pairs of adjacent base pairs should have the same distance between the pairs.
  • the distance of consecutive base pairs should be approximately the same
  • loops should be drawn convex, and for larger loops at least approximately circular
  • each base should have enough space to display its base
  • stem connected by very short loops should preferrably be drawn colinear, large angles between stems in interior loops should be formed preferentially in large interior loops
  • stacks/co-linearly drawn regions should preferentially be oriented in the major compass directions N E S W NE SE SW NW and if necessary their subdivisions
  • For the "exterior loop", i.e., the one containing the begin and the end of the molecule either a circular or a linear layout should be possible.
  • ... and of course the layout should not overlap anywhere :-)
  • a deterministic layout would be most useful
  • Tasks

  • First we analyse the two (rather simple) layout algorithm to determine how to re-implmement them in such a way as to allow easy modification in the layout. Can we, in fact, find a common discription for them? Identify which parameters are actually required to customize them.
  • Next we deal with the linear version of the exterior loop. Idea: (1) layout the components of the structure first, determine how much space they need (2) decide which ones to draw above/below the backbone (x-axis) so that the backbone has to be "stretched" as little possible.
  • At this point we should have an implementation to play with ...
  • How to modify the layout of individual loops to position the stems in the preferred direction without "overstretching" the loop?
  • After having delt with these purely local constraints, we go after global overlaps. Here ideas from RNAfdl maybe useful
  • Survey the literatur for papers on RNA drawing.
  • A useful extension: layout for a set of structures with a common consensus substructure should be layed out so that the consensus structure has identical coordinates for all individual layouts.
  • Timetable