JANA HERTEL, MANUELA LINDEMEYER AND PETER MENZEL
University of Leipzig
The segmented genome consists of two single-stranded, linear RNA molecules, named RNA1 and RNA2. The two genome segments are encapsidated separately into different types of particles (RNA1 into B and RNA2 into M particles). In addition, the virions include small particles containing no nucleic acid. The total genome length is 10000-15000 nt, of which RNA1 accounts for 6-8 kb and RNA2 for 4-7.5 kb. The 5' end of the genome carries a genome-linked protein (VPg) and the 3' end has a poly(A) tract. See [httb] for further details.
Our aim is to find conserved RNA secondary structures in Comoviridae genomes. These genomes contain protein-coding and non-coding regions. Both can form secondary structures, but the non-coding regions are more likely to contain functionally active secondary structures that are essential for the viral life cycle. Small changes (point mutations) in the nucleotide sequence of a structural element can cause large changes in the secondary structure and a loss of function. A series of random point mutations would quickly destroy a secondary structure if there were no mechanism to prevent this. If both positions of a base pair mutate so that the new bases can still pair (for example G-C to A-U), the pair is maintained. Therefore only these so-called compensatory mutations can preserve a structural element and its function while the sequence changes.
To find conserved secondary structures, we looked for compensatory mutations in structural elements which are present in a group of sequences.
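The search for compensatory mutations can be illustrated with a minimal Python sketch. It inspects two alignment columns that are assumed to pair and reports which pair types occur and whether any two of them differ at both positions. The function names and the toy alignment are hypothetical and chosen for illustration only; this is not the procedure implemented in alidot.

```python
# Watson-Crick and G-U wobble pairs that an RNA helix can accommodate.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def classify_column_pair(alignment, i, j):
    """Inspect alignment columns i and j (0-based) as a candidate base pair.

    Returns (pair_types, non_pairing): the set of distinct pairing
    combinations observed, and the number of sequences whose two
    bases (or gaps) cannot pair.
    """
    pair_types = set()
    non_pairing = 0
    for seq in alignment:
        a, b = seq[i].upper(), seq[j].upper()
        if (a, b) in PAIRS:
            pair_types.add((a, b))
        else:
            non_pairing += 1
    return pair_types, non_pairing

def is_compensatory(pair_types):
    """True if at least two observed pair types differ at both positions."""
    types = list(pair_types)
    return any(
        p[0] != q[0] and p[1] != q[1]
        for k, p in enumerate(types)
        for q in types[k + 1:]
    )

# Toy alignment (made up): columns 0 and 8 close a small hairpin.
toy = ["GGCAAAGCC", "AGCAAAGCU", "CGCAAAGCG"]
types, bad = classify_column_pair(toy, 0, 8)
print(sorted(types), bad, is_compensatory(types))
```

A change such as G-C to G-U alters only one position and is therefore not counted as compensatory by `is_compensatory`, although it also keeps the pair intact.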
From NCBI's GenBank database we obtained complete genome sequences for only a few species, which are listed in Tab. 1. To get reliable results, we restricted the analysis to these complete sequences.
The quality of the alignment is very important for finding conserved structures, because small errors in the alignment can lead to different predicted secondary structures or to none at all. We therefore used both ClustalW 1.82 and Roman Stocsits' code2aln 1.0 to calculate the multiple sequence alignments.
From the multiple alignment we derived the phylogenetic relationships of the aligned species. To create the phylogenetic tree with SplitsTree, we converted the resulting alignment files into the NEXUS file format with aln2nex.pl.
We used the Vienna RNA Package to predict the secondary structure of each RNA molecule. For each sequence, RNAfold computes the minimum free energy (mfe) structure, the partition function (pf) and the base pairing probability matrix. The mfe structures are given in bracket notation, and the base pairing probability matrix is written to a PostScript file (dot plot).
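The conversion step can be pictured with the following Python sketch, which writes an already parsed alignment as a NEXUS DATA block. It is an assumption about what such a conversion involves, not a description of aln2nex.pl; the function name `to_nexus` is hypothetical, and the exact block layout that SplitsTree expects has not been checked here.

```python
def to_nexus(alignment):
    """Render {name: aligned_sequence} as a NEXUS DATA block (string).

    Assumes every sequence has the same aligned length and that
    '-' marks gaps. Names must not contain whitespace.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    nchar = lengths.pop()
    width = max(len(name) for name in alignment)
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(alignment)} NCHAR={nchar};",
        "  FORMAT DATATYPE=RNA MISSING=? GAP=-;",
        "  MATRIX",
    ]
    for name, seq in alignment.items():
        # Pad names so that the sequence columns line up.
        lines.append(f"  {name.ljust(width)}  {seq}")
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"

# Made-up two-sequence alignment for illustration.
nexus_text = to_nexus({"a": "AC-G", "bb": "ACUG"})
print(nexus_text)
```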
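Bracket notation encodes each base pair as a matching pair of parentheses and each unpaired base as a dot. As a minimal sketch of how such a string can be read back into base pairs, the hypothetical helper `pair_table` below uses a stack; it is independent of the Vienna RNA Package and handles only the plain `(`, `)` and `.` symbols.

```python
def pair_table(structure):
    """Map every paired position to its partner (0-based indices).

    Unpaired positions do not appear in the returned dict.
    """
    stack, pairs = [], {}
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            opener = stack.pop()
            pairs[opener] = pos
            pairs[pos] = opener
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs

# A hairpin with a two-base-pair stem and a three-base loop.
print(pair_table("((...))"))
```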
The program alidot uses the secondary structure prediction (base pairing probability matrix) and the multiple sequence alignment to detect conserved secondary structure patterns in the set of RNA sequences.
We used the Alidot.pl viewer to display alidot's output for further analysis. Alternatively, we used cmount.pl to create mountain plots and RNAplot to create secondary structure drawings of particular consensus sequences, which we obtained with consens.pl from the multiple alignment and alidot's text output. With dpzoom.pl we produced the dot plots of these consensus sequences.
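In a mountain plot, the height at each sequence position reflects how many base pairs enclose it, so helices appear as slopes and loops as plateaus. The Python sketch below computes such heights from a single bracket-notation string. It illustrates the classical mountain representation only; the inputs and scaling actually used by cmount.pl are not reproduced, and the function name is hypothetical. In the convention chosen here, a paired base counts its own pair.

```python
def mountain_heights(structure):
    """Return one height per position of a dot-bracket string.

    '(' steps up, ')' steps down after being recorded, '.' stays level.
    """
    heights, level = [], 0
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            level += 1
            heights.append(level)
        elif symbol == ")":
            if level == 0:
                raise ValueError(f"unmatched ')' at position {pos}")
            heights.append(level)
            level -= 1
        elif symbol == ".":
            heights.append(level)
        else:
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if level != 0:
        raise ValueError("unmatched '(' in structure")
    return heights

# Two small hairpins separated by one unpaired base give two peaks.
print(mountain_heights("(.).(.)"))
```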
Reviewing these pictures, we saw that TRSV (NC_003840 & NC_003839) is nearly twice as long as all the other sequences, so its alignment could be wrong. SDV (NC_003785 & NC_003786) is tentatively classified as a Nepovirus, but it is too distant in the tree from the other species of this genus. ALSphV (NC_003787 & NC_003788) is not classified in the family Comoviridae, and GFLV (NC_003615 & NC_003623) is too distant from all other species although it is listed in the Nepovirus genus. Possibly the classification in NCBI's database is wrong.
Therefore we decided to exclude these four viruses and to compute a new multiple alignment of the remaining sequences with both ClustalW and code2aln.
The new phylogenetic tree is shown in Fig. 3.
In these pictures one can easily distinguish between the three genera of Comoviridae.
The next steps were to look for conserved structures found in all genera and then examine each one separately.
Looking at their dot plots and mountain plots, we found no interesting common structures. Either the ClustalW alignment was not good enough (code2aln failed to align the RNA1 sequences) or the sequences really have no common structures. We therefore computed a multiple alignment for each genus separately and then searched for common structures within each genus.
Fig. 6 shows the multiple alignment of the five Comovirus RNA2 sequences in the region from position 3300 to 3340 and, beneath it, the predicted secondary structure of the alignment's consensus sequence. One can see six conserved base pairs: the first four form a stack, followed by an interior loop, and the last two form a second stack, which is closed by a hairpin loop.
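A description of this kind (stack, interior loop, stack, hairpin loop) can be read off a bracket-notation string mechanically. The Python sketch below groups base pairs into runs of directly stacked pairs. The example structure is made up to have the same shape as the element described above (a stack of four pairs, an interior loop, a stack of two pairs, a hairpin loop); it is not the predicted Comovirus structure, and the function name is hypothetical. The input is assumed to be a well-formed string of `(`, `)` and `.`.

```python
def stacked_runs(structure):
    """Group base pairs (i, j) into runs of consecutively stacked pairs.

    Two pairs are stacked when (i, j) is directly enclosed by (i-1, j+1).
    Returns a list of runs, outermost pair first within each run.
    """
    openers, partner = [], {}
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            openers.append(pos)
        elif symbol == ")":
            partner[openers.pop()] = pos
    runs = []
    for i in sorted(partner):
        j = partner[i]
        if runs and runs[-1][-1] == (i - 1, j + 1):
            runs[-1].append((i, j))   # extends the current stack
        else:
            runs.append([(i, j)])     # a loop interrupts the stack
    return runs

# Made-up example: 4-pair stack, interior loop, 2-pair stack, hairpin loop.
example = "((((..((....)).))))"
for run in stacked_runs(example):
    print(len(run), run)
```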
From the mountain plots of Fabavirus (see Fig. 7) we extracted the consensus sequences listed in Tab. 3. Fig. 9 shows the multiple alignment and the secondary structure in the range 7505-7550. The secondary structure consists of a single stem-loop with a stem of 14 base pairs.
RNA1 | RNA2 | |
ClustalW | 2035..2110 | 245..285 |
2495..2595 | 1650..1700 | |
code2aln | 240..275 | |
3410..3460 |
In the consensus sequence ranging from 1190 to 1285 we found a secondary structure that forms a long stem-loop containing an interior loop, with many conserved base pairs (Fig. 10).
This document was generated using the LaTeX2HTML translator Version 2002 (1.62)
The translation was initiated by Praktikum on 2003-05-07