

Conserved RNA Secondary Structures in
Comoviridae Genomes



Practical Course Protocol

JANA HERTEL, MANUELA LINDEMEYER AND PETER MENZEL



University of Leipzig



Introduction

The family Comoviridae is divided into three genera: Comovirus, Fabavirus and Nepovirus. These plant viruses are simply built: each particle consists of a single isometric capsid with a diameter of 28-30 nm, composed of 32 capsomers.

Figure 1: a) Comovirus, Radish mosaic virus; b) Nepovirus, Tobacco ringspot virus. Pictures from ICTVdB Virus Description [httb].

The segmented genome consists of two single-stranded, linear RNA molecules, named RNA1 and RNA2. The two genome segments are encapsidated separately into different types of particles (RNA1 into B and RNA2 into M particles). In addition, the virions include small particles that contain no nucleic acid. The total genome length is 10000-15000 nt; RNA1 is 6-8 kb long and RNA2 is 4-7.5 kb long. The 5' end of each RNA carries a genome-linked protein (VPg) and the 3' end has a poly(A) tract. See [httb] for further details.

Our aim is to find conserved RNA secondary structures in Comoviridae genomes. These genomes contain protein-coding and non-coding regions. Both can form secondary structures, but the non-coding regions are more likely to contain functionally active secondary structures that are essential for the viral life cycle. Small changes (point mutations) in the nucleotide sequence of a structural element can cause large changes in the secondary structure and a loss of function. A series of random point mutations would therefore quickly destroy a secondary structure if nothing prevented this. If, however, both positions of a base pair are mutated so that the two new bases are again complementary, the pair is preserved. Only these so-called compensatory mutations can change the sequence while preserving a structural element and its function.

To find conserved secondary structures, we looked for compensatory mutations in structural elements which are present in a group of sequences.
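The idea of a compensatory mutation can be illustrated with a small Python sketch. It is not part of the tools used in this protocol; the function names and the toy alignment are invented for illustration, and G-U wobble pairs are assumed to count as valid pairs.

```python
# Canonical Watson-Crick pairs plus the G-U wobble pair (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def classify_pair_column(aligned_seqs, i, j):
    """Summarise alignment columns i and j (0-based) as a candidate base pair."""
    pairing, non_pairing, combos = 0, 0, set()
    for seq in aligned_seqs:
        a, b = seq[i].upper(), seq[j].upper()
        if (a, b) in PAIRS:
            pairing += 1
            combos.add((a, b))
        else:
            non_pairing += 1
    return {"pairing": pairing, "non_pairing": non_pairing, "combos": combos}


def has_compensatory_mutation(aligned_seqs, i, j):
    """True if every sequence can pair at (i, j) and at least two of the
    observed pair types differ at BOTH positions."""
    info = classify_pair_column(aligned_seqs, i, j)
    if info["non_pairing"] > 0:
        return False
    combos = list(info["combos"])
    return any(
        x[0] != y[0] and x[1] != y[1]
        for k, x in enumerate(combos)
        for y in combos[k + 1:]
    )


# Toy alignment, invented for illustration (not Comoviridae data).
toy = [
    "GGGAAAUCC",
    "GCGAAAUGC",
    "GAGAAAUUC",
]

print(has_compensatory_mutation(toy, 1, 7))  # G-C, C-G and A-U all pair
print(has_compensatory_mutation(toy, 0, 8))  # always G-C: conserved, no mutation
```

Columns 1 and 7 of the toy alignment pair in every sequence although both bases change, which is the pattern we looked for; columns 0 and 8 are simply identical in all sequences and carry no such evidence.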

Materials and Methods

Sequences

NCBI's ICTV database currently lists 64 species in the family Comoviridae [httc]: 15 species of Comovirus, 4 of Fabavirus and 34 of Nepovirus. In addition, there are 10 tentative species of Nepovirus and one virus that is unassigned within the family.

Complete genome sequences were available in NCBI's GenBank database for only a few species, which are listed in Tab. 1. To obtain reliable results, we restricted our analysis to these complete sequences.


Table 1: Complete genomic RNA of Comoviridae from GenBank

Genus                  Species                       Abbrev.  Access No. RNA1  Access No. RNA2
Comovirus              Bean pod mottle virus         BPMV     NC_003496.1      NC_003495.1
                       Cowpea mosaic virus           CPMV     NC_003549.1      NC_003550.1
                       Cowpea severe mosaic virus    CPSMV    NC_003545.1      NC_003544.1
                       Red clover mottle virus       RCMV     NC_003741.1      NC_003738.1
                       Squash mosaic virus           SqMV     NC_003799.1      NC_003800.1
Fabavirus              Broad bean wilt virus 1       BBWV-1   -                AF225955.1
                       Broad bean wilt virus 2       BBWV-2   NC_003003.1      NC_003004.1
                       Patchouli mild mosaic virus   PatMMV   NC_003975.1      NC_003074.1
Nepovirus              Grapevine fanleaf virus       GFLV     NC_003615.1      NC_003623.1
                       Grapevine chrome mosaic virus GCMV     NC_003622.1      NC_003621.1
                       Beet ringspot virus           BRSV     NC_003693.1      NC_003694.1
                       Tobacco ringspot virus        TRSV     NC_003840.1      NC_003839.1
                       Tomato black ring virus       TBRV     NC_004439.1      NC_004440.1
                       Cycas necrotic stunt virus    CNSV     NC_003791.1      NC_003792.1
Nepovirus (tentative)  Satsuma dwarf virus           SDV      NC_003785.1      NC_003786.1
unassigned             Apple latent spherical virus  ALSphV   NC_003787.1      NC_003788.1


Multiple Sequence Alignment

The quality of the alignment is very important for finding conserved structures, because small errors in the alignment can lead to different predicted secondary structures or to none at all. Therefore we used both ClustalW 1.82 and Roman Stocsits' code2aln 1.0 to calculate the multiple sequence alignments.

From the multiple alignment we derived the phylogenetic information of the aligned species. To create the phylogenetic tree with SplitsTree, we converted the resulting alignment files into the NEXUS file format with aln2nex.pl.
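The conversion step can be sketched as follows. This is not the aln2nex.pl script used in the course, only a minimal Python illustration of the same kind of conversion. It assumes the usual ClustalW .aln layout (a header line starting with "CLUSTAL", blocks of "name sequence" lines, and conservation lines that start with whitespace); the function names, the toy alignment and the choice of DATATYPE=RNA are assumptions.

```python
def parse_clustal(text):
    """Parse ClustalW .aln text into a dict name -> aligned sequence (input order kept)."""
    seqs = {}
    for line in text.splitlines():
        # Skip blank lines, the header and the conservation line (starts with whitespace).
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        fields = line.split()
        name, chunk = fields[0], fields[1]
        seqs[name] = seqs.get(name, "") + chunk
    return seqs


def to_nexus(seqs, datatype="RNA"):
    """Render aligned sequences as a minimal NEXUS DATA block."""
    nchar = len(next(iter(seqs.values())))
    if any(len(s) != nchar for s in seqs.values()):
        raise ValueError("sequences are not aligned to equal length")
    width = max(len(name) for name in seqs) + 2
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(seqs)} NCHAR={nchar};",
        f"  FORMAT DATATYPE={datatype} MISSING=? GAP=-;",
        "  MATRIX",
    ]
    lines += [f"  {name.ljust(width)}{seq}" for name, seq in seqs.items()]
    lines += ["  ;", "END;"]
    return "\n".join(lines)


# Toy alignment, invented for illustration.
toy_aln = """CLUSTAL W (1.82) multiple sequence alignment

seqA      GGGAAAUCC-
seqB      GCGAAAUGCA
          * *****

seqA      AU
seqB      AU
          **
"""

print(to_nexus(parse_clustal(toy_aln)))
```

Whether SplitsTree accepts exactly this block may depend on its version, so the output should be treated as a starting point rather than a verified input file.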

Secondary Structure Prediction

We used the Vienna RNA Package to predict the secondary structure of each RNA molecule. For each sequence, RNAfold computes the minimum free energy (mfe) structure, the partition function (pf) and the base pairing probability matrix. The mfe structures are given in bracket notation and the base pairing probability matrix is written to a PostScript file (dot plot).
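In bracket notation, each "(" pairs with its matching ")" and each "." is an unpaired base. As a minimal sketch of how such a string can be read (the function name is ours and the example structure is invented, not RNAfold output), a stack is enough to recover the list of base pairs:

```python
def parse_dot_bracket(structure):
    """Return a sorted list of (i, j) base pairs (0-based, i < j) from bracket notation."""
    stack, pairs = [], []
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)  # remember the opening position
        elif char == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {pos}")
            pairs.append((stack.pop(), pos))  # innermost open bracket closes here
        elif char != ".":
            raise ValueError(f"unexpected character {char!r} at position {pos}")
    if stack:
        raise ValueError(f"unbalanced '(' at position {stack[-1]}")
    return sorted(pairs)


print(parse_dot_bracket("((..))"))  # [(0, 5), (1, 4)]
```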

Figure 2: a) Guide tree of all 15 RNA1 b) Guide tree of all 16 RNA2

Figure 3: a) Guide tree of the remaining 11 RNA1 b) Guide tree of the remaining 12 RNA2

Conserved secondary structures

The program alidot uses the secondary structure prediction (base pairing probability matrix) and the multiple sequence alignment to detect conserved secondary structure patterns in the set of RNA sequences.
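The following Python sketch illustrates the general idea of combining per-sequence structures with an alignment. It is a strong simplification and not alidot's actual algorithm: alidot works on base pairing probabilities and also evaluates sequence covariation, whereas this sketch only maps the pairs of one predicted structure per sequence onto alignment columns and counts how many sequences support each column pair. All names and the toy data are assumptions.

```python
from collections import Counter


def column_map(aligned_seq, gap="-"):
    """Map ungapped sequence positions to alignment columns (both 0-based)."""
    return [col for col, char in enumerate(aligned_seq) if char != gap]


def pairs_from_brackets(structure):
    """Base pairs (i, j) of a balanced bracket-notation string."""
    stack, pairs = [], []
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            pairs.append((stack.pop(), pos))
    return pairs


def conserved_pairs(aligned_seqs, structures, min_fraction=1.0):
    """Column pairs that are base-paired in at least min_fraction of the sequences."""
    counts = Counter()
    for aligned, structure in zip(aligned_seqs, structures):
        cols = column_map(aligned)
        if len(cols) != len(structure):
            raise ValueError("structure length does not match ungapped sequence")
        for i, j in pairs_from_brackets(structure):
            counts[(cols[i], cols[j])] += 1
    needed = min_fraction * len(aligned_seqs)
    return sorted(pair for pair, n in counts.items() if n >= needed)


# Toy data, invented for illustration: the second sequence has one gap.
alignment = ["GGGAAAUCC", "GCGA-AUGC"]
structures = ["(((...)))", "(((..)))"]

print(conserved_pairs(alignment, structures))  # [(0, 8), (1, 7), (2, 6)]
```

Working in alignment columns rather than sequence positions is what makes structures of sequences with different lengths comparable.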

Data analysis

alidot produces text output as well as PostScript output. The text output contains the base pairing data sorted by credibility and the conserved secondary structure in bracket notation. The PostScript output contains the dot plot of the predicted secondary structure. We used the Alidot.pl viewer to display alidot's output for further analysis. Alternatively, we used cmount.pl to create mountain plots and RNAplot to create secondary structure drawings of selected consensus sequences, which we obtained with consens.pl from the multiple alignment and alidot's text output. With dpzoom.pl we produced the dot plots of these consensus sequences.
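A mountain plot draws, for every sequence position, how many base pairs enclose it, so that helices appear as slopes and loops as plateaus. The sketch below computes these heights from a structure in bracket notation. It is only an illustration of the representation, not the cmount.pl script, and it uses one of several possible conventions: here both ends of a pair are counted at the height of the pair itself.

```python
def mountain_heights(structure):
    """Height of each position in the mountain representation of a structure."""
    heights, level = [], 0
    for char in structure:
        if char == "(":
            level += 1          # step up when a pair opens
            heights.append(level)
        elif char == ")":
            heights.append(level)
            level -= 1          # step down after the pair closes
        else:
            heights.append(level)  # unpaired base: plateau
    return heights


def ascii_mountain(structure):
    """Very rough text rendering: one row of '#' per position."""
    return "\n".join("#" * h for h in mountain_heights(structure))


print(mountain_heights("((..))"))  # [1, 2, 2, 2, 2, 1]
```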

Figure 4: Comoviridae: a) Mountain plot of RNA1 b) Mountain plot of RNA2

Results

First we created a multiple alignment of all viruses (RNA1 and RNA2) with ClustalW. The phylogenetic trees for RNA1 and RNA2 of all viruses of the family Comoviridae are shown in Fig.2.

Reviewing these trees, we saw that TRSV (NC_003840 and NC_003839) is nearly twice as long as all other sequences, so its alignment could be wrong. SDV (NC_003785 and NC_003786) is only tentatively classified as a Nepovirus and lies too far from the other species of this genus in the tree. ALSphV (NC_003787 and NC_003788) is not assigned to any genus of the family Comoviridae, and GFLV (NC_003615 and NC_003623) is too distant from all other species, although it is listed in the genus Nepovirus; its classification in NCBI's database may be incorrect. We therefore excluded these four viruses and computed another multiple alignment of the remaining sequences with both ClustalW and code2aln.

The new phylogenetic tree is shown in Fig.3.

In these pictures one can easily distinguish between the three genera of Comoviridae.
The next steps were to look for conserved structures found in all genera and then examine each one separately.

Comoviridae

We created the multiple alignment of the selected 11 RNA1 and of the 12 RNA2.

Looking at their dot plots and mountain plots, we found no interesting common structures. Either the ClustalW alignment was not good enough (code2aln failed to align the RNA1 sequences; see footnote 1) or the sequences really have no common structures. We therefore computed the multiple alignments of each genus separately and then tried to find common structures within each genus.

Comovirus

From the mountain plots of Comovirus (see Fig. 5) we selected the interesting peaks and extracted the consensus sequences of these regions (see Tab. 2). All pictures can be found on our webpage.

Fig. 6 shows the multiple alignment of the five Comovirus RNA2 sequences in the range 3300..3340, together with the predicted secondary structure of the alignment's consensus sequence. One can see six conserved base pairs: the first four form a stack, followed by an interior loop; the last two form a second stack, which is closed by a hairpin loop.
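A structure of this kind (a stack of four pairs, an interior loop, a stack of two pairs and a hairpin loop) can be decomposed into its stacks automatically. The sketch below does this for a bracket-notation string. The example string is hypothetical and only has the same overall shape as the element described above; it is not the structure predicted for the Comovirus consensus sequence, and the function name is ours.

```python
def stacks(structure):
    """Group base pairs into uninterrupted stacks.

    Returns a list of (i, j, length): the outermost pair (i, j) of each stack
    (0-based) and the number of directly stacked pairs. Assumes balanced input.
    """
    open_positions, pairs = [], {}
    for pos, char in enumerate(structure):
        if char == "(":
            open_positions.append(pos)
        elif char == ")":
            pairs[open_positions.pop()] = pos
    result = []
    for i in sorted(pairs):
        j = pairs[i]
        if pairs.get(i - 1) == j + 1:
            continue  # (i, j) continues a stack that started further out
        length = 1
        while pairs.get(i + length) == j - length:
            length += 1
        result.append((i, j, length))
    return result


# Hypothetical structure: 4-pair stack, interior loop, 2-pair stack, hairpin loop.
example = "((((..((....))..))))"
print(stacks(example))  # [(0, 19, 4), (6, 13, 2)]
```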

Figure 5: a) Mountain plot of Comovirus RNA1 b) Mountain plot of Comovirus RNA2

Table 2: Interesting consensus sequences of Comovirus

Alignment  RNA1        RNA2
ClustalW   1820..1855  2600..2640
           3050..3105  3300..3340
           4650..4695  -
code2aln   1795..1825  1865..1900
           3020..3080  -
           3425..3470  -
           4610..4660  -

Figure 6: Secondary structure and alignment of consensus sequence 3300..3340 of Comovirus RNA2.

Fabavirus

From the mountain plots of Fabavirus (see Fig. 7) we extracted the consensus sequences listed in Tab. 3. Fig. 9 shows the multiple alignment and the secondary structure in the range 7505..7550 of Nepovirus RNA1; this structure consists of a single stem-loop with 14 base pairs.

Figure 7: a) Mountain plot of Fabavirus RNA1 b) Mountain plot of Fabavirus RNA2

Table 3: Interesting consensus sequences of Fabavirus

Alignment  RNA1        RNA2
ClustalW   2035..2110  245..285
           2495..2595  1650..1700
code2aln   -           240..275
           -           3410..3460

Nepovirus

In the consensus sequence of Nepovirus RNA2 ranging from 1190 to 1285, we found a secondary structure that forms a long stem-loop containing an interior loop, with many conserved base pairs (Fig. 10).

Figure 8: a) Mountain plot of Nepovirus RNA1 b) Mountain plot of Nepovirus RNA2

Table 4: Interesting consensus sequences of Nepovirus

Alignment  RNA1        RNA2
ClustalW   545..585    845..885
           2335..2410  1250..1310
           4680..4805  2690..2770
           5580..5745  3260..3350
           5815..5955  -
           7505..7550  -
code2aln   200..235    1190..1285
           2310..2380  2115..2195
           -           3215..3295
           4430..4490  2935..2990
           4675..4755  4630..4680
           5800..5925  4740..4775

Figure 9: Secondary structure and ClustalW alignment of consensus sequence 7505..7550 of Nepovirus RNA1

Figure 10: Secondary structure and code2aln alignment of consensus sequence 1190..1285 of Nepovirus RNA2

Discussion

We found no common structures occurring in all three genera. Looking at each genus separately, we found a few consensus sequences with conserved elements. However, it is uncertain whether these conserved elements occur in all species of the respective genus, since we only had 4 sequences in Nepovirus, 5 in Comovirus and 2 in Fabavirus. code2aln and ClustalW produced nearly the same alignments, and the secondary structures predicted for the consensus sequences showed no differences.
If more complete genome sequences become available, our results could be verified or rejected by checking whether the structures we found are also present in the new sequences.

Bibliography

HFS
Ivo L. Hofacker, Martin Fekete, and Peter F. Stadler.
Secondary Structure Prediction for Aligned RNA Sequences.

htta
http://www.ncbi.nlm.nih.gov/.
National Center for Biotechnology Information.

httb
http://www.ncbi.nlm.nih.gov/ICTVdb/ICTVdB/18000000.htm.
ICTVdB Virus Description - Comoviridae.

httc
http://www.ncbi.nlm.nih.gov/ICTVdb/Ictv/fs_comov.htm.
ICTVdB Index of Viruses - Comoviridae.

Sch98
Gottfried Schuster.
Viren in der Umwelt.
B.G.Teubner, Stuttgart, Leipzig, 1 edition, 1998.

SHS
Roman Stocsits, Ivo L. Hofacker, and Peter F. Stadler.
Conserved Secondary Structure in Hepatitis B Virus RNA.

WRHS
Christina Witwer, Susanne Rauscher, Ivo L. Hofacker, and Peter F. Stadler.
Conserved Secondary Structure in Picornaviridae Genomes.



Footnotes

1. This was due to a bug in code2aln version 1.0; Roman Stocsits fixed it in version 1.1.

Praktikum 2003-05-07