Conserved RNA Secondary Structures in
Comoviridae Genomes

Practical Course Protocol

JANA HERTEL, MANUELA LINDEMEYER AND PETER MENZEL

University of Leipzig

Introduction

The family Comoviridae is divided into three genera: Comovirus, Fabavirus and Nepovirus. These plant viruses have a simple structure: each particle consists of a single isometric capsid with a diameter of 28-30 nm, built from 32 capsomers.

**Figure 1:** a) *Comovirus*, *Radish mosaic virus*, b) *Nepovirus*, *Tobacco ringspot virus*; Pictures from ICTVdB Virus Description, [httb].

The segmented genome consists of two linear, single-stranded RNA molecules, named RNA1 and RNA2. The two genome segments are encapsidated separately into different types of particles (RNA1 into B and RNA2 into M particles). Additionally, there are particles containing no nucleic acid. The total genome length is 10000-15000 nt, with RNA1 comprising 6-8 kb and RNA2 4-7.5 kb. The 5' end of each RNA carries a genome-linked protein (VPg) and the 3' end has a poly(A) tract. See [httb] for further details.

Our aim is to find conserved RNA secondary structures in Comoviridae genomes. These genomes contain protein-coding and non-coding regions. Both can form secondary structures, but the non-coding regions are more likely to contain functional secondary structures that are essential for the viral life cycle. Small changes (point mutations) in the nucleotide sequence of a structural element can cause large changes in the secondary structure and thus loss of function. A series of random point mutations would quickly destroy a secondary structure if selection did not act to preserve it. If both positions of a base pair mutate so that the two bases can still pair (for example G-C to A-U), the base pair is maintained. Such so-called compensatory mutations preserve a structural element and its function while the sequence changes, and they are therefore evidence for a conserved structure.

To find conserved secondary structures, we looked for compensatory mutations in structural elements which are present in a group of sequences.
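The idea of a compensatory mutation can be illustrated with a minimal Python sketch. It inspects two alignment columns and reports whether the sequences pair there with different base combinations. The function name, the toy sequences and the set of allowed pairs (Watson-Crick plus the G-U wobble pair) are assumptions made for this illustration; this is not the procedure implemented in the programs used in this protocol.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def classify_columns(aligned_seqs, i, j):
    """Inspect alignment columns i and j (0-based) as a candidate base pair.

    Returns the number of sequences that can pair at (i, j), the set of
    distinct pairing combinations, and whether two combinations differ at
    both positions, i.e. whether a compensatory mutation is observed.
    """
    combos = set()
    pairing = 0
    for seq in aligned_seqs:
        a, b = seq[i].upper(), seq[j].upper()
        if (a, b) in PAIRS:
            pairing += 1
            combos.add((a, b))
    compensatory = any(
        x[0] != y[0] and x[1] != y[1] for x in combos for y in combos
    )
    return {"pairing": pairing, "combos": combos, "compensatory": compensatory}


# Hypothetical toy alignment: columns 0 and 8 pair as G-C, A-U and C-G.
toy = ["GCAUUCGGC", "ACAUUCGGU", "CCAUUCGGG"]
result = classify_columns(toy, 0, 8)
```

In the toy alignment all three sequences can pair at the two columns, and the pair changes at both positions between sequences, so `result["compensatory"]` is true. A change at only one position (G-C to G-U) would not count as compensatory in this sketch.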

Materials and Methods

Sequences

NCBI's ICTV database currently lists 64 species in the family Comoviridae [httc]: 15 species of Comovirus, 4 of Fabavirus and 34 of Nepovirus, plus 10 tentative species of Nepovirus and one virus that is unassigned within the family.

From NCBI's GenBank database we obtained complete genome sequences for only a few species, which are listed in Tab. 1. To get reliable results, we restricted the analysis to these complete sequences.

Table 1: Complete genomic RNA of Comoviridae from GenBank

Genus	Species	Abbrev.	Accession No. RNA1/RNA2
Comovirus	Bean pod mottle virus	BPMV	NC_003496.1/ NC_003495.1
	Cowpea mosaic virus	CPMV	NC_003549.1/ NC_003550.1
	Cowpea severe mosaic virus	CPSMV	NC_003545.1/ NC_003544.1
	Red clover mottle virus	RCMV	NC_003741.1/ NC_003738.1
	Squash mosaic virus	SqMV	NC_003799.1/ NC_003800.1
Fabavirus	Broad bean wilt virus 1	BBWV-1	/ AF225955.1
	Broad bean wilt virus 2	BBWV-2	NC_003003.1/ NC_003004.1
	Patchouli mild mosaic virus	PatMMV	NC_003975.1/ NC_003074.1
Nepovirus	Grapevine fanleaf virus	GFLV	NC_003615.1/ NC_003623.1
	Grapevine chrome mosaic virus	GCMV	NC_003622.1/ NC_003621.1
	Beet ringspot virus	BRSV	NC_003693.1/ NC_003694.1
	Tobacco ringspot virus	TRSV	NC_003840.1/ NC_003839.1
	Tomato black ring virus	TBRV	NC_004439.1/ NC_004440.1
	Cycas necrotic stunt virus	CNSV	NC_003791.1/ NC_003792.1
Nepovirus (tentative)	Satsuma dwarf virus	SDV	NC_003785.1/ NC_003786.1
unassigned	Apple latent spherical virus	ALSphV	NC_003787.1/ NC_003788.1

Multiple Sequence Alignment

The quality of the alignment is very important for finding conserved structures, because small alignment errors can lead to different predicted secondary structures or to none at all. Therefore we used both ClustalW 1.82 and Roman Stocsits' code2aln 1.0 to calculate the multiple sequence alignments.

From the multiple alignment we derived the phylogenetic relationships of the aligned species. To create the phylogenetic tree with Splitstree, we converted the resulting alignment files into the NEXUS file format with aln2nex.pl.
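The conversion step can be sketched in a few lines of Python. This is not the aln2nex.pl script itself, whose details are not given here, but a minimal stand-in under stated assumptions: the input is a standard ClustalW `.aln` file, the output is a plain NEXUS DATA block, and the function names and the toy alignment are hypothetical. Whether a particular Splitstree version accepts exactly this layout has not been checked.

```python
def parse_clustal(text):
    """Parse ClustalW .aln text into an ordered {name: aligned_sequence} dict."""
    seqs = {}
    for line in text.splitlines():
        # Skip blank lines, the header, and conservation lines (start with a space).
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        fields = line.split()
        name, chunk = fields[0], fields[1]  # an optional residue count may follow
        seqs[name] = seqs.get(name, "") + chunk
    return seqs


def to_nexus(seqs):
    """Render aligned sequences as a minimal NEXUS DATA block."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("sequences are not aligned to the same length")
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(seqs)} NCHAR={lengths.pop()};",
        "  FORMAT DATATYPE=RNA MISSING=? GAP=-;",
        "  MATRIX",
    ]
    width = max(len(name) for name in seqs)
    lines += [f"  {name.ljust(width)}  {seq}" for name, seq in seqs.items()]
    lines += ["  ;", "END;"]
    return "\n".join(lines)


# Hypothetical two-block toy alignment in ClustalW layout.
example = """CLUSTAL W (1.82) multiple sequence alignment

seqA      GGGAAACCC
seqB      GGGA-ACCC
          **** ****

seqA      AUGC
seqB      AUGG
          ***
"""
nexus = to_nexus(parse_clustal(example))
```

The parser concatenates the blocks of each sequence in file order, so interleaved ClustalW output ends up as one row per sequence in the NEXUS matrix.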

Secondary Structure Prediction

We used the Vienna RNA Package to predict the secondary structure of each RNA molecule. For each sequence, RNAfold computes the minimum free energy (mfe) structure, the partition function (pf) and the base pairing probability matrix. The mfe structure is given in bracket notation, and the base pairing probability matrix is written to a PostScript file (dot plot).
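The bracket notation is easy to process further. As a minimal sketch, the following Python function converts a bracket string into a list of base pairs; the function name and the example structure are made up for illustration and are not output of RNAfold.

```python
def parse_dot_bracket(structure):
    """Return the base pairs (i, j), 1-based with i < j, of a bracket string."""
    stack, pairs = [], []
    for pos, char in enumerate(structure, start=1):
        if char == "(":
            stack.append(pos)  # remember the opening position
        elif char == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {pos}")
            pairs.append((stack.pop(), pos))  # closes the most recent '('
        elif char != ".":
            raise ValueError(f"unexpected character {char!r}")
    if stack:
        raise ValueError(f"unbalanced '(' at position {stack[-1]}")
    return sorted(pairs)


# Hypothetical structure: two stacked pairs, an interior loop, two more pairs, a hairpin.
pairs = parse_dot_bracket("((..((...))..))")
```

Each `(` is matched with the nearest unmatched `)` to its right, which is why a simple stack suffices for structures without pseudoknots.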

**Figure 2:** a) Guide tree of all 15 RNA1 sequences b) Guide tree of all 16 RNA2 sequences

**Figure 3:** a) Guide tree of the remaining 11 RNA1 sequences b) Guide tree of the remaining 12 RNA2 sequences

Conserved secondary structures

The program alidot uses the secondary structure prediction (base pairing probability matrix) and the multiple sequence alignment to detect conserved secondary structure patterns in the set of RNA sequences.
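The principle of combining single-sequence predictions with an alignment can be sketched as follows. This is a strongly simplified illustration and not alidot's algorithm: it uses plain base pair lists instead of pairing probabilities, and it simply keeps the pairs that fall on the same alignment columns in a chosen fraction of the sequences. All names, the threshold and the toy data are assumptions.

```python
from collections import Counter


def column_map(aligned_seq):
    """Map 1-based sequence positions to 1-based alignment columns (gap is '-')."""
    mapping, pos = {}, 0
    for col, char in enumerate(aligned_seq, start=1):
        if char != "-":
            pos += 1
            mapping[pos] = col
    return mapping


def conserved_pairs(aligned_seqs, pair_lists, min_fraction=1.0):
    """Return column pairs predicted in at least min_fraction of the sequences.

    aligned_seqs: gapped sequences of equal length.
    pair_lists:   one list of (i, j) base pairs per sequence, ungapped coordinates.
    """
    counts = Counter()
    for seq, pairs in zip(aligned_seqs, pair_lists):
        cols = column_map(seq)
        # Translate each pair to alignment columns; count it once per sequence.
        counts.update({(cols[i], cols[j]) for i, j in pairs})
    needed = min_fraction * len(aligned_seqs)
    return sorted(p for p, n in counts.items() if n >= needed)


# Hypothetical toy data: three aligned sequences with their predicted pairs.
aln = ["GGGAAA-CCC", "GGGAAAACCC", "GGCAAAAGCC"]
predicted = [
    [(1, 9), (2, 8), (3, 7)],   # ungapped length 9
    [(1, 10), (2, 9), (3, 8)],
    [(1, 10), (2, 9)],
]
strict = conserved_pairs(aln, predicted)
relaxed = conserved_pairs(aln, predicted, min_fraction=0.6)
```

With the default threshold only the two outer pairs survive, because the third sequence does not form the innermost pair; lowering the threshold to 0.6 also keeps the pair found in two of three sequences.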

Data analysis

alidot produces text output as well as PostScript output. The text output contains base pairing data sorted by credibility and the conserved secondary structure in bracket notation. The PostScript output contains the dot plot of the predicted secondary structure. We used the Alidot.pl viewer to display alidot's output for further analysis. Alternatively, we used cmount.pl to create mountain plots and RNAplot to create secondary structure drawings of selected consensus sequences, which were obtained with consens.pl from the multiple alignment and alidot's text output. With dpzoom.pl we produced the dot plots of these consensus sequences.
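A mountain plot shows, for each sequence position, how deeply it is nested in the structure. For a single structure in bracket notation the heights can be computed with a running counter, as in the following minimal Python sketch. The convention used here (a paired position counts its own pair) and the function name are assumptions; cmount.pl works on alignment data and may use a different convention.

```python
def mountain(structure):
    """Return one height per position of a bracket string.

    The height is the number of base pairs (i, j) with i <= k <= j,
    so a paired position is counted as lying inside its own pair.
    """
    heights, level = [], 0
    for char in structure:
        if char == "(":
            level += 1              # entering a pair: step up first
            heights.append(level)
        elif char == ")":
            heights.append(level)   # still inside the pair being closed
            level -= 1              # then step down
        else:
            heights.append(level)   # unpaired: plateau
    return heights


heights = mountain("((..))")
```

Stems appear as slopes, loops as plateaus and hairpins as peaks, which is why peaks in a mountain plot point to candidate structural elements.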

**Figure 4:** *Comoviridae*: a) Mountain plot of RNA1 b) Mountain plot of RNA2

Results

First we created multiple alignments of the RNA1 and RNA2 sequences of all viruses with ClustalW. The phylogenetic trees for RNA1 and RNA2 of all viruses of the family Comoviridae are shown in Fig. 2.

Reviewing these trees, we saw that the TRSV sequences (NC_003840 & NC_003839) are nearly twice as long as all the other sequences, so their alignment could be wrong. SDV (NC_003785 & NC_003786) is tentatively classified as a Nepovirus, but it is too distant in the tree from the other species of this genus. ALSphV (NC_003787 & NC_003788) is not assigned to any genus of the family Comoviridae, and GFLV (NC_003615 & NC_003623) is too distant from all other species, although it is listed in the genus Nepovirus; the classification in NCBI's database may be wrong. Therefore we decided to exclude these four viruses and to compute new multiple alignments of the remaining sequences with both ClustalW and code2aln.
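The number of sequences before and after this exclusion can be checked with a short Python sketch. The genus assignments are transcribed from Tab. 1; the variable and function names are made up for this illustration.

```python
# Genus per abbreviation, transcribed from Table 1.
GENUS = {
    "BPMV": "Comovirus", "CPMV": "Comovirus", "CPSMV": "Comovirus",
    "RCMV": "Comovirus", "SqMV": "Comovirus",
    "BBWV-1": "Fabavirus", "BBWV-2": "Fabavirus", "PatMMV": "Fabavirus",
    "GFLV": "Nepovirus", "GCMV": "Nepovirus", "BRSV": "Nepovirus",
    "TRSV": "Nepovirus", "TBRV": "Nepovirus", "CNSV": "Nepovirus",
    "SDV": "Nepovirus (tentative)", "ALSphV": "unassigned",
}
NO_RNA1 = {"BBWV-1"}  # Table 1 lists only an RNA2 accession for BBWV-1
EXCLUDED = {"TRSV", "SDV", "ALSphV", "GFLV"}


def count_segments(viruses):
    """Return (number of RNA1, number of RNA2) sequences for the given viruses."""
    viruses = list(viruses)
    rna1 = sum(1 for v in viruses if v not in NO_RNA1)
    return rna1, len(viruses)


remaining = [v for v in GENUS if v not in EXCLUDED]
all_counts = count_segments(GENUS)
remaining_counts = count_segments(remaining)
```

This gives 15 RNA1 and 16 RNA2 sequences for the full set and 11 RNA1 and 12 RNA2 sequences after the exclusion, in agreement with the captions of Fig. 2 and Fig. 3.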

The new phylogenetic trees are shown in Fig. 3.

In these pictures one can easily distinguish the three genera of Comoviridae. The next steps were to look for conserved structures found in all genera and then to examine each genus separately.

Comoviridae

We created multiple alignments of the selected 11 RNA1 and 12 RNA2 sequences.

Looking at their dot plots and mountain plots (Fig. 4), we found no interesting common structures. Either the ClustalW alignment was not good enough (code2aln failed to align the RNA1 sequences¹) or the sequences really have no common structures. We therefore computed the multiple alignments for each genus separately and looked for common structures within each genus.

Comovirus

In the mountain plots of Comovirus (see Fig. 5) we looked at the interesting peaks and at the consensus sequences of these regions (see Tab. 2). All pictures can be found on our webpage.

Fig. 6 shows the multiple alignment of the five Comovirus RNA2 sequences in the range 3300 to 3340, together with the predicted secondary structure of the alignment's consensus sequence. One can see six conserved base pairs: the first four form a stack, followed by an interior loop, and the last two form a second stack that closes a hairpin loop.


**Figure 5:** a) Mountain plot of *Comovirus* RNA1 b) Mountain plot of *Comovirus* RNA2

Table 2: Interesting consensus sequences of Comovirus

Alignment	RNA1	RNA2
`ClustalW`	1820..1855	2600..2640
	3050..3105	3300..3340
	4650..4695
`code2aln`	1795..1825	1865..1900
	3020..3080
	3425..3470
	4610..4660

**Figure 6:** Secondary structure and alignment of consensus sequence 3300..3340 of *Comovirus* RNA2

Fabavirus

From the mountain plots of Fabavirus (see Fig. 7) we extracted the consensus sequences listed in Tab. 3. Fig. 9 shows the multiple alignment and the secondary structure in the range 7505..7550. The secondary structure consists of a single stem-loop with 14 base pairs.


**Figure 7:** a) Mountain plot of *Fabavirus* RNA1 b) Mountain plot of *Fabavirus* RNA2

Table 3: Interesting consensus sequences of Fabavirus

Alignment	RNA1	RNA2
`ClustalW`	2035..2110	245..285
	2495..2595	1650..1700
`code2aln`		240..275
		3410..3460

Nepovirus

In the consensus sequence ranging from 1190 to 1285 we found a secondary structure that forms a long stem-loop with an interior loop and many conserved base pairs (Fig. 10).


**Figure 8:** a) Mountain plot of *Nepovirus* RNA1 b) Mountain plot of *Nepovirus* RNA2

Table 4: Interesting consensus sequences of Nepovirus

Alignment	RNA1	RNA2
`ClustalW`	545..585	845..885
	2335..2410	1250..1310
	4680..4805	2690..2770
	5580..5745	3260..3350
	5815..5955
	7505..7550
`code2aln`	200..235	1190..1285
	2310..2380	2115..2195
		3215..3295
	4430..4490	2935..2990
	4675..4755	4630..4680
	5800..5925	4740..4775

**Figure 9:** Secondary structure and alignment of consensus sequence 7505..7550 of *Nepovirus* RNA1

**Figure 10:** Secondary structure and `code2aln` alignment of consensus sequence 1190..1285 of *Nepovirus* RNA2

Discussion

We found no common structures occurring in all three genera. Examining each genus separately, we found a few consensus sequences with conserved elements. However, it is not certain that these conserved elements occur in all species of the respective genus, since we only had 4 sequences in Nepovirus, 5 in Comovirus and 2 in Fabavirus. code2aln and ClustalW produced nearly the same alignments and showed no differences in the predicted secondary structures of the consensus sequences.
If more complete genome sequences become available, the results could be verified or discarded by checking whether the structures found here are also present in the new sequences.

Bibliography

HFS: Ivo L. Hofacker, Martin Fekete, and Peter F. Stadler.
Secondary Structure Prediction for Aligned RNA Sequences.
htta: http://www.ncbi.nlm.nih.gov/.
National Center for Biotechnology Information.
httb: http://www.ncbi.nlm.nih.gov/ICTVdb/ICTVdB/18000000.htm.
ICTVdB Virus Description - Comoviridae.
httc: http://www.ncbi.nlm.nih.gov/ICTVdb/Ictv/fs_comov.htm.
ICTVdB Index of Viruses - Comoviridae.
Sch98: Gottfried Schuster.
Viren in der Umwelt.
B. G. Teubner, Stuttgart, Leipzig, 1st edition, 1998.
SHS: Roman Stocsits, Ivo L. Hofacker, and Peter F. Stadler.
Conserved Secondary Structure in Hepatitis B Virus RNA.
WRHS: Christina Witwer, Susanne Rauscher, Ivo L. Hofacker, and Peter F. Stadler.
Conserved Secondary Structure in Picornaviridae Genomes.

Footnotes

¹ This was due to a bug in code2aln version 1.0, which Roman Stocsits fixed in version 1.1.

Praktikum 2003-05-07
