Please find below supplemental material corresponding to publications of our group. Currently, we list 133 supplements.
If you have problems accessing electronic information, please let us know:
Machine-readable information about the fungal organisms, such as genome sources and 3-letter abbreviations, is given here: Genome Information
S2: Experimentally detected snoRNAs
The analysis of fungal snoRNAs was mainly based on five surveys
that report experimentally detected snoRNAs from different organisms,
namely N.crassa (Liu et al.), A.fumigatus (Jöchl et al.), C.albicans (Mitrovich et al.), S.cerevisiae (Piekna-Przybylska et al.), and
S.pombe (Li et al.). An overview of the retrieved snoRNAs and the
corresponding publications is given in the table below.
Although the survey by Jöchl et al.
originally covered box C/D snoRNAs only, sequence AM921943 was treated as a
box H/ACA snoRNA instead, since this sequence shows two separate,
perfect hairpins and convincing box motifs while it clearly
lacks the characteristics of box C/D snoRNAs. Furthermore, the sequences
AM921919 and AM921934 are treated as the same snoRNA in this work,
which reduces the number of box C/D snoRNAs used from this
publication to 25. Both sequences map to exactly the same genomic location,
although AM921934 contains three point mutations with respect to
AM921919.
All snoRNA sets were taken from their corresponding publications,
except for the budding yeast sequences, which were downloaded from the
UMass database.
organism
box C/D snoRNAs
box H/ACA snoRNAs
publication
S3: Mapping of experimentally detected snoRNAs
The following tables display the mapping of previously experimentally verified snoRNAs of the five fungi S.pombe, S.cerevisiae, C.albicans, A.fumigatus, and N.crassa as automatically detected by the snoStrip pipeline.
Mapping of experimentally detected box C/D snoRNAs [html] [csv]
Mapping of experimentally detected box H/ACA snoRNAs [html] [csv]
S4: Target RNA alignments
Table S4. The table shows the number of target RNA sequences that were gathered for
fungal organisms. Numbers in 'target alignment' denote the number of target sequences in the
respective alignment, since not every individual target sequence was used in all cases.
In this section we provide family-specific snoRNA sequences and alignments (in clustal or stockholm format). You can either download all box C/D or box H/ACA families at once (in the list below) or download the sequence information of a respective family of your choice (listed in the tables below).
Panels: Box C, Box D, Box C', Box D', Box H, Box ACA
Figure S6.1 Sequence logos of snoRNA-specific
box motifs. Box motifs were extracted from
all snoStrip-annotated box C/D snoRNAs
(5593 sequences) and box H/ACA snoRNAs
(2331 sequences). Pictures were generated
with WebLogo.
B) Sequence Lengths and Distances between Boxes
Both major snoRNA classes, box C/D and box H/ACA, are clearly distinguishable by their distinct sequence lengths. In accordance with the published canonical length distribution, 90% of the novel snoStrip-annotated box C/D snoRNAs are 80nts to 135nts long. The median length is 93nts, see Figure S6.2 (C/D snoRNAs). Family CD_53 is the only exception, since its members have lengths between 200 and 300nts. Crucial features are the distances between box C and the potential box D' as well as between box C' and box D, since these stretches harbor the target binding sites and hence need to be sufficiently long. For box C/D' distances, the minimal gap is 11nts while the median is 24nts. The gap between box C' and box D seems to be slightly smaller: the shortest distance is 9nts while the median is 22nts. The distance between both prime boxes is not known to be of significant relevance. The only requirement is a minimal distance of at least 2nts to form another kink-turn motif with the aid of snoRNP-associated proteins. Larger distances do not pose a problem. Within the novel fungal snoRNAs, the shortest distance is 3nts, while 80% of all sequences with annotated prime boxes possess gaps of 6 to 31 nucleotides.
In contrast to box C/D snoRNAs, box H/ACA snoRNAs are considerably longer. Their median sequence length is 188nts, see Figure S6.2 (H/ACA snoRNAs). The shortest sequence annotated by snoStrip is 115nts long, while 90% of all sequences are between 148 and 266nts long. When comparing both hairpins, no significant difference can be observed: they share similar median lengths of 85nts and 79nts for hairpin 1 (HP1) and hairpin 2 (HP2), respectively. Only the length distribution of HP2 sequences is slightly tighter than that of HP1. Extraordinarily long snoRNAs can be found in families HACA_36 (snR86) and HACA_41 (snR84) with lengths of ∼1000nt and ∼600nt, respectively. Family HACA_12 (snR30), which is ∼600nt long, has an exceptional secondary structure with extensively enlarged 5' hairpins and hinge regions, where the latter is also able to form a so-called internal hairpin [Fayet-Lebaron et al. 2009].
Panels: box C/D snoRNAs, box H/ACA snoRNAs
Figure S6.2 Length distributions of snoRNA sequences and distances between characteristic box C/D motifs. For H/ACA snoRNAs, the hairpin lengths are depicted. For better visibility, extraordinarily long families such as Nc_CD_53 (CD_53), snR30 (HACA_12), and the Saccharomycetes-specific families snR86 (HACA_36) and snR84 (HACA_41) were excluded from these boxplots.
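The length and distance statistics reported in this section can be illustrated with a minimal Python sketch. It is not part of snoStrip. The coordinate layout (0-based, half-open box positions), the function names, and the example positions are assumptions made purely for illustration, and the gap definition (nucleotides between the end of one box and the start of the next) may differ from the one used in the original analysis.

```python
from statistics import median, quantiles


def box_gaps(boxes):
    """Return the gaps (in nt) between consecutive box C/D motifs.

    `boxes` maps a box name to (start, end) in 0-based, half-open
    coordinates; this layout is an assumption made for this sketch.
    """
    def gap(left, right):
        # Nucleotides strictly between the end of `left` and the start of `right`.
        return boxes[right][0] - boxes[left][1]

    return {
        "C-D'": gap("C", "D'"),
        "D'-C'": gap("D'", "C'"),
        "C'-D": gap("C'", "D"),
    }


def central_90(values):
    """Return the 5th and 95th percentiles, i.e. the central 90% interval."""
    cuts = quantiles(values, n=20, method="inclusive")
    return cuts[0], cuts[-1]


def summarize(values):
    """Collect the summary statistics quoted in the text for one feature."""
    low, high = central_90(values)
    return {"min": min(values), "median": median(values), "p5": low, "p95": high}


# Hypothetical box positions for a single box C/D snoRNA.
example = {"C": (5, 12), "D'": (36, 40), "C'": (46, 53), "D": (75, 79)}
gaps = box_gaps(example)
```

Applying `summarize` to the lengths or gaps of all sequences in a family would give values comparable to the minimum, median, and 90% ranges quoted above, provided the same gap definition is used.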
S7: Phylogenetic Heatmaps
The phylogenetic distribution of box C/D and box H/ACA snoRNA families is depicted in two separate heatmaps.
Therein, the number of snoRNA sequences belonging to a particular organism and family is color-encoded.
Both images share the same structure: each column represents a specific snoRNA family while each row represents a certain organism or genus.
The NCBI-derived taxonomic classification is shown on the left hand side. SnoRNA families that appear to be lineage-specific are shown in red boxes.
The figure for box C/D snoRNAs is also shown in the paper as Figure 2 in the Results section.
Phylogenetic heatmap of box C/D snoRNA families [png] [eps]
Phylogenetic heatmap of box H/ACA snoRNA families [png] [eps]
Figure S7.1 A heatmap of snoStrip-detected box C/D snoRNAs is shown on the previous
page. Each column represents a specific snoRNA family, while each row represents either
a certain species or genus. A taxonomic classification is shown on the left-hand side.
The number of snoRNAs detected in a specific species and snoRNA family is encoded
in a blue color scheme. Lineage-specific families are boxed (A: Saccharomycotina, B:
Pezizomycotina, C: Sordariomycetes).
Figure S7.2 A heatmap of snoStrip-detected box H/ACA snoRNAs is shown on the previous
page. Each column represents a specific snoRNA family, while each row represents either
a certain species or genus. A taxonomic classification is shown on the left-hand side. The
number of snoRNAs detected in a specific species and snoRNA family is encoded in a
blue color scheme. Lineage-specific families are boxed (A: Schizosaccharomycotina, B:
Saccharomycotina, C: Pezizomycotina).
S8: Evolutionary Events in snoRNA History
In the following, a general analysis of evolutionary innovation and deletion events at the sequence and family level is presented. To precisely determine the evolutionary events leading to innovations and losses, an adapted version of the ePope (Hertel and Stadler) tool was applied.
The following figures show two different representations of evolutionary events mapped to the NCBI taxonomic tree. The first one shows absolute events at the root of major fungal clades up to the level of families and orders. The second one, on the other hand, shows relative innovation and deletion events mapped to the pre-ordered nodes of the taxonomic tree up to species level. The latter is also shown in the original paper as Figure 3 in the Results section.
Absolute innovation and deletion events [png] [eps]
Relative innovation and deletion events [png] [eps]
Figure S8.1 Absolute innovation and deletion events of snoRNAs during fungal evolution.
Figure S8.2 Relative number of gains and losses of entire snoRNA families during fungal
evolution. The relative gain is the number of gained snoRNA families compared to the
observed number of snoRNA families. The relative loss describes the number of lost
snoRNA families compared to the number of snoRNA families in the parent node of the
phylogenetic tree.
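The two ratios defined in the caption of Figure S8.2 can be written down as a short Python sketch. It assumes that the set of snoRNA families present at a node and at its parent is already known (in the analysis these come from the adapted ePoPE reconstruction, which is not reproduced here). The function name and the family sets are hypothetical.

```python
def family_events(parent_families, node_families):
    """Compute relative gain and loss of snoRNA families for one tree node.

    Both arguments are collections of family identifiers; how they are
    reconstructed for inner nodes is outside the scope of this sketch.
    """
    parent = set(parent_families)
    node = set(node_families)
    gained = node - parent   # families present at the node but not at its parent
    lost = parent - node     # families present at the parent but not at the node
    # Relative gain: gained families compared to the families observed at the node.
    relative_gain = len(gained) / len(node) if node else 0.0
    # Relative loss: lost families compared to the families in the parent node.
    relative_loss = len(lost) / len(parent) if parent else 0.0
    return {
        "gained": gained,
        "lost": lost,
        "relative_gain": relative_gain,
        "relative_loss": relative_loss,
    }


# Hypothetical family sets for a parent node and one of its children.
events = family_events(
    {"CD_1", "CD_2", "CD_3", "CD_5"},
    {"CD_1", "CD_2", "CD_19"},
)
```

In this toy example one of three observed families is new (relative gain 1/3), and two of the four parental families are missing (relative loss 0.5).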
S9: Target Switches
This section deals with two 'snoRNA clans', each of which comprises more than one previously annotated snoRNA family and whose evolutionary history is coupled through a series of target switches and major rearrangements.
Please also see the 'Target switches' paragraph in the Results section of the original paper.
Evolutionary history of snoRNA cluster CD_5
Since the evolutionary history of snoRNA clan CD_5 is discussed in great detail in the paper, we provide here only the more detailed figures summarizing the evolutionary events, analogous to Figure 6 and Figure 7 in the paper.
Potential evolutionary History of snoRNA cluster CD_5 [png] [eps]
Evolutionary insights into a snoRNA cluster harboring members of the CD_5 snoRNA clan [png] [eps]
Figure S9.1 Potential evolutionary history of snoRNA clan CD_5 involving four different modification sites on the LSU rRNA. Gain/loss events are displayed with arrows, while potential rearrangements are shown with red stars. ⊤ 25S-1866 is solely found in Pichia. ∓ Only putative since LSU sequences are missing, but snoRNAs show convincing ASE conservation. ⊥ Only putative since no LSU sequence is present, but snoRNAs show convincing ASE conservation for three modifications.
Evolutionary history of snoRNA cluster CD_19
A similar evolutionary history can be reconstructed for the snoRNA clan
CD_19, which includes the budding yeast snoRNAs snR52 and snR56 as well as
three Neurospora sequences (Nc_CD_19, Nc_CD_41, and Nc_CD_42). The RNA
molecules of this snoRNA clan are known to guide two SSU
methylations, 18S-462 (S.cerevisiae 18S-420, D target) and 18S-1580 (18S-1428,
D' target), and two LSU methylations, 25S-2574 (25S-1508, D target)
and 25S-4143 (25S-2921, D' target). A potential
evolutionary history is depicted in Figure S9.2.
All four modification sites can be denoted as ancient since they map
to known methylated positions in human small and large subunit
rRNAs. However, a potential ancient state at the root of fungi
involves solely both SSU modifications. Both methylations in the LSU
at 25S-2574 and 25S-4143 are exclusively found in Pezizomycotina and
Saccharomycotina, respectively. Thus, they were more likely
reinvented in these lineages than lost in all others.
Both SSU sites are present in nearly all analyzed fungi, with the
exception of lineages where only a few species are present, e.g.,
Tremellomycetes or Blastocladiomycota. A noteworthy observation is the
putative duplication of the target interaction for position 18S-1580 at the root of
Pezizomycotina. It seems that the duplicated interaction was inserted
into a new single guide snoRNA. In Eurotiomycetes, on the other hand, this
antisense element was relocated into the formerly single guide sequence that targets
the other 18S position of this snoRNA clan.
Other double guide snoRNAs can be seen in Saccharomycotina combining
the ancient target of 18S-462 with the presumably reinvented
25S-4143.
A similar behaviour is detected in several Pezizomycotina species,
where novel double guide sequences incorporate target binding
capabilities for 18S-1580 and 25S-2574. A further target switch is observed in Pichia membranifaciens, where the species-specific
duplication of 18S-462 is inserted as D target in the snoRNA guiding
18S-1580.
In all but one of the species that are capable of guiding methylation at
18S-462, a second target at position 18S-602 is additionally predicted for
the same snoRNA ASE. The additional interaction is marginally
weaker than the annotated one but still rather exceptional, raising the
question of whether both positions are modified by one antisense element.
Target 18S-462 also seems to be subject to yet another reinvention, since
it is also predicted as a potential D' target (!) in family
CD_42. This family is exclusively found in Pezizomycotina and is
predicted to contain a highly conserved D target guiding 25S-2979
(25S-1856; ICI: 1.26). In Dothideomycetes, Eurotiomycetes, and
Leotiomycetes, an additional D' target site capable of targeting
18S-462 is found with an ICI score of 0.60 and a mean mfe of -14.57
kcal/mol. It is quite remarkable that this modification seems to be
guided by two different snoRNA families in which the ASEs are located at
different sites.
Potential evolutionary history of snoRNA cluster CD_19 [png] [eps]
Figure S9.2 Potential evolutionary history of snoRNA clan CD_19 involving four different modification sites on the LSU and SSU rRNA. Gain/loss events are displayed with arrows, while potential rearrangements are shown with red stars. ∓ Targets for 25S-2574 are putative since LSU sequences are missing in these species, but the snoRNAs show convincing ASE conservation.
S10: Comparison to the Rfam database
In total, 18 snoRNA families comprise
sequences of two different Rfam models each. To investigate
and validate the conflations made by snoStrip, we ran
CMcompare to compare the Rfam snoRNA models.
For each Rfam snoRNA family, we used
CMcompare to calculate pairwise scores against the models that are
merged by snoStrip. In the figures below, we plot the
resulting z-score distributions to distinguish between models that are
truly merged by snoStrip and all remaining models.
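The z-score step can be sketched in a few lines of Python. The text does not state how the z-scores were derived from the CMcompare output, so this sketch assumes a plain standardization of the pairwise scores of one reference model against all other snoRNA models, using the population standard deviation. The model names and score values are placeholders, not real CMcompare results.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Standardize pairwise model scores: z = (score - mean) / standard deviation.

    `scores` maps a model name to its pairwise score against one reference model.
    """
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All scores identical: no model stands out from the background.
        return {name: 0.0 for name in scores}
    return {name: (value - mu) / sigma for name, value in scores.items()}


# Placeholder scores of five hypothetical models against one reference model.
pairwise = {
    "model_a": 1.0,
    "model_b": 2.0,
    "model_c": 3.0,
    "model_d": 4.0,
    "model_e": 5.0,
}
z = z_scores(pairwise)
```

Under this assumption, a model that is truly merged with the reference model would be expected to show a z-score well above the bulk of the remaining models.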
Figure S10.1 Comparison of all snoRNA Rfam models against both models of each merged box C/D Rfam pair.
Figure S10.2 Comparison of all snoRNA Rfam models against both models of each merged box H/ACA Rfam pair.
S11: Ribosome profiling
To verify our snoRNA annotation, we cross-checked with available
Ribo-seq data of four different fungal organisms:
Saccharomyces cerevisiae,
Schizosaccharomyces pombe,
Candida albicans, and
Ajellomyces capsulatus.
Sequencing data from ribosome profiling experiments contain not only ribosome-protected mRNAs but also ncRNAs protected by non-ribosomal proteins, such as tRNAs, snRNAs, or snoRNAs.
Furthermore, there is a fundamental difference between the read distributions of mRNAs and ncRNAs. While mRNAs show a fairly uniform read distribution with a visible 3nt periodicity, reflecting the triplet genetic code, ncRNAs show a rather narrow read distribution covering only those regions that were protected by proteins against RNase digestion, which is an essential part of Ribo-seq library preparation.
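The two read-distribution signatures described above can be quantified with a simple Python sketch. It is a simplification and not the procedure used for the cross-check: read positions are taken as given (no P-site offset correction is applied), and the function names and example reads are hypothetical.

```python
from collections import Counter


def frame_fractions(read_starts, cds_start=0):
    """Fraction of reads whose start falls into each of the three codon frames.

    A strong bias towards one frame indicates the 3nt periodicity
    expected for translated mRNAs.
    """
    frames = Counter((pos - cds_start) % 3 for pos in read_starts)
    total = sum(frames.values())
    if total == 0:
        return [0.0, 0.0, 0.0]
    return [frames[frame] / total for frame in range(3)]


def covered_fraction(reads, length):
    """Fraction of a transcript covered by at least one read.

    `reads` holds (start, end) pairs in 0-based, half-open coordinates.
    A small value points to a narrow, localized read distribution.
    """
    covered = set()
    for start, end in reads:
        covered.update(range(max(start, 0), min(end, length)))
    return len(covered) / length


# Hypothetical read starts on a coding sequence and reads on a 100nt ncRNA.
periodicity = frame_fractions([0, 3, 6, 9, 1, 12])
breadth = covered_fraction([(10, 40), (20, 45)], 100)
```

In this toy data, five of six reads start in frame 0, while the ncRNA reads cover only 35% of the transcript, which corresponds to the localized, protein-protected pattern described for ncRNAs.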