Publications - Supplemental material
Please find below supplemental material corresponding to publications of our group. Currently, we list 133 supplements.
© NOTICE: All documents are copyrighted by the authors. If you would like to use all or a portion of any paper, please contact the author.
This supplement is also available at http://www.bioinf.uni-leipzig.de/publications/supplements/22-003
You may use this URL to cite or link to us.
BIOINF 22-003: Tailored machine learning models for functional RNA detection in genome wide screens
Christopher Klapproth, Siegfried Zötzsche, Felix Kühnl, Jörg Fallmann, Peter F. Stadler, Sven Findeiß
This section lists the input and output of the test sets used. Test set 1 consists of alignments of known, highly conserved noncoding RNAs and corresponding control sets, which were generated by selecting random noncoding genomic alignments, by SISSIz simulation, and by shuffling with the rnazRandomizeAln.pl tool. Test set 2 is built from a 27-way multiple genome alignment (FlyBase v2, last accessed 01.05.22) cut into overlapping windows using the rnazWindow.pl tool with the parameter --slide=40.
Test set | Folder |
---|---|
Test Set 1 | Test_Set_1_RNAz |
Test Set 2 | Test_Set_2_Drosophila |
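The windowing step used for Test Set 2 can be illustrated with a minimal Python sketch. It is not a reimplementation of rnazWindow.pl, which performs additional filtering; it only shows how a step of 40 columns produces overlapping windows. The function name `sliding_windows` and the window length of 120 columns are assumptions made for illustration; only the step size of 40 is taken from the text above.

```python
from typing import Dict, Iterator, Tuple


def sliding_windows(
    alignment: Dict[str, str], window: int = 120, slide: int = 40
) -> Iterator[Tuple[int, Dict[str, str]]]:
    """Yield (start_column, sub_alignment) pairs of overlapping windows.

    `alignment` maps sequence names to gapped rows of equal length.
    `window=120` is an assumed value; `slide=40` matches --slide=40.
    """
    lengths = {len(row) for row in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all alignment rows must have the same length")
    n_cols = lengths.pop()

    # Alignments no longer than one window are returned unchanged.
    if n_cols <= window:
        yield 0, dict(alignment)
        return

    # Advance by `slide` columns; a trailing remainder shorter than
    # `window` is dropped in this simplified sketch.
    for start in range(0, n_cols - window + 1, slide):
        yield start, {
            name: row[start:start + window] for name, row in alignment.items()
        }


# Toy alignment with two rows of 200 columns each (hypothetical data).
toy = {"dm6": "A" * 200, "droSim1": "C" * 200}
starts = [start for start, _ in sliding_windows(toy)]
```

Because the step is smaller than the window length, each alignment column is covered by several windows, so a conserved element that straddles one window boundary is still fully contained in a neighbouring window.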
Training data, provided as ClustalW alignments, selected for training the experimental models that were later used for evaluation and prediction on Drosophila genomic data.
Training Set | Folder |
---|---|
Training Set noncoding RNA | ncRNA_training_alignments |
Training Set protein coding | Protein_training_alignments |
Test data for the acceptance rates of the structural conservation filter, together with training and test data for the z-score SVR.
Data Set | Folder |
---|---|
Structural conservation filter test set | Acceptance_rate_test |
z-score SVR raw training data | Trainingdata |
z-score SVR test data | Testdata |
Model files
ncRNA_model |
coding_model |
three_way_model |
ncRNA_trainingdata |
trainingdata_protein |
UCSC TrackHub
This public hub provides detailed data for the analysis of two-way classification approaches, i.e. RNAz 2.0 and Svhip, identifying conserved non-coding RNA elements in Drosophila melanogaster. Detailed descriptions of the individual tracks are provided within the hub. Please go to https://genome-euro.ucsc.edu/cgi-bin/hgHubConnect, copy the link http://www.bioinf.uni-leipzig.de/Publications/SUPPLEMENTS/22-003/SvhipDmelHub/hub.txt into the URL text field and click "Add Hub". You will be forwarded directly to the UCSC Genome Browser, which loads the D. melanogaster assembly dm6. Please enter chr3R:17,645,050-17,647,096 in the "Position/Search Term" text field and click "GO". You should get something similar to the picture below.
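As an alternative to the manual steps above, the same view can probably be opened with a single link. The following Python sketch builds such a link. The hub URL, assembly and position are taken from the text; the `hgTracks` endpoint and the query parameters `db`, `hubUrl` and `position` are assumptions based on general UCSC Genome Browser URL conventions and are not documented on this page. The helper names `parse_position` and `browser_url` are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlencode

HUB_URL = (
    "http://www.bioinf.uni-leipzig.de/Publications/SUPPLEMENTS/"
    "22-003/SvhipDmelHub/hub.txt"
)
# Assumed endpoint on the European UCSC mirror named in the text.
BROWSER = "https://genome-euro.ucsc.edu/cgi-bin/hgTracks"


def parse_position(position: str) -> Tuple[str, int, int]:
    """Split 'chr3R:17,645,050-17,647,096' into (chrom, start, end)."""
    chrom, span = position.split(":")
    start, end = (int(part.replace(",", "")) for part in span.split("-"))
    return chrom, start, end


def browser_url(position: str, hub_url: str = HUB_URL, db: str = "dm6") -> str:
    """Build a browser link; the parameter names are assumptions."""
    chrom, start, end = parse_position(position)
    query = urlencode(
        {"db": db, "hubUrl": hub_url, "position": f"{chrom}:{start}-{end}"}
    )
    return f"{BROWSER}?{query}"


url = browser_url("chr3R:17,645,050-17,647,096")
```

If the link does not load the hub, the manual procedure via "Add Hub" remains the reference method.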
Utilized annotations and predictions
The underlying data displayed in the UCSC track hub are provided in BigBed format (i.e. an indexed binary format). Using the UCSC bigBedToBed tool, available at http://hgdownload.soe.ucsc.edu/admin/exe/, this format can easily be converted to human-readable BED.
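The conversion can be scripted, for example with a small Python wrapper around bigBedToBed. This is a sketch under stated assumptions: the file names `predictions.bb` and `predictions.bed` are hypothetical placeholders, the helper names are invented for illustration, and the bigBedToBed binary is assumed to have been downloaded and placed on the PATH. The optional `-chrom`, `-start` and `-end` arguments restrict the output to a region.

```python
import shutil
import subprocess
from typing import List, Optional


def bigbed_to_bed_cmd(
    bigbed: str,
    bed: str,
    chrom: Optional[str] = None,
    start: Optional[int] = None,
    end: Optional[int] = None,
) -> List[str]:
    """Assemble the bigBedToBed command line without running it."""
    cmd = ["bigBedToBed"]
    if chrom is not None:
        cmd.append(f"-chrom={chrom}")
    if start is not None:
        cmd.append(f"-start={start}")
    if end is not None:
        cmd.append(f"-end={end}")
    cmd += [bigbed, bed]
    return cmd


def convert(bigbed: str, bed: str, **region) -> None:
    """Run the conversion; raises if the tool is missing or fails."""
    if shutil.which("bigBedToBed") is None:
        raise FileNotFoundError("bigBedToBed was not found on PATH")
    subprocess.run(bigbed_to_bed_cmd(bigbed, bed, **region), check=True)


# Hypothetical file names; replace with a BigBed file from the hub.
example_cmd = bigbed_to_bed_cmd("predictions.bb", "predictions.bed", chrom="chr3R")
```

Calling `convert("predictions.bb", "predictions.bed")` then writes a plain-text BED file that can be inspected with standard text tools.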