1. General Information
Details of the methods used to calculate the presented data can be
found in the corresponding publication by Müller et
al. 2012.
This page is intended to give easy access to the proteome data and,
in particular, the opportunity to visualize it in
the UCSC archaeal genome
browser. Hence, all available files follow, in principle, the
UCSC bed and
gff formats. In addition, we provide large data files such as the
databases used for the peptide search and original output formats such as
the rcd file of RNAcode.
2. UCSC Data Integration
There are three ways to load the data into the UCSC browser:
- UCSC Track Hub
  - Go to the UCSC trackhub page.
  - Open the "My Hubs" tab and paste
    "http://www.bioinf.uni-leipzig.de/Publications/SUPPLEMENTS/12-023/pyloriHub/hub.txt"
    into the URL field.
  - Click the "Add Hub" button.
  - If everything worked, you should see a new track hub,
    which you can now load using the "Load Selected Hubs" button.
- Direct Link
  - Click here
    to directly load the RNAcode and proteome UCSC tracks into the UCSC
    browser.
- Manual bed File Upload
We suggest switching the "Base Position" track to "full" in order
to show a translation of the genome sequence above the main UCSC
graphic. This can be done in the "Track controls" below the main
graphic, which displays a genomic region and the selected tracks.
Please note that the direct data upload (possibilities 2 and 3)
may take a while due to the large amount of data. Hence, we suggest
using possibility 1, the UCSC track hub integration.
3. Data Download
3.1 Proteome Data
3.1.1 Database Construction
In order to generate a comprehensive database for the Mascot analysis,
the Helicobacter pylori genome was translated in all six
reading frames. For each frame, nucleotide triplets are translated
into the corresponding amino acid. If a triplet contains non-canonical
nucleotides, i.e. other than A, C, G and T, it is translated into X,
which does not correspond to any amino acid. The amino acid chain is
terminated if a triplet encodes a canonical stop codon. All chains
shorter than six amino acids are rejected.
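
The translation procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script used for the publication: the function names and the `min_length` parameter are hypothetical, the standard codon assignments are assumed, and an incomplete codon at the end of a frame is ignored.

```python
from itertools import product

# Standard codon assignments in TCAG order (assumed here; '*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Non-canonical characters (e.g. N) are left unchanged by the complement step.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate_frame(seq):
    """Translate seq from its first base and split the result at stop codons.

    Triplets containing anything other than A, C, G, T become 'X'.
    A trailing incomplete codon is ignored.
    """
    chains, current = [], []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            chains.append("".join(current))
            current = []
        else:
            current.append(aa)
    chains.append("".join(current))
    return chains


def six_frame_translation(genome, min_length=6):
    """Collect all chains of at least min_length residues from all six frames."""
    genome = genome.upper()
    result = []
    for strand in (genome, reverse_complement(genome)):
        for offset in range(3):
            for chain in translate_frame(strand[offset:]):
                if len(chain) >= min_length:
                    result.append(chain)
    return result
```

Calling `six_frame_translation(genome_sequence)` on the genome as a single string returns the list of amino acid chains that would enter such a database; chains with fewer than six residues are dropped.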
The database contained this six-frame translation, all NCBI-annotated
amino acid sequences and a set of decoy sequences. The decoy set was
generated by reversing the annotated sequences.
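
As a minimal sketch of the decoy step, each annotated sequence can simply be reversed. The function name and the `REV_` identifier prefix are assumptions made for this example; the naming scheme actually used in the database is not stated here.

```python
def make_decoys(proteins, prefix="REV_"):
    """Build reversed decoy sequences from a mapping of identifier -> sequence.

    The prefix only serves to keep decoy identifiers distinct from the
    originals; it is a hypothetical choice for this example.
    """
    return {prefix + name: seq[::-1] for name, seq in proteins.items()}
```

For instance, `make_decoys({"HP0001": "MKPGF"})` yields a single decoy entry whose sequence is `"FGPKM"`.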
3.1.2 Identification Tables
Supporting material for novel protein annotations and corrections, including validation by MS/MS spectra, is summarized in the supplemental PDF.
3.1.3 Peptide Mapping
The experimentally determined peptide fragments (PFs) were mapped with
tblastn to the H. pylori genome. Only perfect, full-length
sequence matches were used for subsequent analysis.
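
One way to select perfect, full-length matches is to filter tabular tblastn output. The sketch below assumes the hits were written with the twelve default tabular columns plus the query length (`-outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen"`); this column layout and all function names are assumptions for illustration, and the filtering actually used for the published data may differ. Accepted hits are converted to BED-style tuples with a 0-based, half-open interval.

```python
def parse_hit(line):
    """Parse one tab-separated tblastn line in the assumed 13-column layout."""
    f = line.rstrip("\n").split("\t")
    return {
        "qseqid": f[0], "sseqid": f[1],
        "mismatch": int(f[4]), "gapopen": int(f[5]),
        "qstart": int(f[6]), "qend": int(f[7]),
        "sstart": int(f[8]), "send": int(f[9]),
        "qlen": int(f[12]),
    }


def is_perfect_full_length(hit):
    """True if the peptide aligns end to end without mismatches or gaps."""
    return (hit["mismatch"] == 0 and hit["gapopen"] == 0
            and hit["qstart"] == 1 and hit["qend"] == hit["qlen"])


def to_bed(hit):
    """Convert a hit to (chrom, start, end, name, score, strand).

    tblastn reports 1-based inclusive genome coordinates, with
    sstart > send for hits on the reverse strand.
    """
    start, end = sorted((hit["sstart"], hit["send"]))
    strand = "+" if hit["sstart"] <= hit["send"] else "-"
    return (hit["sseqid"], start - 1, end, hit["qseqid"], 0, strand)


def perfect_hits_as_bed(lines):
    """Keep only perfect, full-length hits and return them as BED tuples."""
    hits = (parse_hit(l) for l in lines if l.strip() and not l.startswith("#"))
    return [to_bed(h) for h in hits if is_perfect_full_length(h)]
```

The function accepts any iterable of lines, so an open file handle on the tblastn output can be passed directly to `perfect_hits_as_bed`.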
3.2 RNAcode Data
System call: RNAcode -o OUTPUT.rcd --stop-early -p 0.05 INPUT
Data Set | File
Genome-wide RNAcode predictions, full data set | rcd file
Genome-wide RNAcode predictions, full data set | UCSC bed file
Short ORF candidates based on RNAcode predictions | UCSC bed file
How to cite
If you use data from this website, please
cite:
Stephan A. Müller et al., Identification of new
protein coding sequences and signal peptidase cleavage sites of
Helicobacter pylori strain 26695 by proteogenomics, Journal of
Proteomics, accepted.
Last modified: 2013-07-16 10:29 sven