Quick Start
Download the user guide here.
1 Installation
metilene is available as pre-compiled binaries for 32/64-bit Linux or as source code. It runs on a normal desktop machine and supports multi-threading, although the underlying algorithms are efficient enough to run single-threaded if needed.
Whether you use the pre-compiled binaries or build metilene from source, simply download the latest version from here and extract it with
$ tar -xvzf metilene .tar.gz
go to the new directory and type
$ make
or run the pre-compiled versions directly.
2 Prepare Input File
The input file containing all methylation data is a sorted tab-separated file with the following format and header:
chr <tab> pos <tab> g1_xxx <tab> g1_xxx <tab> [...] <tab> g2_xxx <tab> g2_xxx <tab> [...]
where the first column refers to the chromosome, the second column to the genomic position of the CpG, and all following columns to the absolute methylation ratio (in [0,1]). Each ratio column is assigned to the group named by the prefix in its header, e.g., g1 or g2.
There are multiple ways to build such an input file. You can use our input script provided in the Downloads section, or simply use bedtools unionbedg to combine the methylation rates of your samples. Then add a header to your file, e.g. chr start pos group_labels ..., and you have your input file.
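To make the layout concrete, here is a minimal sketch that writes a toy input file and runs a basic sanity check on it. The file name toy_input.tsv, the sample labels (g1_s1, ...) and all values are made up for illustration, and the check assumes there are no missing values in the ratio columns.

```shell
# Hypothetical toy data: two samples per group, three CpGs on chr1.
# Columns: chr, pos, then one methylation ratio per sample.
printf 'chr\tpos\tg1_s1\tg1_s2\tg2_s1\tg2_s2\n' >  toy_input.tsv
printf 'chr1\t100\t0.90\t0.85\t0.10\t0.20\n'    >> toy_input.tsv
printf 'chr1\t150\t0.80\t0.95\t0.15\t0.05\n'    >> toy_input.tsv
printf 'chr1\t210\t0.70\t0.75\t0.30\t0.25\n'    >> toy_input.tsv

# Sanity check (assumes no missing values): every data row must have as many
# columns as the header, and every ratio must lie in [0,1].
awk -F'\t' '
    NR == 1 { ncol = NF; next }
    NF != ncol {
        printf "line %d: expected %d columns, got %d\n", NR, ncol, NF
        bad = 1
    }
    {
        for (i = 3; i <= NF; i++)
            if ($i + 0 < 0 || $i + 0 > 1) {
                printf "line %d: ratio %s outside [0,1]\n", NR, $i
                bad = 1
            }
    }
    END { exit (bad ? 1 : 0) }
' toy_input.tsv && echo "toy_input.tsv looks well-formed"
```

The check only covers the column count and the value range; it does not verify that the file is sorted.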
3 Run de-novo DMR detection
To do a de-novo annotation of DMRs run
$ metilene -a g1 -b g2 methylation-file
Options -a and -b indicate the groups to be compared. There are some parameters you might want to change, namely maxdist (allowed nt distance between two CpGs within a DMR), mincpgs (minimum number of CpGs in a DMR) and minMethDiff (minimum mean methylation difference for calling DMRs). Additionally, you might want to change the number of threads used (--threads) or consider missing value estimation. For more details about the parameters, see the Parameters section.
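A call that sets these parameters explicitly might look like the sketch below. The long-option spellings --maxdist, --mincpgs and --minMethDiff are assumed from the parameter names above, by analogy with --threads, so check them against the Parameters section. The numeric values and the output name dmrs.tsv are illustrative only.

```shell
# Illustrative values only; tune them for your data.
MAXDIST=300    # allowed nt distance between two CpGs within a DMR
MINCPGS=10     # minimum number of CpGs in a DMR
MINDIFF=0.1    # minimum mean methylation difference
THREADS=4      # number of threads

# Collect the arguments once so the printed and the executed command match.
# Option spellings other than --threads, -a and -b are assumptions.
set -- --maxdist "$MAXDIST" --mincpgs "$MINCPGS" --minMethDiff "$MINDIFF" \
       --threads "$THREADS" -a g1 -b g2 methylation-file

echo "metilene $*"

# Run only if metilene is installed and the input file exists.
if command -v metilene >/dev/null 2>&1 && [ -f methylation-file ]; then
    metilene "$@" > dmrs.tsv    # dmrs.tsv is a hypothetical output name
fi
```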
4 Output
The output of the de-novo DMR annotation mode is written in a BED-like format:
chr <tab> start <tab> stop <tab> q-value <tab> mean difference: mean g1 - mean g2 <tab> #CpGs <tab> p (MWU) <tab> p (2D KS) <tab> mean g1 <tab> mean g2
The columns "mean g1" and "mean g2" give the absolute mean methylation level of the segment in each group; their difference is given in the 5th column. Single CpGs are not tested using the 2D KS-test. By default, q-values are Bonferroni-adjusted and based on the MWU-test p-values. The output is unsorted when multiple threads are used. We recommend using sort:
$ metilene *options* | sort -V -k1,1 -k2,2n
for a sorted output.
The output is not filtered for significance! Please decide for yourself whether to filter on p- or q-value, and at which significance level.
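For a quick manual filter, a q-value cutoff can be applied to the 4th column with awk. This is a minimal sketch on made-up data: the file names, the toy DMR rows and the 0.05 threshold are all assumptions chosen for illustration, not recommended settings.

```shell
# Hypothetical toy output with the ten columns listed above (no header line):
# chr, start, stop, q-value, mean difference, #CpGs, p (MWU), p (2D KS),
# mean g1, mean g2.
printf 'chr2\t50\t500\t0.01\t-0.40\t20\t0.002\t0.001\t0.30\t0.70\n'     >  toy_dmrs.tsv
printf 'chr1\t900\t1300\t0.30\t0.12\t15\t0.04\t0.06\t0.55\t0.43\n'      >> toy_dmrs.tsv
printf 'chr1\t100\t400\t0.001\t0.65\t12\t0.0004\t0.0002\t0.85\t0.20\n'  >> toy_dmrs.tsv

QMAX=0.05    # illustrative cutoff; choose your own significance level

# Keep rows whose q-value (column 4) is below the cutoff, then sort them
# with the same sort call as recommended above.
awk -F'\t' -v qmax="$QMAX" '($4 + 0) < (qmax + 0)' toy_dmrs.tsv \
    | sort -V -k1,1 -k2,2n > toy_dmrs.filtered.tsv

cat toy_dmrs.filtered.tsv
```

Adding a second condition on the 5th column, for example on the absolute mean difference, works the same way.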
An easy way to filter your already called DMRs is offered by "filterOutput.pl". It also creates some basic statistic plots characterizing your DMRs, i.e., the distribution of DMR differences, DMR length in nucleotides and #CpGs, DMR differences vs. q-values, mean methylation of group 1 vs. mean methylation of group 2, and DMR length in nucleotides vs. length in CpGs (Fig. 1). A version of R with the ggplot2 package must be in PATH. DMRs can be filtered by q-value, #CpGs, length in nucleotides and mean methylation difference. Three files are produced: (i) a bedgraph file containing the methylation difference for each DMR, (ii) a basic statistics pdf and (iii) a filtered bedgraph-like file containing all information already in the metilene output. To use this script, please see the User Guide Output section.