Groups
From single-sample methylation analysis to the comparison of two
groups, each comprising multiple samples.
BAT_summarize
BAT_summarize facilitates the merging of multiple samples of two
groups into files comprising a coherent set of cytosine positions that
will be used for all downstream analyses such as calling of
differentially methylated regions (DMRs), annotation independent and
dependent analyses, or data integration (e.g. with histone
modifications, transcription factor binding sites, expression). The
coherent set of cytosines (referred to as filtered positions) is
determined by user-defined thresholds on the maximum number of missing
values for this position in each group. Only positions with a
sufficiently large number of samples per group and hence with an
accurate estimate of the biological variance in the methylation are
included in the set of filtered positions. In addition to a single
summary file containing methylation rates of all samples at filtered
positions, BAT_summarize produces the input file for the DMR caller
metilene, two bedGraph files (one per group) with mean methylation
rates, a bedGraph file with the difference in the mean methylation
rates between both groups (group1-group2), and bedGraph as well
as bigWig files for each sample.
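The filtering rule and the group-wise difference described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual BAT_summarize implementation: the function names, the in-memory row layout, and the use of `None` for missing values are hypothetical, while the two thresholds mirror the per-group limits on missing values.

```python
def filter_positions(rows, n_group1, max_missing1=0, max_missing2=0):
    """Return rows whose missing-value count stays within each group's limit.

    rows: list of (chrom, pos, rates); rates has one entry per sample,
          group 1 samples first, with None marking a missing value
          (hypothetical layout chosen for this sketch).
    """
    kept = []
    for chrom, pos, rates in rows:
        group1, group2 = rates[:n_group1], rates[n_group1:]
        if (group1.count(None) <= max_missing1
                and group2.count(None) <= max_missing2):
            kept.append((chrom, pos, rates))
    return kept


def group_mean(rates):
    """Mean of the non-missing rates, or None if every value is missing."""
    present = [r for r in rates if r is not None]
    return sum(present) / len(present) if present else None


def mean_difference(rates, n_group1):
    """Difference of the group means (group 1 minus group 2) at one position."""
    mean1 = group_mean(rates[:n_group1])
    mean2 = group_mean(rates[n_group1:])
    if mean1 is None or mean2 is None:
        return None
    return mean1 - mean2


# Hypothetical toy data: two samples per group.
rows = [
    ("chr1", 100, [0.8, 0.9, 0.2, 0.4]),
    ("chr1", 150, [0.5, None, 0.1, 0.3]),
    ("chr1", 200, [None, None, 0.6, 0.7]),
]
strict = filter_positions(rows, n_group1=2)                    # no missing values allowed
relaxed = filter_positions(rows, n_group1=2, max_missing1=1)   # one gap allowed in group 1
```

With the strict setting only the first position survives; allowing one missing value in group 1 also keeps the second position, at the cost of a mean estimated from fewer samples.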
For a human dataset, BAT_summarize can also automatically generate a
Circos plot, i.e., a genome-wide binned
methylation heatmap of all samples. For other datasets it may be
necessary to adjust the configuration files for the Circos plot
accordingly. In this case, please consult the documentation at the
Circos website.
Basic usage
BAT_summarize --in1 <list> --in2 <list> --out <prefix> --cs <file>
Output files
File | Description |
prefix_mean_group1.bedgraph | BedGraph and BigWig file of mean methylation rates in group 1 at filtered positions. |
prefix_mean_group2.bedgraph | BedGraph and BigWig file of mean methylation rates in group 2 at filtered positions. |
prefix_diff_group1_group2.bedgraph | BedGraph and BigWig file of difference in mean methylation rates between group 1 and group 2 at filtered positions (group1 - group2). |
prefix_summary_group1_group2.bedgraph | BedGraph file of methylation rates of group 1 and group 2 at filtered positions. |
prefix_metilene_group1_group2.bedgraph | Input file for DMR caller metilene containing methylation rates of group 1 and group 2 at filtered positions. |
prefix_sample.bedgraph | BedGraph file for each sample containing the methylation rates at filtered positions. |
prefix_sample.bw | BigWig file for each sample containing the methylation rates at filtered positions. |
circos.png | Circos plot (png and svg) illustrating methylation rates of all samples as genomic methylation heatmap. |
Option | Description |
--in1 | Comma-separated list of bedGraph input filenames of group 1. |
--in2 | Comma-separated list of bedGraph input filenames of group 2. |
--out | Prefix for output files. |
Other options
Option | Description |
--cs | Prefix for chrom.sizes file of the corresponding reference genome. |
--groups | Comma-separated list of group identifiers, one per group (default: g1,g2). |
--mis | String indicating how to encode missing values (default: NA). |
--mis1 | Maximum number of samples in group 1 with missing values, otherwise position will be excluded (default: 0). |
--mis2 | Maximum number of samples in group 2 with missing values, otherwise position will be excluded (default: 0). |
--h1 | Comma-separated list of sample identifiers of group 1 (default: prefix of bedGraph input files of group 1). |
--h2 | Comma-separated list of sample identifiers of group 2 (default: prefix of bedGraph input files of group 2). |
--cir | Path to Circos folder. If defined, a Circos plot (i.e., genome-wide methylation heatmap of all samples) will be plotted. The folder must contain the "bin" BED files. |
Option | Description |
-c | Path to Circos executable. Required if Circos executable is not in PATH. For installation, manual or problems please go to the Circos website. |
-b | Path to bedtools executable. Required if bedtools executable is not in PATH. For installation, manual or problems please go to the bedtools website. |
--bgbw | Path to UCSCtools' bedGraphToBigWig executable. Required if bedGraphToBigWig executable is not in PATH. For installation, manual or problems please go to the UCSCtools website. |
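As a rough illustration of how the options combine, the following Python sketch assembles a BAT_summarize call from the flags documented above. The helper name and all file names are hypothetical placeholders, and the sketch only builds and prints the command; it does not assume anything about the tool's behaviour beyond the listed options.

```python
import shlex


def build_summarize_command(group1_files, group2_files, out_prefix,
                            chrom_sizes, max_missing1=0, max_missing2=0):
    """Assemble a BAT_summarize argument list from the documented options."""
    return [
        "BAT_summarize",
        "--in1", ",".join(group1_files),   # comma-separated bedGraph files, group 1
        "--in2", ",".join(group2_files),   # comma-separated bedGraph files, group 2
        "--out", out_prefix,
        "--cs", chrom_sizes,
        "--mis1", str(max_missing1),
        "--mis2", str(max_missing2),
    ]


# Hypothetical input files and output prefix.
command = build_summarize_command(
    ["ctrl_1.bedgraph", "ctrl_2.bedgraph"],
    ["case_1.bedgraph", "case_2.bedgraph"],
    "results/cohort",
    "hg19.chrom.sizes",
    max_missing1=1,
    max_missing2=1,
)
print(shlex.join(command))
```

Keeping the command as a list means it can be passed unchanged to `subprocess.run(command, check=True)` without any shell quoting concerns.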
BAT_overview
To get an annotation-independent overview of the methylomes of the
two conditions, you can use BAT_overview. It is essentially an R
wrapper that automatically generates the following overview
statistics. A boxplot of the genome-wide average methylation level of
each sample in a group as well as a dendrogram showing the
hierarchical clustering of the methylation rates of each sample can
help to inspect the variance in the methylation level within and
between groups and detect possible outlier samples. Moreover, the
distribution of position-wise mean methylation rates in each group is
depicted as barplot for ranges of methylation levels, e.g. to detect
overall shifts in the abundance of lowly, partially, or highly
methylated Cs between the two groups. For a direct comparison of the
groups at each position, a smoothed scatter plot is generated where
the position-wise mean methylation rates of both groups are plotted against
each other. Finally, a histogram of the difference in the mean
methylation rate between the groups is generated.
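Two of the statistics named above, the genome-wide average per sample and the distribution of position-wise mean rates over methylation ranges, can be sketched as follows. This is a plain-Python illustration rather than the R code BAT_overview runs; the function names, the sample names, and in particular the cut-offs of 0.2 and 0.8 for lowly, partially, and highly methylated positions are assumptions made for the example.

```python
def sample_averages(columns):
    """Genome-wide average methylation per sample, ignoring missing values."""
    averages = {}
    for name, rates in columns.items():
        present = [r for r in rates if r is not None]
        averages[name] = sum(present) / len(present) if present else None
    return averages


def bin_counts(mean_rates, edges=(0.2, 0.8)):
    """Count lowly, partially, and highly methylated positions.

    The cut-offs in `edges` are illustrative assumptions, not BAT defaults.
    """
    low_edge, high_edge = edges
    counts = {"low": 0, "partial": 0, "high": 0}
    for rate in mean_rates:
        if rate < low_edge:
            counts["low"] += 1
        elif rate <= high_edge:
            counts["partial"] += 1
        else:
            counts["high"] += 1
    return counts


# Hypothetical sample columns (None marks a missing value).
columns = {
    "g1_a": [0.10, 0.50, 0.90, None],
    "g1_b": [0.20, 0.40, 1.00, 0.80],
}
averages = sample_averages(columns)
counts = bin_counts([0.15, 0.45, 0.95, 0.80])
```

Comparing the `counts` dictionaries of the two groups corresponds to reading the barplot: a shift from "high" to "partial" in one group would indicate an overall loss of methylation.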
Basic usage
Rscript BAT_overview -i <file> -o <file> [--groups <list>]
Output file
File | Description |
output.pdf | PDF file with basic overview plots. |
Option | Description |
-i | Input file (summary file produced by BAT_summarize) with methylation rates of all samples in both groups. |
-o | Prefix of output file (PDF). |
--groups | Identifier for first and second group, separated by "," (default g1,g2). Column names need to start with the group identifier. |
Other option
Option | Description |
--miss | String indicating how missing values are encoded (default: NA). |
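The requirement that column names start with the group identifier suggests a simple prefix match when assigning samples to groups. The sketch below shows one way this could work; the function name and the header layout (two leading coordinate columns followed by sample columns) are hypothetical, since the exact header of the summary file is not specified here.

```python
def split_columns_by_group(column_names, groups=("g1", "g2")):
    """Map each group identifier to the indices of its sample columns."""
    by_group = {group: [] for group in groups}
    for index, name in enumerate(column_names):
        for group in groups:
            if name.startswith(group):
                by_group[group].append(index)
                break  # a column belongs to at most one group
    return by_group


# Hypothetical header of a summary file.
header = ["chr", "pos", "g1_sampleA", "g1_sampleB", "g2_sampleC"]
assignment = split_columns_by_group(header)
```

Under this matching scheme, identifiers that are prefixes of one another (for example `g1` and `g10`) would be ambiguous, so distinct, non-overlapping group identifiers are the safer choice.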
BAT_annotation
BAT_annotation provides an easy method for inspecting the
methylation of a set of annotation items. For example, these
annotation items could be DMRs (possibly subdivided into hyper- and
hypomethylated ones), transcription factor binding sites, CpG
islands/shores/shelves, or protein-coding and non-coding genes.
It reports the methylation rate for each annotation item per sample and
the average methylation rate per group in a file with bedGraph-related
format. Moreover, several graphics are automatically generated as
visualizations including the distribution of the length of annotation
items (in Cs and nucleotides), boxplots of the methylation rate for
all annotation items per sample or per group, and heatmaps of
methylation rates with a hierarchical clustering on samples and
annotation items.
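The per-item averages can be pictured with the following Python sketch. It is not the BAT_annotation implementation: the function name and data layout are hypothetical, intervals are treated as half-open in the BED sense, and the group mean is taken as the mean of the per-sample averages, which is an assumption about how the averaging is done.

```python
def item_averages(items, positions, n_group1):
    """Average methylation per annotation item, per sample and per group.

    items:     list of (chrom, start, end, identifier), half-open intervals.
    positions: list of (chrom, pos, rates), group 1 samples first.
    """
    result = {}
    for chrom, start, end, identifier in items:
        covered = [rates for c, pos, rates in positions
                   if c == chrom and start <= pos < end]
        if not covered:
            continue  # no filtered positions fall into this item
        per_sample = []
        for sample_rates in zip(*covered):  # one tuple of rates per sample
            present = [r for r in sample_rates if r is not None]
            per_sample.append(sum(present) / len(present) if present else None)
        group_means = []
        for group in (per_sample[:n_group1], per_sample[n_group1:]):
            present = [m for m in group if m is not None]
            group_means.append(sum(present) / len(present) if present else None)
        result[identifier] = {
            "n_positions": len(covered),
            "per_sample": per_sample,
            "group_means": group_means,
        }
    return result


# Hypothetical annotation items and filtered positions (two samples per group).
items = [("chr1", 100, 200, "dmr_1"), ("chr1", 500, 600, "dmr_2")]
positions = [
    ("chr1", 100, [0.8, 0.6, 0.2, 0.4]),
    ("chr1", 150, [0.4, 0.2, 0.2, 0.0]),
    ("chr1", 200, [1.0, 1.0, 1.0, 1.0]),
]
averages = item_averages(items, positions, n_group1=2)
```

The linear scan over all positions per item keeps the sketch short; for genome-scale data an interval intersection tool such as bedtools, which the option list names as a dependency, is the practical choice.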
Basic usage
BAT_annotation -b <file> -i <file> --groups <list> -o <file>
Output file
File | Description |
output.txt | File containing average methylation rates for each annotation item. Averages are given for each sample and the group means. |
output.pdf | PDF file with annotation item overview plots, i.e. length of annotation items, average methylation per sample in annotation items, heatmaps of average methylation rates. |
Option | Description |
-i | Name of input file (summary file produced by BAT_summarize) with methylation rates of all samples in both groups. |
--groups | Identifier for first and second group, separated by "," (default g1,g2). Column names need to start with the group identifier. |
-b | BedGraph file containing annotation of regions, e.g. TFBS, hypo/hypermethylated regions, genes, CpG islands/shores. Format: chr <tab> start <tab> end <tab> unique_annotation_identifier <tab> group_label. |
-o | Prefix of output files (default: current directory/annotation). |
Option | Description |
--bedtools | Path to bedtools executable. Required if bedtools executable is not in PATH. For installation, manual or problems please go to the bedtools website. |
-R | Path to R executable. Required if R executable is not in PATH. For installation, manual or problems please go to the R website. |
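Before running BAT_annotation it can be useful to check that the annotation file follows the five-column, tab-separated format given for the -b option. The validator below is a minimal sketch: the class and function names are hypothetical, and the checks (field count, integer coordinates, start before end, unique identifiers) are reasonable guesses at what a well-formed file needs rather than a description of the tool's own validation.

```python
from typing import NamedTuple


class AnnotationItem(NamedTuple):
    chrom: str
    start: int
    end: int
    identifier: str
    group_label: str


def parse_annotation_line(line):
    """Parse one line: chr, start, end, unique identifier, group label."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 tab-separated fields, got {len(fields)}")
    chrom, start, end, identifier, group_label = fields
    item = AnnotationItem(chrom, int(start), int(end), identifier, group_label)
    if item.start >= item.end:
        raise ValueError(f"start must be smaller than end for {identifier}")
    return item


def parse_annotation(lines):
    """Parse all lines and reject duplicate annotation identifiers."""
    items, seen = [], set()
    for line in lines:
        item = parse_annotation_line(line)
        if item.identifier in seen:
            raise ValueError(f"duplicate identifier: {item.identifier}")
        seen.add(item.identifier)
        items.append(item)
    return items


# Hypothetical annotation lines.
items = parse_annotation([
    "chr1\t1000\t1500\tdmr_1\thyper\n",
    "chr2\t200\t900\tdmr_2\thypo\n",
])
```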