Example data set
For a quick start with BAT and its modules, we have assembled a small example data set, adapted from data used in a recent lymphoma publication (link). It is a subset of a paired-end human WGBS data set comprising 8 samples (S1-S8), each with two sequencing runs. The samples are split into two groups (control: S1-S4, case: S5-S8). You can either start with the minimal example data set or use the run script and additional files to test BAT.
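The sample-to-group assignment described above can be written down as a small lookup. The following is only a sketch: the function name `sample_group` and the tab-separated sheet it prints are hypothetical helpers for illustration, not an input format required by BAT.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the group of an example sample.
# Group assignment as stated in the text: S1-S4 control, S5-S8 case.
sample_group() {
  case "$1" in
    S[1-4]) echo "control" ;;
    S[5-8]) echo "case" ;;
    *) echo "unknown sample: $1" >&2; return 1 ;;
  esac
}

# Print a tab-separated sheet with one line per sample: sample, group.
for i in 1 2 3 4 5 6 7 8; do
  printf 'S%s\t%s\n' "$i" "$(sample_group "S$i")"
done
```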
Minimum input data
- Download dataset (BAT_example_input.tar.gz, 54 MB) and extract it
$ tar xvf BAT_example_input.tar.gz
- BAT_mapping, BAT_mapping_stat, BAT_merging, and BAT_calling can be tested with the raw data of sample S5: S5.1_R1.fastq.gz, S5.1_R2.fastq.gz
This minimum example data set comprises the raw reads of one sample and the already called, but not yet filtered, methylation data of that sample and the other seven samples. The samples belong to two groups of four samples each. The unmapped sample consists of two sequencing runs, whose reads can be mapped to a reduced genome and merged prior to methylation calling. Together, the raw reads and the called methylation data enable you to run the entire toolkit on a small example region.
The example pages show each tool call in a basic form, list all output files, and present any plots that are produced.
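Before mapping the raw reads, it can be useful to confirm that both mates of a paired-end run were extracted completely. The sketch below is not part of BAT; `count_reads` and `check_pair` are hypothetical helper names, and the read count assumes the usual four lines per FASTQ record.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reads in a gzipped FASTQ file
# (assumes 4 lines per record).
count_reads() {
  local lines
  [ -r "$1" ] || { echo "cannot read $1" >&2; return 1; }
  lines=$(gzip -dc "$1" | wc -l)
  echo $(( lines / 4 ))
}

# Hypothetical helper: succeed only if both mate files hold the same
# number of reads.
check_pair() {
  local n1 n2
  n1=$(count_reads "$1") || return 1
  n2=$(count_reads "$2") || return 1
  if [ "$n1" -eq "$n2" ]; then
    echo "OK: $n1 read pairs"
  else
    echo "MISMATCH: $1 has $n1 reads, $2 has $n2 reads" >&2
    return 1
  fi
}

# Example, using the file names of the minimum example data set:
# check_pair S5.1_R1.fastq.gz S5.1_R2.fastq.gz
```

A mismatch usually points to an interrupted download or extraction rather than to a problem with the data itself.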
Extended input data
We recommend downloading the entire BAT example directory (985 MB), since it provides a variety of additional files needed to run all BAT tools, e.g., a reduced reference genome, some gene annotations, and gene expression data. The directory BAT_example_structure contains the following basic folder structure:
- raw: two lanes of paired-end data
- mapped: output folder for mapped data
- called: gzipped vcf files of all samples
- data: output folder for filtered methylation files for all samples
- annotation: folder for annotation-dependent analysis
- DMRs: output folder for DMR-dependent analysis
- genomes/hg19: reduced hg19 genome fasta, annotation of some TFBS, reduced gene annotation, and the chromosome size file
- expression: gene expression files for all samples
- circos: circos-dependent data and output folder for the circos plot
Extract the example directory using
$ tar xvf BAT_example_structure.tar.gz
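After extraction, a quick check that all expected folders are present can save a failed run later. The following is a minimal sketch, not part of BAT: `check_structure` is a hypothetical helper, and it assumes that the folders named in the list above sit directly below the extracted root directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which expected example folders are missing
# below a root directory. Folder names are taken from the list above;
# that they sit directly below the root is an assumption.
check_structure() {
  local root=$1 missing=0 d
  for d in raw mapped called data annotation DMRs genomes/hg19 expression circos; do
    if [ ! -d "$root/$d" ]; then
      echo "missing: $d"
      missing=1
    fi
  done
  return "$missing"
}

# Example:
# check_structure BAT_example_structure
```

The function prints nothing and returns 0 when every folder exists, so it can be used as a guard at the top of a script.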
Using the example data, the directory structure, and the provided files described above, the following scripts can be tested:
- BAT_mapping
- BAT_mapping_stat
- BAT_merging
- BAT_calling
- BAT_filter_vcf
- BAT_summarize
- BAT_overview
- BAT_annotation
- BAT_DMRcalling
- BAT_correlating
For each script, we provide a link to a more detailed explanation (including a description of all parameters), the example run command, the output, and a short glimpse at the output files and plots.
All calls for running the example data are given in the run script, which is based on the given directory structure.
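Before launching the run script, it may help to confirm that every tool it calls can be found. The sketch below assumes that the BAT scripts are installed as executables on your PATH under exactly the names listed above; that is an assumption about your installation, and `missing_tools` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every given tool name that is not found on PATH.
missing_tools() {
  local t
  for t in "$@"; do
    command -v "$t" >/dev/null 2>&1 || echo "$t"
  done
}

# Example, using the script names listed above (assumed to be on PATH):
# missing_tools BAT_mapping BAT_mapping_stat BAT_merging BAT_calling \
#   BAT_filter_vcf BAT_summarize BAT_overview BAT_annotation \
#   BAT_DMRcalling BAT_correlating
```

An empty output means all tools were found.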