trimAln.pl
NAME
trimAln.pl - part of the NcDNAlign alignment pipeline, step (4)
Remove flanking gaps (trim leading and trailing gaps and N's) from all *.aln files in the alignment directory, and beautify the alignments by length maximization.
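The flank trimming can be illustrated with a short Python sketch. It is not taken from trimAln.pl (a Perl script) and only shows one plausible reading of the sentence above: leading and trailing columns are dropped while any sequence has a gap or an N in them. The function name `trim_flanks`, the dict-of-strings alignment representation, and the "any row" rule are assumptions made for this example.

```python
def trim_flanks(rows, junk="-Nn"):
    """Drop leading and trailing columns in which any row has a gap or N.

    rows: dict mapping sequence name -> gapped sequence (all of equal length).
    Assumption: a flanking column is removed as soon as ONE row is a gap/N.
    """
    length = len(next(iter(rows.values())))

    def clean(col):
        # A column is clean if no row carries a gap or N at this position.
        return all(seq[col] not in junk for seq in rows.values())

    start = 0
    while start < length and not clean(start):
        start += 1
    end = length
    while end > start and not clean(end - 1):
        end -= 1
    # Internal gaps between start and end are left untouched.
    return {name: seq[start:end] for name, seq in rows.items()}


example = {"ref": "--ACGTNN", "sp2": "NNAC-TA-"}
print(trim_flanks(example))  # {'ref': 'ACGT', 'sp2': 'AC-T'}
```

Only the flanks are affected; the internal gap in `sp2` survives.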
SYNOPSIS
trimAln.pl [options]
OPTIONS
- -c, --conf FILE
-
Path to central NcDNAlign configuration file [REQUIRED]
- -o, --out [0|1]
-
Print detailed results to STDOUT: ON (1) or OFF (0). Default: 0.
- -g, --maxGaps INT
-
Minimum number of consecutive gaps that triggers the beautification algorithm. A sequence may contain at most maxGaps-1 consecutive gaps. Default: 120, a suitable window size for third-party programs (e.g. RNAz).
- -z, --maxZeros INT
-
Minimum number of consecutive zeros that forces a DIALIGN alignment to be split. The DIALIGN alignment score string (the sum-of-weights score, which indicates the degree of local similarity among sequences) may contain at most maxZeros-1 consecutive zeros. Default: 20, an appropriate value for splitting one alignment into two.
- -d, --dropRef [0|1]
-
May the reference sequence be dropped from an alignment for beautification purposes? 0=NO (default), 1=YES
- -s, --silent [0|1]
-
Silent mode: avoid printing to STDOUT. 0=OFF (default), 1=ON
- --stat [0|1]
-
Calculate statistics OFF(0, default) or ON(1). Statistics are printed to STDOUT and into the 'aln.statistics' file. Not fully implemented yet.
- -r, --realign [0|1]
-
Should a final realignment step using CLUSTALW be performed? 1=YES (default), 0=NO
- -t, --tree [0|1]
-
If the realignment step using CLUSTALW is performed, should the *.dnd tree file be stored? 0=NO (default), 1=YES. In large-scale screens there can be many files, and the tree information is not always needed.
- -f, --format [CLUSTAL|FASTA|MAF]
-
Output alignment format. Default: CLUSTAL
- -v, --version
-
Prints version information and exits.
- -h, --help
-
Prints a short help message and exits.
- --man
-
Prints a detailed manual page and exits.
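To make the --maxGaps threshold concrete, here is a minimal Python sketch of the check it implies: an alignment qualifies for beautification once any sequence contains at least maxGaps consecutive gaps, so maxGaps-1 is the longest run still tolerated. This is an illustration only, not code from trimAln.pl; the function names `longest_gap_run` and `needs_beautification` and the dict-of-strings alignment representation are assumptions.

```python
import re


def longest_gap_run(seq, gap="-"):
    """Length of the longest run of consecutive gap characters in one row."""
    runs = re.findall(re.escape(gap) + "+", seq)
    return max((len(r) for r in runs), default=0)


def needs_beautification(rows, max_gaps=120):
    """True if any row contains at least max_gaps consecutive gaps."""
    return any(longest_gap_run(seq) >= max_gaps for seq in rows.values())


rows = {"ref": "ACGT" + "-" * 5 + "ACGT", "sp2": "ACGTACGTNACGT"}
print(needs_beautification(rows, max_gaps=5))  # True: a run of 5 reaches the threshold
print(needs_beautification(rows, max_gaps=6))  # False: 5 = max_gaps-1 is still allowed
```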
DESCRIPTION
Regardless of the alignment algorithm applied, MSAs built from pairwise alignments may need beautification, typically because sequence lengths vary too much. DIALIGN2-2 prints a sum-of-weights score for each alignment column, indicating the degree of local similarity among the sequences. We use this score to split one alignment into two wherever at least --maxZeros consecutive zeros occur, which separates valid local alignments from non-alignable regions (20 could be an appropriate default value).
Moreover, we test each alignment for two conditions: (1) whether its length exceeds the minimal length (the minimal overlap length is the same value used for retaining local alignments in the configuration file) and (2) whether it contains --maxGaps consecutive gaps (120 could be a useful threshold for the many alignment processing tools that apply sliding windows). Obviously, it is useless to scan sequences that contain more consecutive gaps than the length of the sliding window. If (1) or (2) is true, the beautification algorithm is applied to the alignment until the number of aligned sequences exceeds the minimal number of species in the screen or no improvement is achieved.
The trimmed/improved alignment files are written out, and a BED file 'trimmedAln.bed' is created that holds the coordinates of the trimmed alignments (not every alignment format handles coordinates).
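The splitting step driven by --maxZeros can be sketched in Python as follows. This is a simplified illustration, not the Perl implementation in trimAln.pl. It assumes the per-column DIALIGN score is already available as a list of integers (one per alignment column); the function names `split_on_zero_runs` and `split_alignment` are hypothetical. Runs of zeros shorter than the threshold stay inside a block, and the sketch cuts at every qualifying run, so it may return more than two blocks.

```python
def split_on_zero_runs(scores, max_zeros=20):
    """Return (start, end) column ranges, end exclusive, that remain after
    cutting out every run of at least max_zeros consecutive zero scores."""
    segments = []
    seg_start = 0  # first column of the block currently being collected
    i, n = 0, len(scores)
    while i < n:
        if scores[i] == 0:
            j = i
            while j < n and scores[j] == 0:
                j += 1  # j ends one past the last zero of this run
            if j - i >= max_zeros:
                if i > seg_start:
                    segments.append((seg_start, i))
                seg_start = j  # next block starts after the zero run
            i = j
        else:
            i += 1
    if n > seg_start:
        segments.append((seg_start, n))
    return segments


def split_alignment(rows, scores, max_zeros=20):
    """Cut every row of the alignment at the same column ranges."""
    return [
        {name: seq[a:b] for name, seq in rows.items()}
        for a, b in split_on_zero_runs(scores, max_zeros)
    ]


rows = {"ref": "ACGTTTTTGGCA", "sp2": "ACGAAAAAGGCA"}
scores = [9, 9, 9] + [0] * 5 + [8] * 4  # toy score string, one value per column
for block in split_alignment(rows, scores, max_zeros=5):
    print(block)
# {'ref': 'ACG', 'sp2': 'ACG'}
# {'ref': 'GGCA', 'sp2': 'GGCA'}
```

Each resulting block would then still have to pass the minimal-length and --maxGaps tests described above.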
EXAMPLES
$ trimAln.pl -c config-file.cfg
AUTHORS
Dominic Rose (dominic@bioinf.uni-leipzig.de)