NcDNAlign

NcDNAlign - Plausible Multiple Alignments of Non-Protein-Coding Genomic Sequences

Dominic Rose, Jana Hertel, Kristin Reiche, Peter F Stadler, Jörg Hackermüller


Documentation

Manual Pages

trimAln.pl


NAME

trimAln.pl - part of the NcDNAlign alignment pipeline, step (4)

Remove flanking gaps (trim leading and trailing gaps and N's) from all *.aln files in the alignment directory. Beautify the alignments by length maximization.
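The trimming step can be illustrated with a minimal Python sketch. It is not the trimAln.pl code: the function name `trim_flanks` is hypothetical, and it assumes that a flanking column is dropped as long as any sequence holds a gap or an N in it. Gaps and N's inside the alignment are left untouched.

```python
def trim_flanks(rows, junk="-Nn"):
    """Drop leading and trailing columns in which any row holds a gap or N.

    Assumption: all rows have equal length (they come from one alignment).
    """
    if not rows:
        return rows
    width = len(rows[0])

    def is_clean(col):
        # A column is clean when no row has a gap or N at this position.
        return all(row[col] not in junk for row in rows)

    start = 0
    while start < width and not is_clean(start):
        start += 1
    end = width
    while end > start and not is_clean(end - 1):
        end -= 1
    return [row[start:end] for row in rows]


aln = ["--ACGTNACG-", "NNACGT-ACGT", "-AACGTTACG-"]
print(trim_flanks(aln))
```

Only the two leading columns and the last column are removed here; the internal N and gap stay in place.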


SYNOPSIS

trimAln.pl [options]


OPTIONS

-c, --conf FILE

Path to central NcDNAlign configuration file [REQUIRED]

-o, --out [0|1]

Print detailed results to STDOUT: ON (1) or OFF (0). Default: 0.

-g, --maxGaps INT

The minimal number of consecutive gaps that triggers the beautification algorithm. The maximal allowed number of consecutive gaps in a sequence is therefore maxGaps-1. Default is 120, an appropriate window size for third-party programs (e.g. RNAz).
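As a rough illustration of this threshold, the Python sketch below checks whether any sequence of an alignment contains a run of at least maxGaps gap characters. The function names are hypothetical and the code is not taken from trimAln.pl; it assumes gaps are written as '-'.

```python
import re


def longest_gap_run(seq, gap="-"):
    """Length of the longest run of consecutive gap characters in seq."""
    runs = re.findall(re.escape(gap) + "+", seq)
    return max((len(r) for r in runs), default=0)


def needs_beautification(rows, max_gaps=120):
    """True if any row contains max_gaps or more consecutive gaps."""
    return any(longest_gap_run(row) >= max_gaps for row in rows)


print(longest_gap_run("AC---GT-A"))
print(needs_beautification(["AC---GT-A"], max_gaps=3))
```

With max_gaps=3 the example row is flagged, because a run of exactly three gaps already reaches the threshold; runs of up to maxGaps-1 gaps pass.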

-z, --maxZeros INT

The minimal number of consecutive zeros that forces a DIALIGN alignment to be split. The maximal allowed number of consecutive zeros in the DIALIGN alignment score string (the sum of weights, which indicates the degree of local similarity among the sequences) is therefore maxZeros-1. Default is 20, an appropriate value for splitting one alignment into two.
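The splitting rule can be sketched in Python as follows. This is an illustration under assumptions, not the trimAln.pl implementation: the score string is assumed to be available as one digit character per alignment column, the function name `split_on_zero_runs` is hypothetical, and the zero-score columns themselves are assumed to be discarded.

```python
import re


def split_on_zero_runs(scores, max_zeros=20):
    """Return (start, end) column ranges separated by long zero runs.

    A run of max_zeros or more '0' characters acts as a separator;
    shorter zero runs stay inside their segment. Ranges are half-open.
    """
    separator = re.compile("0{%d,}" % max_zeros)
    segments = []
    start = 0
    for match in separator.finditer(scores):
        if match.start() > start:
            segments.append((start, match.start()))
        start = match.end()
    if start < len(scores):
        segments.append((start, len(scores)))
    return segments


print(split_on_zero_runs("9870012000345", max_zeros=3))
```

In the example, the run of two zeros is shorter than the threshold and stays inside the first segment, while the run of three zeros separates the two segments.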

-d, --dropRef [0|1]

Allow the reference sequence to be dropped from an alignment for beautification purposes? 0=NO (default), 1=YES.

-s, --silent [0|1]

Silent mode: suppress printing to STDOUT. 0=OFF (default), 1=ON.

--stat [0|1]

Calculate statistics OFF(0, default) or ON(1). Statistics are printed to STDOUT and into the 'aln.statistics' file. Not fully implemented yet.

-r, --realign [0|1]

Perform a final realignment step using CLUSTALW? 1=YES (default), 0=NO.

-t, --tree [0|1]

If the realignment step using CLUSTALW is performed, store the *.dnd tree file? 0=NO (default), 1=YES. Large-scale screens can produce many files, and the tree information is not always needed.

-f, --format [CLUSTAL|FASTA|MAF]

Output alignment format. Default: CLUSTAL

-v, --version

Prints version information and exits.

-h, --help

Prints a short help message and exits.

--man

Prints a detailed manual page and exits.


DESCRIPTION

Regardless of the alignment algorithm applied, MSAs built from pairwise alignments may need beautification, typically because the sequence lengths vary too much. DIALIGN2-2 prints a sum-of-weights score for each alignment column, indicating the degree of local similarity among the sequences. We use this score to split one alignment into two wherever at least --maxZeros consecutive zeros are read, which separates valid local alignments from non-alignable regions (the default is 20).

Moreover, we test the alignments for (1) whether their length exceeds the minimal length (the minimal overlap length is the same value that the configuration file uses for retaining local alignments) and (2) whether they contain --maxGaps consecutive gaps (120 is a useful threshold for many alignment processing tools that apply sliding windows). It is pointless to scan sequences that contain more consecutive gaps than the length of the sliding window. If (1) or (2) is true, the beautification algorithm is applied to the alignment until the number of aligned sequences exceeds the minimal number of species in the screen or no improvement is achieved.

Trimmed and improved alignment files are written out, and a BED file 'trimmedAln.bed' holding the coordinates of the trimmed alignments is created, since not every alignment format stores coordinates.
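How trimming shifts the coordinates recorded in a BED file can be sketched as follows. This is a hedged Python illustration, not the way trimAln.pl computes 'trimmedAln.bed': the function name and arguments are hypothetical, only the first four BED columns (chrom, start, end, name) are written, and the sequence is assumed to lie on the plus strand with a 0-based start.

```python
def trimmed_bed_line(chrom, start, aligned_row, col_from, col_to, name):
    """BED line for alignment columns [col_from, col_to) of one row.

    start is the 0-based genomic position of the first residue in
    aligned_row. Gap characters ('-') do not consume genomic positions.
    """
    residues_before = sum(1 for c in aligned_row[:col_from] if c != "-")
    residues_kept = sum(1 for c in aligned_row[col_from:col_to] if c != "-")
    new_start = start + residues_before
    new_end = new_start + residues_kept  # BED end is exclusive
    return f"{chrom}\t{new_start}\t{new_end}\t{name}"


# Keep columns 2..9 of a row whose first residue sits at position 1000.
print(trimmed_bed_line("chr1", 1000, "-AACGTTACG-", 2, 10, "aln1"))
```

One residue is trimmed on the left, so the start moves from 1000 to 1001, and the eight retained residues give an end of 1009.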


EXAMPLES

$ trimAln.pl -c config-file.cfg


AUTHORS

Dominic Rose (dominic@bioinf.uni-leipzig.de)


AVAILABILITY

http://www.bioinf.uni-leipzig.de/Software/NcDNAlign/