bbq
is a powerful command-line tool for discovering clustered occurences of transcription factor binding sites in genomes. This is done in a multiple-alignment-like fashion by solving a certain geometric optimization problem, the so-called best barbeque problem (explaining the name bbq
).
The implementation provided here supports many practically relevant features, such as weighted sets, p-value based weighting schemes, computing the best h
solutions instead only the best solution, and several others. Most of these features are supported by two algorithms representing two different approaches to solving the given optimization problem.
The main goal of this implementation was to achieve best possible running times - together with a demand for numerous features (and combinations of features), which meant finding the right balance between an extendable and maintainable design and high performance. Templates sometimes lessened the pain of finding the right balance - anyway, anybody who makes it up to this point in the source code should keep in mind that this implementation inevitably is the result of a compromise. Whoever modifies this, should bear in mind to keep it a high-performance implementation.
A few words should be spent on the class design. All routines for scanning command line parameters and reading input files are contained in main.cpp
in the global namespace. The functions in main.cpp
create a suitable instance of the template class weighted_footprint_detector
.
Class weighted_footprint_detector
implements all crucial parts of the underlying algorithms, including string matching, computation of interval arrangements and finding best intersections of cells (based on algorithm (A1) or (A2)). Class weighted_footprint_detector
requires handling (weighted) sets or multisets. For the different versions - weighted/unweighted and sets/multisets - different classes can be passed as a template parameter to weighted_footprint_detector
: weighted_variable_bitset
for weighted bitsets of arbitrary cardinality, variable_bitset
for arbitrary length unweighted bitsets, uint_bitset
and uint64_bitset
for unweighted bitsets of limited cardinality (i.e., 32 or 64 elements), bit_multi_set
for unweighted multisets and weighted_bit_multi_set
for weighted multisets.
Due to the performance oriented design, the interface for set-handling classes is somewhat more complex than might be neccessary in a minimum-effort implementation. In case of implementing your own set-handling class, refer to the existing ones. Note that generic programming allows for flexible algorithms essentially without losing efficiency.
All classes are contained in the namespace bbq
.
A rather historical version of bbq
- demonstrating the actual idea behind the algorithms - can be found in the class footprint_detector
.