NetwPartLearn
=============

NetwPartLearn is a simulation tool for the reverse engineering of genetic networks
in the case that not all gene expression levels are known before a transition:

   t  | t+1
-----------------
 ??00 | 1100      Experiment 1: M transition vectors
 ??01 | 0100
 ??10 | 1110
 ??11 | 1001
 ...  | ...
-----------------
 ?0?0 | 0100      Experiment 2: M transition vectors
 ?0?1 | 0110
 ?1?0 | 0001
 ?1?1 | 1000
 ...  | ...
-----------------
 0??0 | 1001      Experiment 3: M transition vectors
 0??1 | 1100
 1??0 | 1010
 1??1 | 0100
 ...  | ...
-----------------
 ?00? | 0101      Experiment 4: M transition vectors
 ?01? | 1101
 ?10? | 1111
 ?11? | 0010

? denotes a gene for which the expression level is unknown at time t. In different
experiments the expression levels of different combinations of genes are known.

Genetic networks are modelled as dynamic Bayesian networks (DBNs) with Boolean
conditional probability tables (CPTs), i.e. the CPTs can be simplified to Boolean
rules. However, the inferred network is a "real" DBN, as its CPTs can in general
not be simplified to Boolean rules.

NetwPartLearn evaluates to what extent the genetic network can be learned from
incomplete data by calculating the sensitivity, the positive predictive value and
the fidelity for each number of parents k, averaging over a sample of B networks.
In this way, NetwPartLearn clarifies how many transition vectors M are needed to
infer a reliable network. It uses a partial learning (PartLearn) strategy to learn
the topology and the expectation maximization implementation of LibB to infer the
parameters.

Please note that this tool is only suitable for small networks (no larger than
25 nodes).

INSTALLATION
============

See INSTALL for details.

NetwPartLearn requires the GNU Scientific Library (GSL), the Perl Compatible
Regular Expression library (PCRE) and LibB. The environment variable LIBBDIR must
be set to the installation directory of LibB:

export LIBBDIR=libb_installation_directory/    (BASH shell)
or
setenv LIBBDIR libb_installation_directory/    (C/TC-shell)

USAGE
=====

NetwPartLearn is a command line based tool. It simulates

1) to what extent a randomly generated DBN with Boolean CPTs can be inferred from
   incomplete data, given that the network can reside in all possible network
   states at time t
2) to what extent a randomly generated DBN with Boolean CPTs can be inferred from
   incomplete data, given that the network is fixed in an attractor
3) 1) and 2) for DBNs generated from the ensemble of the hypothetical haemopoietic
   network, with and without prior knowledge

In modes 1) and 2) two different types of Boolean rules -- canalyzing Boolean rules
or effective Boolean rules -- can be defined for the CPTs.

Output
------

Fidelity, sensitivity and positive predictive value for each k, as well as the
Hamming distance for the whole network, are printed to STDOUT:

M   Fidelity       k
10  0.7402597403   1
15  0.8684210526   1

M   Fidelity       k
10  0.0000000000   2
15  0.0797101449   2

M   Sens           k
10  0.7792207792   1
15  0.9342105263   1

M   ppv            k
10  0.9375000000   1
15  0.9342105263   1

M   Sens           k
10  0.1187500000   2
15  0.2862318841   2

M   ppv            k
10  0.9500000000   2
15  0.9518072289   2

M   hamming
10  12.5500000000
15  11.6300000000

During the simulation the following output files are created:

{k}_rule_can.txt:   File with canalyzing Boolean rules for a given k, k in {1,2,3,4}
{k}_rule_keff.txt:  File with effective Boolean rules for a given k, k in {1,2,3,4}

These files provide fast access to rules with k <= 4. All rules with k > 4 are
generated at runtime.

hamming_{M}.txt:    Files with the Hamming distances between each learned network
                    and the original network for a number of transition vectors M.

logNets_{M}.txt:    Files with the original and inferred networks for a number of
                    transition vectors M. If only the topology is learned, the
                    parameters (conditional probabilities) are labelled with '-1'.

attractors.txt:     If NetwPartLearn is called with the '-attr [S]' or '-attr2'
                    option this file contains the generated networks and their
                    attractors. This file serves as the input file for simulations
                    in which the networks are fixed in an attractor.
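For orientation, the sketch below shows how sensitivity, positive predictive value
and an edge-wise Hamming distance could be computed for a single network. It is an
illustration only (Python) and not part of NetwPartLearn: it assumes the standard
definitions TP/(TP+FN) and TP/(TP+FP) over directed edges, the toy edge sets are
made up, and the exact definitions used by NetwPartLearn (in particular of the
fidelity) as well as the grouping by k and the averaging over B networks are those
described in the referenced paper.

    # Illustration only: toy edge sets, not produced by NetwPartLearn.
    def sens_ppv(true_edges, inferred_edges):
        """Sensitivity = TP/(TP+FN), PPV = TP/(TP+FP) over directed edges."""
        tp = len(true_edges & inferred_edges)
        fn = len(true_edges - inferred_edges)
        fp = len(inferred_edges - true_edges)
        return tp / (tp + fn), tp / (tp + fp)

    def hamming(true_edges, inferred_edges):
        """Number of edges present in exactly one of the two networks."""
        return len(true_edges ^ inferred_edges)

    true_edges     = {(0, 1), (2, 1), (3, 0)}    # (parent, child) pairs
    inferred_edges = {(0, 1), (2, 1), (2, 0)}
    print(sens_ppv(true_edges, inferred_edges))  # (0.67, 0.67)
    print(hamming(true_edges, inferred_edges))   # 2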
Temporary files
---------------

When parameters are learned, the following temporary files are created for each
number of transition vectors M and network S.

Input files to the program LearnBayes of LibB:

constraints_b{S}_{M}.txt:  Constraints for the DBN
gene_b{S}_{M}.names:       Identifiers for the nodes
genenet_b{S}_{M}.txt:      Network topology inferred by NetwPartLearn
gene_b{S}_{M}.data:        Transition vectors

Output files:

final_b{S}_{M}.txt:        Topology and expectation maximization estimates of the
                           parameters
log_libb_b{S}_{M}.txt:     Log file

Examples
--------

1) NetwPartLearn -N 6 -K 5 -n 1 -B 1000 -f 10 -t 3000 -d 20 -noi 1 -a 0.001

   Randomly generates 1000 networks (-B 1000) for each M. All networks have 6 nodes
   (-N 6) and the maximal number of parents is 5 (-K 5). At time t only one gene is
   fixed (-n 1). The simulation starts at M = 10 (-f 10) and stops when M = 3000
   (-t 3000) is reached. M is increased by 20 (-d 20). Noise is introduced into the
   transition vectors (-noi 1). Only the topology is inferred. Network states at
   time t are chosen such that within 2^N states each possible network state is
   observed once. For the chi-square test a significance level of 0.001 (-a 0.001)
   is used.

2) NetwPartLearn -N 6 -K 5 -n 2 -B 1000 -f 10 -t 3000 -d 20 -noi 1 -a 0.001 -aD 0.00001

   The same as 1), but at time t two genes are fixed (-n 2). If a set of two genes
   has been identified as possible parents of a node, a second chi-square test
   evaluates whether both genes are responsible for the influence. The second test
   uses a significance level of 0.00001 (-aD 0.00001).

3) NetwPartLearn -N 6 -K 5 -n 1 -B 1000 -f 10 -t 3000 -d 20 -noi 1 -a 0.001 -aD 0.00001 -par -parDir LIBBOUT

   The same as 1), but the parameters are also learned (-par). Temporary files are
   written to LIBBOUT (-parDir LIBBOUT).

4) NetwPartLearn -N 6 -K 5 -n 1 -B 1000 -f 10 -t 3000 -d 20 -noi 1 -a 0.001 -aD 0.00001 -red

   The same as 1), but the network states at time t are chosen randomly (-red),
   i.e. redundant network states are allowed.

5) NetwPartLearn -N 6 -K 5 -n 1 -B 1000 -f 10 -t 3000 -d 20 -noi 1 -a 0.001 -bonf 36

   The same as 1), but Bonferroni correction is used for the chi-square test, i.e.
   the significance level is reduced to 0.001/36 (-a 0.001, -bonf 36).

6) NetwPartLearn -N 6 -K 5 -n 1 -B 1000 -attr 3

   Randomly generates 1000 networks with 6 nodes and a maximal indegree of 5. The
   networks and all their attractors with at least 3 states are written to
   'attractors.txt'.

7) NetwPartLearn -N 6 -K 5 -n 1 -B 1000 -attr2

   The same as 6), but the attractors must have exactly 2 states.

8) NetwPartLearn -N 6 -K 5 -n 1 -B 1000 -nets NETFILE

   NETFILE contains 1000 networks with 6 nodes and a maximal indegree of 5.
   NetwPartLearn evaluates to what extent those networks can be inferred, given
   that they are fixed in their attractors (provided in NETFILE) and that one gene
   is fixed at time t. NETFILE must have the format of 'attractors.txt'.

9) NetwPartLearn -hyp -B 1000 -f 10 -t 3000 -d 20 -a 0.01 -n 1

   Networks are chosen randomly from the ensemble of the hypothetical haemopoietic
   network.

10) NetwPartLearn -hyp -prior -B 1000 -f 10 -t 3000 -d 20 -a 0.01 -n 1

   The same as 9), but prior knowledge is used to infer the topology.
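The chi-square tests behind the -a, -aD and -bonf options can be pictured with the
sketch below. It is an illustration only (Python with SciPy) and not part of
NetwPartLearn: how the contingency tables are actually built from the transition
vectors and how parent sets are assembled by PartLearn is described in the
referenced paper, and the counts here are toy data.

    # Illustration only: a chi-square independence test with a Bonferroni-corrected
    # significance level, as in example 5). Toy counts, not NetwPartLearn output.
    from scipy.stats import chi2_contingency

    # Rows: expression of a candidate parent gene at time t (0/1);
    # columns: expression of the target gene at time t+1 (0/1).
    table = [[30,  5],
             [ 4, 28]]

    chi2, p, dof, expected = chi2_contingency(table)

    alpha, n_tests = 0.001, 36      # -a 0.001 with Bonferroni correction -bonf 36
    if p < alpha / n_tests:         # significance level reduced to 0.001/36
        print("candidate accepted as possible parent, p = %.3g" % p)
    else:
        print("candidate rejected, p = %.3g" % p)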
Disclaimer and Copyright
========================

NetwPartLearn is free software. It is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Permission is granted for research and educational use and modification so long as
1) the program and any derived works are not redistributed for any fee, other than
media costs, and 2) proper credit is given to the authors and the Interdisciplinary
Centre for Bioinformatics of the University of Leipzig.

ADDITIONAL INFORMATION
======================

For any questions or comments about the software, please send an email to
Kristin Missal (kristin@bioinf.uni-leipzig.de).

If you use NetwPartLearn in your work please cite:

Kristin Missal, Michael A. Cross and Dirk Drasdo
Gene Network Inference from Incomplete Expression Data: Transcriptional Control of
Haemopoietic Commitment. (2005), Bioinformatics Advance Access, bti820.

Acknowledgements: This work was partly supported by the Interdisciplinary Center
for Clinical Research, University of Leipzig (Project N02) and the grant BIZ-6 1/1
from the DFG.