NetwPartLearn
=============

NetwPartLearn is a simulation tool for the reverse engineering of genetic networks
in the case that not all gene expression levels are known before a transition:

   t  | t+1
-----------------
 ??00 | 1100      Experiment 1: M transition vectors
 ??01 | 0100
 ??10 | 1110
 ??11 | 1001
 ...  | ...
-----------------
 ?0?0 | 0100      Experiment 2: M transition vectors
 ?0?1 | 0110
 ?1?0 | 0001
 ?1?1 | 1000
 ...  | ...
-----------------
 0??0 | 1001      Experiment 3: M transition vectors
 0??1 | 1100
 1??0 | 1010
 1??1 | 0100
 ...  | ...
-----------------
 ?00? | 0101      Experiment 4: M transition vectors
 ?01? | 1101
 ?10? | 1111
 ?11? | 0010

? denotes a gene for which the expression level is unknown at time t. In different
experiments the expression levels of different combinations of genes are known.

Genetic networks are modelled as dynamic Bayesian networks (DBNs) with Boolean
conditional probability tables (CPTs), i.e. the CPTs can be simplified to Boolean
rules. However, the inferred network is a "real" DBN, as its CPTs can in general
not be simplified to Boolean rules.

NetwPartLearn evaluates to what extent the genetic network can be learned from
incomplete data by calculating the sensitivity, the positive predictive value and
the fidelity for each number of parents k, averaging over a sample of B networks.
In this way, NetwPartLearn clarifies how many transition vectors M are needed to
infer a reliable network. It uses a partial learning (PartLearn) strategy to learn
the topology and the expectation maximization implementation of LibB to infer the
parameters.

Please note that this tool is only suitable for small networks (no larger than
25 nodes).

INSTALLATION
============

See INSTALL for details.

NetwPartLearn requires the GNU Scientific Library (GSL), the Perl Compatible
Regular Expression library (PCRE) and LibB. The environment variable LIBBDIR must
be set to the installation directory of LibB:

export LIBBDIR=libb_installation_directory/    (BASH shell)
or
setenv LIBBDIR libb_installation_directory/    (C/TC-shell)

USAGE
=====

NetwPartLearn is a command line based tool. It simulates

1) to what extent a randomly generated DBN with Boolean CPTs can be inferred from
   incomplete data, given that the network can reside in all possible network
   states at time t
2) to what extent a randomly generated DBN with Boolean CPTs can be inferred from
   incomplete data, given that the network is fixed in an attractor
3) 1) and 2) for DBNs generated from the ensemble of the hypothetical haemopoietic
   network, with and without prior knowledge

In modes 1) and 2) two different types of Boolean rules -- canalyzing Boolean rules
or effective Boolean rules -- can be defined for the CPTs.

Output
------

Fidelity, sensitivity and positive predictive value for each k, as well as the
Hamming distance for the whole network, are printed to STDOUT:

M   Fidelity       k
10  0.7402597403   1
15  0.8684210526   1

M   Fidelity       k
10  0.0000000000   2
15  0.0797101449   2

M   Sens           k
10  0.7792207792   1
15  0.9342105263   1

M   ppv            k
10  0.9375000000   1
15  0.9342105263   1

M   Sens           k
10  0.1187500000   2
15  0.2862318841   2

M   ppv            k
10  0.9500000000   2
15  0.9518072289   2

M   hamming
10  12.5500000000
15  11.6300000000

During the simulation the following output files are created:

{k}_rule_can.txt:   File with canalyzing Boolean rules for a given k, k in {1,2,3,4}
{k}_rule_keff.txt:  File with effective Boolean rules for a given k, k in {1,2,3,4}

These files provide fast access to rules with k <= 4. All rules with k > 4 are
generated at runtime.

hamming_{M}.txt:    Files with the Hamming distances between each learned network
                    and the original network for a number of transition vectors M.

logNets_{M}.txt:    Files with the original and inferred networks for a number of
                    transition vectors M. If only the topology is learned, the
                    parameters (conditional probabilities) are labelled with '-1'.

attractors.txt:     If NetwPartLearn is called with the '-attr [S]' or '-attr2'
                    option this file contains the generated networks and their
                    attractors. This file serves as the input file for simulations
                    in which the networks are fixed in an attractor.
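For orientation, the sketch below shows how sensitivity, positive predictive value
and an edge-wise Hamming distance could be computed for a single network. It is an
illustration only (Python) and not part of NetwPartLearn: it assumes the standard
definitions TP/(TP+FN) and TP/(TP+FP) over directed edges, the toy edge sets are
made up, and the exact definitions used by NetwPartLearn (in particular of the
fidelity) as well as the grouping by k and the averaging over B networks are those
described in the referenced paper.

    # Illustration only: toy edge sets, not produced by NetwPartLearn.
    def sens_ppv(true_edges, inferred_edges):
        """Sensitivity = TP/(TP+FN), PPV = TP/(TP+FP) over directed edges."""
        tp = len(true_edges & inferred_edges)
        fn = len(true_edges - inferred_edges)
        fp = len(inferred_edges - true_edges)
        return tp / (tp + fn), tp / (tp + fp)

    def hamming(true_edges, inferred_edges):
        """Number of edges present in exactly one of the two networks."""
        return len(true_edges ^ inferred_edges)

    true_edges     = {(0, 1), (2, 1), (3, 0)}    # (parent, child) pairs
    inferred_edges = {(0, 1), (2, 1), (2, 0)}
    print(sens_ppv(true_edges, inferred_edges))  # (0.67, 0.67)
    print(hamming(true_edges, inferred_edges))   # 2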
Temporary files
---------------

When parameters are learned, the following temporary files are created for each
number of transition vectors M and network S.

Input files to the program LearnBayes of LibB:

constraints_b{S}_{M}.txt:  Constraints for the DBN
gene_b{S}_{M}.names:       Identifiers for the nodes
genenet_b{S}_{M}.txt:      Network topology inferred by NetwPartLearn
gene_b{S}_{M}.data:        Transition vectors

Output files:

final_b{S}_{M}.txt:        Topology and expectation maximization estimates of the
                           parameters
log_libb_b{S}_{M}.txt:     Log file

Examples
--------

1) NetwPartLearn -N 6 -K 5 -n 1 -B 1000 -f 10 -t 3000 -d 20 -noi 1 -a 0.001

   Randomly generates 1000 networks (-B 1000) for each M. All networks have 6 nodes
   (-N 6) and the maximal number of parents is 5 (-K 5). At time t only one gene is
   fixed (-n 1). The simulation starts at M = 10 (-f 10) and stops when M = 3000
   (-t 3000) is reached. M is increased by 20 (-d 20). Noise is introduced into the
   transition vectors (-noi 1). Only the topology is inferred. Network states at
   time t are chosen such that within 2^N states each possible network state is
   observed once. For the chi-square test a significance level of 0.001 (-a 0.001)
   is used.

2) NetwPartLearn -N 6 -K 5 -n 2 -B 1000 -f 10 -t 3000 -d 20 -noi 1 -a 0.001 -aD 0.00001

   The same as 1), but at time t two genes are fixed (-n 2). If a set of two genes
   has been identified as possible parents of a node, a second chi-square test
   evaluates whether both genes are responsible for the influence. The second test
   uses a significance level of 0.00001 (-aD 0.00001).

3) NetwPartLearn -N 6 -K 5 -n 1 -B 1000 -f 10 -t 3000 -d 20 -noi 1 -a 0.001 -aD 0.00001 -par -parDir LIBBOUT

   The same as 1), but the parameters are also learned (-par). Temporary files are
   written to LIBBOUT (-parDir LIBBOUT).

4) NetwPartLearn -N 6 -K 5 -n 1 -B 1000 -f 10 -t 3000 -d 20 -noi 1 -a 0.001 -aD 0.00001 -red

   The same as 1), but the network states at time t are chosen randomly (-red),
   i.e. redundant network states are allowed.

5) NetwPartLearn -N 6 -K 5 -n 1 -B 1000 -f 10 -t 3000 -d 20 -noi 1 -a 0.001 -bonf 36

   The same as 1), but Bonferroni correction is used for the chi-square test, i.e.
   the significance level is reduced to 0.001/36 (-a 0.001, -bonf 36).

6) NetwPartLearn -N 6 -K 5 -n 1 -B 1000 -attr 3

   Randomly generates 1000 networks with 6 nodes and a maximal indegree of 5. The
   networks and all their attractors with at least 3 states are written to
   'attractors.txt'.

7) NetwPartLearn -N 6 -K 5 -n 1 -B 1000 -attr2

   The same as 6), but the attractors must have exactly 2 states.

8) NetwPartLearn -N 6 -K 5 -n 1 -B 1000 -nets NETFILE

   NETFILE contains 1000 networks with 6 nodes and a maximal indegree of 5.
   NetwPartLearn evaluates to what extent those networks can be inferred, given
   that they are fixed in their attractors (provided in NETFILE) and that one gene
   is fixed at time t. NETFILE must have the format of 'attractors.txt'.

9) NetwPartLearn -hyp -B 1000 -f 10 -t 3000 -d 20 -a 0.01 -n 1

   Networks are chosen randomly from the ensemble of the hypothetical haemopoietic
   network.

10) NetwPartLearn -hyp -prior -B 1000 -f 10 -t 3000 -d 20 -a 0.01 -n 1

   The same as 9), but prior knowledge is used to infer the topology.
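The chi-square tests behind the -a, -aD and -bonf options can be pictured with the
sketch below. It is an illustration only (Python with SciPy) and not part of
NetwPartLearn: how the contingency tables are actually built from the transition
vectors and how parent sets are assembled by PartLearn is described in the
referenced paper, and the counts here are toy data.

    # Illustration only: a chi-square independence test with a Bonferroni-corrected
    # significance level, as in example 5). Toy counts, not NetwPartLearn output.
    from scipy.stats import chi2_contingency

    # Rows: expression of a candidate parent gene at time t (0/1);
    # columns: expression of the target gene at time t+1 (0/1).
    table = [[30,  5],
             [ 4, 28]]

    chi2, p, dof, expected = chi2_contingency(table)

    alpha, n_tests = 0.001, 36      # -a 0.001 with Bonferroni correction -bonf 36
    if p < alpha / n_tests:         # significance level reduced to 0.001/36
        print("candidate accepted as possible parent, p = %.3g" % p)
    else:
        print("candidate rejected, p = %.3g" % p)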
Disclaimer and Copyright
========================

NetwPartLearn is free software. It is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Permission is granted for research and educational use and modification so long as
1) the program and any derived works are not redistributed for any fee, other than
media costs, and 2) proper credit is given to the authors and the Interdisciplinary
Centre for Bioinformatics of the University of Leipzig.

ADDITIONAL INFORMATION
======================

For any questions or comments about the software, please send an email to
Kristin Missal (kristin@bioinf.uni-leipzig.de).

If you use NetwPartLearn in your work please cite:

Kristin Missal, Michael A. Cross and Dirk Drasdo
Gene Network Inference from Incomplete Expression Data: Transcriptional Control of
Haemopoietic Commitment. (2005), Bioinformatics Advance Access, bti820.

Acknowledgements: This work was partly supported by the Interdisciplinary Center
for Clinical Research, University of Leipzig (Project N02) and the grant BIZ-6 1/1
from the DFG.