A Bayesian Network View on Nested Effects Models
© The Author(s) 2009
Received: 27 June 2008
Accepted: 24 October 2008
Published: 12 November 2008
Nested effects models (NEMs) are a class of probabilistic models that were designed to reconstruct a hidden signalling structure from a large set of observable effects caused by active interventions into the signalling pathway. We give a more flexible formulation of NEMs in the language of Bayesian networks. Our framework constitutes a natural generalization of the original NEM model, since it explicitly states the assumptions that tacitly underlie the original version. Our approach gives rise to new learning methods for NEMs, which have been implemented in the Bioconductor package nem. We validate these methods in a simulation study and apply them to a synthetic lethality dataset in yeast.
Nested effects models (NEMs) are a class of probabilistic models. They aim to reconstruct a hidden signalling structure (e.g., a gene regulatory system) through the analysis of high-dimensional phenotypes (e.g., gene expression profiles) which are consequences of well-defined perturbations of the system (e.g., RNA interference). NEMs were introduced by Markowetz et al. and have been extended by Fröhlich et al. and Tresch and Markowetz; see also the review by Markowetz and Spang. There is an open-source software package "nem" available on the platform Bioconductor [5, 13], which implements a collection of methods for learning NEMs from experimental data. The utility of NEMs has been demonstrated in several biological applications (Drosophila melanogaster, Saccharomyces cerevisiae, the estrogen receptor pathway). The model in its original formulation suffers from some ad hoc restrictions which seemingly are imposed only for the sake of computability. The present paper gives an NEM formulation in the context of Bayesian networks (BNs). In doing so, we provide a motivation for these restrictions by explicitly stating prior assumptions that are inherent to the original formulation. This leads to a natural and meaningful generalization of the NEM model.
The paper is organized as follows. Section 2 briefly recalls the original formulation of NEMs. Section 3 defines NEMs as a special instance of Bayesian networks. In Section 4, we show that this definition is equivalent to the original one if we impose suitable structural constraints. Section 5 exploits the BN framework to shed light onto the learning problem for NEMs. We propose a new approach to parameter learning, and we introduce structure priors that lead to the classical NEM as a limit case. In Section 6, a simulation study compares the performance of our approach to other implementations. Section 7 provides an application of NEMs to synthetic lethality data. In Section 8, we conclude with an outlook on further issues in NEM learning.
2. The Classical Formulation of Nested Effects Models
3. The Bayesian Network Formulation of Nested Effects Models
A Bayesian network describes the joint probability distribution of a finite family of random variables (the nodes) by a directed acyclic graph and by a family of local probability distributions, which we assume to be parameterized by a set of parameters (for details, see, e.g., ). We want to cast the situation of Section 2 in the language of Bayesian networks. Assuming the acyclicity of the graph of the previous section, this is fairly easy. A discussion of how to proceed when the graph contains cycles is given in Section 4. We have to model a deterministic signalling hierarchy in which some components can be probed by measurements, and some components are perturbed in order to measure the reaction of the system as a whole. All these components will be hidden nodes in the sense that no observations will be available for them, and we let the topology between these nodes be identical to that in the classical model. In order to account for the data, we introduce an additional layer of observable variables (observables) in an obvious way: each effect node has an edge pointing to a unique (its) observable node (see Figure 1). Hence, effects and observables are in one-to-one correspondence, and we call the observable attached to an effect its observation.
We slightly abuse notation by writing for the maximum value that is assumed by a node in . Obviously, all hidden nodes are set to 0 or 1 deterministically, given their parents. The local probabilities , , remain arbitrary for the moment. Assume that we have made an intervention into the system by activating a set of nodes . This amounts to cutting all edges that lead to the nodes in and setting their states to value 1. When an intervention is performed, let be the value of . This value is uniquely determined by , as the next lemma shows.
The proof is straightforward though somewhat technical and may be skipped on a first reading. Let be an ordering of the nodes compatible with , which means , . Such an ordering exists because the graph connecting the states is acyclic. The proof is by induction on the order, the base case being trivial. If , there is nothing to prove. Hence, we may assume in the graph which arises from by cutting all edges that lead to a node in . Since , it follows that if and only if for some . This holds exactly if for some (in particular, ). By induction, this is the case if and only if there exists an and a directed path from to , which can then be extended to a path from to .
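The lemma's characterization — an effect node is active under an intervention exactly if some intervened signal has a directed path to it — can be sketched as a simple reachability computation. The following Python is purely illustrative (the graph encoding and function names are ours, not the paper's):

```python
from collections import deque

def effect_states(parents, effects, intervened):
    """For each effect node, return 1 if some intervened node reaches it
    along directed edges, else 0.

    parents: dict mapping node -> set of parent nodes (edges parent -> node)
    effects: iterable of effect-node names
    intervened: set of nodes fixed to state 1 by the intervention
    """
    # Invert the parent map into child lists, then BFS forward
    # from the intervened set.
    children = {}
    for node, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(node)
    reached = set(intervened)
    queue = deque(intervened)
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in reached:
                reached.add(child)
                queue.append(child)
    return {e: int(e in reached) for e in effects}
```

For instance, with a signal chain a → b and an effect e attached to b, intervening on a activates e, while the empty intervention leaves it inactive.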
This formula is very intuitive. It says that if an intervention has been performed, one has to determine the unique current state of each effect node. This, in turn, determines the (conditional) probability distribution of the corresponding observable node, for which one has to calculate the probability of observing the data. The product over all effects then gives the desired result.
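The computation described here — deterministic effect states fix the local distributions, and the per-effect observation probabilities multiply — can be sketched as follows. This is a minimal illustration assuming binary observations with a false-positive rate `alpha` and a false-negative rate `beta` (these parameter names are ours):

```python
import math

def log_likelihood(states, observations, alpha, beta):
    """Log-likelihood of binary observations given deterministic effect states.

    states: dict effect -> 0/1 (state determined by the intervention)
    observations: dict effect -> 0/1 (measured value)
    alpha: P(observe 1 | state 0); beta: P(observe 0 | state 1)
    """
    ll = 0.0
    for effect, state in states.items():
        x = observations[effect]
        if state == 1:
            p = 1 - beta if x == 1 else beta
        else:
            p = alpha if x == 1 else 1 - alpha
        # Summing logs corresponds to the product over all effects.
        ll += math.log(p)
    return ll
```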
4. Specialization to the Original NEM Formulation
with the matrices and . The importance of (9) lies in the fact that it completely separates the estimation steps for and . The information about the topology of the Bayesian network enters the formula merely in the shape of , and the local probability distributions alone define . Hence, prior to learning the topology, one needs to learn the local probabilities only once. Then, finding a Bayesian network that fits the data well means finding a topology which maximizes .
which for transitively closed graphs is exactly the formulation in . It has the advantage that given , the optimal can be calculated exactly and very fast, which dramatically reduces the search space and simplifies the search for a good graph . The BN formulation of NEMs implies via (10) that two graphs are indistinguishable (likelihood equivalent, i.e., they fit all data equally well) if they have the same transitive closure. It is a subject of discussion whether transitive closure of the underlying graph is a desirable property of such a model (think of causal chains which are observed in a stable state) or not (think of the dampening of a signal when passed from one node to another, or of a snapshot of the system where the signalling happens with large time lags); see .
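Likelihood equivalence up to transitive closure can be checked directly: two core graphs fit all data equally well whenever their transitive closures coincide. A sketch using Warshall's algorithm on adjacency matrices (illustrative code, not from the nem package):

```python
def transitive_closure(adj):
    """Warshall's algorithm on an adjacency matrix (list of 0/1 rows)."""
    n = len(adj)
    closure = [row[:] for row in adj]
    for k in range(n):
        for i in range(n):
            if closure[i][k]:
                for j in range(n):
                    if closure[k][j]:
                        closure[i][j] = 1
    return closure

# The chain a -> b -> c and the same chain with the shortcut a -> c
chain    = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
shortcut = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
# have the same transitive closure, hence are likelihood equivalent.
```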
It should be mentioned that the graph topology in our BN formulation of NEMs is necessarily acyclic, whereas the original formulation admits arbitrary graphs. This is only an apparent restriction. Due to the transitivity assumption, effects that connect to a cycle of signals will always react in the same way. This behaviour can also be obtained by arranging the nodes of the cycle in a chain and connecting the effects to the last node of the chain. This even leaves the possibility of connecting other effects to only a subset of the signals in the cycle by attaching them to a node higher up in the chain. As a consequence, admitting cycles does not extend the model class of NEMs in the Bayesian setting.
- Given an observation at an observable together with the state of its parent , the quantity should not depend on the intervention during which the data were obtained, by the defining property of Bayesian networks. However, we learn the ratio separately for each intervention, that is, we learn separate local parameters , which is counterintuitive.
- Reference measurements are used to calculate the ratio , raising the need for a "null" experiment corresponding to an unperturbed observation of the system, which might not be available. The null experiment enters the estimation of each ratio . This introduces an unnecessary asymmetry in the importance of the null intervention relative to the other interventions.
- The procedure uses the data inefficiently: for a given topology, the local probabilities given an inactive, respectively active, effect state could be learned from all interventions that imply that state, providing a broader basis for the estimation.
The method proposed in the last item is much more time-consuming, since the occurring probabilities have to be estimated individually for each topology. However, such a model promises to better capture the real situation, so we develop the theory in this direction.
5. NEM Learning in the Bayesian Network Setting
The hope is that the maximization in (13) can be calculated analytically or at least very efficiently, see . Then, maximization over is again done using standard optimization algorithms. Section 5.2 is devoted to this approach.
5.1. Bayesian Learning of the Local Parameters
The full Bayesian approach in a multinomial setting was introduced by Cooper and Herskovits .
Here, , and are shape parameters, which, for the sake of simplicity, are set to the same value for every effect . This assumption can be easily dropped and different priors may be used for each effect.
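For a single binary observable, integrating the local parameter against a Beta prior yields the familiar closed-form marginal likelihood of the Cooper–Herskovits scheme. A sketch of this computation (the shape parameters `a`, `b` and the count names are our own illustration, not the paper's notation):

```python
from math import lgamma, exp

def log_beta(a, b):
    """Log of the Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal_likelihood(n1, n0, a, b):
    """Log marginal likelihood of n1 ones and n0 zeros with the success
    probability integrated out against a Beta(a, b) prior:
    integral of p^n1 (1-p)^n0 dBeta(a, b)(p) = B(a + n1, b + n0) / B(a, b)."""
    return log_beta(a + n1, b + n0) - log_beta(a, b)
```

With a uniform prior (a = b = 1), one observed 1 and one observed 0 give a marginal likelihood of B(2, 2) / B(1, 1) = 1/6.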
5.2. Maximum Likelihood Learning of the Local Parameters
Let the topology and the interventions be given. For learning the parameters of the local distributions , we perform maximum likelihood estimation in two different settings. The observables are assumed to follow either a binomial distribution or a Gaussian distribution.
For an effect , let its observable be a binary random variable with values in , and let , . The model is then completely parameterized by the topology and .
(the ratios with a denominator of zero are irrelevant for the evaluation of (22) and are set to zero).
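These ML estimates are simple count ratios: pool, across all interventions, the observations whose predicted effect state is 0 (respectively 1), and take the empirical frequency of a discordant observation in each pool. A sketch with illustrative names (the rate names `alpha`, `beta` are ours):

```python
def ml_error_rates(predicted, observed):
    """ML estimates of the false-positive rate alpha = P(x = 1 | state 0)
    and the false-negative rate beta = P(x = 0 | state 1).

    predicted, observed: equal-length lists of 0/1 values, one entry per
    (effect, intervention, replicate) combination, pooled over all data.
    """
    n0 = sum(1 for s in predicted if s == 0)
    n1 = len(predicted) - n0
    fp = sum(1 for s, x in zip(predicted, observed) if s == 0 and x == 1)
    fn = sum(1 for s, x in zip(predicted, observed) if s == 1 and x == 0)
    # Ratios with a zero denominator are irrelevant and set to zero.
    alpha = fp / n0 if n0 else 0.0
    beta = fn / n1 if n1 else 0.0
    return alpha, beta
```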
There is an analogous way of doing ML estimation in the case of continuous observable variables if one assumes to be a normal distribution with mean and variance , , .
(quotients with a denominator of zero are again irrelevant for the evaluation of (24) and are set to zero). Note that in both the discrete and the continuous case, depends on the topology , since the topology determines the values of , , .
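In the continuous case the analogous procedure splits the measurements by predicted state and takes the sample mean and (biased, ML) variance within each group. A sketch under the two-group normal model described above (function names are ours):

```python
def ml_gaussian_params(predicted, observed):
    """ML estimates (mean, variance) for the state-0 and state-1 groups.

    predicted: list of 0/1 predicted effect states (topology-dependent).
    observed: list of continuous measurements, aligned with `predicted`.
    """
    def fit(values):
        n = len(values)
        if n == 0:
            return 0.0, 1.0  # group absent under this topology; irrelevant
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n  # ML (biased) variance
        return mean, var

    group0 = [x for s, x in zip(predicted, observed) if s == 0]
    group1 = [x for s, x in zip(predicted, observed) if s == 1]
    return fit(group0), fit(group1)
```

Note that, as in the discrete case, the grouping (and hence the estimate) depends on the topology, since the topology determines the predicted states.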
5.3. Structure Learning
For the penalty parameter , this is the original NEM restriction. If , each marginal node can be assigned to all suitable core nodes. As a consequence, there is always a best scoring topology with an empty core graph. A positive penalty makes signalling to the marginal nodes "expensive" relative to signalling in the core graph. It is unclear how to choose the penalty optimally, so we stick to the choice for the applications. Simulation studies have shown that a simple gradient ascent algorithm does very well in optimizing the topology of the Bayesian network, compared to other methods that have been proposed .
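The gradient ascent (greedy hill-climbing) search mentioned above toggles single edges and keeps a move whenever it improves the penalized score. A generic sketch with a user-supplied score function (all names illustrative; a full implementation would additionally enforce structural constraints such as acyclicity on each candidate):

```python
from itertools import product

def hill_climb(n, score, penalty, max_iter=100):
    """Greedy edge-toggle search over n-node adjacency matrices.

    score: callable taking an adjacency matrix, returning a log-likelihood.
    penalty: cost subtracted per edge (penalty = 0 gives the unpenalized search).
    """
    adj = [[0] * n for _ in range(n)]

    def penalized(a):
        return score(a) - penalty * sum(map(sum, a))

    best = penalized(adj)
    for _ in range(max_iter):
        improved = False
        for i, j in product(range(n), repeat=2):
            if i == j:
                continue
            adj[i][j] ^= 1  # toggle edge i -> j
            s = penalized(adj)
            if s > best:
                best, improved = s, True  # keep the improving move
            else:
                adj[i][j] ^= 1  # revert
        if not improved:
            break  # local maximum reached
    return adj, best
```

As a toy check, a score rewarding only the edge 0 → 1 with weight 2 under penalty 1 leads the search to add exactly that edge.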
6.1. Network and Data Sampling
where is an appropriate normalization constant. Binary data (1 = effect, 0 = no effect) was simulated for the perturbation of each signal in the created network using 4 replicate measurements with type-I and type-II error rates and , which were drawn uniformly from and for each perturbation separately. This simulates individual measurement error characteristics for each experiment.
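The sampling scheme just described — per-perturbation noise rates drawn uniformly, then replicate binary readouts flipped accordingly — can be sketched as follows. The rate intervals and names here are illustrative placeholders; the paper's exact intervals are not reproduced:

```python
import random

def simulate_perturbation(true_effects, n_replicates=4,
                          alpha_range=(0.01, 0.2), beta_range=(0.01, 0.2),
                          rng=random):
    """Simulate noisy binary readouts for one perturbation experiment.

    true_effects: dict effect -> 0/1 true state under this perturbation.
    Each perturbation draws its own type-I rate alpha and type-II rate beta,
    simulating individual error characteristics per experiment.
    """
    alpha = rng.uniform(*alpha_range)  # P(report 1 | true state 0)
    beta = rng.uniform(*beta_range)    # P(report 0 | true state 1)
    data = {}
    for effect, state in true_effects.items():
        flip_prob = beta if state == 1 else alpha
        # Flip each replicate independently with the state's error rate.
        data[effect] = [state ^ (rng.random() < flip_prob)
                        for _ in range(n_replicates)]
    return data
```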
8. Summary and Outlook
Some aspects of the classical NEM concept appear in a different light when stated in the BN framework. These are mainly threefold: (1) the learning of the local parameters, for which we proposed new learning rules; (2) the structural constraints, which can be cast as priors on the NEM topology; (3) the distinction between hidden and observable nodes, which can differ from that between core nodes and marginal nodes.
We proposed some new lines of investigation, such as a full Bayesian approach for the evaluation of , and a smooth structure prior with continuous penalty parameter . In the BN framework it is much easier to implement, for example, a Boolean logic for signal transduction that is less simplistic than the one in the current model. A straightforward application of NEMs in their BN formulation to synthetic lethality data demonstrated the potential of the NEM method, with the purpose of stimulating further research in that field.
The authors would like to thank Peter Bühlmann and Daniel Schöner for proposing the application of NEMs to synthetic lethality data. This work was supported by the Deutsche Forschungsgemeinschaft through the Sonderforschungsbereich SFB 646. H. Fröhlich is funded by the National Genome Research Network (NGFN) of the German Federal Ministry of Education and Research (BMBF) through the platforms SMP Bioinformatics (OIGR0450) and SMP RNA (OIGR0418).
- Markowetz F, Bloch J, Spang R: Non-transcriptional pathway features reconstructed from secondary effects of RNA interference. Bioinformatics 2005, 21(21):4026-4032. 10.1093/bioinformatics/bti662
- Fröhlich H, Fellmann M, Sültmann H, Poustka A, Beissbarth T: Estimating large-scale signaling networks through nested effect models with intervention effects from microarray data. Bioinformatics 2008, 24(22):2650-2656. 10.1093/bioinformatics/btm634
- Tresch A, Markowetz F: Structure learning in nested effects models. Statistical Applications in Genetics and Molecular Biology 2008, 7(1), article 9.
- Markowetz F, Spang R: Inferring cellular networks—a review. BMC Bioinformatics 2007, 8(supplement 6):1-17.
- Gentleman RC, Carey VJ, Bates DM, et al.: Bioconductor: open software development for computational biology and bioinformatics. Genome Biology 2004, 5(10), article R80:1-16. 10.1186/gb-2004-5-10-r80
- Markowetz F, Kostka D, Troyanskaya OG, Spang R: Nested effects models for high-dimensional phenotyping screens. Bioinformatics 2007, 23(13):i305-i312. 10.1093/bioinformatics/btm178
- Froehlich H, Fellmann M, Sueltmann H, Poustka A, Beissbarth T: Large scale statistical inference of signaling pathways from RNAi and microarray data. BMC Bioinformatics 2007, 8, article 386:1-15.
- Neapolitan RE: Learning Bayesian Networks. Prentice Hall, Upper Saddle River, NJ, USA; 2003.
- Jacob J, Jentsch M, Kostka D, Bentink S, Spang R: Detecting hierarchical structure in molecular characteristics of disease using transitive approximations of directed graphs. Bioinformatics 2008, 24(7):995-1001. 10.1093/bioinformatics/btn056
- Cooper GF, Herskovits E: A Bayesian method for the induction of probabilistic networks from data. Machine Learning 1992, 9(4):309-347.
- Ye P, Peyser BD, Pan X, Boeke JD, Spencer FA, Bader JS: Gene function prediction from congruent synthetic lethal interactions in yeast. Molecular Systems Biology 2005, 1, article 2005.0026. 10.1038/msb4100034
- Mukherjee S, Speed TP: Network inference using informative priors. Proceedings of the National Academy of Sciences of the United States of America 2008, 105(38):14313-14318. 10.1073/pnas.0802272105
- Fröhlich H, Beißbarth T, Tresch A, et al.: Analyzing gene perturbation screens with nested effects models in R and bioconductor. Bioinformatics 2008, 24(21):2549-2550. 10.1093/bioinformatics/btn446
- Le Meur N, Gentleman R: Modeling synthetic lethality. Genome Biology 2008, 9(9), article R135:1-10. 10.1186/gb-2008-9-9-r135
- Tong AHY, Lesage G, Bader GD, et al.: Global mapping of the yeast genetic interaction network. Science 2004, 303(5659):808-813. 10.1126/science.1091317
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.