# A Bayesian Network View on Nested Effects Models

Cordula Zeller^1, Holger Fröhlich^2, and Achim Tresch^3

**2009**:195272

**DOI:** 10.1155/2009/195272

© The Author(s) 2009

**Received:** 27 June 2008

**Accepted:** 24 October 2008

**Published:** 12 November 2008

## Abstract

Nested effects models (NEMs) are a class of probabilistic models that were designed to reconstruct a hidden signalling structure from a large set of observable effects caused by active interventions into the signalling pathway. We give a more flexible formulation of NEMs in the language of Bayesian networks. Our framework constitutes a natural generalization of the original NEM model, since it explicitly states the assumptions that are tacitly underlying the original version. Our approach gives rise to new learning methods for NEMs, which have been implemented in the *Bioconductor* package *nem*. We validate these methods in a simulation study and apply them to a synthetic lethality dataset in yeast.

## 1. Introduction

Nested effects models (NEMs) are a class of probabilistic models. They aim to reconstruct a hidden signalling structure (e.g., a gene regulatory system) by the analysis of high-dimensional phenotypes (e.g., gene expression profiles) which are consequences of well-defined perturbations of the system (e.g., RNA interference). NEMs were introduced by Markowetz et al. [1] and have been extended by Fröhlich et al. [2] and Tresch and Markowetz [3]; see also the review of Markowetz and Spang [4]. There is an open-source software package "*nem*" available on the platform *Bioconductor* [5, 13], which implements a collection of methods for learning NEMs from experimental data. The utility of NEMs has been shown in several biological applications (*Drosophila melanogaster* [1], *Saccharomyces cerevisiae* [6], the estrogen receptor pathway [7]). The model in its original formulation suffers from some ad hoc restrictions which seemingly are only imposed for the sake of computability. The present paper gives an NEM formulation in the context of Bayesian networks (BNs). In doing so, we provide a motivation for these restrictions by explicitly stating prior assumptions that are inherent to the original formulation. This leads to a natural and meaningful generalization of the NEM model.

The paper is organized as follows. Section 2 briefly recalls the original formulation of NEMs. Section 3 defines NEMs as a special instance of Bayesian networks. In Section 4, we show that this definition is equivalent to the original one if we impose suitable structural constraints. Section 5 exploits the BN framework to shed light onto the learning problem for NEMs. We propose a new approach to parameter learning, and we introduce structure priors that lead to the classical NEM as a limit case. In Section 6, a simulation study compares the performance of our approach to other implementations. Section 7 provides an application of NEMs to synthetic lethality data. In Section 8, we conclude with an outlook on further issues in NEM learning.

## 2. The Classical Formulation of Nested Effects Models

NEMs distinguish two kinds of binary variables: *signals*, whose states can be set by interventions, and *effects*, whose states can be observed. Each variable is either *active* or, respectively, *inactive*. NEMs deterministically predict the states of the effects, given the states of the signals. Furthermore, they provide a probabilistic model for relating the predicted state of an effect to its measurements. NEMs consist of a directed graph the nodes of which are these variables. Edges represent dependencies between their adjacent nodes: an arrow pointing from a to b means that b is active whenever a is active. To be more precise, the graph can be decomposed into a graph Γ, which encodes the information flow between the signals, and a graph Θ, which relates each effect to exactly one signal, see Figure 1. The effects that are active as a consequence of a signal s are those effects that can be reached from s via at most one step in Γ, followed by one step in Θ. Let μ_s(e) denote the predicted state of effect e when signal s is activated, and let F = (μ_s(e)) be the matrix of all predicted effects.
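The predicted effects matrix can be computed by a simple reachability computation. The following sketch uses our own naming (`gamma` for the signals graph, `theta` for the effect attachments; it is not code from the *nem* package) and assumes 0/1 adjacency matrices:

```python
import numpy as np

def predicted_effects(gamma, theta):
    """Predict binary effect states for single-signal activations.

    gamma : (n, n) 0/1 adjacency matrix of the signals graph
            (gamma[i, j] = 1 means signal i points to signal j).
    theta : (n, m) 0/1 matrix attaching each of the m effects to a signal
            (theta[j, k] = 1 means effect k hangs off signal j).
    Returns the (n, m) matrix F with F[i, k] = 1 iff effect k reacts
    when signal i is activated.
    """
    n = gamma.shape[0]
    # Reflexive-transitive closure of gamma via Warshall's algorithm:
    # reach[i, j] = True iff j is reachable from i in zero or more steps.
    reach = gamma.astype(bool) | np.eye(n, dtype=bool)
    for k in range(n):
        reach = reach | (reach[:, [k]] & reach[[k], :])
    # Any number of steps in gamma, followed by one step in theta.
    return (reach.astype(int) @ theta > 0).astype(int)
```

For a transitively closed Γ, as assumed in the classical model, the closure step leaves `gamma` essentially unchanged and "at most one step in Γ" suffices.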

## 3. The Bayesian Network Formulation of Nested Effects Models

A Bayesian network describes the joint probability distribution of a finite family of random variables (the nodes) by a directed acyclic graph and by a family of local probability distributions, which we assume to be parameterized by a set of parameters (for details, see, e.g., [8]). We want to cast the situation of Section 2 in the language of Bayesian networks. Assuming the acyclicity of the graph Γ of the previous section, this is fairly easy. A discussion of how to proceed when Γ contains cycles is given in Section 4. We have to model a deterministic signalling hierarchy in which some components (the effects) can be probed by measurements, and some components (the signals) are perturbed in order to measure the reaction of the system as a whole. All these components will be *hidden* nodes in the sense that no observations will be available for them, and we let the topology between these nodes be identical to that in the classical model. In order to account for the data, we introduce an additional layer of observable variables (*observables*) in an obvious way: each effect node e has an edge pointing to a unique (its) observable node O_e (see Figure 1). We call O_e *the observation of* e.

We slightly abuse notation by writing the same symbol for a node and for the value it assumes. Obviously, all hidden nodes are set to 0 or 1 deterministically, given their parents. The local probabilities P(O_e | e) remain arbitrary for the moment. Assume that we have made an intervention into the system by activating a set I of nodes. This amounts to cutting all edges that lead to the nodes in I and setting their states to the value 1. When an intervention I is performed, let μ_I(a) be the value of node a. This value is uniquely determined by I, as the next lemma shows.

Lemma 3.1.

Let an intervention I be performed. Then, for every hidden node a, one has μ_I(a) = 1 if and only if a ∈ I, or there exist a node s ∈ I and a directed path from s to a in the graph obtained by cutting all edges that lead into I.

Proof.

The proof is straightforward though somewhat technical and may be skipped on first reading. Let a_1, ..., a_N be an ordering of the nodes compatible with the graph, which means that an edge a_j → a_k implies j < k. Such an ordering exists because the graph connecting the states is acyclic. The proof is by induction on the order, the case k = 1 being trivial. If a_k ∈ I, there is nothing to prove. Hence, we may assume a_k ∉ I and work in the graph which arises by cutting all edges that lead to a node in I. Since a_k ∉ I, it follows that μ_I(a_k) = 1 if and only if μ_I(a_j) = 1 for some parent a_j of a_k. This holds exactly if μ_I(a_j) = 1 for some j < k (in particular, by compatibility of the ordering). By induction, this is the case if and only if there exist an s ∈ I and a directed path from s to a_j, which can then be extended to a path from s to a_k.

The resulting likelihood formula is very intuitive. It says that once an intervention has been performed, one has to determine the unique current state of each effect node. This state, in turn, determines the (conditional) probability distribution of the corresponding observable node, for which one calculates the probability of observing the data. The product over all effects then gives the desired result.
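The computation just described can be sketched as follows, assuming binary observables with global false positive and false negative rates `alpha` and `beta` (a hypothetical simplification; the model allows effect-specific local distributions):

```python
import numpy as np

def log_likelihood(F, data, alpha, beta):
    """Log-likelihood of binary observations given predicted effect states.

    F     : (n, m) predicted effect matrix (F[i, k] = state of effect k
            under intervention i), e.g., from the reachability lemma.
    data  : (n, m) observed binary measurements, one per intervention/effect.
    alpha : false positive rate P(observe 1 | effect state 0).
    beta  : false negative rate P(observe 0 | effect state 1).
    """
    p_one = np.where(F == 1, 1.0 - beta, alpha)      # P(O = 1 | state)
    p_obs = np.where(data == 1, p_one, 1.0 - p_one)  # P(O = data | state)
    # Product over all effects and interventions, taken in log space.
    return np.log(p_obs).sum()
```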

## 4. Specialization to the Original NEM Formulation

The factorization (9) involves two matrices: the matrix of predicted effect states and the matrix of local observation probabilities. The importance of (9) lies in the fact that it completely separates the estimation steps for the two. The information about the topology of the Bayesian network enters the formula merely in the shape of the predicted effects matrix, and the local probability distributions alone define the second matrix. Hence, prior to learning the topology, one needs to learn the local probabilities only once. Then, finding a Bayesian network that fits the data well means finding a topology which maximizes the score in (9).

For transitively closed graphs, this is exactly the formulation in [3]. It has the advantage that, given the core graph, the optimal attachment of the effects can be calculated exactly and very quickly, which dramatically reduces the search space and simplifies the search for a good graph. The BN formulation of NEMs implies via (10) that two graphs are indistinguishable (likelihood equivalent: they fit all data equally well) if they have the same transitive closure. It is a subject of discussion whether transitive closure of the underlying graph is a desirable property of such a model (think of causal chains which are observed in a stable state) or not (think of the dampening of a signal when passed from one node to another, or of a snapshot of the system where the signalling happens with large time lags); see [9].

It should be mentioned that the graph topology in our BN formulation of NEMs is necessarily acyclic, whereas the original formulation admits arbitrary graphs. This is only an apparent restriction. Due to the transitivity assumption, effects that connect to a cycle of signals will always react in the same way. This behaviour can also be obtained by arranging the nodes of the cycle in a chain and connecting the effects to the last node of the chain. This even leaves the possibility for connecting other effects to only a subset of the signals in the cycle by attaching them to a node higher up in the chain. As a consequence, admitting cycles does not extend the model class of NEMs in the Bayesian setting.

- (1)
Given an observation at an observable node together with the state of its parent effect, the local probability of that observation should not depend on the intervention during which the data were obtained, by the defining property of Bayesian networks. However, the classical approach learns this quantity separately for each intervention, that is, it learns separate local parameters per intervention, which is counterintuitive.

- (2)
Reference measurements are used to calculate the likelihood ratios, raising the need for a "null" experiment corresponding to an unperturbed observation of the system, which might not be available. The null experiment enters the estimation of every ratio. This introduces an unnecessary asymmetry in the importance of the null intervention relative to the other interventions.

- (3)
The procedure uses the data inefficiently, since for a given topology, the quantities of interest could be learned from *all* interventions that imply the respective predicted effect state, providing a broader basis for the estimation.

The method proposed in the last item is much more time-consuming, since the occurring probabilities have to be estimated individually for each topology. However, such a model promises to better capture the real situation, so we develop the theory in this direction.

## 5. NEM Learning in the Bayesian Network Setting

The hope is that the maximization in (13) can be carried out analytically or at least very efficiently; see [3]. Maximization over the topology is then again done using standard optimization algorithms. Section 5.2 is devoted to this approach.

### 5.1. Bayesian Learning of the Local Parameters

Binary Observables

The full Bayesian approach in a multinomial setting was introduced by Cooper and Herskovits [10].

Here, the shape parameters of the prior are, for the sake of simplicity, set to the same values for every effect. This assumption can easily be dropped, and different priors may be used for each effect.
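For binary observables, the effect-wise marginal likelihood with a Beta prior on the observation probability has a closed form; a minimal sketch (the function name and parameterization are our own, in the spirit of [10]):

```python
from math import lgamma

def log_marginal_beta_binomial(n1, n0, a, b):
    """Log marginal likelihood of observing n1 ones and n0 zeros under a
    Bernoulli model whose success probability carries a Beta(a, b) prior.
    The local parameter is integrated out analytically, giving the ratio
    of Beta functions B(a + n1, b + n0) / B(a, b)."""
    def log_beta(x, y):
        return lgamma(x) + lgamma(y) - lgamma(x + y)
    return log_beta(a + n1, b + n0) - log_beta(a, b)
```

In the full model, one such factor would be computed per effect and per predicted state, and the factors multiplied over all effects.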

Continuous Observables

### 5.2. Maximum Likelihood Learning of the Local Parameters

Let the topology and the interventions be given. For learning the parameters of the local distributions, we perform maximum likelihood estimation in two different settings: the observables are assumed to follow either a binomial distribution or a Gaussian distribution.

Binary Observables

For an effect e, let its observable O_e be a binary random variable with values in {0, 1}, and let α_e = P(O_e = 1 | e = 0) denote its type-I (false positive) and β_e = P(O_e = 0 | e = 1) its type-II (false negative) error rate. The model is then completely parameterized by the topology and these local parameters.

(the ratios with a denominator of zero are irrelevant for the evaluation of (22) and are set to zero).
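These ML estimates amount to counting observations grouped by the predicted state; a sketch in our own naming (`F` holds the predicted effect states determined by the topology, `data` the binary observations, one row per intervention):

```python
import numpy as np

def ml_error_rates(F, data):
    """ML estimates of per-effect error rates, grouped by predicted state.

    Returns (alpha, beta): alpha[k] is the fraction of 1-observations
    among measurements where effect k is predicted inactive (false
    positive rate); beta[k] is the fraction of 0-observations where it
    is predicted active (false negative rate)."""
    F = np.asarray(F)
    data = np.asarray(data)
    m = F.shape[1]
    alpha = np.zeros(m)
    beta = np.zeros(m)
    for k in range(m):
        pred0 = F[:, k] == 0
        pred1 = ~pred0
        # Ratios with a zero denominator are irrelevant and left at 0.
        if pred0.any():
            alpha[k] = data[pred0, k].mean()
        if pred1.any():
            beta[k] = 1.0 - data[pred1, k].mean()
    return alpha, beta
```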

Continuous Observables

There is an analogous way of doing ML estimation in the case of continuous observable variables if one assumes the local distribution of O_e, given the state s of its effect, to be a normal distribution with mean μ_{e,s} and variance σ²_{e,s}, for s = 0, 1.

(quotients with a denominator of zero are again irrelevant for the evaluation of (24) and are set to zero). Note that in both the discrete and the continuous case, the ML estimate depends on the topology, since the topology determines the predicted states of the effects.
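Spelled out in notation of our own choosing (with d_ik the measurement of effect e_k under intervention i, and I_s(k) the set of interventions under which e_k is predicted to be in state s), the Gaussian ML estimates are the group-wise sample moments:

```latex
\hat{\mu}_{k,s} = \frac{1}{|I_s(k)|}\sum_{i \in I_s(k)} d_{ik},
\qquad
\hat{\sigma}^2_{k,s} = \frac{1}{|I_s(k)|}\sum_{i \in I_s(k)} \bigl(d_{ik} - \hat{\mu}_{k,s}\bigr)^2,
\qquad s \in \{0, 1\}.
```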

### 5.3. Structure Learning

In the limit of a large penalty parameter, this recovers the original NEM restriction. If the penalty is zero, each marginal node can be assigned to all suitable core nodes; as a consequence, there is always a best-scoring topology with an empty core graph. A positive penalty makes signalling to the marginal nodes "expensive" relative to signalling in the core graph. It is unclear how to choose the penalty optimally, so we stick to a fixed choice for the applications. Simulation studies have shown that a simple gradient ascent algorithm does very well in optimizing the topology of the Bayesian network, compared to other methods that have been proposed [7].
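A first-improvement hill-climbing search of the kind alluded to can be sketched as follows (a generic sketch, not the *nem* implementation; acyclicity constraints and the structure prior are assumed to be handled inside the `score` callback):

```python
import itertools
import numpy as np

def greedy_structure_search(score, n):
    """Greedy edge-wise hill climbing over core-graph topologies.

    score : callable mapping an (n, n) 0/1 adjacency matrix to a
            (penalized) log score, assumed to include the data
            likelihood plus the structure prior.
    Starts from the empty graph and toggles single edges whenever the
    toggle improves the score, until no toggle helps."""
    gamma = np.zeros((n, n), dtype=int)
    best = score(gamma)
    improved = True
    while improved:
        improved = False
        for i, j in itertools.product(range(n), repeat=2):
            if i == j:
                continue
            gamma[i, j] ^= 1            # toggle edge i -> j
            s = score(gamma)
            if s > best:
                best, improved = s, True
            else:
                gamma[i, j] ^= 1        # revert the toggle
    return gamma, best
```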

## 6. Simulation

### 6.1. Network and Data Sampling

All described methods are implemented in the *nem* software [13], which is freely available on the *Bioconductor* software platform [5]. To test the performance of our method, we conducted simulations with randomly created acyclic networks of signals. The out-degree of each signal was sampled from a power-law distribution with an appropriate normalization constant. Binary data (1 = effect, 0 = no effect) were simulated for the perturbation of each signal in the created network using 4 replicate measurements with type-I and type-II error rates α and β, which were drawn uniformly from fixed intervals for each perturbation separately. This simulates individual measurement error characteristics for each experiment.
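The sampling scheme can be sketched as follows (the power-law exponent and the interval mechanics are illustrative placeholders, not the study's actual settings):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_power_law_outdegree(n_signals, max_deg, gamma_exp=2.5):
    """Sample out-degrees d in {1, ..., max_deg} with P(d) proportional
    to d**(-gamma_exp). The exponent 2.5 is a placeholder value."""
    degs = np.arange(1, max_deg + 1)
    p = degs ** (-float(gamma_exp))
    p /= p.sum()                      # normalization constant
    return rng.choice(degs, size=n_signals, p=p)

def noisy_replicates(true_state, n_rep, fp, fn):
    """Simulate n_rep binary replicate measurements of one effect:
    flip a true 0 with probability fp and a true 1 with probability fn."""
    p_one = (1.0 - fn) if true_state == 1 else fp
    return (rng.random(n_rep) < p_one).astype(int)
```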

### 6.2. Results

## 7. Application

Two genes whose products participate in the same pathway tend to *share* many SL partners. The detection of genetic interactions via synthetic lethality screens and appropriate computational tools is a current area of research; see [14]. Ye and Peyser [11] define a hypergeometric score function to test whether two genes have many SL partners in common. They apply their methodology to a large SL dataset [15] for finding pairs (and, consequently, clusters) of genes whose products are likely to participate in the same pathway. We extend their approach as explained in Figure 3. SL partnership arises (not exclusively, but prevalently) among genes pertaining to two distinct pathways that complement each other in a vital cell function. If a gene A is upstream of a gene B in some pathway, a deletion of gene A will affect at least as many pathways as a deletion of gene B. In a very simplistic world, all SL partners of B will then also be SL partners of A; this subset relation can be detected by NEMs. Take the primary knockout genes as core nodes, and the secondary knockout genes as marginal nodes, which are active given a primary knockout whenever SL occurs. We used the dataset from [15] and chose the 40 primary knockout genes having the most SL interaction partners as core genes, and included all their 194 SL partners as marginal nodes. An NEM with binary observables was estimated, both with the maximum likelihood approach and in the Bayesian setting. It should be emphasized that NEM estimation for this dataset is only possible in the new BN setting, because there is no canonical "null experiment" that would enable us to estimate the likelihood ratios needed in the classical setting in (7), (8) [14].
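Constructing the NEM input from an SL pair list can be sketched as follows (the gene names and the helper function are hypothetical):

```python
import numpy as np

def sl_matrix(primary, secondary, sl_pairs):
    """Build the binary NEM data matrix from a synthetic-lethality list:
    rows = primary knockouts (core nodes), columns = secondary knockouts
    (marginal nodes); entry 1 iff the pair is synthetic lethal."""
    idx_p = {g: i for i, g in enumerate(primary)}
    idx_s = {g: j for j, g in enumerate(secondary)}
    D = np.zeros((len(primary), len(secondary)), dtype=int)
    for a, b in sl_pairs:
        if a in idx_p and b in idx_s:
            D[idx_p[a], idx_s[b]] = 1
    return D
```

If gene A is upstream of gene B, row B of this matrix is (up to noise) contained in row A, which is precisely the nested structure an NEM detects.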

## 8. Summary and Outlook

Some aspects of the classical NEM concept appear in a different light when stated in the BN framework. Mainly, these concern three aspects: (1) the learning of the local parameters, for which we proposed new learning rules; (2) the structural constraints, which can be cast as priors on the NEM topology; and (3) the distinction between hidden and observable nodes, which can be different from that between core nodes and marginal nodes.

We proposed some new lines of investigation, such as a full Bayesian approach to model evaluation and a smooth structure prior with a continuous penalty parameter. The BN framework also makes it much easier to implement, for example, a Boolean logic for the signal transduction which is less simplistic than in the current model. A straightforward application of NEMs in their BN formulation to synthetic lethality data demonstrated the potential of the NEM method, with the purpose of stimulating further research in that field.

## Declarations

### Acknowledgments

The authors would like to thank Peter Bühlmann and Daniel Schöner for proposing the application of NEMs to synthetic lethality data. This work was supported by the Deutsche Forschungsgemeinschaft through the Sonderforschungsbereich SFB646. H. Fröhlich is funded by the National Genome Research Network (NGFN) of the German Federal Ministry of Education and Research (BMBF) through the platforms SMP Bioinformatics (01GR0450) and SMP RNA (01GR0418).

## Authors’ Affiliations

## References

1. Markowetz F, Bloch J, Spang R: Non-transcriptional pathway features reconstructed from secondary effects of RNA interference. *Bioinformatics* 2005, 21(21):4026-4032. doi:10.1093/bioinformatics/bti662
2. Fröhlich H, Fellmann M, Sültmann H, Poustka A, Beissbarth T: Estimating large-scale signaling networks through nested effect models with intervention effects from microarray data. *Bioinformatics* 2008, 24(22):2650-2656. doi:10.1093/bioinformatics/btm634
3. Tresch A, Markowetz F: Structure learning in nested effects models. *Statistical Applications in Genetics and Molecular Biology* 2008, 7(1), article 9.
4. Markowetz F, Spang R: Inferring cellular networks—a review. *BMC Bioinformatics* 2007, 8(supplement 6):1-17.
5. Gentleman RC, Carey VJ, Bates DM, et al.: Bioconductor: open software development for computational biology and bioinformatics. *Genome Biology* 2004, 5(10), article R80:1-16. doi:10.1186/gb-2004-5-10-r80
6. Markowetz F, Kostka D, Troyanskaya OG, Spang R: Nested effects models for high-dimensional phenotyping screens. *Bioinformatics* 2007, 23(13):i305-i312. doi:10.1093/bioinformatics/btm178
7. Froehlich H, Fellmann M, Sueltmann H, Poustka A, Beissbarth T: Large scale statistical inference of signaling pathways from RNAi and microarray data. *BMC Bioinformatics* 2007, 8, article 386:1-15.
8. Neapolitan RE: *Learning Bayesian Networks*. Prentice Hall, Upper Saddle River, NJ, USA; 2003.
9. Jacob J, Jentsch M, Kostka D, Bentink S, Spang R: Detecting hierarchical structure in molecular characteristics of disease using transitive approximations of directed graphs. *Bioinformatics* 2008, 24(7):995-1001. doi:10.1093/bioinformatics/btn056
10. Cooper GF, Herskovits E: A Bayesian method for the induction of probabilistic networks from data. *Machine Learning* 1992, 9(4):309-347.
11. Ye P, Peyser BD, Pan X, Boeke JD, Spencer FA, Bader JS: Gene function prediction from congruent synthetic lethal interactions in yeast. *Molecular Systems Biology* 2005, 1, article 2005.0026. doi:10.1038/msb4100034
12. Mukherjee S, Speed TP: Network inference using informative priors. *Proceedings of the National Academy of Sciences of the United States of America* 2008, 105(38):14313-14318. doi:10.1073/pnas.0802272105
13. Fröhlich H, Beißbarth T, Tresch A, et al.: Analyzing gene perturbation screens with nested effects models in R and bioconductor. *Bioinformatics* 2008, 24(21):2549-2550. doi:10.1093/bioinformatics/btn446
14. Le Meur N, Gentleman R: Modeling synthetic lethality. *Genome Biology* 2008, 9(9), article R135:1-10. doi:10.1186/gb-2008-9-9-r135
15. Tong AHY, Lesage G, Bader GD, et al.: Global mapping of the yeast genetic interaction network. *Science* 2004, 303(5659):808-813. doi:10.1126/science.1091317

## Copyright

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.