Probabilistic polynomial dynamical systems for reverse engineering of gene regulatory networks
 Elena S Dimitrova^{1}Email author,
 Indranil Mitra^{2} and
 Abdul Salam Jarrah^{3, 4}
DOI: 10.1186/1687415320111
© Dimitrova et al; licensee Springer. 2011
Received: 9 September 2010
Accepted: 6 June 2011
Published: 6 June 2011
Abstract
Elucidating the structure and/or dynamics of gene regulatory networks from experimental data is a major goal of systems biology. Stochastic models have the potential to absorb noise, account for uncertainty, and help avoid data overfitting. Within the frame work of probabilistic polynomial dynamical systems, we present an algorithm for the reverse engineering of any gene regulatory network as a discrete, probabilistic polynomial dynamical system. The resulting stochastic model is assembled from all minimal models in the model space and the probability assignment is based on partitioning the model space according to the likeliness with which a minimal model explains the observed data. We used this method to identify stochastic models for two published synthetic network models. In both cases, the generated model retains the key features of the original model and compares favorably to the resulting models from other algorithms.
Keywords
Stochastic modeling polynomial dynamical systems reverse engineering discrete modelingIntroduction
The enormous accumulation of experimental data on the activities of the living cell has triggered an increasing interest in uncovering the biological networks behind the observed data. This interest could be in identifying either the static network, which is usually a labeled directed graph describing how the different components of the network are wired together, or the dynamic network, which describes how the different components of the network influence each other. Identifying dynamic models for gene regulatory networks from transcriptome data is the topic of numerous published articles, and methods have been proposed within different computational frameworks, such as continuous models using differential equations [1, 2], discrete models using Boolean networks [3], Petri nets [4–6], or Logical models [7, 8], and statistical models using dynamic Baysein networks [9, 10], among many other methods. For an uptodate review of the stateoftheart of the field, see, for example [11, 12]. Most of these methods identify a particular model of the network which could be deterministic or stochastic. Due to the fact that the experimental data are typically noisy and of limited amount and that gene regulatory networks are believed to be stochastic, regardless of the used framework, stochastic models seem a natural choice [9, 13, 14]. Furthermore, discrete models where a gene could be in one of a finite number of states are more intuitive, phenomenological descriptions of gene regulatory networks and, at the same time, do not require much data to build. These models could actually be more suitable, especially for large networks [15].
The discrete modeling framework for gene regulatory networks that has received the most attention is Boolean networks, which was introduced by Kauffman [3]. They have been used successfully in modeling gene regulatory and signaling networks; see, for example [16–18]. Many reverse engineering methods have been developed to infer such networks, see, for example [19, 20].
For the purpose of better handling noisy data and the uncertainty in model selection, Boolean networks were extended to probabilistic Boolean networks (PBN) in [13, 21, 22]. A PBN is a Boolean network where each node i may possibly have more than one Boolean transition function, say , where t_{ i } ≥ 1, and, to decide the future state of i, a function is chosen with probability p_{ ij } , where . To be precise, to each node i in a PBN, the set of possible transition functions and their probabilities is assigned. Notice that if t_{ i } = 1 for all nodes in the network, then the PBN is just a Boolean network. As it is the case with Boolean networks, a PBN could be updated synchronously or asynchronously. However, throughout this article, we focus on synchronous PBN. Aspects of PBNs, and also asynchronous PBNs, have been studied in, for instance [23, 24] and they have been applied to the modeling of gene regulatory networks in, for example, [25, 26]. Furthermore, methods for inferring PBN have been developed in [27].
One disadvantage of Boolean models for gene regulatory networks is the limited number of states in which a gene can be. Indeed, although for a molecular biologist the state of a gene is usually discrete, it could be not only "expressed" and "not expressed" but also "over expressed," for example. There has thus been some consideration of morethanbinary discrete models in the Boolean network community. In the context of PBNs, generalizations of Boolean networks for ternary gene expression have been proposed in [28–31]. In addition, in [32] a ternary model has been considered as a preliminary stage for a Boolean one.
Other discrete multistate modeling frameworks have been developed too. Logical models [8] and Kbounded Petri nets [6, 33] are two multistate modeling frameworks that have been used for modeling gene regulatory networks. A natural generalization of Boolean networks to multistate networks are the socalled polynomial dynamical systems (also known as algebraic models), which were introduced in [34]. In an algebraic model, the set of possible states of each node is a finite set, and once the mathematical structure of finite fields is imposed on that set, the transition function of each node is necessarily a polynomial. As this framework is rooted in computational algebra and algebraic geometry, results from these fields are used for the reverse engineering of dynamic and static biological networks [34–37], as well as for analyzing model dynamics [34, 38], which usually is a challenge. Furthermore, in [39], it was shown that logical models and Kbounded Petri nets can be viewed as polynomial dynamical systems and algorithms for their translation into algebraic models were provided which facilitates the analysis of their dynamics.
In this article, we first introduce a stochastic generalization of polynomial dynamical systems, namely, probabilistic polynomial dynamical systems, which is also a generalization of the abovementioned probabilistic Boolean networks to multistate models. Then, using this framework, we present a novel method for the reverse engineering of multistate gene regulatory networks from limited and noisy data. The novelty of our approach is twofold. First, the stochastic model we construct is based on all minimal models in the model space and second, the probabilities assigned to the minimal models are based on an algebraic partition, called Gröbner fan, of the models space, which provides an algorithmic and algebraic method for the construction of such stochastic models.
In the next section, we present our method for the reverse engineering of gene regulatory networks as probabilistic polynomial dynamical systems. Then we demonstrate this method using the yeast cell cycle model in [17], as well as the synthetic network of the yeast cell cycle in [40].
Methods
Probabilistic polynomial dynamical systems
Laubenbacher and Stigler [34] proposed a modeling approach that describes a regulatory network on n genes as a deterministic polynomial dynamical system (PDS), i.e., a polynomial function (f_{ 1 }, ..., f_{ n } ): K^{ n } → K^{ n } , where K is a finite field. (F is just a Boolean network when K = {0, 1}.) Indeed, when K is a finite field, any function F : K^{ n } → K^{ n } is a polynomial function, i.e., F can be described as (f_{1}, ..., f_{ n } ) where, for all i, f_{ i }: k^{ n } → k is a polynomial (see Appendix 1). This shows that PDSs are a suitable modeling framework naturally generalizing Boolean networks. We expand this framework to include stochastic models as follows.
is in Figure 1A.
Reverse engineering PDSs
Laubenbacher and Stigler's reverseengineering method [34] first constructs the set of all PDSs that fit the given discretized data, which we call here the model space, and then uses a minimality criterion to select one system from the model space. A unique feature of their method is that the model space is presented as an algebraic object. Their algorithm is summarized here as Algorithm 2.1.
Unless all state transitions of the system are specified, there will be more than one network that fits the given data set. Since this much information is hardly ever available in practice, any reverseengineering method usually identifies one network model according to a prespecified criterion, and different methods typically identify different models. In [34], first the set of all models is computed and then a particular one f = (f_{1}, ... f_{ n } ) is chosen that satisfies the following property: For each node i, the transition function f_{ i } is minimal in the sense that there is no nonzero polynomial g ∈ k [x_{1} ... x_{ n } ] such that f_{ i } = h + g and g is identically equal to zero on the given time points. This criterion for model selection is analogous to excluding the terms of f_{ i } that vanish on the data. The advantage of the polynomial modeling framework is that there is a welldeveloped algorithmic theory that provides mathematical tools for generating the model space as well as identifying the minimal models.
Algorithm 2.1 Reverse engineering of PDSs.
Output: All minimal PDS's (f_{1}, ..., f_{ n } ) such that the coordinate polynomials f_{ i } ∈ k [x_{1}, ..., x_{ n } ] satisfy f_{ i }(s _{ j }) = s_{i,j+1}for all i = 1, ..., n and j = 1, ..., m  1, and f_{ i } does not contain any term that vanish on the time series.
Step 1: Compute a PDS f_{0} : K^{ n }→ K^{ n } that fits the data. There are several methods to do this, Lagrange interpolation being one of them.
Step 2: Compute the collection I of all polynomials that vanish on the data. Notice that if two polynomials f_{ i } , g_{ i } ∈ k [x_{1}, ..., x_{ n } ] satisfy f_{ i } (s _{ j }) = s_{i,j+1}= g_{ i } (s _{ j }), then (f_{ i }  g_{ i } )(s _{ j }) = 0 for all j. Therefore, in order to find all functions that fit the data, we need to find all functions that vanish on the given time points. Those functions form an algebraic object called the ideal of points and can be computed algorithmically.
Step 3: Reduce f_{0} = (f_{1}, ..., f_{ n } ) found in Step 1 modulo the ideal I. That is, write each f_{ i } as f_{ i } = g + h with h ∈ I and g being minimal in the sense that it cannot be further decomposed into g = g' + h' with h' ∈ I. In other words, h represents the part of f_{ i } that lies in I and is, therefore, identically equal to 0 on the given time series.
The Gröbner basis, however, is not unique and its computation depends on the way the polynomial terms are ordered, called monomial ordering (Definition 2.1). The reason is that the remainder of polynomial division in polynomial rings in more than one variable is not unique and depends on the way the monomials are ordered. In contrast, this is not an issue in k [x] (a polynomial ring in one variable) where the monomials are ordered by degree: ⋯ ≻ x^{m+1}≻ x^{ m } ≻ ⋯ x^{2} ≻ x ≻ 1. However, whenever there is more than one variable, there is more than one choice for ordering the monomials (e.g., x^{2} ≻ xy and xy ≻ x^{2} are both possible) and thus the possibility of obtaining several different Gröbner bases. Consequently, the PDS model generated in Step 3 also depends on the choice of monomial ordering, as Example 3.1 illustrates.
Since different monomial orderings may give rise to different polynomial models, considering only one arbitrarily chosen monomial ordering is not sufficient. Therefore, a systematic method for studying the monomial orderings that affect the model selection is crucial for modeling approaches utilizing Gröbner bases. A naïve approach is to compute all possible Gröbner bases with respect to all monomial orderings. The number of monomial orderings, however, grows rapidly with the number of variables n and can be as large as n^{2}n! [42] and hence considering all of them is computationally challenging. An alternative approach presented in [43] generates a collection of polynomial models from a fixed number of orderings (all graded reverse lexicographic) with random variable orderings and computes a consensus model using a gametheoretic method. While it is reasonable to try to avoid considering all monomial orderings, restricting oneself to variable orderings within a fixed monomial ordering will very likely miss a large number of PDS models that fit the data. Fortunately, the correspondence between Gröbner bases and monomial orderings is onetomany. In [35], we presented a method which guarantees that no PDS model fitting the data is overlooked. Like [43], we avoided checking all possible monomial orderings but instead identified only those that produce distinct PDS models. The method is based on the combinatorial structure known as the Gröbner fan of a polynomial ideal which we discuss in more detail in Appendix 2.4. The Gröbner fan of an ideal I[44] is a polyhedral complex of cones with the property that every point encodes a monomial ordering. The cones are in bijective correspondence with the distinct Gröbner bases of I. (To be precise, the correspondence is to the marked reduced Gröbner bases of I). Therefore, it is sufficient to select exactly one monomial ordering per cone and, ignoring the rest of the orderings, still guarantee that all distinct models are generated. In addition, the relative number of monomial orderings under which a particular PDS model is generated provides an insight into the likelihood that the model is a good representation of the system; for details on this idea see Appendix 3. An excellent implementation of an algorithm for computing the Gröbner fan of an ideal is the software package Gfan[45].
Algorithm for PPDS computation
We propose the following algorithm for the reverse engineering of gene regulatory networks as PPDS models from time series of discrete data. The resulting PPDS consists of all possible reduced PDS models that fit the data. The probability that we assign to each model is proportional to the relative volume of the Gröbner cone that produced that model. See Appendix 3 for assumptions and example.
Algorithm 3.1 Reverse engineering of PPDSs.
Input: A discrete time series of a gene regulatory network on n nodes x_{1}, ..., x_{ n } : S = {(s_{11} , ..., s_{n 1}), ..., (s_{1m}, ..., s_{ nm } )} ⊆ K^{ n } , where K is a finite field.
Output: A probabilistic PDS model F, which is a list of all possible reduced local polynomials for each x_{1}, ..., x_{ n } , together with their corresponding probabilities.
Step 1: Compute a particular PDS F_{0} : K^{ n } → K^{ n } that fits S.
Step 2: Compute the ideal I of polynomials that vanish on S.
Step 3: Compute the Gröbner fan of the ideal I and the relative sizes of its cones, c_{1}, ..., c_{ s } (with c_{1} + ⋯ + c_{ s } = 1).
Step 4: Select one (any) monomial ordering from each cone, ≺_{1}, ..., ≺_{ s }. For each i = 1, ..., s, reduce F_{0} modulo I using a Gröbner basis computed with respect to ≺_{ i }. Let the reduced PDS's be F_{1} = {f_{11}, f_{12}, ..., f_{1t}}, ..., F_{ s } = {f_{s 1}, f_{s 2}, ..., f_{ st } } and adding the cone sizes redefine them as F_{ i } = {(f_{i 1}, c_{1}), (f_{i 2}, c_{2}), ..., (f_{ it } , c_{ t } )}.
Step 5: Construct the list F = {{(f_{11}, c_{1}), (f_{21} , c_{2}), ..., (f_{s 1}, c_{ s } )}, ..., {(f_{1t}, c_{1}), (f_{2t}, c_{2}), ..., (f_{ st } , c_{ s } )}}. For a fixed i, if f_{ ji } = f_{ ki } for some j and k, then "merge" the two local polynomials by adding their corresponding probabilities: (f_{ ji } , c_{ j } + c_{ k } ).
Algorithm 3.1 guarantees that all distinct minimal PDS models will be generated. However, this comes at the expense of having to compute the entire Gröbner fan of the ideal of points. For small networks the computation of the fan is feasible but as the number of network nodes increases, the complexity of the Gröbner fan computation becomes prohibitive [46]. As mentioned earlier, the correspondence between PDS models and Gröbner bases is onetomany. Therefore, computing the entire Gröbner fan of the ideal of vanishing polynomials is excessive and instead a finite subset of points from the fan should be sufficient. This finite subset needs to be carefully selected if we want it to reflect the structure of the entire Gröbner fan. Since we want to rank the dependencies according to their strength, the number of points (weight vectors) we select from a Gröbner cone should correspond to the relative size of this cone with respect to the other cones. That is, we want to sample from the Gröbner fan uniformly, so that the relative frequency with which we select term orders from the fan is approximately equal to the relative sizes of its cones. We do this through random sampling of the Gröbner fan of the ideal of points as in [47]. If the number of points is sufficiently large, their distribution approximately reflects the relative size of the Gröbner cones. The number of points is determined using a t test for proportion. Consequently, steps 3 and 4 of Algorithm 3.1 have to be modified in such a way that direct computation of the Gröbner fan is avoided.
Step 3': Select vectors w_{1}, ..., w_{ s } of length n, with s large, in such a way that every (nonnegative integer) vector in the Gröbner fan of I has equal probability of being chosen.
Step 4': For each i = 1, ..., s, use w_{ i } to define a monomial ordering ≺ _{ i } and reduce F_{0} modulo I using a Gröbner basis computed with respect to ≺_{ i }.
Examples and results
Reverse engineering of the yeast cell cycle
This model captures the known features of the cell cycle dynamics. Furthermore, the trajectory of the known cell cycle sequence is stable and attracting, as its size is 1764 out of the total of 2048 states. The remaining states are distributed into 6 very small trajectories. Each of these trajectories converges to a steady state as well.
The 54 state transitions used for generating the PPDS model (6), each represented by a pair of inputoutput states
Cln3  MBF  SBF  Cln1, 2  Cdh1  Swi5  Cdc20 and Cdc 14  Clb5, 6  Sic1  Clb1, 2  Mcm1/SFF  Cln3  MBF  SBF  Cln1, 2  Cdh1  Swi5  Cdc20 and Cdc14  Clb5, 6  Sic1  Clb1, 2  Mcm1/SFF 

0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  1  0  0  0  0 
0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  1  0  0  1  0  0 
0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  1  0 
0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  1  1 
0  1  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  1  0  1 
0  1  0  0  0  0  0  0  1  0  0  0  0  0  0  0  1  1  0  1  0  0 
0  0  1  1  0  0  0  0  0  0  0  1  1  0  0  0  0  0  0  0  1  0 
0  0  1  1  0  0  0  0  0  0  0  0  1  0  0  0  0  1  1  0  1  1 
0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  1  0  0  0 
0  0  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  1  1 
0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  1  0  0 
0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0 
0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  1 
0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  1  1  0  1  1  0 
0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  1  1  0  0  0 
0  0  0  0  1  1  0  0  1  0  0  0  0  0  0  0  1  0  0  0  0  1 
0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  1  0  1  0  0 
0  0  0  0  0  0  0  1  0  1  1  0  0  0  0  1  1  0  0  1  0  0 
0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  1  0  0  1  0 
0  0  0  0  0  0  1  0  0  1  1  0  0  0  0  0  0  1  0  0  0  1 
0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  1  0  0  0  1 
0  0  0  0  0  1  1  0  0  1  0  0  0  0  0  1  1  1  0  1  0  0 
1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  1  0  0 
0  1  1  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  1 
1  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  1  0 
0  1  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  1  1  0  1  1 
0  1  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  1  0  0  1 
0  1  0  0  1  1  0  0  1  0  0  0  0  0  0  0  1  1  1  0  1  1 
0  1  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  1  1  0 
0  1  0  0  0  0  0  1  0  1  1  0  0  0  0  0  0  1  0  0  0  1 
0  1  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  1  1 
0  0  0  0  0  0  1  1  0  1  1  0  0  0  0  0  0  1  0  0  1  1 
0  0  1  0  0  0  0  0  0  1  0  1  0  1  0  0  0  0  0  0  1  0 
0  0  0  1  0  0  1  0  0  1  1  0  0  1  1  0  0  1  0  0  1  1 
0  0  0  1  1  0  0  0  0  0  0  0  0  0  1  1  0  1  0  0  0  0 
0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  1  0  0  0  0  0 
0  0  0  1  0  1  0  0  0  0  0  0  0  0  0  1  0  1  1  0  0  0 
0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  1  0  0  0  0  1 
0  0  0  1  0  0  1  0  0  0  0  0  0  0  1  1  0  0  0  0  1  0 
0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  1 
0  0  0  1  0  0  0  1  0  0  0  0  0  0  1  0  0  1  1  0  0  0 
0  0  0  0  0  0  0  1  0  1  1  0  0  0  0  0  1  0  0  0  0  1 
0  0  0  1  0  0  0  0  1  0  0  0  0  0  1  0  0  1  0  0  1  0 
0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  1 
0  0  0  1  0  0  0  0  0  1  0  0  0  0  1  0  0  0  1  0  1  0 
0  0  0  0  0  0  1  0  0  1  1  0  0  0  0  0  0  1  1  0  1  1 
0  0  0  0  1  0  1  0  0  0  0  0  0  0  0  1  0  1  1  0  0  0 
0  0  0  0  1  1  0  0  1  0  0  0  0  0  0  1  1  0  0  0  0  1 
0  0  0  0  1  0  0  1  0  0  0  0  0  0  0  1  0  1  0  0  1  0 
0  0  0  0  0  0  0  1  0  0  1  0  0  0  0  1  0  1  0  0  0  1 
0  0  0  0  1  0  0  0  0  1  0  0  0  0  0  1  0  0  1  0  1  0 
0  0  0  0  0  0  1  0  0  0  1  0  0  0  0  0  0  1  1  0  1  1 
0  0  0  0  1  0  0  0  0  0  1  0  0  0  0  0  0  1  1  0  1  0 
0  0  0  0  1  1  1  0  0  0  0  0  0  0  0  0  0  1  0  0  1  1 
Comparison of the steady states of model (2) and those of the probabilistic PDS (6) built via our reverse engineering method using the data set in Table 1 generated from model (2)
Fixed point  Is it input?  Original system component size  Reverse engineered component size 

1  No  1, 764  1, 015 
2  Yes  151  70 
3  No  109  9 
4  No  9  9 
5  Yes  7  7 
6  Yes  9  5 
7  Yes  1  1 
Further, we assessed the quality of the dependency graph of the inferred model using three standard network measures: positive predictive value, PPV = TP/(TP + FP) = 0.83, specificity , Sp = TN/(TN + FP) = 0.94, and sensitivity, Se = TP/(TP + FN) = 0.69, where TP and TN are the numbers of true positive and negative interactions, respectively, and FP and FN are the numbers of false positive and false negative interactions, respectively, weighted by the corresponding probabilities given after every polynomial in (6). The high values of the three measures indicate that the proposed method is not only capable of capturing the dynamic behavior of the system but also its static wiring network.
Comparison to other methods
PPV, positive predictive value and Se, sensitivity of the reverseengineering approaches NIR, TSNI, BANJO, ARACNE, and PPDS when applied to data generated from the synthetic network in [40]
Switchon  Swithoff  

PPV  Se  PPV  Se  
NIR and TSNI  0.80  0.50  0.6  0.38 
BANJO  *  *  0.6  0.38 
ARACNE  *  *  *  * 
PPDS  0.57  0.50  0.75  0.38 
Conclusion
Gene regulatory networks are structured as interconnected entities and their complex nature is inherently stochastic. The framework of stochastic dynamical systems is natural for modeling and analyzing such networks. We focused on PPDSs due to their applicability to limited and possibly noisy data. Within this modeling framework, we developed a systematic method based on combinatorial topology, algebraic geometry, and statistics for the reverse engineering of the dynamics, as well as the gene dependencies, in biochemical regulatory networks from experimental data. The algorithm can handle large regulatory networks and hence is applicable to many networks of interest. The constructed models are comprised of minimal polynomials according to the definition in [34]. We plan to explore the use of other types of biologically relevant functions, such as nested canalyzing functions [53]. An algorithm for the inference of deterministic nested Boolean canalyzing networks has recently been presented (F Hinkelmann, A Jarrah: Inferring biologically relevant models: nested canalyzing functions, submitted). Combining this with our algorithm here will provide a systematic method for the reverse engineering of gene regulatory networks as probabilistic Boolean nested canalyzing networks.
Appendices
1 Polynomial dynamical systems
Definition 1.1 Let X be a finite set. A finite dynamical system of dimension n is a function F = (f_{1}, ..., f_{ n } ) : X^{ n } → X^{ n } with f_{ i } : X^{ n } → X.
By requiring that the cardinality of the set X be a power of a prime number, one can impose on X the structure of a finite field. This structure determines the only type of functions f_{ i } that need to be considered. The following theorem from [54] characterizes functions over finite fields.
Theorem 1.1 Let k be a finite field. Then every function f : k^{ n } → k is a polynomial of degree at most n.
Therefore, over a finite field, polynomials are the appropriate modeling framework rather than a constraining assumption.
Definition 1.2 If the set X for a finite field, then any function F : X → X is called a polynomial dynamical system (PDS).
Definition 1.3 A probabilistic polynomial dynamical system (PPDS) on n nodes (f_{ 1 }, ..., f_{ n }) : K^{ n }→ K^{ n } with parallel update order consists of n sets of local functions and their associated probabilities such that is the set of local functions that determine the dynamics of node i and . In order to determine each transition in the state space of the system, (x_{1}, ..., x_{ n } ) → (y_{1}, ..., y_{ n } ), for each node i a local function f_{ ij } is selected from f_{ i } with probability p_{ ij } .
As an example, see (1).
2 Concepts from commutative algebra and algebraic geometry [55]
2.1 Gröbner bases
A polynomial in k[x_{1}, ..., x_{ n } ] is a linear combination of monomials of the form over k, where α is the ntuple exponent . For many purposes, such as polynomial division, it is necessary to arrange the terms in a polynomial unambiguously in some order. Unlike polynomials in one variable, there are more than one way of ordering the terms (monomials) of multivariate polynomials. Any ordering of the monomials must be a total ordering, i.e., for every pair of monomials x^{ α } and x^{ β } , exactly one of the following must be true: x^{ α } ≺ x^{ β } , x^{ α } = x^{ β } , x^{ α } ≻ x^{ β } . Taking into account the properties of the polynomial sum and product operations, the following definition emerges.
 1.
≻ is a total ordering on .
 2.
If α ≻ β and , then α + γ ≻ β + γ.
 3.
≻ is a wellordering on , i.e., every nonempty subset of has a smallest element under ≻.
A monomial ordering can also be defined by a weight vector ω = (ω_{ 1 }, ..., ω_{ n } ) in . We require that ω have nonnegative coordinates in order for 1 to always be the smallest monomial. Fix a monomial ordering ≻_{ σ }, such as ≻_{ lex }. Then, for , define α ≻ _{ ω,σ } β if and only if ω · α ≻ ω · β, or ω · α = ω · β and α ≻ _{ σ }β.
Ideal membership problem Another problem with multivariate polynomial division is that when dividing a given polynomial into more than one polynomials, the outcome may depend on the order in which the division is carried out. Let f, g_{1}, ..., g_{ m } ∈ k [x_{1}, ..., x_{ n } ] be polynomials in the variables x_{1}, ..., x_{ n } . The socalled ideal membership problem is to determine whether there are polynomials h_{1}, ..., h_{ m } ∈ k[x_{1}, ..., x_{ n } ] such that . To state this in the language of abstract algebra, we define I = 〈g_{1}, ..., g_{ m } 〉 := {∑h_{ i }g_{ i }  h_{1}, ..., h_{ m } ∈ k[x_{1}, ..., x_{ n } ]} . The polynomials in I form a socalled ideal in k[x_{1}, ..., x_{ n } ], since I is closed under addition and multiplication by any polynomial in k[x_{1}, ..., x_{ n } ], and I is generated by the set {g_{1}, ..., g_{ m } }. The ideal membership problem asks if f is an element of I . In general, even under a fixed monomial ordering, the order in which f is divided by the generating polynomials f_{ i } affects the remainder . Therefore, does not imply f ∉ I. Moreover, the generating set {f_{1}, ..., f_{ m } } of the ideal I is not unique but a special generating set can be selected so that the remainder of polynomial division of f by the polynomials in performed in any order is zero if and only if f lies in I: . A generating set with this property is called a Gröbner basis and its precise definition will be given in Definition 2.3. Here we point out that Gröbner bases provide an algorithmic solution to the ideal membership problem and the Buchberger algorithm [41] is designed to compute a Gröbner basis for any ideal other than {0} and a fixed monomial ordering.
2.2 Monomial Ideals
Gröbner bases are a key concept in computational algebra. Their theory reduces questions about systems of polynomial equations to the combinatorial study of monomial ideals.
Definition 2.2 An ideal I ⊂ k[x_{1}, ..., x_{ n } ] is a monomial ideal if I is generated by monomials, i.e., there is a subset such that I = 〈x^{ α }  α ∈ A〉, i.e., consists of all polynomials which are finite sums of the form , where h_{ α } ∈ k[x_{1}, ..., x_{ n } ].
A special kind of monomial ideal is the initial ideal of an ideal I ≠ {0} for a fixed monomial ordering. It is the ideal generated by the set of initial monomials (under the specified ordering) of the polynomials of I: in (I) = 〈in(f)  f∈ I〉 . The monomials which do not lie in in(I) are called standard monomials.
Definition 2.3 Fix a monomial ordering. A finite subset of an ideal I is a Gröbner basis if .
A Gröbner basis for an ideal may not be unique. If we also require that for any two distinct elements , no term of g' is divisible by in(g), such a Gröbner basis is called reduced and is unique for an ideal and a monomial ordering, provided the coefficient of in(g) in g is 1 for each
2.3 Ideals of points
Given a set of points, it is often necessary to find all the polynomials that vanish on it. Such a set of polynomials forms an ideal called the ideal of points defined as follows.
It can be shown that is an ideal of k[x_{1}, ..., x_{ n } ]. It is called the ideal of points in V .
2.4 The Gröbner fan of an ideal
A combinatorial structure that contains information about the initial ideals of an ideal is the Gröbner fan of an ideal. It is a polyhedral complex of cones, each corresponding to an initial ideal, which, as follows from Definition 2.3, is in a onetoone correspondence with the marked reduced Gröbner bases (the initial term of each generating polynomial being distinguished) of the ideal. A brief introduction to the the Gröbner fan folllows. For details see, for example [44].
A polynomial ideal has only a finite number of different reduced Gröbner bases. Informally, the reason is that most of the monomial orderings only differ in high degree and the Buchberger algorithm for Gröbner basis computation does not "see" the difference among them. However, they may vary greatly in number of polynomials and "shape". In order to classify them, we first present a convenient way to define monomial orderings using matrices [56]. Again, we think of a polynomial in k[x_{1}, ..., x_{ n } ] as a linear combination of monomials of the form over k, where α is the ntuple exponent .
Definition 2.5 Let ω = (ω_{1}, ..., ω_{ n } ) be a vector with real coefficients. We can define an ordering ≻ _{ ω } the elements of by α ≻ _{ ω } β if and only if α · ω > β · ω, componentwise.
Definition 2.6 Let be a marked reduced Gröbner basis for an ideal I. Write each polynomial of the basis as where is the initial term in g_{ i } . The cone of is
The collection of all the cones for a given ideal is the Gröbner fan of that ideal. The cones are in bijection with the marked reduced Gröbner bases of the ideal. Since reducing a polynomial modulo an ideal I, as the reverse engineering algorithm requires in Step 3, can have at most as many outputs as the number of marked reduced Gröbner bases, it follows that the Gröbner fan contains information about all Gröbner bases (and thus all monomial orderings) that need to be considered in the process of model selection. There are algorithms based on the Gröbner fan that enumerate all marked reduced Gröbner bases of a polynomial ideal [45].
3 Reverse engineering of PPDSs
Then x_{1} depends on both x_{1} and x_{2}, while x_{2} depends only on x_{1}.
While inferring the dependency graph from a PDS model is straightforward, identifying that single model may hardly be possible. There is a problem originating from the algorithm proposed in [34]: finding all polynomials that vanish on a set of points. This is equivalent to computing the ideal of these points and computation of an ideal of points boils down to intersection of ideals of polynomials vanishing on one point. There is a wellknown consequence of the Buchberger algorithm, originally presented in [57] MISSING, for their computation. The output of the algorithm is a Gröbner basis {g_{1}, ..., g_{ s } } ⊂ k[x_{1}, ..., x_{ n } ] that generates the ideal of vanishing polynomials: , where h_{ i } ∈ k[x_{1}, ..., x_{ n } ]. The Gröbner basis, however, is not unique, as it was discussed in 2.1, and its computation depends on the choice of monomial ordering.
Example 3.1 Consider a network of 3 genes x_{1}, x_{2}, and x_{3}. Suppose we have the following time series of network states in : s_{1} = (2, 1, 0), s_{2} = (1, 2, 0), s_{3} = (2, 1, 1), s_{4} = (0, 0, 1).
Notice that all three coordinate polynomials involve x_{3} but depending on the monomial ordering, they also contain either x_{1} or x_{2}. In fact, for the given time series s_{1}, ..., s_{4}, these are the only two distinct minimal (in the sense defined in [34]) PDS models that the algorithm generates. While it is not clear whether there is a dependence on x_{1} or on x_{2}, one can be confident that, provided the data are representative of the network, x_{3} has a definite impact on all three genes. We expand on this idea in the next section.
Clearly the monomial ordering selection affects not only the dependency graph of the model but also its dynamics which is represented by the model's state space. Let p = (1, 0, 0) . In Example 3.1, starting at state p, the first model will transition to state (0, 0, 2), while the second's next state is (2, 1, 1). All coordinates of the output are different although they are generated by minimal PDS models that fit the experimental time series data. It is not clear how to select one out of, as it is often the case with real data, hundreds of possibilities. The main reason is that it is hard to relate monomial ordering to biology in any meaningful way. The solution we propose is instead of trying to single out one model, to use all minimal models (possibly with different probabilities) to generate the system dynamics. This means allowing stochastic behavior which is consistent with the experimental observations.
 1.
Minimal polynomials are an appropriate framework for the modeling of gene regulatory networks. The minimal polynomials are only a subset of all polynomials that fit a given data set. In Example 3.1, both f _{1} = x _{2}  x _{3} and fit the data and can be coordinate polynomials for x _{1} but the latter is not minimal since is identically zero on the data and . In that sense, g is not supported by the data and is meaningless for the purpose of reverse engineering.
 2.
The strength of dependency of network node x_{ i } on node x_{ j } on the scale from 0 to 1 is proportional to the relative frequency with which x_{ j } appears in x_{ i }'s minimal coordinate polynomials that fit the data. In Example 3.1, both possible polynomials for f _{1} involve x _{3} which means that the behavior of x _{1} cannot be described without involving x _{3}. It is reasonable then to conclude that the strength of the dependency of x _{1} on x _{3} is 1.
 3.
The strength of dependency of x_{ i } on x_{ j } is proportional of the size of the portion of the Gröbner fan that corresponds to those coordinate polynomials for f_{ i } that involve x_{ j } .
Each point of the Gröbner fan corresponds to a monomial ordering and all points within the same Gröbner cone produce the same Gröbner basis and thus the same minimal model. Typically, we do not know which monomial ordering(s) are more appropriate for a particular network and the best we can do is consider all of them and identify which ones are the most likely to represent the network dependencies. If there are two models, one corresponding to a larger portion of the Gröbner fan than the other, this means that there are more monomial orderings which produce the first model.
In Example 3.1, the Gröbner fan is threedimensional since there are three variables and consists of two Gröbner cones of equal sizes (volumes). Each one of the two coordinate polynomials for f_{1}, for instance, corresponds to .5 of the total size of the Gröbner fan. Therefore, under our assumption, the strength of dependency of x_{1} on x_{2} is .5 and the probability that x_{1} depends on itself is also .5.
4 Reverse engineering the yeast cell cycle
Abbreviations
 PDS:

polynomial dynamical system
 PPV:

predictive value
 PBN:

probabilistic Boolean networks
 PPDS:

probabilistic polynomial dynamical system
 Se:

sensitivity.
Declarations
Acknowledgements
We are thankful to Ana Martins and Reinhard Laubenbacher for helpful discussions and encouragement and to Franziska Hinkelmann for help with the software Polynome.
Authors’ Affiliations
References
 Akutsu T, Miyano S, Kuhara S: Inferring qualitative relations in genetic networks and metabolic pathways. Bioinformatics 2000,16(8):727734. 10.1093/bioinformatics/16.8.727View ArticleGoogle Scholar
 Tian T, Burrage K: Stochastic models for regulatory networks of the genetic toggle switch. PNAS 2006,103(22):83728377. 10.1073/pnas.0507818103View ArticleGoogle Scholar
 Kauffman SA: Metabolic stability and epigenesis in randomly constructed genetic nets. J Theor Biol 1969, 22: 437467. 10.1016/00225193(69)900150MathSciNetView ArticleGoogle Scholar
 Comet JP, Klaudel H, Liauzu S: Modeling multivalued genetic regulatory networks using highlevel Petri nets. In Proceedings of the International Conference on the Application and Theory of Petri Nets. Edited by: Ciardo G, Darondeau P. SpringerVerlag, Berlin; 2005:208227.Google Scholar
 Grunwald S, Speer A, Ackermann J, Koch I: Petri net modelling of gene regulation of the duchenne muscular dystrophy. Biosystems 2008,92(2):189205. 10.1016/j.biosystems.2008.02.005View ArticleGoogle Scholar
 Steggles L, Banks R, Shaw O, Wipat A: Qualitatively modelling and analysing genetic regulatory networks: a Petri net approach. Bioinformatics 2007, 23: 336343. 10.1093/bioinformatics/btl596View ArticleGoogle Scholar
 SaezRodriguez J, Simeoni L, Lindquist J, Hemenway R, Bommhardt U, Arndt B, Haus U, Weismantel R, Gilles E, Klamt S, Schraven B: A logical model provides insights into T cell receptor signaling. PLoS Comput Biol 2007,3(8):e163. 10.1371/journal.pcbi.0030163MathSciNetView ArticleGoogle Scholar
 Thomas R, D'Ari R: Biological Feedback. CRC Press, Boca Raton; 1989.Google Scholar
 Perrin B, Ralaivola L, Mazurie A, Bottani S, Mallet J, d'AlcheBuc F: Gene networks inference using dynamic Bayesian networks. Bioinformatics 2003, 19: 138148. 10.1093/bioinformatics/btg1018View ArticleGoogle Scholar
 Radde N, Kaderali L: Bayesian Inference of Gene Regulatory Networks Using Gene Expression Time Series Data. Proc BIRD'2007 2007, 115.Google Scholar
 Albert R: Network inference, analysis, and modeling in systems biology. Plant Cell 19(11):33273338.View ArticleGoogle Scholar
 Sima C, Hua J, Jung S: Inference of gene regulatory networks using timeseries data: a survey. Curr Genomics 2009,10(6):416429. 10.2174/138920209789177610View ArticleGoogle Scholar
 Shmulevich I, Dougherty ER, Zhang W: From Boolean to probabilistic Boolean networks as models of genetic regulatory networks. Proc IEEE 2002,90(11):17781792. 10.1109/JPROC.2002.804686View ArticleGoogle Scholar
 Tian T: Stochastic models for inferring genetic regulation from microarray gene expression data. Biosystems 2010,99(3):192200. 10.1016/j.biosystems.2009.11.002View ArticleGoogle Scholar
 Bornholdt S: Less is more in modeling large genetic networks. Science 2005,310(5747):449451. 10.1126/science.1119959View ArticleGoogle Scholar
 Albert R, Othmer H: Newblock The topology of the regulatory interactions predicts the expression pattern of the segment polarity genes in Drosophila melanogaster . J Theor Biol 2003, 223: 118. 10.1016/S00225193(03)000353MathSciNetView ArticleGoogle Scholar
 Li F, Long T, Lu Y, Ouyang Q, Tang C: The yeast cellcycle network is robustly designed. PNAS 2004,11(14):47814786.View ArticleGoogle Scholar
 Samal A, Jain S: The regulatory network of E. coli metabolism as a Boolean dynamical system exhibits both homeostasis and flexibility of response. BMC Syst Biol 2008, 2: Article 21.View ArticleGoogle Scholar
 Hickman GJ, Hodgman TC: Inference of gene regulatory networks using booleannetwork inference methods. J Bioinformatics Comput Biol 2009,7(6):10131029. 10.1142/S0219720009004448View ArticleGoogle Scholar
 Liu W, Lähdesmäki H, Dougherty ER, Shmulevich I: Inference of Boolean networks using sensitivity regularization. EURASIP J Bioinformatics Syst Biol 2008, 2008: Article ID 78054.Google Scholar
 Dougherty ER, Shmulevich I: Mappings Between probabilistic Boolean networks. Signal Process 2003,83(4):799809. 10.1016/S01651684(02)004802View ArticleGoogle Scholar
 Shmulevich I, Kauffman SA: Activities and sensitivities in Boolean network models. Phys Rev Lett 2004,93(4):14.View ArticleGoogle Scholar
 Gershenson C: Introduction to random Boolean networks. In Workshop and Tutorial Proceedings, Ninth International Conference on the Simulation and Synthesis of Living Systems (ALife IX) Edited by: Bedau M, Husbands P, Hutton T, Kumar S, Suzuki H. 2004, 160173.Google Scholar
 Matache MT, Heidel J: Asynchronous random Boolean network model based on elementary cellular automata rule 126. Phys Rev E 2005, 71: 113.MathSciNetView ArticleGoogle Scholar
 Balleza E, AlvarezBuylla E, Chaos A, Kauffman SA, Shmulevich I, Aldana M: Critical dynamics in genetic regulatory networks: examples from four kingdoms. PLoS ONE 2008,3(6):e2456. 10.1371/journal.pone.0002456View ArticleGoogle Scholar
 Shmulevich I, Gluhovsky I, Hashimoto R, Dougherty ER, Zhang W: Steadystate analysis of genetic regulatory networks modeled by probabilistic Boolean networks. Comp Funct Genomics 2003,4(6):601608. 10.1002/cfg.342View ArticleGoogle Scholar
 Marshall S, Yu L, Xiao Y, Dougherty E: Inference of a probabilistic Boolean network from a single observed temporal sequence. EURASIP J Bioinformatics Syst Biol 2007.Google Scholar
 Datta A, Pal R, Dougherty ER: Intervention in probabilistic gene regulatory networks. Curr Bioinformatics 2006, 1: 167184. 10.2174/157489306777011978View ArticleGoogle Scholar
 Faryabi B, Vahedi G, Datta A, Chamberland JF, Dougherty ER: Recent advances in intervention in markovian regulatory networks. Curr Genomics 2009,10(7):463477. 10.2174/138920209789208246View ArticleGoogle Scholar
 Kim S, Dougherty ER, Bittner ML, Chen Y, Sivakumar K, Meltzer P, Trent JM: A general framework for the analysis of multivariate gene interaction via expression arrays. Biomedical Opt 2000,5(4):411424. 10.1117/1.1289142View ArticleGoogle Scholar
 Layek R, Datta A, Pal R, Dougherty ER: Adaptive intervention in probabilistic Boolean networks. Bioinformatics 2009,25(16):20422048. 10.1093/bioinformatics/btp349View ArticleGoogle Scholar
 Chaos Á, Aldana M, EspinosaSoto C, García Ponce de León B, Garay Arroyo A, AlvarezBuylla ER: From genes to flower patterns and evolution: dynamic models of gene regulatory networks. J Plant Growth Regul 2006,25(4):278289. 10.1007/s0034400600688View ArticleGoogle Scholar
 Peleg M, Rubin D, Altman R: Using Petri net tools to study properties and dynamics of biological systems. J Am Med Informatics Assoc 2005,12(2):181199.View ArticleGoogle Scholar
 Laubenbacher R, Stigler B: A computational algebra approach to the reverse engineering of gene regulatory networks. J Theor Biol 2004,229(4):523537. 10.1016/j.jtbi.2004.04.037MathSciNetView ArticleGoogle Scholar
 Dimitrova ES, Jarrah AS, Laubenbacher R, Stigler B, Gröbner A: Fan method for biochemical network modeling. Proc International Symposium on Symbolic and Algebraic Computation (ISSAC), ACM 2007, 122126.Google Scholar
 Jarrah A, Laubenbacher R, Stigler B, Stillman M: Reverseengineering of polynomial dynamical systems. Adv Appl Math 2007, 39: 477489. 10.1016/j.aam.2006.08.004MathSciNetView ArticleGoogle Scholar
 Stigler B, Jarrah A, Stillman M, Laubenbacher R: Ann NY Acad Sci. 2008, 1115: 168177.View ArticleGoogle Scholar
 Jarrah A, Laubenbacher R, VelizCuba A: The dynamics of conjunctive and disjunctive Boolean network models. Bull Math Bio 2010.Google Scholar
 VelizCuba A, Jarrah A, Laubenbacher R: Polynomial algebra of discrete models in systems biology. Bioinformatics 2010.Google Scholar
 Cantone I, Marucci L, Iorio F, Ricci MA, Belcastro V, Bansal M, Santini S, di Bernardo M, di Bernardo D, Cosma MP: A yeast synthetic network for in vivo assessment of reverseengineering and modeling approaches. Cell 2009, 137: 172181. 10.1016/j.cell.2009.01.055View ArticleGoogle Scholar
 Buchberger B: Gröbner Bases: an Algorithmic Method in Polynomial Ideal Theory. Volume Chapter 6. Edited by: NK Bose D. Reidel Publishing Co., Dordrecht, Boston, Lancaster; 1985:184232.Google Scholar
 Robbiano L: On the theory of graded structures. J Symbolic Comput 1986, 2: 139170. 10.1016/S07477171(86)800190MathSciNetView ArticleGoogle Scholar
 Allen E, Fetrow J, Daniel L, Thomas S, John D: Algebraic dependency models of protein signal transduction networks from timeseries data. J Theor Biol 2006,238(2):317330. 10.1016/j.jtbi.2005.05.010MathSciNetView ArticleGoogle Scholar
 Mora T, Robbiano L: Gröbner fan of an ideal. J Symbolic Comput 1988,6(2/3):183208.MathSciNetView ArticleGoogle Scholar
 Jensen A:Gfan, a software system for Gröbner fans. 2005. [http://www.math.tuberlin.de/~jensen/software/gfan/gfan.html]Google Scholar
 Fukuda K, Jensen AN, Thomas RR: Computing Gröbner fans. Math Comput 2007,76(260):21892212. 10.1090/S0025571807019862MathSciNetView ArticleGoogle Scholar
 Dimitrova ES: Estimating the relative volumes of the cones in a Groebner fan. Math Comput Sci 2009,3(4):457.MathSciNetView ArticleGoogle Scholar
 Yu J, Smith VA, Wang PP, Hartemink AJ, Jarvis ED: Advances to Bayesian network inference for generating causal networks from observational biological data. Bioinformatics 2004, 20: 35943603. 10.1093/bioinformatics/bth448View ArticleGoogle Scholar
 Della Gatta G, Bansal M, AmbesiImpiombato A, Antonini D, Missero C, di Bernardo D: Direct targets of the TRP63 transcription factor revealed by a combination of gene expression profiling and reverse engineering. Genome Res 2008, 18: 939948. 10.1101/gr.073601.107View ArticleGoogle Scholar
 Gardner TS, di Bernardo D, Lorenz D, Collins JJ: Inferring genetic networks and identifying compound mode of action via expression profiling. Science 2003, 301: 102105. 10.1126/science.1081900View ArticleGoogle Scholar
 Basso K, Margolin AA, Stolovitzky G, Klein U, DallaFavera R, Califano A: Reverse engineering of regulatory networks in human B cells. Nat Genet 2005, 37: 382390. 10.1038/ng1532View ArticleGoogle Scholar
 Dimitrova ES, McGee J, Laubenbacher R, Vera Licona P: Comparison of discretization methods for network inference. J Comput Biol 2010.Google Scholar
 Kauffman S, Peterson C, Samuelsson B, Troein C: Random Boolean network models and the yeast transcriptional network. PNAS 2003,100(25):1479614799. 10.1073/pnas.2036429100View ArticleGoogle Scholar
 Lidl R, Niederreiter H: Finite Fields, Encyclopedia of Mathematics and Its Applications. 2nd edition. Cambridge University Press, New York; 1997.Google Scholar
 Cox D, Little J, O'Shea D: Ideals, Varieties, and Algorithms. Springer Verlag, New York; 1997.Google Scholar
 Pritchard D: Walking Through the Gröbner Fan.2001. [http://sma.epfl.ch/~pritchar/math/2001/Groebner.pdf]Google Scholar
 Buchberger B: An algorithm for finding a basis for the residue class ring of a zerodimensional polynomial ideal. In Ph.D. Thesis. University of Innsbruck; 1965.Google Scholar
 Dimitrova ES, GarciaPuente LD, Hinkelmann F, Jarrah AS, Laubenbacher R, Stigler B, VeraLicona P: Parameter estimation for Boolean models of biological networks. J Theor Comput Sci 2010.Google Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.