Origins of Stochasticity and Burstiness in High-Dimensional Biochemical Networks
© Simon Rosenfeld. 2009
Received: 5 February 2008
Accepted: 24 April 2008
Published: 14 May 2008
Two major approaches are known in the field of stochastic dynamics of intracellular biochemical networks. The first one places the focus of attention on the fact that many biochemical constituents vitally important for the network functionality may be present only in small quantities within the cell, and therefore the regulatory process is essentially discrete and prone to relatively big fluctuations. The second approach treats the regulatory process as essentially continuous. Complex pseudostochastic behavior in such processes may occur due to multistability and oscillatory motions within limit cycles. In this paper we outline the third scenario of stochasticity in the regulatory process. This scenario is only conceivable in high-dimensional highly nonlinear systems. In particular, we show that burstiness, a well-known phenomenon in the biology of gene expression, is a natural consequence of high dimensionality coupled with high nonlinearity. In mathematical terms, burstiness is associated with heavy-tailed probability distributions of stochastic processes describing the dynamics of the system. We demonstrate how the "shot" noise originates from purely deterministic behavior of the underlying dynamical system. We conclude that the limiting stochastic process may be accurately approximated by the "heavy-tailed" generalized Pareto process which is a direct mathematical expression of burstiness.
High-dimensional biochemical networks are integral parts of intracellular organization. The most prominent roles in this organization belong to genetic regulatory networks (GRN) and protein interaction networks. There are also numerous other subsystems, such as metabolic and glycomic networks, to name just a few. All these networks have several important features in common. First, they are highly diverse, that is, they contain numerous (up to tens of thousands) different types of molecules. Second, their dynamics is constrained by a highly structured, densely tangled intracellular environment. Third, their constituents are predominantly macromolecules interacting in accordance with the laws of thermodynamics and chemical kinetics. Fourth, all these networks may be called "unsupervised" in the sense that they do not have an overlying regulatory structure of a nonbiochemical nature. Although the term "regulation" is frequently used in the description of cellular processes, its actual meaning is different from that in systems control theory. In that theory, the regulatory signal produced by the controller and the way it directs the system are of a different physical nature than the functions of the system under control. In contrast, intra- and intercellular regulations are themselves of a biochemical nature (e.g., protein signal transduction); therefore, the subdivision of a system into a regulator and a subsystem to be regulated is largely nominal. In order to be a stabilizing force, a biochemical "controller" should first be stable itself. Logically, such a subdivision serves as a way of compartmentalizing a big biochemical system into relatively independent parts to simplify the analysis. However, in biology this compartmentalization is rarely unambiguous, and it is never known for sure what regulates what.
An indiscriminate usage of the concepts and terminology borrowed from the systems control theory obscures the fundamental fact that intracellular functionality is nothing else than a vast system of interconnecting biochemical reactions between billions of molecules belonging to tens of thousands of molecular species. Therefore, studying general properties of such large biochemical systems is of primary importance for understanding functionality of the cell.
In this work, the focus of attention is placed on the dynamical stability of biochemical networks. First, we show that the stringent requirements of dynamical stability have very little chance of being satisfied in biochemical networks of sufficiently high order. The problem we encounter here is essentially of the same nature as in the now-classic work by May, where the famous question "will a large complex system be stable?" was discussed in an ecological context. Second, we show that a dynamically unstable system does not necessarily end its existence through explosion or implosion, as prescribed by simple linear considerations. It is possible that such a system would reside in a dynamic state similar to a stationary or slowly evolving stochastic process. Third, we conjecture that the motion in a high-dimensional system of strongly interacting units inevitably includes a pattern of "burstiness," that is, sporadic changes of the state variables in either positive or negative directions.
In biology, burstiness is an experimentally observed phenomenon [7–10], and a variety of theoretical approaches have been developed to understand its origins. Two of them have been especially successful in explanation of the phenomenon of burstiness. In the first one, the focus of attention is placed on the fact that many biochemical constituents vitally important for the network functionality may be present only in small quantities within the cell, and therefore, the regulatory process is essentially discrete and prone to relatively big fluctuations [11, 12]. The second approach treats the regulatory process as essentially continuous. Complex pseudostochastic behavior in such processes may occur due to multistability and oscillatory motions within limit cycles. An extensive summary of this line of theoretical works may be found in [13, 14]. There are numerous other approaches of various levels of mathematical sophistication and adherence to biological realities that attempt to explain the phenomenon of burstiness. It is far beyond the goals of this work to provide a detailed review. Recently published papers [15, 16] are good sources of more comprehensive information. In summary, the origins of stochasticity are so diverse that none of the existing theories may claim to be exhaustive. Each set of unmodeled realities in the system being modeled manifests itself as an additional stochastic force or noise. Stochasticity occurs at all levels of intracellular organization, from a single biomolecule, through the middle-size regulatory units, all the way up to tremendously large and complex systems such as GRN; each of these contexts requires a special tool for mathematical conceptualization.
The goal of this paper is to present a novel scenario of bursting, in addition to the existing ones. Unlike the approaches mentioned above, the mechanism we consider does not require any special conditions for its realization. Rather, it is seen as a ubiquitous property of any high-dimensional highly nonlinear dynamical system, including biochemical networks. The mechanism of stochastic behavior proposed here allows for some experimentally verifiable predictions regarding global parameters characterizing the system.
Interrelations between the stochastic and deterministic descriptions of multidimensional nonlinear systems, in general, and the systems of chemical reactions, in particular, have been given considerable attention in the literature [17–20]. It often happens, however, that an approach, being multidimensional theoretically, stumbles upon insurmountable mathematical difficulties in applications. As a result, there is often a big gap between the sophistication and generality of a theory, on one hand, and simplicity and particularity of the applications, on the other. A big promise in studying really large systems is seen in computational models, the ones that are capable of dealing with dozens  or even hundreds [22–24] of simultaneous biochemical constituents. These models, however, are necessarily linked to particular systems with all the specifics of their functionality and experimentally available parameterization. Due to these narrowly focused designs, computational models are rarely generalizable to other systems with different parameterizations; hence, common features of all such systems are not readily detectable. In addition, so far even big computational models are still too small to be able to capture global properties and patterns of behavior of really big biochemical networks, such as GRN.
The novelty of our approach consists of direct utilization of the property of the system to be "asymptotically diverse"; the bigger the system, the better the approximation we utilize is working. In the biochemical context, the term "asymptotically diverse" does not simply mean that the number of molecules in the system is very large; more importantly, it means that the number of individual molecular species is also very large, and that each of these species requires an individual equation for the description of its dynamics. In this paper, our goal is not in providing a detailed mathematical analysis of any particular biochemical system; rather it is to envision some important global properties and patterns of behavior inherent in the entire class of such systems. The novel message we intend to convey is that burstiness is a fundamental and ubiquitous property of asymptotically diverse nonlinear systems (ADNS). Of course, it would be an oversimplification to ascribe the burstiness in gene expression solely to the property of burstiness of ADNS. Nevertheless, there is little doubt that many subsystems in intracellular dynamics indeed may be seen as ADNS , and as such they may share with them, at least in part, the property of burstiness.
The problem of transition from deterministic to chaotic dynamics in multidimensional systems has a long history in physics and mathematics, and a number of powerful techniques have been proposed to solve it [26–29]. It is rarely the case, however, that the full strength of these techniques can actually be applied to real systems; far-reaching simplifications are unavoidable. Preliminary qualitative exploration supported by partial theoretical modeling and simulation is a necessary step towards developing a theoretically sound yet mathematically tractable approximation. This paper is intended to provide such an exploration.
A natural basis for the description of chemical kinetics in a multidimensional network is the power-law formalism, also known under the name S-systems [24, 31–33]. Being algebraically similar to the law of mass action (LMA), S-systems proved to be an indispensable tool in the analysis of complex biochemical systems and metabolic pathways . A useful property of S-systems is that S-functions are the "universal approximators," that is, have the capability of representing a wide range of nonlinear functions under mild restrictions on their regularity and differentiability. S-functions are found to be helpful in the analysis of genome-wide data, including those derived from microarray experiments [35, 36]. However, the most important fact in the context of this work is that in the vicinity of equilibrium any nonlinear dynamical system may be represented as an S-system . Unlike mere linearization, which replaces a nonlinear system by the topologically isomorphic linear one, the S-approximation still retains essential traits of nonlinearity but often is much easier to analyze.
A more general form of these equations is known as the law of generalized mass action (GMA). It allows for several concurrent reactions of production and degradation for each constituent, each characterized by its own matrix of rates and tensor of stoichiometric coefficients. However, in principle, this more complex system is reducible to form (1) by an appropriate redefinition of the chemical constituents. Even more important is the fact that any nonlinear dynamical system, after a certain chain of transformations, may be represented in the form (1); for this reason this form is sometimes called "a canonical nonlinear form" (see also [41, 42]). Finally, as has recently been shown, in the vicinity of equilibrium a wide class of nonlinear systems is topologically isomorphic to the canonical S-system (Appendix A).
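As a minimal sketch, assume that form (1) is the standard S-system of the power-law formalism, dx_i/dt = alpha_i * prod_j x_j^g_ij - beta_i * prod_j x_j^h_ij. The names alpha, beta, g, and h, the forward-Euler integrator, and the one-species toy example are illustrative assumptions, not the paper's notation or method.

```python
import math

def s_system_rhs(x, alpha, beta, g, h):
    """Right-hand side of an S-system (assumed standard form):
    dx_i/dt = alpha_i * prod_j x_j**g[i][j] - beta_i * prod_j x_j**h[i][j]."""
    n = len(x)
    dx = []
    for i in range(n):
        production = alpha[i] * math.prod(x[j] ** g[i][j] for j in range(n))
        degradation = beta[i] * math.prod(x[j] ** h[i][j] for j in range(n))
        dx.append(production - degradation)
    return dx

def integrate_euler(x0, alpha, beta, g, h, dt, steps):
    """Forward-Euler integration; adequate for a sketch, not for stiff systems."""
    x = list(x0)
    for _ in range(steps):
        dx = s_system_rhs(x, alpha, beta, g, h)
        # Keep concentrations strictly positive so fractional powers stay defined.
        x = [max(xi + dt * dxi, 1e-12) for xi, dxi in zip(x, dx)]
    return x

# Hypothetical one-species example: dx/dt = 2 - x, with fixed point x* = 2.
alpha, beta = [2.0], [1.0]
g, h = [[0.0]], [[1.0]]
x_final = integrate_euler([0.5], alpha, beta, g, h, dt=0.01, steps=2000)
```

In this toy case the fixed point is stable, so the trajectory relaxes to x* = 2. With many species and general exponent matrices the same code can be used to explore the unstable regimes that the text is concerned with.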
In equations (3), time is rescaled, and a set of constants characterizes the constituent-specific rates of chemical transformations (see [30, 43] and Appendix B for definitions and technical details).
No simplifications have been made for the derivation of (3). This means that these equations are quite general and may be always derived for any given sets of rates and stoichiometric coefficients.
Equations in (3) may be viewed simultaneously as renormalized equations of chemical kinetics derived from and governed by the laws of nonequilibrium thermodynamics, and as the equations of an abstract dynamical system, whether originating in chemistry or not. There is a fundamental difference between the dynamic equilibrium, defined by the fixed-point conditions of (3), and the thermodynamic equilibrium expressed in the LMA in chemical kinetics. The latter assumes, in addition to the fact that the fixed point is the equilibrium point, the existence of detailed balance, that is, full compensation of each chemical reaction by the reverse one. For an arbitrary dynamical system, there are no first principles that would impose any limitations on the structure of the Jacobian matrix in the vicinity of the fixed point. This means, in turn, that the Jacobian is just a matrix of general form, having eigenvalues with both positive and negative real parts. Consequently, there is no reason to assume that the macroscopic law of motion for such systems is stable. Although the assumption of stability is frequently introduced in the context of genetic regulation, it in fact refers to a highly specific condition which is hardly possible in an unsupervised multidimensional system with many thousands of independent governing parameters.
In this context, it is useful to recall some fundamental results pertaining to the stability of nonlinear systems. According to the theorem by Lyapunov, a matrix is stable if and only if the associated Lyapunov matrix equation has a solution and this solution is a positive definite matrix. For the Jacobian of a biochemical network, this solution, if it exists, is a complicated function of all the stoichiometric coefficients and kinetic rates characterizing the network. Thus, the Lyapunov criterion would impose a set of very stringent constraints of high algebraic order on the structure of dynamically stable biochemical networks. Another classical approach to stability consists of the application of the Routh-Hurwitz criterion. In this approach, one first calculates the characteristic polynomial of the Jacobian matrix, and then builds the sequence of the so-called Hurwitz determinants from its coefficients. The system is stable if and only if all the Hurwitz determinants are positive. Again, the Routh-Hurwitz criterion imposes a set of very complex constraints on the global structure of a biochemical network. As argued above, apart from the principle of detailed balance (PDB), there are no other first principles or general laws governing the stability of biochemical systems, and neither the Lyapunov nor the Routh-Hurwitz criterion is a corollary of PDB. As has been shown, the Jacobian matrix of an arbitrary biochemical system may have comparable numbers of eigenvalues with negative and positive real parts. This property holds under widely varying assumptions regarding kinetic rates and stoichiometric coefficients. Therefore, high-dimensional biochemical networks which are not purposefully designed or dynamically stabilized (e.g., as in reactors for biochemical synthesis) are, in general, reasonably presumed to be unstable.
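The Routh-Hurwitz procedure described above can be sketched as follows. The sketch assumes that the characteristic polynomial of the Jacobian is already available as a coefficient list [a0, ..., an] with a0 > 0; the function names are hypothetical, and exact rational arithmetic is used only to keep the signs of the determinants unambiguous.

```python
from fractions import Fraction

def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def hurwitz_determinants(coeffs):
    """Leading principal minors of the Hurwitz matrix of
    a0*s**n + a1*s**(n-1) + ... + an, with coeffs = [a0, ..., an]."""
    n = len(coeffs) - 1

    def a(k):
        return coeffs[k] if 0 <= k <= n else 0

    # Entry (i, j), 0-indexed, of the Hurwitz matrix is a_{2j - i + 1}.
    hurwitz = [[a(2 * j - i + 1) for j in range(n)] for i in range(n)]
    return [determinant([row[:k] for row in hurwitz[:k]]) for k in range(1, n + 1)]

def is_hurwitz_stable(coeffs):
    """True if all Hurwitz determinants are positive (assumes a0 > 0)."""
    return all(d > 0 for d in hurwitz_determinants(coeffs))

# (s + 1)(s + 2)(s + 3) = s^3 + 6 s^2 + 11 s + 6 has all roots in the left half-plane.
stable_example = is_hurwitz_stable([1, 6, 11, 6])
# s^3 + s^2 + s + 6 fails the criterion because a1*a2 - a0*a3 < 0.
unstable_example = is_hurwitz_stable([1, 1, 1, 6])
```

Even in this small example each determinant is a polynomial in all the coefficients, which illustrates why the criterion becomes a set of high-order algebraic constraints as the dimension grows.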
Considerable efforts have been undertaken to infer global properties of large biochemical networks far from thermodynamic equilibrium from first principles, and many notable approaches have been developed to date. Among them are the chemical reaction network theory, the stoichiometric network theory, thermodynamically feasible models, the imposition of constraints of microscopic reversibility, and the minimal reaction scheme, to name just a few. However, in the majority of these approaches, stability, either dynamical or stochastic, is presumed a priori and serves as a starting point for further considerations. These theories neither question the existence of such stability nor explain why a big biochemical network should necessarily be stable.
In (13), the dynamics is driven by a matrix of random Pareto-distributed amplitudes and a set of random point processes coinciding with the events of bursting. The transition from (3) to (13) signifies the replacement of purely deterministic dynamics by a pseudostochastic process similar to shot noise. We emphasize again that no assumptions have been made regarding extrinsic noise of any nature which may be present in a dynamical system and which is frequently used as a vehicle for introducing a stochastic element into the system's behavior [17, 57]. The point we make is that even in the absence of such an external source of stochasticity, a multidimensional system itself generates very complex behavior which for all practical purposes may be regarded as a stochastic process. Formally, this type of stochasticity may be regarded as a case of chaotic dynamics, but it is fundamentally different from what is usually meant by the terms chaos or chaotic maps in the literature. Chaotic behavior may appear even in a low-dimensional system with a very simple structure of nonlinearity, as in the celebrated example of the Lorenz attractor. Usually in such systems, the bifurcations with transition to chaos appear under highly peculiar conditions expressed in a precise combination of the parameters governing the system. In this sense, chaos is not something typical of low-dimensional nonlinear systems, but rather is a rare and coincidental exception among the majority of smoothly behaving systems with a similar algebraic structure. On the contrary, in the model proposed in this work, stochasticity emerges under very general and quite natural conditions without any special requirements imposed on the governing parameters. In this sense, this kind of stochasticity may be regarded as a highly typical, all-pervading pattern in the behavior of high-dimensional highly nonlinear dynamical systems.
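To make the notion of shot noise with Pareto-distributed amplitudes concrete, here is a minimal single-variable sketch. It is not the paper's equation (13): the Poisson event times, the random sign of each burst, the exponentially decaying pulse shape, and all parameter names are assumptions introduced for illustration.

```python
import math
import random

def pareto_shot_noise(rate, shape, scale, decay, t_end, n_steps, seed=0):
    """Superpose decaying pulses with Pareto amplitudes at Poisson event times.

    rate  : mean number of bursts per unit time (assumed Poisson process)
    shape : Pareto tail index; smaller values give heavier tails
    scale : minimum burst amplitude
    decay : relaxation time of each pulse (assumed exponential shape)
    """
    rng = random.Random(seed)
    events = []
    t = rng.expovariate(rate)  # exponential gaps give Poisson event times
    while t < t_end:
        amplitude = scale * rng.paretovariate(shape)  # always >= scale
        sign = rng.choice((-1.0, 1.0))  # bursts in either direction (assumption)
        events.append((t, sign * amplitude))
        t += rng.expovariate(rate)
    dt = t_end / n_steps
    signal = []
    for k in range(n_steps):
        tk = k * dt
        # Only events that have already occurred contribute at time tk.
        value = sum(a * math.exp(-(tk - te) / decay) for te, a in events if te <= tk)
        signal.append(value)
    return events, signal

events, signal = pareto_shot_noise(
    rate=0.5, shape=1.5, scale=1.0, decay=2.0, t_end=100.0, n_steps=200, seed=1
)
```

The process is entirely reproducible for a fixed seed, which mirrors the point made in the text: a trajectory can look like shot noise without any external source of randomness being required in the underlying system.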
and therefore satisfy the same univariate Fokker-Planck equation (FPE). It is natural to assume that the correlation times are of the same order of magnitude as the corresponding times of chemical relaxation, because both introduce characteristic time scales into the individual chemical reactions. Therefore, the entire system may be stratified by only one set of parameters, the kinetic rates.
Generally, the probabilistic state of a biochemical network may be characterized by the joint distribution of all the chemical constituents, which satisfies the multivariate FPE. However, in light of the above simplifications, such a detailed description would be redundant. Instead, we introduce a collection of identical univariate probability distributions, one for each constituent, each satisfying the same FPE with the coefficient of diffusion (22). This self-similarity grossly simplifies the analytical treatment of the problem. First, it means that the variances of the state variables are directly proportional to the squares of the corresponding kinetic rates. Since the state variables are defined through the logarithms of concentrations, we conclude that, in stationary fluctuations, the variances of the logarithms of concentrations are proportional to the squares of kinetic rates. This is a testable property of all large-scale biochemical networks; it may serve as a basis for experimental validation. Furthermore, since the kinetic rates are the only set of constituent-specific temporal scaling parameters in the network, it is natural to surmise that the correlation times are directly proportional to the corresponding times of chemical relaxation. This is another macroscopically observable property suitable for experimental validation.
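One hypothetical way to check the first prediction against data is to regress the logarithm of the measured variance of log-concentrations on the logarithm of the kinetic rate; proportionality to the square of the rate corresponds to a slope of 2. The sketch below assumes that per-constituent rates and variances have already been estimated; the function name and the synthetic numbers are illustrative only and are not taken from the paper.

```python
import math

def loglog_slope(rates, variances):
    """Least-squares slope of log(variance) against log(rate)."""
    xs = [math.log(r) for r in rates]
    ys = [math.log(v) for v in variances]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic input that obeys variance = const * rate**2 exactly.
rates = [0.5, 1.0, 2.0, 4.0]
variances = [3.0 * r ** 2 for r in rates]
slope = loglog_slope(rates, variances)
```

With real measurements the slope would carry sampling error, so a confidence interval around 2 would be the relevant quantity rather than the point estimate alone.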
Due to random partitioning and the stochasticity of transcription initiation [60, 61], the initial conditions for the system's evolution are considered random. Starting with these initial conditions, the system is predominantly driven by the sequence of sporadic events of stochastic cooperativity. Although each event produces a noticeable momentary shift in the system's evolution, the multitude of such events makes its overall behavior quite smooth. This behavior is illustrated in Figure 7(d). Smoothness of the trajectories may, in a practical sense, be regarded as macroscopic stability, whereas the deviations from these smooth trajectories may be seen as "noise."
As a side note, it is worth mentioning that in this paper the Pareto representation of exceedances has been derived from the assumption that the underlying processes are approximately Gaussian and, therefore, that their exponentials are approximately lognormally distributed. We have justified this closeness to normality by the central limit theorem (CLT). This assumption, however, only served to simplify the analysis; it may be substantially relaxed at the expense of increased complexity of calculations. Conceptually, all the major ideas leading to the notion of stochastic cooperativity would stay in place even without the transition to asymptotic normality. Let us assume again, as we did in the examples in Figures 3-4, that the driving terms are sums of lognormal processes. This time, however, it is not assumed that the number of nonzero elements in these sums is sufficiently large to equate the distributions of the sums to their asymptotic limits. This would reflect the situation when the number of transcription factors in GRN is comparatively small. Generally, exact analytical expressions for the distributions of sums of lognormals are unknown, but there is a consensus in the literature that such sums themselves may be accurately modeled as lognormally distributed. We have performed a simulation to study the probabilistic structure of the exceedances in this lognormal setting. It is rather remarkable that the generalized Pareto distribution (GPD) turns out to be a good approximation in this drastically nonnormal case as well; the only reservation is that the simple parameterization (10)-(11) is no longer valid and should be replaced by a more complex one.
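The kind of simulation mentioned above can be sketched as a peaks-over-threshold experiment: draw sums of a few lognormal terms, keep the excesses over a high empirical quantile, and fit a GPD to them. This is not the author's simulation; the number of terms, the threshold quantile, and the method-of-moments fit (valid only for a shape parameter below 1/2) are assumptions chosen for brevity.

```python
import random
import statistics

def gpd_from_moments(mean, variance):
    """Method-of-moments GPD parameters from the mean and variance of excesses.

    Uses mean = scale / (1 - shape) and
    variance = scale**2 / ((1 - shape)**2 * (1 - 2*shape)).
    """
    ratio = mean * mean / variance  # equals 1 - 2*shape for a GPD
    shape = 0.5 * (1.0 - ratio)
    scale = 0.5 * mean * (1.0 + ratio)
    return shape, scale

def lognormal_sum_exceedances(n_terms, sigma, n_samples, quantile, seed=0):
    """Excesses of sums of i.i.d. lognormals over an empirical quantile."""
    rng = random.Random(seed)
    sums = [
        sum(rng.lognormvariate(0.0, sigma) for _ in range(n_terms))
        for _ in range(n_samples)
    ]
    threshold = sorted(sums)[int(quantile * n_samples)]
    excess = [s - threshold for s in sums if s > threshold]
    return threshold, excess

# Hypothetical settings: only 5 terms per sum, far from the CLT regime.
threshold, excess = lognormal_sum_exceedances(
    n_terms=5, sigma=1.0, n_samples=20000, quantile=0.95
)
shape, scale = gpd_from_moments(statistics.fmean(excess), statistics.variance(excess))
```

A positive fitted shape would indicate a heavy tail in the excesses. Comparing the empirical distribution of `excess` with the fitted GPD (for example by a quantile plot) is the natural next step, and a maximum-likelihood fit would be preferable when the tail is very heavy.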
Summarizing all these findings, we conclude that the inherent dynamical instability of the system considered as deterministic directly translates into heavy-tailedness and burstiness in the stochastic description. The sequence of events of stochastic cooperativity serves as a link between the deterministic and stochastic paradigms.
We have outlined the mechanism by which a multidimensional autonomous nonlinear system, despite being dynamically unstable, may nevertheless be stationary, that is, may reside in a state of stochastic fluctuations obeying the probabilistic laws of a random walk. Importantly, in this mechanism, the transition from the deterministic to the probabilistic laws of motion does not require any assumptions regarding the presence of extraneous random noise; stochastic-like behavior is produced by the system itself. An important role in forming this type of fluctuating motion belongs to the inherent burstiness of the system associated with the events of stochastic cooperativity. Unlike in the classical Langevin approach, the macroscopic laws of motion of the system are not required to be dynamically stable.
In this work, we have selected the S-systems as an example of a nonlinear system. Three motivations justified this selection. First, the S-systems are structured after the equations of chemical kinetics, and are thus a natural tool for the description of high-dimensional biochemical networks. Second, many other nonlinear systems may be represented through the S-systems in the vicinity of a fixed point. Third, despite their generality, the S-systems have the advantage of being analytically tractable. However, many results regarding stochastic cooperativity and burstiness may be readily extended to other multidimensional nonlinear systems. In such a system, short pulses during the events of stochastic cooperativity may be described in terms of "shot" noise with subsequent derivation of the Fokker-Planck equation. As proposed in this paper, it is possible to indicate some general experimentally verifiable predictions regarding the behavior of this type of system, such as the distribution of intensities of fluctuations and the distribution of temporal autocorrelations among individual units of the system.
with the parameters dependent on .
We introduce the map, , and rewrite (A.10) as
Formally, these equations may be seen as a system of equations of chemical kinetics, with the corresponding quantities playing the roles of the rates, the stoichiometric coefficients, and the chemical constituents. It is not out of place to mention again that, since the original vector functions are arbitrary, there is no special symmetry in the Jacobian matrix of the system in the vicinity of the fixed point. Therefore, there is no reason to expect that its eigenvalues have only negative real parts, that is, that the fixed point is stable.
Note that the two sets of stoichiometric coefficients cannot be identical in all the direct and inverse reactions simultaneously; therefore, the matrix is always invertible.
where , and .
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.