Bayesian estimation of the discrete coefficient of determination
 Ting Chen^{1} and
 Ulisses M. Braga-Neto^{2}
DOI: 10.1186/s13637-015-0035-4
© Chen and Braga-Neto. 2016
Received: 11 June 2015
Accepted: 4 December 2015
Published: 15 January 2016
Abstract
The discrete coefficient of determination (CoD) measures the nonlinear interaction between discrete predictor and target variables and has had far-reaching applications in Genomic Signal Processing. Previous work has addressed the inference of the discrete CoD using classical parametric and nonparametric approaches. In this paper, we introduce a Bayesian framework for the inference of the discrete CoD. We derive analytically the optimal minimum mean-square error (MMSE) CoD estimator, as well as a CoD estimator based on the Optimal Bayesian Predictor (OBP). For the latter estimator, exact expressions for its bias, variance, and root-mean-square (RMS) error are given. The accuracy of both Bayesian CoD estimators with non-informative and informative priors, under fixed or random parameters, is studied via analytical and numerical approaches. We also demonstrate the application of the proposed Bayesian approach in the inference of gene regulatory networks, using gene-expression data from a previously published study on metastatic melanoma.
Keywords
Discrete coefficient of determination; Bayesian inference; Gene regulatory network inference
1 Introduction
DNA regulatory circuits can often be described by networks of Boolean logical gates updated and observed at discrete time intervals [1–6]. In a stochastic setting, the degree of association between Boolean predictors and targets can be quantified by means of the discrete coefficient of determination (CoD) [7]. As such, the CoD is a function of the joint probability of target and predictor variables, which, however, is usually unknown in practice. Hence, the discrete CoD must be inferred from sample data. A larger sample-based CoD value indicates a tighter regulation between target and predictors.
The concept of CoD has far-reaching applications in genomics. The CoD was perhaps the first predictive paradigm utilized in the context of microarray data, the goal being to provide a measure of nonlinear interaction among genes [7]. The CoD has been used in the reconstruction or inference of gene regulatory networks using gene expression data quantized into discrete levels [8–11]. It has also been used in the definition of the intrinsically multivariate prediction (IMP) criterion for the characterization of canalizing genes [12, 13]. In [14–16], we studied the inferential theory of the discrete CoD in a classical framework, by means of nonparametric and parametric maximum-likelihood estimation (MLE) approaches.
Classical parametric and nonparametric approaches to CoD estimation have been investigated in [14, 15]. In the present paper, we introduce a fully Bayesian approach to the inference of the discrete CoD, based on a parameterized family of target-predictor distributions. Given the priors, the probability model and sample data, we obtain the posterior distributions of the parameters, which can then be used to obtain the optimal predictors and prediction error estimators for the given problem. Such a Bayesian approach for prediction error estimation was first introduced in [17, 18], in a classification context.
Part of the work presented here appeared in [19], which introduced the minimum mean-square error (MMSE) Bayesian CoD estimator. In the present paper, we provide an exact representation of the analytical expressions of this estimator, and in addition, introduce the optimal Bayesian predictor (OBP) CoD estimator, which is based on an optimal predictor with the minimum expected true error with respect to the posterior distributions of the parameters [20, 21]. We derive exact formulas for the bias, variance, and root-mean-square (RMS) error of the OBP CoD estimator. The accuracy of both Bayesian CoD estimators is compared against that of several nonparametric CoD estimators by numerical simulations. The results indicate that the Bayesian MMSE CoD estimator is the best one when averaged over all distributions and samples, whereas the simpler OBP CoD estimator, though suboptimal in the MMSE sense, can be more accurate than the MMSE CoD estimator, in a frequentist sense, under low-variance informative priors around fixed parameters corresponding to a fixed distribution between target and predictors. It is also unsurprisingly found that priors with higher densities around true fixed distributions produce more accurate Bayesian estimators in a frequentist sense.
This paper is organized as follows. In Section 2, we introduce the discrete model for prediction and present the coefficient of determination in this model. In Section 3, we develop a Bayesian framework for the inference of the discrete CoD, define two Bayesian CoD estimators, one in the sense of minimum mean-square error (MMSE) and the other based on the optimal Bayesian predictor (OBP), and derive the analytical expressions for both Bayesian CoD estimators. In Section 4, we present an exact formulation of accuracy metrics for the OBP CoD estimator. In Section 5, we discuss the accuracy of both Bayesian CoD estimators when averaged over all distributions and samples, as well as under fixed distributions with varying priors, and compare them with the nonparametric CoD estimators. Section 6 describes an approach to the inference of gene regulatory networks using the proposed Bayesian CoD estimators and illustrates the approach with gene expression data from a previously published study on metastatic melanoma. Finally, Section 7 presents concluding remarks.
2 The discrete coefficient of determination
The CoD is defined as \(\text{CoD} = \frac{\varepsilon_{0} - \varepsilon}{\varepsilon_{0}} = 1 - \frac{\varepsilon}{\varepsilon_{0}}\), where ε _{0} is the minimum error of predicting Y by a constant (i.e., in the absence of observations) and ε is the minimum error of predicting Y based on the observation of X. Since ε≤ε _{0} (all sensible error criteria satisfy this property), the CoD ranges from 0 to 1. The closer it is to one, the closer ε is to zero and the tighter the association between predictor and target variables, whereas the closer it is to zero, the closer ε is to ε _{0} and the weaker the association is. By convention, CoD=0 when ε _{0}=0. The CoD is a function only of the distribution of (X,Y); in particular, it is not a function of sample data. This definition of the CoD reduces gracefully to the classical one in the case when (X,Y) is jointly Gaussian [7].
This formula gives the relationship between the CoD and the parameters of the distribution of (X,Y).
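As an illustration of how the CoD depends on the distributional parameters, the following minimal Python sketch evaluates the CoD under the assumption (not spelled out in the text above) that c = P(Y=0), p _{ i } = P(X=x ^{ i } ∣ Y=0), and q _{ i } = P(X=x ^{ i } ∣ Y=1), so that ε _{0} = min{c, 1−c} and ε is the sum over i of min{c p _{ i }, (1−c) q _{ i }}. The function name `cod` is hypothetical.

```python
def cod(c, p, q):
    """CoD of the discrete model.

    Assumes c = P(Y=0), p[i] = P(X=x^i | Y=0), q[i] = P(X=x^i | Y=1).
    """
    eps0 = min(c, 1.0 - c)                      # best constant predictor
    eps = sum(min(c * pi, (1.0 - c) * qi)       # best predictor given X
              for pi, qi in zip(p, q))
    return 0.0 if eps0 == 0.0 else 1.0 - eps / eps0  # CoD = 0 by convention if eps0 = 0

# d = 1 example: c = 0.5, p = (0.6, 0.4), q = p flipped left to right
value = cod(0.5, [0.6, 0.4], [0.4, 0.6])
print(round(value, 6))
```

For this example ε _{0} = 0.5 and ε = 0.4, so the CoD is approximately 0.2, i.e., a weak association.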
3 Bayesian CoD estimators
In practice, the distributional parameters are generally unknown, and one would like to estimate the CoD from sample data. We present in this section the derivation of two Bayesian estimators for the CoD in (4). One approach is analogous to that followed by [17] in defining the Bayesian MMSE prediction error estimator, whereas the other one makes use of the optimal Bayesian predictor (OBP), a straightforward generalization of the optimal Bayesian classifier (OBC), introduced in [20].
We will assume that an i.i.d. sample S _{ n }={(X _{1},Y _{1}),…,(X _{ n },Y _{ n })} from the distribution of (X,Y) is available. Given S _{ n }, define U _{ i } as the number of sample points such that X=x ^{ i } and Y=0, and V _{ i } as the number of sample points such that X=x ^{ i } and Y=1, for \(i = 1, \dots,2^{d}\). Note that \(N_{0} = \sum _{i=1}^{2^{d}} U_{i}\) and \(N_{1} = \sum _{i=1}^{2^{d}} V_{i}\) are the (random) sample sizes corresponding to Y=0 and Y=1, respectively.
Let \({\mathbf p} = (p_{1}, \ldots, p_{2^{d}})\), \({\mathbf q} = (q_{1},\ldots,q_{2^{d}})\), and θ=(c,p,q), where 0≤c,p _{ i },q _{ i }≤1, and \(\sum _{i=1}^{2^{d}} p_{i} = \sum _{i=1}^{2^{d}} q_{i} =1\). As shown in the previous section, the distribution of (X,Y) is completely specified by the parameter vector θ. The Bayesian approach treats θ as a random variable, the prior distribution of which can take advantage of a priori knowledge about the problem. We will assume that c, p, and q are independent, i.e., f(θ)=f(c)f(p)f(q). It is shown in [17] that this implies that the posterior distribution of θ also factors: f(θ∣S _{ n })=f(c∣S _{ n })f(p∣S _{ n })f(q∣S _{ n }).
where the hyperparameters α, β, α _{ i }, β _{ i }, i=1,…,2^{ d }, are positive numbers. These distributions have bounded supports; the Beta distribution is defined over the interval [0,1], while the Dirichlet distribution is defined over the simplex of 2^{ d } nonnegative numbers that add up to one. The shapes of the distributions are controlled by the concentration parameters Δ _{ c }=α+β, \(\Delta _{p} = \sum _{j=1}^{2^{d}} \alpha _{j}\), and \(\Delta _{q} = \sum _{j=1}^{2^{d}} \beta _{j}\), and the base measures c _{0}=α/Δ _{ c }, \({\mathbf p}_{0} = (\alpha _{1}/\Delta _{p}, \dots, \alpha _{2^{d}}/\Delta _{p})\), and \({\mathbf q}_{0} = (\beta _{1}/\Delta _{q}, \dots, \beta _{2^{d}}/\Delta _{q})\). Please refer to Appendices A and B for definitions and important facts about the Beta and Dirichlet distributions, which will be needed in the sequel.
where n _{0} and n _{1} are the observed sample sizes corresponding to Y=0 and Y=1, respectively, while u _{ i } and v _{ i } are the observed sample values of the random variables U _{ i } and V _{ i }, respectively.
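The conjugate update behind these posteriors can be sketched in a few lines of Python. The sketch assumes the standard Beta and Dirichlet conjugacy, i.e., that the posterior of c is Beta(α+n _{0}, β+n _{1}) and the posteriors of p and q are Dirichlet with hyperparameters α _{ i }+u _{ i } and β _{ i }+v _{ i }. The function names and the binary-to-index ordering of the x ^{ i } are hypothetical.

```python
def cell_index(x):
    """Map a binary tuple x to an index in 0..2^d - 1 (assumed ordering of the x^i)."""
    i = 0
    for bit in x:
        i = 2 * i + bit
    return i

def posterior_hyperparameters(sample, d, alpha, beta, alphas, betas):
    """Return posterior (alpha, beta, alphas, betas) given a list of (x, y) pairs."""
    U = [0] * 2 ** d   # counts of X = x^i with Y = 0
    V = [0] * 2 ** d   # counts of X = x^i with Y = 1
    for x, y in sample:
        if y == 0:
            U[cell_index(x)] += 1
        else:
            V[cell_index(x)] += 1
    n0, n1 = sum(U), sum(V)
    return (alpha + n0, beta + n1,
            [a + u for a, u in zip(alphas, U)],
            [b + v for b, v in zip(betas, V)])

sample = [((0,), 0), ((1,), 1), ((1,), 1), ((0,), 1)]
print(posterior_hyperparameters(sample, 1, 3, 3, [1, 1], [1, 1]))
```

With this toy sample (n _{0}=1, n _{1}=3), the prior Beta(3,3) for c becomes Beta(4,6), and the flat Dirichlet priors become Dirichlet(2,1) and Dirichlet(2,3).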
3.1 Minimum mean-square error CoD estimator
where the CoD is given by (4).
It is well known that the MMSE estimator \(\widehat {\text {CoD}}_{\text {MMSE}}\) not only displays the least root-mean-square (RMS) error over the distribution of (θ,S _{ n }), but it is also an unbiased estimator (however, for a specific model with fixed θ, \(\widehat {\text {CoD}}_{\text {MMSE}}\) might not be unbiased or have the least RMS).
where the Beta function B(a,b) and the coefficients r _{ i }(a,b) are defined in Appendix A.
Substituting (11)–(14) into (10) produces an exact expression for computing the MMSE CoD estimator in terms of sample sizes and model hyperparameters. Notice that for the previous expressions to make sense, one must have α>Δ _{ p }−1 and β>Δ _{ q }−1. In particular, if uniform priors are chosen for p or q, then the prior for c cannot be uniform (cf. Appendix A).
3.2 Optimal Bayesian predictor CoD estimator
It is easy to show that \(0 \leq \hat {\varepsilon }_{\text {OBP}} \leq \hat {\varepsilon }_{0,\text {OBP}}\), and thus \(0 \leq \widehat {\text {CoD}}_{\text {OBP}} \leq 1\).
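A minimal Python sketch of an OBP-style CoD estimate is given below. It assumes that the OBP error estimates are obtained by plugging the posterior means of c, p _{ i }, and q _{ i } into the expressions for ε _{0} and ε, with c = P(Y=0); this is an assumption made for illustration and the function name `obp_cod` is hypothetical.

```python
def obp_cod(U, V, alpha, beta, alphas, betas):
    """OBP-style CoD estimate from cell counts U (Y=0) and V (Y=1).

    Assumption: the error estimates plug posterior means into
    eps0 = min(c, 1-c) and eps = sum_i min(c*p_i, (1-c)*q_i).
    """
    n0, n1 = sum(U), sum(V)
    n = n0 + n1
    delta_c = alpha + beta
    delta_p, delta_q = sum(alphas), sum(betas)
    c_hat = (n0 + alpha) / (n + delta_c)          # posterior mean of c
    eps0 = min(c_hat, 1.0 - c_hat)
    eps = sum(
        min(c_hat * (u + a) / (n0 + delta_p),             # c * E[p_i | S_n]
            (1.0 - c_hat) * (v + b) / (n1 + delta_q))     # (1-c) * E[q_i | S_n]
        for u, v, a, b in zip(U, V, alphas, betas)
    )
    return 0.0 if eps0 == 0.0 else 1.0 - eps / eps0

print(obp_cod([1, 0], [1, 2], 3, 3, [1, 1], [1, 1]))
```

The loop visits each of the 2^{ d } cells once, which matches the linear-in-2^{ d } cost of this kind of estimator. Because a sum of minima never exceeds the minimum of the sums, the returned value stays in [0,1] up to floating-point rounding.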
Execution time for computation of the OBP CoD estimator grows as O(2^{ d }). By comparison, the complexity for exact computation of the Bayesian MMSE CoD estimator introduced in the previous subsection is O(n ^{3} × 2^{ d }). Neither n nor d tends to be too large in Genomics applications, due to small sample sizes and the fact that the average number of predictor genes d per target gene must be small for a stable system, as remarked by S. Kauffman in [2]. However, if n and d become large, one could devise Monte Carlo approximation methods to compute both CoD estimators.
Therefore, the OBP CoD estimator, though suboptimal, is much more efficient computationally than the MMSE CoD estimator, especially at large sample sizes. In addition, we will see in the next section that the OBP CoD estimator can be even more accurate than the MMSE CoD estimator, in a frequentist sense, under a fixed value of the parameters.
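One possible Monte Carlo approximation of the MMSE CoD estimator is sketched below. It relies on the fact that the MMSE estimator is the posterior expectation of the CoD, and approximates that expectation by averaging the CoD over parameter draws from the Beta and Dirichlet posteriors. This is only an illustration, not the exact expression derived above; the function names, the number of draws, and the convention c = P(Y=0) are assumptions.

```python
import random

def cod(c, p, q):
    """CoD of the discrete model, assuming c = P(Y=0)."""
    eps0 = min(c, 1.0 - c)
    eps = sum(min(c * pi, (1.0 - c) * qi) for pi, qi in zip(p, q))
    return 0.0 if eps0 == 0.0 else 1.0 - eps / eps0

def draw_dirichlet(rng, a):
    """Draw one Dirichlet(a) vector by normalizing independent Gamma(a_i, 1) draws."""
    g = [rng.gammavariate(ai, 1.0) for ai in a]
    total = sum(g)
    return [gi / total for gi in g]

def mmse_cod_monte_carlo(U, V, alpha, beta, alphas, betas, draws=5000, seed=0):
    """Approximate E[CoD | S_n] by averaging over posterior draws of (c, p, q)."""
    rng = random.Random(seed)
    n0, n1 = sum(U), sum(V)
    a_post = [a + u for a, u in zip(alphas, U)]
    b_post = [b + v for b, v in zip(betas, V)]
    acc = 0.0
    for _ in range(draws):
        c = rng.betavariate(alpha + n0, beta + n1)
        acc += cod(c, draw_dirichlet(rng, a_post), draw_dirichlet(rng, b_post))
    return acc / draws

print(mmse_cod_monte_carlo([1, 0], [1, 2], 3, 3, [1, 1], [1, 1]))
```

The cost is proportional to the number of draws times 2^{ d } and does not depend on n, at the price of a sampling error that shrinks roughly as the inverse square root of the number of draws.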
4 Performance analysis
In this section, we investigate the accuracy of the Bayesian CoD estimators proposed in the previous section. We distinguish between two types of accuracy metrics: global metrics concern the average performance over all samples and all distributions of (X,Y), weighted by the prior distribution of θ, whereas fixed-parameter metrics have to do with the average performance over all samples, but under a particular distribution of (X,Y), corresponding to a fixed value of the parameter θ. Fixed-parameter metrics thus evaluate the proposed Bayesian estimators from a purely frequentist perspective.
It becomes clear that the fixedparameter bias, variance, and RMS of a Bayesian CoD estimator can be obtained with knowledge of the first and second moments \(E_{\mathbf {S}_{n} \mid \boldsymbol {\theta }}\!\left [ \frac {\hat {\varepsilon }}{\hat {\varepsilon }_{0}} \right ]\) and \(E_{\mathbf {S}_{n} \mid \boldsymbol {\theta }}\!\left [\frac {\hat {\varepsilon }^{2}}{\hat {\varepsilon }_{0}^{2}}\right ]\).
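To make this step explicit, the relations below sketch how the two moments determine the metrics for an estimator of the form \(\widehat{\text{CoD}} = 1 - \hat{\varepsilon}/\hat{\varepsilon}_{0}\). The shorthand R for the ratio is introduced here only for illustration, and all expectations are taken with respect to S _{ n } ∣ θ.

```latex
\begin{align*}
R &= \hat{\varepsilon}/\hat{\varepsilon}_{0}, \qquad \widehat{\text{CoD}} = 1 - R,\\
\text{Bias} &= E[\widehat{\text{CoD}}] - \text{CoD} = 1 - E[R] - \text{CoD},\\
\text{Var} &= E[R^{2}] - \left(E[R]\right)^{2},\\
\text{RMS} &= \sqrt{E\!\left[(\widehat{\text{CoD}} - \text{CoD})^{2}\right]} = \sqrt{\text{Bias}^{2} + \text{Var}}.
\end{align*}
```

The variance of the estimator equals the variance of R because subtracting R from the constant 1 does not change the spread.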
The corresponding global accuracy metrics are obtained by taking expectation of the previous quantities with respect to the marginal (i.e., prior) distribution of θ.
As mentioned previously, the global bias of the Bayesian MMSE CoD estimator is zero and its global RMS is minimal among all CoD estimators. However, this does not imply that its fixed-parameter bias is zero or that its fixed-parameter RMS is minimum for all values of the parameter.
In what follows, we give exact expressions for the computation of \(E_{\mathbf {S}_{n} \mid \boldsymbol {\theta }}\!\left [ \frac {\hat {\varepsilon }}{\hat {\varepsilon }_{0}} \right ]\) and \(E_{\mathbf {S}_{n} \mid \boldsymbol {\theta }}\!\left [\frac {\hat {\varepsilon }^{2}} {\hat {\varepsilon }_{0}^{2}}\right ]\) for the OBP CoD estimator. As argued previously, this allows the exact computation of the fixed-parameter bias, variance, and RMS of that CoD estimator. Via simple numerical integration, it is then possible to obtain the global bias, variance, and RMS. It turns out that similar expressions for the MMSE CoD estimator are much harder to obtain; the performance of that estimator is studied via a numerical approach in the next section.
All the expectations and probabilities below are with respect to S _{ n } ∣ θ (the subscript will be omitted for convenience). In the expressions below, c, p _{ i }, and q _{ i }, for i=1,…,2^{ d } refer to the (deterministic) parameters in θ.
We assume that α−⌊α⌋=β−⌊β⌋ but α≠β. Without loss of generality, we assume that α>β, and let α=β+δ, where δ is a positive integer. Notice that it is easy to show in this case that \(\lfloor \frac {n+\beta -\alpha }{2} \rfloor + \alpha = \lfloor \frac {n+\alpha -\beta }{2} \rfloor + \beta \) by considering the parity of n and δ. Therefore, we have L _{0}⊂L _{1}. In the following, we discuss two cases: when n+β−α is even and when n+β−α is odd.
(1) When n+β−α is even, the event [M=m] is equal to the union of the disjoint events [n _{0}=m−α] and [n _{1}=m−β]=[n _{0}=n−m+β], for \(m\in L_{0} \backslash \left \{\alpha + \frac {n+\beta -\alpha }{2} \right \}\), whereas \(\left [M=\alpha +\frac {n+\beta -\alpha }{2} \right ] = \left [n_{0} = m - \alpha = \frac {n+\beta -\alpha }{2} \right ]\), and [M=m]=[n _{0}=n−m+β], for m∈L _{1}∖L _{0}.
with \(P(U_{i}=k,V_{i}=l \mid n_{0} = r) = \binom {r}{k}{p_{i}^{k}}(1-p_{i})^{r-k}\binom {n-r}{l}{q_{i}^{l}}(1-q_{i})^{n-r-l}\). The expression for \(E\left [\frac {\hat {\varepsilon }_{\text {OBP}}}{(n_{1}+\beta)/(n+\Delta _{c})} \;\big |\; n_{0} = n-r\right ]\) is obtained from (27), with r+α replaced by r+β.
(2) When n+β−α is odd, the event [M=m] is equal to the union of the disjoint events [n _{0}=m−α] and [n _{1}=m−β]=[n _{0}=n−m+β], for m∈L _{0}, whereas [M=m]=[n _{0}=n−m+β], for m∈L _{1}∖L _{0}. By applying the same reasoning, we have the same expression as in (26). Note that \(I_{n_{1} \neq \frac {n+\alpha -\beta }{2}} \) is always equal to 1 in this case since \(\frac {n+\alpha -\beta }{2}\) is not an integer.
with \(P(U_{i}=k, V_{i}=l, U_{j}=r, V_{j}=s \mid n_{0}=t) = \binom {n_{0}}{k,r}{p_{i}^{k}}{p_{j}^{r}}(1-p_{i}-p_{j})^{n_{0}-k-r} \binom {n-n_{0}}{l,s}{q_{i}^{l}}{q_{j}^{s}}(1-q_{i}-q_{j})^{n-n_{0}-l-s}\). The expression for \(E\left [\frac {\hat {\varepsilon }_{\text {OBP}}^{2}}{(n_{1}+\beta)^{2}/(n+\Delta _{c})^{2}} \;\bigg |\; n_{0} = n - t\right ]\) is obtained from (30) with t+α replaced by t+β.
5 Numerical experiments
5.1 Global accuracy
In this section, we employ Monte Carlo sampling (with M=10,000 simulated data sets for each sample size) to compute global accuracy metrics of the two Bayesian CoD estimators. Following [17], we let α=β=2^{ d }+1, which produces a prior for c peaked around the value c=0.5, and α _{ i }=β _{ i }=1, for all i=1,…,2^{ d }, i.e., flat (uniform) prior distributions for (p,q). In each iteration, the values of c and (p,q) are drawn from the respective priors, and then sample data are generated according to these probabilities. Given the sample data, we compute the exact Bayesian MMSE and OBP CoD estimates as expressed in Section 3, and compare them to the standard resubstitution CoD estimator, which is based on plugging in sample frequencies in the expression for the optimal CoD, and corresponds to the original choice of CoD estimator in [7]. This estimator is also called the nonparametric maximum-likelihood CoD estimator in [15]. For further comparison, we also compute CoD estimators based on leave-one-out, 0.632 bootstrap, and 10-repeated two-fold cross-validation error estimators; for details on all these CoD estimators, please see [14, 15]. Sample means and sample variances are employed to approximate the global accuracy metrics of each CoD estimator.
5.2 Fixed-parameter accuracy
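For reference, the resubstitution baseline mentioned above can be sketched in Python as follows. The sketch assumes that plugging in sample frequencies means replacing c by n _{0}/n, p _{ i } by U _{ i }/n _{0}, and q _{ i } by V _{ i }/n _{1}, so that the common factor 1/n cancels in the ratio of the two error estimates. The function name is hypothetical.

```python
def resubstitution_cod(U, V):
    """Plug-in CoD estimate from cell counts U (Y=0) and V (Y=1).

    With sample frequencies, eps0_hat = min(n0, n1)/n and
    eps_hat = sum_i min(U_i, V_i)/n, so the factor 1/n cancels in the ratio.
    """
    n0, n1 = sum(U), sum(V)
    errors0 = min(n0, n1)                           # errors of best constant predictor
    errors = sum(min(u, v) for u, v in zip(U, V))   # errors of best cell-wise predictor
    return 0.0 if errors0 == 0 else 1.0 - errors / errors0

print(resubstitution_cod([3, 1], [1, 3]))   # 2 errors versus 4: CoD estimate 0.5
```

Because the predictor is fitted and evaluated on the same counts, this estimate tends to be optimistic at small sample sizes.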
In this section, we study the average accuracy of the two proposed Bayesian CoD estimators for a fixed parameter, that is, we evaluate the Bayesian estimators from a purely frequentist perspective.
Table 1 True distributions and non-flat prior base measures for fixed-parameter experiments. In all cases, c ^{∗}=c _{0}=0.5, and q ^{∗} and q _{0} are obtained from p ^{∗} and p _{0}, respectively, by flipping left to right (see text).
 | True distribution | Base measure 1 (matched prior) | Base measure 2 (poorly matched prior) | Base measure 3 (mismatched prior)
d=1 | p ^{∗} = (0.6,0.4) | \({\mathbf p}_{0}^{1} = (0.6,0.4)\) | \({\mathbf p}_{0}^{2} = (0.5,0.5)\) | \({\mathbf p}_{0}^{3} = (0.4,0.6)\)
d=2 | p ^{∗} = (0.2,0.3,0.1,0.4) | \({\mathbf p}_{0}^{1} = (0.2,0.3,0.1,0.4)\) | \({\mathbf p}_{0}^{2} = (0.3,0.2,0.2,0.3)\) | \({\mathbf p}_{0}^{3} = (0.4,0.1,0.3,0.2)\)
d=3 | p ^{∗} = (0.1,0.15,0.05,0.2,0.15,0.1,0.1,0.15) | \({\mathbf p}_{0}^{1} = (0.1,0.15,0.05,0.2,0.15,0.1,0.1,0.15)\) | \({\mathbf p}_{0}^{2} = (0.15,0.1,0.1,0.15,0.1,0.05,0.15,0.2)\) | \({\mathbf p}_{0}^{3} = (0.2,0.05,0.15,0.1,0.05,0.2,0.2,0.05)\)
Table 1 gives the values of the parameters used in the experiments. In all cases, the true value and base measure for c are the same, c ^{∗}=c _{0}=0.5. In addition, in each case, the true value q ^{∗} and base measure q _{0} are obtained from p ^{∗} and p _{0}, respectively, by flipping the corresponding vector left to right; for example, when p _{0}=(0.2,0.1,0.3,0.4) then q _{0}=(0.4,0.3,0.1,0.2). Therefore, only the values for p ^{∗} and p _{0} are shown in Table 1.
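The construction of q _{0} from p _{0}, and of Dirichlet hyperparameters from a base measure, can be sketched as follows. The helper names are hypothetical, and the concentration value 8.0 is an arbitrary illustration rather than a setting taken from the experiments; the relation α _{ i } = Δ _{ p } p _{0,i} follows from the definition of the base measure in Section 3.

```python
def flip(v):
    """Flip a probability vector left to right."""
    return list(reversed(v))

def dirichlet_hyperparameters(base_measure, concentration):
    """Hyperparameters whose base measure and concentration are the given ones."""
    return [concentration * b for b in base_measure]

p0 = [0.2, 0.1, 0.3, 0.4]
q0 = flip(p0)                                # [0.4, 0.3, 0.1, 0.2]
alphas = dirichlet_hyperparameters(p0, 8.0)  # hypothetical concentration of 8
betas = dirichlet_hyperparameters(q0, 8.0)
print(q0, alphas, betas)
```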
We can observe that, as expected, both Bayesian CoD estimators perform better when the prior is matched to the true value of the parameters than when the match is poor or nonexistent. In addition, for the matched prior, accuracy improves substantially as one moves from a diffuse (high-variance) to a peaked (low-variance) prior. This effect is especially visible in the case of the OBP CoD estimator. For example, with d=1 the RMS is reduced by nearly 80 % between the high-variance and low-variance matched priors. In fact, the OBP CoD estimator is more accurate than the MMSE CoD estimator for peaked priors, while the opposite is true under diffuse priors. Both Bayesian CoD estimators outperform the nonparametric ones in the cases d=1 and d=3, whereas, in the d=2 case, the Bayesian estimators based on mismatched or poorly matched priors can perform worse than the nonparametric estimators at larger sample sizes. It is also observed that, as the variance of the priors decreases (i.e., for a larger Δ value), the performance of both Bayesian estimators improves relative to the nonparametric ones. Moreover, under matched priors, the Bayesian MMSE CoD estimator performs better than the OBP CoD estimator for a high-variance prior, while the OBP CoD estimator beats the Bayesian MMSE CoD estimator for medium- and low-variance priors. This indicates that the OBP CoD estimator is preferable when low-variance priors are available, due to its straightforward representation and superior performance in that setting. Notice that the Bayesian MMSE CoD estimator has the least RMS only when averaged over all distributions and all possible samples; its optimality does not apply to settings with a fixed distribution. In addition, we observe that the Bayesian MMSE CoD estimator has lower variance than the OBP CoD estimator, and that the Bayesian CoD estimators based on informative priors have lower variance than those based on non-informative uniform priors.
In the d=1 and d=3 cases, the OBP CoD estimator with uniform priors has an even larger variance than the cross-validation estimator at larger sample sizes. In addition, the OBP CoD estimator is less biased in magnitude than the MMSE estimator for low-variance matched priors. However, as the variance of the priors increases, the Bayesian MMSE CoD estimator turns out to have less bias than the OBP estimator.
6 Gene regulatory network inference: a melanoma example
We discuss in this section the application of the Bayesian CoD estimation approach discussed previously to the inference of gene regulatory networks. We apply the proposed inference procedure to data collected in a study of metastatic melanoma [24], containing 31 sample expression profiles that have been binarized, with 0 indicating no significant expression and 1 indicating significant expression (either over- or under-expression). It was found in [24] that the WNT5A gene is a major driver of processes that lead to metastatic melanoma. We derive the logic relationships and wiring of a 7-gene WNT5A network consisting of genes selected using data analysis and prior biological knowledge: WNT5A, pirin, S100P, RET1, MART1, HADHB, and STC2; for more details about the selection of these genes, see [25, 26].
The target is modeled as \(Y = f(\mathbf {X}) \oplus N\), where f:{0,1}^{ d }→{0,1} is a Boolean function, the symbol “⊕” indicates modulo-2 addition, and N∈{0,1} is a Bernoulli noise random variable, independent of X, such that P(N=0)=p. The modulo-2 addition behaves as an XOR operation, which flips the state of the target Y when N=1, and leaves it unaltered when N=0. Hence, p quantifies the predictive power of the model: if p=1, the system is noiseless and prediction is deterministic, while if p<1, there is a degree of indeterminacy in the state of the target given the state of the predictors. This model is studied in detail in [15], where an inference procedure, based on a maximum-likelihood CoD estimator, is proposed to select the unknown Boolean function f, assuming that f is a member of a candidate model set F containing Boolean functions that depend on the same number k of essential variables. Each f in F is specified by (1) a Boolean function g:{0,1}^{ k }→{0,1} and (2) the indices for the predicting variable set {i _{1},…,i _{ k }}⊂{1,…,d}, or wiring, such that \(f(\mathbf {X}) = g\left (X_{i_{1}},\ldots,X_{i_{k}}\right)\). If the candidate Boolean functions g belong to a model set G, then the total number of possible models is \(|G| \times \binom {d}{k}\).
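The noisy Boolean model described here can be simulated with a short Python sketch. It assumes, purely for illustration, that the predictor bits are drawn independently and uniformly; the function names and the example function and wiring are hypothetical.

```python
import random

def simulate_pair(f, d, p, rng):
    """Draw (x, y) with y = f(x) XOR N, where P(N = 0) = p.

    Assumption for illustration: predictor bits are i.i.d. uniform.
    """
    x = tuple(rng.randint(0, 1) for _ in range(d))
    noise = 0 if rng.random() < p else 1
    return x, f(x) ^ noise

# Hypothetical example: g is AND, wired to predictors 0 and 2 out of d = 3
def f(x):
    return x[0] & x[2]

rng = random.Random(0)
sample = [simulate_pair(f, 3, 0.9, rng) for _ in range(31)]
agreement = sum(y == f(x) for x, y in sample) / len(sample)
print(agreement)   # fraction of points where the target follows f; fluctuates around p
```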
Here, we modify the network inference in [15] to allow the use of the Bayesian CoD estimators described previously. For a given target Y and predictor set X, we assume Dirichlet prior distributions as in (5). Instead of adopting a noninformative choice of hyperparameters, we employ an “empirical Bayes” approach, where the hyperparameters are estimated in part from the sample data, as described next.
Recall from Section 3 that the shape of the Dirichlet prior distribution is determined by the hyperparameters through a location parameter, called the base measure, and a concentration parameter. Our strategy is to set up the estimates in (34) as the base measure, so that the Dirichlet priors are concentrated around them, to a degree specified by the concentration parameter. Formally, the hyperparameters are set to: \(\{\alpha _{1}, \dots, \alpha _{2^{d}}\} = \{ \lceil \hat {p}_{1} \Delta \rceil, \ldots, \lceil \hat {p}_{2^{d}} \Delta \rceil \}\), \(\{\beta _{1}, \dots, \beta _{2^{d}}\} = \{ \lceil \hat {q}_{1} \Delta \rceil, \ldots, \lceil \hat {q}_{2^{d}} \Delta \rceil \}\), \(\alpha = \lceil \hat {c} \Delta \rceil \) and \(\beta = \lceil (1-\hat {c}) \Delta \rceil \), where ⌈x⌉ gives the smallest integer greater than or equal to x. The value of Δ is tuned by the experimenter, either manually or using a data-driven procedure.
We are now ready to state the procedure to select a function f in F, consisting of a k-predictor Boolean function g and its wiring.
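The hyperparameter assignment can be sketched in Python as follows. The estimates ĉ, p̂ _{ i }, and q̂ _{ i } are taken as given inputs here, since their computation is not reproduced in this sketch, and the function name is hypothetical.

```python
import math

def empirical_bayes_hyperparameters(c_hat, p_hat, q_hat, concentration):
    """Ceil-rounded hyperparameters centred on the given estimates."""
    alphas = [math.ceil(pi * concentration) for pi in p_hat]
    betas = [math.ceil(qi * concentration) for qi in q_hat]
    alpha = math.ceil(c_hat * concentration)
    beta = math.ceil((1.0 - c_hat) * concentration)
    return alpha, beta, alphas, betas

print(empirical_bayes_hyperparameters(0.25, [0.5, 0.5], [0.1, 0.9], 4))
```

One point to keep in mind with this rounding is that an estimate exactly equal to zero yields a zero hyperparameter, whereas the Beta and Dirichlet priors require strictly positive values, so the estimates fed into this step should be strictly positive.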
6.1 Bayesian model selection procedure
1. For each of the Boolean functions g∈G, compute the prior hyperparameters as described earlier. Obtain the Bayesian MMSE CoD or OBP CoD estimate under each of the \(\binom {d}{k}\) possible wirings. Pick the wiring for g that produces the largest CoD estimate. Ties, if any, are broken randomly.
2. Among the |G| pairs of Boolean function g and wiring obtained in the previous step, select the one that produces the largest predictive power estimate \(\hat {p}\). Ties, if any, are broken randomly.
In our experiment with the 7-gene WNT5A network, we consider in turn each gene as a target and the remaining six genes as predictors (so that a gene cannot be a predictor of itself). Hence, d=6. In addition, we assume that each gene is predicted by three genes out of the six predictors. Therefore, k=3 and there are \(\binom {6}{3}=20\) possible wirings for each target gene. The set G contains all 218 Boolean functions of exactly three essential variables (this is less than the full set of \(2^{2^{3}} = 256\) 3-input Boolean functions since those that are reducible to 0-, 1-, and 2-input logics are not considered). We set Δ=1.0 and apply the proposed Bayesian model selection procedure to infer a gene regulatory network for the MMSE and OBP CoD estimators. We also obtain the gene regulatory network produced by employing the standard model selection procedure, which picks the predictor set (among all \(\binom {6}{3} = 20\) choices, in this case) with the largest estimated resubstitution CoD [25].
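The two counts quoted above can be checked by brute-force enumeration. The following Python sketch lists all wirings of three predictors out of six and counts the 3-input truth tables in which every input is essential, i.e., flipping that input changes the output for at least one input vector. The function names are hypothetical.

```python
from itertools import combinations, product

def num_essential(table, k):
    """Number of inputs whose flip changes the output for some input vector."""
    inputs = list(product((0, 1), repeat=k))
    index = {x: i for i, x in enumerate(inputs)}
    count = 0
    for v in range(k):
        for x in inputs:
            flipped = x[:v] + (1 - x[v],) + x[v + 1:]
            if table[index[x]] != table[index[flipped]]:
                count += 1
                break
    return count

def count_fully_dependent(k):
    """Count k-input Boolean functions with exactly k essential variables."""
    return sum(1 for table in product((0, 1), repeat=2 ** k)
               if num_essential(table, k) == k)

wirings = list(combinations(range(6), 3))
print(len(wirings), count_fully_dependent(3))   # prints: 20 218
```

The 38 excluded truth tables are the 2 constants, the 6 functions of a single input, and the 30 functions of exactly two inputs.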
7 Conclusions
We introduced a Bayesian framework for the estimation of the CoD in a discrete prediction setting and analyzed the accuracy of the proposed Bayesian MMSE and OBP CoD estimators based on fixed and random parameters, using analytical and simulation methods. We also compared the accuracy of the two Bayesian CoD estimators against that of several classical CoD estimators, based on resubstitution, leave-one-out, bootstrap, and cross-validation prediction error estimation. Our results indicated that the Bayesian MMSE CoD estimator has the best performance, with zero bias and least RMS, when averaged over all distributions and sample data, whereas, for fixed distributions, priors with higher densities around the fixed distributions yield better accuracy, with smaller RMS. It is also interesting to see that the OBP CoD estimator, which is very simple to compute, can beat the Bayesian MMSE CoD estimator when using low-variance priors with higher densities around the parameters of the fixed distributions. Furthermore, we proposed an approach for inference of gene regulatory networks based on the proposed Bayesian CoD estimators, and applied it to the inference of a 7-gene regulatory network using melanoma data. We observed that the inferred Boolean functions and wirings were similar for both Bayesian CoD estimators. Interestingly, the network inferred with the OBP CoD estimator was very close to the network obtained with the standard inference method based on the resubstitution CoD estimator; however, the magnitudes of the CoDs were larger in the latter case, which is consistent with the fact that resubstitution tends to be optimistic. We hope that this paper will provide a theoretical foundation for further work on Bayesian estimation methodologies for the inference of gene regulatory networks.
The issue of obtaining informative priors based on established biological knowledge about regulatory relationships, which was not addressed in detail here, is one that deserves careful consideration in future work on this topic.
8 Endnote
^{1} A proof of this fact is given in the Appendix of [14].
9 Appendix A: the beta distribution
Clearly, B(a,b)=B(b,a).
For example, E[X]=B(a+1,b)/B(a,b)=a/(a+b) (the second equality can be proved using the definition of the Beta function in terms of the Gamma function and the properties of the latter [28]). Similarly, E[1/X]=B(a−1,b)/B(a,b)=(a+b−1)/(a−1), provided that a>1.
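Moments of this kind can be checked numerically with the general identity E[X ^{ k }(1−X)^{ l }]=B(a+k,b+l)/B(a,b) for X∼Beta(a,b), which holds whenever a+k>0 and b+l>0. The Python sketch below evaluates it through the log-Gamma function; the function names are hypothetical.

```python
import math

def log_beta(a, b):
    """log B(a, b) computed from log-Gamma values."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_moment(a, b, k, l):
    """E[X^k (1-X)^l] for X ~ Beta(a, b); requires a + k > 0 and b + l > 0."""
    return math.exp(log_beta(a + k, b + l) - log_beta(a, b))

a, b = 3.0, 2.0
print(beta_moment(a, b, 1, 0), a / (a + b))             # E[X]
print(beta_moment(a, b, -1, 0), (a + b - 1) / (a - 1))  # E[1/X], needs a > 1
print(beta_moment(a, b, 0, -1), (a + b - 1) / (b - 1))  # E[1/(1-X)], needs b > 1
```

The inverse moments diverge when the corresponding parameter is at most 1, which is why a uniform prior on c is excluded whenever such moments are required.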
Notice that B(a,b)=IB(1;a,b).
which follows easily from the definitions of the Beta density and the incomplete Beta function, and the fact that X∈[ 0,1]. In particular, if x≥1, then E[X ^{ k }(1−X)^{ l } I _{ X≤x }]=E[X ^{ k }(1−X)^{ l }].
Clearly, all the previous quantities can be computed in terms of the incomplete beta function, an expression of which is given by the next result.
Theorem 1.
otherwise.
Proof.

(i) \(\left | (-1)^{i} \binom {b-1}{i} x^{a+i-1} \right | \leq g_{i}(x)\), for all i and almost all x;

(ii) \(\sum _{i=0}^{\infty } g_{i}(x)\) converges for almost all x;

(iii) \(\sum _{i=0}^{\infty } {\int _{0}^{1}} g_{i}(x) dx < \infty \).
Let \( g_{i}(x) = \left | \binom {b-1}{i} \right | x^{a+i-1}, i = 0, 1, \ldots \); condition (i) is then obviously satisfied.
\(\sum _{i=0}^{\infty } {\int _{0}^{1}} g_{i}(x) dx\) converges by Raabe’s test [29].
Notice that \(B(a,b) = \sum _{i=0}^{P} r_{i}(a,b)\). Note also that the general case reduces to the special case if b is an integer. An equivalent expression can be derived where a appears in the binomial coefficient instead, which can then be used if a is an integer. If neither a nor b is an integer, an approximation can be obtained by truncating the resulting infinite series, or by using a numerical software package.
If both a and b are integers, then IB(x;a,b) reduces to a polynomial in x. Otherwise, it is a simple matter to replace the finite summations by infinite series as specified in Theorem 1.
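For an integer b, the finite-sum form of the incomplete Beta function can be sketched in Python. The sketch assumes the unnormalized definition, i.e., IB(x;a,b) is the integral of t^{ a−1}(1−t)^{ b−1} from 0 to x, and expands (1−t)^{ b−1} with the binomial theorem before integrating term by term. The function name is hypothetical.

```python
import math

def incomplete_beta_int_b(x, a, b):
    """Unnormalized incomplete Beta function IB(x; a, b) for a > 0 and integer b >= 1.

    Uses IB(x; a, b) = sum_{i=0}^{b-1} (-1)^i * C(b-1, i) * x^(a+i) / (a+i).
    """
    x = min(max(x, 0.0), 1.0)   # the integrand lives on [0, 1]
    return sum((-1) ** i * math.comb(b - 1, i) * x ** (a + i) / (a + i)
               for i in range(b))

print(incomplete_beta_int_b(1.0, 2, 3))   # equals B(2, 3) = 1/12
print(incomplete_beta_int_b(0.5, 1, 1))   # integral of 1 over [0, 0.5]
```

The alternating signs can cause cancellation for large b, in which case a library routine for the regularized incomplete Beta function is likely a safer choice.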
10 Appendix B: the Dirichlet distribution
From the previous equations, one can see that, as Δ approaches infinity, variances converge to zero and X becomes equal to the base measure with probability 1; in addition, covariances also go to zero, rendering the components of X uncorrelated. The special case a _{ i }=1, for all i=1,…,K, corresponds to a uniform distribution over S _{ K−1}. This corresponds to a uniform base measure and concentration parameter Δ=K. If the base measure is not uniform but Δ=K, the distribution is approximately uniform. For Δ approaching zero, the distribution becomes concentrated at the boundary of the simplex.
Summing up, large Δ implies large probability density around the base measure, Δ=K implies a nearly uniform distribution, whereas Δ close to zero produces sparse sample vectors with most of the components close to zero.
The Dirichlet distribution is the multivariate generalization of the Beta distribution, in the sense that the components of a Dirichlet-distributed vector X=(X _{1},…,X _{ K }) are Beta distributed: X _{ i }∼Beta(a _{ i },Δ−a _{ i }), for i=1,…,K. Notice that in the case K=2 the Dirichlet distribution essentially reduces to the Beta distribution.
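The effect of the concentration parameter on the spread can be illustrated with the standard Dirichlet moment formulas, which follow from the Beta(a _{ i },Δ−a _{ i }) marginals: E[X _{ i }]=a _{ i }/Δ and Var(X _{ i })=a _{ i }(Δ−a _{ i })/(Δ^{2}(Δ+1)). The Python sketch below uses hypothetical function names.

```python
def dirichlet_mean_var(a):
    """Component-wise mean and variance of a Dirichlet(a) vector."""
    delta = sum(a)                                   # concentration parameter
    means = [ai / delta for ai in a]                 # base measure
    variances = [ai * (delta - ai) / (delta ** 2 * (delta + 1)) for ai in a]
    return means, variances

base = [0.2, 0.3, 0.1, 0.4]
for concentration in (4, 40, 400):
    means, variances = dirichlet_mean_var([concentration * b for b in base])
    print(concentration, [round(v, 5) for v in variances])
```

The means stay at the base measure while each variance shrinks roughly in proportion to 1/Δ as the concentration grows.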
Declarations
Acknowledgements
The authors acknowledge the support of the National Science Foundation, through NSF award CCF-1320884.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
 S Kauffman, Metabolic stability and epigenesis in randomly constructed genetic nets. J. Theor. Biol. 22(3), 437–467 (1969).
 S Kauffman, The Origins of Order: Self-Organization and Selection in Evolution (Oxford University Press, New York, NY, 1993).
 S Bornholdt, Boolean network models of cellular regulation: prospects and limitations. J. R. Soc. Interface. 5(1), S85–S94 (2008).
 R Albert, H Othmer, The topology of the regulatory interactions predicts the expression pattern of the segment polarity genes in Drosophila melanogaster. J. Theor. Biol. 223(1), 1–18 (2003).
 F Li, Y Lu, T Long, Q Ouyang, C Tang, The yeast cell-cycle network is robustly designed. Proc. Natl. Acad. Sci. U.S.A. 101(14), 4781–4786 (2004).
 A Faure, A Naldi, C Chaouiya, D Thieffry, Dynamical analysis of a generic Boolean model for the control of the mammalian cell cycle. Bioinformatics. 22(14), 124–131 (2006).
 ER Dougherty, S Kim, Y Chen, Coefficient of determination in nonlinear signal processing. EURASIP J. Signal Process. 80(10), 2219–2235 (2000).
 S Kim, ER Dougherty, Y Chen, K Sivakumar, P Meltzer, JM Trent, M Bittner, Multivariate measurement of gene expression relationships. Genom. 67(2), 201–209 (2000).
 X Zhou, X Wang, ER Dougherty, Binarization of microarray data based on a mixture model. Mol. Cancer Ther. 2(7), 679–684 (2003).
 S Kim, ER Dougherty, ML Bittner, Y Chen, K Sivakumar, P Meltzer, JM Trent, General nonlinear framework for the analysis of gene interaction via multivariate expression arrays. J. Biomed. Opt. 5(4), 411–424 (2000).
 I Shmulevich, ER Dougherty, S Kim, W Zhang, Probabilistic Boolean networks: a rule-based uncertainty model for gene regulatory networks. Bioinforma. 18(2), 261–274 (2002).
 D Martins, U Braga-Neto, R Hashimoto, M Bittner, ER Dougherty, Intrinsically multivariate predictive genes. IEEE J. Sel. Top. Sign. Proces. 2(3), 424–439 (2008).
 T Chen, UM Braga-Neto, Statistical detection of intrinsically multivariate predictive genes. IEEE/ACM Trans. Comput. Biol. Bioinform. 12(4), 951–964 (2015).
 T Chen, UM Braga-Neto, Exact performance of CoD estimators in discrete prediction. EURASIP J. Adv. Signal Process. (2010). Article ID 2010:487893.
 T Chen, UM Braga-Neto, Maximum-likelihood estimation of the discrete coefficient of determination in stochastic Boolean systems. IEEE Trans. Signal Process. 61(15), 3880–3894 (2013).
 T Chen, UM Braga-Neto, Statistical detection of Boolean regulatory relationships. IEEE/ACM Trans. Comput. Biol. Bioinform. 10(5), 1310–1321 (2013).
 LA Dalton, ER Dougherty, Bayesian minimum mean-square error estimation for classification error – Part I: Definition and the Bayesian MMSE error estimator for discrete classification. IEEE Trans. Signal Process. 59(1), 115–129 (2011).
 LA Dalton, ER Dougherty, Bayesian minimum mean-square error estimation for classification error – Part II: Linear classification of Gaussian models. IEEE Trans. Signal Process. 59(1), 130–144 (2011).
 T Chen, UM Braga-Neto, in Proceedings of the 2013 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS’2013). Optimal Bayesian MMSE estimation of the coefficient of determination for discrete prediction (Houston, TX, Nov 2013), pp. 66–69.
 LA Dalton, ER Dougherty, Optimal classifiers with minimum expected error within a Bayesian framework – Part I: Discrete and Gaussian models. Pattern Recogn. 46(5), 1301–1314 (2013).
 LA Dalton, ER Dougherty, Optimal classifiers with minimum expected error within a Bayesian framework – Part II: Properties and performance analysis. Pattern Recogn. 46(5), 1288–1300 (2013).
 L Devroye, L Gyorfi, G Lugosi, A Probabilistic Theory of Pattern Recognition (Springer, New York, 1996).
 G Casella, R Berger, Statistical Inference, 2nd ed (Duxbury, Pacific Grove, CA, 2002).
 M Bittner, P Meltzer, Y Chen, Y Jiang, E Seftor, M Hendrix, M Radmacher, R Simon, Z Yakhini, A Ben-Dor, N Sampas, ER Dougherty, F Marincola, E Wang, C Gooden, J Lueders, A Glatfelter, P Pollock, J Carpten, E Gillanders, D Leja, K Dietrich, C Beaudry, M Berens, D Alberts, V Sondak, N Hayward, J Trent, Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature. 406, 536–540 (2000).
 S Kim, ER Dougherty, N Cao, Y Chen, M Bittner, E Suh, Can Markov chain models mimic biological regulation? J. Biol. Syst. 10, 437–458 (2002).
 A Datta, A Choudhary, M Bittner, ER Dougherty, External control in Markovian genetic regulatory networks. Mach. Learn. 52, 169–191 (2003).
 UM Braga-Neto, ER Dougherty, Error Estimation for Pattern Recognition (Wiley, New York, 2015).
 S Ross, A First Course in Probability, 4th ed (Macmillan, New York, 1994).
 G Arfken, Mathematical Methods for Physicists, 3rd ed (Academic Press, Orlando, FL, 1985).
 N Balakrishnan, V Nevzorov, A Primer on Statistical Distributions (Wiley, Hoboken, NJ, 2003).