Gene Regulatory Network Reconstruction Using Conditional Mutual Information

EURASIP Journal on Bioinformatics and Systems Biology 2008, 2008:253894

DOI: 10.1155/2008/253894

Received: 30 December 2007

Accepted: 22 May 2008

Published: 4 June 2008

Abstract

The inference of gene regulatory networks from expression data is an important area of research that provides insight into the inner workings of a biological system. Relevance-network-based approaches provide a simple and easily scalable way to understand the interactions between genes. Until now, most work based on relevance networks has focused on the discovery of direct regulation using the correlation coefficient or mutual information. However, some of the more complicated interactions, such as interactive regulation and coregulation, are not easily detected. In this work, we propose a relevance network model for gene regulatory network inference which employs both mutual information and conditional mutual information to determine the interactions between genes. For this purpose, we propose a conditional mutual information estimator based on adaptive partitioning which allows us to condition on both discrete and continuous random variables. We provide experimental results demonstrating that the proposed regulatory network inference algorithm performs better when the target network contains coregulated and interactively regulated genes.

1. Introduction

The prediction of gene functions and the elucidation of gene regulatory mechanisms have been important topics of genomic research. The advances in microarray technology over the past decade have provided a wealth of information by allowing us to observe the expression levels of thousands of genes at once. With the increasing availability of gene expression data, the development of tools that can more accurately predict gene-to-gene interactions and uncover more complex interactions between genes has become an intense area of research.

1.1. Background

Gene Clustering Algorithms

Some of the first attempts at determining gene regulation were based on gene expression clustering algorithms. These algorithms identify genes that are likely to be coregulated by grouping genes that exhibit similar expression under the same conditions. Clustering algorithms differ in the metric used to measure similarity between gene expressions, and in how the metric is used to group similarly expressed genes [1]. In [2], a hierarchical clustering algorithm using a correlation coefficient metric is proposed. The K-means algorithm has also been applied to partition genes into different clusters [3]. Other clustering algorithms such as the self-organizing map (SOM) [4], mutual-information-based algorithms [5, 6], and graph-theory-based algorithms [7] have also been proposed.

Graphical Algorithms

While gene clustering algorithms allow us to discover genes that are coregulated, they do not reveal much of the underlying biological mechanism such as the regulatory pathways. In recent years, many models have been proposed attempting to understand how individual genes interact with each other to govern the diverse biological processes in the cell. In [8–10], gene regulatory network inference based on graphical models is proposed. A graphical model depicts the relationships among nodes in a graph which are considered as random variables. Links between nodes represent dependence of the two variables. For network inference based on the graphical Gaussian model [11, 12], the nodes with corresponding random variables https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq1_HTML.gif are assumed to be jointly distributed according to the multivariate Gaussian distribution https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq2_HTML.gif , with mean vector https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq3_HTML.gif and covariance matrix https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq4_HTML.gif . In [13], the gene-to-gene interaction is predicted from expression data using Bayesian networks, another type of graphical model. The dependence relationship between the variables is denoted by a directed acyclic graph where the nodes are associated with the variables https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq5_HTML.gif , https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq6_HTML.gif , and the nodes are linked if a dependent relationship exists between the two corresponding variables. 
Given a set of expression values https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq7_HTML.gif , the algorithm selects the graph https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq8_HTML.gif that best describes https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq9_HTML.gif by choosing the graph that maximizes a scoring function based on the Bayes' rule https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq10_HTML.gif . In [14], gene regulatory network reconstruction based on the dynamic Bayesian network is proposed to support cycles in the network, and time-series data in https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq11_HTML.gif .

Relevance Network Algorithms

Another method related to graphical models is the relevance network. Relevance networks are based on the idea of a "covariance graph" where a link exists between genes https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq12_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq13_HTML.gif , if and only if the corresponding gene expressions of https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq14_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq15_HTML.gif are marginally dependent [15]. Different measures of dependence have been used in relevance-network-based algorithms. In [16], the correlation coefficient is used to represent the dependence between two genes, and in both [16, 17], mutual information is used to measure the nonlinear relationship between the expressions of two genes. Since these metrics are computed from a finite number of samples, a threshold is often imposed so that two nodes are connected if the computed metric between the two nodes is above the threshold. In [17], entropy and joint entropy are first computed based on the histogram, then the mutual information of https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq16_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq17_HTML.gif is computed by https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq18_HTML.gif . 
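The histogram-based estimate and the thresholding step described above can be sketched as follows. This is only an illustration under stated assumptions: the bin count, the threshold value, the gene names `g1` to `g3`, and the synthetic data are all hypothetical, and the estimate uses the standard identity I(X;Y) = H(X) + H(Y) - H(X,Y) with equal-width bins rather than the exact procedure of [17].

```python
import math
import random
from collections import Counter

def entropy_bits(counts, n):
    """Shannon entropy in bits of the empirical distribution given by bin counts."""
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def discretize(values, bins):
    """Assign each value to one of `bins` equal-width bins over the sample range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant sample
    return [min(int((v - lo) / width), bins - 1) for v in values]

def histogram_mi(x, y, bins=8):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y) from a 2-D histogram."""
    n = len(x)
    bx, by = discretize(x, bins), discretize(y, bins)
    return (entropy_bits(Counter(bx), n) + entropy_bits(Counter(by), n)
            - entropy_bits(Counter(zip(bx, by)), n))

# Synthetic expressions (assumed for illustration): g2 depends nonlinearly
# on g1, while g3 is independent of both.
random.seed(0)
n = 2000
g1 = [random.gauss(0, 1) for _ in range(n)]
g2 = [v * v + 0.1 * random.gauss(0, 1) for v in g1]
g3 = [random.gauss(0, 1) for _ in range(n)]
expr = {"g1": g1, "g2": g2, "g3": g3}

threshold = 0.2  # illustrative cut-off in bits, not a recommended value
names = sorted(expr)
links = {(a, b) for i, a in enumerate(names) for b in names[i + 1:]
         if histogram_mi(expr[a], expr[b]) > threshold}
```

In this toy setting only the pair (`g1`, `g2`) is linked, even though the dependence is quadratic and would be largely invisible to a linear correlation coefficient.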
In [18], the proposed ARACNE algorithm uses the Gaussian kernel estimator to estimate the mutual information between the expressions https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq19_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq20_HTML.gif of genes https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq21_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq22_HTML.gif . Before estimating https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq23_HTML.gif from the observed expressions https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq24_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq25_HTML.gif using the Gaussian kernel estimator, https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq26_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq27_HTML.gif are copula-transformed to take values between 0 and 1. This step is performed so that the expression data are transformed to uniform distribution, and arbitrary artifacts from microarray processing are removed. 
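A common way to realize such a transform is to replace each sample by its normalized rank. The sketch below assumes this rank-based form; it is not taken from [18], and the function name `copula_transform` is hypothetical.

```python
import math

def copula_transform(values):
    """Map each sample to rank / n, giving values in (0, 1].

    Ties are broken by original order (an arbitrary choice made here).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / n
    return u

x = [3.2, -1.0, 10.0, 0.5]
u = copula_transform(x)

# A monotone rescaling of the data (here exp) leaves the ranks unchanged.
u_rescaled = copula_transform([math.exp(v) for v in x])
```

Because only ranks are used, any strictly increasing distortion of the measurements produces the same transformed values, which is the sense in which arbitrary scaling artifacts are removed.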
In gene regulatory networks, if gene https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq28_HTML.gif regulates https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq29_HTML.gif , which in turn regulates https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq30_HTML.gif , then https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq31_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq32_HTML.gif will also be highly correlated. Using methods based on relevance network, a link will often be incorrectly inferred between https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq33_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq34_HTML.gif due to the high correlation measures. In [18], ARACNE tries to resolve this problem by using the data processing inequality (DPI). From DPI, if https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq35_HTML.gif , and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq36_HTML.gif form a Markov chain (denoted as https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq37_HTML.gif ), then https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq38_HTML.gif [19]. For a triplet of genes where the estimated mutual information of all three pairs of genes exceed the threshold, the link with the lowest mutual information is removed by ARACNE in the DPI step.
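The DPI pruning step described above can be sketched as follows. This is a simplified reading of the description, not ARACNE's actual implementation: the function name `dpi_prune`, the gene labels, the mutual information values, and the threshold are all assumed for illustration.

```python
from itertools import combinations

def dpi_prune(mi, threshold):
    """Threshold pairwise MI values, then drop the weakest edge of every
    fully connected triplet.

    mi: dict mapping frozenset({a, b}) to an estimated mutual information.
    """
    edges = {e for e, v in mi.items() if v > threshold}
    nodes = sorted({g for e in edges for g in e})
    to_remove = set()
    for a, b, c in combinations(nodes, 3):
        triangle = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(e in edges for e in triangle):
            to_remove.add(min(triangle, key=lambda e: mi[e]))
    return edges - to_remove

# Hypothetical estimates for a chain X -> Y -> Z: the indirect pair (X, Z)
# has the lowest mutual information of the three.
mi = {
    frozenset(("X", "Y")): 0.9,
    frozenset(("Y", "Z")): 0.8,
    frozenset(("X", "Z")): 0.5,
}
pruned = dpi_prune(mi, threshold=0.1)
```

With these values the indirect link between `X` and `Z` is removed and the two direct links are kept.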

While relevance-network-based methods such as ARACNE perform well when the interactions in the gene regulatory network are between pairs of genes, they are unable to completely discover interactions that are results of the joint regulation of the target gene by two or more genes. The XOR interactive regulation is one such interaction that can be recognized only by exploiting the conditional dependence between variables of interest. Using conditional mutual information (CMI), it is possible to detect the XOR and other nonlinear interactive regulation by two genes.
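The XOR case can be checked exactly on an idealized model. The sketch below assumes two independent, uniformly distributed binary regulators, labeled `x` and `y` here, and a noiseless target `z = x XOR y`; these labels and the noiseless setting are assumptions made for illustration.

```python
import math
from collections import defaultdict
from itertools import product

def H(joint, idx):
    """Joint entropy in bits of the variables at positions idx."""
    marg = defaultdict(float)
    for outcome, p in joint.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

def mi(joint, a, b):
    """I(A;B) = H(A) + H(B) - H(A,B)."""
    return H(joint, a) + H(joint, b) - H(joint, a + b)

def cmi(joint, a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return H(joint, a + c) + H(joint, b + c) - H(joint, a + b + c) - H(joint, c)

# Outcomes are (x, y, z) with z = x XOR y and (x, y) uniform on {0,1}^2.
joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

i_xz = mi(joint, (0,), (2,))                  # pairwise: regulator vs target
i_xz_given_y = cmi(joint, (0,), (2,), (1,))   # same pair, conditioned on y
```

Under these assumptions `i_xz` is 0 bits while `i_xz_given_y` is 1 bit, so the regulation is invisible to pairwise mutual information and fully visible once the second regulator is conditioned on.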

Several recent works have attempted to incorporate information-theoretic measures for more than two variables in regulatory network discovery. In [20], a CMI measure where the conditioning variable takes discrete values in two states (high and low) is proposed to discover the transcriptional interactions in human B lymphocytes. In [21, 22], methods based on both MI and CMI have also been proposed to decrease the false positive rate for the detection of the interactions. In [23], the conditional coexpression model is introduced, and the CMI is used as a measure of conditional coexpression. In [24], an extension of the context likelihood of relatedness (CLR) algorithm [25], called "synergy augmented CLR", is proposed. The technique uses the recently developed information-theoretic concept of synergy [26] to define a numerical score for a transcriptional interaction by identifying the most synergistic partner gene for the interaction. In this work, we propose a relevance-network-based gene regulatory network inference algorithm similar to [24], using information-theoretic measures to determine the relationship between triplets of genes.

1.2. Objective

Here, we make use of both mutual information and conditional mutual information as measures of dependence between gene expressions. The main focus of this work is to discover the potential interactions between genes by adapting the relevance network model, which is also used in [17, 18]. The inference of the connectivity, or the "wiring" of the network, is also an important aspect of biological network inference. The proposed network inference algorithm uses an adaptive partitioning scheme to estimate the mutual information between https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq39_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq40_HTML.gif conditioned on https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq41_HTML.gif , where https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq42_HTML.gif can be either discrete or continuous. We show that using both mutual information and conditional mutual information allows us to more accurately detect correlations due to interactive regulation and other complex gene-to-gene relationships. In this work, our primary focus is on the detection of Boolean interactive regulation and other interactions which cause incorrect inferences, such as coregulation and indirect regulation. The experimental results show that the proposed network inference algorithm can successfully detect these types of regulation, and outperforms two commonly used algorithms, BANJO and ARACNE.

The remainder of the paper is organized as follows. In Section 2, we present the system model for regulatory network inference. In Section 3, we present the adaptive partitioning algorithms for estimating mutual information and conditional mutual information as well as our proposed network inference algorithm based on MI-CMI. In Section 4, we present experimental results. Section 5 concludes the paper.

2. System Model

Suppose that the given set of genes https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq43_HTML.gif form a regulatory network, where each node of the network is represented by a gene. Associated with each node, https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq44_HTML.gif is a random variable https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq45_HTML.gif with unknown steady-state distribution from which the expressions of https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq46_HTML.gif are generated. We assume that for gene https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq47_HTML.gif , we have the vector of https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq48_HTML.gif steady-state gene expressions https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq49_HTML.gif , where https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq50_HTML.gif is the gene expression of gene https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq51_HTML.gif under condition https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq52_HTML.gif .

In a network inference problem, our primary goal is to correctly identify the links representing direct regulation and reduce the false negative and false positive links. A false negative can be due to the incorrect estimation of the metric that measures the interaction between the expressions of two genes. When interactive regulation is introduced into the network, false negatives may occur for certain interactive regulations because no significant interaction is detected between the regulated gene and any single regulating gene; the regulation is only detectable when the regulated gene and all of the regulating genes are considered together. For example, in Figure 1(a), gene https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq53_HTML.gif is being regulated by an XOR interaction of genes https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq54_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq55_HTML.gif . Using mutual information as the metric, the individual interactions between https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq56_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq57_HTML.gif and between https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq58_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq59_HTML.gif are not discovered since https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq60_HTML.gif .
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Fig1_HTML.jpg
Figure 1

(a) XOR interactive regulation of https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq61_HTML.gif by https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq62_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq63_HTML.gif . (b) Coregulation of https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq64_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq65_HTML.gif .

In the relevance network approach, two nodes are connected when they exhibit high degrees of interaction according to the chosen metric. Using metrics such as correlation coefficient and mutual information, high degrees of interaction between two genes typically indicate that one of the genes is directly or indirectly regulating the other gene, or the two genes are being coregulated by another gene. In relevance networks, indirect regulation and coregulated genes often are the cause of false positive links. ARACNE, as discussed in the previous section, removes indirect regulation by the application of DPI. However, ARACNE and other network inference algorithms based only on correlation coefficient or mutual information are unable to identify genes that are being coregulated, particularly if they are coregulated by the same mechanism. For example, in Figure 1(b), both https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq66_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq67_HTML.gif are regulated by an AND interaction of https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq68_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq69_HTML.gif . Using correlation coefficient or mutual information as metric will always result in a high interaction between https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq70_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq71_HTML.gif , and in most cases, greater than the interaction between the regulated gene and either one of the regulating genes, whereas using DPI will result in a false positive link.

The insufficiencies of using only mutual information or correlation coefficient as discussed above naturally lead us to the use of conditional mutual information as the metric of choice in our proposed regulatory network inference algorithm. For Figure 1(a), it is clear that the interaction between https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq72_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq73_HTML.gif and that between https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq74_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq75_HTML.gif can be detected by https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq76_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq77_HTML.gif . To resolve false positives due to coregulated genes recall that the conditional mutual information https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq78_HTML.gif measures the reduction of information provided about https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq79_HTML.gif by observing https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq80_HTML.gif conditioned on having observed https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq81_HTML.gif . An example of Figure 1(a) can be seen in [27]. 
In Figure 1(b), coregulation of https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq82_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq83_HTML.gif can be recognized by the fact that if https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq84_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq85_HTML.gif are regulated by the same biological mechanism, https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq86_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq87_HTML.gif , since having observed https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq88_HTML.gif or https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq89_HTML.gif , no more information is provided about https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq90_HTML.gif by observing https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq91_HTML.gif , or information provided about https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq92_HTML.gif by observing https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq93_HTML.gif , respectively. 
On the other hand, having observed https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq94_HTML.gif , which regulates both https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq95_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq96_HTML.gif , the information provided about https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq97_HTML.gif by observing https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq98_HTML.gif is reduced, and we have https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq99_HTML.gif . Thus, by considering both the mutual information and conditional mutual information, we are able to reduce the amount of false positive links due to coregulation. Example of Figure 1(b) can be seen in [28].
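The coregulation argument can be checked exactly on an idealized version of Figure 1(b). The sketch below assumes two independent, uniformly distributed binary regulators and two noiseless targets that are both the AND of the regulators; the labels `X`, `Y` (regulators) and `Z`, `W` (targets) are chosen here for illustration and need not match the symbols in the figure.

```python
import math
from collections import defaultdict
from itertools import product

# Outcomes are tuples (x, y, z, w); both targets are driven by x AND y.
joint = {(x, y, x & y, x & y): 0.25 for x, y in product((0, 1), repeat=2)}
X, Y, Z, W = 0, 1, 2, 3  # positions of each variable in an outcome tuple

def H(*idx):
    """Joint entropy in bits of the variables at the given positions."""
    marg = defaultdict(float)
    for outcome, p in joint.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

def mi(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B)."""
    return H(a) + H(b) - H(a, b)

def cmi(a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return H(a, c) + H(b, c) - H(a, b, c) - H(c)

i_zw = mi(Z, W)               # between the two coregulated targets
i_xz = mi(X, Z)               # between a regulator and a target
i_xz_given_w = cmi(X, Z, W)   # regulator vs target, given the other target
i_zw_given_x = cmi(Z, W, X)   # target vs target, given a regulator
```

Under these assumptions the values are roughly 0.81, 0.31, 0, and 0.5 bits, respectively: the target-target dependence exceeds the regulator-target dependence, vanishing conditional mutual information given the other target flags the shared mechanism, and conditioning on a regulator reduces the target-target dependence.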

Motivated by the above discussion, in the next section we develop a relevance-network-based regulatory network inference algorithm that utilizes both mutual information and conditional mutual information to predict interactions between genes from the observed gene expression data. This requires efficient estimators that can accurately compute mutual information and conditional mutual information from data. Moreover, the conditional mutual information estimator should support both discrete and continuous conditioning variables so that it can be applied more widely.

3. MI-CMI Regulatory Network Inference Algorithm

There are several mutual information estimators, such as the Gaussian kernel estimator and the equipartition estimator [29], but each has its weaknesses. The Gaussian kernel estimator requires a smoothing window that needs to be optimized for different underlying distributions, thus increasing the estimator complexity. While the equipartition estimator is simple in nature, the different grids in a partition often contribute unequally to the mutual information estimate because of the underlying sample distribution. In this section, we make use of an adaptive partitioning mutual information estimator proposed in [30] and extend it to estimating conditional mutual information. These estimators are then employed in building our MI-CMI-based relevance network for regulatory network inference.

3.1. Adaptive Partitioning Mutual Information Estimator

Let us consider a pair of random variables https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq100_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq101_HTML.gif taking values in https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq102_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq103_HTML.gif , both of which are assumed to be the real line https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq104_HTML.gif for simplicity. For each random variable, we have https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq105_HTML.gif samples https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq106_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq107_HTML.gif . From the samples we wish to obtain an estimate https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq108_HTML.gif of the mutual information https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq109_HTML.gif .

For mutual information estimators that partition the samples according to equal length or equiprobable partition, many of the grids may turn out to be inefficient due to the distribution of the samples. For example, let https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq110_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq111_HTML.gif , where https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq112_HTML.gif is uniformly distributed on https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq113_HTML.gif . Hence, the samples fall on a unit circle; and grids inside the circle do not contribute to the estimation of the mutual information between https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq114_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq115_HTML.gif . Therefore, a partitioning scheme that can adaptively change the number, size, and placement of the grids is more efficient in estimating mutual information. In the following, we describe a mutual information estimator proposed in [30] that adaptively partitions the observation space based on the unknown underlying distributions of the samples.
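The inefficiency of a fixed grid in the unit-circle example can be illustrated by counting how many cells of an equal-width partition actually contain samples. The sketch assumes the pair is generated as (sin u, cos u) with u spread evenly over [0, 2π); the grid size, the sample count, and the function name `occupied_cells` are arbitrary choices made here.

```python
import math

def occupied_cells(n_samples=2000, bins=16):
    """Return the set of equal-width grid cells on [-1, 1]^2 that contain
    at least one sample of (sin u, cos u)."""
    cells = set()
    for k in range(n_samples):
        u = 2 * math.pi * k / n_samples
        x, y = math.sin(u), math.cos(u)
        i = min(int((x + 1) / 2 * bins), bins - 1)  # column index
        j = min(int((y + 1) / 2 * bins), bins - 1)  # row index
        cells.add((i, j))
    return cells

bins = 16
cells = occupied_cells(bins=bins)
fraction_occupied = len(cells) / bins ** 2
```

Only the cells that the circle passes through are ever occupied, so most of the 256 cells in this grid receive no samples, and the proportion of such empty cells grows as the grid is made finer.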

In the adaptive partitioning scheme, the sample space https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq116_HTML.gif is divided into rectangular grids of varying sizes depending on the underlying distributions. A grid denoted as https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq117_HTML.gif has the https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq118_HTML.gif -axis range https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq119_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq120_HTML.gif -axis range https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq121_HTML.gif . Furthermore, the set containing all the grids of the partitioning is denoted as https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq122_HTML.gif .

Let us denote https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq123_HTML.gif , https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq124_HTML.gif , and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq125_HTML.gif as the densities of the distributions https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq126_HTML.gif , https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq127_HTML.gif , and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq128_HTML.gif , respectively. We then define the following conditional distributions:
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ1_HTML.gif
(1)
and their densities
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ2_HTML.gif
(2)
respectively, where https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq129_HTML.gif denotes the indicator function of the set https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq130_HTML.gif . The mutual information can now be written as
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ3_HTML.gif
(3)

where https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq131_HTML.gif is called the restricted divergence and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq132_HTML.gif is the residual divergence.

We define a sequence of the partitioning of the sample space https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq133_HTML.gif as nested if each grid https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq134_HTML.gif is a disjoint union of grids https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq135_HTML.gif , where https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq136_HTML.gif can be different for each https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq137_HTML.gif . Thus, https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq138_HTML.gif can be seen as a refinement of https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq139_HTML.gif . 
A nested sequence https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq140_HTML.gif is said to be asymptotically sufficient for https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq141_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq142_HTML.gif if for every https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq143_HTML.gif there exists a https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq144_HTML.gif such that for each https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq145_HTML.gif , one can find an https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq146_HTML.gif satisfying
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ4_HTML.gif
(4)
where https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq147_HTML.gif denotes the https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq148_HTML.gif -algebra of https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq149_HTML.gif , and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq150_HTML.gif denotes the symmetric difference. In [30], it is shown that if the nested sequence https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq151_HTML.gif is asymptotically sufficient for https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq152_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq153_HTML.gif , then
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ5_HTML.gif
(5)
Given the pairs of samples https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq154_HTML.gif , we define
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ6_HTML.gif
(6)
that is, the frequency of the samples falling into the grid https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq155_HTML.gif . Then, the restricted divergence https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq156_HTML.gif can be estimated from the samples with the following estimator:
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ7_HTML.gif
(7)
Furthermore, in [30] it is shown that the residual divergence approaches zero as https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq157_HTML.gif and that
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ8_HTML.gif
(8)

Thus, mutual information can be estimated by computing the relative sample frequency on appropriately placed rectangular grids.

We now give the adaptive partitioning algorithm that constructs an asymptotically sufficient sequence of partitions for mutual information estimation.

Algorithm 1 (Adaptive partitioning algorithm for mutual information estimation).
  (i)
    Initialization: Partition https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq158_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq159_HTML.gif at https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq160_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq161_HTML.gif , respectively, such that
    https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ9_HTML.gif
    (9)
     
that is, https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq162_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq163_HTML.gif are the equiprobable partition points for https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq164_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq165_HTML.gif with respect to the empirical marginal distributions, and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq166_HTML.gif is divided into 4 grids. This partition is denoted as https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq167_HTML.gif .
  (ii)
    Partitioning https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq168_HTML.gif : for a grid https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq169_HTML.gif , select the partition points https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq170_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq171_HTML.gif , such that
    https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ10_HTML.gif
    (10)
     
Denote https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq172_HTML.gif as the total number of samples in the grid https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq173_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq174_HTML.gif as the total number of samples in each of the quadrants created by the above partition. Compute Pearson's chi-squared test for a uniform distribution,
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ11_HTML.gif
(11)
If the sample distribution of the quadrants passes the uniform test, that is, (11) holds, https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq175_HTML.gif is added to https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq176_HTML.gif . If the sample distribution does not pass the uniform test, the grids https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq177_HTML.gif , https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq178_HTML.gif , https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq179_HTML.gif , and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq180_HTML.gif are added to https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq181_HTML.gif .
  (iii)

    Repeat step (ii) for all grids in https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq182_HTML.gif .

     
  (iv)

    Repeat steps (ii) and (iii) until https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq183_HTML.gif . When the partitioning process is terminated, define https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq184_HTML.gif .

     
  (v)

    Using the partition https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq185_HTML.gif , compute the mutual information estimate https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq186_HTML.gif according to (7).
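
The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation. Equations (7), (10), and (11) are not legible in this copy, so the sketch assumes that each grid is split at the medians of the samples it contains, that the uniformity test compares the four quadrant counts against equal expected counts using the 95% threshold of 7.815 quoted in the text, and that the estimate sums (n_xy/N) log(n_xy N / (n_x n_y)) over the final grids. The names `members`, `quadrants`, `adaptive_mi`, and the `max_depth` safeguard against tied values are hypothetical.

```python
import numpy as np

CHI2_95_DF3 = 7.815  # 95% critical value quoted in the text for 4 quadrants


def members(x, y, cell):
    """Mask of samples inside the half-open cell (xlo, xhi] x (ylo, yhi]."""
    xlo, xhi, ylo, yhi = cell
    return (x > xlo) & (x <= xhi) & (y > ylo) & (y <= yhi)


def quadrants(x, y, cell):
    """Split a non-empty cell at the medians of its own samples (assumed rule)."""
    xlo, xhi, ylo, yhi = cell
    inside = members(x, y, cell)
    mx, my = np.median(x[inside]), np.median(y[inside])
    return [(xlo, mx, ylo, my), (xlo, mx, my, yhi),
            (mx, xhi, ylo, my), (mx, xhi, my, yhi)]


def adaptive_mi(x, y, max_depth=20):
    """Mutual information estimate (in nats) on an adaptively built partition."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    root = (-np.inf, np.inf, -np.inf, np.inf)
    # Step (i): the initial 4 grids are always kept.
    pending = [(c, 1) for c in quadrants(x, y, root)]
    final = []
    while pending:  # steps (ii)-(iv)
        cell, depth = pending.pop()
        count = members(x, y, cell).sum()
        if count == 0:
            continue  # empty grids contribute nothing to the estimate
        subs = quadrants(x, y, cell)
        counts = np.array([members(x, y, s).sum() for s in subs])
        expected = count / 4
        chi2 = ((counts - expected) ** 2 / expected).sum()
        if chi2 < CHI2_95_DF3 or depth >= max_depth:
            final.append(cell)  # quadrants look uniform: keep the grid
        else:
            pending.extend((s, depth + 1) for s in subs)
    mi = 0.0
    for xlo, xhi, ylo, yhi in final:  # step (v)
        n_xy = members(x, y, (xlo, xhi, ylo, yhi)).sum()
        n_x = ((x > xlo) & (x <= xhi)).sum()
        n_y = ((y > ylo) & (y <= yhi)).sum()
        mi += n_xy / n * np.log(n_xy * n / (n_x * n_y))
    return float(mi)


rng = np.random.default_rng(0)
u, v = rng.uniform(size=1000), rng.uniform(size=1000)
gx = rng.standard_normal(1000)
gy = 0.9 * gx + np.sqrt(1 - 0.9 ** 2) * rng.standard_normal(1000)
mi_indep = adaptive_mi(u, v)    # independent pair: expected to be near zero
mi_corr = adaptive_mi(gx, gy)   # correlation 0.9: expected to be clearly positive
print(mi_indep, mi_corr)
```

Processing grids from a stack rather than level by level should give the same final partition, since the decision for each grid depends only on the samples inside it.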

     
Here, we give an example of how to adaptively partition a given set of sampled data. In this example, we drew 100 samples of https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq187_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq188_HTML.gif , which are jointly Gaussian with zero mean and a correlation coefficient of 0.9. The 100 sample pairs are plotted in Figure 2(a). In Figure 2(b), we plot the same samples in their ordinal plot, meaning that each sample of https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq189_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq190_HTML.gif is ranked in decreasing order with respect to the other samples from the same random variable, and the sample pairs are plotted by their integer-valued ranks. In the ordinal plot, an equiprobable partition is equivalent to a partition at the midpoint. Figure 2(b) also shows the dashed lines dividing the samples into 4 grids. This is the initialization partition, which is always kept no matter how the samples are distributed. In Figure 2(c), each of the 4 grids is partitioned into 4 quadrants by the dashed lines. Table 1 shows the distribution of the samples in the quadrants created by partitioning the 4 grids during the second-level partition, together with their chi-squared statistics. To pass the chi-squared test for uniformity at the 95% level, the test statistic should be less than 7.815. As we can see from Table 1, all 4 grids fail the test and thus require further partitioning.
Table 1

Quadrant sample counts in each grid after second-level partition, and result of chi-squared test.

| | Quadrant 1 | Quadrant 2 | Quadrant 3 | Quadrant 4 | https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq191_HTML.gif statistic | Pass? |
| --- | --- | --- | --- | --- | --- | --- |
| Grid 1 | 18 | 7 | 6 | 11 | 8.4762 | no |
| Grid 2 | 0 | 0 | 7 | 1 | 17.0000 | no |
| Grid 3 | 1 | 6 | 1 | 0 | 11.0000 | no |
| Grid 4 | 12 | 6 | 18 | 6 | 9.4286 | no |
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Fig2_HTML.jpg
Figure 2

Example of adaptive partitioning steps for pairwise mutual information. (a) 100 samples of https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq192_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq193_HTML.gif jointly Gaussian with https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq194_HTML.gif . (b) Initialization partition of the original samples. (c) Second-level partition of the grids. (d) Third-level partition of the grids.

In Figure 2(d), we can see that the 13 nonempty grids from the previous step are each divided into 4 quadrants by the dashed lines. Table 2 similarly shows, for the third-level partition, the quadrant sample counts in each of the grids and their chi-squared test results. From Table 2, we can see that all grids pass the chi-squared test; thus the third-level partition is not needed, and the adaptive partitioning scheme has partitioned the samples into the 13 grids shown in Figure 2(d).
Table 2

Quadrant sample counts in each grid after third-level partition, and result of chi-squared test.

| | Quadrant 1 | Quadrant 2 | Quadrant 3 | Quadrant 4 | https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq195_HTML.gif statistic | Pass? |
| --- | --- | --- | --- | --- | --- | --- |
| Grid 1 | 7 | 3 | 5 | 3 | 2.4444 | yes |
| Grid 2 | 2 | 1 | 1 | 3 | 1.5714 | yes |
| Grid 3 | 2 | 3 | 0 | 1 | 3.3333 | yes |
| Grid 4 | 3 | 3 | 4 | 1 | 1.7273 | yes |
| Grid 5 | 2 | 0 | 2 | 3 | 2.7143 | yes |
| Grid 6 | 0 | 0 | 1 | 0 | 3.0000 | yes |
| Grid 7 | 0 | 1 | 0 | 0 | 3.0000 | yes |
| Grid 8 | 2 | 2 | 1 | 1 | 0.6667 | yes |
| Grid 9 | 4 | 2 | 5 | 1 | 3.3333 | yes |
| Grid 10 | 2 | 0 | 1 | 3 | 3.3333 | yes |
| Grid 11 | 1 | 0 | 0 | 0 | 3.0000 | yes |
| Grid 12 | 3 | 2 | 1 | 0 | 3.3333 | yes |
| Grid 13 | 2 | 5 | 5 | 6 | 2.0000 | yes |

3.2. Conditional Mutual Information Estimator

Works in various fields have utilized conditional mutual information to test for conditional independence. However, in most cases they are limited to conditioning on a discrete, often binary, random variable [31, 32]. When conditioning on a discrete random variable, the conditional mutual information can be computed as
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ12_HTML.gif
(12)

This is done by simply dividing the samples into https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq196_HTML.gif bins according to the value https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq197_HTML.gif takes, and taking the weighted summation of the mutual information in each bin. In the case of conditioning on a continuous random variable, however, the partitioning of https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq198_HTML.gif is often not so clear. Next, we propose a modification to the adaptive partitioning estimator that also adaptively partitions the z-axis to allow the estimation of conditional mutual information when the conditioned random variable is continuous.
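
The discrete-conditioning case can be illustrated with a short Python sketch. It assumes that (12) is the usual weighted sum of per-bin mutual information, with each bin weighted by its relative frequency, and it uses a simple plug-in estimator for discrete samples as a stand-in for whatever pairwise estimator is preferred. The names `plugin_mi` and `conditional_mi_discrete` are hypothetical.

```python
import numpy as np


def plugin_mi(x, y):
    """Plug-in MI (nats) for discrete samples, from the empirical joint table."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())


def conditional_mi_discrete(x, y, z, mi_estimator=plugin_mi):
    """Weighted sum of per-bin MI, one bin per distinct value of z."""
    x, y, z = map(np.asarray, (x, y, z))
    total = 0.0
    for value in np.unique(z):
        mask = z == value
        total += mask.mean() * mi_estimator(x[mask], y[mask])
    return total


# XOR example: x and y are independent, but become dependent once z is known.
x = np.array([0, 0, 1, 1])
y = np.array([0, 1, 0, 1])
z = x ^ y
mi_xy = plugin_mi(x, y)                       # 0: no pairwise dependence
cmi_xy_z = conditional_mi_discrete(x, y, z)   # log(2) nats
print(mi_xy, cmi_xy_z)
```

The XOR case shows why pairwise mutual information alone can miss interactive regulation: the pairwise value is zero while the conditional value equals log 2.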

Let us consider a triplet of random variables https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq199_HTML.gif , and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq200_HTML.gif taking real values in https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq201_HTML.gif , and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq202_HTML.gif , respectively. Given the samples https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq203_HTML.gif , https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq204_HTML.gif , and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq205_HTML.gif , we wish to compute an estimate https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq206_HTML.gif of the conditional mutual information https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq207_HTML.gif .

Suppose that the space https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq208_HTML.gif is divided into cuboids of various sizes depending on the underlying distributions. The cuboid denoted as https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq209_HTML.gif has range https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq210_HTML.gif on the https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq211_HTML.gif -axis, https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq212_HTML.gif on the https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq213_HTML.gif -axis, and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq214_HTML.gif on the https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq215_HTML.gif -axis, and the set containing all the cuboids of the partition is denoted as https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq216_HTML.gif . We then define the following conditional distribution:
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ13_HTML.gif
(13)
and its density
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ14_HTML.gif
(14)
Similar to (3), we can write https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq217_HTML.gif as
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ15_HTML.gif
(15)

where Q denotes https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq218_HTML.gif https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq219_HTML.gif and R denotes https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq220_HTML.gif https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq221_HTML.gif .

We can rewrite https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq222_HTML.gif as
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ16_HTML.gif
(16)
Notice that this is simply a weighted sum of the restricted divergence as computed in (3) for samples grouped according to the z-axis partition https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq223_HTML.gif , and for a partition https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq224_HTML.gif ,
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ17_HTML.gif
(17)
and it can be estimated as
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ18_HTML.gif
(18)
Following the proof in [30, 33],
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ19_HTML.gif
(19)
We can see from (15) and (17) that
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ20_HTML.gif
(20)
and the integral
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ21_HTML.gif
(21)

in the definition of https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq225_HTML.gif vanishes if and only if https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq226_HTML.gif , that is, https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq227_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq228_HTML.gif are independent in the cuboid https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq229_HTML.gif . In the following, we propose an adaptive partitioning scheme that partitions the given samples into cuboids, where in each cuboid the conditional distributions of https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq230_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq231_HTML.gif given https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq232_HTML.gif are independent. Similar to Algorithm 1, we use Pearson's chi-squared test to determine the independence of the samples.

We now present the algorithm for estimating the conditional mutual information with a continuous conditioning variable.

Algorithm 2 (Adaptive partitioning algorithm for conditional mutual information estimation).
  (i)
    Initialization: partition https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq233_HTML.gif , and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq234_HTML.gif at https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq235_HTML.gif , and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq236_HTML.gif , respectively, such that
    https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ22_HTML.gif
    (22)
     
that is, https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq237_HTML.gif , and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq238_HTML.gif are the equiprobable partition points for https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq239_HTML.gif , https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq240_HTML.gif , and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq241_HTML.gif with respect to the empirical marginal distributions, and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq242_HTML.gif is divided into 8 cuboids. This partition is denoted as https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq243_HTML.gif .
  (ii)
    Partitioning https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq244_HTML.gif : for a cuboid https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq245_HTML.gif , select the partition points https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq246_HTML.gif , and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq247_HTML.gif , such that
    https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ23_HTML.gif
    (23)
     
Denote https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq248_HTML.gif as the total number of samples in the cuboid https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq249_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq250_HTML.gif as the total number of samples in each of the octants created by the above partition. Compute Pearson's chi-squared test for a uniform distribution,
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ24_HTML.gif
(24)
If the sample distribution passes the uniform test, that is, if (24) holds, the cuboid https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq251_HTML.gif is added to https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq252_HTML.gif . If the sample distribution does not pass the uniform test, the cuboids
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ25_HTML.gif
(25)
are added to https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq253_HTML.gif .
  (iii)

    Repeat step (ii) for all cuboids in https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq254_HTML.gif .

     
  (iv)

    Repeat steps (ii) and (iii) until https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq255_HTML.gif . When the partitioning process is terminated, define https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq256_HTML.gif .

     
  (v)

    Using the partition https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq257_HTML.gif , compute the conditional mutual information estimate https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq258_HTML.gif according to (18).
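
Algorithm 2 can be sketched along the same lines in Python. As before, this is an illustration under stated assumptions rather than the authors' implementation: equations (18), (23), and (24) are not legible in this copy, so the sketch assumes median splits within each cuboid, a uniformity test over the 8 octant counts with the standard 95% chi-squared critical value for 7 degrees of freedom (14.067), and an estimate that sums (n_xyz/N) log(n_xyz n_z / (n_xz n_yz)) over the final cuboids. The names `inside`, `octants`, `adaptive_cmi`, and the `max_depth` safeguard are hypothetical.

```python
from itertools import product

import numpy as np

CHI2_95_DF7 = 14.067  # assumed threshold: 95% chi-squared quantile, 7 d.o.f.


def inside(data, cell, axes=(0, 1, 2)):
    """Mask of samples within the cell's half-open ranges on the given axes."""
    mask = np.ones(len(data), dtype=bool)
    for a in axes:
        lo, hi = cell[a]
        mask &= (data[:, a] > lo) & (data[:, a] <= hi)
    return mask


def octants(data, cell):
    """Split a non-empty cuboid at the per-axis medians of its own samples."""
    med = np.median(data[inside(data, cell)], axis=0)
    halves = [((lo, m), (m, hi)) for (lo, hi), m in zip(cell, med)]
    return list(product(*halves))


def adaptive_cmi(x, y, z, max_depth=20):
    """Conditional mutual information estimate I(X;Y|Z) in nats."""
    data = np.column_stack([x, y, z]).astype(float)
    n = len(data)
    root = ((-np.inf, np.inf),) * 3
    pending = [(c, 1) for c in octants(data, root)]  # step (i): 8 cuboids
    final = []
    while pending:  # steps (ii)-(iv)
        cell, depth = pending.pop()
        count = inside(data, cell).sum()
        if count == 0:
            continue  # empty cuboids are discarded
        subs = octants(data, cell)
        counts = np.array([inside(data, s).sum() for s in subs])
        expected = count / 8
        chi2 = ((counts - expected) ** 2 / expected).sum()
        if chi2 < CHI2_95_DF7 or depth >= max_depth:
            final.append(cell)
        else:
            pending.extend((s, depth + 1) for s in subs)
    cmi = 0.0
    for cell in final:  # step (v)
        n_xyz = inside(data, cell).sum()
        n_z = inside(data, cell, axes=(2,)).sum()
        n_xz = inside(data, cell, axes=(0, 2)).sum()
        n_yz = inside(data, cell, axes=(1, 2)).sum()
        cmi += n_xyz / n * np.log(n_xyz * n_z / (n_xz * n_yz))
    return float(cmi)


rng = np.random.default_rng(0)
bx, by = np.tile([0, 0, 1, 1], 100), np.tile([0, 1, 0, 1], 100)
noise = lambda: 0.1 * rng.standard_normal(400)
# Noisy XOR: z is the XOR of x and y, with noise on inputs and output.
cmi_xor = adaptive_cmi(bx + noise(), by + noise(), (bx ^ by) + noise())
# Three independent uniform variables: the estimate should be close to zero.
cmi_indep = adaptive_cmi(rng.uniform(size=400), rng.uniform(size=400),
                         rng.uniform(size=400))
print(cmi_xor, cmi_indep)
```

For the noisy XOR data the estimate is expected to lie near log 2, which matches the discrete case where knowing the output makes the two inputs fully dependent.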

     
Figures 3 and 4 show an adaptive partition of a trivariate data sample. Note that https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq259_HTML.gif is the output of an XOR gate with https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq260_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq261_HTML.gif as inputs, with random noise added to both the inputs and the output. We can see that the https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq262_HTML.gif -axis is partitioned into two regions, https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq263_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq264_HTML.gif . In the initial step, the sample data is divided into 8 cuboids. The 4 cuboids without any data points are discarded, and the other 4 are added to https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq265_HTML.gif . In the second step, each of the 4 cuboids is divided into 8 cuboids and tested for uniform distribution with the chi-squared test. All 4 pass the test and are added to https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq266_HTML.gif . In the next step, we see that https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq267_HTML.gif , and the partitioning process is terminated with https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq268_HTML.gif .
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Fig3_HTML.jpg
Figure 3

Adaptive partition of https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq269_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq270_HTML.gif given https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq271_HTML.gif .

https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Fig4_HTML.jpg
Figure 4

Adaptive partition of https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq272_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq273_HTML.gif given https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq274_HTML.gif .

Compared to the estimation of conditional mutual information for a discrete conditioning variable, we can see that instead of grouping samples into subsets where samples belonging to the same subset have the same value of the discrete-valued conditioning variable, here we group samples based on the adaptively determined partitioning of https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq275_HTML.gif on the z-axis. The problem of estimating the conditional mutual information is thus broken down into estimating the mutual information for each group of samples, where the samples are grouped by which https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq276_HTML.gif they belong to.

Note that the complexity of the Gaussian kernel estimator is known to be https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq277_HTML.gif . However, the complexity of the adaptive partitioning estimator depends on the joint distribution of the variables. For example, suppose https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq278_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq279_HTML.gif are independent and identically distributed uniform random variables. Computing https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq280_HTML.gif from https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq281_HTML.gif pairs of https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq282_HTML.gif will require on average only the four initial grids, since the sample pairs are typically uniformly distributed in each of the grids, and no further subpartitions are necessary according to the chi-squared test. On the other hand, if https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq283_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq284_HTML.gif , where https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq285_HTML.gif is uniformly distributed between https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq286_HTML.gif , it will take many more subpartitions to obtain a uniform distribution of the samples on each of the resulting grids.
From our experience, for https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq287_HTML.gif samples of https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq288_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq289_HTML.gif jointly Gaussian pairs, the Gaussian kernel estimator takes about 2 minutes to compute the MI, whereas for the adaptive partitioning algorithm, the time is between 2.5 to 3 minutes, on MATLAB code running on a Pentium 4 2.54 GHz machine. However, this is without taking into consideration the overhead required by the Gaussian estimator to compute the smoothing window.
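The dependence of the partitioning cost on the joint distribution can be illustrated with a small sketch of the uniformity check on the four initializing grids. This is not the authors' implementation: the split at the sample medians, the 5% significance level, and the names `quadrant_counts` and `needs_subpartition` are all assumptions made for illustration, and the two datasets are hypothetical.

```python
from statistics import median

# Chi-squared critical value for 3 degrees of freedom at the 5% level
# (the significance level is an assumption, not taken from the paper).
CHI2_CRIT_DF3_5PCT = 7.815

def quadrant_counts(xs, ys):
    """Count samples in the four cells obtained by splitting at the sample medians."""
    mx, my = median(xs), median(ys)
    counts = [0, 0, 0, 0]
    for x, y in zip(xs, ys):
        counts[2 * (x > mx) + (y > my)] += 1
    return counts

def needs_subpartition(xs, ys, crit=CHI2_CRIT_DF3_5PCT):
    """Return True when the four cells are significantly non-uniform."""
    counts = quadrant_counts(xs, ys)
    expected = sum(counts) / 4.0
    stat = sum((c - expected) ** 2 for c in counts) / expected
    return stat > crit

# Hypothetical data with no dependence: a regular 10 x 10 grid fills every cell equally.
grid_x = [i % 10 for i in range(100)]
grid_y = [i // 10 for i in range(100)]

# Hypothetical strongly dependent data: Y is a deterministic function of X.
dep_x = [i / 99.0 for i in range(100)]
dep_y = list(dep_x)

print(needs_subpartition(grid_x, grid_y))  # uniform cells, no further split
print(needs_subpartition(dep_x, dep_y))    # all mass on the diagonal, split again
```

In the first case the procedure would stop after the initial four grids, while in the second each occupied cell would have to be partitioned again, which is where the additional cost comes from.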

3.3. Gene Regulatory Network Inference Algorithm

To infer a gene regulatory network that contains various interactive regulations and coregulations, we propose a strategy that uses both mutual information and conditional mutual information to reconstruct the regulatory network. In our proposed algorithm, we first use mutual information as the metric to build a regulatory network, similarly to [17], to capture most of the direct regulations. To reduce the complexity of the algorithm by avoiding the computation of conditional mutual information for all triplets, while still detecting most of the causes of false positives and false negatives, we compute the CMI only for triplets of genes in which either all three genes are connected or none of the three genes are connected. Once the pairwise MI threshold is chosen, a triplet with one or two connections among its three genes indicates that the pairwise MI is sufficient to determine the interactions among those genes, so the CMI is not needed. The reduction in complexity therefore depends on the fraction of triplets that have exactly one or two connections, which in turn depends on the actual connectivities between the genes and on the threshold selected for the pairwise mutual information phase of the algorithm.
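The triplet selection described above can be sketched as follows. This is only an illustration under stated assumptions: the MI matrix is hypothetical, the function names are invented for this example, and the use of `>=` when comparing against the threshold is a guess, since the exact comparison is not recoverable from the text.

```python
from itertools import combinations

def threshold_graph(mi, threshold):
    """Link genes i and j when their MI estimate reaches the threshold (>= is assumed)."""
    n = len(mi)
    return [[int(i != j and mi[i][j] >= threshold) for j in range(n)] for i in range(n)]

def triplets_needing_cmi(graph):
    """Return the fully connected and the fully unconnected gene triplets."""
    connected, unconnected = [], []
    for i, j, k in combinations(range(len(graph)), 3):
        links = graph[i][j] + graph[i][k] + graph[j][k]
        if links == 3:
            connected.append((i, j, k))
        elif links == 0:
            unconnected.append((i, j, k))
        # Triplets with one or two links are skipped: pairwise MI is taken as sufficient.
    return connected, unconnected

# Hypothetical symmetric MI estimates for five genes.
mi = [
    [0.0, 0.9, 0.8, 0.1, 0.1],
    [0.9, 0.0, 0.7, 0.1, 0.1],
    [0.8, 0.7, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.0],
]
graph = threshold_graph(mi, threshold=0.5)
connected, unconnected = triplets_needing_cmi(graph)
print(connected)    # candidates for the indirect regulation / coregulation check
print(unconnected)  # candidates for the interactive regulation check
```

In this toy case only 4 of the 10 possible triplets require CMI estimates; the saving on real data depends on the network and on the threshold, as noted above.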

The MI-CMI gene regulatory network inference algorithm is as follows.

Algorithm 3 (MI-CMI gene regulatory network inference algorithm).
  (i) For a gene expression dataset containing https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq290_HTML.gif genes, compute the mutual information estimate https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq291_HTML.gif for all gene pairs https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq292_HTML.gif , using Algorithm 1.

  (ii) Initialize the graph https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq293_HTML.gif as a zero matrix. Set https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq294_HTML.gif if https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq295_HTML.gif , where https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq296_HTML.gif is a predetermined threshold.

  (iii) Detecting indirect regulation and coregulation: for any triplet of genes https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq297_HTML.gif where https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq298_HTML.gif , compute the conditional mutual information estimates https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq299_HTML.gif , https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq300_HTML.gif , and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq301_HTML.gif using Algorithm 2.

    (a) If
    https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ26_HTML.gif
    (26)
    this means that https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq302_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq303_HTML.gif contain nearly the same information regarding https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq304_HTML.gif : having observed https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq305_HTML.gif , https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq306_HTML.gif contains no new information about https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq307_HTML.gif , and vice versa. Also, having observed https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq308_HTML.gif , the information contained about https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq309_HTML.gif in https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq310_HTML.gif is reduced. This indicates that https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq311_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq312_HTML.gif are regulated by https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq313_HTML.gif through the same mechanism, meaning that the gene pair https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq314_HTML.gif is coregulated; thus https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq315_HTML.gif is set to https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq316_HTML.gif .

    (b) If
    https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ27_HTML.gif
    (27)
    and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq317_HTML.gif , this indicates that https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq318_HTML.gif regulates https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq319_HTML.gif , and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq320_HTML.gif regulates https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq321_HTML.gif , and that https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq322_HTML.gif is indirectly regulated by https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq323_HTML.gif , as indicated by the smallest CMI. Using the DPI similarly to [18], https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq324_HTML.gif is set to https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq325_HTML.gif .

  (iv) Detecting interactive regulation: for any triplet of genes https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq326_HTML.gif where https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq327_HTML.gif , compute the conditional mutual information estimates https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq328_HTML.gif , and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq329_HTML.gif using Algorithm 2.

    (a) If one or two of the CMI estimates are greater than https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq330_HTML.gif , this indicates that the genes contain interactions that were not captured using MI, and we set the corresponding link or links to https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq331_HTML.gif .

    (b) If all three of the CMI estimates are greater than https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq332_HTML.gif , this may indicate that the two regulating genes had some prior interaction, or that there is an XOR interaction between the three genes. Thus, we apply the DPI to remove the link with the weakest estimated CMI, and the links corresponding to the two largest estimated CMI values are set to https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq333_HTML.gif .

4. Experimental Results

In this section, we present simulation results to demonstrate the performance of the algorithms discussed in Section 3. We first illustrate the performance of Algorithm 2 for estimating the conditional mutual information of jointly Gaussian random variables. Next, we consider the performance of Algorithm 1 for estimating mutual information, by implementing the regulatory network inference algorithm in [18], but replacing the Gaussian kernel mutual information estimator employed there with Algorithm 1. Finally, we compare the network inference performance of Algorithm 3 with that of ARACNE [18] and BANJO [11] on synthetic networks.

4.1. Conditional Mutual Information of Jointly Gaussian Random Variables

To assess the accuracy of Algorithms 1 and 2 for the estimation of gene regulatory networks, we consider estimating the pairwise and conditional mutual information of multivariate Gaussian distributions. In our simulation, we compare the MI and CMI estimates of Algorithms 1 and 2 with those of the b-spline estimators. A b-spline MI estimator is proposed in [34] which divides the sample range into a number of bins. Contrary to the approach in the classical histogram estimators, where each sample contributes only to the bin it is in, for the b-spline estimator, the weight of a sample is spread to the bins. In the case of a third-order b-spline estimator, for a sample located in bin https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq334_HTML.gif , the sample is assigned to the bins https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq335_HTML.gif , https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq336_HTML.gif , and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq337_HTML.gif , and the weight of the sample in each bin is computed using the b-spline coefficients. Here, we modify the b-spline estimator as proposed in [34] to estimate the 3-way MI https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq338_HTML.gif so that the CMI can be obtained with the relationship https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq339_HTML.gif .

For MI estimation, we generated bivariate Gaussian samples with correlation coefficients 0, 0.3, and 0.6. For each coefficient, we generated https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq340_HTML.gif samples and computed the estimated MI, https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq341_HTML.gif , for each sample size using Algorithm 1 and the third-order b-spline estimator with 10 bins proposed in [34]. For each sample size, the estimate is averaged over 500 sets of samples. For a bivariate Gaussian distribution, the exact MI of https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq342_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq343_HTML.gif is given by
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ28_HTML.gif
(28)

where https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq344_HTML.gif is the correlation coefficient between https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq345_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq346_HTML.gif .
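As a quick numerical check, the standard closed form for the MI of a bivariate Gaussian, I = -(1/2) ln(1 - rho^2) in nats, can be evaluated for the three correlation coefficients used here. That this is the form stated in (28) is an assumption, although it is consistent with the Analytical column of Table 3; the function name `gaussian_mi` is chosen only for this sketch.

```python
import math

def gaussian_mi(rho):
    """Exact MI (in nats) of a bivariate Gaussian with correlation coefficient rho."""
    # Written as 0.5 * ln(1 / (1 - rho^2)), which equals -0.5 * ln(1 - rho^2).
    return 0.5 * math.log(1.0 / (1.0 - rho ** 2))

for rho in (0.0, 0.3, 0.6):
    print(f"rho = {rho:.1f}: MI = {gaussian_mi(rho):.4f}")
```

Rounded to four decimals, the values for correlation coefficients 0.3 and 0.6 are 0.0472 and 0.2231, matching the analytical entries reported in Table 3.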

For CMI estimation, we generated samples from trivariate Gaussian distributions with the following covariance matrices:
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ29_HTML.gif
(29)
For each Gaussian distribution, we generated https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq347_HTML.gif samples, and computed the estimated CMI, https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq348_HTML.gif , for each sample size, using Algorithm 2 and the modified third-order b-spline estimator with 10 bins. For each sample size https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq349_HTML.gif , the estimated CMI is averaged over 500 sets of samples. For a trivariate Gaussian distribution, the exact CMI of https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq350_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq351_HTML.gif given https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq352_HTML.gif is given by
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ30_HTML.gif
(30)
where https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq353_HTML.gif , and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq354_HTML.gif are the conditional covariances of https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq355_HTML.gif , and conditional covariance between https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq356_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq357_HTML.gif , given https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq358_HTML.gif , respectively. For a trivariate Gaussian distribution, the conditional covariance matrix between https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq359_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq360_HTML.gif given https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq361_HTML.gif is given by
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Equ31_HTML.gif
(31)
where https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq362_HTML.gif denotes the covariance of https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq363_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq364_HTML.gif . The results of the MI estimation are given in Table 3, and the results of the CMI estimation are given in Table 4. In both the MI and CMI estimation, the adaptive algorithms produce estimates closer to the analytical values for all correlation coefficients and covariance matrices, except for the MI estimation for https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq365_HTML.gif . Both tables also show that, as the sample size grows, the adaptive algorithms converge toward the analytical values for both MI and CMI estimation. This is not true for the b-spline algorithms: in the cases of MI estimation for https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq366_HTML.gif and CMI estimation for covariance matrix 4, the b-spline estimators converge to incorrect values.
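The analytical CMI of a trivariate Gaussian can be sketched in a few lines. The sketch assumes the usual Schur-complement form of the conditional covariance and writes the CMI through the conditional correlation, which should be equivalent to (30) and (31) but is not copied from them. The covariance matrices below are hypothetical and are not the ones in (29).

```python
import math

def conditional_cov(cov):
    """Covariance of (X, Y) given Z, for a 3x3 covariance matrix ordered as (X, Y, Z)."""
    sxx, sxy, sxz = cov[0]
    syy, syz = cov[1][1], cov[1][2]
    szz = cov[2][2]
    cxy = sxy - sxz * syz / szz
    return [
        [sxx - sxz * sxz / szz, cxy],
        [cxy, syy - syz * syz / szz],
    ]

def gaussian_cmi(cov):
    """Exact I(X;Y|Z) in nats for a trivariate Gaussian with covariance matrix cov."""
    c = conditional_cov(cov)
    rho_sq = c[0][1] ** 2 / (c[0][0] * c[1][1])  # squared conditional correlation
    return 0.5 * math.log(1.0 / (1.0 - rho_sq))

# Hypothetical example: all pairwise covariances equal to 0.5.
cov_equal = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
# Hypothetical example: X and Y depend on each other only through Z.
cov_common_cause = [
    [2.0, 1.0, 1.0],
    [1.0, 2.0, 1.0],
    [1.0, 1.0, 1.0],
]
print(gaussian_cmi(cov_equal))
print(gaussian_cmi(cov_common_cause))
```

In the second matrix the conditional covariance between X and Y vanishes, so the CMI is zero even though the unconditional covariance is not; this is the situation the CMI test is meant to expose for coregulated genes.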
Table 3

Comparison of the estimated MI of bivariate Gaussian distributions with different correlation coefficients using Algorithm 1 and the b-spline algorithm.

| Correlation coefficient | Algorithm | 100 samples | 200 samples | 300 samples | 400 samples | 500 samples | Analytical |
|---|---|---|---|---|---|---|---|
| 0 | Adaptive | 0.0080 | 0.0036 | 0.0022 | 0.0022 | 0.0015 | 0 |
| | b-spline 10 | 0.0912 | 0.0443 | 0.0288 | 0.0210 | 0.0166 | |
| 0.3 | Adaptive | 0.0280 | 0.0287 | 0.0305 | 0.0319 | 0.0330 | 0.0472 |
| | b-spline 10 | 0.1248 | 0.0789 | 0.0640 | 0.0562 | 0.0515 | |
| 0.6 | Adaptive | 0.1371 | 0.1730 | 0.1916 | 0.1999 | 0.2052 | 0.2231 |
| | b-spline 10 | 0.2471 | 0.2029 | 0.1879 | 0.1781 | 0.1719 | |

Table 4

Comparison of the estimated CMI of trivariate Gaussian distributions with different covariance matrices using Algorithm 2 and the modified b-spline algorithm.

| Cond. Corr. | Algorithm | 100 samples | 200 samples | 300 samples | 400 samples | 500 samples | Analytical |
|---|---|---|---|---|---|---|---|
| 0 | Adaptive | 0.0263 | 0.0215 | 0.0171 | 0.0175 | 0.0113 | 0 |
| | b-spline 10 | 0.1899 | 0.1039 | 0.0711 | 0.0536 | 0.0429 | |
| | b-spline 20 | 0.7888 | 0.5592 | 0.4330 | 0.3497 | 0.2943 | |
| 0.1612 | Adaptive | 0.0310 | 0.0278 | 0.0253 | 0.0249 | 0.0187 | 0.0132 |
| | b-spline 10 | 0.1899 | 0.1065 | 0.0759 | 0.0603 | 0.0495 | |
| 0.3035 | Adaptive | 0.0497 | 0.0510 | 0.0534 | 0.0565 | 0.0582 | 0.0483 |
| | b-spline 10 | 0.2251 | 0.1377 | 0.1032 | 0.0855 | 0.0761 | |
| 0.7408 | Adaptive | 0.2294 | 0.3050 | 0.3234 | 0.3444 | 0.3784 | 0.3979 |
| | b-spline 10 | 0.2773 | 0.2390 | 0.2190 | 0.2092 | 0.2029 | |
| | b-spline 20 | 0.6387 | 0.5323 | 0.4719 | 0.4378 | 0.4121 | |

As a comparison, we also performed CMI estimation for covariance matrices 1 and 4 using the b-spline estimator with 20 bins. In [34], it is shown that the b-spline method has similar performance to that of the kernel density estimator (KDE), and that the MI computed has the same level of significance. However, the KDE is shown to be https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq367_HTML.gif more computationally intensive than the b-spline method. Thus, in our comparisons, we only included the results from the b-spline method. For matrix 4, the b-spline estimator with 20 bins now converges to the correct analytical value. For matrix 1, however, it no longer converges to zero as the estimator with 10 bins does. This illustrates the drawback of using the b-spline estimators for MI and CMI estimation: their accuracy depends on the choice of their parameters. On the other hand, Algorithms 1 and 2 are nonparametric, and do not need any prior knowledge of the underlying distributions to produce good estimates.

Looking more closely at CMI estimation, for small sample sizes and large CMI values, Algorithm 2 has a negative bias. As the sample size increases, the bias decreases quickly. On the other hand, when the true CMI value is small, Algorithm 2 tends to overestimate. It should be noted that estimating the CMI from a finite number of samples for a distribution with zero conditional correlation coefficient will typically result in a nonzero value. Nevertheless, the estimation results are still reasonably accurate, even for only 100 samples, so that conditional independence can be easily detected.

4.2. Regulatory Networks with Only Direct Regulation

Next, we implemented the algorithm described in [18], replacing the Gaussian kernel MI estimator there with Algorithm 1. The modified algorithm is then compared with the original ARACNE algorithm in [18]. The purpose of this comparison is to show that the adaptive partitioning MI estimator is a valid alternative to the Gaussian kernel estimator. Specifically, we constructed 25 synthetic regulatory networks, each with 20 one-to-one gene regulations, using NetBuilder [35]. To compare the network inference performance, we adopt the same metrics as used in [18]: recall and precision. Recall, defined as https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq368_HTML.gif , where https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq369_HTML.gif is the number of true positive links and https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq370_HTML.gif is the number of false negative links, measures the fraction of all true links that are correctly identified. Precision, defined as https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq371_HTML.gif , where https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq372_HTML.gif is the number of false positive links, measures the fraction of predicted links that are correct. The values of the two metrics, and the relationship between them, change with the selected threshold value, https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq373_HTML.gif . 
At low https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq374_HTML.gif , more links are admitted as gene interactions, potentially capturing more true links and resulting in high recall values. However, as more links are included, the number of false positives also increases, which decreases the precision. On the other hand, when https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_IEq375_HTML.gif is high, only links with strong interactions are admitted, and in most cases they represent true interactions between genes, thus improving the precision. However, weaker true interactions are not admitted, resulting in a decrease in recall.
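The two metrics can be computed from a true and a predicted link set as in the following minimal sketch. Links are treated as undirected here, which is an assumption made because MI is symmetric; the function name and the example link lists are hypothetical.

```python
def precision_recall(true_links, predicted_links):
    """Return (precision, recall) for undirected link sets given as pairs of gene indices."""
    true_set = {frozenset(link) for link in true_links}
    pred_set = {frozenset(link) for link in predicted_links}
    tp = len(true_set & pred_set)   # links that are both true and predicted
    fp = len(pred_set - true_set)   # predicted links that are not true
    fn = len(true_set - pred_set)   # true links that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical network with four true links and three predicted links.
true_links = [(0, 1), (1, 2), (2, 3), (3, 4)]
predicted = [(1, 0), (1, 2), (0, 4)]
precision, recall = precision_recall(true_links, predicted)
print(precision, recall)
```

Here two of the three predicted links are correct and two of the four true links are found, so precision is 2/3 and recall is 1/2. Lowering the threshold adds predicted links, which can only raise recall but usually lowers precision.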

In Figure 5, we plot the precision versus recall performance of the two algorithms. The two algorithms perform identically. This shows that the adaptive partitioning MI estimator can be employed as an alternative to the Gaussian kernel estimator in capturing the gene-to-gene interactions. The comparison shown in Figure 5 uses only synthetic networks constructed so that there are only pairwise connectivities. This is to illustrate that the adaptive partitioning algorithm can be used as an alternative to the kernel-based estimator in the ARACNE algorithm without degradation in performance. In the later simulations, we show that in the presence of coregulation by two genes, the CMI is needed to improve the performance of regulatory network inference. Note that since MI and CMI are estimated from a finite number of samples, the estimated MI and CMI are always greater than 0. In the relevance-network approach, by setting an arbitrarily low threshold, any number of links can be admitted as detected gene interactions, and with a sufficiently low threshold, all possible links can be admitted. When a large number of links is admitted, the number of false negatives will be small, which leads to large values of recall. Thus, the comparisons at large recall values tend to be meaningless and are not included in the figures.
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Fig5_HTML.jpg
Figure 5

Comparison of ARACNE and relevance-network-based algorithm with adaptive partitioning MI estimator and DPI.

4.3. Regulatory Networks with Coregulation and Interactive Regulation

We now compare the performance of Algorithm 3, ARACNE, and BANJO for regulatory network inference in the presence of coregulated and interactively regulated genes. We again use the synthetic network modeling software NetBuilder to generate random networks. NetBuilder allows modeling of gene-to-gene interactions such as activation by transcription factor combination (AND and OR), repression (NOT), and other combinatory interactions. We generated 50 synthetic networks, each containing 15 to 25 nodes with 20 links. For each node, we generated 100 steady-state expression data samples. To compare the effects of interactive regulation and coregulation on the performance of the three algorithms, two sets of synthetic networks were constructed: one set contains 25 networks where 30% of the interactions involve interactive regulation and coregulation, and the other set contains 25 networks where 60% of the interactions involve interactive regulation and coregulation. In Figures 6 and 7, we plot the precision versus recall performance for the two sets of synthetic networks. Algorithm 3 outperforms both ARACNE and BANJO in terms of precision over all ranges of interest. Notice that the improvement over ARACNE is greater for the dataset with 60% coregulation and interactive regulation, which is expected since ARACNE in most cases cannot detect the XOR interactions, and the application of the DPI for gene coregulation can introduce both false positives and false negatives. Surprisingly, BANJO is found to have better performance than ARACNE at high recall values for the set of networks that contains 60% coregulation and interactive regulation. In [18], it is shown that the Gaussian network algorithm performs worse when the network contains only direct interaction between two genes. 
It is possible that, because Gaussian-network-based algorithms such as BANJO use joint distributions to model the expression values of nodes, they are able to discover some of the coregulations and interactive regulations that are not found by ARACNE.
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Fig6_HTML.jpg
Figure 6

Precision versus recall for datasets with 30% coregulated or interactively regulated links.

https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Fig7_HTML.jpg
Figure 7

Precision versus recall for datasets with 60% coregulated or interactively regulated links.

In Figure 9, we give an example of a network discovered by each algorithm. To select the threshold for the MI-CMI algorithm, we randomly permute, for each gene, the expression values across the different conditions, similar to what is done in [17]. We performed 30 such permutations, and for each permutation we computed the pairwise mutual information using Algorithm 1 for all possible pairs. The highest mutual information observed over the 30 permutations is used as the threshold for both the MI-CMI algorithm and ARACNE. Results for BANJO were obtained using the default parameters.
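The permutation-based threshold selection can be sketched as below. The MI estimator in this sketch is a simple equal-width histogram plug-in estimate used purely as a stand-in; it is not Algorithm 1. The function names, the bin count, the random seeds, and the three-gene dataset are all hypothetical.

```python
import math
import random

def histogram_mi(xs, ys, bins=4):
    """Plug-in MI estimate (nats) on an equal-width grid; a stand-in for Algorithm 1."""
    def bin_index(v, lo, hi):
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)
    lox, hix, loy, hiy = min(xs), max(xs), min(ys), max(ys)
    n = len(xs)
    joint, px, py = {}, {}, {}
    for x, y in zip(xs, ys):
        i, j = bin_index(x, lox, hix), bin_index(y, loy, hiy)
        joint[(i, j)] = joint.get((i, j), 0) + 1
        px[i] = px.get(i, 0) + 1
        py[j] = py.get(j, 0) + 1
    return sum(c / n * math.log(c * n / (px[i] * py[j])) for (i, j), c in joint.items())

def permutation_threshold(data, n_perm=30, seed=0, mi=histogram_mi):
    """Largest pairwise MI observed after independently shuffling each gene's samples."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(n_perm):
        shuffled = [rng.sample(gene, len(gene)) for gene in data]
        for a in range(len(shuffled)):
            for b in range(a + 1, len(shuffled)):
                best = max(best, mi(shuffled[a], shuffled[b]))
    return best

# Hypothetical data: 3 genes x 50 conditions, where gene 1 closely tracks gene 0.
data_rng = random.Random(1)
g0 = [data_rng.random() for _ in range(50)]
g1 = [v + 0.01 * data_rng.random() for v in g0]
g2 = [data_rng.random() for _ in range(50)]
threshold = permutation_threshold([g0, g1, g2])
print(threshold, histogram_mi(g0, g1))
```

Shuffling each gene independently destroys any dependence between genes while preserving the marginal distributions, so the largest MI seen on shuffled data gives an empirical level that estimates from independent genes are unlikely to exceed.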

Figure 9(a) represents the network inferred by the MI-CMI algorithm, Figure 9(b) the network inferred by ARACNE, and Figure 9(c) the network discovered by BANJO. In each figure, red links represent XOR interactions, green links represent OR interactions, and blue links represent AND interactions. In Figure 9, false negative links are indicated with a cross mark, and false positive links are represented by dashed lines. The true underlying network is shown in Figure 8. As we can see from the figures, BANJO produced the most false positive links, from both indirect regulation and coregulation, whereas the MI-CMI algorithm and ARACNE each produced only one. The MI-CMI algorithm and BANJO completely discovered similar numbers of interactive regulations, 5 and 4, respectively. An interactive regulation is completely discovered when both regulating genes are linked correctly to the interactively regulated gene. For ARACNE, only 2 interactive regulations are discovered completely, and for most of the interactive regulations only one of the links is discovered.
https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Fig8_HTML.jpg
Figure 8

True underlying network configuration inferred in Figure 9.

https://static-content.springer.com/image/art%3A10.1155%2F2008%2F253894/MediaObjects/13637_2007_Article_94_Fig9_HTML.jpg
Figure 9

(a) Synthetic network inferred by MI-CMI algorithm. (b) Synthetic network inferred by ARACNE. (c) Synthetic network inferred by BANJO.

5. Conclusions

We have proposed a new gene regulatory network inference algorithm that employs both mutual information and conditional mutual information to discover possible direct and interactive regulations between genes, and to eliminate false links due to indirect regulation and coregulation. The mutual information and conditional mutual information are estimated from the expression data using an adaptive partitioning estimator. We have shown that the proposed network inference method outperforms BANJO and ARACNE when the underlying regulatory network contains coregulated or interactively regulated genes. In this work, we have focused on the discovery of the joint regulation of a gene by two other genes. It is possible to extend this work to joint regulation by multiple genes by modifying the proposed conditional mutual information estimator to a higher order. However, doing so would pose several computational problems. As the dimension of the CMI increases, an increasing number of samples is needed to maintain the same level of accuracy. Also, as the dimension of the CMI increases, the number of sets of genes to be tested increases, rendering this method impractical for brute-force computation over all possible sets of genes. One possibility for reducing the amount of computation needed is to take into consideration the constraints placed on the possible connectivities by known biochemical reactions between the genes involved. This can be a future direction for research in this area.

Authors’ Affiliations

(1)
Department of Electrical Engineering, Columbia University

References

  1. D'Haeseleer P: How does gene expression clustering work? Nature Biotechnology 2005,23(12):1499-1501. 10.1038/nbt1205-1499View ArticleGoogle Scholar
  2. Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the United States of America 1998,95(25):14863-14868. 10.1073/pnas.95.25.14863View ArticleGoogle Scholar
  3. Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM: Systematic determination of genetic network architecture. Nature Genetics 1999,22(3):281-285. 10.1038/10343View ArticleGoogle Scholar
  4. Tamayo P, Slonim D, Mesirov J, et al.: Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proceedings of the National Academy of Sciences of the United States of America 1999,96(6):2907-2912. 10.1073/pnas.96.6.2907View ArticleGoogle Scholar
  5. Zhou X, Wang X, Dougherty ER: Construction of genomic networks using mutual-information clustering and reversible-jump Markov-chain-Monte-Carlo predictor design. Signal Processing 2003,83(4):745-761. 10.1016/S0165-1684(02)00469-3View ArticleMATHGoogle Scholar
  6. Zhou X, Wang X, Dougherty ER, Russ D, Suh E: Gene clustering based on cluster-wide mutual information. Journal of Computational Biology 2004,11(1):147-161. 10.1089/106652704773416939View ArticleGoogle Scholar
  7. Hartuv E, Schmitt A, Lange J, Meier-Ewert S, Lehrach H, Shamir R: An algorithm for clustering cDNAs for gene expression analysis using short oligonucleotide fingerprints. Proceedings of the 3rd Annual International Conference on Computational Molecular Biology (RECOMB '99), Lyon, France, April 1999 188-197.Google Scholar
  8. Kishino H, Waddell PJ: Correspondence analysis of genes and tissue types and finding genetic links from microarray data. Proceedings of the 11th Workshop on Genome Informatics (GIW '00), Tokyo, Japan, December 2000 83-95.Google Scholar
  9. Toh H, Horimoto K: Inference of a genetic network by a combined approach of cluster analysis and graphical Gaussian modeling. Bioinformatics 2002,18(2):287-297. 10.1093/bioinformatics/18.2.287View ArticleGoogle Scholar
  10. Zhou X, Wang X, Pal R, Ivanov I, Bittner M, Dougherty ER: A Bayesian connectivity-based approach to constructing probabilistic gene regulatory networks. Bioinformatics 2004,20(17):2918-2927. 10.1093/bioinformatics/bth318View ArticleGoogle Scholar
  11. Hartemink AJ, Gifford DK, Jaakkola TS, Young RA: Bayesian methods for elucidating genetic regulatory networks. IEEE Intelligent Systems 2002,17(2):37-43.Google Scholar
  12. Schäfer J, Strimmer K: An empirical Bayes approach to inferring large-scale gene association networks. Bioinformatics 2005,21(6):754-764. 10.1093/bioinformatics/bti062View ArticleGoogle Scholar
  13. Friedman N, Linial M, Nachman I, Pe'er D: Using Bayesian networks to analyze expression data. Journal of Computational Biology 2000,7(3-4):601-620. 10.1089/106652700750050961View ArticleGoogle Scholar
  14. Yu J, Smith VA, Wang PP, Hartemink AJ, Jarvis ED: Advances to Bayesian network inference for generating causal networks from observational biological data. Bioinformatics 2004,20(18):3594-3603. 10.1093/bioinformatics/bth448View ArticleGoogle Scholar
  15. Chaudhuri S, Drton M, Richardson TS: Estimation of a covariance matrix with zeros. Biometrika 2007,94(1):199-216. 10.1093/biomet/asm007MathSciNetView ArticleMATHGoogle Scholar
  16. D'haeseleer P, Wen X, Fuhrman S, Somogyi R: Mining the gene expression matrix: inferring gene relationships from large scale gene expression data. Proceedings of the 2nd International Workshop on Information Processing in Cell and Tissues (IPCA '97), Sheffield, UK, September 1997 203-212.Google Scholar
  17. Butte AJ, Kohane IS: Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. Proceedings of the 5th Pacific Symposium on Biocomputing (PSB '00), Honolulu, Hawaii, USA, January 2000 418-429.Google Scholar
  18. Margolin AA, Nemenman I, Basso K, et al.: ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 2006.,7(supplement 1): article S7Google Scholar
  19. Cover TM, Thomas JA: Elements of Information Theory. John Wiley & Sons, New York, NY, USA; 1990.MATHGoogle Scholar
  20. Wang K, Nemenman I, Banerjee N, Margolin AA, Califano A: Genome-wide discovery of modulators of transcriptional interactions in human B lymphocytes. Proceedings of the 10th Annual International Conference on Research in Computational Molecular Biology (RECOMB '06), Venice, Italy, April 2006 348-362.Google Scholar
  21. Zhao W, Serpedin E, Dougherty ER: Inferring the structure of genetic regulatory networks using information theoretic tools. Proceedings of IEEE/NLM Life Science Systems and Applications Workshop (LSSA '06), Bethesda, Md, USA, July 2006 1-2.Google Scholar
  22. Zhao W, Serpedin E, Dougherty ER: Inferring connectivity of genetic regulatory networks using information-theoretic criteria. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2008,5(2):262-274.View ArticleGoogle Scholar
  23. Boscolo R, Liao JC, Roychowdhury VP: An information theoretic exploratory method for learning patterns of conditional gene coexpression from microarray data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2008,5(1):15-24.View ArticleGoogle Scholar
  24. Watkinson J, Liang K-C, Wang X, Zheng T, Anastassiou D: Inference of regulatory gene interactions from expression data using three-way mutual information. Annals of the New York Academy of Sciences, in press.
  25. Faith JJ, Hayete B, Thaden JT, et al.: Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biology 2007,5(1):e8. 10.1371/journal.pbio.0050008View ArticleGoogle Scholar
  26. Anastassiou D: Computational analysis of the synergy among multiple interacting genes. Molecular Systems Biology 2007, 3, article 83: 1-8.MathSciNetGoogle Scholar
  27. Pal R, Datta A, Fornace AJ Jr., Bittner ML, Dougherty ER: Boolean relationships among genes responsive to ionizing radiation in the NCI 60 ACDS. Bioinformatics 2005,21(8):1542-1549. 10.1093/bioinformatics/bti214View ArticleGoogle Scholar
  28. Liu C-Q, Charoechai P, Khunajakr N, Deng Y-M, Widodo , Dunn NW: Genetic and transcriptional analysis of a novel plasmid-encoded copper resistance operon from Lactococcus lactis . Gene 2002,297(1-2):241-247. 10.1016/S0378-1119(02)00918-6View ArticleGoogle Scholar
  29. Beirlant J, Dudewicz E, Gyorfi L, van der Meulen E: Nonparametric entropy estimation: an overview. International Journal of Mathematical and Statistical Sciences 1997,80(1):17-39.MathSciNetMATHGoogle Scholar
  30. Darbellay GA, Vajda I: Estimation of the information by an adaptive partitioning of the observation space. IEEE Transactions on Information Theory 1999,45(4):1315-1321. 10.1109/18.761290MathSciNetView ArticleMATHGoogle Scholar
  31. de Campos LM, Huete JF: A new approach for learning belief networks using independence criteria. International Journal of Approximate Reasoning 2000,24(1):11-37. 10.1016/S0888-613X(99)00042-0MathSciNetView ArticleMATHGoogle Scholar
  32. Fleuret F: Fast binary feature selection with conditional mutual information. The Journal of Machine Learning Research 2004, 5: 1531-1555.MathSciNetMATHGoogle Scholar
  33. Liese F, Vajda I: Convex Statistical Distances. Teubner, Leipzig, Germany; 1987.MATHGoogle Scholar
  34. Daub CO, Steuer R, Selbig J, Kloska S: Estimating mutual information using B-spline functions—an improved similarity measure for analysing gene expression data. BMC Bioinformatics 2004., 5, article 118:Google Scholar
  35. Schilstra M, Bolouri H: Modelling the regulation of gene expression in genetic regulatory networks.2002. [http://strc.herts.ac.uk/bio/maria/NetBuilder/Theory/NetBuilderTheoryDownload.pdf]Google Scholar

Copyright

© K.-C. Liang and X. Wang. 2008

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
