Open Access

Integrating multi-platform genomic data using hierarchical Bayesian relevance vector machines

  • Sanvesh Srivastava1,
  • Wenyi Wang2,
  • Ganiraju Manyam2,
  • Carlos Ordonez3 and
  • Veerabhadran Baladandayuthapani4Email author
EURASIP Journal on Bioinformatics and Systems Biology20132013:9

DOI: 10.1186/1687-4153-2013-9

Received: 7 March 2013

Accepted: 10 June 2013

Published: 28 June 2013

Abstract

Background

Recent advances in genome technologies and the subsequent collection of genomic information at various molecular resolutions hold promise to accelerate the discovery of new therapeutic targets. A critical step in achieving these goals is to develop efficient clinical prediction models that integrate these diverse sources of high-throughput data. This step is challenging due to the presence of high-dimensionality and complex interactions in the data. For predicting relevant clinical outcomes, we propose a flexible statistical machine learning approach that acknowledges and models the interaction between platform-specific measurements through nonlinear kernel machines and borrows information within and between platforms through a hierarchical Bayesian framework. Our model has parameters with direct interpretations in terms of the effects of platforms and data interactions within and across platforms. The parameter estimation algorithm in our model uses a computationally efficient variational Bayes approach that scales well to large high-throughput datasets.

Results

We apply our methods of integrating gene/mRNA expression and microRNA profiles for predicting patient survival times to The Cancer Genome Atlas (TCGA) based glioblastoma multiforme (GBM) dataset. In terms of prediction accuracy, we show that our non-linear and interaction-based integrative methods perform better than linear alternatives and non-integrative methods that do not account for interactions between the platforms. We also find several prognostic mRNAs and microRNAs that are related to tumor invasion and are known to drive tumor metastasis and severe inflammatory response in GBM. In addition, our analysis reveals several interesting mRNA and microRNA interactions that have known implications in the etiology of GBM.

Conclusions

Our approach gains its flexibility and power by modeling the non-linear interaction structures between and within the platforms. Our framework is a useful tool for biomedical researchers, since clinical prediction using multi-platform genomic information is an important step towards personalized treatment of many cancers. We have a freely available software at: http://odin.mdacc.tmc.edu/~vbaladan.

Keywords

Bayesian modeling; Multiple kernel learning; Genomics; High-dimensional data analysis; Prediction; Variational inference

1 Introduction

Recent advances in genome technologies such as microarrays and next-generation sequencing have enabled the measurement of genomic activity at a very detailed resolution (e.g., base pair, single-nucleotide polymorphisms) as well as across multiple molecular levels: the epigenome, transcriptome and proteome. The collection of genomic information at various resolutions holds promise to accelerate the amalgamation of discovery science and clinical medicine [1]. One of the overarching goals of such studies is to relate these genomic data to relevant (patient-specific) clinical outcomes, not only to find significant biomarkers of disease progression/evolution but also to use the biomarkers to develop prediction models for deployment in future therapeutic studies. Furthermore, genomic data are now available from multiple platforms and resolutions for the same individual, thus allowing a researcher to simultaneously query these multiple sources of data to achieve these goals. Such motivating data have been collected under the aegis of The Cancer Genome Atlas (TCGA) project, wherein data from multiple genomic platforms such as gene/mRNA expression, DNA copy number, methylation and microRNA expression profiles are available for multiple tumor types (see http://cancergenome.nih.gov for more details). In addition, the available clinical information, such as stage of disease and survival times, motivates the analytic frameworks that integrate patient-specific data.

One of the main challenges in modeling the statistical dependence between such high-throughput studies is that a large number of measurements (usually in thousands) is available for a relatively small number (usually in tens or hundreds) of patient samples; therefore, classical statistical approaches based on linear models and hierarchical clustering are prone to over-fitting [2, 3]. In these situations, [3] recommends accounting for high-dimensionality by using approaches that borrow information across covariates to compensate for the limited information available across samples, which leads to better and more reliable inference. Several approaches have been developed to address these challenges in various contexts. Some examples include linear parametric models and hierarchical clustering for inferring the relation between phenotypes and genomic features [4], hierarchical Bayesian modeling approaches based on linear shrinkage estimators [5], linear canonical correlation analysis [6], intensity-based approaches for merging datasets [7], and regularized linear regression approaches [8].

Although these approaches are computationally efficient, interpretable, and simple, they make two unrealistic assumptions for practical data analysis. First, due to the parametric and linear assumptions, they might miss the underlying non-linear patterns in the data. Second, and more importantly, these non-linear patterns are further amplified in the presence of complex interactions within and between the different platforms that must be modeled while integrating data from these platforms. In this paper, we present a statistical machine learning approach called the hierarchical relevance vector machines (H-RVM) to address these modeling and inferential challenges. Briefly, the framework presented here: (a) models the relation between a relevant clinical outcome (scalar) and high-dimensional covariates/features through a data-adaptive and flexible nonparametric approach,(b) borrows information within and between platforms through a hierarchical Bayesian framework, (c) acknowledges and models the interaction between platforms through nonlinear kernel-based functionals, (d) has parameters that have explicit interpretation as the effects of the platforms and their interactions on the outcome, and (e) uses a computationally efficient variational Bayes approach that can be readily scaled to large datasets.

Our methods are motivated by and applied to a TCGA based glioblastoma multiforme (GBM) dataset, for which we integrate gene (mRNA) and microRNA (miRNA) expression profiles to predict patient survival timesa. There is an increasing interest in identifying subtypes of GBM based on its gene expression data. The ultimate goal of subtyping GBM is to identify gene expression profiles that are prognostic or predictive of treatment outcomes. The known subtypes of GBM samples in TCGA include pro-neural, neural, classical, and mesenchymal; with the first two classes of which are suspected to differ from the other two in the cell of origin, which is a critical determinant of effective treatment regimens [9]. Differential expressions of miRNAs were recently found to be associated with many diseases, including cancers [10, 11]. Previous studies have shown that combining multiple types of data, such as mRNA and miRNA expressions, could significantly improve the accuracy of detecting GBM subtypes, and thereby potentially predict the clinical outcomes [12]. However, methods are lacking to accurately model the effect of interactions between these data types directly on clinical outcomes. Here we show that our non-linear and interaction-based integrative methods have better prediction accuracy than linear alternatives and non-integrative methods that do not account for the interactions between the platforms. We also find several prognostic mRNAs and microRNAs that are related to tumor invasion and that are known to drive tumor metastasis and severe inflammatory response in GBM. In addition, our analysis reveals several interesting mRNA-miRNA interactions that have known implications in the etiology of GBM. The paper is structured as follows. The basic construction of H-RVM is detailed in Section 2. The analysis of GBM data is presented in Section 3, and concluding remarks about the H-RVM framework are presented in Section 4.

2 Hierarchical Relevance Vector Machine model

For ease of exposition, we illustrate the model building process of H-RVM using data from two sources: gene/mRNA and miRNA expression measurements. The framework is easily extended to multiple platforms as discussed in Section 4. Suppose, we have data for N patients, and X and Y represent mean-centered and -standardized gene and miRNA expression matrices, with rows corresponding to patients and columns representing the G genes and M miRNAs, respectively b. Centering and standardizing the gene and miRNA expression matrices remove any systematic mean or scaling effects caused by the use of different data sources, and make them compatible for model fitting. We denote the gene and miRNA expression for the i-th patient as row vectors x i T = ( x i 1 , , x iG ) and y i T = ( y i 1 , , y iM ) . These covariates are high-dimensional, that is, both G and M are much larger than N; for example, in the GBM data G≈12000,M≈540,N≈250. Based on these measurements, our aim is to predict a relevant clinical outcome, which in our case is the (log-transformed) survival time measured from time of diagnosis to death, denoted by the column vector t=(t1,…,t N ) for the N patients.

2.1 Basic construction

A basic (conceptual) model can be written in a high-dimensional regression setting as,
t i = α 0 + f x ( x i , α 1 ) + f y ( y i , α 2 ) + f ( x y ) ( x i , y i , α 3 ) + ε i ,
(1)
where α0 is the overall mean effect and ε i is the random error; f(•)’s, generally referred to as basis functions, are chosen to achieve a desired level of flexibility for modeling the effects of X,Y, and their functions on t. Of these functions, f(xy) models the interactions between X and Y, and the remaining basis functions, f x and f y , respectively, model the main effects of X and Y for predicting t. In most situations the regression coefficients, α=(α0,α1,α2,α3), linearly relate the covariate effects (i.e., values of the basis functions evaluated at the covariates) to the response. Linear regression is a special case of (1) when all the basis functions are linear, and the response for the i-th patient,
t i = x i T α 1 + y i T α 2 + ( x i y i ) T α 3 + ε i ,
(2)

where (x i y i )=(xi 1yi 1,…,xi 1y i M ,…,x i G yi 1,…,x i G y i M ) models the first order interactions between genes and miRNAs and α0=0 because of the centered covariates. Further, due to the high-dimensional covariates x i ’s and y i ’s, a penalty is imposed on the regression coefficients α=(α1,α2,α3) to avoid overfitting. The most popular of such penalties is the Lasso because it has many desirable properties for high-dimensional linear regression and variable selection [13, 14]. Although (2) with a Lasso penalty is a popular choice for high-dimensional regression, the linearity of the basis functions imposes serious restrictions on the flexibility of the model. For example, (2) does not model nonlinear covariate effects as well as second or higher order interactions between genes and miRNAs.

Through H-RVM, we propose a regression model as a special case of (1), using kernel-based functions to respectively model f x ,f y , and f(xy). The kernel functions incorporate nonlinear effects of possible interactions within and between high-dimensional gene and miRNA expression measurements. Further, H-RVM estimates the respective contributions of genes, miRNAs, and their interactions in predicting survival times, which is of primary importance in developing novel drug targets. H-RVM posits the following regression of t on X and Y for the i-th patient:
t i = β 1 f x { x i , α 1 } + β 2 f y { y i , α 2 } + β 3 f ( x y ) { ( x y ) i , α 3 } + ε i ,
(3)

where (xy) i =(xi 1,…,x i G ,yi 1,…,y i M ) is a vector of length G+M and β=(β1,β2,β3) is such that its components lie on a probability simplex i.e. β m >0 for m=1,2,3 and m = 1 3 β m = 1 . H-RVM posits different kernels for the data sources and combines them through weights β. The model parameters have the following interpretation:

  • The kernel functions f x () and f y () model all possible interactions among genes and among miRNAs, respectively, and f(xy)() models all possible interactions between genes and miRNAs. The three kernels together account for the high-dimensionality and non-linearity of the covariate effects of X and Y by embedding them in the space of kernels.

  • The m-th component of β, β m , denotes the influence of the m-th source on predicting the log survival time and has the following interpretation: if β={1,0,0}, then (3) corresponds to a functional regression model that predicts t (log survival time) with only X (gene expressions) as covariates. Conversely, if β={1/3,1/3,1/3}, then (3) corresponds to a regression model, with the platforms and their interactions contributing equally to the prediction of the survival time. In reality, we expect (and show) different contributions from each platform and estimate these weights from the data.

The task now remains to explicitly characterize the functions f x (•), f y (•) and f(xy)(•) using multiple kernels, as detailed below.

2.2 Multiple kernel learning

Kernel learning (KL) is an approach for nonparametric classification and regression that can be used for predicting t based on X and Y[14]. First, for simplicity, assume that we want to predict t based on X. KL posits the following relation between t and X
t i = α 0 + j = 1 N α j K ( x j , x i | σ 2 ) + ε i t = K T α + ε ,
(4)
where σ2 is a kernel-specific “bandwidth” parameter and depends on the choice of kernel, K(•) (detailed later in the section) and ε=(ε1,…,ε N ) is the (white-noise) error. The primary parameter of interest is α=(α0,…,α N ) T , and α1,…,α N correspond to weights assigned to the features for N x j ’s. Support Vector Machine (SVM) and Relevance Vector Machine (RVM) are canonical examples of KL [14]. We prefer RVM because of its probabilistic interpretation and other optimal properties compared to those of SVM [15]. There are cases, however, where one feature matrix may not fit the data well. Based on this observation, Multiple Kernel Learning (MKL) extends (4) and replaces K by a weighted average of L feature matrices { K l } l = 1 L ,
t i = α 0 + j = 1 N α j l = 1 L β l K l ( x j , x i | σ l 2 ) + ε i t = β 1 K 1 T α + + β l K l T α + + β L K L T α + ε .
(5)

MKL improves the flexibility of KL by introducing L bandwidth parameters { σ l 2 } l = 1 L and L weights for feature matrices β=(β1,…,β L ) T . A variety of approaches exist to learn { σ l 2 } l = 1 L , β, and α for MKL (for details see [14, 16, 17]). Note that in all these works the data source (i.e., X) remains the same for both KL and MKL. The H-RVM framework developed in this article extends KL to include multiple data sources and their interactions, and uses a learning algorithm similar to the MKL framework.

Because the three data sources (gene expressions, miRNA expressions, and their interactions) can be used separately for predicting the log survival time, it is reasonable to combine their predictions to obtain more reliable estimates. To this end, H-RVM combines respective predictions obtained from different sources obtained using KL (4) through a weighted average, and chooses appropriate weights using MKL (5). Similar to (4), K 1 T α , K 2 T α , and K 3 T α are the predicted values of t that correspond to genes, miRNAs, and their interactions, respectively. Using (5), we combine the predictions { K i T α } i = 1 3 through the weight vector β=(β1,β2,β3) to model t as
t = ( β 1 K 1 T + β 2 K 2 T + β 3 K 3 T ) α + ε = K β T α + ε .
(6)

We further constrain β such that its components lie on a probability simplex, i.e., m = 1 3 β m = 1 . This constraint ensures that the joint (convolved) kernel, K β , is positive definite and that β m denotes the influence of the m-th source in predicting the log survival time. Note that H-RVM is a special case of (3) with f x ( x i , α ) k 1 , i T α , f y ( y i , α ) k 2 , i T α , and f ( x y ) ( ( x y ) i , α ) k 3 , i T α , where km,i is the i-th column of K m . Given { K i } i = 1 3 , MKL can be used to learn α and β.

Although similar to (5), (6) differs in two important ways. First, (6) obtains kernels using (4) for different data sources, namely gene expression, miRNA expression, and their interaction. Second, we allow for dependence between data sources via the interaction kernel (K3), but MKL does not; instead MKL uses a convex combination of the different kernels from the same data source to aid prediction.

The learning algorithm of H-RVM is independent of the choice of kernels, but in this work we use a Gaussian radial basis function (RBF) kernel (denoted by K) [14]. The RBF kernel maps the m-th high-dimensional covariate to its feature space that is represented as feature matrix K m . The feature matrices K1,K2, and K3 correspond to genes, miRNAs, and interactions, and their (i,j)-th entries are as follows:
( K 1 ) ij = e x i x j 2 2 σ 1 2 = K ( x i , x j | σ 1 2 ) , ( K 2 ) ij = e y i y j 2 2 σ 2 2 = K ( y i , y j | σ 2 2 ) , ( K 3 ) ij = e ( x y ) i ( x y ) j 2 2 σ 3 2 = K ( ( x y ) i , ( x y ) j | σ 3 2 ) ,
(7)

where σ m 2 is the “bandwidth” parameter of the m-th kernel matrix and is chosen a priori through cross-validation (see [14] for details). The other choices of kernels include polynomial kernels and matern kernels [18]. To account for the overall mean (or intercept) in (1), an extra row of 1’s is appended to the feature matrices in (7); therefore, { K i } i = 1 3 hereafter have dimensions (N+1)×N.

2.3 Generative Bayesian model for H-RVM

H-RVM reformulates (6) as a hierarchical Bayesian model for greater flexibility and better interpretation of its parameters. This reformulation serves two important purposes. First, H-RVM is interpreted as a hierarchical Bayesian extension of RVM [15], which is a special case of Bayesian KL. Second, instead of using MKL methods, H-RVM learns parameters α and β from t,X, and Y using the variational learning algorithm of hierarchical kernel learning (HKL) [14, 16].

H-RVM posits the following generative model for the (noisy) log survival time measurements t. Similar to MKL, K β T α represents the mean of t. The error distribution is Gaussian with mean 0 and precision parameter γ (8). Further, we impose a Gamma prior on γ such that
t | α , β , γ , X , Y N ( t | K β T α , γ 1 I ) ,
(8)
γ Gamma ( γ | c γ , d γ ) ,
(9)

where N ( . | μ , Σ ) represents a multivariate Gaussian distribution with mean μ and covariance matrix Σ and Gamma(.|c,d) represents a Gamma distribution with respective shape and rate parameters c and d.

Motivated by the “automatic relevance determination” idea of RVM, we impose independent Gaussian priors on the α j ’s with the same mean 0 and different precision parameters ϕ j ’s (10), where ϕ j controls (a priori) predictive power of the j-th feature vector from the three data sources for the log survival time. A large ϕ j indicates low predictive power. We also impose independent Gamma priors on the ϕ j ’s,
α | ϕ N ( α | 0 , diag ( ϕ 1 ) ) ,
(10)
ϕ j = 0 N Gamma ( ϕ j | c ϕ , d ϕ ) ,
(11)

where ϕ = ( ϕ 0 , ϕ 1 , , ϕ n ) . This setting forces many α j ’s a posteriori to be near 0 with high precision. Most of the variance in t is explained by a small number of feature vectors that correspond to nonzero α j ’s. These feature vectors are the “relevance vectors” of H-RVM that have the following three characteristics: they prevent over-fitting, represent a parsimonious description of the data, and correspond to feature vectors that are most predictive of the log survival time. An equivalent prior setting is found by marginalizing the ϕ j ’s from the joint distribution of α and ϕ above, which imposes a multivariate Student’s t prior on α with mean 0.

Finally, we impose a Dirichlet prior on β to ensure that its components lie on a probability simplex:
β = ( β 1 , β 2 , β 3 ) Dirichlet ( β | a 1 , a 2 , a 3 ) ,
(12)

where the m-th component of β, β m , denotes the influence of m-th source in predicting the log survival time.

The hierarchical Bayesian model (8) – (12) specifies a complete sampling model for the H-RVM framework. It can also be interpreted as a probabilistic approach for combining the predictions of log survival times from the three RVMs respectively corresponding to gene expressions, miRNA expressions, and their interactions. H-RVM introduces an additional hierarchy and combines the predictions of these three RVMs as a weighted average, with the weights generated from a Dirichlet distribution (12). The increased flexibility of H-RVM over RVM comes at the cost of analytic intractability of the posterior distributions of the H-RVM parameters. Estimation of the posterior distributions of the H-RVM’s parameters can proceed via either simulation-based Markov chain Monte Carlo (MCMC) approaches or deterministic variational Bayes approaches. Given the complexity and size of high-throughput data in general and GBM data in particular, MCMC approaches tend to be computationally expensive and slow. We employ variational Bayes methods from HKL [16] and obtain the analytically tractable variational posterior distribution, q(α,β,ϕ,γ|t,X,Y,c ϕ ,d ϕ ,c γ ,d γ ,a1,a2,a3), that approximates analytically intractable true posterior distribution, p(α,β,ϕ,γ|t,X,Y,c ϕ ,d ϕ ,c γ ,d γ ,a1,a2,a3). This approximation achieves analytic tractability by assuming that α,β,ϕ, and γ are independent under the variational posterior distribution. The analytic tractability leads to improved computational efficiency of the variational Bayes approach over sampling-based MCMC approaches. The derivations for variational posterior distributions are provided in Appendix A Appendix: Variational inference for H-RVM.

3 Data analysis

We apply the H-RVM approach to the GBM data as introduced in Section 1. GBM was one of the first cancers evaluated by the TCGA. GBM data have multiple molecular measurements on over 500 samples that include gene expression, copy number, methylation and microRNA expression. TCGA datasets are available at http://tcga-data.nci.nih.gov/tcga/. The dataset we analyze here includes information about the gene expressions (11972 probes), miRNA expressions (534 probes), and (uncensored) survival times for matched patient samples (248).

To remove the irrelevant noise variables before model fitting, we prescreened the gene and miRNA probes as follows. We performed univariate regression of the log survival times on the gene expression values, obtained p-values, and retained gene and miRNA probes that cross a liberal p-value threshold (≤ 0.05 here) – to balance the practical and statistical significance. This pre-selection identifies 1747 and 43 gene expression and miRNA probes, respectively, for downstream modeling. All our analyses and comparisons were based on these selected gene and miRNA probes.

We compare the predictions of H-RVM and three linear methods: penalized regressions (2) with the Lasso penalty [13] and with the following covariates: i. gene expressions (Gene-Lasso), ii. miRNA expressions (MiRNA-Lasso), and iii. both gene and miRNA expressions, and their first order interactions (Interaction-Lasso). We randomly split the GBM survival data into a training data and a test data with 223 (90%) and 25 (10%) patients, respectively. H-RVM, Gene-Lasso, MiRNA-Lasso, and Interaction Lasso are fit using the gene and miRNA expressions and log survival times in the training data. The variational inference algorithm is used for fitting H-RVM (see Appendix A Appendix: Variational inference for H-RVM). The R package glmnet is used for the three penalized linear regressions [19, 20]. The log survival times of the test data are predicted for the four methods using the model fits on the training data. The mean squared prediction errors (MSPEs) are respectively calculated for the four models as the average of the squared difference between the observed and predicted values for the test data. This process of randomly splitting the GBM survival data into training+test data and fitting the four models is repeated 50 times. The results are summarized below.

Figure 1 shows the prediction results for H-RVM, Gene-Lasso, MiRNA-Lasso, and Interaction-Lasso using the kernel density estimates (KDEs) of the MSPEs for the 50 random splits. The KDEs of the MSPEs for H-RVM is shifted to lower values than those for the three penalized linear regressions. The KDEs of the MSPEs for Gene-Lasso, MiRNA-Lasso, and Interaction-Lasso are close to each other, which implies that the MSPEs for these models are fairly similar. Two observations arise from these results. First, the results indicate that penalized linear regression with the Lasso penalty does not lead to improved performance after accounting for interactions among covariates, which has been well-established in literature [19]. Second, the prediction results of the penalized linear regressions do not improve after modeling the first order interactions among genes and miRNAs, thus indicating the presence of non-linear genomic effects and second or higher order interactions among them. For this case study, we see that H-RVM performs better than the penalized linear regression methods. This may be because of H-RVM accounts for the nonlinear effects of genes and miRNAs and models the interactions within genes, within miRNAs, and between genes and miRNAs. Further, because we model the log survival time, the gain for survival time predictions is, in fact, exponentially higher for H-RVM compared to those for the other methods.
https://static-content.springer.com/image/art%3A10.1186%2F1687-4153-2013-9/MediaObjects/13637_2013_Article_35_Fig1_HTML.jpg
Figure 1

Kernel density estimates. Kernel density estimates (KDEs) of mean square prediction errors (MSPEs) for H-RVM, Gene-Lasso, MiRNA-Lasso, and Interaction-Lasso. The GBM survival data is randomly split 50 times into training and test data, all four models are fit on the training data, and MSPEs for log survival times are calculated for the test data using the model fit on training data. The x-axis represents the MSPEs and the y-axis represents the respective KDEs for H-RVM (in solid red), Gene-Lasso (in dotted blue), MiRNA-Lasso (in dotted and dashed blue), and Interaction-Lasso (in dashed blue).

We compared the performance of the predictions of the log survival times from H-RVM and the observed survival times using Kaplan-Meier estimates of the survival curves. We used the R package survival to perform the log rank test and estimate the Kaplan-Meier survival curves [21]. Figure 2 compares the survival probability curves of the log survival times of patients predicted to be in the long and short survival groups by H-RVM. The patients are assigned to the long and short survival groups based on a median cut-off of the predicted log survival times obtained from H-RVM. The p-value of the log rank test that the two survival curves are the same is close to 0, indicating that the survival group predictions of H-RVM closely agree with the observed survival groups of the patients. In addition, Figure 3 compares the actual survival probability curves of the observed and predicted log survival times of patients in the test data with the minimum MSPE. The p-values and the survival probability curves indicate that the log survival time predictions of H-RVM agree closely agree with the observed log survival times, as well.
https://static-content.springer.com/image/art%3A10.1186%2F1687-4153-2013-9/MediaObjects/13637_2013_Article_35_Fig2_HTML.jpg
Figure 2

Survival probability curves for TCGA data. Survival probability curves for log survival time in the TCGA GBM data. The solid lines are the Kaplan-Meier estimates of survival probabilities for the patients predicted to have long survival times (in blue) and for the patients predicted to have short survival times (in red), respectively. The patients are assigned to the long and short survival groups based on the estimates of log survival times obtained from H-RVM. The dotted lines indicate point-wise 95% confidence intervals for the survival probabilities. The p-value of the log rank test is 0.

https://static-content.springer.com/image/art%3A10.1186%2F1687-4153-2013-9/MediaObjects/13637_2013_Article_35_Fig3_HTML.jpg
Figure 3

True and predicted survival probability curves. Survival probability curves for the observed log survival time and predicted log survival time (using H-RVM) of the patients in the test data with minimum mean square prediction error. The solid lines are the Kaplan-Meier estimates of survival probabilities for the predicted (in blue) and observed (in red) log survival times in the test data. The dotted lines indicate point-wise 95% confidence intervals for the survival probabilities. The p-value of the log rank test is 0.0964.

One of the additional gains of our modeling framework is the determination of which platform has a more profound influence on predicting the response, as captured by the weight parameter β. Figure 4 shows the estimates of the weights β for predicting the log survival time of the patients for gene expression, miRNA expressions, and their interactions obtained from H-RVM. The medians of the weights (25% and 75% quartiles) for the three data sources are 0.239 (0.113 and 0.360), 0.504 (0.408 and 0.583), and 0.201 (0.108 and 0.404), respectively. Interestingly, H-RVM shows that miRNAs have better predictive power than genes in predicting the log survival times of patients in the GBM data. The nonzero weight for interactions between gene and miRNA expressions further confirms the presence of nonlinear interactions.
https://static-content.springer.com/image/art%3A10.1186%2F1687-4153-2013-9/MediaObjects/13637_2013_Article_35_Fig4_HTML.jpg
Figure 4

Boxplots of weights. Boxplots of weights β=(β1,β2,β3) of gene expressions, miRNA expressions, and their interactions in predicting log survival time. The GBM survival data is randomly split 50 times into training and test data, H-RVM is fit on the training data, and β is obtained from the fit on training data. The y-axis shows the distributions of respective weights for gene expressions, miRNA expressions, and their interactions in predicting log survival time of patients across 50 random splits of the GBM survival data.

To gain biological insights into our results, we performed a functional analysis of our model fitting results. We used Ingenuity Pathway Analysis software to perform functional analysis on selected significant genes used in fitting H-RVM. We used targetHub [22] to find the known and predicted interactions between significant genes and miRNAs. mirTarBase, a curated database of experimentally validated miRNA targets, was our choice as a source of known gene and miRNA interactions [23]. To identify the predicted gene and miRNA interactions, we used targetScan data [24] to filter out miRNA-gene interactions in which the miRNA and gene effects on survival were concordant, since discordant behavior is expected in biological systems for a direct interaction between miRNA and its targets.

Pathway analyses indicates that the anti-survival genes (i.e., genes with negative effects on survival times) are enriched with signaling pathways related to tumor invasion (see Figure 5). HMGB1 and TWEAK signaling pathways, which are known to drive tumor metastasis and severe inflammatory responses in GBM and other cancers, are associated with these genes [2528]. Pro-survival genes are represented by PDGF, PTEN and other signaling pathways. It is well-established that the PDGF signaling pathway dominates the proneural subgroup, which correlates with a good survival time for patients with GBM [29]. The functional terms cellular movement and cell-to-cell signaling and interaction pathways are enriched for anti-survival genes, reinforcing their role in invasive GBM.
https://static-content.springer.com/image/art%3A10.1186%2F1687-4153-2013-9/MediaObjects/13637_2013_Article_35_Fig5_HTML.jpg
Figure 5

Comparison of the signaling pathways. Comparison of the signaling pathways associated with significant prognostic genes in Glioblastoma multiforme.

The target analysis of miRNA revealed 22 known interactions between 8 miRNAs and 20 genes, as shown in Table 1. Four of these eight miRNAs (hsa-miR-31, hsa-miR-146b, hsa-miR-221 and hsa-miR-222) were previously identified as anti-survival markers of GBM [30]. Mir-21 is an established marker of GBM and is known to target many tumor suppressor genes [31]. Mir-34a expression is higher in other GBM subtypes compared to that in the pro-survival proneural glioma subtype [32]. The anti-survival patterns of all these miRNAs indicate that these gene and miRNA interactions can be targeted for therapy of GBM subgroups with expected poor survival. We also identified 1006 predicted interactions (by TargetScan) between 31 miRNA and 484 genes that are significant (see Additional file 1).
Table 1

List of known gene-microRNA interactions identified as significant in the H-RVM model using target analysis

Gene symbols

microRNA

FOXP3, YY1, KLF13, ETS1

hsa-mir-31

FOXO3, DDIT4

hsa-mir-221

ATAT1

hsa-mir-23a

FOXO3

hsa-mir-222

FGG, CPEB3, FGB, PIK3R1

hsa-mir-29a

PDGFB

hsa-mir-146b

PDCD4, TOPORS, BASP1, MARCKS, TP53BP2

hsa-mir-21

SIRT1, YY1, E2F3, CDC25C

hsa-mir-34a

4 Conclusions and future work

We have presented an integrative framework, H-RVM, that generalizes the multiple kernel learning framework for integrating high-dimensional data from multiple sources, incorporating within and between platform interactions to develop a prediction model for clinical outcomes. We applied H-RVM to a high-dimensional TCGA GBM data to predict patient survival times using two data sources: gene and miRNA expressions, and found that the predictive performance of H-RVM is better than those of linear methods that do not model nonlinear effects and interactions. We hypothesize that H-RVM gains flexibility and power by modeling the non-linear interaction structures between gene and miRNA expressions. H-RVM will be a useful tool for biomedical researchers, as clinical prediction using multi-platform genomic information is an important step towards identifying personalized treatments for many cancers. We have code for fitting H-RVM that is freely available at the corresponding author’s website (see Additional file 2).

Although we have presented the application of H-RVM in the context of two platforms, the framework is general and can be extended and adapted to data from multiple platforms with different distributional assumptions. This will essentially entail a generalization of the H-RVM model by assuming additional terms for the different platforms. One key issue that warrants further investigation is an increase in the number of (multiplicative) between-platform interaction terms. We used the computationally efficient variational Bayes methods, which are extremely useful for handling large datasets from projects such as TCGA. In addition, [17] presents more scalable versions of HKL and MKL that can be adapted to our framework. Our future work will concentrate on extending the H-RVM framework using Bayesian spike and slab priors to select variables from the interacting covariates before embedding the data in the space of kernels, as well as incorporating uncertainty estimations of the scale parameters – thus aiding the joint model building process.

Endnotes

a We use gene and mRNA interchangeably to mean transcript-level expression.

b We use bold lowercase and uppercase alphabets to denote column vectors and matrices, respectively.

A Appendix: Variational inference for H-RVM

Following the hierarchic kernel learning algorithm (HKL) of [16], we provide the derivation for the variational inference algorithm that estimates the variational posterior distributions for the parameters of H-RVM. Our interest lies in the posterior distributions of α and β that are obtained using the Bayesian model (8) – (12). Unlike RVM, the posterior distributions of α and β in H-RVM are analytically intractable. There are several techniques that can be used to obtain these posteriors distributions. We employ the variational Bayes methods from HKL [16] and obtain analytically tractable variational posterior distribution, q(α,β,ϕ,γ |t,X,Y,c ϕ ,d ϕ ,c γ ,d γ ,a1,a2,a3), that approximates analytically intractable true posterior distribution, p(α,β,ϕ,γ|t,X,Y,c ϕ ,d ϕ ,c γ ,d γ ,a1,a2,a3). The variational approach minimizes the Kullback-Liebler (KL) divergence between q(α, β, ϕ,γ|t, X,Y,c ϕ ,d ϕ ,c γ , d γ ,a1,a2,a3) and p(α,β, ϕ,γ|t,X,Y, c ϕ ,d ϕ ,c γ ,d γ ,a1,a2,a3). This approximation achieves analytic tractability by assuming that α,β, ϕ, and γ are independent under q(α,β,ϕ, γ|t,X,Y,c ϕ ,d ϕ ,c γ , d γ ,a1,a2,a3). Therefore,
q ( α , β , ϕ , γ | t , X , Y , c ϕ , d ϕ , c γ , d γ , a 1 , a 2 , a 3 ) = q ( α ) q ( β ) q ( ϕ ) q ( γ ) ,
(13)
where we have suppressed the conditioning on the data and hyperparameters for the variational posteriors on the right. Notice that the factorization (13) alone guarantees the analytic tractability of q(α,β,ϕ,γ|t,X,Y,c ϕ ,d ϕ ,c γ ,d γ ,a1,a2,a3), and we do not assume any distributional form for the q’s. Following [16] and [14], the variational posterior distributions are derived as
log q ( α ) E β , ϕ , γ p ( α , β , ϕ , γ , t , X , Y | c ϕ , d ϕ , c γ , d γ , a 1 , a 2 , a 3 ) ,
(14)
log q ( β ) E α , ϕ , γ p ( α , β , ϕ , γ , t , X , Y | c ϕ , d ϕ , c γ , d γ , a 1 , a 2 , a 3 ) ,
(15)
log q ( ϕ ) E α , β , γ p ( α , β , ϕ , γ , t , X , Y | c ϕ , d ϕ , c γ , d γ , a 1 , a 2 , a 3 ) ,
(16)
log q ( γ ) E α , β , ϕ p ( α , β , ϕ , γ , t , X , Y | c ϕ , d ϕ , c γ , d γ , a 1 , a 2 , a 3 ) ,
(17)

where all expectations are with respect to the variational posterior distributions. Hereafter, we will denote E [ f ] as 〈f for notational simplicity.

Following [16] and using (14), the variational posterior distribution of α,
q ( α ) = N ( μ α , Σ α ) ; Σ α = γ γ K β K β T β + diag ( ϕ ϕ ) 1 , μ α = γ γ Σ α K β β t .
(18)
Following [16] and using (16), the variational posterior distribution of ϕ,
q ( ϕ ) = j = 0 N q ( ϕ j ) , where q ( ϕ j ) = Gamma 1 2 + c ϕ , 1 2 α i 2 α + d ϕ , for j = 1 , , N.
(19)
Following [16] and using (17), the variational posterior distribution of γ,
q ( γ ) = Gamma N 2 + c γ , 1 2 e 2 α , β + d γ , where e 2 = t K β T α 2 .
(20)
All the expectations above are determined using 〈α α ,〈α α T α ,〈ϕ ϕ , and 〈γ γ , which are available from the distributional forms of q(α),q(ϕ), and q(γ) in (18) – (20). Specifically,
α α = μ α , α α T α = Σ α + μ α μ α T , ϕ j ϕ j = 1 2 + c ϕ 1 2 ( ( Σ α ) jj + ( μ α μ α T ) jj ) + d ϕ , γ γ = N 2 + c γ 1 2 e 2 α , β + d γ , e 2 α , β = n = 1 N t n 2 2 n = 1 N t n m = 1 3 β m β α T α k mn + m = 1 3 l = 1 3 β m β l β Ω ml , Ω ml = n = 1 N k mn T α α T α k ln .
Instead of q(β), its non-normalized version q(β) is available from [16] as
q ( β ) = m = 1 3 β m a m 1 exp γ γ 2 β T Ω β 2 β T b , where Ω ml = n = 1 N k mn T α α T α k ln , for m , l = 1 , 2 , 3 and b m = α T α K m t , for m = 1 , 2 , 3 .
(21)
Following [16], calculate 〈β β ,〈logβ β , and 〈β β T β , as follows. Sample S β’s from Dirichlet(a1,a2,a3) and estimate the expectations as f ( β ) β s = 1 S f ( β s ) w ( β s ) where f(β)≡β, logβ,andβ β T , respectively, and
w ( β s ) = exp γ γ 2 β s T Ω β s 2 β s T b i = 1 S exp γ γ 2 β i T Ω β i 2 β i T b .
The analytic tractability of q(α,β,ϕ,γ|t,X,Y,c ϕ ,d ϕ ,c γ , d γ ,a1,a2,a3) in variational inference guarantees that the marginal variational distribution (or likelihood) of the data q(t,X,Y|c ϕ ,d ϕ ,c γ ,d γ ,a1,a2,a3) is also analytically tractable. Estimate hyperparameters c ϕ ,d ϕ ,c γ ,d γ ,a1,a2, and a3 as
arg max c ϕ , d ϕ , c γ , d γ , a 1 , a 2 , a 3 log q ( t , X , Y | c ϕ , d ϕ , c γ , d γ , a 1 , a 2 , a 3 ) ,

which is the type II maximum likelihood procedure as recommended in [16]. The kernel parameters { σ i 2 } i = 1 3 are learned respectively from three RVMs for each of the three sources using cross-validation as recommended by [15].

Declarations

Acknowledgements

VB’s research is partially supported by NIH grant R01 CA160736; NSF grant IIS-915196 and the Cancer Center Support Grant (CCSG) (P30 CA016672). WW’s work is in part funded by 5U24 CA143883-04 and P30 CA016672. CO is supported by NSF grant IIS-914861. The content is solely the responsibility of the authors and does not necessarily represent the official views of the U.S. National Cancer Institute, the National Institutes of Health, or the National Science Foundation. We also thank LeeAnn Chastain for editorial assistance with the manuscript.

Authors’ Affiliations

(1)
Department of Statistics, Purdue University
(2)
Department of Bioinformatics and Computational Biology, Division of Quantitative Sciences, The University of Texas MD Anderson Cancer Center
(3)
Department of Computer Science, University of Houston
(4)
Department of Biostatistics, Division of Quantitative Sciences, The University of Texas MD Anderson Cancer Center

References

  1. Chin L, Andersen JN, Futreal PA: Cancer genomics: from discovery science to personalized medicine. Nature Med 2011,17(3):297-303. 10.1038/nm.2323View ArticleGoogle Scholar
  2. Witten D, Tibshirani R: A framework for feature selection in clustering. J. Am. Stat. Assoc 2010,105(490):713-726. 10.1198/jasa.2010.tm09415MathSciNetView ArticleGoogle Scholar
  3. Efron B: Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction. Cambridge University Press, New York, USA; 2010.View ArticleGoogle Scholar
  4. Diehn M, Nardini C, Kuo M: Identification of noninvasive imaging surrogates for brain tumor gene-expression modules. Proc. Natl. Acad. Sci 2008,105(13):5213. 10.1073/pnas.0801279105View ArticleGoogle Scholar
  5. Wang W, Baladandayuthapani V, Morris JS, Broom BM, Manyam G, Do KA: iBAG: integrative Bayesian analysis of high-dimensional multiplatform genomics data. Bioinformatics 2013,29(2):149-159. http://bioinformatics.oxfordjournals.org/content/29/2/149.abstract 10.1093/bioinformatics/bts655View ArticleGoogle Scholar
  6. Witten DM, Tibshirani RJ, et al.: Extensions of sparse canonical correlation analysis with applications to genomic data. Stat. Appl. Genet. Mol. Biol 2009, 8: 28.MathSciNetGoogle Scholar
  7. Shabalin AA, Tjelmeland H, Fan C, Perou CM, Nobel AB: Merging two gene-expression studies via cross-platform normalization. Bioinformatics 2008,24(9):1154-1160. 10.1093/bioinformatics/btn083View ArticleGoogle Scholar
  8. Ma S, Zhang Y, Huang J, Huang Y, Lan Q, Rothman N, Zheng T: Integrative analysis of cancer prognosis data with multiple subtypes using regularized gradient descent. Genet. Epidemiol 2012,36(8):829-838. http://dx.doi.org/10.1002/gepi.21669 10.1002/gepi.21669Google Scholar
  9. Verhaak RGx, Hoadley RG, Perou CM, Hayes DN: Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell 2010, 17: 98-110. 10.1016/j.ccr.2009.12.020View ArticleGoogle Scholar
  10. Iorio MV, Ferracin M, Negrini M, Croce CM: MicroRNA gene expression deregulation in human breast cancer. Cancer Res 2005,65(16):7065-7070. 10.1158/0008-5472.CAN-05-1783View ArticleGoogle Scholar
  11. Fasanaro P, Greco S, Ivan M, Capogrossi MC, Martelli F: microRNA: emerging therapeutic targets in acute ischemic diseases. Pharmacol. Ther 2010, 125: 92-104. 10.1016/j.pharmthera.2009.10.003View ArticleGoogle Scholar
  12. Tang W, Duan J, Zhang JG, Wang YP: Subtyping glioblastoma by combining miRNA and mRNA expression data using compressed sensing-based approach. EURASIP J. Bioinformatics Syst. Biol 2013, 2013: 2. 10.1186/1687-4153-2013-2View ArticleGoogle Scholar
  13. Tibshirani R: Regression shrinkage selection via the lasso. J. R. Stat. Soc. Ser. B (Methodological) 1996,58(1):267-288.MATHMathSciNetGoogle Scholar
  14. Bishop CM: Pattern Recognition and Machine Learning. Springer, New York; 2006.MATHGoogle Scholar
  15. Tipping M: Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res 2001, 1: 211-244.MATHMathSciNetGoogle Scholar
  16. Girolami M, Rogers S: Hierarchic Bayesian models for kernel learning. In Proceedings of the 22nd International Conference on Machine Learning (ICML-05). ACM, New York, USA, Bonn, Germany; 2005:241-248. http://doi.acm.org/10.1145/1102351.1102382 10.1145/1102351.1102382View ArticleGoogle Scholar
  17. Gönen M: Bayesian efficient multiple kernel learning. In Proceedings of the 29th International Conference on Machine Learning (ICML-12). Edited by: Langford J, Pineau J. Omnipress, Edinburgh, Scotland; 2012:1-8.Google Scholar
  18. Shawe-Taylor J, Cristianini N: Kernel Methods for Pattern Analysis. Cambridge university press, New York, USA; 2004.View ArticleGoogle Scholar
  19. Friedman J, Hastie T, Tibshirani R: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw 2010, 33: 1.Google Scholar
  20. R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna; 2013. http://www.R-project.org/Google Scholar
  21. Therneau TM: Modeling Survival Data: Extending the Cox Model. Springer-Verlag New York, Inc., New York, USA; 2000.View ArticleGoogle Scholar
  22. targetHub: targetHub. 2013.http://app1.bioinformatics.mdanderson.org/tarhub/_design/basic/index.htmlGoogle Scholar
  23. Hsu SD, Lin FM, Wu WY, Liang C, Huang WC, Chan WL, Tsai WT, Chen GZ, Lee CJ, Chiu CM, et al.: miRTarBase: a database curates experimentally validated microRNA–target interactions. Nucleic Acids Res 2011,39(suppl 1):D163—D169.Google Scholar
  24. Friedman RC, Farh KKH, Burge CB, Bartel DP: Most mammalian mRNAs are conserved targets of microRNAs. Genome Res 2009, 19: 92-105.View ArticleGoogle Scholar
  25. Tang D, Kang R, Zeh HJ, Lotze MT: High-mobility group box 1 and cancer. Biochimica et Biophysica Acta (BBA)-Gene Regul. Mech 2010, 1799: 131-140. 10.1016/j.bbagrm.2009.11.014View ArticleGoogle Scholar
  26. Tran NL, McDonough WS, Savitch BA, Sawyer TF, Winkles JA, Berens ME: The tumor necrosis factor-like weak inducer of apoptosis (TWEAK)-fibroblast growth factor-inducible 14 (Fn14) signaling system regulates glioma cell survival via NFκB pathway activation and BCL-XL/BCL-W expression. J. Biol. Chem 2005,280(5):3483-3492.View ArticleGoogle Scholar
  27. Huang M, Narita S, Tsuchiya N, Ma Z, Numakura K, Obara T, Tsuruta H, Saito M, Inoue T, Horikawa Y, et al.: Overexpression of Fn14 promotes androgen-independent prostate cancer progression through MMP-9 and correlates with poor treatment outcome. Carcinogenesis 2011,32(11):1589-1596. 10.1093/carcin/bgr182View ArticleGoogle Scholar
  28. Dai L, Gu L, Ding C, Qiu L, Di W: TWEAK promotes ovarian cancer cell metastasis via NF-ΰB pathway activation and VEGF expression. Cancer Lett 2009,283(2):159-167. http://www.sciencedirect.com/science/article/pii/S0304383509002286 10.1016/j.canlet.2009.03.036View ArticleGoogle Scholar
  29. Huse JT, Holland E, DeAngelis LM: Glioblastoma: molecular analysis and clinical implications. Ann. Rev. Med 2012, 64: 59-70. http://dx.doi.org/10.1146/annurev-med-100711-143028View ArticleGoogle Scholar
  30. Srinivasan S, Patric IRP, Somasundaram K: A ten-microRNA expression signature predicts survival in glioblastoma. PLoS One 2011,6(3):e17438. 10.1371/journal.pone.0017438View ArticleGoogle Scholar
  31. Papagiannakopoulos T, Shapiro A, Kosik KS: MicroRNA-21 targets a network of key tumor-suppressive pathways in glioblastoma cells. Cancer Res 2008,68(19):8164-8172. 10.1158/0008-5472.CAN-08-1305View ArticleGoogle Scholar
  32. Silber J, Jacobsen A, Ozawa T, Harinath G, Pedraza A, Sander C, Holland EC, Huse JT: miR-34a repression in proneural malignant gliomas upregulates expression of its target PDGFRA and promotes tumorigenesis. PLoS ONE 2012,7(3):e33844. http://dx.doi.org/10.1371 10.1371/journal.pone.0033844View ArticleGoogle Scholar

Copyright

© Srivastava et al.; licensee Springer. 2013

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.