In recent years, miRNAs have emerged as major players in the complex networks of gene regulation and have been implicated in many aspects of human disease. Deciphering functional associations between miRNAs and diseases is a major step toward understanding the patterns that govern miRNA-disease associations, and it gives better insight into the functional role of miRNAs in disease development. The accumulated data on miRNA expression levels in tumors demonstrate that miRNAs are promising diagnostic candidates for distinguishing different tumors and tumor subtypes, as well as for predicting their clinical behavior. These observations support the role of miRNAs as prognostic and/or diagnostic markers. miRNAs also have therapeutic applications, in which disease-causing miRNAs could be antagonized or functional miRNAs could be restored.
Lasso regression modeling has shown promise for constructing miRNA-target networks. Motivated by this work, we used a Lasso regression model to predict functional associations between miRNAs and diseases based on the gene signatures of each. Because disease microarray data are now abundant, we used them to define a gene signature for each disease. To address the noisiness of the microarray-derived disease signatures, we integrated them with disease gene signatures from PubMed abstracts to generate signatures that cover a wider spectrum of genes. For the miRNA-gene network, we considered only genes that interact with other proteins or genes and are directly or indirectly influenced by the miRNAs, as these genes are expected to have a greater influence on disease progression than genes that are targeted by miRNAs but do not propagate their influence through the protein network.
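The regression setup described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the response is a binary disease gene signature (one entry per gene) and that each predictor column is a binary miRNA target signature over the same genes; the exact encoding, the penalty value `alpha`, the no-intercept choice, the hand-written coordinate-descent solver, and the names `miR-A`, `miR-B`, and `miR-C` are all hypothetical and are not taken from the study.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the Lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1 without an intercept.

    X is a list of rows (genes), each row a list of predictor values (miRNAs).
    """
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)  # equals y - X @ coef while coef is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a miRNA with no targets cannot explain anything
            # Correlation of column j with the residual that excludes column j.
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * coef[j]) for i in range(n)
            ) / n
            new_value = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_value - coef[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                coef[j] = new_value
    return coef


# Toy data: 8 genes (rows) and 3 hypothetical miRNAs (columns).
mirnas = ["miR-A", "miR-B", "miR-C"]
X = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
# Hypothetical disease signature: genes 0-3 are in the signature.
y = [1, 1, 1, 1, 0, 0, 0, 0]

coef = lasso_coordinate_descent(X, y, alpha=0.1)
# miRNAs with a non-zero coefficient are the predicted associations.
selected = [name for name, c in zip(mirnas, coef) if c != 0.0]
print(selected)
```

In this toy case the L1 penalty drives the coefficients of the miRNAs whose targets barely overlap the disease signature to exactly zero, which is the property that makes Lasso usable as a selection step. A library implementation would normally replace the hand-written solver in practice.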
As a proof of concept, we first evaluated Lasso regression as a miRNA enrichment analysis method. Lasso regression successfully identified the miRNAs from the genes that were downregulated after miRNA treatment. We then evaluated the Lasso regression model on disease-miRNA association networks. We extracted the disease-miRNA association network from miR2Disease and HMDD, which are manually curated databases of microRNA deregulation in human diseases. ROC curve analysis showed that integrating microarray data and text abstracts to define the disease signature performs better than using either signature separately. Similarly, integrating the indirect influence of miRNAs on genes to define the miRNA target signature performed better than using the direct influence alone. This suggests that refining signatures is a key step for accurate regression modeling. Two key issues strongly affect the accuracy of the model. The first is the completeness and noisiness of the disease and miRNA signatures: the more complete and refined the signature, the more accurate the model. Because a microarray disease gene signature might harbour many off-target genes that are irrelevant to the disease, a more robust disease gene signature that integrates multiple lines of evidence is essential for the success of the modeling process. Similarly, incomplete miRNA-target interactions were shown to affect the performance of the model: using miRNA-target interactions from PITA gave lower accuracy than using TargetScan predictions. This suggests that miRNA-target data play a critical role in Lasso regression modeling to predict functional associations between miRNAs and diseases.
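The ROC-based evaluation mentioned above can be sketched as follows. This is an illustrative computation only: it assumes that each candidate miRNA receives a score for a given disease (for example, the magnitude of a regression coefficient) and that a gold-standard set marks the known associations. The scores, the miRNA names, and the function name `roc_auc` are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive is scored
    higher than a randomly chosen negative; ties count as one half.
    """
    positives = [s for s, is_pos in zip(scores, labels) if is_pos]
    negatives = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical prediction scores for one disease.
predicted = {"miR-A": 0.8, "miR-B": 0.0, "miR-C": 0.3, "miR-D": 0.1}
# Hypothetical gold-standard associations for the same disease.
gold_standard = {"miR-A", "miR-D"}

names = sorted(predicted)
scores = [predicted[name] for name in names]
labels = [name in gold_standard for name in names]

auc = roc_auc(scores, labels)
print(auc)
```

Running the same function on scores obtained from different signature definitions (for instance, microarray-only versus integrated) would give the kind of side-by-side AUC comparison discussed in the text.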
The second issue is the gold standard data. We found that the gold standard data were biased toward certain diseases, such as prostate cancer, breast cancer, and glioblastoma, each of which has around a hundred associated miRNAs. In contrast, other diseases, such as sarcoma and colon cancer, are associated with very few miRNAs, such as let-miR-7a and miR-21, respectively. This has a large impact on false discovery rates and thus on the AUC performance measure. A more thoroughly curated miRNA-disease interaction network is required for more accurate performance evaluation. Unfortunately, complete manually curated miRNA-disease databases are not yet available. We combined miR2Disease and HMDD to reduce the incompleteness of the miRNA-disease associations used.
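A minimal sketch of combining two curated sources and inspecting the resulting per-disease bias is given below. The pair lists are hypothetical stand-ins, not real entries from miR2Disease or HMDD, and the normalisation rule (lower-casing disease names) and the function name `merge_gold_standards` are assumptions made for the example.

```python
from collections import Counter


def merge_gold_standards(*sources):
    """Union of (disease, miRNA) pairs after light name normalisation."""
    merged = set()
    for pairs in sources:
        for disease, mirna in pairs:
            merged.add((disease.strip().lower(), mirna.strip()))
    return merged


# Hypothetical entries standing in for the two curated databases.
mir2disease_pairs = [
    ("Prostate cancer", "miR-A"),
    ("Prostate cancer", "miR-B"),
    ("Sarcoma", "miR-C"),
]
hmdd_pairs = [
    ("prostate cancer", "miR-B"),
    ("prostate cancer", "miR-D"),
    ("Colon cancer", "miR-E"),
]

gold = merge_gold_standards(mir2disease_pairs, hmdd_pairs)

# Number of associated miRNAs per disease exposes the imbalance.
per_disease = Counter(disease for disease, _ in gold)
for disease, count in per_disease.most_common():
    print(disease, count)
```

Counting associations per disease in this way makes the imbalance visible before evaluation, since diseases with only one or two known miRNAs contribute very few positives to any ROC analysis.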
To further validate the novel miRNA-disease associations predicted by the model, we focused on prostate cancer as a case study. The model predicted 37 miRNAs to be involved in prostate cancer development. We extracted their expression from prostate miRNA expression data (Taylor and GSE23022); 16 of them are present in the Taylor miRNA expression data. Analysis of their diagnostic potential showed that these newly predicted miRNAs are diagnostically as good as the prostate cancer miRNAs in the gold standard data. Furthermore, the 16 miRNAs were shown to be prognostically significant, as they are associated with cancer recurrence. A closer look at the literature revealed that several of the 16 miRNAs have already been validated to have a role in prostate cancer. For example, miR-1 was shown to be a tumor suppressor miRNA that acts as a prognostic biomarker. These results support the power of integrating signatures to construct functional association networks.
Finally, these results show the promise of using regression models to integrate disease and miRNA signatures and uncover the underlying functional associations between miRNAs and diseases. This could give more insight into the functional role and implications of miRNAs in disease development.