Gene expression analysis supports tumor threshold over 2.0 cm for T-category breast cancer
 Hiroko K. Solvang^{1},
 Arnoldo Frigessi^{2},
 Fateme Kaveh^{3},
 Margit L. H. Riis^{4, 5},
 Torben Lüders^{4, 5},
 Ida R. K. Bukholm^{4, 6},
 Vessela N. Kristensen^{5, 7} and
 Bettina K. Andreassen^{8}
DOI: 10.1186/s13637-015-0034-5
© Solvang et al. 2016
Received: 3 October 2014
Accepted: 23 December 2015
Published: 8 February 2016
Abstract
Tumor size, as indicated by the T-category, is known as a strong prognostic indicator for breast cancer. It is common practice to distinguish the T1 and T2 groups at a tumor size of 2.0 cm. We investigated the 2.0-cm rule from a new point of view: we sought the optimal threshold based on the differences between the gene expression profiles of the T1 and T2 groups (as defined by the threshold). We developed a numerical algorithm to measure the overall differential gene expression between patients with smaller tumors and those with larger tumors across multiple expression datasets from different studies. We confirmed the performance of the proposed algorithm by a simulation study and then applied it to three different studies conducted at two Norwegian hospitals. We found that the maximum difference in gene expression is obtained at a threshold of 2.2–2.4 cm, and a validation study using five publicly available expression datasets confirmed that the optimum threshold was over 2.0 cm. Furthermore, we observed a significant differentiation between the two threshold groups in terms of time to local recurrence for the Norwegian datasets. In addition, we performed associated network and canonical pathway analyses for the genes differentially expressed between tumors below and above the given thresholds, 2.0 and 2.4 cm, using the Norwegian datasets. The associated network functions pointed to cellular assembly for the genes at the 2.0-cm threshold, energy production at the 2.4-cm threshold, and an enrichment in lipid metabolism for the genes in the intersection of the 2.0- and 2.4-cm thresholds.
Keywords
Breast cancer; T-category; Differentially expressed; Microarray data; Two-group comparison statistical test; Optimization algorithm
1 Introduction
Breast cancer is known as a complex biological system, and tumors are complex organ systems shaped by gene aberrations, cellular biological context, characteristics specific to the person, and environmental factors. Management of breast cancer relies on the availability of robust clinical and pathological prognostic and predictive factors to guide patient decision-making and the selection of treatment options [1]. Tumor size, indicated by the T-category, is known as a strong prognostic indicator for breast cancer and is one of the factors taken into account when deciding how and whether to treat a patient, independent of lymph node status. Significantly better survival can be expected for tumors categorized as T1. It is common practice to distinguish between the T1 (tumor size between 0.1 and 2.0 cm) and T2 (between 2.0 and 5.0 cm) groups by the 2-cm rule [1]. It is well known that the T1-T2 distinction is reflected in prognosis: tumors categorized into the T2 group are more aggressive and might have already spread.
Gene expression profiling has in the last decade entered the field of molecular classification. An array-based approach to characterize T1 and T2 tumors was recently attempted, based on microarray data that present the expression level for each feature (gene or probe), and revealed distinct molecular pathways characterizing each stage [2]. The differential expression (DE) for a feature is measured using a two-group comparison, for which several statistical methods, such as t-statistics, significance analysis of microarrays (SAM), fold changes, and B-statistics, have been proposed [3]. However, DE measures obviously depend on the threshold chosen to distinguish between T1 and T2 tumors. In fact, the study by Riis et al. [2] suggested that using the T-size expression signatures instead of tumor size leads to a significant difference in risk for distant metastases and that the molecular signature can be used to select patients with tumor category T1 who may need more aggressive treatment and patients with tumor category T2 who may benefit less from it. To stratify patients into two groups, each requiring a different treatment for breast cancer, Budczies et al. developed 'Cutoff Finder' [4]. The cutoff point is determined from the distribution of the marker under investigation and by optimizing the correlation of the dichotomization with an outcome or survival variable. The method was considered for stratifications based on the expression of specific genes, estrogen receptor, and progesterone receptor, but not for whole genomic regions or tumor size. In this article, we develop an algorithm to evaluate the traditional 2.0-cm threshold in the light of gene expression differences between breast cancer patients below and above the threshold. We use two different measures from meta-analysis theory that are useful for handling multiple genetic studies that differ in preprocessing techniques, platforms, and lab environments.
The choice of which meta-analysis technique to use depends on the type of response and the objective. When the objective is to identify the DE between two conditions, methods include vote counting and combining ranks, p values, or effect sizes [3]. Campain and Yang provided an intuitive measure, called meta differential expression via distance synthesis (mDEDS) [5], which uses DE via distance synthesis (DEDS) [6] to aggregate multiple DE measurements. The performance of mDEDS was compared with existing meta-analysis methods, such as Fisher's inverse chi-square, GeneMeta, metaArray, RankProd, and naïve meta-methods, using a simulation study and two case studies [3]. The results mostly showed better performance for mDEDS, while some cases favored Fisher's inverse chi-square [7], a simple procedure that combines the p values from independent datasets. Therefore, we apply both mDEDS and Fisher's score in our proposed algorithm in order to analyze different thresholds. To confirm the reliability of the proposed algorithm, we performed a simulation study. Then, we applied the algorithm to three different expression datasets gathered at two Norwegian hospitals. To validate the optimum threshold estimated for the Norwegian datasets, we applied our algorithm to five publicly available expression datasets. Based on the estimated optimum threshold for the Norwegian datasets, we investigated the prognostic status from the viewpoints of local recurrence and of the associated network and canonical pathway analyses.
2 Method
Given i = 1, …, I genes from k = 1, …, K datasets, we use two measures of comparison, described below.
2.1 Fisher's inverse chi-square statistic
This statistic, S _{ i }, tests the null hypothesis that gene i is not differentially expressed between the two groups in the K datasets. Under this null hypothesis, S _{ i } is chi-square distributed with 2K degrees of freedom. In our case, the p value is calculated by the Wilcoxon-Mann-Whitney (WMW) test for each gene and each dataset.
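The following minimal sketch illustrates how p values from K datasets can be combined for a single gene. The defining equation is not reproduced in the text above, so the sketch assumes the standard form of Fisher's statistic, S_i = −2 Σ_k log p_ik. The function names and the example p values are hypothetical, and the code is not the authors' implementation.

```python
import math

def fisher_score(p_values):
    """Fisher's statistic S = -2 * sum(log p) over the K datasets (standard form, assumed)."""
    return -2.0 * sum(math.log(p) for p in p_values)

def fisher_combined_p(p_values):
    """Upper-tail probability of S under a chi-square distribution with 2K degrees of freedom.

    For an even number of degrees of freedom 2K, the survival function has the
    closed form exp(-x/2) * sum_{j=0}^{K-1} (x/2)^j / j!.
    """
    k = len(p_values)
    half = fisher_score(p_values) / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))

# Hypothetical p values of one gene in K = 3 datasets
p_gene = [0.04, 0.20, 0.01]
s_i = fisher_score(p_gene)
p_comb = fisher_combined_p(p_gene)
```

With a single dataset (K = 1) the combined p value reduces to the input p value, which is a convenient sanity check.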
2.2 Differential expression via distance synthesis (DEDS)
Various statistics can be calculated to describe the differences in expression between the two groups, including the WMW test, t-statistics, and fold change (FC). DEDS integrates and summarizes these statistics using a weighted distance approach [6] for two-group comparisons; it then measures the distance between the aggregated point and the extreme origin, which is assumed to represent the largest measurement of all. These procedures can be performed with the R package 'DEDS' (http://www.bioconductor.org/). In the procedure, the t, SAM, FC, B, moderated t, and moderated F-statistics were selected as t _{ j }. Campain and Yang expanded DEDS to a meta-analysis method, called mDEDS [5]. The analysis by mDEDS proceeds as follows.
(1) Apply J appropriate statistics t _{ ij } to each of the i = 1, …, I genes, with j = 1, …, J and 1 ≤ J ≤ 6. The observed coordinate-wise extreme point over all genes is defined by E _{0} = (max_{ i }(t _{ i1}), …, max_{ i }(t _{ iJ })).
(2) For each permuted dataset b = 1, …, B, obtain the permutation extreme point E _{ b } and evaluate the coordinate-wise extreme point E _{ p } by maximizing over all permutations: E _{ p } = (max_{ b }(E _{ b1}), …, max_{ b }(E _{ bJ })).
(3) Obtain the overall maximum E = max(E _{0}, E _{ p }).
(4) Calculate the distance d _{ i } from each gene to E = (E _{1}, …, E _{ J }), defined by \( {d}_i = \sum_{j=1}^{J} \frac{\left(t_{ij} - E_j\right)^2}{\mathrm{MAD}\left(t_{ij}\right)^2} \), where MAD is the median absolute deviation from the median.
(5) Repeat steps (1)–(4) for all k = 1, …, K studies and summarize the distances coordinate-wise.
The package outputs the list of estimated statistics and the distance for each dataset. To perform step (5), we summarize the obtained distances for all datasets and order them according to the genes.
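Steps (1)–(4) for a single dataset can be sketched as follows. This is an illustrative sketch, not the 'DEDS' package: the statistics for the observed and permuted data are assumed to be already computed, at least one permutation is assumed, the MAD of each statistic is taken over genes and assumed to be non-zero, and all function names and numbers are hypothetical.

```python
import statistics

def mad(values):
    """Median absolute deviation from the median."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)

def extreme_point(stat_matrix):
    """Coordinate-wise maximum; stat_matrix[i][j] is statistic j of row i."""
    return [max(col) for col in zip(*stat_matrix)]

def deds_distances(observed, permuted_list):
    """Distance d_i of each gene from the overall extreme point E = max(E_0, E_p)."""
    e0 = extreme_point(observed)
    # E_p: coordinate-wise maximum over the permutation extreme points E_b
    ep = extreme_point([extreme_point(perm) for perm in permuted_list])
    e = [max(a, b) for a, b in zip(e0, ep)]
    # Scale of each statistic: MAD over genes (assumed non-zero)
    scales = [mad(col) for col in zip(*observed)]
    return [
        sum((t - ej) ** 2 / s ** 2 for t, ej, s in zip(row, e, scales))
        for row in observed
    ]

# Hypothetical statistics for I = 4 genes and J = 2 statistics
observed = [[3.0, 2.0], [1.0, 0.5], [0.2, 0.1], [2.0, 1.5]]
# One hypothetical permuted dataset (B = 1)
permuted = [[[1.0, 0.8], [0.5, 2.5], [0.3, 0.2], [0.9, 0.4]]]
d = deds_distances(observed, permuted)
ranking = sorted(range(len(d)), key=lambda i: d[i])  # closest to E first
```

Under this definition, genes with a smaller distance lie closer to the extreme point, so they carry the more extreme statistics.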
2.3 An extension to DEDS
For mDEDS, the original study [5] did not address how the extreme origin used to measure the distances should be handled when the measurements change across different cohorts. The original DEDS procedure selects the larger of the extreme points from the original data and the permuted data as the extreme origin, without taking into account changes in the extreme origin. In fact, the extreme origin and the coordinate-wise extreme origin change when the dataset changes. When mDEDS is calculated for a threshold shifting at 0.1-cm intervals over the range from 1.5 to 3.5 cm, the origin should also change accordingly: \( {E}_{1.5}= \max \left({E}_0^{(1.5)},{E}_p^{(1.5)}\right) \) for q = 1.5, …, \( {E}_q= \max \left({E}_0^{(q)},{E}_p^{(q)}\right) \) for q, …, \( {E}_{3.5}= \max \left({E}_0^{(3.5)},{E}_p^{(3.5)}\right) \) for q = 3.5, where \( {E}_0^{(q)} \) and \( {E}_p^{(q)} \) indicate the extreme points obtained from the original data and the permuted data, respectively. Therefore, we define the following extreme point, named the 'totally extreme point (TEP)': E _{max} = max(E _{1.5}, …, E _{ q }, …, E _{3.5}) for q ∈ [1.5, 3.5].
Then, the scaled distance for each gene across the K studies is \( {d}_i = \sum_{k=1}^{K} \sum_{j=1}^{6} \frac{\left(t_{ikj} - E_{\max}\right)^2}{\mathrm{MAD}\left(t_{ikj}\right)^2} \).
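A minimal sketch of the TEP and the resulting scaled distance is given below. It rests on two assumptions that the text does not state explicitly: the maximum over thresholds is taken coordinate-wise (one coordinate per statistic j), and the same E_max is used for every study k. Function names and numbers are hypothetical.

```python
def totally_extreme_point(extreme_by_threshold):
    """Coordinate-wise maximum of the extreme points E_q over all thresholds q."""
    return [max(coords) for coords in zip(*extreme_by_threshold.values())]

def scaled_distance(stats, e_max, mads):
    """Scaled distance of one gene across studies.

    stats[k][j]: statistic j of the gene in study k
    mads[k][j]:  MAD of statistic j in study k (assumed non-zero)
    """
    return sum(
        (t - e) ** 2 / m ** 2
        for study_stats, study_mads in zip(stats, mads)
        for t, e, m in zip(study_stats, e_max, study_mads)
    )

# Hypothetical extreme points E_q for three thresholds and J = 2 statistics
extreme_by_threshold = {1.5: [3.0, 2.5], 2.0: [3.4, 2.1], 2.5: [2.9, 2.8]}
e_max = totally_extreme_point(extreme_by_threshold)  # [3.4, 2.8]

# One gene observed in K = 2 studies
d_gene = scaled_distance([[3.0, 2.0], [3.4, 2.8]], e_max, [[1.0, 0.5], [2.0, 1.0]])
```

Using one fixed origin for all thresholds keeps the distances comparable when the threshold q is varied.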
2.4 Estimation of optimal threshold q between T1 and T2
This is motivated by the idea that we are looking for the threshold that best divides the two tumor groups from each other based on the genome-wide expression profiles.
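The threshold search can be sketched as a scan over candidate values of q. The objective functions themselves are not reproduced in the text above, so the aggregate used here (the sum of Fisher's statistic over genes, with larger values meaning a stronger overall difference) is purely an illustrative assumption, and the authors' objective may differ in sign or scale. The WMW p value uses a normal approximation without tie correction, and all names and data are hypothetical.

```python
import math

def wmw_p_value(x, y):
    """Two-sided Wilcoxon-Mann-Whitney p value (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        return 1.0  # no comparison possible when one group is empty
    # U counts pairs (a, b) with a > b; ties count one half
    u = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return math.erfc(abs(u - mean) / sd / math.sqrt(2.0))

def fisher_score(p_values):
    """Fisher's statistic -2 * sum(log p); p is floored to avoid log(0)."""
    return -2.0 * sum(math.log(max(p, 1e-300)) for p in p_values)

def scan_thresholds(datasets, thresholds):
    """Return {q: aggregate score}; datasets is a list of (sizes, expr) with expr[i][s]."""
    n_genes = len(datasets[0][1])
    scores = {}
    for q in thresholds:
        total = 0.0
        for i in range(n_genes):
            p_values = []
            for sizes, expr in datasets:
                small = [e for e, s in zip(expr[i], sizes) if s <= q]
                large = [e for e, s in zip(expr[i], sizes) if s > q]
                p_values.append(wmw_p_value(small, large))
            total += fisher_score(p_values)  # assumed aggregate: sum over genes
        scores[q] = total
    return scores

# Two hypothetical datasets, two genes each; expression jumps between 2.5 and 2.7 cm
data1 = ([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5],
         [[0.1, 0.3, 0.2, 0.4, 2.1, 2.3, 2.2, 2.4],
          [1.0, 0.8, 1.1, 0.9, 3.0, 3.2, 2.9, 3.1]])
data2 = ([1.1, 1.9, 2.3, 2.7, 3.3, 3.9],
         [[0.2, 0.1, 0.3, 2.0, 2.2, 2.1],
          [1.0, 0.9, 1.1, 3.0, 3.1, 2.9]])
scores = scan_thresholds([data1, data2], [1.5, 2.0, 2.5, 3.0, 3.5])
best_q = max(scores, key=scores.get)
```

In this toy example the score peaks at the candidate threshold that separates the low-expression and high-expression samples in both datasets.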
3 Simulation study
Summarizing these simulation studies, both Fisher's score and the mDEDS score identified the optimal threshold as 3.0, which was the boundary set for generating random small and large values. This supports the validity of our proposed algorithm.
4 Materials
4.1 Norwegian datasets
Three datasets were gathered at two Norwegian hospitals. Two of them, collected at Akershus University Hospital, Lørenskog, Norway, consist of one-color expression data (mdata1; 27 samples and 43,376 probes) and two-color expression data (mdata2; 46 samples and 41,674 probes). The third dataset (mdata3) consists of one-color mRNA expression data with 40,995 probes for 102 tumor samples, taken from patients with early-stage breast cancer [8] managed by Oslo University Hospital Radiumhospitalet in Norway. All datasets were processed on the Agilent platform, and all were preprocessed with the methods provided by Bioconductor (http://www.bioconductor.org/help/workflows/oligoarrays/). We applied quantile normalization to the one-color data and lowess normalization to the two-color data. No background correction was performed for these data. The probes were matched across datasets; consequently, 40,995 probes were used for the analysis. Although the full range of tumor sizes was relatively large (0.1–5.0 cm), very few samples were smaller than 1.0 cm or larger than 4.0 cm, depending on the dataset. Therefore, we fixed 1.0–3.0 cm as the range in which to search for the optimum size.
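For readers unfamiliar with quantile normalization, the idea can be sketched in a few lines. This is an illustrative sketch only: the datasets above were normalized with Bioconductor methods, whereas the function below is a hypothetical plain-Python version that does not treat ties specially.

```python
def quantile_normalize(samples):
    """Give every sample the same empirical distribution.

    samples[s][g] is the intensity of probe g in sample s.
    """
    n_probes = len(samples[0])
    # Mean of the sorted intensities across samples at each rank
    sorted_cols = [sorted(col) for col in samples]
    rank_means = [
        sum(col[r] for col in sorted_cols) / len(samples) for r in range(n_probes)
    ]
    normalized = []
    for col in samples:
        # Probe indices ordered from lowest to highest intensity
        order = sorted(range(n_probes), key=lambda g: col[g])
        out = [0.0] * n_probes
        for rank, g in enumerate(order):
            out[g] = rank_means[rank]
        normalized.append(out)
    return normalized

# Two hypothetical samples with three probes each
toy = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(toy)
```

After normalization each sample contains the same set of values, and only the ordering of probes within a sample is preserved.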
4.2 Validation datasets
To validate the optimum threshold estimated from the above datasets, we used five expression datasets from the collection called the Affy947 expression dataset [9]. This collection comprises six published datasets containing microarray data of breast cancer samples, all measured on Human Genome HG U133A Affymetrix arrays and normalized using the same protocol. Since one dataset (the Pawitan et al. dataset [10]) did not include tumor size data, we excluded it from further analysis. Four of the remaining datasets are accessible from NCBI's Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) with the following identifiers: GSE6532 for the Loi et al. dataset [11], GSE3494 for the Miller et al. dataset [12], GSE7390 for the Desmedt et al. dataset [13], and GSE5327 for the Minn et al. dataset [14]. The Chin et al. dataset [15] is available from ArrayExpress (http://www.ebi.ac.uk/, identifier E-TABM-158). The pooled dataset was preprocessed and normalized as described in Zhao et al. [16]. Microarray quality-control assessment was carried out using the R affyPLM package from the Bioconductor web site (http://www.bioconductor.org, [17]). The relative log expression (RLE) test and the normalized unscaled standard errors (NUSE) test were applied. Chip pseudo-images were produced to assess artifacts on arrays that did not pass the preceding quality-control tests. Selected arrays were normalized according to a three-step procedure using the robust multi-array average (RMA) expression measure algorithm (http://www.bioconductor.org; [18]): RMA background correction convolution, median centering of each gene across arrays separately for each dataset, and quantile normalization of all arrays. Gene mean centering has been shown to effectively remove many dataset-specific biases, allowing effective integration of multiple datasets [19].
5 Results and discussion
5.1 Optimal tumor size
It is important to note that the optimal value of q, obtained by optimizing the objective functions (3) and (4), cannot be equipped with a confidence interval obtained by bootstrap. This is similar to other situations in statistics where certain parameters are obtained by optimization, for example, the smoothing parameter in nonparametric regression or the penalty in lasso regression, obtained by optimizing some cross-validation criterion. To explain this, let us follow the bootstrapping paradigm. Let us fix a value q _{1}. Then we can compute the p values p _{ i }(q _{1}) and the Fisher score S*(q _{1}). We can bootstrap the data, obtain bootstrap distributions for all p values, and compute the corresponding bootstrap distribution for S(q _{1}), which has a mean equal to S*(q _{1}). We now repeat this for various q in Q and obtain the score S*(q) and the bootstrap distribution of the score S(q) for all q in Q. What we do in this article is to minimize over q the score S*(q), which can be interpreted as the bootstrap mean. But we cannot minimize the sum of the bootstrapped distributions of S(q) for all q in Q. We need to summarize these distributions by a point estimate, and our method uses the mean; alternatively, we could use the bootstrap medians. In any case, the obtained optimal q cannot carry any bootstrap-based uncertainty. On the other hand, we can repeat the threshold selection separately on each of the three datasets. For mDEDS, this gave optimal values of 2.1, 2.2, and 2.2 cm; for Fisher's score, we obtained 1.7 (slightly preferable to 2.5 cm), 2.4, and 2.5 cm. Three values do not allow an estimate of variability, but they appear consistent.
5.2 Validation study
5.3 Survival analysis using optimum threshold
p values obtained by the log-rank test for survival time (in months) and time to local recurrence (in months)
Threshold [cm]  Overall survival  BC specific  Local recurrence
2.0  0.021  0.027  0.13
2.2  0.18  0.047  0.045
2.4  0.30  0.083  0.089
5.4 Associated network and canonical pathway analyses based on the gene lists of expression differences between T1 and T2 groups based on the 2.4- and 2.0-cm thresholds
Estimated number of significant probes (genes) by SAM
Dataset  # probes (2.4 cm)  FDR (%) (2.4 cm)  # probes (2.0 cm)  FDR (%) (2.0 cm)
mdata1  12  15.03  44  6.03
mdata2  11,740  5.02  8,036  4.81
mdata3  93  20.29  9  10.42
Summary of pathway analyses
(Part A) Biological functions enriched in 5618 unique probes separating tumors below and above size 2.4 cm  
Associated network functions  Score 
Cellular assembly and organization, cellular compromise, protein synthesis  26 
Cell signaling, nucleic acid metabolism, small molecule biochemistry  24 
Energy production, nucleic acid metabolism, small molecule biochemistry  24 
Hair and skin development and function, dermatological diseases and conditions, developmental disorder  22 
Post-translational modification, gene expression, infectious disease  22
Top canonical pathways  −log (FDR-corrected p value)
Neuropathic pain signaling in dorsal horn neurons  1.13 
Role of NFAT in cardiac hypertrophy  1.13
Melatonin signaling  6.94 × 10^{−1} 
Molecular mechanisms of cancer  6.94 × 10^{−1} 
Calcium-induced T lymphocyte apoptosis  6.94 × 10^{−1}
Diseases and disorders  FDR-corrected p value
Cancer  3.95 × 10^{−1}–5.44 × 10^{−1} 
Hematological disease  3.95 × 10^{−1}–5.44 × 10^{−1} 
Immunological disease  5.13 × 10^{−1}–5.44 × 10^{−1} 
Hypersensitivity response  5.44 × 10^{−1}–5.44 × 10^{−1} 
Inflammatory response  5.44 × 10^{−1}–5.44 × 10^{−1} 
Molecular and cellular functions  FDR-corrected p value
Gene expression  2.03 × 10^{−1}–5.44 × 10^{−1} 
Cellular growth and proliferation  2.03 × 10^{−1}–5.44 × 10^{−1} 
Energy production  2.03 × 10^{−1}–5.44 × 10^{−1} 
Amino acid metabolism  2.03 × 10^{−1}–5.44 × 10^{−1} 
Small molecule biochemistry  2.03 × 10^{−1}–5.44 × 10^{−1} 
(Part B) Biological functions enriched in 1914 unique probes separating tumors below and above size 2.0 cm  
Associated network functions  Score 
Antigen presentation, cellular movement, hematological system development and function  29 
Cellular assembly and organization, cellular function and maintenance, protein synthesis  29
Gene expression, infectious disease, small molecule biochemistry  29 
Cellular assembly and organization, cell signaling, gene expression  29 
Post-translational modification, protein folding, cell death  24
Top canonical pathways  −log (FDR-corrected p value)
Tight junction signaling  9.13 × 10^{−1} 
Germ cell-Sertoli cell junction signaling  8.69 × 10^{−1}
Cdc42 signaling  1.26 × 10^{−1}
Fatty acid biosynthesis  8.69 × 10^{−1} 
Integrin signaling  8.69 × 10^{−1} 
Diseases and disorders  FDR-corrected p value
Dermatological diseases and conditions  6.58 × 10^{−3}–2.73 × 10^{−1} 
Genetic disorder  6.58 × 10^{−3}–2.73 × 10^{−1} 
Infectious disease  1.14 × 10^{−2}–2.73 × 10^{−1} 
Inflammatory disease  3.38 × 10^{−2}–2.73 × 10^{−1} 
Inflammatory response  3.38 × 10^{−2}–2.73 × 10^{−1} 
Molecular and cellular functions  FDR-corrected p value
Antigen presentation  3.38 × 10^{−2}–2.73 × 10^{−1}
Cell-to-cell signaling and interaction  3.38 × 10^{−2}–2.73 × 10^{−1}
Cellular compromise  3.38 × 10^{−2}–2.73 × 10^{−1} 
Cellular function and maintenance  3.38 × 10^{−2}–2.73 × 10^{−1} 
Cellular movement  7.14 × 10^{−2}–2.73 × 10^{−1} 
(Part C) Biological functions enriched in 6112 overlapping probes separating tumors below and above size 2.4 cm and 2.0 cm  
Associated network functions  Score 
Protein synthesis, posttranslational modification, cancer  26 
Cell signaling, nucleic acid metabolism, small molecule biochemistry  24 
Lipid metabolism, small molecule biochemistry, vitamin, and mineral metabolism  24 
Connective tissue development and function, embryonic development, skeletal and muscular system development and function  24 
Cancer, dermatological diseases and conditions, tumor morphology  22
Top canonical pathways  −log (FDR-corrected p value)
Cytotoxic T lymphocyte-mediated apoptosis of target cells  3.65
Allograft rejection signaling  2.77 
Nur77 signaling in T lymphocytes  2.77 
Antigen presentation pathway  2.77 
T helper cell differentiation  1.75 
Diseases and disorders  FDR-corrected p value
Dermatological diseases and conditions  1.98 × 10^{−7}–1.89 × 10^{−1} 
Respiratory disease  7.98 × 10^{−5}–1.89 × 10^{−1} 
Cancer  4.49 × 10^{−4}–1.89 × 10^{−1} 
Genetic disorder  4.49 × 10^{−4}–1.71 × 10^{−1} 
Inflammatory response  4.49 × 10^{−4}–1.89 × 10^{−1} 
Molecular and cellular functions  FDR-corrected p value
Cell-to-cell signaling and interaction  2.68 × 10^{−4}–1.89 × 10^{−1}
Cellular movement  8.02 × 10^{−4}–1.89 × 10^{−1} 
Cellular growth and proliferation  1.29 × 10^{−3}–1.89 × 10^{−1} 
Cellular development  2.99 × 10^{−3}–1.89 × 10^{−1} 
Cell death  4.1 × 10^{−3}–1.89 × 10^{−1} 
Associated network functions explain the tendencies of cellular assembly in tumor interaction for the early stage of tumors and energy production for the progressive stage of tumors. Part C represents a transitional stage from early to progressive, which involves associated network functions including lipid metabolism and cell signaling, nucleic acid metabolism, and small molecule biochemistry.
For the common genes shown in part C, besides known breast cancer genes such as AKT, ERBB2, and PTEN, we also found MTDH. When it was introduced, the gene Metadherin (MTDH) was shown to affect the expression of many genes of relevance to the metastatic and chemoresistance phenotypes [23]. MTDH may also represent a novel mediator of malignant breast cancer progression. Furthermore, we found interesting genes in part A, such as MYC, which is known as an oncogene frequently deregulated in breast cancer; TP53, which is associated with high risk for various cancers; RAD50, which is known to moderately increase breast cancer risk; and BRCA2, whose mutation is associated with a significantly elevated risk for breast and ovarian cancers [24].
6 Conclusions
We studied various tumor size thresholds that can be used to create two groups of patients. We proposed a numerical algorithm involving Fisher's score and mDEDS using gene expression data. Both measures indicated that the threshold giving the largest difference in gene expression between smaller and larger tumors is slightly larger than 2.0 cm. The optimum thresholds over 2.0 cm were supported by a validation using the five published expression datasets. We also found that thresholds over 2.0 cm lead to the most distinct Kaplan-Meier curves of time to local recurrence. In the associated network and canonical pathway analyses for the Norwegian datasets, the lists of DE genes for the 2.4-cm threshold also included some genes related to the metastasis of breast cancer. The same approach can be extended to control for other factors, such as tumor grade and estrogen receptor (ER) status, which are also important prognostic indicators for breast cancer. It could also be applied to other cancers in which tumor size is a prognostic indicator. A further extension of our approach would be to determine more than two groups of patients on the basis of two (or more) thresholds. This would give tumor dimension a role similar to that of tumor grade. We decided to remain within the consolidated clinical practice with just the T1/T2 distinction. In summary, our analysis based on gene expression indicates that the 2.0-cm rule applied to determine patients who will benefit from more aggressive therapy appears to be justified. However, we find indications that a slightly larger threshold of 2.2 cm could be applied instead, thus reducing therapy for some borderline patients. This could spare patients who possibly do not need strong therapies from their negative effects. We interpret our results as a call for a critical revision of the 2.0-cm rule in the light of individual genomic data.
Abbreviations
BC: breast cancer
DE: differential expression
DEDS: differential expression via distance synthesis
FC: fold change
GEO: Gene Expression Omnibus
MAD: median absolute deviation from the median
mDEDS: meta differential expression via distance synthesis
NCBI: National Center for Biotechnology Information
NUSE: normalized unscaled standard errors
RLE: relative log expression
RMA: robust multi-array average
SAM: significance analysis of microarrays
TEP: totally extreme point
WMW: Wilcoxon-Mann-Whitney
Declarations
Acknowledgements
This work was supported by grant 193387/V50 'Understanding breast cancer genomics' to ALBD/VNK from the Norwegian Research Council (NFR) and by grants from the South-Eastern Norway Regional Health Authority (Helse Sør-Øst) 2789119 and the Akershus University Hospital 2679030 and 2699015 to VNK. We also thank Dr. Xi Zhao, Stanford Center for Cancer System Biology, Stanford University, for valuable suggestions and help with the validation data.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
1. EA Rakha, JS Reis-Filho, F Baehner, DJ Dabbs, T Decker, V Eusebi, EB Fox, S Ichihara, J Jacquemier, SR Lakhani, J Palacios, AL Richardson, SJ Schnitt, FC Schmitt, PH Tan, CM Tse, S Badve, IO Ellis, Breast cancer prognostic classification in the molecular era: the role of histological grade. Breast Cancer Research 12, 207 (2010)
2. ML Riis, X Zhao, F Kaveh, HS Vollan, AJ Nesbakken, HK Solvang, T Lüders, IR Bukholm, VN Kristensen, Gene expression profile analysis of T1 and T2 breast cancer reveals different activation pathways. ISRN Oncol. (2013). doi:10.1155/2013/924971
3. A Ramasamy, A Mondry, CC Holmes, DG Altman, Key issues in conducting a meta-analysis of gene expression microarray datasets. PLoS Medicine 5(9), e184 (2008)
4. J Budczies, F Klauschen, BV Sinn, B Győrffy, WD Schmitt, S Darb-Esfahani, C Denkert, Cutoff Finder: a comprehensive and straightforward web application enabling rapid biomarker cutoff optimization. PLoS ONE 7(12), e51862 (2012)
5. A Campain, YH Yang, Comparison study of microarray meta-analysis methods. BMC Bioinformatics 11, 408 (2010)
6. YH Yang, Y Xiao, MR Segal, Identifying differentially expressed genes from microarray experiments via statistic synthesis. Bioinformatics 21(7), 1084–1093 (2004)
7. RA Fisher, Statistical Methods for Research Workers (Oliver & Boyd, Edinburgh, 1950), p. 11
8. B Naume, X Zhao, M Synnestvedt, E Borgen, HG Russnes, OC Lingjærde, M Strømberg, G Wiedswang, G Kvalheim, R Kåresen, JM Nesland, AL Børresen-Dale, T Sørlie, Presence of bone marrow micrometastasis is associated with different recurrence risk within molecular subtypes of breast cancer. Molecular Oncology 1, 160–171 (2007)
9. MH van Vliet, F Reyal, HM Horlings, MJ van de Vijver, MJT Reinders, LFA Wessels, Pooling breast cancer datasets has a synergetic effect on classification performance and improves signature stability. BMC Genomics 9, 375 (2008)
10. Y Pawitan, J Bjöhle, L Amler, AL Borg, S Egyhazi, P Hall, X Han, L Holmberg, F Huang, S Klaar, ET Liu, L Miller, H Nordgren, A Ploner, K Sandelin, PM Shaw, J Smeds, L Skoog, S Wedrén, J Bergh, Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts. Breast Cancer Research 7, R953–R964 (2005)
11. S Loi, B Haibe-Kains, C Desmedt, F Lallemand, AM Tutt, C Gillet, P Ellis, A Harris, J Bergh, JA Foekens, JG Klijn, D Larsimont, M Buyse, G Bontempi, M Delorenzi, MJ Piccart, C Sotiriou, Definition of clinically distinct molecular subtypes in estrogen receptor-positive breast carcinomas through genomic grade. J. Clin. Oncol. 25(10), 1239–1246 (2007)
12. LD Miller, J Smeds, J George, VB Vega, L Vergara, A Ploner, Y Pawitan, P Hall, S Klaar, ET Liu, J Bergh, An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival. PNAS 102(38), 13550–13555 (2005)
13. C Desmedt, F Piette, S Loi, Y Wang, F Lallemand, B Haibe-Kains, M Delorenzi, MS D'Assignies, J Bergh, R Lidereau, P Ellis, AL Harris, JG Klijn, JA Foekens, F Cardoso, MJ Piccart, M Buyse, C Sotiriou, TRANSBIG Consortium, Strong time dependence of the 76-gene prognostic signature for node-negative breast cancer patients in the TRANSBIG multicenter independent validation series. Clinical Cancer Research 13(11), 3207–3214 (2007)
14. AJ Minn, GP Gupta, PM Siegel, PD Bos, W Shu, DD Giri, A Viale, AB Olshen, WL Gerald, J Massagué, Genes that mediate breast cancer metastasis to lung. Nature 436, 518–524 (2005)
15. K Chin, S DeVries, J Fridlyand, PT Spellman, R Roydasgupta, WL Kuo, A Lapuk, RM Neve, Z Qian, T Ryder, F Chen, H Feiler, T Tokuyasu, C Kingsley, S Dairkee, Z Meng, K Chew, D Pinkel, A Jain, BM Ljung, L Esserman, DG Albertson, FM Waldman, JW Gray, Genomic and transcriptional aberrations linked to breast cancer pathophysiologies. Cancer Cell 10, 529–541 (2006)
16. X Zhao, EA Rødland, T Sørlie, HKM Vollan, HG Russnes, VN Kristensen, OC Lingjærde, AL Børresen-Dale, Systematic assessment of prognostic gene signatures for breast cancer shows distinct influence of time and ER status. BMC Cancer 14, 211 (2014)
17. BM Bolstad, F Collin, J Brettschneider, K Simpson, L Cope, RA Irizarry, TP Speed, Quality assessment of Affymetrix GeneChip data, in Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Statistics for Biology and Health (Springer, New York, 2005), pp. 33–47
18. RA Irizarry, BM Bolstad, F Collin, LM Cope, B Hobbs, TP Speed, Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Research 31(4), e15 (2003)
19. AH Sims, GJ Smethurst, Y Hey, MJ Okoniewski, SD Pepper, A Howell, CJ Miller, RB Clarke, The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets - improving meta-analysis and prediction of prognosis. BMC Medical Genomics 1, 42 (2008)
20. VG Tusher, R Tibshirani, G Chu, Significance analysis of microarrays applied to the ionizing radiation response. PNAS 98(9), 5116–5121 (2001)
21. Y Benjamini, Y Hochberg, Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Royal Stat. Soc. Series B 57(1), 289–300 (1995)
22. Ingenuity Systems, http://www.ingenuity.com
23. MA Blanco, Y Kang, Signaling pathways in breast cancer metastasis - novel insights from functional genomics. Breast Cancer Research 13, 206 (2011)
24. EYHP Lee, WJ Muller, Oncogenes and tumor suppressor genes. Cold Spring Harbor Persp. Biol. (2010). doi:10.1101/cshperspect.a003236