Open Access

Normalization Benefits Microarray-Based Classification

  • Jianping Hua (1) (corresponding author),
  • Yoganand Balagurunathan (1),
  • Yidong Chen (2),
  • James Lowey (1),
  • Michael L Bittner (1),
  • Zixiang Xiong (3),
  • Edward Suh (1), and
  • Edward R Dougherty (1, 3)
EURASIP Journal on Bioinformatics and Systems Biology 2006, 2006:43056

DOI: 10.1155/BSB/2006/43056

Received: 11 December 2005

Accepted: 18 May 2006

Published: 24 August 2006

Abstract

When using cDNA microarrays, normalization to correct labeling bias is a common preliminary step before further data analysis; its objective is to reduce variation between arrays. To date, assessment of the effectiveness of normalization has mainly been confined to the ability to detect differentially expressed genes. Since a major use of microarrays is expression-based phenotype classification, it is important to evaluate microarray normalization procedures relative to classification. Using a model-based approach, we model the systematic-error process to generate synthetic gene-expression values with known ground truth. These synthetic expression values are subjected to typical normalization methods and passed through a set of classification rules, the objective being to carry out a systematic study of the effect of normalization on classification. Three normalization methods are considered: offset, linear regression, and Lowess regression. Seven classification rules are considered: 3-nearest neighbor, linear support vector machine, linear discriminant analysis, regular histogram, Gaussian kernel, perceptron, and multiple perceptrons with majority voting. Results for the first three rules are presented in the paper; the full results are given on a companion website. Across the experimental models considered in the study, the conclusion is that normalization can significantly benefit classification under difficult experimental conditions, with linear and Lowess regression slightly outperforming the offset method.
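As a rough illustration only, the sketch below shows what the three normalization families and one classification rule might look like for a single two-channel array. It assumes the common M-A formulation (M = log2(R/G), A = ½ log2(RG)), which the abstract does not spell out, and all function names (ma_values, offset_normalize, linear_normalize, lowess_normalize, knn3_predict) are hypothetical. Here "offset" subtracts the mean log-ratio, "linear regression" removes a global least-squares fit of M on A, and "Lowess" is a simplified single-pass, tricube-weighted local linear fit; the paper's own definitions may differ.

```python
import math


def ma_values(red, green):
    """Map paired red/green intensities to log-ratio M and mean log-intensity A."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def offset_normalize(m):
    """Offset (assumed form): shift all log-ratios by their mean so they center at zero."""
    mu = sum(m) / len(m)
    return [x - mu for x in m]


def _weighted_fit(xs, ys, ws):
    """Weighted least-squares line through (xs, ys); returns (intercept, slope)."""
    sw = sum(ws)
    xw = sum(w * x for w, x in zip(ws, xs)) / sw
    yw = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xw) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xw) * (y - yw) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 1e-12 else 0.0  # degenerate case: weighted mean only
    return yw - slope * xw, slope


def linear_normalize(m, a):
    """Linear regression (assumed form): fit M = b0 + b1*A over all spots, keep residuals."""
    b0, b1 = _weighted_fit(a, m, [1.0] * len(m))
    return [mi - (b0 + b1 * ai) for mi, ai in zip(m, a)]


def lowess_normalize(m, a, frac=0.3):
    """Simplified Lowess: for each spot, fit a tricube-weighted local line over the
    nearest frac*n spots in A and subtract its value (no robustness iterations)."""
    n = len(m)
    k = min(n, max(2, math.ceil(frac * n)))  # spots in each local neighborhood
    out = []
    for ai, mi in zip(a, m):
        h = sorted(abs(x - ai) for x in a)[k - 1] or 1e-12  # local bandwidth
        w = [max(0.0, 1.0 - (abs(x - ai) / h) ** 3) ** 3 for x in a]
        b0, b1 = _weighted_fit(a, m, w)
        out.append(mi - (b0 + b1 * ai))
    return out


def knn3_predict(train_x, train_y, query):
    """3-nearest-neighbor rule: majority vote of the 3 closest training labels (0/1)."""
    nearest = sorted(range(len(train_x)), key=lambda j: math.dist(train_x[j], query))[:3]
    return 1 if sum(train_y[j] for j in nearest) >= 2 else 0
```

In use, one would compute M and A per array, normalize M with one of the three functions, and feed the normalized values of selected genes to a classifier such as knn3_predict. A full Lowess implementation adds robustness reweighting iterations that this sketch omits, and the span parameter frac (an assumed default of 0.3 here) trades smoothness of the fitted trend against sensitivity to local intensity-dependent bias.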


Authors’ Affiliations

(1) Computational Biology Division, Translational Genomics Research Institute
(2) Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health
(3) Department of Electrical & Computer Engineering, Texas A&M University

References

  1. Quackenbush J: Microarray data normalization and transformation. Nature Genetics 2002, 32(5 supplement):496-501.
  2. Bilban M, Buehler LK, Head S, Desoye G, Quaranta V: Normalizing DNA microarray data. Current Issues in Molecular Biology 2002, 4(2):57-64.
  3. Attoor S, Dougherty ER, Chen Y, Bittner ML, Trent JM: Which is better for cDNA-microarray-based classification: ratios or direct intensities. Bioinformatics 2004, 20(16):2513-2520. doi:10.1093/bioinformatics/bth272
  4. Chen Y, Kamat V, Dougherty ER, Bittner ML, Meltzer PS, Trent JM: Ratio statistics of gene expression levels and applications to microarray data analysis. Bioinformatics 2002, 18(9):1207-1215. doi:10.1093/bioinformatics/18.9.1207
  5. Yang YH, Dudoit S, Luu P, et al.: Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Research 2002, 30(4):e15. doi:10.1093/nar/30.4.e15
  6. Tseng GC, Oh M-K, Rohlin L, Liao JC, Wong WH: Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Research 2001, 29(12):2549-2557. doi:10.1093/nar/29.12.2549
  7. Devroye L, Gyorfi L, Lugosi G: A Probabilistic Theory of Pattern Recognition. Springer, New York, NY, USA; 1996.
  8. Vapnik VN: Statistical Learning Theory. John Wiley & Sons, New York, NY, USA; 1998.
  9. Rosenblatt F: Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books, Washington, DC, USA; 1962.
  10. Duda R, Hart P: Pattern Classification. 2nd edition. John Wiley & Sons, New York, NY, USA; 2001.
  11. Chang C-C, Lin C-J: LIBSVM: introduction and benchmarks. Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan; 2000.
  12. Braga-Neto UM, Dougherty ER: Is cross-validation valid for small-sample microarray classification? Bioinformatics 2004, 20(3):374-380. doi:10.1093/bioinformatics/btg419
  13. Pudil P, Novovičová J, Kittler J: Floating search methods in feature selection. Pattern Recognition Letters 1994, 15(11):1119-1125. doi:10.1016/0167-8655(94)90127-9
  14. Jain AK, Zongker D: Feature selection: evaluation, application, and small sample performance. IEEE Transactions on Pattern Analysis and Machine Intelligence 1997, 19(2):153-158. doi:10.1109/34.574797
  15. Kudo M, Sklansky J: Comparison of algorithms that select features for pattern classifiers. Pattern Recognition 2000, 33(1):25-41. doi:10.1016/S0031-3203(99)00041-2
  16. Braga-Neto U, Dougherty ER: Bolstered error estimation. Pattern Recognition 2004, 37(6):1267-1281. doi:10.1016/j.patcog.2003.08.017
  17. Sima C, Attoor S, Braga-Neto U, Lowey J, Suh E, Dougherty ER: Impact of error estimation on feature selection algorithms. Pattern Recognition 2005, 38(12):2472-2482. doi:10.1016/j.patcog.2005.03.026
  18. Hua J, Xiong Z, Lowey J, Suh E, Dougherty ER: Optimal number of features as a function of sample size for various classification rules. Bioinformatics 2005, 21(8):1509-1515. doi:10.1093/bioinformatics/bti171
  19. Jain AK, Waller WG: On the optimal number of features in the classification of multivariate Gaussian data. Pattern Recognition 1978, 10(5-6):365-374. doi:10.1016/0031-3203(78)90008-0
  20. Chen Y, Dougherty ER, Bittner ML: Ratio-based decisions and the quantitative analysis of cDNA microarray images. Journal of Biomedical Optics 1997, 2(4):364-374. doi:10.1117/12.281504

Copyright

© Jianping Hua et al. 2006

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.