Open Access

Gene Selection for Multiclass Prediction by Weighted Fisher Criterion

  • Jianhua Xuan (1),
  • Yue Wang (1),
  • Yibin Dong (1),
  • Yuanjian Feng (1),
  • Bin Wang (1),
  • Javed Khan (2),
  • Maria Bakay (3),
  • Zuyi Wang (1, 3),
  • Lauren Pachman (4),
  • Sara Winokur (5),
  • Yi-Wen Chen (3),
  • Robert Clarke (6) and
  • Eric Hoffman (3)
EURASIP Journal on Bioinformatics and Systems Biology 2007, 2007:64628

DOI: 10.1155/2007/64628

Received: 30 August 2006

Accepted: 20 March 2007

Published: 10 July 2007

Abstract

Gene expression profiling has been widely used to study molecular signatures of many diseases and to develop molecular diagnostics for disease prediction. Gene selection, an important step toward improved diagnostics, screens tens of thousands of genes and identifies a small subset that discriminates between disease types. A two-step gene selection method is proposed to identify informative gene subsets for accurate classification of multiclass phenotypes. In the first step, individually discriminatory genes (IDGs) are identified using a one-dimensional weighted Fisher criterion (wFC). In the second step, jointly discriminatory genes (JDGs) are selected from the IDGs by sequential search methods, based on their joint class separability as measured by a multidimensional wFC. The performance of the selected gene subsets for multiclass prediction is evaluated by artificial neural networks (ANNs) and/or support vector machines (SVMs). By applying the proposed IDG/JDG approach to two microarray studies, namely small round blue cell tumors (SRBCTs) and muscular dystrophies (MDs), we identified a much smaller yet efficient set of JDGs for diagnosing SRBCTs and MDs with high prediction accuracies (96.9% for SRBCTs and 92.3% for MDs, respectively). These experimental results demonstrate that the two-step gene selection method is able to identify a subset of highly discriminative genes for improved multiclass prediction.
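The two-step IDG/JDG procedure described in the abstract can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not give the wFC formula, so the code assumes the erf-based pairwise weighting of Loog et al. [17], a pseudo-inverse for a possibly singular within-class scatter matrix, and plain sequential forward search as the "sequential search method". The function names `weighted_fisher`, `select_idgs`, and `select_jdgs` are hypothetical.

```python
import math

import numpy as np


def weighted_fisher(X, y):
    """Weighted Fisher score of a gene subset (assumed form, after Loog et al.).

    X: (n_samples,) or (n_samples, n_genes) expression values.
    y: (n_samples,) class labels.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # a single gene is the one-dimensional case
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / counts.sum()
    means = np.array([X[y == c].mean(axis=0) for c in classes])

    # Prior-weighted within-class scatter matrix.
    d = X.shape[1]
    Sw = np.zeros((d, d))
    for p, c, m in zip(priors, classes, means):
        Z = X[y == c] - m
        Sw += p * (Z.T @ Z) / len(Z)
    Sw_inv = np.linalg.pinv(Sw)  # assumption: pseudo-inverse if Sw is singular

    score = 0.0
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            diff = means[i] - means[j]
            # Mahalanobis distance between the two class means.
            delta = math.sqrt(max(float(diff @ Sw_inv @ diff), 0.0))
            # Assumed weight w(delta) = erf(delta / (2*sqrt(2))) / (2*delta^2);
            # multiplied by delta^2 it reduces to the bounded term below, so
            # already well-separated class pairs cannot dominate the score.
            score += priors[i] * priors[j] * 0.5 * math.erf(delta / (2 * math.sqrt(2)))
    return score


def select_idgs(X, y, k):
    """Step 1: rank genes individually and keep the top k as IDGs."""
    scores = np.array([weighted_fisher(X[:, g], y) for g in range(X.shape[1])])
    order = np.argsort(scores)[::-1][:k]
    return [int(g) for g in order], scores


def select_jdgs(X, y, candidates, m):
    """Step 2: sequential forward search for m JDGs among the candidates."""
    selected, remaining = [], list(candidates)
    while len(selected) < m and remaining:
        # Add the gene that maximizes the joint score of the enlarged subset.
        best = max(remaining, key=lambda g: weighted_fisher(X[:, selected + [g]], y))
        selected.append(best)
        remaining.remove(best)
    return selected
```

In this sketch, `select_idgs` would first be called on the full expression matrix (samples in rows, genes in columns), and its output passed as `candidates` to `select_jdgs`; the resulting subset would then be handed to a classifier such as an ANN or SVM for evaluation.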

[1-47]

Authors’ Affiliations

(1) Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University
(2) Department of Pediatric Oncology, National Cancer Institute
(3) Research Center for Genetic Medicine, Children's National Medical Center
(4) Disease Pathogenesis Program, Children's Memorial Research Center
(5) Department of Biological Chemistry, University of California
(6) Lombardi Cancer Center, Georgetown University

References

  1. Bittner M, Meltzer P, Chen Y, et al.: Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature 2000, 406(6795):536-540. doi:10.1038/35020115
  2. Golub TR, Slonim DK, Tamayo P, et al.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999, 286(5439):531-537. doi:10.1126/science.286.5439.531
  3. Shipp MA, Ross KN, Tamayo P, et al.: Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nature Medicine 2002, 8(1):68-74. doi:10.1038/nm0102-68
  4. Liotta L, Petricoin E: Molecular profiling of human cancer. Nature Reviews Genetics 2000, 1(1):48-56. doi:10.1038/35049567
  5. Jain AK, Duin RPW, Mao J: Statistical pattern recognition: a review. IEEE Transactions on Pattern Analysis and Machine Intelligence 2000, 22(1):4-37. doi:10.1109/34.824819
  6. Jain AK, Zongker D: Feature selection: evaluation, application, and small sample performance. IEEE Transactions on Pattern Analysis and Machine Intelligence 1997, 19(2):153-158. doi:10.1109/34.574797
  7. Raudys SJ, Jain AK: Small sample size effects in statistical pattern recognition: recommendations for practitioners. IEEE Transactions on Pattern Analysis and Machine Intelligence 1991, 13(3):252-264. doi:10.1109/34.75512
  8. Fukunaga K: Introduction to Statistical Pattern Recognition. 2nd edition. Academic Press, Boston, Mass, USA; 1990.
  9. Devijver PA, Kittler J: Pattern Recognition: A Statistical Approach. Prentice-Hall, Englewood Cliffs, NJ, USA; 1982.
  10. Pudil P, Novovicova J, Kittler J: Floating search methods in feature selection. Pattern Recognition Letters 1994, 15(11):1119-1125. doi:10.1016/0167-8655(94)90127-9
  11. Dudoit S, Fridlyand J, Speed TP: Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association 2002, 97(457):77-87. doi:10.1198/016214502753479248
  12. Khan J, Wei JS, Ringnér M, et al.: Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine 2001, 7(6):673-679. doi:10.1038/89044
  13. Li T, Zhang C, Ogihara M: A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression. Bioinformatics 2004, 20(15):2429-2437. doi:10.1093/bioinformatics/bth267
  14. Tibshirani R, Hastie T, Narasimhan B, Chu G: Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences of the United States of America 2002, 99(10):6567-6572. doi:10.1073/pnas.082099299
  15. Xiong M, Fang X, Zhao J: Biomarker identification by feature wrappers. Genome Research 2001, 11(11):1878-1887.
  16. Loog M: Approximate Pairwise Accuracy Criteria for Multiclass Linear Dimension Reduction: Generalisations of the Fisher Criterion. Delft University Press, Delft, The Netherlands; 1999.
  17. Loog M, Duin RPW, Haeb-Umbach R: Multiclass linear dimension reduction by weighted pairwise Fisher criteria. IEEE Transactions on Pattern Analysis and Machine Intelligence 2001, 23(7):762-766. doi:10.1109/34.935849
  18. Koop JC: Generalized inverse of a singular matrix. Nature 1963, 200:716.
  19. Press WH, Flannery BP, Teukolsky SA, Vetterling WT: Numerical Recipes: The Art of Scientific Computing. Cambridge University Press, New York, NY, USA; 1986.
  20. Narendra PM, Fukunaga K: A branch and bound algorithm for feature subset selection. IEEE Transactions on Computers 1977, 26(9):917-922.
  21. Marill T, Green DM: On the effectiveness of receptors in recognition systems. IEEE Transactions on Information Theory 1963, 9:11-17. doi:10.1109/TIT.1963.1057810
  22. Whitney AW: A direct method of nonparametric measurement selection. IEEE Transactions on Computers 1971, 20(9):1100-1103.
  23. Stearns SD: On selecting features for pattern classifiers. Proceedings of the 3rd International Conference on Pattern Recognition, Coronado, Calif, USA, November 1976, 71-75.
  24. Haykin S: Neural Networks: A Comprehensive Foundation. 2nd edition. Prentice-Hall, Upper Saddle River, NJ, USA; 1999.
  25. Lee Y, Lee C-K: Classification of multiple cancer types by multicategory support vector machines using gene expression data. Bioinformatics 2003, 19(9):1132-1139. doi:10.1093/bioinformatics/btg102
  26. Ramaswamy S, Tamayo P, Rifkin R, et al.: Multiclass cancer diagnosis using tumor gene expression signatures. Proceedings of the National Academy of Sciences of the United States of America 2001, 98(26):15149-15154. doi:10.1073/pnas.211566398
  27. Bakay M, Chen Y-W, Borup R, Zhao P, Nagaraju K, Hoffman E: Sources of variability and effect of experimental approach on expression profiling data interpretation. BMC Bioinformatics 2002, 3(1):4. doi:10.1186/1471-2105-3-4
  28. Bakay M, Wang Z, Melcon G, et al.: Nuclear envelope dystrophies show a transcriptional fingerprint suggesting disruption of Rb-MyoD pathways in muscle regeneration. Brain 2006, 129(4):996-1013. doi:10.1093/brain/awl023
  29. Affymetrix Technical Note: Statistical algorithms description document. Affymetrix 2002. [http://www.affymetrix.com/support/technical/whitepapers/sadd_whitepaper.pdf]
  30. Zhao P, Seo J, Wang Z, Wang Y, Shneiderman B, Hoffman E: In vivo filtering of in vitro expression data reveals MyoD targets. Comptes Rendus - Biologies 2003, 326(10-11):1049-1065. doi:10.1016/j.crvi.2003.09.035
  31. Zhao P, Hoffman E: Embryonic myogenesis pathways in muscle regeneration. Developmental Dynamics 2004, 229(2):380-392. doi:10.1002/dvdy.10457
  32. Winokur S, Chen Y-W, Masny PS, et al.: Expression profiling of FSHD muscle supports a defect in specific stages of myogenic differentiation. Human Molecular Genetics 2003, 12(22):2895-2907. doi:10.1093/hmg/ddg327
  33. Bakay M, Zhao P, Chen J, Hoffman E: A web-accessible complete transcriptome of normal human and DMD muscle. Neuromuscular Disorders 2002, 12(1):S125-S141.
  34. Chen Y-W, Zhao P, Borup R, Hoffman E: Expression profiling in the muscular dystrophies: identification of novel aspects of molecular pathophysiology. Journal of Cell Biology 2000, 151(6):1321-1336. doi:10.1083/jcb.151.6.1321
  35. Hoffman E, Brown RH Jr., Kunkel LM: Dystrophin: the protein product of the Duchenne muscular dystrophy locus. Cell 1987, 51(6):919-928. doi:10.1016/0092-8674(87)90579-4
  36. Koenig M, Hoffman E, Bertelson CJ, Monaco AP, Feener C, Kunkel LM: Complete cloning of the Duchenne muscular dystrophy (DMD) cDNA and preliminary genomic organization of the DMD gene in normal and affected individuals. Cell 1987, 50(3):509-517. doi:10.1016/0092-8674(87)90504-6
  37. Zhao P, Iezzi S, Carver E, et al.: Slug is a novel downstream target of MyoD. Temporal profiling in muscle regeneration. Journal of Biological Chemistry 2002, 277(33):30091-30101. doi:10.1074/jbc.M202668200
  38. Fernandes RJ, Skiena SS: Microarray synthesis through multiple-use PCR primer design. Bioinformatics 2002, 18(1):S128-S135. doi:10.1093/bioinformatics/18.suppl_1.S128
  39. Jaeger J, Weichenhan D, Ivandic B, Spang R: Early diagnostic marker panel determination for microarray based clinical studies. Statistical Applications in Genetics and Molecular Biology 2005, 4(1):article 9.
  40. Li W: How many genes are needed for early detection of breast cancer, based on gene expression patterns in peripheral blood cells? Breast Cancer Research 2005, 7(5):E5. doi:10.1186/bcr1295
  41. Glas AM, Floore A, Delahaye LJMJ, et al.: Converting a breast cancer microarray signature into a high-throughput diagnostic test. BMC Genomics 2006, 7:278. doi:10.1186/1471-2164-7-278
  42. Duin RPW: Classifiers in almost empty spaces. Proceedings of the 15th International Conference on Pattern Recognition (ICPR '00), Barcelona, Spain, September 2000, 2:1-7.
  43. Raudys SJ: Evolution and generalization of a single neurone: I. Single-layer perceptron as seven statistical classifiers. Neural Networks 1998, 11(2):283-296. doi:10.1016/S0893-6080(97)00135-4
  44. Raudys SJ: Evolution and generalization of a single neurone: II. Complexity of statistical classifiers and sample size considerations. Neural Networks 1998, 11(2):297-313. doi:10.1016/S0893-6080(97)00136-6
  45. Raudys SJ, Duin RPW: Expected classification error of the Fisher linear classifier with pseudo-inverse covariance matrix. Pattern Recognition Letters 1998, 19(5-6):385-392. doi:10.1016/S0167-8655(98)00016-6
  46. Vapnik VN: Statistical Learning Theory. John Wiley & Sons, New York, NY, USA; 1998.
  47. Oja E: Subspace Methods of Pattern Recognition. John Wiley & Sons, New York, NY, USA; 1984.

Copyright

© Jianhua Xuan et al. 2007

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.