Application of neural networks to biological data mining: A case study in protein sequence classification

Jason T L Wang, Qicheng Ma, Dennis Shasha, Cathy H. Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Biological data mining aims to extract significant information from DNA, RNA and proteins. Such information may include motifs, functional sites, clustering and classification rules. This paper presents an example of biological data mining: the classification of protein sequences using neural networks. We propose new techniques to extract features from protein data and use them in combination with a Bayesian neural network to classify protein sequences obtained from the PIR protein database maintained at the National Biomedical Research Foundation. To evaluate the performance of the proposed approach, we compare it with other protein classifiers based on sequence alignment and machine learning methods. Experimental results show the high precision of the proposed classifier and the complementarity of the tools studied in the paper.

Original language: English (US)
Title of host publication: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Editors: R. Ramakrishnan, S. Stolfo, R. Bayardo, I. Parsa
Pages: 305-309
Number of pages: 5
State: Published - 2000
Event: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2000) - Boston, MA, United States
Duration: Aug 20 2000 to Aug 23 2000

Fingerprint

Data mining
Neural networks
Proteins
Classifiers
RNA
Learning systems
DNA

Keywords

  • Bioinformatics
  • Biological data mining
  • Feature extraction from protein data
  • Machine learning
  • Neural networks
  • Sequence alignment

ASJC Scopus subject areas

  • Engineering (all)

Cite this

Wang, J. T. L., Ma, Q., Shasha, D., & Wu, C. H. (2000). Application of neural networks to biological data mining: A case study in protein sequence classification. In R. Ramakrishnan, S. Stolfo, R. Bayardo, & I. Parsa (Eds.), Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 305-309).

@inproceedings{410c16dfe28044128d6126299dcaa1c4,
title = "Application of neural networks to biological data mining: A case study in protein sequence classification",
keywords = "Bioinformatics, Biological data mining, Feature extraction from protein data, Machine learning, Neural networks, Sequence alignment",
author = "Wang, {Jason T L} and Qicheng Ma and Dennis Shasha and Wu, {Cathy H.}",
year = "2000",
language = "English (US)",
isbn = "1581132336",
pages = "305--309",
editor = "R. Ramakrishnan and S. Stolfo and R. Bayardo and I. Parsa",
booktitle = "Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
}

TY  - GEN
T1  - Application of neural networks to biological data mining
T2  - A case study in protein sequence classification
AU  - Wang, Jason T L
AU  - Ma, Qicheng
AU  - Shasha, Dennis
AU  - Wu, Cathy H.
PY  - 2000
Y1  - 2000
KW  - Bioinformatics
KW  - Biological data mining
KW  - Feature extraction from protein data
KW  - Machine learning
KW  - Neural networks
KW  - Sequence alignment
UR  - http://www.scopus.com/inward/record.url?scp=0034592803&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0034592803&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:0034592803
SN  - 1581132336
SP  - 305
EP  - 309
BT  - Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
A2  - Ramakrishnan, R.
A2  - Stolfo, S.
A2  - Bayardo, R.
A2  - Parsa, I.
ER  -