New techniques for DNA sequence classification

Jason T L Wang, Steve Rozen, Bruce A. Shapiro, Dennis Shasha, Zhiyuan Wang, Maisheng Yin

Research output: Contribution to journal › Article

Abstract

DNA sequence classification is the activity of determining whether or not an unlabeled sequence S belongs to an existing class C. This paper proposes two new techniques for DNA sequence classification. The first technique works by comparing the unlabeled sequence S with a group of active motifs discovered from the elements of C and by distinction with elements outside of C. The second technique generates and matches gapped fingerprints of S with elements of C. Experimental results obtained by running these algorithms on long and well conserved Alu sequences demonstrate the good performance of the presented methods compared with FASTA. When applied to less conserved and relatively short functional sites such as splice junctions, a variation of the second technique combining fingerprinting with consensus sequence analysis gives better results than the current classifiers employing text compression and machine learning algorithms.
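
The abstract does not spell out how gapped fingerprints are built or scored, so the following is only a minimal illustrative sketch of the general idea, not the authors' algorithm. It assumes a fingerprint is a fixed-width window of S in which some positions are wildcards, and that S is assigned to C when enough of its fingerprints also occur in members of C. The function names, the `mask` pattern, and the `threshold` value are all hypothetical.

```python
from typing import Iterable, Set


def gapped_fingerprints(seq: str, mask: str = "1101") -> Set[str]:
    """Slide a window of len(mask) over seq; keep bases where the mask
    has '1' and write '*' where it has '0' (the assumed "gap")."""
    width = len(mask)
    prints = set()
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        prints.add("".join(b if m == "1" else "*" for b, m in zip(window, mask)))
    return prints


def fingerprint_score(s: str, class_seqs: Iterable[str], mask: str = "1101") -> float:
    """Fraction of the gapped fingerprints of s that also occur in at
    least one member of the class (an assumed scoring rule)."""
    s_prints = gapped_fingerprints(s, mask)
    if not s_prints:
        return 0.0  # s is shorter than the mask, so nothing to match
    class_prints: Set[str] = set()
    for member in class_seqs:
        class_prints |= gapped_fingerprints(member, mask)
    return len(s_prints & class_prints) / len(s_prints)


def classify(s: str, class_seqs: Iterable[str], mask: str = "1101",
             threshold: float = 0.5) -> bool:
    """Assign s to the class when its score reaches the (hypothetical) threshold."""
    return fingerprint_score(s, class_seqs, mask) >= threshold
```

In this sketch the wildcard positions let a fingerprint tolerate a point substitution, which is one plausible reason to prefer gapped over contiguous windows; the mask and threshold would need tuning on labeled data.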

Original language: English (US)
Pages (from-to): 209-218
Number of pages: 10
Journal: Journal of Computational Biology
Volume: 6
Issue number: 2
State: Published - Jun 1999

Keywords

  • Algorithms
  • Consensus sequence
  • DNA sequence recognition
  • Pattern matching
  • Tools for computational biology

ASJC Scopus subject areas

  • Molecular Biology
  • Genetics

Cite this

Wang, J. T. L., Rozen, S., Shapiro, B. A., Shasha, D., Wang, Z., & Yin, M. (1999). New techniques for DNA sequence classification. Journal of Computational Biology, 6(2), 209-218.

@article{6cdfc2022acb49a5941b2ed7a147d5e1,
title = "New techniques for DNA sequence classification",
abstract = "DNA sequence classification is the activity of determining whether or not an unlabeled sequence S belongs to an existing class C. This paper proposes two new techniques for DNA sequence classification. The first technique works by comparing the unlabeled sequence S with a group of active motifs discovered from the elements of C and by distinction with elements outside of C. The second technique generates and matches gapped fingerprints of S with elements of C. Experimental results obtained by running these algorithms on long and well conserved Alu sequences demonstrate the good performance of the presented methods compared with FASTA. When applied to less conserved and relatively short functional sites such as splice junctions, a variation of the second technique combining fingerprinting with consensus sequence analysis gives better results than the current classifiers employing text compression and machine learning algorithms.",
keywords = "Algorithms, Consensus sequence, DNA sequence recognition, Pattern matching, Tools for computational biology",
author = "Wang, {Jason T L} and Steve Rozen and Shapiro, {Bruce A.} and Dennis Shasha and Zhiyuan Wang and Maisheng Yin",
year = "1999",
month = "6",
language = "English (US)",
volume = "6",
pages = "209--218",
journal = "Journal of Computational Biology",
issn = "1066-5277",
publisher = "Mary Ann Liebert Inc.",
number = "2",

}

TY - JOUR

T1 - New techniques for DNA sequence classification

AU - Wang, Jason T L

AU - Rozen, Steve

AU - Shapiro, Bruce A.

AU - Shasha, Dennis

AU - Wang, Zhiyuan

AU - Yin, Maisheng

PY - 1999/6

Y1 - 1999/6

N2 - DNA sequence classification is the activity of determining whether or not an unlabeled sequence S belongs to an existing class C. This paper proposes two new techniques for DNA sequence classification. The first technique works by comparing the unlabeled sequence S with a group of active motifs discovered from the elements of C and by distinction with elements outside of C. The second technique generates and matches gapped fingerprints of S with elements of C. Experimental results obtained by running these algorithms on long and well conserved Alu sequences demonstrate the good performance of the presented methods compared with FASTA. When applied to less conserved and relatively short functional sites such as splice junctions, a variation of the second technique combining fingerprinting with consensus sequence analysis gives better results than the current classifiers employing text compression and machine learning algorithms.

KW - Algorithms

KW - Consensus sequence

KW - DNA sequence recognition

KW - Pattern matching

KW - Tools for computational biology

UR - http://www.scopus.com/inward/record.url?scp=0033052632&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0033052632&partnerID=8YFLogxK

M3 - Article

VL - 6

SP - 209

EP - 218

JO - Journal of Computational Biology

JF - Journal of Computational Biology

SN - 1066-5277

IS - 2

ER -