Rational kernels: Theory and algorithms

Corinna Cortes, Patrick Haffner, Mehryar Mohri

Research output: Contribution to journal › Article

Abstract

Many classification algorithms were originally designed for fixed-size vectors. Recent applications in text and speech processing and computational biology require, however, the analysis of variable-length sequences and, more generally, weighted automata. An approach widely used in statistical learning techniques such as Support Vector Machines (SVMs) is that of kernel methods, due to their computational efficiency in high-dimensional feature spaces. We introduce a general family of kernels based on weighted transducers or rational relations, rational kernels, that extend kernel methods to the analysis of variable-length sequences or more generally weighted automata. We show that rational kernels can be computed efficiently using a general algorithm of composition of weighted transducers and a general single-source shortest-distance algorithm. Not all rational kernels are positive definite and symmetric (PDS), or equivalently verify the Mercer condition, a condition that guarantees the convergence of training for discriminant classification algorithms such as SVMs. We present several theoretical results related to PDS rational kernels. We show that under some general conditions these kernels are closed under sum, product, or Kleene-closure and give a general method for constructing a PDS rational kernel from an arbitrary transducer defined on some non-idempotent semirings. We give the proof of several characterization results that can be used to guide the design of PDS rational kernels. We also show that some commonly used string kernels or similarity measures such as the edit-distance, the convolution kernels of Haussler, and some string kernels used in the context of computational biology are specific instances of rational kernels. Our results include the proof that the edit-distance over a non-trivial alphabet is not negative definite, which, to the best of our knowledge, was never stated or proved before.
Rational kernels can be combined with SVMs to form efficient and powerful techniques for a variety of classification tasks in text and speech processing, or computational biology. We describe examples of general families of PDS rational kernels that are useful in many of these applications and report the result of our experiments illustrating the use of rational kernels in several difficult large-vocabulary spoken-dialog classification tasks based on deployed spoken-dialog systems. Our results show that rational kernels are easy to design and implement and lead to substantial improvements of the classification accuracy.
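The claim that the edit-distance is not negative definite can be checked numerically on a small example. The sketch below is an independent illustration, not the proof given in the article. It assumes the usual definition: a symmetric function d is negative definite if the sum of c_i c_j d(x_i, x_j) is at most 0 for every choice of points x_i and coefficients c_i that sum to zero. It also assumes unit costs for insertions, deletions, and substitutions. The five strings and the coefficients are chosen here for illustration; the names `edit_distance` and `quadratic_form` are hypothetical helpers.

```python
def edit_distance(s, t):
    """Levenshtein distance with unit-cost insertion, deletion, substitution."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, a in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty prefix of t
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,              # delete a from s
                curr[j - 1] + 1,          # insert b into s
                prev[j - 1] + (a != b),   # substitute a by b, or match
            ))
        prev = curr
    return prev[-1]


def quadratic_form(strings, coeffs):
    """Sum over all ordered pairs (i, j) of c_i * c_j * d(x_i, x_j)."""
    return sum(
        ci * cj * edit_distance(x, y)
        for ci, x in zip(coeffs, strings)
        for cj, y in zip(coeffs, strings)
    )


# Illustrative choice over the two-letter alphabet {a, b}: the first two
# strings are at distance 2 from each other, the last three are pairwise at
# distance 2, and every string of one group is at distance 1 from every
# string of the other group.
strings = ["aba", "bab", "ab", "ba", "abab"]
coeffs = [3, 3, -2, -2, -2]  # coefficients sum to zero

if __name__ == "__main__":
    print(sum(coeffs), quadratic_form(strings, coeffs))
```

With these values the coefficients sum to zero while the quadratic form is strictly positive, so under the stated definition the edit-distance restricted to these five strings already fails to be negative definite.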

Original language: English (US)
Pages (from-to): 1035-1062
Number of pages: 28
Journal: Journal of Machine Learning Research
Volume: 5
State: Published - Aug 1 2004

Fingerprint

kernel
Text processing
Support vector machines
Transducers
Speech processing
Positive definite
Computational Biology
Computational efficiency
Convolution
Transducer
Weighted Automata
Text Processing
Speech Processing
Support Vector Machine
Edit Distance
Kernel Methods
Classification Algorithm
Chemical analysis
Strings
Spoken Dialogue Systems

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

Cite this

Rational kernels: Theory and algorithms. / Cortes, Corinna; Haffner, Patrick; Mohri, Mehryar.

In: Journal of Machine Learning Research, Vol. 5, 01.08.2004, p. 1035-1062.

@article{d3e3bf7a9c764cfbbe8920265e19d1eb,
title = "Rational kernels: Theory and algorithms",
author = "Corinna Cortes and Patrick Haffner and Mehryar Mohri",
year = "2004",
month = "8",
day = "1",
language = "English (US)",
volume = "5",
pages = "1035--1062",
journal = "Journal of Machine Learning Research",
issn = "1532-4435",
publisher = "Microtome Publishing",
}

TY - JOUR
T1 - Rational kernels
T2 - Theory and algorithms
AU - Cortes, Corinna
AU - Haffner, Patrick
AU - Mohri, Mehryar
PY - 2004/8/1
Y1 - 2004/8/1
UR - http://www.scopus.com/inward/record.url?scp=84925661323&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84925661323&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:84925661323
VL - 5
SP - 1035
EP - 1062
JO - Journal of Machine Learning Research
JF - Journal of Machine Learning Research
SN - 1532-4435
ER -