Learning languages with rational kernels

Corinna Cortes, Leonid Kontorovich, Mehryar Mohri

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a general study of learning and linear separability with rational kernels, the sequence kernels commonly used in computational biology and natural language processing. We give a characterization of the class of all languages linearly separable with rational kernels and prove several properties of the class of languages linearly separable with a fixed rational kernel. In particular, we show that for kernels with transducer values in a finite set, these languages are necessarily finite Boolean combinations of preimages by a transducer of a single sequence. We also analyze the margin properties of linear separation with rational kernels and show that kernels with transducer values in a finite set guarantee a positive margin and lead to better learning guarantees. Creating a rational kernel with values in a finite set is often non-trivial even for relatively simple cases. However, we present a novel and general algorithm, double-tape disambiguation, that takes as input a transducer mapping sequences to sequence features, and yields an associated transducer that defines a finite range rational kernel. We describe the algorithm in detail and show its application to several cases of interest.

Original language: English (US)
Title of host publication: Learning Theory - 20th Annual Conference on Learning Theory, COLT 2007, Proceedings
Pages: 349-364
Number of pages: 16
State: Published - Dec 1 2007
Event: 20th Annual Conference on Learning Theory, COLT 2007 - San Diego, CA, United States
Duration: Jun 13 2007 to Jun 15 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4539 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 20th Annual Conference on Learning Theory, COLT 2007
Country: United States
City: San Diego, CA
Period: 6/13/07 to 6/15/07

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Cortes, C., Kontorovich, L., & Mohri, M. (2007). Learning languages with rational kernels. In Learning Theory - 20th Annual Conference on Learning Theory, COLT 2007, Proceedings (pp. 349-364). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4539 LNAI).