Vapor engine: Demonstrating an early prototype of a language-independent search engine for speech

Douglas W. Oard, Rashmi Sankepally, Jerome White, Craig Harman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Typical search engines for spoken content begin with some form of language-specific audio processing such as phonetic word recognition. Many languages, however, lack the language-tuned preprocessing tools that are needed to create indexing terms for speech. One approach in such cases is to rely on repetition, detected using acoustic features, to find terms that might be worth indexing. Experiments have shown that this approach yields term sets that might be sufficient for some applications in both spoken term detection and ranked retrieval experiments. Such approaches currently work only with spoken queries, however, and only when the searcher is able to speak in a manner similar to that of the speakers in the collection. This demonstration paper proposes Vapor Engine, a new tool for selectively transcribing repeated terms that can be automatically detected from spoken content in any language. These transcribed terms could then be matched to queries formulated using written terms. Vapor Engine is early in development: it currently supports only single-term queries and has not yet been formally evaluated. This paper introduces the interface and summarizes the challenges it seeks to address.
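The abstract describes the pipeline only at a high level. As a rough illustration, the following Python sketch shows how repetition detection, selective transcription, and single-term written-query lookup could fit together. It is not the Vapor Engine implementation: the `Segment` type, the functions `dtw_distance`, `cluster_repeats`, `build_index`, and `search`, the threshold value, and the toy two-dimensional feature vectors are all hypothetical. It also swaps in plain whole-segment dynamic time warping (DTW) over pre-cut segments, whereas real unsupervised term discovery must also find where repeated stretches of audio begin and end.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical type (not from the paper): a candidate audio segment in one
# recording, represented by a sequence of acoustic feature vectors.
@dataclass(frozen=True)
class Segment:
    doc_id: str
    start_s: float
    end_s: float
    features: tuple  # tuple of per-frame feature tuples


def dtw_distance(a, b):
    """Whole-sequence DTW cost between two feature sequences, normalised by n + m."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between frame i-1 of a and frame j-1 of b.
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def cluster_repeats(segments, threshold):
    """Single-link grouping of segments whose DTW distance is below threshold."""
    parent = list(range(len(segments)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if dtw_distance(segments[i].features, segments[j].features) < threshold:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, seg in enumerate(segments):
        groups[find(i)].append(seg)
    # Keep only groups with two or more members: these are the repeated "terms".
    return [g for g in groups.values() if len(g) >= 2]


def build_index(clusters, transcriptions):
    """Map each written label a person assigned (cluster index -> label) to
    every occurrence in that cluster. Untranscribed clusters stay unsearchable."""
    index = defaultdict(list)
    for cluster_id, label in transcriptions.items():
        for seg in clusters[cluster_id]:
            index[label.lower()].append((seg.doc_id, seg.start_s, seg.end_s))
    return index


def search(index, query):
    """Single-term written query: exact, case-insensitive match on the label."""
    return index.get(query.strip().lower(), [])


if __name__ == "__main__":
    segs = [
        Segment("talk1", 3.0, 3.4, ((0.0, 1.0), (0.1, 1.1), (0.2, 1.0))),
        Segment("talk2", 7.5, 7.9, ((0.0, 1.0), (0.2, 1.1), (0.2, 1.0))),
        Segment("talk2", 12.0, 12.3, ((5.0, -2.0), (5.1, -2.0))),
    ]
    clusters = cluster_repeats(segs, threshold=0.5)
    # A person listens to one example from cluster 0 and types a transcription.
    index = build_index(clusters, {0: "water"})
    print(search(index, "Water"))  # prints [('talk1', 3.0, 3.4), ('talk2', 7.5, 7.9)]
```

The point of the sketch is the division of labour the abstract describes: acoustic matching finds repetitions without any language-specific tools, and a single typed transcription per cluster makes every occurrence in that cluster reachable from a written query. Single-link clustering is the simplest choice but can chain distinct terms together when the threshold is loose, and the all-pairs comparison is quadratic in the number of segments.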

Original language: English (US)
Title of host publication: CHIIR 2016 - Proceedings of the 2016 ACM Conference on Human Information Interaction and Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 301-304
Number of pages: 4
ISBN (Electronic): 9781450337519
DOIs: https://doi.org/10.1145/2854946.2854987
State: Published - Mar 13 2016
Event: ACM Conference on Human Information Interaction and Retrieval, CHIIR 2016 - Carrboro, United States
Duration: Mar 13 2016 to Mar 17 2016

Other

Other: ACM Conference on Human Information Interaction and Retrieval, CHIIR 2016
Country: United States
City: Carrboro
Period: 3/13/16 to 3/17/16

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Information Systems

Cite this

Oard, D. W., Sankepally, R., White, J., & Harman, C. (2016). Vapor engine: Demonstrating an early prototype of a language-independent search engine for speech. In CHIIR 2016 - Proceedings of the 2016 ACM Conference on Human Information Interaction and Retrieval (pp. 301-304). Association for Computing Machinery, Inc. https://doi.org/10.1145/2854946.2854987

Vapor engine : Demonstrating an early prototype of a language-independent search engine for speech. / Oard, Douglas W.; Sankepally, Rashmi; White, Jerome; Harman, Craig.

CHIIR 2016 - Proceedings of the 2016 ACM Conference on Human Information Interaction and Retrieval. Association for Computing Machinery, Inc, 2016. p. 301-304.

Oard, DW, Sankepally, R, White, J & Harman, C 2016, Vapor engine: Demonstrating an early prototype of a language-independent search engine for speech. in CHIIR 2016 - Proceedings of the 2016 ACM Conference on Human Information Interaction and Retrieval. Association for Computing Machinery, Inc, pp. 301-304, ACM Conference on Human Information Interaction and Retrieval, CHIIR 2016, Carrboro, United States, 3/13/16. https://doi.org/10.1145/2854946.2854987

Oard DW, Sankepally R, White J, Harman C. Vapor engine: Demonstrating an early prototype of a language-independent search engine for speech. In CHIIR 2016 - Proceedings of the 2016 ACM Conference on Human Information Interaction and Retrieval. Association for Computing Machinery, Inc. 2016. p. 301-304 https://doi.org/10.1145/2854946.2854987

Oard, Douglas W. ; Sankepally, Rashmi ; White, Jerome ; Harman, Craig. / Vapor engine : Demonstrating an early prototype of a language-independent search engine for speech. CHIIR 2016 - Proceedings of the 2016 ACM Conference on Human Information Interaction and Retrieval. Association for Computing Machinery, Inc, 2016. pp. 301-304

@inproceedings{9741f37ff7244c8f9e190a0fd7772cf7,
title = "Vapor engine: Demonstrating an early prototype of a language-independent search engine for speech",
abstract = "Typical search engines for spoken content begin with some form of language-specific audio processing such as phonetic word recognition. Many languages, however, lack the language-tuned preprocessing tools that are needed to create indexing terms for speech. One approach in such cases is to rely on repetition, detected using acoustic features, to find terms that might be worth indexing. Experiments have shown that this approach yields term sets that might be sufficient for some applications in both spoken term detection and ranked retrieval experiments. Such approaches currently work only with spoken queries, however, and only when the searcher is able to speak in a manner similar to that of the speakers in the collection. This demonstration paper proposes Vapor Engine, a new tool for selectively transcribing repeated terms that can be automatically detected from spoken content in any language. These transcribed terms could then be matched to queries formulated using written terms. Vapor Engine is early in development: it currently supports only single-term queries and has not yet been formally evaluated. This paper introduces the interface and summarizes the challenges it seeks to address.",
author = "Oard, {Douglas W.} and Rashmi Sankepally and Jerome White and Craig Harman",
year = "2016",
month = "3",
day = "13",
doi = "10.1145/2854946.2854987",
language = "English (US)",
pages = "301--304",
booktitle = "CHIIR 2016 - Proceedings of the 2016 ACM Conference on Human Information Interaction and Retrieval",
publisher = "Association for Computing Machinery, Inc",

}

TY - GEN

T1 - Vapor engine: Demonstrating an early prototype of a language-independent search engine for speech

AU - Oard, Douglas W.

AU - Sankepally, Rashmi

AU - White, Jerome

AU - Harman, Craig

PY - 2016/3/13

Y1 - 2016/3/13

N2 - Typical search engines for spoken content begin with some form of language-specific audio processing such as phonetic word recognition. Many languages, however, lack the language-tuned preprocessing tools that are needed to create indexing terms for speech. One approach in such cases is to rely on repetition, detected using acoustic features, to find terms that might be worth indexing. Experiments have shown that this approach yields term sets that might be sufficient for some applications in both spoken term detection and ranked retrieval experiments. Such approaches currently work only with spoken queries, however, and only when the searcher is able to speak in a manner similar to that of the speakers in the collection. This demonstration paper proposes Vapor Engine, a new tool for selectively transcribing repeated terms that can be automatically detected from spoken content in any language. These transcribed terms could then be matched to queries formulated using written terms. Vapor Engine is early in development: it currently supports only single-term queries and has not yet been formally evaluated. This paper introduces the interface and summarizes the challenges it seeks to address.

AB - Typical search engines for spoken content begin with some form of language-specific audio processing such as phonetic word recognition. Many languages, however, lack the language-tuned preprocessing tools that are needed to create indexing terms for speech. One approach in such cases is to rely on repetition, detected using acoustic features, to find terms that might be worth indexing. Experiments have shown that this approach yields term sets that might be sufficient for some applications in both spoken term detection and ranked retrieval experiments. Such approaches currently work only with spoken queries, however, and only when the searcher is able to speak in a manner similar to that of the speakers in the collection. This demonstration paper proposes Vapor Engine, a new tool for selectively transcribing repeated terms that can be automatically detected from spoken content in any language. These transcribed terms could then be matched to queries formulated using written terms. Vapor Engine is early in development: it currently supports only single-term queries and has not yet been formally evaluated. This paper introduces the interface and summarizes the challenges it seeks to address.

UR - http://www.scopus.com/inward/record.url?scp=84974530709&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84974530709&partnerID=8YFLogxK

U2 - 10.1145/2854946.2854987

DO - 10.1145/2854946.2854987

M3 - Conference contribution

SP - 301

EP - 304

BT - CHIIR 2016 - Proceedings of the 2016 ACM Conference on Human Information Interaction and Retrieval

PB - Association for Computing Machinery, Inc

ER -