Using zero-resource spoken term discovery for ranked retrieval

Jerome White, Douglas W. Oard, Jiaul Paik, Rashmi Sankepally, Aren Jansen

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Research on ranked retrieval of spoken content has assumed the existence of some automated (word or phonetic) transcription. Recently, however, methods have been demonstrated for matching spoken terms to spoken content without the need for language-tuned transcription. This paper describes the first application of such techniques to ranked retrieval, evaluated using a newly created test collection. Both the queries and the collection to be searched are based on Gujarati produced naturally by native speakers; relevance assessment was performed by other native speakers of Gujarati. Ranked retrieval is based on fast acoustic matching that identifies a deeply nested set of matching speech regions, coupled with ways of combining evidence from those matching regions. Results indicate that the resulting ranked lists may be useful for some practical similarity-based ranking tasks.
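
As a rough illustration of what "combining evidence from those matching regions" could look like, the sketch below ranks documents by aggregating the scores of their acoustic matches against a spoken query. It is not the method from the paper: the `Match` record, the `rank_documents` function, and the two combination rules (`sum` and `max`) are hypothetical and chosen only to make the idea concrete.

```python
from collections import defaultdict
from typing import NamedTuple


class Match(NamedTuple):
    """One acoustic match between a query and a document region (hypothetical format)."""
    doc_id: str
    start: float  # start of the matching region in the document, in seconds
    end: float    # end of the matching region in the document, in seconds
    score: float  # match similarity; higher is assumed to mean a better match


def rank_documents(matches, combine="sum"):
    """Rank documents by combining the scores of their matching regions.

    `combine` selects an illustrative evidence-combination rule:
    "sum" rewards documents with many matches, "max" keeps only the best one.
    """
    combiners = {"sum": sum, "max": max}
    if combine not in combiners:
        raise ValueError(f"unknown combination rule: {combine!r}")

    # Group match scores by the document they occur in.
    per_doc = defaultdict(list)
    for m in matches:
        per_doc[m.doc_id].append(m.score)

    # Collapse each document's scores into one value, then sort best-first.
    # Ties are broken by document id so the ordering is deterministic.
    scores = {doc: combiners[combine](vals) for doc, vals in per_doc.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    example = [
        Match("a", 0.0, 1.0, 0.7),
        Match("b", 2.0, 2.5, 0.5),
        Match("b", 4.0, 4.8, 0.25),
    ]
    print(rank_documents(example, combine="sum"))
    print(rank_documents(example, combine="max"))
```

With the example data, the two rules disagree about which document comes first: summing favors the document with two weaker matches, while taking the maximum favors the document with one strong match. That is the kind of trade-off an evidence-combination step has to settle.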

Original language: English (US)
Title of host publication: NAACL HLT 2015 - 2015 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publication: Human Language Technologies, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 588-597
Number of pages: 10
ISBN (Electronic): 9781941643495
State: Published - Jan 1 2015
Event: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2015 - Denver, United States
Duration: May 31 2015 to Jun 5 2015

Fingerprint

Transcription
Speech analysis
Resources
Acoustics
Phonetics
Ranking
Language
Evidence
Native Speaker

ASJC Scopus subject areas

  • Computer Science Applications
  • Language and Linguistics
  • Linguistics and Language

Cite this

White, J., Oard, D. W., Paik, J., Sankepally, R., & Jansen, A. (2015). Using zero-resource spoken term discovery for ranked retrieval. In NAACL HLT 2015 - 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (pp. 588-597). Association for Computational Linguistics (ACL).

@inproceedings{734fa958a68d45a39621ccbf8f738df8,
title = "Using zero-resource spoken term discovery for ranked retrieval",
abstract = "Research on ranked retrieval of spoken content has assumed the existence of some automated (word or phonetic) transcription. Recently, however, methods have been demonstrated for matching spoken terms to spoken content without the need for language-tuned transcription. This paper describes the first application of such techniques to ranked retrieval, evaluated using a newly created test collection. Both the queries and the collection to be searched are based on Gujarati produced naturally by native speakers; relevance assessment was performed by other native speakers of Gujarati. Ranked retrieval is based on fast acoustic matching that identifies a deeply nested set of matching speech regions, coupled with ways of combining evidence from those matching regions. Results indicate that the resulting ranked lists may be useful for some practical similarity-based ranking tasks.",
author = "Jerome White and Oard, {Douglas W.} and Jiaul Paik and Rashmi Sankepally and Aren Jansen",
year = "2015",
month = "1",
day = "1",
language = "English (US)",
pages = "588--597",
booktitle = "NAACL HLT 2015 - 2015 Conference of the North American Chapter of the Association for Computational Linguistics",
publisher = "Association for Computational Linguistics (ACL)",
}

TY - GEN
T1 - Using zero-resource spoken term discovery for ranked retrieval
AU - White, Jerome
AU - Oard, Douglas W.
AU - Paik, Jiaul
AU - Sankepally, Rashmi
AU - Jansen, Aren
PY - 2015/1/1
Y1 - 2015/1/1
AB - Research on ranked retrieval of spoken content has assumed the existence of some automated (word or phonetic) transcription. Recently, however, methods have been demonstrated for matching spoken terms to spoken content without the need for language-tuned transcription. This paper describes the first application of such techniques to ranked retrieval, evaluated using a newly created test collection. Both the queries and the collection to be searched are based on Gujarati produced naturally by native speakers; relevance assessment was performed by other native speakers of Gujarati. Ranked retrieval is based on fast acoustic matching that identifies a deeply nested set of matching speech regions, coupled with ways of combining evidence from those matching regions. Results indicate that the resulting ranked lists may be useful for some practical similarity-based ranking tasks.
UR - http://www.scopus.com/inward/record.url?scp=84960124463&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84960124463&partnerID=8YFLogxK
M3 - Conference contribution
SP - 588
EP - 597
BT - NAACL HLT 2015 - 2015 Conference of the North American Chapter of the Association for Computational Linguistics
PB - Association for Computational Linguistics (ACL)
ER -