Updating a name tagger using contemporary unlabeled data

Cristina Mota, Ralph Grishman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

For many NLP tasks, including named entity tagging, semi-supervised learning has been proposed as a reasonable alternative to methods that require annotating large amounts of training data. In this paper, we address the problem of analyzing new data given a semi-supervised NE tagger trained on data from an earlier time period. We will show that updating the unlabeled data is sufficient to maintain quality over time, and outperforms updating the labeled data. Furthermore, we will also show that augmenting the unlabeled data with older data in most cases does not result in better performance than simply using a smaller amount of current unlabeled data.

Original language: English (US)
Title of host publication: ACL-IJCNLP 2009 - Joint Conf. of the 47th Annual Meeting of the Association for Computational Linguistics and 4th Int. Joint Conf. on Natural Language Processing of the AFNLP, Proceedings of the Conf.
Pages: 353-356
Number of pages: 4
State: Published - 2009
Event: Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and 4th International Joint Conference on Natural Language Processing of the AFNLP, ACL-IJCNLP 2009 - Suntec, Singapore
Duration: Aug 2 2009 to Aug 7 2009

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Cite this

Mota, C., & Grishman, R. (2009). Updating a name tagger using contemporary unlabeled data. In ACL-IJCNLP 2009 - Joint Conf. of the 47th Annual Meeting of the Association for Computational Linguistics and 4th Int. Joint Conf. on Natural Language Processing of the AFNLP, Proceedings of the Conf. (pp. 353-356).

Updating a name tagger using contemporary unlabeled data. / Mota, Cristina; Grishman, Ralph.

ACL-IJCNLP 2009 - Joint Conf. of the 47th Annual Meeting of the Association for Computational Linguistics and 4th Int. Joint Conf. on Natural Language Processing of the AFNLP, Proceedings of the Conf. 2009. p. 353-356.

Mota, C & Grishman, R 2009, Updating a name tagger using contemporary unlabeled data. in ACL-IJCNLP 2009 - Joint Conf. of the 47th Annual Meeting of the Association for Computational Linguistics and 4th Int. Joint Conf. on Natural Language Processing of the AFNLP, Proceedings of the Conf. pp. 353-356, Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and 4th International Joint Conference on Natural Language Processing of the AFNLP, ACL-IJCNLP 2009, Suntec, Singapore, 8/2/09.

Mota C, Grishman R. Updating a name tagger using contemporary unlabeled data. In ACL-IJCNLP 2009 - Joint Conf. of the 47th Annual Meeting of the Association for Computational Linguistics and 4th Int. Joint Conf. on Natural Language Processing of the AFNLP, Proceedings of the Conf. 2009. p. 353-356

Mota, Cristina; Grishman, Ralph. / Updating a name tagger using contemporary unlabeled data. ACL-IJCNLP 2009 - Joint Conf. of the 47th Annual Meeting of the Association for Computational Linguistics and 4th Int. Joint Conf. on Natural Language Processing of the AFNLP, Proceedings of the Conf. 2009. pp. 353-356

@inproceedings{e17d8cd139424333a872142bdceb8f46,
title = "Updating a name tagger using contemporary unlabeled data",
abstract = "For many NLP tasks, including named entity tagging, semi-supervised learning has been proposed as a reasonable alternative to methods that require annotating large amounts of training data. In this paper, we address the problem of analyzing new data given a semi-supervised NE tagger trained on data from an earlier time period. We will show that updating the unlabeled data is sufficient to maintain quality over time, and outperforms updating the labeled data. Furthermore, we will also show that augmenting the unlabeled data with older data in most cases does not result in better performance than simply using a smaller amount of current unlabeled data.",
author = "Cristina Mota and Ralph Grishman",
year = "2009",
language = "English (US)",
isbn = "9781617382581",
pages = "353--356",
booktitle = "ACL-IJCNLP 2009 - Joint Conf. of the 47th Annual Meeting of the Association for Computational Linguistics and 4th Int. Joint Conf. on Natural Language Processing of the AFNLP, Proceedings of the Conf.",

}

TY  - GEN
T1  - Updating a name tagger using contemporary unlabeled data
AU  - Mota, Cristina
AU  - Grishman, Ralph
PY  - 2009
Y1  - 2009
N2  - For many NLP tasks, including named entity tagging, semi-supervised learning has been proposed as a reasonable alternative to methods that require annotating large amounts of training data. In this paper, we address the problem of analyzing new data given a semi-supervised NE tagger trained on data from an earlier time period. We will show that updating the unlabeled data is sufficient to maintain quality over time, and outperforms updating the labeled data. Furthermore, we will also show that augmenting the unlabeled data with older data in most cases does not result in better performance than simply using a smaller amount of current unlabeled data.
AB  - For many NLP tasks, including named entity tagging, semi-supervised learning has been proposed as a reasonable alternative to methods that require annotating large amounts of training data. In this paper, we address the problem of analyzing new data given a semi-supervised NE tagger trained on data from an earlier time period. We will show that updating the unlabeled data is sufficient to maintain quality over time, and outperforms updating the labeled data. Furthermore, we will also show that augmenting the unlabeled data with older data in most cases does not result in better performance than simply using a smaller amount of current unlabeled data.
UR  - http://www.scopus.com/inward/record.url?scp=79952416941&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=79952416941&partnerID=8YFLogxK
M3  - Conference contribution
SN  - 9781617382581
SP  - 353
EP  - 356
BT  - ACL-IJCNLP 2009 - Joint Conf. of the 47th Annual Meeting of the Association for Computational Linguistics and 4th Int. Joint Conf. on Natural Language Processing of the AFNLP, Proceedings of the Conf.
ER  -