Automatic topic identification and classification of text messages in the SMSAll system

Fahad Pervaiz, Lakshminarayanan Subramanian, Umar Saif

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

This paper presents a method for identifying topics and classifying text messages in the SMSAll system, which is the Twitter of Pakistan (except over SMS). One of the many challenges is developing an unsupervised algorithm for text messages that mix Urdu and English words written in Roman letters. Despite this, with 1-grams we achieve 72%, 53%, and 58% true positives for popular, medium, and rare topics respectively, and 48% and 40% true positives with 2-grams and 3-grams respectively.

Original language: English (US)
Title of host publication: Proceedings of the 2nd ACM Symposium on Computing for Development, DEV 2012
DOIs: https://doi.org/10.1145/2160601.2160626
State: Published - 2012
Event: 2nd ACM Symposium on Computing for Development, DEV 2012 - Atlanta, GA, United States
Duration: Mar 11 2012 to Mar 12 2012

Other

Other: 2nd ACM Symposium on Computing for Development, DEV 2012
Country: United States
City: Atlanta, GA
Period: 3/11/12 to 3/12/12

Keywords

  • classification
  • SMS messages
  • SMSAll
  • topic identification

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Applied Mathematics

Cite this

Pervaiz, F., Subramanian, L., & Saif, U. (2012). Automatic topic identification and classification of text messages in the SMSAll system. In Proceedings of the 2nd ACM Symposium on Computing for Development, DEV 2012. https://doi.org/10.1145/2160601.2160626

@inproceedings{b357c0f3971644b1b946e2807a387a53,
title = "Automatic topic identification and classification of text messages in the SMSAll system",
abstract = "This paper presents a way to identify topics and classify text messages in the SMSAll system, which is the Twitter of Pakistan (except over SMS). Among many challenges, one is to develop an unsupervised algorithm for text messages containing Urdu-English words written in roman letters. Still in 1-gram we are able to have 72{\%}, 53{\%} and 58{\%} true positives for popular, medium and rare topics respectively and 48{\%} and 40{\%} true positives in 2 and 3-grams respectively.",
keywords = "classification, SMS messages, SMSAll, topic identification",
author = "Fahad Pervaiz and Lakshminarayanan Subramanian and Umar Saif",
year = "2012",
doi = "10.1145/2160601.2160626",
language = "English (US)",
isbn = "9781450312622",
booktitle = "Proceedings of the 2nd ACM Symposium on Computing for Development, DEV 2012",
}

TY - GEN
T1 - Automatic topic identification and classification of text messages in the SMSAll system
AU - Pervaiz, Fahad
AU - Subramanian, Lakshminarayanan
AU - Saif, Umar
PY - 2012
Y1 - 2012
AB - This paper presents a way to identify topics and classify text messages in the SMSAll system, which is the Twitter of Pakistan (except over SMS). Among many challenges, one is to develop an unsupervised algorithm for text messages containing Urdu-English words written in roman letters. Still in 1-gram we are able to have 72%, 53% and 58% true positives for popular, medium and rare topics respectively and 48% and 40% true positives in 2 and 3-grams respectively.
KW - classification
KW - SMS messages
KW - SMSAll
KW - topic identification
UR - http://www.scopus.com/inward/record.url?scp=84889717123&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84889717123&partnerID=8YFLogxK
U2 - 10.1145/2160601.2160626
DO - 10.1145/2160601.2160626
M3 - Conference contribution
AN - SCOPUS:84889717123
SN - 9781450312622
BT - Proceedings of the 2nd ACM Symposium on Computing for Development, DEV 2012
ER -