What's in a name? A method for extracting information about ethnicity from names

Research output: Contribution to journal › Article

Abstract

Questions about racial or ethnic group identity feature centrally in many social science theories, but detailed data on ethnic composition are often difficult to obtain, out of date, or otherwise unavailable. The proliferation of publicly available geocoded person names provides one potential source of such data, if researchers can effectively link names and group identity. This article examines that linkage and presents a methodology for estimating local ethnic or racial composition using the relationship between group membership and person names. Common approaches for linking names and identity groups perform poorly when estimating group proportions. I have developed a new method for estimating racial or ethnic composition from names which requires no classification of individual names. This method provides more accurate estimates than the standard approach and works in any context where person names contain information about group membership. Illustrations from two very different contexts are provided: the United States and the Republic of Kenya.
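The abstract contrasts classifying individual names with estimating group proportions directly. The sketch below illustrates that general contrast only; it is not the article's algorithm, which the abstract does not specify. The probability table, the placeholder names, the group labels "A" and "B", and both function names are hypothetical and chosen purely for illustration.

```python
from collections import Counter

# Hypothetical P(group | name) values; illustrative only, not taken from the article.
NAME_GROUP_PROBS = {
    "name_x": {"A": 0.60, "B": 0.40},
    "name_y": {"A": 0.55, "B": 0.45},
    "name_z": {"A": 0.10, "B": 0.90},
}


def all_groups(probs):
    """Return the sorted list of group labels appearing in the table."""
    return sorted({group for row in probs.values() for group in row})


def classify_then_count(names, probs):
    """Baseline: assign each name to its single most likely group, then tabulate shares."""
    counts = Counter(max(probs[name], key=probs[name].get) for name in names)
    return {group: counts.get(group, 0) / len(names) for group in all_groups(probs)}


def average_probabilities(names, probs):
    """Alternative: average P(group | name) over all names, with no hard assignment."""
    return {
        group: sum(probs[name].get(group, 0.0) for name in names) / len(names)
        for group in all_groups(probs)
    }


# A hypothetical list of names observed in one locality.
names = ["name_x", "name_x", "name_y", "name_z"]
hard = classify_then_count(names, NAME_GROUP_PROBS)
soft = average_probabilities(names, NAME_GROUP_PROBS)
print("classify then count:", hard)
print("average probabilities:", soft)
```

In this toy input, three of the four names lean only slightly toward group A, so hard assignment credits A with three quarters of the locality, while averaging the probabilities puts A below one half. This shows one way that per-name classification can distort an estimated proportion, under the assumed table.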

Original language: English (US)
Article number: mpu038
Pages (from-to): 212-224
Number of pages: 13
Journal: Political Analysis
Volume: 23
Issue number: 2
DOI: 10.1093/pan/mpu038
State: Published - Apr 1 2015

Fingerprint

ethnicity
group membership
human being
social science theory
Group
Kenya
proliferation
republic
ethnic group
methodology

ASJC Scopus subject areas

  • Sociology and Political Science
  • Political Science and International Relations

Cite this

What's in a name? A method for extracting information about ethnicity from names. / Harris, Jonathan Andrew.

In: Political Analysis, Vol. 23, No. 2, mpu038, 01.04.2015, p. 212-224.

@article{7e69d217991d427292147cb47e7a7325,
title = "What's in a name? A method for extracting information about ethnicity from names",
abstract = "Questions about racial or ethnic group identity feature centrally in many social science theories, but detailed data on ethnic composition are often difficult to obtain, out of date, or otherwise unavailable. The proliferation of publicly available geocoded person names provides one potential source of such data, if researchers can effectively link names and group identity. This article examines that linkage and presents a methodology for estimating local ethnic or racial composition using the relationship between group membership and person names. Common approaches for linking names and identity groups perform poorly when estimating group proportions. I have developed a new method for estimating racial or ethnic composition from names which requires no classification of individual names. This method provides more accurate estimates than the standard approach and works in any context where person names contain information about group membership. Illustrations from two very different contexts are provided: the United States and the Republic of Kenya.",
author = "Harris, {Jonathan Andrew}",
year = "2015",
month = "4",
day = "1",
doi = "10.1093/pan/mpu038",
language = "English (US)",
volume = "23",
pages = "212--224",
journal = "Political Analysis",
issn = "1047-1987",
publisher = "Oxford University Press",
number = "2"
}

TY - JOUR

T1 - What's in a name? A method for extracting information about ethnicity from names

AU - Harris, Jonathan Andrew

PY - 2015/4/1

Y1 - 2015/4/1

AB - Questions about racial or ethnic group identity feature centrally in many social science theories, but detailed data on ethnic composition are often difficult to obtain, out of date, or otherwise unavailable. The proliferation of publicly available geocoded person names provides one potential source of such data, if researchers can effectively link names and group identity. This article examines that linkage and presents a methodology for estimating local ethnic or racial composition using the relationship between group membership and person names. Common approaches for linking names and identity groups perform poorly when estimating group proportions. I have developed a new method for estimating racial or ethnic composition from names which requires no classification of individual names. This method provides more accurate estimates than the standard approach and works in any context where person names contain information about group membership. Illustrations from two very different contexts are provided: the United States and the Republic of Kenya.

UR - http://www.scopus.com/inward/record.url?scp=84929670568&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84929670568&partnerID=8YFLogxK

U2 - 10.1093/pan/mpu038

DO - 10.1093/pan/mpu038

M3 - Article

VL - 23

SP - 212

EP - 224

JO - Political Analysis

JF - Political Analysis

SN - 1047-1987

IS - 2

M1 - mpu038

ER -