Estimating Grouped Data Models with a Binary-Dependent Variable and Fixed Effects via a Logit versus a Linear Probability Model: The Impact of Dropped Units

Research output: Contribution to journal › Article

Abstract

This letter deals with a very simple question: if we have grouped data with a binary-dependent variable and want to include fixed effects in the specification, can we meaningfully compare results using a linear model to those estimated with a logit? The reason to doubt such a comparison is that the linear specification appears to keep all observations, whereas the logit drops the groups where the dependent variable is either all zeros or all ones. This letter demonstrates that a linear specification averages the estimates for all the homogeneous outcome groups (which, by definition, all have slope coefficients of zero) with the slope coefficients for the groups with a mix of zeros and ones. The correct comparison of the linear to logit form is to only look at groups with some variation in the dependent variable. Researchers using the linear specification are urged to report results for all groups and for the subset of groups where the dependent variable varies. The interpretation of the difference between these two results depends upon assumptions which cannot be empirically assessed.
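The averaging described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the letter: the data-generating process, the sample sizes, and the helper name `within_slope` are assumptions chosen for illustration. It fits a fixed-effects linear probability model with a single regressor by the within (group-demeaning) estimator, once on all groups and once on only the groups whose outcome varies. Because demeaned outcomes are exactly zero in all-zero and all-one groups, the all-groups slope should equal the mixed-groups slope scaled by the share of within-group variation in x that the mixed groups contribute.

```python
import numpy as np


def within_slope(x, y, g):
    """Within (fixed-effects) OLS slope of y on x with one intercept per group.

    Returns the slope and the within-group sum of squares of x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Map arbitrary group labels to 0..G-1 so bincount can be used.
    _, idx = np.unique(g, return_inverse=True)
    counts = np.bincount(idx)
    # Demean x and y within each group.
    xd = x - (np.bincount(idx, weights=x) / counts)[idx]
    yd = y - (np.bincount(idx, weights=y) / counts)[idx]
    sxx = float(xd @ xd)
    return float(xd @ yd) / sxx, sxx


# Simulated grouped binary data (all values here are illustrative assumptions).
rng = np.random.default_rng(0)
n_groups, n_per = 200, 5
g = np.repeat(np.arange(n_groups), n_per)
alpha = rng.normal(0.0, 2.5, n_groups)[g]       # group effects
x = rng.normal(size=g.size)                      # single regressor
p = 1.0 / (1.0 + np.exp(-(alpha + 1.0 * x)))     # logit data-generating process
y = (rng.random(g.size) < p).astype(float)

# Flag observations in groups with a mix of zeros and ones.
group_mean_y = np.bincount(g, weights=y) / n_per
mixed = ((group_mean_y > 0) & (group_mean_y < 1))[g]

b_all, sxx_all = within_slope(x, y, g)
b_mix, sxx_mix = within_slope(x[mixed], y[mixed], g[mixed])
share = sxx_mix / sxx_all  # weight carried by the mixed-outcome groups

print("slope, all groups:           ", b_all)
print("slope, mixed-outcome groups: ", b_mix)
print("share * mixed-groups slope:  ", share * b_mix)
```

In this sketch the first and third printed values should agree up to floating-point error, which is the sense in which the all-groups linear estimate is the mixed-groups estimate averaged with zeros. Reporting both slopes, as the abstract recommends, makes the size of that attenuation visible.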

Original language: English (US)
Pages (from-to): 139-145
Number of pages: 7
Journal: Public Health Nutrition
Volume: 22
Issue number: 18
DOI: 10.1017/pan.2019.20
State: Published - Dec 1 2019

Fingerprint

Linear Models
Research Personnel

Keywords

  • binary logit
  • clustered data
  • fixed effects
  • marginal effects
  • panel data

ASJC Scopus subject areas

  • Medicine (miscellaneous)
  • Nutrition and Dietetics
  • Public Health, Environmental and Occupational Health

Cite this

@article{cb4d1ceb8a0147d48c491d2463fdac87,
title = "Estimating Grouped Data Models with a Binary-Dependent Variable and Fixed Effects via a Logit versus a Linear Probability Model: The Impact of Dropped Units",
abstract = "This letter deals with a very simple question: if we have grouped data with a binary-dependent variable and want to include fixed effects in the specification, can we meaningfully compare results using a linear model to those estimated with a logit? The reason to doubt such a comparison is that the linear specification appears to keep all observations, whereas the logit drops the groups where the dependent variable is either all zeros or all ones. This letter demonstrates that a linear specification averages the estimates for all the homogeneous outcome groups (which, by definition, all have slope coefficients of zero) with the slope coefficients for the groups with a mix of zeros and ones. The correct comparison of the linear to logit form is to only look at groups with some variation in the dependent variable. Researchers using the linear specification are urged to report results for all groups and for the subset of groups where the dependent variable varies. The interpretation of the difference between these two results depends upon assumptions which cannot be empirically assessed.",
keywords = "binary logit, clustered data, fixed effects, marginal effects, panel data",
author = "Nathaniel Beck",
year = "2019",
month = "12",
day = "1",
doi = "10.1017/pan.2019.20",
language = "English (US)",
volume = "22",
pages = "139--145",
journal = "Public Health Nutrition",
issn = "1368-9800",
publisher = "Cambridge University Press",
number = "18",

}

TY - JOUR

T1 - Estimating Grouped Data Models with a Binary-Dependent Variable and Fixed Effects via a Logit versus a Linear Probability Model

T2 - The Impact of Dropped Units

AU - Beck, Nathaniel

PY - 2019/12/1

Y1 - 2019/12/1

N2 - This letter deals with a very simple question: if we have grouped data with a binary-dependent variable and want to include fixed effects in the specification, can we meaningfully compare results using a linear model to those estimated with a logit? The reason to doubt such a comparison is that the linear specification appears to keep all observations, whereas the logit drops the groups where the dependent variable is either all zeros or all ones. This letter demonstrates that a linear specification averages the estimates for all the homogeneous outcome groups (which, by definition, all have slope coefficients of zero) with the slope coefficients for the groups with a mix of zeros and ones. The correct comparison of the linear to logit form is to only look at groups with some variation in the dependent variable. Researchers using the linear specification are urged to report results for all groups and for the subset of groups where the dependent variable varies. The interpretation of the difference between these two results depends upon assumptions which cannot be empirically assessed.

AB - This letter deals with a very simple question: if we have grouped data with a binary-dependent variable and want to include fixed effects in the specification, can we meaningfully compare results using a linear model to those estimated with a logit? The reason to doubt such a comparison is that the linear specification appears to keep all observations, whereas the logit drops the groups where the dependent variable is either all zeros or all ones. This letter demonstrates that a linear specification averages the estimates for all the homogeneous outcome groups (which, by definition, all have slope coefficients of zero) with the slope coefficients for the groups with a mix of zeros and ones. The correct comparison of the linear to logit form is to only look at groups with some variation in the dependent variable. Researchers using the linear specification are urged to report results for all groups and for the subset of groups where the dependent variable varies. The interpretation of the difference between these two results depends upon assumptions which cannot be empirically assessed.

KW - binary logit

KW - clustered data

KW - fixed effects

KW - marginal effects

KW - panel data

UR - http://www.scopus.com/inward/record.url?scp=85076393439&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85076393439&partnerID=8YFLogxK

U2 - 10.1017/pan.2019.20

DO - 10.1017/pan.2019.20

M3 - Article

AN - SCOPUS:85076167645

VL - 22

SP - 139

EP - 145

JO - Public Health Nutrition

JF - Public Health Nutrition

SN - 1368-9800

IS - 18

ER -