Automated versus do-it-yourself methods for causal inference: Lessons learned from a data analysis competition

Vincent Dorie, Jennifer Hill, Uri Shalit, Marc Scott, Dan Cervone

Research output: Contribution to journal › Article

Abstract

Statisticians have made great progress in creating methods that reduce our reliance on parametric assumptions. However, this explosion in research has resulted in a breadth of inferential strategies that both create opportunities for more reliable inference and complicate the choices that an applied researcher has to make and defend. Relatedly, researchers advocating for new methods typically compare their method to, at best, two or three other causal inference strategies and test using simulations that may or may not be designed to equally tease out flaws in all the competing methods. The causal inference data analysis challenge, "Is Your SATT Where It's At?", launched as part of the 2016 Atlantic Causal Inference Conference, sought to make progress with respect to both of these issues. The researchers creating the data testing grounds were distinct from the researchers submitting methods whose efficacy would be evaluated. Results from 30 competitors across the two versions of the competition (black-box algorithms and do-it-yourself analyses) are presented along with post hoc analyses that reveal information about the characteristics of causal inference strategies and settings that affect performance. The most consistent conclusion was that methods that flexibly model the response surface perform better overall than methods that fail to do so. Finally, new methods are proposed that combine features of several of the top-performing submitted methods.
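
The competition's target estimand, the SATT, is the sample average treatment effect on the treated: the average of Y(1) − Y(0) over the treated units in the sample. The sketch below illustrates the kind of strategy the abstract credits with the best overall performance, flexibly modeling the response surface and averaging predicted treated-minus-control differences over the treated units. It is not any competitor's submission; the simulated data and the choice of gradient boosting are assumptions made purely for illustration.

```python
# Illustrative sketch only: estimating SATT (sample average treatment effect on
# the treated) by flexibly modeling the response surface. The data-generating
# process and the gradient-boosting learner are assumptions, not the paper's.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Simulated observational data: covariates X, binary treatment T, outcome Y,
# with confounding through X[:, 0] and a constant true effect tau.
n, p = 2000, 5
X = rng.normal(size=(n, p))
propensity = 1.0 / (1.0 + np.exp(-X[:, 0]))
T = rng.binomial(1, propensity)
tau = 2.0
Y = np.sin(X[:, 0]) + X[:, 1] ** 2 + tau * T + rng.normal(scale=0.5, size=n)

# Flexible response-surface modeling: fit one model per treatment arm
# (a "T-learner"-style approach) rather than a rigid parametric form.
model_control = GradientBoostingRegressor().fit(X[T == 0], Y[T == 0])
model_treated = GradientBoostingRegressor().fit(X[T == 1], Y[T == 1])

# SATT: average, over treated units only, of predicted Y(1) minus predicted Y(0).
X_treated = X[T == 1]
satt_hat = np.mean(model_treated.predict(X_treated) - model_control.predict(X_treated))
print(f"Estimated SATT: {satt_hat:.2f} (truth: {tau})")
```

Fitting one flexible model per treatment arm is only one of many possible response-surface strategies; the point is simply that the counterfactual predictions for the treated units come from a flexible fit rather than a rigid parametric form.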

Original language: English (US)
Pages (from-to): 43-68
Number of pages: 26
Journal: Statistical Science
Volume: 34
Issue number: 1
DOIs: 10.1214/18-STS667
State: Published - Feb 1 2019

Keywords

  • Automated algorithms
  • Causal inference
  • Competition
  • Evaluation
  • Machine learning

ASJC Scopus subject areas

  • Statistics and Probability
  • Mathematics (all)
  • Statistics, Probability and Uncertainty

Cite this

Automated versus do-it-yourself methods for causal inference: Lessons learned from a data analysis competition. / Dorie, Vincent; Hill, Jennifer; Shalit, Uri; Scott, Marc; Cervone, Dan.

In: Statistical Science, Vol. 34, No. 1, 01.02.2019, p. 43-68.
