The test–retest reliability of the latent construct of executive function depends on whether tasks are represented as formative or reflective indicators

Michael T. Willoughby, Laura J. Kuhn, Clancy Blair, Anya Samek, John A. List

Research output: Contribution to journal › Article

Abstract

This study investigates the test–retest reliability of a battery of executive function (EF) tasks, with a specific interest in testing whether the method used to create a battery-wide score results in differences in the apparent test–retest reliability of children’s performance. A total of 188 4-year-olds completed a battery of computerized EF tasks twice across a period of approximately two weeks. Two approaches were used to create a score indexing children’s overall performance on the battery: (1) the mean score of all completed tasks and (2) a factor score estimate derived using confirmatory factor analysis (CFA). Pearson and intra-class correlations were used to investigate the test–retest reliability of the individual EF tasks, as well as of the overall battery score. Consistent with previous studies, the test–retest reliability of individual tasks was modest (rs ≈ .60). The test–retest reliability of the overall battery score differed depending on the scoring approach (r_mean = .72; r_factor_score = .99). It is concluded that children’s performance on individual EF tasks exhibits modest levels of test–retest reliability. This underscores the importance of administering multiple tasks and aggregating performance across them to improve precision of measurement. However, the specific aggregation strategy has a large impact on the apparent test–retest reliability of the overall score. These results replicate our earlier findings and provide additional cautionary evidence against the routine use of factor-analytic approaches for representing individual performance across a battery of EF tasks.
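The aggregation logic in the abstract (modest single-task reliability improving when several tasks are averaged into a battery-wide mean score) can be sketched on simulated data. This is a minimal illustration, not the study's analysis: the task count and noise level are assumed values, and only the sample size of 188 comes from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
n_children, n_tasks = 188, 5   # sample size from the study; task count is an assumption

# Latent EF ability, assumed stable across the two-week retest interval.
ability = rng.normal(size=n_children)

def administer(ability, noise_sd=0.8):
    """Simulate one administration of the battery: each task score is the
    child's latent ability plus task-specific measurement error."""
    return ability[:, None] + rng.normal(scale=noise_sd, size=(len(ability), n_tasks))

time1 = administer(ability)
time2 = administer(ability)

def retest_r(x, y):
    """Pearson test-retest correlation between two administrations."""
    return np.corrcoef(x, y)[0, 1]

# Reliability of each individual task: modest, like the rs ~ .60 reported.
single_task_rs = [retest_r(time1[:, j], time2[:, j]) for j in range(n_tasks)]

# Reliability of the battery-wide mean score: higher, because averaging
# across tasks reduces the share of task-specific error variance.
composite_r = retest_r(time1.mean(axis=1), time2.mean(axis=1))

print(f"mean single-task r = {np.mean(single_task_rs):.2f}")
print(f"composite (mean-score) r = {composite_r:.2f}")
```

The gain for the composite follows the usual Spearman–Brown pattern: averaging k parallel tasks divides the error variance by k, so the mean score is more reliable than any single task. The factor-score behavior reported in the abstract (r = .99) is not reproduced here, since it depends on the particular CFA model fit in the study.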

Original language: English (US)
Pages (from-to): 1-16
Number of pages: 16
Journal: Child Neuropsychology
DOI: 10.1080/09297049.2016.1205009
State: Accepted/In press - Jul 30 2016

Keywords

  • Confirmatory factor analysis
  • Early childhood
  • Executive function
  • Formative measurement
  • Test–retest reliability

ASJC Scopus subject areas

  • Pediatrics, Perinatology, and Child Health
  • Medicine(all)
  • Neuropsychology and Physiological Psychology
  • Developmental and Educational Psychology

Cite this

The test–retest reliability of the latent construct of executive function depends on whether tasks are represented as formative or reflective indicators. / Willoughby, Michael T.; Kuhn, Laura J.; Blair, Clancy; Samek, Anya; List, John A.

In: Child Neuropsychology, 30.07.2016, p. 1-16.

Research output: Contribution to journal › Article

@article{ede311056fa94d1c9de00520074b3e7c,
title = "The test–retest reliability of the latent construct of executive function depends on whether tasks are represented as formative or reflective indicators",
abstract = "This study investigates the test–retest reliability of a battery of executive function (EF) tasks, with a specific interest in testing whether the method used to create a battery-wide score results in differences in the apparent test–retest reliability of children’s performance. A total of 188 4-year-olds completed a battery of computerized EF tasks twice across a period of approximately two weeks. Two approaches were used to create a score indexing children’s overall performance on the battery: (1) the mean score of all completed tasks and (2) a factor score estimate derived using confirmatory factor analysis (CFA). Pearson and intra-class correlations were used to investigate the test–retest reliability of the individual EF tasks, as well as of the overall battery score. Consistent with previous studies, the test–retest reliability of individual tasks was modest (rs ≈ .60). The test–retest reliability of the overall battery score differed depending on the scoring approach (r_mean = .72; r_factor_score = .99). It is concluded that children’s performance on individual EF tasks exhibits modest levels of test–retest reliability. This underscores the importance of administering multiple tasks and aggregating performance across them to improve precision of measurement. However, the specific aggregation strategy has a large impact on the apparent test–retest reliability of the overall score. These results replicate our earlier findings and provide additional cautionary evidence against the routine use of factor-analytic approaches for representing individual performance across a battery of EF tasks.",
keywords = "Confirmatory factor analysis, Early childhood, Executive function, Formative measurement, Test–retest reliability",
author = "Willoughby, {Michael T.} and Kuhn, {Laura J.} and Blair, Clancy and Samek, Anya and List, {John A.}",
year = "2016",
month = "7",
day = "30",
doi = "10.1080/09297049.2016.1205009",
language = "English (US)",
pages = "1--16",
journal = "Child Neuropsychology",
issn = "0929-7049",
publisher = "Psychology Press Ltd",

}

TY - JOUR

T1 - The test–retest reliability of the latent construct of executive function depends on whether tasks are represented as formative or reflective indicators

AU - Willoughby, Michael T.

AU - Kuhn, Laura J.

AU - Blair, Clancy

AU - Samek, Anya

AU - List, John A.

PY - 2016/7/30

Y1 - 2016/7/30

N2 - This study investigates the test–retest reliability of a battery of executive function (EF) tasks, with a specific interest in testing whether the method used to create a battery-wide score results in differences in the apparent test–retest reliability of children’s performance. A total of 188 4-year-olds completed a battery of computerized EF tasks twice across a period of approximately two weeks. Two approaches were used to create a score indexing children’s overall performance on the battery: (1) the mean score of all completed tasks and (2) a factor score estimate derived using confirmatory factor analysis (CFA). Pearson and intra-class correlations were used to investigate the test–retest reliability of the individual EF tasks, as well as of the overall battery score. Consistent with previous studies, the test–retest reliability of individual tasks was modest (rs ≈ .60). The test–retest reliability of the overall battery score differed depending on the scoring approach (r_mean = .72; r_factor_score = .99). It is concluded that children’s performance on individual EF tasks exhibits modest levels of test–retest reliability. This underscores the importance of administering multiple tasks and aggregating performance across them to improve precision of measurement. However, the specific aggregation strategy has a large impact on the apparent test–retest reliability of the overall score. These results replicate our earlier findings and provide additional cautionary evidence against the routine use of factor-analytic approaches for representing individual performance across a battery of EF tasks.

AB - This study investigates the test–retest reliability of a battery of executive function (EF) tasks, with a specific interest in testing whether the method used to create a battery-wide score results in differences in the apparent test–retest reliability of children’s performance. A total of 188 4-year-olds completed a battery of computerized EF tasks twice across a period of approximately two weeks. Two approaches were used to create a score indexing children’s overall performance on the battery: (1) the mean score of all completed tasks and (2) a factor score estimate derived using confirmatory factor analysis (CFA). Pearson and intra-class correlations were used to investigate the test–retest reliability of the individual EF tasks, as well as of the overall battery score. Consistent with previous studies, the test–retest reliability of individual tasks was modest (rs ≈ .60). The test–retest reliability of the overall battery score differed depending on the scoring approach (r_mean = .72; r_factor_score = .99). It is concluded that children’s performance on individual EF tasks exhibits modest levels of test–retest reliability. This underscores the importance of administering multiple tasks and aggregating performance across them to improve precision of measurement. However, the specific aggregation strategy has a large impact on the apparent test–retest reliability of the overall score. These results replicate our earlier findings and provide additional cautionary evidence against the routine use of factor-analytic approaches for representing individual performance across a battery of EF tasks.

KW - Confirmatory factor analysis

KW - Early childhood

KW - Executive function

KW - Formative measurement

KW - Test–retest reliability

UR - http://www.scopus.com/inward/record.url?scp=84980004950&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84980004950&partnerID=8YFLogxK

U2 - 10.1080/09297049.2016.1205009

DO - 10.1080/09297049.2016.1205009

M3 - Article

SP - 1

EP - 16

JO - Child Neuropsychology

JF - Child Neuropsychology

SN - 0929-7049

ER -