Power and sample size calculations for the Wilcoxon-Mann-Whitney test in the presence of death-censored observations

Roland A. Matsouaka, Rebecca Betensky

Research output: Contribution to journal › Article

Abstract

We consider a clinical trial of a potentially lethal disease in which patients are randomly assigned to two treatment groups and are followed for a fixed period of time; a continuous endpoint is measured at the end of follow-up. For some patients, however, death (or severe disease progression) may preclude measurement of the endpoint. A statistical analysis that includes only patients with endpoint measurements may be biased. An alternative analysis includes all randomized patients, with rank scores assigned to the patients who are available for the endpoint measurement on the basis of the magnitude of their responses and with 'worst-rank' scores assigned to those patients whose death precluded the measurement of the continuous endpoint. The worst-rank scores are worse than all observed rank scores. The treatment effect is then evaluated using the Wilcoxon-Mann-Whitney test. In this paper, we derive closed-form formulae for the power and sample size of the Wilcoxon-Mann-Whitney test when missing measurements of the continuous endpoints because of death are replaced by worst-rank scores. We distinguish two approaches for assigning the worst-rank scores. In the tied worst-rank approach, all deaths are weighted equally, and the worst-rank scores are set to a single value that is worse than all measured responses. In the untied worst-rank approach, the worst-rank scores further rank patients according to their time of death, so that an earlier death is considered worse than a later death, which in turn is worse than all measured responses. In addition, we propose four methods for the implementation of the sample size formulae for a trial with expected early death. We conduct Monte Carlo simulation studies to evaluate the accuracy of our power and sample size formulae and to compare the four sample size estimation methods.
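The tied and untied worst-rank approaches described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' closed-form power or sample size formulae; it only shows one way to build the composite ranking and to estimate power by Monte Carlo simulation. Everything named here is an assumption made for illustration: the function names, the convention that a larger response is better, the exponential death times, the unit-variance normal responses, and the normal approximation with tie correction used for the Wilcoxon-Mann-Whitney test.

```python
import math
import random
from statistics import NormalDist


def composite_keys(died, death_time, response, untied):
    """Sortable keys in which every death ranks below every survivor.

    Tied approach: all deaths share one key. Untied approach: deaths are
    ordered by time of death, so an earlier death is worse than a later one.
    Assumes (hypothetically) that a larger response is a better outcome.
    """
    keys = []
    for d, t, x in zip(died, death_time, response):
        if d:
            keys.append((0, t if untied else 0.0))
        else:
            keys.append((1, x))
    return keys


def midranks(keys):
    """1-based ranks of the keys, with tied keys sharing their average rank."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    ranks = [0.0] * len(keys)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and keys[order[j + 1]] == keys[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wmw_pvalue(keys_a, keys_b):
    """Two-sided rank-sum p-value (normal approximation, tie-corrected)."""
    m, n = len(keys_a), len(keys_b)
    total = m + n
    pooled = keys_a + keys_b
    w = sum(midranks(pooled)[:m])          # rank sum of the first group
    mean = m * (total + 1) / 2
    counts = {}
    for k in pooled:
        counts[k] = counts.get(k, 0) + 1
    tie_term = sum(c ** 3 - c for c in counts.values())
    var = m * n / 12 * ((total + 1) - tie_term / (total * (total - 1)))
    if var <= 0:                           # every observation tied
        return 1.0
    z = (w - mean) / math.sqrt(var)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_arm(n, hazard, mean, follow_up, rng):
    """Hypothetical data model: exponential death times, normal responses."""
    died, times, resp = [], [], []
    for _ in range(n):
        t = rng.expovariate(hazard)
        if t < follow_up:                  # death precludes the measurement
            died.append(True); times.append(t); resp.append(None)
        else:
            died.append(False); times.append(None)
            resp.append(rng.gauss(mean, 1.0))
    return died, times, resp


def estimate_power(n, hazard_ctrl, hazard_trt, delta, untied,
                   n_sim=500, alpha=0.05, follow_up=1.0, seed=0):
    """Fraction of simulated trials in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        ctrl = composite_keys(*simulate_arm(n, hazard_ctrl, 0.0, follow_up, rng), untied)
        trt = composite_keys(*simulate_arm(n, hazard_trt, delta, follow_up, rng), untied)
        if wmw_pvalue(ctrl, trt) < alpha:
            rejections += 1
    return rejections / n_sim
```

For example, `estimate_power(50, 0.3, 0.2, 0.5, untied=True)` would simulate trials with 50 patients per arm under these assumed distributions; comparing the result with `untied=False` gives a rough sense of how much the time-of-death ordering contributes in a given scenario.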

Original language: English (US)
Pages (from-to): 406-431
Number of pages: 26
Journal: Statistics in Medicine
Volume: 34
Issue number: 3
DOI: 10.1002/sim.6355
State: Published - Jan 1 2015

Fingerprint

Mann-Whitney test
Wilcoxon Test
Sample Size Calculation
Censored Observations
Sample Size
Disease Progression
Treatment Effects
Clinical Trials
Period of time
Progression
Statistical Analysis
Biased
Closed-form

Keywords

  • Composite outcome
  • Informative missingness
  • Intention-to-treat
  • Worst-rank scores
  • Wilcoxon rank sum test

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability

Cite this

Power and sample size calculations for the Wilcoxon-Mann-Whitney test in the presence of death-censored observations. / Matsouaka, Roland A.; Betensky, Rebecca.

In: Statistics in Medicine, Vol. 34, No. 3, 01.01.2015, p. 406-431.


@article{2ca7e7ff8fe14fd88055896e2502ef87,
title = "Power and sample size calculations for the Wilcoxon-Mann-Whitney test in the presence of death-censored observations",
abstract = "We consider a clinical trial of a potentially lethal disease in which patients are randomly assigned to two treatment groups and are followed for a fixed period of time; a continuous endpoint is measured at the end of follow-up. For some patients, however, death (or severe disease progression) may preclude measurement of the endpoint. A statistical analysis that includes only patients with endpoint measurements may be biased. An alternative analysis includes all randomized patients, with rank scores assigned to the patients who are available for the endpoint measurement on the basis of the magnitude of their responses and with 'worst-rank' scores assigned to those patients whose death precluded the measurement of the continuous endpoint. The worst-rank scores are worse than all observed rank scores. The treatment effect is then evaluated using the Wilcoxon-Mann-Whitney test. In this paper, we derive closed-form formulae for the power and sample size of the Wilcoxon-Mann-Whitney test when missing measurements of the continuous endpoints because of death are replaced by worst-rank scores. We distinguish two approaches for assigning the worst-rank scores. In the tied worst-rank approach, all deaths are weighted equally, and the worst-rank scores are set to a single value that is worse than all measured responses. In the untied worst-rank approach, the worst-rank scores further rank patients according to their time of death, so that an earlier death is considered worse than a later death, which in turn is worse than all measured responses. In addition, we propose four methods for the implementation of the sample size formulae for a trial with expected early death. We conduct Monte Carlo simulation studies to evaluate the accuracy of our power and sample size formulae and to compare the four sample size estimation methods.",
keywords = "Composite outcome, Informative missingness, Intention-to-treat, Worst-rank scores, Wilcoxon rank sum test",
author = "Matsouaka, {Roland A.} and Betensky, Rebecca",
year = "2015",
month = "1",
day = "1",
doi = "10.1002/sim.6355",
language = "English (US)",
volume = "34",
pages = "406--431",
journal = "Statistics in Medicine",
issn = "0277-6715",
publisher = "John Wiley and Sons Ltd",
number = "3",

}

TY - JOUR

T1 - Power and sample size calculations for the Wilcoxon-Mann-Whitney test in the presence of death-censored observations

AU - Matsouaka, Roland A.

AU - Betensky, Rebecca

PY - 2015/1/1

Y1 - 2015/1/1

N2 - We consider a clinical trial of a potentially lethal disease in which patients are randomly assigned to two treatment groups and are followed for a fixed period of time; a continuous endpoint is measured at the end of follow-up. For some patients, however, death (or severe disease progression) may preclude measurement of the endpoint. A statistical analysis that includes only patients with endpoint measurements may be biased. An alternative analysis includes all randomized patients, with rank scores assigned to the patients who are available for the endpoint measurement on the basis of the magnitude of their responses and with 'worst-rank' scores assigned to those patients whose death precluded the measurement of the continuous endpoint. The worst-rank scores are worse than all observed rank scores. The treatment effect is then evaluated using the Wilcoxon-Mann-Whitney test. In this paper, we derive closed-form formulae for the power and sample size of the Wilcoxon-Mann-Whitney test when missing measurements of the continuous endpoints because of death are replaced by worst-rank scores. We distinguish two approaches for assigning the worst-rank scores. In the tied worst-rank approach, all deaths are weighted equally, and the worst-rank scores are set to a single value that is worse than all measured responses. In the untied worst-rank approach, the worst-rank scores further rank patients according to their time of death, so that an earlier death is considered worse than a later death, which in turn is worse than all measured responses. In addition, we propose four methods for the implementation of the sample size formulae for a trial with expected early death. We conduct Monte Carlo simulation studies to evaluate the accuracy of our power and sample size formulae and to compare the four sample size estimation methods.

KW - Composite outcome

KW - Informative missingness

KW - Intention-to-treat

KW - Worst-rank scores

KW - Wilcoxon rank sum test

UR - http://www.scopus.com/inward/record.url?scp=84920742395&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84920742395&partnerID=8YFLogxK

U2 - 10.1002/sim.6355

DO - 10.1002/sim.6355

M3 - Article

C2 - 25393385

AN - SCOPUS:84920742395

VL - 34

SP - 406

EP - 431

JO - Statistics in Medicine

JF - Statistics in Medicine

SN - 0277-6715

IS - 3

ER -