A comparison of the ability of different propensity score models to balance measured variables between treated and untreated subjects: a Monte Carlo study

Cited by: 889
Authors
Austin, Peter C.
Grootendorst, Paul
Anderson, Geoffrey M.
Affiliations
[1] Inst Clin Evaluat Sci, Toronto, ON M4N 3M5, Canada
[2] Univ Toronto, Dept Publ Hlth Sci, Toronto, ON, Canada
[3] Univ Toronto, Dept Hlth Policy Management & Evaluat, Toronto, ON, Canada
[4] Univ Toronto, Fac Pharm, Toronto, ON, Canada
Keywords
propensity score; observational studies; balance; Monte Carlo simulations
DOI
10.1002/sim.2580
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
The propensity score (the probability of exposure to a specific treatment conditional on observed variables) is increasingly being used in observational studies. Creating strata in which subjects are matched on the propensity score allows one to balance measured variables between treated and untreated subjects. There is an ongoing controversy in the literature as to which variables to include in the propensity score model. Some advocate including those variables that predict treatment assignment, while others suggest including all variables potentially related to the outcome, and still others advocate including only variables that are associated with both treatment and outcome. We provide a case study of the association between drug exposure and mortality to show that including a variable that is related to treatment, but not outcome, does not improve balance and reduces the number of matched pairs available for analysis. In order to investigate this issue more comprehensively, we conducted a series of Monte Carlo simulations of the performance of propensity score models that contained variables related to treatment allocation, or variables that were confounders for the treatment-outcome pair, or variables related to outcome, or all variables related to either outcome or treatment or neither. We compared the use of these different propensity score models in matching and stratification in terms of the extent to which they balanced variables. We demonstrated that all propensity score models balanced measured confounders between treated and untreated subjects in a propensity-score matched sample. However, including only the true confounders or the variables predictive of the outcome in the propensity score model resulted in a substantially larger number of matched pairs than did using the treatment-allocation model.
Stratifying on the quintiles of any propensity score model resulted in residual imbalance between treated and untreated subjects in the upper and lower quintiles. Greater balance between treated and untreated subjects was obtained after matching on the propensity score than after stratifying on the quintiles of the propensity score. When a confounding variable was omitted from any of the propensity score models, then matching or stratifying on the propensity score resulted in residual imbalance in prognostically important variables between treated and untreated subjects. We considered four propensity score models for estimating treatment effects: the model that included only true confounders; the model that included all variables associated with the outcome; the model that included all measured variables; and the model that included all variables associated with treatment selection. Reduction in bias when estimating a null treatment effect was equivalent for all four propensity score models when propensity score matching was used. Reduction in bias was marginally greater for the first two propensity score models than for the last two propensity score models when stratification on the quintiles of the propensity score model was employed. Furthermore, omitting a confounding variable from the propensity score model resulted in biased estimation of the treatment effect. Finally, the mean squared error for estimating a null treatment effect was lower when either of the first two propensity score models was used compared to when either of the last two propensity score models was used. Copyright (c) 2006 John Wiley & Sons, Ltd.
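The kind of comparison the abstract describes can be sketched in a few lines of Python. This is a minimal illustration and not the authors' simulation design: the data-generating model, the coefficients (0.8), the sample size, the greedy 1:1 matching rule, and the caliper of 0.2 standard deviations of the logit of the propensity score are all assumptions chosen for this example, and the function names (`simulate`, `fit_logit_scores`, `greedy_match`, `std_diff`, `evaluate`) are hypothetical. It fits two propensity score models, one containing a treatment-only variable plus a confounder and one containing the confounder alone, then reports the number of matched pairs and the standardized difference of the confounder before and after matching.

```python
import math
import random
from statistics import mean, stdev, variance

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def simulate(n=1000, seed=0):
    """Assumed data: x[0] affects treatment only, x[1] is a confounder."""
    rng = random.Random(seed)
    rows, treated = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        p = sigmoid(-0.5 + 0.8 * x[0] + 0.8 * x[1])  # assumed coefficients
        rows.append(x)
        treated.append(1 if rng.random() < p else 0)
    return rows, treated

def fit_logit_scores(rows, treated, cols, steps=300, rate=1.0):
    """Fit a logistic propensity model by gradient ascent.

    Returns each subject's linear predictor (logit of the propensity score).
    """
    design = [[1.0] + [r[c] for c in cols] for r in rows]
    beta = [0.0] * (len(cols) + 1)
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for d, t in zip(design, treated):
            resid = t - sigmoid(sum(b * v for b, v in zip(beta, d)))
            for k, v in enumerate(d):
                grad[k] += resid * v
        beta = [b + rate * g / n for b, g in zip(beta, grad)]
    return [sum(b * v for b, v in zip(beta, d)) for d in design]

def greedy_match(scores, treated, caliper):
    """Greedy 1:1 nearest-neighbour matching without replacement."""
    controls = [i for i, t in enumerate(treated) if t == 0]
    used, pairs = set(), []
    for i, t in enumerate(treated):
        if t == 0:
            continue
        best = min((j for j in controls if j not in used),
                   key=lambda j: abs(scores[j] - scores[i]), default=None)
        if best is not None and abs(scores[best] - scores[i]) <= caliper:
            used.add(best)
            pairs.append((i, best))
    return pairs

def std_diff(a, b):
    """Absolute standardized difference in means between two groups."""
    pooled_sd = math.sqrt((variance(a) + variance(b)) / 2)
    return abs(mean(a) - mean(b)) / pooled_sd

def evaluate(rows, treated, cols, confounder=1):
    scores = fit_logit_scores(rows, treated, cols)
    # Caliper of 0.2 SD of the logit is an assumption of this sketch.
    pairs = greedy_match(scores, treated, caliper=0.2 * stdev(scores))
    before = std_diff([r[confounder] for r, t in zip(rows, treated) if t == 1],
                      [r[confounder] for r, t in zip(rows, treated) if t == 0])
    after = std_diff([rows[i][confounder] for i, _ in pairs],
                     [rows[j][confounder] for _, j in pairs])
    return len(pairs), before, after

rows, treated = simulate()
results = {
    "treatment-allocation model (x0, x1)": evaluate(rows, treated, [0, 1]),
    "confounder-only model (x1)": evaluate(rows, treated, [1]),
}
for name, (n_pairs, before, after) in results.items():
    print(f"{name}: pairs={n_pairs}, "
          f"std diff before={before:.3f}, after={after:.3f}")
```

Repeating the run over many seeds and averaging the results would turn this single draw into a small Monte Carlo experiment in the spirit of the study; a single simulated data set is shown here only to keep the sketch short.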
Pages: 734-753
Page count: 20