Susan Gruber
https://works.bepress.com/sgruber/
Recent works by Susan Gruber
en-us
Copyright (c) 2019 All rights reserved.
Thu, 01 Jan 2015 00:00:00 +0000
Ensemble learning of inverse probability weights for marginal structural modeling in large observational datasets
https://works.bepress.com/sgruber/32/
<p>Inverse probability weights used to fit marginal structural models are typically estimated using logistic regression. However, a data-adaptive procedure may be able to better exploit information available in measured covariates. By combining predictions from multiple algorithms, ensemble learning offers an alternative to logistic regression modeling to further reduce bias in estimated marginal structural model parameters. We describe the application of two ensemble learning approaches to estimating stabilized weights: super learning (SL), an ensemble machine learning approach that relies on V-fold cross validation, and an ensemble learner (EL) that creates a single partition of the data into training and validation sets. Longitudinal data from two multicenter cohort studies in Spain (CoRIS and CoRIS-MD) were analyzed to estimate the mortality hazard ratio for initiation versus no initiation of combined antiretroviral therapy among HIV positive subjects. Both ensemble approaches produced hazard ratio estimates further away from the null, and with tighter confidence intervals, than logistic regression modeling. Computation time for EL was less than half that of SL. We conclude that ensemble learning using a library of diverse candidate algorithms offers an alternative to parametric modeling of inverse probability weights when fitting marginal structural models. With large datasets, EL provides a rich search over the solution space in less time than SL with comparable results.</p>
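Whichever learner produces the treatment probabilities (logistic regression, SL, or EL), the stabilized weights themselves are a cumulative product of numerator-to-denominator likelihood ratios over time. A minimal sketch, assuming predicted probabilities are already in hand; the function name and array layout are illustrative, not taken from the article:

```python
import numpy as np

def stabilized_weights(A, p_num, p_den):
    """Stabilized inverse probability weights for a marginal structural model.

    A     : (n, T) array of binary treatment indicators over T time points.
    p_num : (n, T) numerator predictions P(A_t = 1 | treatment history).
    p_den : (n, T) denominator predictions P(A_t = 1 | treatment and
            covariate history), e.g. from logistic regression, SL, or EL.
    Returns the (n, T) stabilized weights (cumulative products over time).
    """
    lik_num = np.where(A == 1, p_num, 1.0 - p_num)  # likelihood of observed A_t
    lik_den = np.where(A == 1, p_den, 1.0 - p_den)
    return np.cumprod(lik_num / lik_den, axis=1)

# One subject, two time points: treated at t=1, untreated at t=2
A = np.array([[1, 0]])
sw = stabilized_weights(A,
                        p_num=np.array([[0.5, 0.5]]),
                        p_den=np.array([[0.8, 0.4]]))
```

The weights `sw` would then be passed to a weighted regression for the marginal structural model.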
Susan Gruber et al.
Thu, 01 Jan 2015 00:00:00 +0000
https://works.bepress.com/sgruber/32/
Articles
A Causal Perspective on OSIM2 Data Generation, with Implications for Simulation Study Design and Interpretation
https://works.bepress.com/sgruber/33/
<p>Research by the Observational Medical Outcomes Partnership (OMOP) has focused on developing and evaluating strategies to exploit observational electronic data to improve post-market prescription drug surveillance. A data simulator known as OSIM2 developed by the OMOP statistical methods group has been used as a testbed for evaluating and comparing different estimation procedures for detecting adverse drug-related events from data similar to that found in electronic insurance claims data. The simulation scheme produces a longitudinal dataset with millions of observations designed to closely match marginal distributions of important covariates in a known dataset. In this paper we provide a non-parametric structural equation model for the data generating process and construct the associated directed acyclic graph (DAG) depicting the causal structure. These representations reveal key differences between simulated and real-world data, including a departure from longitudinal causal relationships, absence of (presumed) sources of bias, and time ordering of covariates that conflicts with reality. The DAG also reveals the presence of unmeasured baseline confounding of the causal effect of a drug on a subsequent medical condition. Conclusions naively drawn from this simulation study could mislead an investigator trying to gain insight into estimator performance on real data. Applying causal inference tools allows us to draw more informed conclusions and suggests modifications to the simulation scheme that would more closely align simulated and real-world data.</p>
Susan Gruber
Thu, 01 Jan 2015 00:00:00 +0000
https://works.bepress.com/sgruber/33/
Articles
Variable Selection for Confounder Control, Flexible Modeling and Collaborative Targeted Minimum Loss-Based Estimation in Causal Inference.
https://works.bepress.com/sgruber/34/
<p>This paper investigates the appropriateness of the integration of flexible propensity score modeling (nonparametric or machine learning approaches) in semiparametric models for the estimation of a causal quantity, such as the mean outcome under treatment. We begin with an overview of some of the issues involved in knowledge-based and statistical variable selection in causal inference and the potential pitfalls of automated selection based on the fit of the propensity score. Using a simple example, we directly show the consequences of adjusting for pure causes of the exposure when using inverse probability of treatment weighting (IPTW). Such variables are likely to be selected when using a naive approach to model selection for the propensity score. We describe how the method of collaborative targeted minimum loss-based estimation (C-TMLE; van der Laan and Gruber, 2010 [27]) capitalizes on the collaborative double robustness property of semiparametric efficient estimators to select covariates for the propensity score based on the error in the conditional outcome model. Finally, we compare several approaches to automated variable selection in low- and high-dimensional settings through a simulation study. From this simulation study, we conclude that using IPTW with flexible prediction for the propensity score can result in inferior estimation, while targeted minimum loss-based estimation and C-TMLE may benefit from flexible prediction and remain robust to the presence of variables that are highly correlated with treatment. However, in our study, standard influence function-based methods for the variance underestimated the standard errors, resulting in poor coverage under certain data-generating scenarios.</p>
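The pitfall of adjusting for pure causes of the exposure can be made concrete with a small simulation: conditioning the propensity score on an instrument-like variable Z that affects only treatment inflates the variability of the IPTW weights without removing any confounding. The sketch below uses the true (analytic) propensities rather than fitted models, and all variable names and coefficients are assumptions of this illustration, not taken from the paper:

```python
import numpy as np

expit = lambda x: 1.0 / (1.0 + np.exp(-x))
rng = np.random.default_rng(0)

n = 20_000
W = rng.normal(size=n)                 # true confounder
Z = rng.normal(size=n)                 # pure cause of exposure: affects A only
A = rng.binomial(1, expit(0.5 * W + 2.0 * Z))

# True propensity given (W, Z) versus the propensity given W alone,
# the latter obtained by Monte Carlo integration over Z
g_full = expit(0.5 * W + 2.0 * Z)
z_draws = rng.normal(size=(400, 1))
g_w = expit(0.5 * W + 2.0 * z_draws).mean(axis=0)

w_full = np.where(A == 1, 1.0 / g_full, 1.0 / (1.0 - g_full))
w_conf_only = np.where(A == 1, 1.0 / g_w, 1.0 / (1.0 - g_w))

# Including Z drives some propensities toward 0 or 1, producing
# extreme weights and a much larger weight variance
print(w_full.var(), w_conf_only.var())
```

The inflated variance translates directly into less stable IPTW point estimates, which is one reason naive propensity model selection can hurt.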
Mireille E Schnitzer et al.
Thu, 01 Jan 2015 00:00:00 +0000
https://works.bepress.com/sgruber/34/
Articles
Evaluating treatment effectiveness under model misspecification: A comparison of targeted maximum likelihood estimation with bias-corrected matching
https://works.bepress.com/sgruber/31/
<p>Statistical approaches for estimating treatment effectiveness commonly model the endpoint, or the propensity score, using parametric regressions such as generalised linear models. Misspecification of these models can lead to biased parameter estimates. We compare two approaches that combine the propensity score and the endpoint regression, and can make weaker modelling assumptions, by using machine learning approaches to estimate the regression function and the propensity score. Targeted maximum likelihood estimation is a double-robust method designed to reduce bias in the estimate of the parameter of interest. Bias-corrected matching reduces bias due to covariate imbalance between matched pairs by using regression predictions. We illustrate the methods in an evaluation of different types of hip prosthesis on the health-related quality of life of patients with osteoarthritis. We undertake a simulation study, grounded in the case study, to compare the relative bias, efficiency and confidence interval coverage of the methods. We consider data generating processes with non-linear functional form relationships, normal and non-normal endpoints. We find that across the circumstances considered, bias-corrected matching generally reported less bias, but higher variance than targeted maximum likelihood estimation. When either targeted maximum likelihood estimation or bias-corrected matching incorporated machine learning, bias was much reduced, compared to using misspecified parametric models.</p>
Noemi Kreif et al.
Wed, 12 Feb 2014 00:00:00 +0000
https://works.bepress.com/sgruber/31/
Articles
Targeted Maximum Likelihood Estimation for Dynamic and Static Longitudinal Marginal Structural Working Models
https://works.bepress.com/sgruber/36/
<p>This paper describes a targeted maximum likelihood estimator (TMLE) for the parameters of longitudinal static and dynamic marginal structural models. We consider a longitudinal data structure consisting of baseline covariates, time-dependent intervention nodes, intermediate time-dependent covariates, and a possibly time-dependent outcome. The intervention nodes at each time point can include a binary treatment as well as a right-censoring indicator. Given a class of dynamic or static interventions, a marginal structural model is used to model the mean of the intervention-specific counterfactual outcome as a function of the intervention, time point, and possibly a subset of baseline covariates. Because the true shape of this function is rarely known, the marginal structural model is used as a working model. The causal quantity of interest is defined as the projection of the true function onto this working model. Iterated conditional expectation double robust estimators for marginal structural model parameters were previously proposed by Robins (2000, 2002) and Bang and Robins (2005). Here we build on this work and present a pooled TMLE for the parameters of marginal structural working models. We compare this pooled estimator to a stratified TMLE (Schnitzer et al. 2014) that is based on estimating the intervention-specific mean separately for each intervention of interest. The performance of the pooled TMLE is compared to the performance of the stratified TMLE and the performance of inverse probability weighted (IPW) estimators using simulations.
Concepts are illustrated using an example in which the aim is to estimate the causal effect of delayed switch following immunological failure of first line antiretroviral therapy among HIV-infected patients. Data from the International Epidemiological Databases to Evaluate AIDS, Southern Africa are analyzed to investigate this question using both TMLE and IPW estimators. Our results demonstrate practical advantages of the pooled TMLE over an IPW estimator for working marginal structural models for survival, as well as cases in which the pooled TMLE is superior to its stratified counterpart.</p>
Maya Petersen et al.
Wed, 01 Jan 2014 00:00:00 +0000
https://works.bepress.com/sgruber/36/
Articles
Characteristics of study design and elements that may contribute to the success of electronic safety monitoring systems
https://works.bepress.com/sgruber/35/
<p>In this commentary, we propose to examine the characteristics of a single drug-outcome pair for which two large collaborative groups have come to results that have many points in common. The goal is to suggest by example features of a disease problem and data systems that may lead to repeatable findings.</p>
Carlos Bell et al.
Wed, 01 Jan 2014 00:00:00 +0000
https://works.bepress.com/sgruber/35/
Articles
Active prescription drug safety surveillance: Exploring OMOP 2011-2012 experiments
https://works.bepress.com/sgruber/28/
<p>The Observational Medical Outcomes Partnership (OMOP), a consortium of pharmaceutical, FDA, and academic researchers, focuses on developing and evaluating electronic records-based methods for enhancing post-market drug safety surveillance. The OMOP 2011-2012 experiment consists of applying variants of seven analysis methods to five different EMR or claims databases to estimate the increase (decrease) in risk associated with drug-outcome pairs whose causal association has been previously established; these pairs serve as a gold standard for comparison. Variants of each method can produce very different effect estimates, sometimes at odds with the gold standard. We explore the reasons behind this heterogeneity, and in doing so increase our understanding of each method's vulnerability to different sources of bias, limitations of the OMOP 2011-2012 experiment, and the challenges inherent in using observational data for post-market monitoring of drug safety.</p>
Susan Gruber et al.
Wed, 16 Oct 2013 00:00:00 +0000
https://works.bepress.com/sgruber/28/
Presentations
Empirical Performance of a New User Cohort Method: Lessons for Developing a Risk Identification and Analysis System
https://works.bepress.com/sgruber/27/
<p>Background: Observational healthcare data offer the potential to enable identification of risks of medical products, but appropriate methodology has not yet been defined. The new user cohort method, which compares the post-exposure rate among users of the target drug to that in a referent comparator group, is the prevailing approach for many pharmacoepidemiology evaluations and has been proposed as a promising approach for risk identification, but its performance in this context has not been fully assessed.</p>
<p>Objectives: To evaluate the performance of the new user cohort method as a tool for risk identification in observational healthcare data.</p>
<p>Research Design: The method was applied to 399 drug-outcome scenarios (165 positive controls and 234 negative controls across 4 health outcomes of interest) in 5 real observational databases (4 administrative claims and 1 electronic health record) and in 6 simulated datasets with no effect and injected relative risks of 1.25, 1.5, 2, 4, and 10, respectively.</p>
<p>Measures: Method performance was evaluated through the area under the ROC curve (AUC), bias, and coverage probability.</p>
<p>Results: The new user cohort method achieved modest predictive accuracy across the outcomes and databases under study, with the top-performing analysis achieving an AUC near 0.70 in most scenarios. The performance of the method was particularly sensitive to the choice of comparator population. For almost all drug-outcome pairs there was a large difference, either positive or negative, between the true effect size and the estimate produced by the method, although this error was near zero on average. Simulation studies showed that in the majority of cases, the true effect estimate was not within the 95% confidence interval produced by the method.</p>
<p>Conclusion: The new user cohort method can contribute useful information toward a risk identification system, but should not be considered definitive evidence given the degree of error observed within the effect estimates. Careful consideration of the comparator selection and appropriate calibration of the effect estimates are required in order to properly interpret study findings.</p>
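The AUC criterion used in this evaluation reduces to a rank comparison: the probability that a randomly chosen positive control receives a higher score than a randomly chosen negative control. A minimal sketch; the function name and example scores are illustrative, not from the study:

```python
def auc(positive_scores, negative_scores):
    """Mann-Whitney estimate of the area under the ROC curve: the fraction
    of (positive control, negative control) pairs ranked correctly, with
    ties counting one half."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positive_scores
        for n in negative_scores
    )
    return wins / (len(positive_scores) * len(negative_scores))

# Estimated relative risks for hypothetical positive and negative controls
score = auc([2.1, 1.6, 1.1], [1.0, 0.9, 1.2])
```

An AUC of 0.5 means the method ranks controls no better than chance; 1.0 means every positive control outranks every negative control.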
Susan Gruber et al.
Tue, 01 Oct 2013 00:00:00 +0000
https://works.bepress.com/sgruber/27/
Articles
Targeted Maximum Likelihood Estimation for Dynamic and Static Longitudinal Marginal Structural Working Models
https://works.bepress.com/sgruber/25/
This paper describes a targeted maximum likelihood estimator (TMLE) for the parameters of longitudinal static and dynamic marginal structural models. We consider a longitudinal data structure consisting of baseline covariates, time-dependent intervention nodes, intermediate time-dependent covariates, and a possibly time-dependent outcome. The intervention nodes at each time point can include a binary treatment as well as a right-censoring indicator. Given a class of dynamic or static interventions, a marginal structural model is used to model the mean of the intervention-specific counterfactual outcome as a function of the intervention, time point, and possibly a subset of baseline covariates. Because the true shape of this function is rarely known, the marginal structural model is used as a working model. The causal quantity of interest is defined as the projection of the true function onto this working model. Iterated conditional expectation double robust estimators for marginal structural model parameters were previously proposed by Robins (2000a, 2002) and Bang and Robins (2005). Here we build on this work and present a pooled TMLE for the parameters of marginal structural working models. We compare this pooled estimator to a stratified TMLE that is based on estimating the intervention-specific mean separately for each intervention of interest (Schnitzer et al. 2014). The performance of the pooled TMLE is compared to the performance of the stratified TMLE and the performance of inverse probability weighted (IPW) estimators using simulations.
Concepts are illustrated using an example in which the aim is to estimate the causal effect of delayed switch following immunological failure of first line antiretroviral therapy among HIV-infected patients. Data from the International Epidemiological Databases to Evaluate AIDS, Southern Africa are analyzed to investigate this question using both TMLE and IPW estimators. Our results demonstrate practical advantages of the pooled TMLE over an IPW estimator for working marginal structural models for survival, as well as cases in which the pooled TMLE is superior to its stratified counterpart.
Wed, 22 May 2013 07:00:00 +0000
https://works.bepress.com/sgruber/25/
Technical Reports
An Application of Targeted Maximum Likelihood Estimation to the Meta-Analysis of Safety Data
https://works.bepress.com/sgruber/26/
<p>Safety analysis to estimate the effect of a treatment on an adverse event poses a challenging statistical problem even in randomized controlled trials because these events are typically rare, so studies originally powered for efficacy are underpowered for safety outcomes. A meta-analysis of data pooled across multiple studies may increase power, but missingness in the outcome or failed randomization can introduce bias. This article illustrates how targeted maximum likelihood estimation (TMLE) can be applied in a meta-analysis to reduce bias in causal effect estimates, and compares performance with other estimators in the literature. A simulation study in which missingness in the outcome is at random or completely at random highlights the differences in estimators with respect to the potential gains in bias and efficiency. Risk difference, relative risk, and odds ratio of the effect of treatment on 30-day mortality are estimated from data from eight randomized controlled trials. When an outcome event is rare there may be little opportunity to improve efficiency, and associations between covariates and the outcome may be hard to detect.
TMLE attempts to exploit the available information to either meet or exceed the performance of a less sophisticated estimator.</p>
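For a binary outcome such as 30-day mortality, the TMLE update is a one-dimensional logistic fluctuation of the initial outcome regression along a "clever covariate" built from the propensity score. A minimal sketch on simulated data, using the true nuisance functions purely for illustration (in practice both would be estimated, e.g. by super learning); the names and data-generating process are assumptions of this sketch, not taken from the article:

```python
import numpy as np

expit = lambda x: 1.0 / (1.0 + np.exp(-x))
logit = lambda p: np.log(p / (1.0 - p))

def tmle_risk_difference(Y, A, Qbar, g):
    """One TMLE targeting step for the risk difference with a binary outcome.

    Y, A : (n,) arrays of binary outcomes and treatment indicators.
    Qbar : function a -> (n,) initial estimates of E[Y | A=a, W].
    g    : (n,) estimates of the propensity score P(A=1 | W).
    """
    Q_A = np.where(A == 1, Qbar(1), Qbar(0))
    H = A / g - (1 - A) / (1 - g)        # clever covariate
    eps = 0.0
    for _ in range(25):                  # Newton-Raphson for the fluctuation MLE
        p = expit(logit(Q_A) + eps * H)
        eps += np.sum(H * (Y - p)) / np.sum(H**2 * p * (1 - p))
    Q1 = expit(logit(Qbar(1)) + eps / g)          # updated E[Y | A=1, W]
    Q0 = expit(logit(Qbar(0)) - eps / (1 - g))    # updated E[Y | A=0, W]
    return np.mean(Q1 - Q0)

# Simulated trial data where the true risk difference is about 0.231
rng = np.random.default_rng(1)
n = 20_000
W = rng.binomial(1, 0.5, n)
A = rng.binomial(1, expit(0.5 * W - 0.25))
Y = rng.binomial(1, expit(-1.0 + A + W))
psi = tmle_risk_difference(Y, A,
                           Qbar=lambda a: expit(-1.0 + a + W),  # true outcome regression
                           g=expit(0.5 * W - 0.25))             # true propensity score
```

With well-specified nuisance estimates the fluctuation parameter is near zero and the update leaves the plug-in estimate essentially unchanged; its value is in correcting bias when the initial outcome regression is misspecified.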
Susan Gruber et al.
Fri, 01 Feb 2013 00:00:00 +0000
https://works.bepress.com/sgruber/26/
Articles