publication

Innovation

Welcome to our research page, which lists our recent publications in biostatistics and epidemiology. These fields are central to understanding the causes, prevention, and treatment of health conditions, and our team contributes to them through methodological studies and applied statistical analyses. The publications collected here describe the development of new statistical methods and their application to real-world data. Please feel free to contact us with any questions or comments.


Visualizing the target estimand in comparative effectiveness studies with multiple treatments

Aim: Comparative effectiveness research using real-world data often involves pairwise propensity score matching to adjust for confounding bias. We show that corresponding treatment effect estimates may have limited external validity, and propose two visualization tools to clarify the target estimand.

Materials & methods: We conduct a simulation study to demonstrate, with bivariate ellipses and joy plots, that differences in covariate distributions across treatment groups may affect the external validity of treatment effect estimates. We showcase how these visualization tools can facilitate the interpretation of target estimands in a case study comparing the effectiveness of teriflunomide (TERI), dimethyl fumarate (DMF) and natalizumab (NAT) on manual dexterity in patients with multiple sclerosis.

Results: In the simulation study, estimates of the treatment effect greatly differed depending on the target population. For example, when comparing treatment B with C, the estimated treatment effect (and respective standard error) varied from -0.27 (0.03) to -0.37 (0.04) in the type of patients initially receiving treatment B and C, respectively. Visualization of the matched samples revealed that covariate distributions vary for each comparison and cannot be used to target one common treatment effect for the three treatment comparisons. In the case study, the bivariate distribution of age and disease duration varied across the population of patients receiving TERI, DMF or NAT. Although results suggest that DMF and NAT improve manual dexterity at 1 year compared with TERI, the effectiveness of DMF versus NAT differs depending on which target estimand is used.

Conclusion: Visualization tools may help to clarify the target population in comparative effectiveness studies and resolve ambiguity about the interpretation of estimated treatment effects.
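
The dependence of an estimated treatment effect on the target population can be illustrated with a toy calculation. The sketch below is not the authors' simulation: the stratum-specific effects, the age strata and the covariate mixes are all hypothetical numbers chosen for illustration. It only shows that the same stratum-specific effects average to different values when standardized to the patients who received B versus those who received C.

```python
def standardized_effect(stratum_effects, stratum_shares):
    """Average stratum-specific effects over a target population's covariate mix."""
    if abs(sum(stratum_shares.values()) - 1.0) > 1e-9:
        raise ValueError("stratum shares must sum to 1")
    return sum(stratum_effects[s] * share for s, share in stratum_shares.items())


# Hypothetical effect of B versus C within each age stratum (illustrative only).
effect_b_vs_c = {"younger": -0.20, "older": -0.40}

# Hypothetical age mix among patients who initially received B or C.
mix_treated_with_b = {"younger": 0.7, "older": 0.3}
mix_treated_with_c = {"younger": 0.2, "older": 0.8}

# Same stratum effects, two different target populations.
effect_in_b_population = standardized_effect(effect_b_vs_c, mix_treated_with_b)
effect_in_c_population = standardized_effect(effect_b_vs_c, mix_treated_with_c)
```

Under these assumed inputs the two averages differ only because the covariate mix differs, which is the ambiguity the visualization tools are meant to expose.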

Journal: J Comp Eff Res
Year: 2024

Evaluating individualized treatment effect predictions: A model-based perspective on discrimination and calibration assessment

In recent years, there has been a growing interest in the prediction of individualized treatment effects. While there is a rapidly growing literature on the development of such models, there is little literature on the evaluation of their performance. In this paper, we aim to facilitate the validation of prediction models for individualized treatment effects. The estimands of interest are defined based on the potential outcomes framework, which facilitates a comparison of existing and novel measures. In particular, we examine existing measures of discrimination for benefit (variations of the c-for-benefit), and propose model-based extensions to the treatment effect setting for discrimination and calibration metrics that have a strong basis in outcome risk prediction. The main focus is on randomized trial data with binary endpoints and on models that provide individualized treatment effect predictions and potential outcome predictions. We use simulated data to provide insight into the characteristics of the examined discrimination and calibration statistics, and further illustrate all methods in a trial of acute ischemic stroke treatment. The results show that the proposed model-based statistics had the best characteristics in terms of bias and accuracy. While resampling methods adjusted for the optimism of performance estimates in the development data, they had a high variance across replications that limited their accuracy. Therefore, individualized treatment effect models are best validated in independent data. To aid implementation, software for the proposed methods was made available in R.
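
For readers unfamiliar with the c-for-benefit, the following is a minimal sketch of one common variant, not the model-based statistics proposed in the paper and not the authors' R code. It assumes binary outcomes coded 1 for an unfavourable event, predicted benefit defined as predicted risk under control minus predicted risk under treatment, and a simple rank-based matching of treated to control patients; the function name and arguments are hypothetical.

```python
def c_for_benefit(pred_benefit_treated, y_treated, pred_benefit_control, y_control):
    """Concordance between predicted and observed benefit in matched pairs.

    Treated and control patients are matched by rank of predicted benefit.
    Observed pair benefit = y_control - y_treated (1 = unfavourable event).
    """
    treated = sorted(zip(pred_benefit_treated, y_treated), key=lambda pair: pair[0])
    control = sorted(zip(pred_benefit_control, y_control), key=lambda pair: pair[0])
    n_pairs = min(len(treated), len(control))  # simplification: extras are dropped

    # Each matched pair gets a mean predicted benefit and an observed benefit.
    pairs = [
        ((pb_t + pb_c) / 2.0, y_c - y_t)
        for (pb_t, y_t), (pb_c, y_c) in zip(treated[:n_pairs], control[:n_pairs])
    ]

    concordant = 0.0
    comparable = 0
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            (pred_i, obs_i), (pred_j, obs_j) = pairs[i], pairs[j]
            if obs_i == obs_j:
                continue  # pairs with equal observed benefit are not comparable
            comparable += 1
            if pred_i == pred_j:
                concordant += 0.5
            elif (pred_i > pred_j) == (obs_i > obs_j):
                concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs with unequal observed benefit")
    return concordant / comparable
```

A value of 0.5 corresponds to no discrimination for benefit and 1.0 to perfect ordering of matched pairs; the result depends on the matching rule, which is one reason the paper considers model-based alternatives.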

Journal: Stat Med
Year: 2024

The use of imputation in clinical decision support systems: a cardiovascular risk management pilot vignette study among clinicians

Introduction: A major challenge in using prediction models in clinical care is missing data. Real-time imputation may alleviate this. However, the extent to which clinicians accept this solution remains unknown. We aimed to assess acceptance of real-time imputation for missing patient data in a clinical decision support system (CDSS) that reports the 10-year absolute cardiovascular risk for the individual patient.

Methods: We performed a vignette study extending an existing CDSS with the real-time imputation method Joint Modelling Imputation (JMI). We asked 17 clinicians to use the CDSS with three different vignettes describing potential use cases (missing data, no risk estimate; imputed values, risk estimate based on imputed data; complete information). In each vignette, missing data were introduced to mimic a situation that could occur in clinical practice. End-user acceptance was assessed on three axes: clinical realism, comfortableness and added clinical value.

Results: Overall, the imputed predictor values were found to be clinically reasonable and in line with expectations. However, for binary variables, use of a probability scale to express uncertainty was deemed inconvenient. The perceived comfortableness with imputed risk prediction was low and confidence intervals were deemed too wide for reliable decision making. The clinicians acknowledged added value for using JMI in clinical practice when used for educational, research or informative purposes.

Conclusion: Handling missing data in CDSS via JMI is useful, but more accurate imputations are needed before clinicians are comfortable using it in routine care. Only then can CDSS create clinical value by improving decision making.

Journal: EHJ Digital Health
Year: 2024

Network meta-analysis of MS DMTs

To the Editor: We recently became aware of the study by Chen et al. Notable differences in the 3-month confirmed disability progression (CDP3M) outcome in this analysis have been identified compared with previously published network meta-analyses (NMAs). More specifically, the results for CDP3M greatly differ for interferon (IFN) beta-1A 30 mcg every week, IFN beta-1A 44 mcg 3 times a week, IFN beta-1A 22 mcg 3 times a week, natalizumab 300 mg every 4 weeks, and ocrelizumab 600 mg every 24 weeks. The published comparative estimates by Chen et al. may compromise the external validity of the SUCRA ranking results given that they are inconsistent with the totality of the existing body of published evidence. For example, the NMA of Chen et al. includes only one trial assessing the efficacy of natalizumab (where it was compared with placebo). Because there are no trials comparing natalizumab with other active treatments, the pooled effect estimate for natalizumab versus placebo (hazard ratio = 0.85) should remain similar to the treatment effect estimate from the original trial (hazard ratio = 0.58). However, this is not the case in the review of Chen et al. Similar discrepancies appear for ponesimod, where the original trial reported a hazard ratio versus teriflunomide 14 mg of 0.83 (0.58; 1.18), whereas the NMA reported a hazard ratio of 1.39 (0.55; 3.57). Therefore, we respectfully request additional transparency from Chen et al. regarding the NMA methods and additional clarity supporting their results.

Journal: J Am Pharm Assoc
Year: 2024

Propensity-based standardization to enhance the validation and interpretation of prediction model discrimination for a target population

External validation of the discriminative ability of prediction models is of key importance. However, the interpretation of such evaluations is challenging, as the ability to discriminate depends on both the sample characteristics (i.e., case-mix) and the generalizability of predictor coefficients, but most discrimination indices do not provide any insight into their respective contributions. To disentangle differences in discriminative ability across external validation samples due to a lack of model generalizability from differences in sample characteristics, we propose propensity-weighted measures of discrimination. These weighted metrics, which are derived from propensity scores for sample membership, are standardized for case-mix differences between the model development and validation samples, allowing for a fair comparison of discriminative ability in terms of model characteristics in a target population of interest. We illustrate our methods with the validation of eight prediction models for deep vein thrombosis in 12 external validation data sets and assess our methods in a simulation study. In the illustrative example, propensity score standardization reduced between-study heterogeneity of discrimination, indicating that between-study variability was partially attributable to case-mix. The simulation study showed that only flexible propensity-score methods (allowing for non-linear effects) produced unbiased estimates of model discrimination in the target population, and only when the positivity assumption was met. Propensity score-based standardization may facilitate the interpretation of (heterogeneity in) discriminative ability of a prediction model as observed across multiple studies, and may guide model updating strategies for a particular target population. Careful propensity score modeling with attention to non-linear relations is recommended.
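
As a rough illustration of what a weighted discrimination measure can look like, the sketch below computes a c-statistic in which every (event, non-event) pair contributes the product of the two patients' weights. It assumes the weights have already been derived from a propensity model for sample membership; how the paper constructs those weights is not reproduced here, and the function name is hypothetical.

```python
from itertools import product


def weighted_c_statistic(predictions, outcomes, weights=None):
    """Concordance over (event, non-event) pairs, each pair weighted by w_i * w_j."""
    if weights is None:
        weights = [1.0] * len(predictions)  # unweighted c-statistic as special case
    events = [(p, w) for p, y, w in zip(predictions, outcomes, weights) if y == 1]
    non_events = [(p, w) for p, y, w in zip(predictions, outcomes, weights) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    total = 0.0
    for (p_event, w_event), (p_non, w_non) in product(events, non_events):
        pair_weight = w_event * w_non
        total += pair_weight
        if p_event > p_non:
            concordant += pair_weight
        elif p_event == p_non:
            concordant += 0.5 * pair_weight  # ties count as half concordant
    if total == 0:
        raise ValueError("all pair weights are zero")
    return concordant / total
```

With unit weights this reduces to the ordinary c-statistic; up-weighting validation patients who resemble the target population shifts the estimate toward the discrimination expected in that population.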

Journal: Stat Med
Year: 2023

Methods for comparative effectiveness based on time to confirmed disability progression with irregular observations in multiple sclerosis

Real-world data sources offer opportunities to compare the effectiveness of treatments in practical clinical settings. However, relevant outcomes are often recorded selectively and collected at irregular measurement times. It is therefore common to convert the available visits to a standardized schedule with equally spaced visits. Although more advanced imputation methods exist, they are not designed to recover longitudinal outcome trajectories and typically assume that missingness is non-informative. We, therefore, propose an extension of multilevel multiple imputation methods to facilitate the analysis of real-world outcome data that is collected at irregular observation times. We illustrate multilevel multiple imputation in a case study evaluating two disease-modifying therapies for multiple sclerosis in terms of time to confirmed disability progression. This survival outcome is derived from repeated measurements of the Expanded Disability Status Scale, which is collected when patients come to the healthcare center for a clinical visit and for which longitudinal trajectories can be estimated. Subsequently, we perform a simulation study to compare the performance of multilevel multiple imputation to commonly used single imputation methods. Results indicate that multilevel multiple imputation leads to less biased treatment effect estimates and improves the coverage of confidence intervals, even when outcomes are missing not at random.
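
To make the "standardized schedule" step concrete, the sketch below shows one simple single-imputation approach, last observation carried forward (LOCF) onto an equally spaced grid. LOCF is named here as an assumption: it is an example of the kind of method the multilevel multiple imputation approach is compared against, not the proposed method, and the function name and visit data are hypothetical.

```python
def locf_to_grid(visits, grid_times):
    """Map irregular (time, value) visits onto a regular grid by carrying the last value forward.

    Grid points that precede the first visit are returned as None.
    """
    ordered = sorted(visits, key=lambda visit: visit[0])
    values_on_grid = []
    next_visit = 0
    last_value = None
    for t in sorted(grid_times):
        # Consume every visit that happened at or before this grid time.
        while next_visit < len(ordered) and ordered[next_visit][0] <= t:
            last_value = ordered[next_visit][1]
            next_visit += 1
        values_on_grid.append(last_value)
    return values_on_grid


# Hypothetical EDSS scores recorded at irregular days since baseline.
edss_visits = [(0, 2.0), (100, 2.5), (250, 3.5)]
edss_on_grid = locf_to_grid(edss_visits, [0, 90, 180, 270])
```

Because each grid value is treated as if it had been observed, this approach ignores the uncertainty about the outcome trajectory between visits, which is the limitation that motivates multiple imputation.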

Journal: Stat Methods Med Res
Year: 2023

Transparent reporting of multivariable prediction models for individual prognosis or diagnosis: checklist for systematic reviews and meta-analyses (TRIPOD-SRMA)

Most clinical specialties have a plethora of studies that develop or validate one or more prediction models, for example, to inform diagnosis or prognosis. Having many prediction model studies in a particular clinical field motivates the need for systematic reviews and meta-analyses, to evaluate and summarise the overall evidence available from prediction model studies, in particular about the predictive performance of existing models. Such reviews are fast emerging, and should be reported completely, transparently, and accurately. To help ensure this type of reporting, this article describes a new reporting guideline for systematic reviews and meta-analyses of prediction model research.

Journal: BMJ
Year: 2023

Measuring the performance of prediction models to personalize treatment choice

When data are available from individual patients receiving either a treatment or a control intervention in a randomized trial, various statistical and machine learning methods can be used to develop models for predicting future outcomes under the two conditions, and thus to predict treatment effect at the patient level. These predictions can subsequently guide personalized treatment choices. Although several methods for validating prediction models are available, little attention has been given to measuring the performance of predictions of personalized treatment effect. In this article, we propose a range of measures that can be used to this end. We start by defining two dimensions of model accuracy for treatment effects, for a single outcome: discrimination for benefit and calibration for benefit. We then amalgamate these two dimensions into an additional concept, decision accuracy, which quantifies the model's ability to identify patients for whom the benefit from treatment exceeds a given threshold. Subsequently, we propose a series of performance measures related to these dimensions and discuss estimating procedures, focusing on randomized data. Our methods are applicable for continuous or binary outcomes, for any type of prediction model, as long as it uses baseline covariates to predict outcomes under treatment and control. We illustrate all methods using two simulated datasets and a real dataset from a trial in depression. We implement all methods in the R package predieval. Results suggest that the proposed measures can be useful in evaluating and comparing the performance of competing models in predicting individualized treatment effect.
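
Calibration for benefit can be illustrated with a simple grouped comparison. The sketch below is an assumption-laden illustration, not the predieval implementation: it sorts patients from a randomized trial by predicted benefit, splits them into groups, and compares the mean predicted benefit with the observed difference in mean outcomes between arms. It assumes outcomes are coded so that lower values are better, so benefit is the control outcome minus the treated outcome; the function name is hypothetical.

```python
from statistics import mean


def calibration_for_benefit(pred_benefit, treated, outcome, n_groups=2):
    """Return (mean predicted benefit, observed benefit) for groups of predicted benefit.

    Observed benefit in a group = mean outcome in controls - mean outcome in treated,
    which estimates the group-level treatment effect under randomization.
    """
    n = len(pred_benefit)
    order = sorted(range(n), key=lambda i: pred_benefit[i])
    rows = []
    for g in range(n_groups):
        members = order[g * n // n_groups:(g + 1) * n // n_groups]
        y_treated = [outcome[i] for i in members if treated[i] == 1]
        y_control = [outcome[i] for i in members if treated[i] == 0]
        if not y_treated or not y_control:
            raise ValueError("each group needs patients from both arms")
        predicted = mean(pred_benefit[i] for i in members)
        observed = mean(y_control) - mean(y_treated)
        rows.append((predicted, observed))
    return rows
```

In a well-calibrated model the two numbers in each row are close; plotting them against each other gives a calibration-for-benefit plot.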

Journal: Stat Med
Year: 2023
Citation: 7

Development and validation of treatment-decision algorithms for children evaluated for pulmonary tuberculosis: an individual participant data meta-analysis

Background: Many children with pulmonary tuberculosis remain undiagnosed and untreated with related high morbidity and mortality. Diagnostic challenges in children include low bacterial burden, challenges around specimen collection, and limited access to diagnostic expertise. Algorithms that guide decisions to initiate tuberculosis treatment at primary healthcare centres in resource-limited settings could help to close the persistent childhood tuberculosis treatment gap. Recent advances in childhood tuberculosis algorithm development have incorporated prediction modelling, but studies conducted to date have been small and localised, with limited generalisability. We assembled individual participant data (IPD) from children being investigated for pulmonary tuberculosis in high-tuberculosis incidence settings, which we leveraged to 1) evaluate the performance of currently used diagnostic algorithms and 2) develop evidence-based algorithms to assist in tuberculosis treatment decision-making for children presenting to primary healthcare settings.

Methods: We collated IPD including clinical, bacteriological, and radiologic information from prospective diagnostic studies in high-tuberculosis incidence settings enrolling children <10 years with presumptive pulmonary tuberculosis. Using this dataset, we first retrospectively evaluated the performance of several existing treatment-decision algorithms. We then developed multivariable prediction models and investigated model generalisability using an internal-external cross-validation framework. A team of experts provided input to adapt the models into scoring systems with pre-determined sensitivity thresholds of 85% to be incorporated into pragmatic treatment-decision algorithms for use in resource-limited, primary healthcare settings.

Findings: Of 4,718 children from 13 studies from 12 countries, 1,811 (38.4%) were classified as having pulmonary tuberculosis, of whom 541 (29.9%) were bacteriologically confirmed and 1,270 (70.1%) unconfirmed. Existing treatment-decision algorithms had highly variable diagnostic performance. The scoring system derived from the prediction model that included clinical features and features from chest x-ray had a combined sensitivity of 86% (95% confidence interval [CI]: 68-94%) and specificity of 37% (95% CI: 15-66%) against a composite reference standard. The scoring system derived from the model that included only clinical features had a combined sensitivity of 84% (95% CI: 66-93%) and specificity of 30% (95% CI: 13-56%) against a composite reference standard.

Interpretation: We adopted an evidence-based approach to develop pragmatic algorithms to guide tuberculosis treatment decisions in children, irrespective of the resources locally available. This approach will empower health workers in resource-limited, primary healthcare settings to initiate tuberculosis treatment in children in order to improve access to care and reduce tuberculosis-related mortality. These algorithms have been included in the operational handbook accompanying the latest WHO guidelines on the management of tuberculosis in children and adolescents. Future prospective evaluation of algorithms, including those developed in this work, is necessary to investigate clinical performance.
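
The idea of fixing a sensitivity target and reading off the resulting specificity, as described in the Methods above, can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure: it assumes an integer score where higher values indicate tuberculosis, a rule that treats when the score is at or above a cut-off, and a binary reference standard; the function names and data layout are hypothetical.

```python
def sensitivity_specificity(scores, has_tb, cutoff):
    """Classify score >= cutoff as 'treat' and compare with the reference standard."""
    tp = sum(1 for s, d in zip(scores, has_tb) if d and s >= cutoff)
    fn = sum(1 for s, d in zip(scores, has_tb) if d and s < cutoff)
    tn = sum(1 for s, d in zip(scores, has_tb) if not d and s < cutoff)
    fp = sum(1 for s, d in zip(scores, has_tb) if not d and s >= cutoff)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one child with and one without tuberculosis")
    return tp / (tp + fn), tn / (tn + fp)


def highest_cutoff_meeting_sensitivity(scores, has_tb, target=0.85):
    """Return (cutoff, sensitivity, specificity) for the highest cut-off reaching the target."""
    for cutoff in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(scores, has_tb, cutoff)
        if sens >= target:
            return cutoff, sens, spec
    raise ValueError("no cut-off reaches the target sensitivity")
```

Choosing the highest cut-off that still meets the target keeps specificity as high as the sensitivity constraint allows, which mirrors the trade-off visible in the reported sensitivity and specificity estimates.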

Journal: Lancet Child Adolesc. Health
Year: 2023

Dealing with missing data using the Heckman selection model: methods primer for epidemiologists
Journal: Int. J. Epidemiol.
Year: 2023
Citation: 1