Calibration of risk prediction models:
decision making with the lights on or off?
Ben Van Calster
KU Leuven (B), LUMC (NL)
ISCB Krakow, August 27th 2020
TG6: Evaluating diagnostic tests and prediction models
• Chairs:
o Ewout Steyerberg (Leiden LUMC)
o Ben Van Calster (KU Leuven)
• Members (alphabetically):
o Patrick Bossuyt (AMC Amsterdam)
o Tom Boyles (U Witwatersrand, Johannesburg; clinician member)
o Gary Collins (U Oxford)
o Kathleen Kerr (U Washington, Seattle)
o Petra Macaskill (U Sydney)
o David McLernon (Aberdeen)
o Carl Moons (UMC Utrecht)
o Maarten van Smeden (UMC Utrecht)
o Andrew Vickers (MSKCC, New York)
o Laure Wynants (U Maastricht)
Risk prediction or binary prediction?
Risk is most interpretable, acknowledges imperfect prediction,
can be combined with other information, and allows decision thresholds to vary.
If you predict risk, you can assess the accuracy of the estimates (calibration).
Binary predictions easily hide potential miscalibration.
Level 1-2 TG6 paper on calibration
The Achilles heel of predictive analytics
Systematically wrong risk estimates can distort decision-making
o Risk overestimated: can lead to many unnecessary interventions
o Risk underestimated: can lead to withholding many important interventions
Calibration often not assessed during model validation.
So for many models, it is not known how accurate the risks are in a specific
setting. In that case, you are in fact using a model with the lights off.
But if AUC is high, the ranking of patients into lower vs higher risk must be very
good?
→ Good relative performance does not imply good absolute performance!
Using binary predictions only (e.g. treat vs don’t treat), you are not avoiding the
problem. I think you aggravate it by pretending to avoid the problem.
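A small illustration of the point about relative versus absolute performance, as a sketch on made-up numbers (the risks, outcomes, and the helper `auc` are hypothetical and not from the talk): halving every estimated risk leaves the ranking of patients, and hence the AUC, unchanged, while the average estimated risk no longer matches the event rate.

```python
def auc(risks, outcomes):
    """Probability that a random event has a higher risk than a random non-event (ties count 1/2)."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    nonevents = [r for r, y in zip(risks, outcomes) if y == 0]
    wins = sum((e > n) + 0.5 * (e == n) for e in events for n in nonevents)
    return wins / (len(events) * len(nonevents))

# Hypothetical estimated risks and observed outcomes (illustration only).
risks = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]

# A strictly increasing distortion of the risks: every risk is halved.
halved = [r / 2 for r in risks]

auc_original = auc(risks, outcomes)
auc_halved = auc(halved, outcomes)
event_rate = sum(outcomes) / len(outcomes)
mean_original = sum(risks) / len(risks)
mean_halved = sum(halved) / len(halved)

print("AUC:", auc_original, "vs", auc_halved)
print("event rate:", event_rate)
print("mean estimated risk:", mean_original, "vs", mean_halved)
```

The AUC only sees the ordering of the risks, so it cannot reveal that the halved risks underestimate the event rate by a factor of two.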
Published online in J Ultrasound Med on Aug 11 2020
Objective: develop risk model for first trimester miscarriage in very early pregnancies
- Retrospective data, single institution.
- 590 pregnancies, 345 miscarried; 9 parameters studied.
- Most important predictor (hCG rise) missing in 79%.
- No validation at all.
“It might appear to be a weakness of our study that the first trimester loss rate was
considerably higher than the rates found by other investigators (48% vs 10-30%). The rate
is high because of the high prevalence of pregnancy risk factors in our population.”
A web calculator is provided that allows risk estimation. I cannot support that.
How can risks be inaccurate?
• Methodological issues at model development or validation
o Overfitting, leading to overly extreme risk estimates on new data
“in small datasets, it is reasonable for a model not to be developed at all”
o Heterogeneity of measurement error between settings (Luijken et al, Stat Med
2019)
• Variables and characteristics unrelated to model development
o Patient characteristics and outcome incidence/prevalence vary greatly between
settings
o Patient populations change over time within setting (“drift”)
o So there is “Heterogeneity across time and place”
Levels of calibration
1. Mean calibration / calibration-in-the-large
2. Weak calibration
3. Moderate calibration
4. Strong calibration
Work motivated by a very nice and thought-provoking paper from Werner Vach (JCE 2013;66:1296-1301)
1. Mean calibration
The average estimated risk is accurate
Compare average risk with outcome prevalence/incidence
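As a minimal sketch of this comparison (the eight risks and outcomes below are invented for illustration, not data from the talk), mean calibration only needs two averages:

```python
# Hypothetical estimated risks and observed outcomes (illustration only).
estimated_risk = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.85]
outcome = [0, 0, 0, 0, 0, 1, 0, 1]

mean_estimated = sum(estimated_risk) / len(estimated_risk)
observed_rate = sum(outcome) / len(outcome)

# Positive difference: risk is overestimated on average; negative: underestimated.
difference = mean_estimated - observed_rate

print(f"mean estimated risk = {mean_estimated:.3f}")
print(f"observed event rate = {observed_rate:.3f}")
print(f"difference          = {difference:+.3f}")
```

In this toy example the average estimated risk exceeds the observed event rate, so risk is overestimated on average.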
2. Weak calibration
On average, the model does not overestimate or underestimate risk, and
does not give too extreme or too modest risks
‘Logistic recalibration’ framework:
Evaluate calibration intercept $a$: $\log\frac{P(Y=1)}{P(Y=0)} = a + L$, where $L$ is the logit of the estimated risk (entered as an offset)
$a < 0$ means overestimation, $a > 0$ means underestimation
Evaluate calibration slope $b$: $\log\frac{P(Y=1)}{P(Y=0)} = a + bL$
$b < 1$ means too extreme risks, $b > 1$ means too modest risks
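The two logistic recalibration fits can be sketched in plain Python as follows. This is an illustrative sketch, not the speaker's code: the data are simulated, the helper names (`expit`, `calibration_intercept`, `calibration_slope`) are hypothetical, and the fits use a hand-written Newton-Raphson routine rather than a statistics package. Here L denotes the logit of the estimated risk, and the simulated model is deliberately shifted upward and too extreme.

```python
import math
import random

def expit(z):
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)

def calibration_intercept(L, y, iterations=25):
    """Fit logit P(Y=1) = a + L with L as an offset (slope fixed at 1)."""
    a = 0.0
    for _ in range(iterations):
        p = [expit(a + l) for l in L]
        gradient = sum(yi - pi for yi, pi in zip(y, p))
        information = sum(pi * (1 - pi) for pi in p)
        a += gradient / information
    return a

def calibration_slope(L, y, iterations=25):
    """Fit logit P(Y=1) = a + b*L by Newton-Raphson; returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        p = [expit(a + b * l) for l in L]
        w = [pi * (1 - pi) for pi in p]
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * l for yi, pi, l in zip(y, p, L))
        h00 = sum(w)
        h01 = sum(wi * l for wi, l in zip(w, L))
        h11 = sum(wi * l * l for wi, l in zip(w, L))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Simulated validation data (assumption): the true logit is t, but the model
# reports L = 1 + 2t, i.e. shifted upward and twice as extreme as it should be.
rng = random.Random(2020)
true_logit = [rng.gauss(-1, 1) for _ in range(5000)]
y = [1 if rng.random() < expit(t) else 0 for t in true_logit]
L = [1 + 2 * t for t in true_logit]

a = calibration_intercept(L, y)
a_with_slope, b = calibration_slope(L, y)
print(f"calibration intercept a = {a:.2f}")
print(f"calibration slope b     = {b:.2f}")
```

In this simulation the intercept comes out negative (overestimation on average) and the slope close to 0.5 (risks too extreme), matching the interpretation rules given above.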
3. Moderate calibration
The observed proportion of events corresponds to the estimated risk
Construct a flexible calibration curve based on $\log\frac{P(Y=1)}{P(Y=0)} = a + f(L)$.
$f(\cdot)$ is usually a loess fit, but can also be based on splines.
This is preferable at external validation, but requires a sufficient sample size.
Intercept and slope are nice summaries, but reduce calibration to 2 numbers (weak).
The slope is usually sufficient for internal validation (using bootstrapping or cross-
validation), but the intercept or plotting a curve can sometimes be defended as well.
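To give a feel for a flexible calibration curve, here is a rough sketch on simulated data. It swaps in a simple Gaussian kernel smoother on the risk scale as a stand-in for the loess or spline fit mentioned above; the function name `smoothed_calibration_curve`, the bandwidth, and the data are all assumptions made for illustration.

```python
import math
import random

def smoothed_calibration_curve(risk, y, grid, bandwidth=0.1):
    """Kernel-weighted observed event proportion at each grid value of estimated risk."""
    curve = []
    for g in grid:
        weights = [math.exp(-0.5 * ((r - g) / bandwidth) ** 2) for r in risk]
        curve.append(sum(w * yi for w, yi in zip(weights, y)) / sum(weights))
    return curve

# Simulated validation data (assumption): the true risk is the square of the
# estimated risk, so the model overestimates risk everywhere.
rng = random.Random(1)
risk = [rng.uniform(0.05, 0.95) for _ in range(3000)]
y = [1 if rng.random() < r ** 2 else 0 for r in risk]

grid = [0.2, 0.5, 0.8]
curve = smoothed_calibration_curve(risk, y, grid)
for g, c in zip(grid, curve):
    print(f"estimated risk {g:.1f} -> smoothed observed proportion {c:.2f}")
```

A well calibrated model would give a curve on the diagonal (observed proportion equal to estimated risk); here the curve lies below it at every grid point. With few events such a curve becomes unstable, which is why a sufficient sample size matters.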
Some reference calibration curves
[Figure: reference calibration curves at 25% outcome prevalence]
Example curves with low N
Verhoeven et al. Ultrasound Obstet Gynecol 2009;34:316-321.
240 cases, 27 events (Caesarean delivery)
“Calibration of the model on the right was not as good
as the calibration of the model on the left”
4. Strong calibration
The observed proportion of events corresponds to the estimated risk for each
covariate pattern
Hard to assess (unless the model has only a few dichotomous predictors)
This is clinically desirable but utopian. The model needs to be fully correct.
A diagonal calibration curve (i.e. moderate) does not imply strong calibration.
We have shown that moderate calibration cannot lead to harmful decisions (in the
framework of decision curve analysis).
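The gap between moderate and strong calibration can be seen in a toy population. This is a hypothetical example, not from the talk: there is one binary predictor x, and the model ignores it and gives everyone the overall event rate.

```python
import math

# Hypothetical population: one binary predictor x, both patterns equally common.
true_risk = {0: 0.10, 1: 0.30}   # true event risk per covariate pattern
share = {0: 0.5, 1: 0.5}         # proportion of patients per pattern

# A model that ignores x and assigns everyone the overall event rate.
overall_rate = sum(share[x] * true_risk[x] for x in true_risk)
estimated_risk = {x: overall_rate for x in true_risk}

# Moderate calibration: among patients sharing an estimated risk,
# the observed event proportion equals that estimate.
moderately_calibrated = True
for value in set(estimated_risk.values()):
    patterns = [x for x in estimated_risk if estimated_risk[x] == value]
    weight = sum(share[x] for x in patterns)
    observed = sum(share[x] * true_risk[x] for x in patterns) / weight
    moderately_calibrated = moderately_calibrated and math.isclose(observed, value)

# Strong calibration: the estimate is correct for every covariate pattern.
strongly_calibrated = all(
    math.isclose(estimated_risk[x], true_risk[x]) for x in true_risk
)

print("moderately calibrated:", moderately_calibrated)
print("strongly calibrated:  ", strongly_calibrated)
```

The calibration curve of this model is a single point on the diagonal, yet the risk is wrong for both covariate patterns.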
Example external validation
N=4905, 978 events
Multinomial outcomes?
1. Calibration intercepts and slopes can be calculated for multinomial logistic
regression by extending the approach for binary outcomes to
$\log\frac{P(Y=k)}{P(Y=J)} = a_k + \sum_{i=1}^{K-1} b_{k,i} L_i$
2. Flexible calibration curves can be obtained by using vector splines $s(\cdot)$
$\log\frac{P(Y=k)}{P(Y=J)} = a_k + \sum_{i=1}^{K-1} s_{k,i}(L_i)$
This can be extended to risk models for ordinal outcomes, and to risk models
based on e.g. machine learning algorithms
Multinomial: example
Heterogeneity between centers
Heterogeneity: example
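Centre-specific and overall logistic (i.e. non-flexible) calibration curves:
logistic recalibration model with random intercept and random slope for the J
centres (Wynants et al, SMMR 2018):
$\log\frac{P(Y=1)}{P(Y=0)} = \alpha + a_j + \beta L + b_j L$,
where $\begin{pmatrix} a_j \\ b_j \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \tau_a^2 & \tau_{ab} \\ \tau_{ab} & \tau_b^2 \end{pmatrix} \right)$.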
Cox models (TG6 paper in preparation)
What you can do depends on the information you have (next to the validation
dataset):
Level    Available information about the model
Level 1  Only model coefficients (very common)
Level 2  Coefficients + cumulative baseline hazard at $t_1$, $H_0(t_1)$
Level 3  Original dataset
In my view, level 2 is what is needed for clinical application. It is also what
TRIPOD recommends (Moons et al, Ann Intern Med 2015).
If $H_0(t_1)$ is available, flexible adaptive hazard regression can be used to
generate a flexible calibration curve at time $t_1$
$\log h(t) = g\left(\log(-\log(1 - p_{t_1})),\, t\right)$, with
$p_{t_1} = 1 - \exp\left(-H_0(t_1) \exp(\boldsymbol{\beta}^T \mathbf{X})\right)$
Can also be used for other time-to-event models.
See Austin, Harrell, van Klaveren (Stat Med 2020).
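The formula for the predicted risk at t1 shows why the cumulative baseline hazard is needed. A small sketch with invented numbers (the baseline hazard, coefficients, and covariate values are hypothetical, as are the function names):

```python
import math

def predicted_risk(baseline_cumhaz_t1, coefficients, covariates):
    """Risk of the event by t1 under a Cox model: 1 - exp(-H0(t1) * exp(beta'x))."""
    lp = sum(b * x for b, x in zip(coefficients, covariates))
    return 1 - math.exp(-baseline_cumhaz_t1 * math.exp(lp))

def cloglog(p):
    """log(-log(1 - p)): the transformation of the predicted risk used in the calibration model."""
    return math.log(-math.log(1 - p))

# Hypothetical values, for illustration only.
H0_t1 = 0.05            # cumulative baseline hazard at t1
beta = [0.8, -0.3]      # model coefficients
x = [1.0, 2.0]          # one patient's covariate values

p_t1 = predicted_risk(H0_t1, beta, x)
print(f"predicted risk at t1 = {p_t1:.4f}")
print(f"log(-log(1 - p))     = {cloglog(p_t1):.4f}")
```

With only the coefficients, the linear predictor can be computed but not the absolute risk. On the complementary log-log scale the predicted risk reduces to log H0(t1) plus the linear predictor.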
3 myths about risk thresholds (TG6 paper)
1. Risk groups are more useful than continuous risk estimates
→ Clinically actionable groups (that have consensus) can make sense, but
this remains rough for decision making at the individual level
2. You can ask your statistician to get you the threshold
→ The threshold depends on the clinical context; you need reasonable information on
misclassification costs
3. The threshold is a part of the model
→ Different preferences, different healthcare systems
These 3 issues are obviously related to each other.
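The link between misclassification costs and the threshold can be sketched with the standard decision-analytic relation: treat when the risk exceeds cost_FP / (cost_FP + cost_FN). The cost ratios, the toy data, and the function names below are assumptions for illustration; net benefit is computed as in decision curve analysis.

```python
def risk_threshold(cost_false_positive, cost_false_negative):
    """Risk at which the expected costs of treating and not treating are equal."""
    return cost_false_positive / (cost_false_positive + cost_false_negative)

def net_benefit(risks, outcomes, threshold):
    """Net benefit of treating everyone whose estimated risk is at or above the threshold."""
    n = len(risks)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

# Two hypothetical settings that weigh the same errors differently.
t_a = risk_threshold(1, 9)   # a missed event is judged 9 times worse than overtreatment
t_b = risk_threshold(1, 3)   # a missed event is judged 3 times worse than overtreatment

# Hypothetical estimated risks and outcomes (illustration only).
risks = [0.05, 0.15, 0.20, 0.30, 0.60, 0.80]
outcomes = [0, 0, 1, 0, 1, 1]

nb_a = net_benefit(risks, outcomes, t_a)
nb_b = net_benefit(risks, outcomes, t_b)
print(f"setting A: threshold {t_a:.2f}, net benefit {nb_a:.3f}")
print(f"setting B: threshold {t_b:.2f}, net benefit {nb_b:.3f}")
```

The same model and the same patients lead to different thresholds and different decisions once the cost judgments change, which is why the threshold cannot be a fixed part of the model.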
Further plans TG6
Practical guidance on validation of risk models for time-to-event outcomes
Practical guidance on validation of risk models accounting for competing risks
Simple paper (level 1) with advice for prediction model development
Multicenter diagnostic test evaluations: guidance on design and analysis
Hands-on tutorial of tools to assess calibration for different outcomes
“Medicine is a science of uncertainty and an art of probability”
William Osler

More Related Content

What's hot

How to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - StatsworkHow to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - Statswork
Stats Statswork
 
Prediction research in a pandemic: 3 lessons from a living systematic review ...
Prediction research in a pandemic: 3 lessons from a living systematic review ...Prediction research in a pandemic: 3 lessons from a living systematic review ...
Prediction research in a pandemic: 3 lessons from a living systematic review ...
Laure Wynants
 
Clinical prediction models
Clinical prediction modelsClinical prediction models
Clinical prediction models
Maarten van Smeden
 
P-values in crisis
P-values in crisisP-values in crisis
P-values in crisis
Laure Wynants
 
The basics of prediction modeling
The basics of prediction modeling The basics of prediction modeling
The basics of prediction modeling
Maarten van Smeden
 
Introduction to prediction modelling - Berlin 2018 - Part II
Introduction to prediction modelling - Berlin 2018 - Part IIIntroduction to prediction modelling - Berlin 2018 - Part II
Introduction to prediction modelling - Berlin 2018 - Part II
Maarten van Smeden
 
Thoughts on Machine Learning and Artificial Intelligence
Thoughts on Machine Learning and Artificial IntelligenceThoughts on Machine Learning and Artificial Intelligence
Thoughts on Machine Learning and Artificial Intelligence
Maarten van Smeden
 
Evaluation of the clinical value of biomarkers for risk prediction
Evaluation of the clinical value of biomarkers for risk predictionEvaluation of the clinical value of biomarkers for risk prediction
Evaluation of the clinical value of biomarkers for risk prediction
Ewout Steyerberg
 
Is it causal, is it prediction or is it neither?
Is it causal, is it prediction or is it neither?Is it causal, is it prediction or is it neither?
Is it causal, is it prediction or is it neither?
Maarten van Smeden
 
Big Data Analytics for Healthcare
Big Data Analytics for HealthcareBig Data Analytics for Healthcare
Big Data Analytics for Healthcare
Chandan Reddy
 
Why the EPV≥10 sample size rule is rubbish and what to use instead
Why the EPV≥10 sample size rule is rubbish and what to use instead Why the EPV≥10 sample size rule is rubbish and what to use instead
Why the EPV≥10 sample size rule is rubbish and what to use instead
Maarten van Smeden
 
Development and evaluation of prediction models: pitfalls and solutions
Development and evaluation of prediction models: pitfalls and solutionsDevelopment and evaluation of prediction models: pitfalls and solutions
Development and evaluation of prediction models: pitfalls and solutions
Maarten van Smeden
 
How to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - StatsworkHow to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - Statswork
Stats Statswork
 
Day 1 (Lecture 3): Predictive Analytics in Healthcare
Day 1 (Lecture 3): Predictive Analytics in HealthcareDay 1 (Lecture 3): Predictive Analytics in Healthcare
Day 1 (Lecture 3): Predictive Analytics in Healthcare
Aseda Owusua Addai-Deseh
 
Regression shrinkage: better answers to causal questions
Regression shrinkage: better answers to causal questionsRegression shrinkage: better answers to causal questions
Regression shrinkage: better answers to causal questions
Maarten van Smeden
 
Data science in health care
Data science in health careData science in health care
Data science in health care
Chetan Khanzode
 
QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDIC...
QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDIC...QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDIC...
QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDIC...
GaryCollins74
 
Introduction to prediction modelling - Berlin 2018 - Part I
Introduction to prediction modelling - Berlin 2018 - Part IIntroduction to prediction modelling - Berlin 2018 - Part I
Introduction to prediction modelling - Berlin 2018 - Part I
Maarten van Smeden
 
O1
O1O1
Machine learning versus traditional statistical modeling and medical doctors
Machine learning versus traditional statistical modeling and medical doctorsMachine learning versus traditional statistical modeling and medical doctors
Machine learning versus traditional statistical modeling and medical doctors
Maarten van Smeden
 

What's hot (20)

How to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - StatsworkHow to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - Statswork
 
Prediction research in a pandemic: 3 lessons from a living systematic review ...
Prediction research in a pandemic: 3 lessons from a living systematic review ...Prediction research in a pandemic: 3 lessons from a living systematic review ...
Prediction research in a pandemic: 3 lessons from a living systematic review ...
 
Clinical prediction models
Clinical prediction modelsClinical prediction models
Clinical prediction models
 
P-values in crisis
P-values in crisisP-values in crisis
P-values in crisis
 
The basics of prediction modeling
The basics of prediction modeling The basics of prediction modeling
The basics of prediction modeling
 
Introduction to prediction modelling - Berlin 2018 - Part II
Introduction to prediction modelling - Berlin 2018 - Part IIIntroduction to prediction modelling - Berlin 2018 - Part II
Introduction to prediction modelling - Berlin 2018 - Part II
 
Thoughts on Machine Learning and Artificial Intelligence
Thoughts on Machine Learning and Artificial IntelligenceThoughts on Machine Learning and Artificial Intelligence
Thoughts on Machine Learning and Artificial Intelligence
 
Evaluation of the clinical value of biomarkers for risk prediction
Evaluation of the clinical value of biomarkers for risk predictionEvaluation of the clinical value of biomarkers for risk prediction
Evaluation of the clinical value of biomarkers for risk prediction
 
Is it causal, is it prediction or is it neither?
Is it causal, is it prediction or is it neither?Is it causal, is it prediction or is it neither?
Is it causal, is it prediction or is it neither?
 
Big Data Analytics for Healthcare
Big Data Analytics for HealthcareBig Data Analytics for Healthcare
Big Data Analytics for Healthcare
 
Why the EPV≥10 sample size rule is rubbish and what to use instead
Why the EPV≥10 sample size rule is rubbish and what to use instead Why the EPV≥10 sample size rule is rubbish and what to use instead
Why the EPV≥10 sample size rule is rubbish and what to use instead
 
Development and evaluation of prediction models: pitfalls and solutions
Development and evaluation of prediction models: pitfalls and solutionsDevelopment and evaluation of prediction models: pitfalls and solutions
Development and evaluation of prediction models: pitfalls and solutions
 
How to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - StatsworkHow to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - Statswork
 
Day 1 (Lecture 3): Predictive Analytics in Healthcare
Day 1 (Lecture 3): Predictive Analytics in HealthcareDay 1 (Lecture 3): Predictive Analytics in Healthcare
Day 1 (Lecture 3): Predictive Analytics in Healthcare
 
Regression shrinkage: better answers to causal questions
Regression shrinkage: better answers to causal questionsRegression shrinkage: better answers to causal questions
Regression shrinkage: better answers to causal questions
 
Data science in health care
Data science in health careData science in health care
Data science in health care
 
QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDIC...
QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDIC...QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDIC...
QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDIC...
 
Introduction to prediction modelling - Berlin 2018 - Part I
Introduction to prediction modelling - Berlin 2018 - Part IIntroduction to prediction modelling - Berlin 2018 - Part I
Introduction to prediction modelling - Berlin 2018 - Part I
 
O1
O1O1
O1
 
Machine learning versus traditional statistical modeling and medical doctors
Machine learning versus traditional statistical modeling and medical doctorsMachine learning versus traditional statistical modeling and medical doctors
Machine learning versus traditional statistical modeling and medical doctors
 

Similar to Calibration of risk prediction models: decision making with the lights on or off?

Common statistical pitfalls & errors in biomedical research (a top-5 list)
Common statistical pitfalls & errors in biomedical research (a top-5 list)Common statistical pitfalls & errors in biomedical research (a top-5 list)
Common statistical pitfalls & errors in biomedical research (a top-5 list)
Evangelos Kritsotakis
 
Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...
Evangelos Kritsotakis
 
Errors2
Errors2Errors2
Errors2
sjsuchaya
 
Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...
Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...
Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...
cambridgeWD
 
Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...
Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...
Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...
cambridgeWD
 
Measuring clinical utility: uncertainty in Net Benefit
Measuring clinical utility: uncertainty in Net BenefitMeasuring clinical utility: uncertainty in Net Benefit
Measuring clinical utility: uncertainty in Net Benefit
Laure Wynants
 
NY Prostate Cancer Conference - A. Vickers - Session 1: Traditional statistic...
NY Prostate Cancer Conference - A. Vickers - Session 1: Traditional statistic...NY Prostate Cancer Conference - A. Vickers - Session 1: Traditional statistic...
NY Prostate Cancer Conference - A. Vickers - Session 1: Traditional statistic...
European School of Oncology
 
SLR Assumptions:Model Check Using SPSS
SLR Assumptions:Model Check Using SPSSSLR Assumptions:Model Check Using SPSS
SLR Assumptions:Model Check Using SPSS
Nermin Osman
 
The ASA president Task Force Statement on Statistical Significance and Replic...
The ASA president Task Force Statement on Statistical Significance and Replic...The ASA president Task Force Statement on Statistical Significance and Replic...
The ASA president Task Force Statement on Statistical Significance and Replic...
jemille6
 
The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...
jemille6
 
Measurement Uncertainty (1).ppt
Measurement Uncertainty (1).pptMeasurement Uncertainty (1).ppt
Measurement Uncertainty (1).ppt
HoussemEddineSassi
 
The Lachman Test
The Lachman TestThe Lachman Test
The Lachman Test
Laura Torres
 
Risk Aggregation Inanoglu Jacobs 6 09 V1
Risk Aggregation Inanoglu Jacobs 6 09 V1Risk Aggregation Inanoglu Jacobs 6 09 V1
Risk Aggregation Inanoglu Jacobs 6 09 V1
Michael Jacobs, Jr.
 
Running head HYPOTHESIS TEST 1HYPOTHESIS TESTING.docx
Running head HYPOTHESIS TEST    1HYPOTHESIS TESTING.docxRunning head HYPOTHESIS TEST    1HYPOTHESIS TESTING.docx
Running head HYPOTHESIS TEST 1HYPOTHESIS TESTING.docx
cowinhelen
 
CH&Cie white paper value-at-risk in tuburlent times_VaR
CH&Cie white paper value-at-risk in tuburlent times_VaRCH&Cie white paper value-at-risk in tuburlent times_VaR
CH&Cie white paper value-at-risk in tuburlent times_VaR
Thibault Le Pomellec
 
A plea for good methodology when developing clinical prediction models
A plea for good methodology when developing clinical prediction modelsA plea for good methodology when developing clinical prediction models
A plea for good methodology when developing clinical prediction models
BenVanCalster
 
Machine Learning for Survival Analysis
Machine Learning for Survival AnalysisMachine Learning for Survival Analysis
Machine Learning for Survival Analysis
Chandan Reddy
 
ISCB 2023 Sources of uncertainty b.pptx
ISCB 2023 Sources of uncertainty b.pptxISCB 2023 Sources of uncertainty b.pptx
ISCB 2023 Sources of uncertainty b.pptx
BenVanCalster
 
PREDICTING CLASS-IMBALANCED BUSINESS RISK USING RESAMPLING, REGULARIZATION, A...
PREDICTING CLASS-IMBALANCED BUSINESS RISK USING RESAMPLING, REGULARIZATION, A...PREDICTING CLASS-IMBALANCED BUSINESS RISK USING RESAMPLING, REGULARIZATION, A...
PREDICTING CLASS-IMBALANCED BUSINESS RISK USING RESAMPLING, REGULARIZATION, A...
IJMIT JOURNAL
 
PREDICTING CLASS-IMBALANCED BUSINESS RISK USING RESAMPLING, REGULARIZATION, A...
PREDICTING CLASS-IMBALANCED BUSINESS RISK USING RESAMPLING, REGULARIZATION, A...PREDICTING CLASS-IMBALANCED BUSINESS RISK USING RESAMPLING, REGULARIZATION, A...
PREDICTING CLASS-IMBALANCED BUSINESS RISK USING RESAMPLING, REGULARIZATION, A...
IJMIT JOURNAL
 

Similar to Calibration of risk prediction models: decision making with the lights on or off? (20)

Common statistical pitfalls & errors in biomedical research (a top-5 list)
Common statistical pitfalls & errors in biomedical research (a top-5 list)Common statistical pitfalls & errors in biomedical research (a top-5 list)
Common statistical pitfalls & errors in biomedical research (a top-5 list)
 
Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...
 
Errors2
Errors2Errors2
Errors2
 
Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...
Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...
Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...
 
Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...
Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...
Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterpri...
 
Measuring clinical utility: uncertainty in Net Benefit
Measuring clinical utility: uncertainty in Net BenefitMeasuring clinical utility: uncertainty in Net Benefit
Measuring clinical utility: uncertainty in Net Benefit
 
NY Prostate Cancer Conference - A. Vickers - Session 1: Traditional statistic...
NY Prostate Cancer Conference - A. Vickers - Session 1: Traditional statistic...NY Prostate Cancer Conference - A. Vickers - Session 1: Traditional statistic...
NY Prostate Cancer Conference - A. Vickers - Session 1: Traditional statistic...
 
SLR Assumptions:Model Check Using SPSS
SLR Assumptions:Model Check Using SPSSSLR Assumptions:Model Check Using SPSS
SLR Assumptions:Model Check Using SPSS
 
The ASA president Task Force Statement on Statistical Significance and Replic...
The ASA president Task Force Statement on Statistical Significance and Replic...The ASA president Task Force Statement on Statistical Significance and Replic...
The ASA president Task Force Statement on Statistical Significance and Replic...
 
The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...
 
Measurement Uncertainty (1).ppt
Measurement Uncertainty (1).pptMeasurement Uncertainty (1).ppt
Measurement Uncertainty (1).ppt
 
The Lachman Test
The Lachman TestThe Lachman Test
The Lachman Test
 
Risk Aggregation Inanoglu Jacobs 6 09 V1
Risk Aggregation Inanoglu Jacobs 6 09 V1Risk Aggregation Inanoglu Jacobs 6 09 V1
Risk Aggregation Inanoglu Jacobs 6 09 V1
 
Running head HYPOTHESIS TEST 1HYPOTHESIS TESTING.docx
Running head HYPOTHESIS TEST    1HYPOTHESIS TESTING.docxRunning head HYPOTHESIS TEST    1HYPOTHESIS TESTING.docx
Running head HYPOTHESIS TEST 1HYPOTHESIS TESTING.docx
 
CH&Cie white paper value-at-risk in tuburlent times_VaR
CH&Cie white paper value-at-risk in tuburlent times_VaRCH&Cie white paper value-at-risk in tuburlent times_VaR
CH&Cie white paper value-at-risk in tuburlent times_VaR
 
A plea for good methodology when developing clinical prediction models
A plea for good methodology when developing clinical prediction modelsA plea for good methodology when developing clinical prediction models
A plea for good methodology when developing clinical prediction models
 
Machine Learning for Survival Analysis
Machine Learning for Survival AnalysisMachine Learning for Survival Analysis
Machine Learning for Survival Analysis
 
ISCB 2023 Sources of uncertainty b.pptx
ISCB 2023 Sources of uncertainty b.pptxISCB 2023 Sources of uncertainty b.pptx
ISCB 2023 Sources of uncertainty b.pptx
 
PREDICTING CLASS-IMBALANCED BUSINESS RISK USING RESAMPLING, REGULARIZATION, A...
PREDICTING CLASS-IMBALANCED BUSINESS RISK USING RESAMPLING, REGULARIZATION, A...PREDICTING CLASS-IMBALANCED BUSINESS RISK USING RESAMPLING, REGULARIZATION, A...
PREDICTING CLASS-IMBALANCED BUSINESS RISK USING RESAMPLING, REGULARIZATION, A...
 
PREDICTING CLASS-IMBALANCED BUSINESS RISK USING RESAMPLING, REGULARIZATION, A...
PREDICTING CLASS-IMBALANCED BUSINESS RISK USING RESAMPLING, REGULARIZATION, A...PREDICTING CLASS-IMBALANCED BUSINESS RISK USING RESAMPLING, REGULARIZATION, A...
PREDICTING CLASS-IMBALANCED BUSINESS RISK USING RESAMPLING, REGULARIZATION, A...
 

Recently uploaded

(Shilpa) ➤ Call Girls Lucknow 🔥 9352988975 🔥 Real Fun With Sexual Girl Availa...
(Shilpa) ➤ Call Girls Lucknow 🔥 9352988975 🔥 Real Fun With Sexual Girl Availa...(Shilpa) ➤ Call Girls Lucknow 🔥 9352988975 🔥 Real Fun With Sexual Girl Availa...
(Shilpa) ➤ Call Girls Lucknow 🔥 9352988975 🔥 Real Fun With Sexual Girl Availa...
shourabjaat424
 
seed production, Nursery & Gardening.pdf
seed production, Nursery & Gardening.pdfseed production, Nursery & Gardening.pdf
seed production, Nursery & Gardening.pdf
Nistarini College, Purulia (W.B) India
 
The Limited Role of the Streaming Instability during Moon and Exomoon Formation
The Limited Role of the Streaming Instability during Moon and Exomoon FormationThe Limited Role of the Streaming Instability during Moon and Exomoon Formation
The Limited Role of the Streaming Instability during Moon and Exomoon Formation
Sérgio Sacani
 
20240515_CEBaP Poster_SR eating_drinking.pdf
20240515_CEBaP Poster_SR eating_drinking.pdf20240515_CEBaP Poster_SR eating_drinking.pdf
20240515_CEBaP Poster_SR eating_drinking.pdf
Hans Van Remoortel
 
SPERM FUNCTION TEST IN EMBRYOLOGY .pptx
SPERM FUNCTION TEST  IN EMBRYOLOGY .pptxSPERM FUNCTION TEST  IN EMBRYOLOGY .pptx
SPERM FUNCTION TEST IN EMBRYOLOGY .pptx
SRI AUROBINDO UNIVERSITY
 
Delhi Call Girls ✓WhatsApp 9999965857 🔝Top Class Call Girl Service Available
Delhi Call Girls ✓WhatsApp 9999965857 🔝Top Class Call Girl Service AvailableDelhi Call Girls ✓WhatsApp 9999965857 🔝Top Class Call Girl Service Available
Delhi Call Girls ✓WhatsApp 9999965857 🔝Top Class Call Girl Service Available
kk090568
 
Premuim Call Girls Pune 🔥 7014168258 🔥 Real Fun With Sexual Girl Available 24...
Premuim Call Girls Pune 🔥 7014168258 🔥 Real Fun With Sexual Girl Available 24...Premuim Call Girls Pune 🔥 7014168258 🔥 Real Fun With Sexual Girl Available 24...
Premuim Call Girls Pune 🔥 7014168258 🔥 Real Fun With Sexual Girl Available 24...
$Ak47
 
GBSN - Microbiology (Unit 2) Antimicrobial agents
GBSN - Microbiology (Unit 2) Antimicrobial agentsGBSN - Microbiology (Unit 2) Antimicrobial agents
GBSN - Microbiology (Unit 2) Antimicrobial agents
Areesha Ahmad
 
Centrifugation types and its application
Centrifugation types and its applicationCentrifugation types and its application
Centrifugation types and its application
MDAsifKilledar
 
Buy Best T-shirts for Men Online Buy Best T-shirts for Men Online
Buy Best T-shirts for Men Online Buy Best T-shirts for Men OnlineBuy Best T-shirts for Men Online Buy Best T-shirts for Men Online
Buy Best T-shirts for Men Online Buy Best T-shirts for Men Online
janvi$L14
 
Call Girls Versova ♨️ +91-9920725232 👈Open 24/7 at Top Mumbai Call Girls Service
Call Girls Versova ♨️ +91-9920725232 👈Open 24/7 at Top Mumbai Call Girls ServiceCall Girls Versova ♨️ +91-9920725232 👈Open 24/7 at Top Mumbai Call Girls Service
Call Girls Versova ♨️ +91-9920725232 👈Open 24/7 at Top Mumbai Call Girls Service
bhuhariaqueen9pm$S2
 
GBSN - Biochemistry (Unit 12) Hormones
GBSN - Biochemistry (Unit 12) HormonesGBSN - Biochemistry (Unit 12) Hormones
GBSN - Biochemistry (Unit 12) Hormones
Areesha Ahmad
 
حبوب الاجهاض الامارات | 00971547952044 | حبوب اجهاض امارات للبيع
حبوب الاجهاض الامارات | 00971547952044 | حبوب اجهاض امارات للبيعحبوب الاجهاض الامارات | 00971547952044 | حبوب اجهاض امارات للبيع
حبوب الاجهاض الامارات | 00971547952044 | حبوب اجهاض امارات للبيع
حبوب الاجهاض الامارات حبوب سايتوتك الامارات
 
Calibration of risk prediction models: decision making with the lights on or off?

  • 1. Calibration of risk prediction models: decision making with the lights on or off? Ben Van Calster, KU Leuven (B) and LUMC (NL). ISCB Krakow, August 27th 2020.
  • 2. TG6: Evaluating diagnostic tests and prediction models. Chairs: Ewout Steyerberg (Leiden LUMC); Ben Van Calster (KU Leuven). Members (alphabetically): Patrick Bossuyt (AMC Amsterdam); Tom Boyles (U Witwatersrand, Johannesburg; clinician member); Gary Collins (U Oxford); Kathleen Kerr (U Washington, Seattle); Petra Macaskill (U Sydney); David McLernon (Aberdeen); Carl Moons (UMC Utrecht); Maarten van Smeden (UMC Utrecht); Andrew Vickers (MSKCC, New York); Laure Wynants (U Maastricht).
  • 3. Risk prediction or binary prediction?
  • 4. Risk prediction or binary prediction?
  • 5. Risk prediction or binary prediction? Risk is most interpretable, acknowledges imperfect prediction, can be combined with other information, and allows decision thresholds to be varied. If you predict risk, you can assess the accuracy of the estimates (calibration). Binary predictions easily hide potential miscalibration.
  • 6. Level 1-2 TG6 paper on calibration
  • 7. The Achilles heel of predictive analytics. Systematically wrong risk estimates can distort decision-making: overestimated risk can lead to many unnecessary interventions; underestimated risk can lead to withholding many important interventions. Calibration is often not assessed during model validation. So for many models, it is not known how accurate the risks are in a specific setting. In that case, you are in fact using a model with the lights off.
  • 8. The Achilles heel of predictive analytics. But if AUC is high, the ranking of patients into lower vs higher risk must be very good? → Good relative performance does not imply good absolute performance! Using binary predictions only (e.g. treat vs don’t treat), you are not avoiding the problem. I think you aggravate it by pretending to avoid the problem.
  • 9. Published online in J Ultrasound Med on Aug 11 2020. Objective: develop a risk model for first trimester miscarriage in very early pregnancies. Retrospective data, single institution; 590 pregnancies, 345 miscarried; 9 parameters studied; most important predictor (hCG rise) missing in 79%; no validation at all. “It might appear to be a weakness of our study that the first trimester loss rate was considerably higher than the rates found by other investigators (48% vs 10-30%). The rate is high because of the high prevalence of pregnancy risk factors in our population.” A web calculator is provided that allows risk estimation. I cannot support that.
  • 10. How can risks be inaccurate? (1) Methodological issues at model development or validation: overfitting, leading to overly extreme risk estimates on new data (“in small datasets, it is reasonable for a model not to be developed at all”); heterogeneity of measurement error between settings (Luijken et al, Stat Med 2019). (2) Variables and characteristics unrelated to model development: patient characteristics and outcome incidence/prevalence vary greatly between settings; patient populations change over time within a setting (“drift”). So there is “heterogeneity across time and place”.
  • 11. Levels of calibration: 1. Mean calibration / calibration-in-the-large; 2. Weak calibration; 3. Moderate calibration; 4. Strong calibration. Work motivated by a very nice and thought-provoking paper from Werner Vach (JCE 2013;66:1296-1301).
  • 12. 1. Mean calibration: The average estimated risk is accurate. Compare the average estimated risk with the outcome prevalence/incidence.
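Mean calibration can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function name and the toy data are hypothetical, and the observed/expected ratio is an additional summary that is commonly reported but not mentioned on the slide.

```python
def mean_calibration(risks, outcomes):
    """Return (mean estimated risk, observed event rate, observed/expected ratio)."""
    if not risks or len(risks) != len(outcomes):
        raise ValueError("risks and outcomes must be non-empty and of equal length")
    mean_risk = sum(risks) / len(risks)        # average estimated risk
    event_rate = sum(outcomes) / len(outcomes)  # observed prevalence/incidence
    return mean_risk, event_rate, event_rate / mean_risk


# Hypothetical toy data: estimated risks and observed binary outcomes.
risks = [0.10, 0.20, 0.30, 0.40, 0.50]
outcomes = [0, 0, 0, 1, 0]

mean_risk, event_rate, oe_ratio = mean_calibration(risks, outcomes)
print(f"mean risk={mean_risk:.2f}, event rate={event_rate:.2f}, O/E={oe_ratio:.2f}")
```

In this toy example the average estimated risk is higher than the observed event rate, which would point to overestimation on average.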
  • 13. 2. Weak calibration: On average, the model does not overestimate or underestimate risk, and does not give too extreme or too modest risks. ‘Logistic recalibration’ framework, with $L$ the logit of the estimated risk. Evaluate the calibration intercept $a$: $\log\frac{P(Y=1)}{P(Y=0)} = a + L$; $a < 0$ means overestimation, $a > 0$ means underestimation. Evaluate the calibration slope $b$: $\log\frac{P(Y=1)}{P(Y=0)} = a + bL$; $b < 1$ means too extreme risks, $b > 1$ means too modest risks.
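The two logistic recalibration fits can be sketched without any statistics library. The code below is a minimal illustration, assuming plain Newton-Raphson maximum likelihood; the function names and the toy data are hypothetical, and in practice one would normally use an established regression routine that supports an offset.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def calibration_intercept(L, y, iters=50):
    """Fit logit P(Y=1) = a + L (L as offset, slope fixed at 1); return a."""
    a = 0.0
    for _ in range(iters):
        p = [_sigmoid(a + l) for l in L]
        grad = sum(yi - pi for yi, pi in zip(y, p))
        hess = sum(pi * (1.0 - pi) for pi in p)
        step = grad / hess
        a += step
        if abs(step) < 1e-10:
            break
    return a


def calibration_slope(L, y, iters=50):
    """Fit logit P(Y=1) = a + b*L; return (a, b)."""
    a, b = 0.0, 1.0
    for _ in range(iters):
        p = [_sigmoid(a + b * l) for l in L]
        w = [pi * (1.0 - pi) for pi in p]
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * l for yi, pi, l in zip(y, p, L))
        h00 = sum(w)
        h01 = sum(wi * l for wi, l in zip(w, L))
        h11 = sum(wi * l * l for wi, l in zip(w, L))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (X'WX) delta = X'(y - p).
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if max(abs(da), abs(db)) < 1e-10:
            break
    return a, b


# Hypothetical toy data: 10 patients with estimated risk 0.1 (3 events)
# and 10 patients with estimated risk 0.9 (7 events).
L = [logit(0.1)] * 10 + [logit(0.9)] * 10
y = [1] * 3 + [0] * 7 + [1] * 7 + [0] * 3

a_int = calibration_intercept(L, y)
a, b = calibration_slope(L, y)
print(f"calibration intercept={a_int:.3f}, calibration slope={b:.3f}")
```

In this constructed example the average estimated risk equals the event rate, so the intercept is essentially zero, while the slope is well below 1: the estimated risks of 0.1 and 0.9 are too extreme for observed proportions of 0.3 and 0.7.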
  • 14. 3. Moderate calibration: The observed proportion of events corresponds to the estimated risk. Construct a flexible calibration curve based on $\log\frac{P(Y=1)}{P(Y=0)} = a + f(L)$, where $f(\cdot)$ is usually a loess fit, but can also be based on splines. This is preferable at external validation, but a sufficient sample size is needed. The intercept and slope are nice summaries, but reduce calibration to 2 numbers (weak calibration). The slope is usually sufficient for internal validation (using bootstrapping or cross-validation), but the intercept or plotting a curve can sometimes be defended as well.
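A flexible calibration curve can be approximated with a small locally weighted linear smoother. The sketch below is a simplification and not the method from the slides: it smooths the binary outcome directly against the estimated risk with tricube weights (a basic loess without robustness iterations) instead of fitting $f(L)$ on the logit scale. The function name, the span of 0.75 and the toy data are assumptions for illustration.

```python
import math


def loess_calibration_curve(risks, outcomes, grid, span=0.75):
    """Locally weighted linear fit of outcome on estimated risk at each grid point."""
    n = len(risks)
    k = min(n, max(2, math.ceil(span * n)))  # number of neighbours in each window
    curve = []
    for x0 in grid:
        dists = sorted(abs(r - x0) for r in risks)
        # Bandwidth: distance to the k-th nearest point, inflated slightly so
        # that this point keeps a small positive weight.
        h = dists[k - 1] * (1.0 + 1e-9) + 1e-12
        sw = swx = swy = swxx = swxy = 0.0
        for r, y in zip(risks, outcomes):
            u = abs(r - x0) / h
            if u >= 1.0:
                continue
            w = (1.0 - u ** 3) ** 3  # tricube weight
            sw += w
            swx += w * r
            swy += w * y
            swxx += w * r * r
            swxy += w * r * y
        denom = sw * swxx - swx * swx
        if denom > 1e-12:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            est = intercept + slope * x0
        else:
            est = swy / sw  # degenerate window: fall back to weighted mean
        curve.append(min(1.0, max(0.0, est)))  # keep the estimate in [0, 1]
    return curve


# Hypothetical toy data: 19 patients with estimated risks 0.05, 0.10, ..., 0.95.
risks = [i / 20 for i in range(1, 20)]
outcomes = [0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1]
grid = [0.1, 0.3, 0.5, 0.7, 0.9]

curve = loess_calibration_curve(risks, outcomes, grid)
for g, c in zip(grid, curve):
    print(f"estimated risk {g:.1f} -> smoothed observed proportion {c:.2f}")
```

With only 19 observations such a curve is very unstable, which is the point made on the slide about needing a sufficient sample size.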
  • 15. Some reference calibration curves (25% outcome prevalence)
  • 16. Example curves with low N. Verhoeven et al. Ultrasound Obstet Gynecol 2009;34:316-321. 240 cases, 27 events (Caesarean delivery). “Calibration of the model on the right was not as good as the calibration of the model on the left”
  • 17. 4. Strong calibration: The observed proportion of events corresponds to the estimated risk for each covariate pattern. Hard to assess (unless the model has only a few dichotomous predictors). This is clinically desirable but utopian: the model needs to be fully correct. A diagonal calibration curve (i.e. moderate calibration) does not imply strong calibration. We have shown that moderate calibration cannot lead to harmful decisions (in the framework of decision curve analysis).
  • 19. Multinomial outcomes? (1) Calibration intercepts and slopes can be calculated for multinomial logistic regression by extending the approach for binary outcomes to $\log\frac{P(Y=k)}{P(Y=J)} = a_k + \sum_{i=1}^{K-1} b_{k,i} L_i$. (2) Flexible calibration curves can be obtained by using vector splines $s(\cdot)$: $\log\frac{P(Y=k)}{P(Y=J)} = a_k + \sum_{i=1}^{K-1} s_{k,i}(L_i)$. This can be extended to risk models for ordinal outcomes, and to risk models based on e.g. machine learning algorithms.
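  • 24. Heterogeneity: example. Centre-specific and overall logistic (i.e. non-flexible) calibration curves: logistic recalibration model with random intercept and random slope for the $J$ centres (Wynants et al, SMMR 2018): $\log\frac{P(Y=1)}{P(Y=0)} = \alpha + a_j + \beta L + b_j L$, where $\begin{pmatrix} a_j \\ b_j \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \tau_a^2 & \tau_{ab} \\ \tau_{ab} & \tau_b^2 \end{pmatrix} \right)$.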
  • 25. Cox models (TG6 paper in preparation)
  • 26. Cox models. What you can do depends on the information you have about the model (next to the validation dataset). Level 1: only model coefficients (very common). Level 2: coefficients + cumulative baseline hazard at $t_1$, $H_0(t_1)$. Level 3: original dataset. In my view, level 2 is what is needed for clinical application. It is also what TRIPOD recommends (Moons et al, Ann Intern Med 2015).
  • 27. Cox models. If $H_0(t_1)$ is available, flexible adaptive hazard regression can be used to generate a flexible calibration curve at time $t_1$: $\log h(t) = g\left(\log\left(-\log\left(1 - p_{t_1}\right)\right), t\right)$, with $p_{t_1} = 1 - \exp\left(-H_0(t_1)\exp\left(\boldsymbol{\beta}^T \mathbf{X}\right)\right)$. Can also be used for other time-to-event models. See Austin, Harrell, van Klaveren (Stat Med 2020).
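The formula for the estimated risk at $t_1$ shows why level 2 information matters: without $H_0(t_1)$ the coefficients alone give no absolute risk. The sketch below only evaluates that formula and the complementary log-log transform that enters $g$; it does not implement flexible adaptive hazard regression. Function names and all numbers are hypothetical, and the covariates are assumed to be coded on the same scale (e.g. centring) as the reported baseline hazard.

```python
import math


def cox_risk_at_t1(baseline_cum_hazard_t1, coefficients, covariates):
    """Estimated risk p_t1 = 1 - exp(-H0(t1) * exp(beta' x))."""
    lp = sum(b * x for b, x in zip(coefficients, covariates))  # linear predictor
    return 1.0 - math.exp(-baseline_cum_hazard_t1 * math.exp(lp))


def cloglog(p):
    """Complementary log-log transform, log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))


# Hypothetical model: H0(t1) = 0.10 and two coefficients.
h0_t1 = 0.10
beta = [0.5, -0.2]
x = [1.0, 2.0]  # hypothetical patient

p_t1 = cox_risk_at_t1(h0_t1, beta, x)
print(f"estimated risk at t1: {p_t1:.4f}")
print(f"cloglog of that risk: {cloglog(p_t1):.4f}")
```

On the complementary log-log scale the estimated risk is simply $\log H_0(t_1)$ plus the linear predictor, which is why that transform appears as the input to $g$.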
  • 28. 3 myths about risk thresholds (TG6 paper)
  • 29. 3 myths about risk thresholds. Myth 1: Risk groups are more useful than continuous risk estimates. → Clinically actionable groups (that have consensus) can make sense, but this remains rough for decision making at the individual level. Myth 2: You can ask your statistician to get you the threshold. → It depends on the clinical context; you need reasonable information on misclassification costs. Myth 3: The threshold is a part of the model. → Different preferences, different healthcare systems. These 3 issues are obviously related to each other.
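The link between a risk threshold and misclassification costs can be made concrete with the standard decision-analytic relation, which is not spelled out on the slide and is used here as an assumption: treating is worthwhile when the risk exceeds cost_fp / (cost_fp + cost_fn), and net benefit at threshold t weights false positives by t / (1 - t), as in decision curve analysis. Function names, costs and data below are hypothetical.

```python
def threshold_from_costs(cost_fp, cost_fn):
    """Risk threshold implied by the relative costs of a false positive and a false negative."""
    return cost_fp / (cost_fp + cost_fn)


def net_benefit(risks, outcomes, threshold):
    """Net benefit of treating everyone whose estimated risk is at or above the threshold."""
    n = len(risks)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Hypothetical judgement: missing an event is 9 times worse than an unnecessary intervention.
threshold = threshold_from_costs(cost_fp=1.0, cost_fn=9.0)

# Hypothetical toy data.
risks = [0.05, 0.20, 0.30, 0.60, 0.90]
outcomes = [0, 0, 1, 1, 1]

nb = net_benefit(risks, outcomes, threshold)
print(f"threshold={threshold:.2f}, net benefit={nb:.3f}")
```

A different cost ratio gives a different threshold for the same model, which illustrates why the threshold is a property of the clinical context and not of the model.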
  • 30. Further plans TG6: practical guidance on validation of risk models for time-to-event outcomes; practical guidance on validation of risk models accounting for competing risks; simple paper (level 1) with advice for prediction model development; multicenter diagnostic test evaluations: guidance on design and analysis; hands-on tutorial of tools to assess calibration for different outcomes.
  • 31. “Medicine is a science of uncertainty and an art of probability” (William Osler)