mecor is an R package for Measurement Error CORrection. mecor implements measurement error correction methods for linear models with continuous outcomes. The measurement error can either occur in a continuous covariate or in the continuous outcome. This vignette discusses covariate measurement error correction by means of standard regression calibration in mecor.
Regression calibration is one of the most popular
measurement error correction methods for covariate measurement error.
This vignette shows how standard regression calibration is
applied in an internal validation study, a replicates study, a
calibration study and an external validation study. Each of the four
studies will be introduced in the subsequent sections, along with
examples of how standard regression calibration can be applied
in each of the four studies using mecor’s function
mecor()
. In all four studies, our interest lies in
estimating the association between a continuous reference exposure X and a continuous outcome Y, given covariates Z. Instead of X, the substitute error-prone
exposure X* is
measured.
The simulated data set vat
in mecor is
an internal covariate-validation study. An internal validation study
includes a subset of individuals of whom the reference exposure X is observed. The data set
vat
contains 1000 observations of the outcome insulin
resistance IRln,
the error-prone exposure waist circumference WC and the covariates sex,
age and total body fat, sex, age and TBF, respectively.
The reference exposure visceral adipose tissue VAT is observed in
approximately 25% of the individuals in the study.
# load internal covariate validation study
data("vat", package = "mecor")
head(vat)
#> ir_ln wc sex age tbf vat
#> 1 -0.09341837 -1.3136816 1 48 -0.6571345 NA
#> 2 0.16820894 -2.0336624 0 54 -1.5882163 NA
#> 3 0.57299976 -0.2611214 0 46 -1.1033709 NA
#> 4 0.63677178 0.8631987 0 55 -1.4785869 0.5083247
#> 5 0.92908882 -1.2054861 1 61 0.9020136 NA
#> 6 -0.72410039 -2.5032852 1 47 -0.9584166 NA
When ignoring the measurement error in WC, one would naively regress WC and sex, age and TBF on IRln. This results in a biased estimation of the exposure-outcome association:
# naive estimate of the exposure-outcome association
lm(ir_ln ~ wc + sex + age + tbf, data = vat)
#>
#> Call:
#> lm(formula = ir_ln ~ wc + sex + age + tbf, data = vat)
#>
#> Coefficients:
#> (Intercept) wc sex age tbf
#> 0.50976 0.09697 -0.70953 0.01133 0.38783
Alternatively, one could perform an analysis restricted to the internal validation set:
# analysis restricted to the internal validation set
lm(ir_ln ~ vat + sex + age + tbf, data = subset(vat, !is.na(vat)))
#>
#> Call:
#> lm(formula = ir_ln ~ vat + sex + age + tbf, data = subset(vat,
#> !is.na(vat)))
#>
#> Coefficients:
#> (Intercept) vat sex age tbf
#> 0.542422 0.195751 -0.450991 0.008737 0.291963
Although the above would result in an unbiased estimation of the
exposure-outcome association, approximately 75% of the data is thrown
out. Instead of doing an analysis restricted to the internal validation
set, you could use standard regression calibration to correct
for the measurement error in X*. The following code
chunk shows standard regression calibration with
mecor()
:
mecor(formula = ir_ln ~ MeasError(substitute = wc, reference = vat) + sex + age + tbf,
data = vat,
method = "standard", # defaults to "standard"
B = 0) # defaults to 0
#>
#> Call:
#> mecor(formula = ir_ln ~ MeasError(substitute = wc, reference = vat) +
#> sex + age + tbf, data = vat, method = "standard", B = 0)
#>
#> Coefficients Corrected Model:
#> (Intercept) vat sex age tbf
#> 0.473398350 0.207598087 -0.438453038 0.009477677 0.270864391
#>
#> Coefficients Uncorrected Model:
#> (Intercept) wc sex age tbf
#> 0.50976395 0.09697045 -0.70952736 0.01132712 0.38782671
As shown in the above code chunk, the mecor()
function
needs a formula argument, a data argument, a
method argument and a B argument. Presumably, you are
familiar with the structure of a formula in R. The only thing that’s
different here is the use of a MeasError()
object in the
formula. A MeasError()
object is used to declare the
substitute measure, in our case WC, and the reference
measure, in our case VAT. The
B argument of mecor()
is used to calculate
bootstrap confidence intervals for the corrected coefficients of the
model. Let us construct 95% confidence intervals using the bootstrap
with 999 replicates:
# save corrected fit
rc_fit <-
mecor(formula = ir_ln ~ MeasError(substitute = wc, reference = vat) + age + sex + tbf,
data = vat,
method = "standard", # defaults to "standard"
B = 999) # defaults to 0
Print the confidence intervals to the console using
summary()
:
summary(rc_fit)
#>
#> Call:
#> mecor(formula = ir_ln ~ MeasError(substitute = wc, reference = vat) +
#> age + sex + tbf, data = vat, method = "standard", B = 999)
#>
#> Coefficients Corrected Model:
#> Estimate SE SE (btstr)
#> (Intercept) 0.473398 0.146766 0.133071
#> vat 0.207598 0.034210 0.033526
#> age 0.009478 0.002598 0.002360
#> sex -0.438453 0.079596 0.077800
#> tbf 0.270864 0.036662 0.034923
#>
#> 95% Confidence Intervals:
#> Estimate LCI UCI LCI (btstr) UCI (btstr)
#> (Intercept) 0.473398 0.185743 0.761054 0.207086 0.735638
#> vat 0.207598 0.140549 0.274648 0.145293 0.273734
#> age 0.009478 0.004385 0.014570 0.004813 0.014130
#> sex -0.438453 -0.594458 -0.282448 -0.588447 -0.283742
#> tbf 0.270864 0.199007 0.342721 0.202177 0.337608
#> Bootstrap Confidence Intervals are based on 999 bootstrap replicates using percentiles
#>
#> The measurement error is corrected for by application of regression calibration
#>
#> Coefficients Uncorrected Model:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.5097640 0.1264211 4.0323 6.185e-05
#> wc 0.0969705 0.0137957 7.0290 5.308e-12
#> age 0.0113271 0.0022048 5.1374 3.695e-07
#> sex -0.7095274 0.0390086 -18.1890 < 2.2e-16
#> tbf 0.3878267 0.0201489 19.2481 < 2.2e-16
#>
#> 95% Confidence Intervals:
#> Estimate LCI UCI
#> (Intercept) 0.509764 0.261517 0.758011
#> wc 0.096970 0.069881 0.124060
#> age 0.011327 0.006998 0.015657
#> sex -0.709527 -0.786127 -0.632928
#> tbf 0.387827 0.348261 0.427392
#>
#> Residual standard error: 0.3123469 on 645 degrees of freedom
Two types of 95% confidence intervals are shown in the output of the
summary()
object. Bootstrap confidence intervals and Delta
method confidence intervals. The default method to constructing
confidence intervals in mecor is the Delta method.
Further, Fieller method confidence intervals and zero variance method
confidence intervals can be constructed with summary()
:
# fieller method ci and zero variance method ci and se's for 'rc_fit'
summary(rc_fit, zerovar = TRUE, fieller = TRUE)
#>
#> Call:
#> mecor(formula = ir_ln ~ MeasError(substitute = wc, reference = vat) +
#> age + sex + tbf, data = vat, method = "standard", B = 999)
#>
#> Coefficients Corrected Model:
#> Estimate SE SE (btstr) SE (zerovar)
#> (Intercept) 0.473398 0.146766 0.133071 0.126665
#> vat 0.207598 0.034210 0.033526 0.029534
#> age 0.009478 0.002598 0.002360 0.002236
#> sex -0.438453 0.079596 0.077800 0.069276
#> tbf 0.270864 0.036662 0.034923 0.031805
#>
#> 95% Confidence Intervals:
#> Estimate LCI UCI LCI (btstr) UCI (btstr) LCI (zerovar)
#> (Intercept) 0.473398 0.185743 0.761054 0.207086 0.735638 0.225140
#> vat 0.207598 0.140549 0.274648 0.145293 0.273734 0.149712
#> age 0.009478 0.004385 0.014570 0.004813 0.014130 0.005096
#> sex -0.438453 -0.594458 -0.282448 -0.588447 -0.283742 -0.574231
#> tbf 0.270864 0.199007 0.342721 0.202177 0.337608 0.208528
#> UCI (zerovar) LCI (fieller) UCI (fieller)
#> (Intercept) 0.721657 NA NA
#> vat 0.265484 0.145068 0.281464
#> age 0.013860 NA NA
#> sex -0.302675 NA NA
#> tbf 0.333201 NA NA
#> Bootstrap Confidence Intervals are based on 999 bootstrap replicates using percentiles
#>
#> The measurement error is corrected for by application of regression calibration
#>
#> Coefficients Uncorrected Model:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.5097640 0.1264211 4.0323 6.185e-05
#> wc 0.0969705 0.0137957 7.0290 5.308e-12
#> age 0.0113271 0.0022048 5.1374 3.695e-07
#> sex -0.7095274 0.0390086 -18.1890 < 2.2e-16
#> tbf 0.3878267 0.0201489 19.2481 < 2.2e-16
#>
#> 95% Confidence Intervals:
#> Estimate LCI UCI
#> (Intercept) 0.509764 0.261517 0.758011
#> wc 0.096970 0.069881 0.124060
#> age 0.011327 0.006998 0.015657
#> sex -0.709527 -0.786127 -0.632928
#> tbf 0.387827 0.348261 0.427392
#>
#> Residual standard error: 0.3123469 on 645 degrees of freedom
Fieller method confidence intervals are only constructed for the corrected covariate (in this case X).
The simulated data set bloodpressure
in
mecor is a replicates study. A replicates study
includes a subset of individuals of whom the error-prone substitute
exposure is repeatedly measured. The dataset bloodpressure
contains 1000 observations of the outcome creatinine,
three replicate measures of the error-prone exposure systolic blood
pressure sbp30, sbp60
and sbp120,
and one covariates age age. It is assumed
that there is ‘random’ measurement error in the repeatedly measured
substitute exposure measure.
# load replicates study
data("bloodpressure", package = "mecor")
head(bloodpressure)
#> creatinine age sbp30 sbp60 sbp90 sbp120
#> 1 53.75670 27 120.7987 113.2812 118.0705 124.2282
#> 2 63.08498 36 121.7254 106.8143 118.9882 115.1341
#> 3 60.04718 31 108.8798 119.6577 106.5588 117.5473
#> 4 62.42976 43 116.5566 117.4964 126.3625 121.7148
#> 5 61.31801 25 123.3018 116.4629 112.0310 109.8754
#> 6 50.60952 35 124.9119 129.0927 129.0224 114.0828
When ignoring the measurement error in sbp30, one would naively regress sbp30, age on creatinine. Which results in a biased estimation of the exposure-outcome association:
# naive estimate of the exposure-outcome association
lm(creatinine ~ sbp30 + age,
data = bloodpressure)
#>
#> Call:
#> lm(formula = creatinine ~ sbp30 + age, data = bloodpressure)
#>
#> Coefficients:
#> (Intercept) sbp30 age
#> 41.3050 0.1165 0.1651
Or alternatively, one could calculate the mean of each of the three replicate measures. Yet, this would still lead to a biased estimation of the exposure-outcome association:
## calculate the mean of the three replicate measures
bloodpressure$sbp_123 <- with(bloodpressure, rowMeans(cbind(sbp30, sbp60, sbp120)))
# naive estimate of the exposure-outcome association version 2
lm(creatinine ~ sbp_123 + age,
data = bloodpressure)
#>
#> Call:
#> lm(formula = creatinine ~ sbp_123 + age, data = bloodpressure)
#>
#> Coefficients:
#> (Intercept) sbp_123 age
#> 35.4293 0.1641 0.1685
For an unbiased estimation of the exposure-outcome association, one
could use regression calibration using mecor()
:
mecor(formula = creatinine ~ MeasError(sbp30, replicate = cbind(sbp60, sbp120)) + age,
data = bloodpressure)
#>
#> Call:
#> mecor(formula = creatinine ~ MeasError(sbp30, replicate = cbind(sbp60,
#> sbp120)) + age, data = bloodpressure)
#>
#> Coefficients Corrected Model:
#> (Intercept) cor_sbp30 age
#> 32.6298873 0.1862331 0.1717354
#>
#> Coefficients Uncorrected Model:
#> (Intercept) sbp30 age
#> 41.3050286 0.1165333 0.1650849
Instead of using the reference argument in the
MeasError()
object, the replicate argument is
used. Standard errors of the regression calibration estimator and
confidence intervals can be constructed similar to what was shown for an
internal validation study.
The simulated data set sodium
in mecor
is a outcome calibration study. In a calibration study, two types of
measurements are used to measure the outcome (or exposure). A
measurement method prone to ‘systematic’ error, and a measurement method
prone to ‘random’ error. The measurement prone to ‘systematic’ error is
observed in the full study, the measurement prone to ‘classical’ error
is observed in a subset of the study and repeatedly measured. The
dataset sodium
contains 1000 observations of the
systematically error prone outcome recall,
the randomly error prone outcome urinary1
and urinary2,
and the exposure (in our case a indicator for diet) diet. The
two replicate measures of the outcome prone to random error are observed
in 498 individuals (approximately 50 percent).
# load calibration study
data("sodium", package = "mecor")
head(sodium)
#> recall diet urinary1 urinary2
#> 1 3.033925 1 3.249682 3.505636
#> 2 3.703586 0 4.416647 5.041728
#> 3 3.637268 1 NA NA
#> 4 3.892225 0 5.125907 5.033366
#> 5 3.625387 1 3.362000 4.626138
#> 6 3.754797 0 NA NA
When ignoring the measurement error in recall, one would naively regress diet on recall. Which results in a biased estimation of the exposure-outcome association:
## uncorrected regression
lm(recall ~ diet,
data = sodium)
#>
#> Call:
#> lm(formula = recall ~ diet, data = sodium)
#>
#> Coefficients:
#> (Intercept) diet
#> 3.8820 -0.3052
Alternatively, one could use the first half of the study population and use the mean of each of the two replicate measures. This would lead to an unbiased estimation of the exposure-outcome association since there is random measurement error int he replicate measures:
## calculate mean of three replicate measures
sodium$urinary_12 <- with(sodium, rowMeans(cbind(urinary1, urinary2)))
## uncorrected regression version 2
lm(urinary_12 ~ diet,
data = sodium)
#>
#> Call:
#> lm(formula = urinary_12 ~ diet, data = sodium)
#>
#> Coefficients:
#> (Intercept) diet
#> 4.5941 -0.5041
For an unbiased estimation of the exposure-outcome association, one
could alternatively use standard regression calibration using
mecor()
:
mecor(formula = MeasError(substitute = recall, replicate = cbind(urinary1, urinary2)) ~ diet,
data = sodium)
#>
#> Call:
#> mecor(formula = MeasError(substitute = recall, replicate = cbind(urinary1,
#> urinary2)) ~ diet, data = sodium)
#>
#> Coefficients Corrected Model:
#> (Intercept) diet
#> 4.6075011 -0.4843495
#>
#> Coefficients Uncorrected Model:
#> (Intercept) diet
#> 3.8819732 -0.3051777
Standard errors of the regression calibration estimator and confidence intervals can be constructed similar to what was shown for an internal validation study.
The simulated data set heamoglogin_ext
in
mecor is a external outcome-validation study. An
external validation study is used when in the main study, no information
is available to correct for the measurement error in the outcome Y (or exposure). Suppose for example
that venous heamoglobin levels venous
are not observed in the internal outcome-validation study
haemoglobin
. An external validation study is a (small) sub
study containing observations of the reference measure venous
heamoglobin levels venous,
the error-prone substitute measure capillary
(and the covariate(s) Z in
case of an covariate-validation study) of the original study. The
external validation study is then used to estimate the calibration
model, that is subsequently used to correct for the measurement error in
the main study.
# load internal covariate validation study
data("haemoglobin_ext", package = "mecor")
head(haemoglobin_ext)
#> capillary venous
#> 1 104.7269 115.3023
#> 2 133.9946 119.7616
#> 3 104.0304 108.0562
#> 4 119.0214 121.1780
#> 5 114.3891 111.7864
#> 6 111.7754 112.8943
data("haemoglobin", package = "mecor")
Suppose reference measure X
is not observed in dataset icvs
. To correct the bias in the
naive association between exposure X* and outcome Y given Z, using the external validation
study, one can proceed as follows using mecor()
:
# Estimate the calibration model in the external validation study
calmod <- lm(capillary ~ venous,
data = haemoglobin)
# Use the calibration model for measurement error correction:
mecor(MeasErrorExt(substitute = capillary, model = calmod) ~ supplement,
data = haemoglobin)
#>
#> Call:
#> mecor(formula = MeasErrorExt(substitute = capillary, model = calmod) ~
#> supplement, data = haemoglobin)
#>
#> Coefficients Corrected Model:
#> (Intercept) supplement
#> 117.99341 6.97392
#>
#> Coefficients Uncorrected Model:
#> (Intercept) supplement
#> 124.452261 7.764702
In the above, a MeasErrorExt()
object is used,
indicating that external information is used for measurement error
correction. The model argument of a MeasErrorExt()
object
takes a linear model of class lm (in the above calmod
).
Alternatively, a named list with the coefficients of the calibration
model can be used as follows:
# Use coefficients for measurement error correction:
mecor(MeasErrorExt(capillary, model = list(coef = c(-7, 1.1))) ~ supplement,
data = haemoglobin)
#>
#> Call:
#> mecor(formula = MeasErrorExt(capillary, model = list(coef = c(-7,
#> 1.1))) ~ supplement, data = haemoglobin)
#>
#> Coefficients Corrected Model:
#> (Intercept) supplement
#> 119.50206 7.05882
#>
#> Coefficients Uncorrected Model:
#> (Intercept) supplement
#> 124.452261 7.764702