I fitted a trilinear model
library(nlstools)
library(nlsMicrobio)
library(investr) # for plotFit function
trilinear
LOG10N ~ LOG10N0 - (t >= Sl) * (t <= (Sl + (LOG10N0 - LOG10Nres) *
log(10)/kmax)) * kmax * (t - Sl)/log(10) + (t >= Sl) * (t >
(Sl + (LOG10N0 - LOG10Nres) * log(10)/kmax)) * (LOG10Nres -
LOG10N0)
to bacterial survival data
data(survivalcurve1)
survivalcurve1
t LOG10N
1 0.00 7.56
2 0.33 7.41
3 1.00 7.26
4 2.00 7.30
5 3.00 7.26
6 4.00 7.15
7 5.00 7.30
8 6.00 6.48
9 7.00 6.15
10 8.00 5.30
11 9.00 4.78
12 10.00 5.11
13 11.00 2.30
14 13.00 3.15
15 14.00 2.00
16 16.00 1.00
17 18.00 1.00
18 20.00 1.00
19 23.00 1.00
using an OLS fit with nls:
nls = nls(trilinear, survivalcurve1,
list(Sl = 5, kmax = 1.5, LOG10N0 = 7, LOG10Nres = 1))
overview(nls)
Parameters:
Estimate Std. Error t value Pr(>|t|)
Sl 4.7064 0.5946 7.915 9.82e-07 ***
kmax 1.3223 0.1222 10.818 1.76e-08 ***
LOG10N0 7.3233 0.1884 38.875 < 2e-16 ***
LOG10Nres 1.0000 0.2307 4.334 0.00059 ***
t-based confidence interval:
2.5% 97.5%
Sl 3.4389618 5.973874
kmax 1.0617863 1.582868
LOG10N0 6.9218035 7.724863
LOG10Nres 0.5082284 1.491772
plotFit(nls, interval="confidence")
I was wondering, though, if I could also fit that model using maximum likelihood on the original (non-log-transformed) cell numbers (which in this case would be survivalcurve1$N = 10^survivalcurve1$LOG10N), taking into account that the error structure would be approximately Poisson. Can this perhaps be done using bbmle's mle2, and if so, what would be the correct syntax?
EDIT: I tried with
survivalcurve1$N = as.integer(10^survivalcurve1$LOG10N)
trilinearN=formula(N ~ dpois( 10^(LOG10N0 - (t >= Sl) * (t <= (Sl + (LOG10N0 - LOG10Nres) *
log(10)/kmax)) * kmax * (t - Sl)/log(10) + (t >= Sl) * (t > (Sl + (LOG10N0 - LOG10Nres) * log(10)/kmax)) * (LOG10Nres - LOG10N0))))
m1 = mle2(trilinearN, start=list(Sl = 5, kmax = 1.5, LOG10N0 = 7, LOG10Nres = 1), data=survivalcurve1)
and
coef(summary(m1))
gives me
Estimate Std. Error z value Pr(z)
Sl 4.902048 1.669354e-04 2.936495e+04 0
kmax 1.475309 3.210865e-04 4.594739e+03 0
LOG10N0 7.344014 3.785883e-05 1.939842e+05 0
LOG10Nres -1.830498 1.343019e-10 -1.362972e+10 0
I couldn't get plotting of the predictions to work, though:
df=data.frame(t=seq(0,max(survivalcurve1$t),length=100))
df$pred=predict(m1,newdata=df)
with(df,lines(t,pred,col=2))
as this gave me the error
Error : object of type 'symbol' is not subsettable
Error in gfun(object, newdata = newdata, location = location, op = "predict") :
can only use predict() if formula specified
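(A workaround would presumably be to evaluate the trilinear expression by hand at coef(m1) and take 10^ of it -- a rough sketch, assuming coef(m1) returns Sl, kmax, LOG10N0 and LOG10Nres -- but I was hoping predict() could do this:)
cf <- as.list(coef(m1))
df$pred <- with(c(cf, df),
  10^(LOG10N0 - (t >= Sl) * (t <= (Sl + (LOG10N0 - LOG10Nres) * log(10)/kmax)) * kmax * (t - Sl)/log(10) +
      (t >= Sl) * (t > (Sl + (LOG10N0 - LOG10Nres) * log(10)/kmax)) * (LOG10Nres - LOG10N0)))
with(df, lines(t, log10(pred), col = 2))  # back on the LOG10N scale for the plotFit plot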
Any thoughts? Also, how would I work out whether the Poisson mle2 fit was any better than the nls one? (The AIC values cannot be compared directly because the responses are on different scales.)
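(One rough, non-likelihood check I can think of would be to compare root-mean-square errors of the two fits on the LOG10N scale -- a sketch only, reusing the manual-prediction idea above, and not a formal model comparison:)
rmse_nls <- sqrt(mean(residuals(nls)^2))  # nls residuals are already on the LOG10N scale
pred_mle <- with(c(as.list(coef(m1)), survivalcurve1),
  LOG10N0 - (t >= Sl) * (t <= (Sl + (LOG10N0 - LOG10Nres) * log(10)/kmax)) * kmax * (t - Sl)/log(10) +
  (t >= Sl) * (t > (Sl + (LOG10N0 - LOG10Nres) * log(10)/kmax)) * (LOG10Nres - LOG10N0))
rmse_mle <- sqrt(mean((survivalcurve1$LOG10N - pred_mle)^2))
c(nls = rmse_nls, mle2 = rmse_mle)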
PS: The Geeraerd model would be fine too, in case that would be any easier:
geeraerd
LOG10N ~ LOG10Nres + log10((10^(LOG10N0 - LOG10Nres) - 1) * exp(kmax *
Sl)/(exp(kmax * t) + (exp(kmax * Sl) - 1)) + 1)
Related
I am trying to fit a broken-stick model to longitudinal data. I am unable to provide reproducible data, but here are the first six observations:
pid arm pday pardens logpd
1 MOB2004_002 SP 0 40973 10.621
2 MOB2004_002 SP 1 91404 11.423
3 MOB2004_002 SP 2 14342 9.571
4 MOB2004_002 SP 3 0 0.000
5 MOB2004_002 SP 7 0 0.000
6 MOB2004_003 SP/ART 0 11428 9.344
and a plot showing the means of 'logpd' at each day (note: the trend is clearly non-linear, but I am specifically asked to use a broken-stick model assuming the lines are linear).
The breakpoint is set at day 2. I have found two kinds of syntax for this:
The first consists of creating new variables, as with segmented linear regression:
bp = 2
b1 <- function(x, bp) ifelse(x < bp, bp - x, 0)
b2 <- function(x, bp) ifelse(x < bp, 0, x - bp)
#Mixed effects model with break point = 2
(mod <- lmer(logpd ~ arm * b1(pday, bp) * b2(pday, bp) + (b1(pday, bp) + b2(pday, bp) | pid), data = q1))
summary(mod)
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 2.0071 0.1782 291.5626 11.26 < 2e-16 ***
armSP/ART -2.0949 0.2520 289.3692 -8.31 3.7e-15 ***
b1(pday, bp) 3.3350 0.1103 305.7435 30.24 < 2e-16 ***
b2(pday, bp) -0.3713 0.0488 438.5265 -7.61 1.7e-13 ***
armSP/ART:b1(pday, bp) 0.3178 0.1562 303.0042 2.03 0.043 *
armSP/ART:b2(pday, bp) 0.3972 0.0691 437.3673 5.75 1.7e-08 ***
The second uses indicator variables:
mod.1 = lmer(logpd ~ arm * I(pday * (pday <= 2)) * I(pday* (pday > 2)) + (1 | pid), data = q1)
summary(mod.1)
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 7.4796 0.2001 1051.4035 37.37 < 2e-16 ***
armSP/ART -1.4758 0.2845 1048.5079 -5.19 2.5e-07 ***
I(pday * (pday <= 2)) -2.4792 0.1469 1142.4166 -16.88 < 2e-16 ***
I(pday * (pday > 2)) -1.1930 0.0429 1151.6921 -27.78 < 2e-16 ***
armSP/ART:I(pday * (pday <= 2)) -0.5121 0.2076 1138.0203 -2.47 0.0138 *
armSP/ART:I(pday * (pday > 2)) 0.1609 0.0610 1148.1222 2.64 0.0084 **
The results are extremely different; the second model seems correct based on the plot, but I don't understand why they would be so different.
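To illustrate what each coding actually produces, here are the design columns for a few values of pday (using bp = 2 from above):
pday <- c(0, 1, 2, 3, 4, 7)
cbind(pday,
      b1   = ifelse(pday < bp, bp - pday, 0),  # distance below the breakpoint
      b2   = ifelse(pday < bp, 0, pday - bp),  # distance above the breakpoint
      ind1 = pday * (pday <= 2),               # raw pday, zeroed after day 2
      ind2 = pday * (pday > 2))                # raw pday, zeroed up to day 2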
I am using an lm()-like function called robu() from the robumeta package within my own function foo.
However, I'm manipulating the formula argument such that when it is missing, the default formula(dint ~ 1) is used; otherwise, whatever formula the user supplies is used.
It works fine; however, in the output of foo the printed formula call is always the unevaluated if(missing(f)) formula(dint~1) else formula(f) expression rather than the formula actually used, regardless of what formula is passed to foo.
Can I correct this part of output so that it only shows the exact formula used? (see below examples)
dat <- data.frame(dint = 1:9, SD = 1:9*.1,
time = c(1,1,2,3,4,3,2,4,1),
study.name = rep(c("bob", "jim", "jon"), 3))
library(robumeta)
# MY FUNCTION:
foo <- function(f, data){
robu(formula = if(missing(f)) formula(dint~1) else formula(f), data = data, studynum = study.name, var = SD^2)
}
# EXAMPLES OF USE:
foo(data = dat) ## HERE I expect: `Model: dint ~ 1`
foo(dint~as.factor(time), data = dat) ## HERE I expect: `Model: dint ~ time`
One option is to resolve the formula first and then update the 'ml' element of the fitted object:
foo <- function(f, data){
fmla <- if(missing(f)) {
formula(dint ~ 1)
} else {
formula(f)
}
model <- robu(formula = fmla, data = data, studynum = study.name, var = SD^2)
model$ml <- fmla
model
}
Checking:
foo(data = dat)
RVE: Correlated Effects Model with Small-Sample Corrections
Model: dint ~ 1
Number of studies = 3
Number of outcomes = 9 (min = 3 , mean = 3 , median = 3 , max = 3 )
Rho = 0.8
I.sq = 96.83379
Tau.sq = 9.985899
Estimate StdErr t-value dfs P(|t|>) 95% CI.L 95% CI.U Sig
1 X.Intercept. 4.99 0.577 8.65 2 0.0131 2.51 7.48 **
---
Signif. codes: < .01 *** < .05 ** < .10 *
---
Note: If df < 4, do not trust the results
foo(dint~ as.factor(time), data = dat)
RVE: Correlated Effects Model with Small-Sample Corrections
Model: dint ~ as.factor(time)
Number of studies = 3
Number of outcomes = 9 (min = 3 , mean = 3 , median = 3 , max = 3 )
Rho = 0.8
I.sq = 97.24601
Tau.sq = 11.60119
Estimate StdErr t-value dfs P(|t|>) 95% CI.L 95% CI.U Sig
1 X.Intercept. 3.98 2.50 1.588 2.00 0.253 -6.80 14.8
2 as.factor.time.2 1.04 4.41 0.236 1.47 0.842 -26.27 28.3
3 as.factor.time.3 1.01 1.64 0.620 1.47 0.617 -9.10 11.1
4 as.factor.time.4 2.52 2.50 1.007 2.00 0.420 -8.26 13.3
---
Signif. codes: < .01 *** < .05 ** < .10 *
I am writing a paper about the validity of a billing code in hospitalized children. I am a very novice RStudio user. I need the confidence intervals for the sensitivity, specificity, and positive and negative predictive values, but I can't figure out how to get them.
My data has 3 columns: ID, true value, billing value.
Here is my code:
confusionMatrix(table(finalcodedataset$billing_value, finalcodedataset$true_value),
positive="1", boot=TRUE, boot_samples=4669, alpha=0.05)
here is the output:
Confusion Matrix and Statistics
0 1
0 4477 162
1 10 20
Accuracy : 0.9632
95% CI : (0.9574, 0.9684)
No Information Rate : 0.961
P-Value [Acc > NIR] : 0.238
Kappa : 0.1796
Mcnemar's Test P-Value : <2e-16
Sensitivity : 0.109890
Specificity : 0.997771
Pos Pred Value : 0.666667
Neg Pred Value : 0.965079
Prevalence : 0.038981
Detection Rate : 0.004284
Detection Prevalence : 0.006425
Balanced Accuracy : 0.553831
'Positive' Class : 1
You can use the epiR package for this purpose.
Example:
library(epiR)
data <- as.table(matrix(c(670,202,74,640), nrow = 2, byrow = TRUE))
rval <- epi.tests(data, conf.level = 0.95)
print(rval)
Outcome + Outcome - Total
Test + 670 202 872
Test - 74 640 714
Total 744 842 1586
Point estimates and 95 % CIs:
---------------------------------------------------------
Apparent prevalence 0.55 (0.52, 0.57)
True prevalence 0.47 (0.44, 0.49)
Sensitivity 0.90 (0.88, 0.92)
Specificity 0.76 (0.73, 0.79)
Positive predictive value 0.77 (0.74, 0.80)
Negative predictive value 0.90 (0.87, 0.92)
Positive likelihood ratio 3.75 (3.32, 4.24)
Negative likelihood ratio 0.13 (0.11, 0.16)
---------------------------------------------------------
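To apply this to the question's data, the table needs Test + in the first row and Outcome + in the first column; with 0/1 coding, that means reversing both dimensions of the table() output (a sketch, assuming billing_value and true_value are coded 0/1):
tab <- table(finalcodedataset$billing_value, finalcodedataset$true_value)
epi.tests(as.table(tab[2:1, 2:1]), conf.level = 0.95)  # put the "1" level first in rows and columns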
caret and other packages use the Clopper-Pearson interval method to calculate the confidence interval.
I consider your 2x2 table to be reversed, since the TP (true positive) cell is at the bottom right. If the TP were at the top left, then the variables (A, B, C, D) below would be switched.
D = 4477   # true negatives
C = 162    # false negatives
B = 10     # false positives
A = 20     # true positives
Acc = (A+D)/(A+B+C+D)
Sensitivity = A / (A + C)
Specificity = D / (D + B)
P = (A+C)/(A+B+C+D)   # prevalence
PPV = (Sensitivity*P)/((Sensitivity*P)+((1-Specificity)*(1-P)))
NPV = (Specificity*(1-P))/(((1 - Sensitivity)*P)+((Specificity)*(1-P)))
# Clopper-Pearson 95% CI for accuracy, via the misclassification count x
n = A+B+C+D
x = n - (A+D)
alpha = 0.05
ub = 1 - ((1 + (n - x + 1)/ (x * qf(alpha *.5, 2*x, 2*(n - x + 1))))^-1)
lb = 1 - ((1 + (n - x) / ((x + 1)* qf(1-(alpha*.5), 2*(x+1), 2*(n-x))))^-1)
CI = c(lb,ub)
> Acc
[1] 0.9631613
> CI
[1] 0.9573536 0.9683800
> Sensitivity
[1] 0.1098901
> Specificity
[1] 0.9977713
> PPV
[1] 0.6666667
> NPV
[1] 0.9650787
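The same exact (Clopper-Pearson) intervals for the individual proportions can also be obtained directly from binom.test(), using the appropriate numerator and denominator for each (a sketch, reusing A, B, C, D from above):
binom.test(A, A + C)$conf.int   # sensitivity: TP out of all truly positive
binom.test(D, D + B)$conf.int   # specificity: TN out of all truly negative
binom.test(A, A + B)$conf.int   # PPV: TP out of all test positives
binom.test(D, D + C)$conf.int   # NPV: TN out of all test negatives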
Here is also a good resource for where these formulas come from.
The following reproducible example is partially inspired by the question "ROC curve from training data in caret".
library(MLeval)
library(caret)
library(pROC)
data(Sonar)
ctrl <- trainControl(method = "cv", summaryFunction = twoClassSummary, classProbs = TRUE, savePredictions = TRUE)
set.seed(42)
fit1 <- train(Class ~ ., data = Sonar,method = "rf",trControl = ctrl)
bestmodel <- merge(fit1$bestTune, fit1$pred)
mtx <- confusionMatrix(table(bestmodel$pred, bestmodel$obs))$table
# M R
# M 104 23
# R 7 74
# 95% Confident Interval
## Sensitivity
sens_errors <- sqrt(sensitivity(mtx) * (1 - sensitivity(mtx)) / sum(mtx[,1]))
sensLower <- sensitivity(mtx) - 1.96 * sens_errors
sensUpper <- sensitivity(mtx) + 1.96 * sens_errors
## Specificity
spec_errors <- sqrt(specificity(mtx) * (1 - specificity(mtx)) / sum(mtx[,2]))
specLower <- specificity(mtx) - 1.96 * spec_errors
specUpper <- specificity(mtx) + 1.96 * spec_errors
## Positive Predictive Values
ppv_errors <- sqrt(posPredValue(mtx) * (1 - posPredValue(mtx)) / sum(mtx[1,]))
ppvLower <- posPredValue(mtx) - 1.96 * ppv_errors
ppvUpper <- posPredValue(mtx) + 1.96 * ppv_errors
## Negative Predictive Values
npv_errors <- sqrt(negPredValue(mtx) * (1 - negPredValue(mtx)) / sum(mtx[2,]))
npvLower <- negPredValue(mtx) - 1.96 * npv_errors
npvUpper <- negPredValue(mtx) + 1.96 * npv_errors
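Collected for display (a sketch using the objects defined above; note these are normal-approximation intervals, not Clopper-Pearson):
data.frame(metric   = c("Sensitivity", "Specificity", "PPV", "NPV"),
           estimate = c(sensitivity(mtx), specificity(mtx), posPredValue(mtx), negPredValue(mtx)),
           lower    = c(sensLower, specLower, ppvLower, npvLower),
           upper    = c(sensUpper, specUpper, ppvUpper, npvUpper))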
I have a logistic regression model, for which I have been using the rms package. The model fits best using a log term for tn1, and for clinical interpretation I’m using log2. I ran the model using lrm from the rms package, and then to double check, I ran it using glm. The initial coefficients are the same:
h <- lrm(formula = outcomehosp ~ I(log2(tn1 + 0.001)) + apscore_ad +
emsurg + corrapiidiag, data = d, x = TRUE, y = TRUE)
Coef S.E. Wald Z Pr(>|Z|)
Intercept -3.4570 0.3832 -9.02 <0.0001
tn1 0.0469 0.0180 2.60 0.0093
apscore_ad 0.1449 0.0127 11.44 <0.0001
emsurg 0.0731 0.3228 0.23 0.8208
f <- glm(formula = outcomehosp ~ apscore_ad + emsurg + corrapiidiag +
I(log2(tn1 + 0.001)), family = binomial(), data = tn24)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.45699 0.38315 -9.023 < 2e-16
I(log2(tn1 + 0.001)) 0.04690 0.01804 2.600 0.00932
apscore_ad 0.14487 0.01267 11.438 < 2e-16
emsurg 0.07310 0.32277 0.226 0.82082
However, when I try to get the odds ratios, they are noticeably different for tn1 between the two models, and this doesn't seem to be explained by the log2 transformation.
summary(h)
Effects Response : outcomehosp
Factor Low High Diff. Effect S.E. Lower 0.95 Upper 0.95
tn1 0 0.21 0.21 0.362120 0.15417 6.5300e-02 0.673990
Odds Ratio 0 0.21 0.21 1.436400 NA 1.0675e+00 1.962100
apscore_ad 14 25.00 11.00 1.593600 0.15631 1.3605e+00 1.961000
Odds Ratio 14 25.00 11.00 4.921400 NA 3.8981e+00 7.106600
emsurg 0 1.00 1.00 0.073103 0.33051 -5.8224e-01 0.734860
Odds Ratio 0 1.00 1.00 1.075800 NA 5.5865e-01 2.085200
exp(f$coefficients)
(Intercept) 0.03152467
apscore_ad 1.15589222
emsurg 1.07584115
I(log2(tn1 + 0.001)) 1.04802
Would anyone be able to explain what the rms package is calculating the odds ratio of? Many thanks.
The tn1 effect from summary(h) is the log odds ratio for tn1 going from 0 to 0.21 -- the inter-quartile range. See ?summary.rms.
So, the effect from the first row of summary(h) is 0.36212 = (log2(0.211)-log2(0.001))*.0469.
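Exponentiating that effect recovers the reported odds ratio, and it agrees with the glm coefficient once raised over the same log2 range (quick check):
(log2(0.211) - log2(0.001)) * 0.0469        # 0.362, the Effect row for tn1
exp((log2(0.211) - log2(0.001)) * 0.0469)   # 1.436, the Odds Ratio row for tn1
1.04802^(log2(0.211) - log2(0.001))         # the same, from exp(coef) of the glm fit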
I'm trying to fit a two-component decay model in R using the nls function, but I am running into errors. The equation is
Ctot = p1 * C1 * exp(-k1 * t) + p2 * C2 * exp(-k2 * t)
where t is time, Ctot is C1 + C2, and p1 and p2 are known proportions of Ctot.
My data (dd) is:
> head(dd,n=15)
t Ctot
1 0.00 6.62
2 0.33 6.45
3 0.50 6.38
4 0.67 6.44
5 0.83 6.38
6 1.00 6.39
7 1.17 6.35
8 1.33 6.33
9 1.50 6.33
10 1.67 6.28
11 1.83 6.17
12 2.00 6.11
13 2.17 6.07
14 2.33 5.89
15 2.50 5.86
Using nls I have tried:
p1 <- 0.3
p2 <- 0.7
z <- nls(Ctot~(p1*C1*(exp(-k1*t)))+(p2*C2*(exp(-k2*t))), data=dd, start=list(C1=6, C2=0.1, k1=0.01, k2=0.01))
However, I am getting:
Error in numericDeriv(form[[3L]], names(ind), env) :
Missing value or an infinity produced when evaluating the model
I would be grateful if anyone has suggestions!
The data seem fairly limited and are clearly incomplete, since only the head is shown. If we make up some data for testing methods ... and leave out the confusing p1 and p2:
t = seq(0, 20, by = .3)
Ctot = 3 * exp(-1 * t) + 4 * exp(-5 * t)
# following the example on gnm::gnm's help page
# (gnm picks random starting values, so the loop collects several fits):
library(gnm)
saved.fits <- list()
for (i in 1:10) {
saved.fits[[i]] <- suppressWarnings( gnm(Ctot ~ Exp(1 + t, inst = 1) +
Exp(1 + t, inst = 2),
verbose = FALSE))}
plot(Ctot ~ t)
lines(saved.fits[[3]]$fitted ~ t, col = "red")
I wasn't familiar with the gnm package, so I ended up reading the first few sections and then the worked two-component data-fitting example in its vignette: https://cran.r-project.org/web/packages/gnm/vignettes/gnmOverview.pdf . Most of the fits will be as expected, but some will find a local maximum of the likelihood that is not the global maximum:
> saved.fits[[1]]$coefficients
(Intercept) Exp(. + t, inst = 1).(Intercept)
1.479909e-12 1.098612e+00
Exp(1 + ., inst = 1).t Exp(. + t, inst = 2).(Intercept)
-1.000000e+00 1.386294e+00
Exp(1 + ., inst = 2).t
-5.000000e+00
attr(,"eliminated")
[1] 0
> exp( saved.fits[[1]]$coefficients[4] )
Exp(. + t, inst = 2).(Intercept)
4
> exp( saved.fits[[1]]$coefficients[2] )
Exp(. + t, inst = 1).(Intercept)
3
With the data shown in the question it does not seem to work well, but if you are open to other parametric models then this three-parameter model seems reasonable:
fm <- nls(Ctot ~ 1 / (a + b * t^c), dd, start = list(a = 1, b = 1, c = 1))
plot(dd)
lines(fitted(fm) ~ t, dd, col = "red")
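A smoother overlay can be drawn by predicting on a fine grid of t (sketch):
tt <- seq(min(dd$t), max(dd$t), length.out = 100)
lines(tt, predict(fm, newdata = data.frame(t = tt)), col = "blue", lty = 2)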