I'm hoping any of you could shed some light on the following. I have been attempting to replicate a Cox PH model from Stata in R. As you can see below, I get the same results for Cox PH models without tvcs in both programs:
Stata Cox PH model
stset date_endpoint, failure(cause_endpoint2==4) exit(failure) origin(time capture_date) id(wolf_ID)
id: wolf_ID
failure event: cause_endpoint2 == 4
obs. time interval: (date_endpoint[_n-1], date_endpoint]
exit on or before: failure
t for analysis: (time-origin)
origin: time capture_date
--------------------------------------------------------------------------
5,664 total observations
0 exclusions
--------------------------------------------------------------------------
5,664 observations remaining, representing
513 subjects
231 failures in single-failure-per-subject data
279,430.5 total analysis time at risk and under observation
at risk from t = 0
earliest observed entry t = 0
last observed exit t = 3,051
stcox deer_hunt bear_hunt, strata(winter lib_kill) efron robust cluster(wolf_ID)
failure _d: cause_endpoint2 == 4
analysis time _t: (date_endpoint-origin)
origin: time capture_date
id: wolf_ID
Iteration 0: log pseudolikelihood = -993.65252
Iteration 1: log pseudolikelihood = -992.55768
Iteration 2: log pseudolikelihood = -992.55733
Refining estimates:
Iteration 0: log pseudolikelihood = -992.55733
Stratified Cox regr. -- Efron method for ties
No. of subjects = 513 Number of obs = 5,664
No. of failures = 231
Time at risk = 279430.5
Wald chi2(2) = 2.21
Log pseudolikelihood = -992.55733 Prob > chi2 = 0.3317
(Std. Err. adjusted for 513 clusters in wolf_ID)
--------------------------------------------------------------------------
| Robust
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf.Interval]
-------------+------------------------------------------------------------
deer_hunt | .7860433 .1508714 -1.25 0.210 .539596 1.145049
bear_hunt | 1.21915 .2687211 0.90 0.369 .7914762 1.877916
--------------------------------------------------------------------------
Stratified by winter lib_kill
R Cox PH model
> LTF.coxph <- coxph(Surv(`_t0`,`_t`, endpoint_r_enc=="ltf") ~ deer_hunt
+ + bear_hunt + strata(winter, lib_kill), data=statadta, ties = "efron", id = wolf_ID)
> summary(LTF.coxph)
Call:
coxph(formula = Surv(`_t0`, `_t`, endpoint_r_enc == "ltf") ~
deer_hunt + bear_hunt + strata(winter, lib_kill), data = statadta,
ties = "efron", id = wolf_ID)
n= 5664, number of events= 231
coef exp(coef) se(coef) z Pr(>|z|)
deer_hunt -0.2407 0.7860 0.1849 -1.302 0.193
bear_hunt 0.1982 1.2191 0.2174 0.911 0.362
exp(coef) exp(-coef) lower .95 upper .95
deer_hunt 0.786 1.2722 0.5471 1.129
bear_hunt 1.219 0.8202 0.7962 1.867
Concordance= 0.515 (se = 0.022 )
Likelihood ratio test= 2.19 on 2 df, p=0.3
Wald test = 2.21 on 2 df, p=0.3
Score (logrank) test = 2.22 on 2 df, p=0.3
> cox.zph(LTF.coxph)
chisq df p
deer_hunt 5.5773 1 0.018
bear_hunt 0.0762 1 0.783
GLOBAL 5.5773 2 0.062
The problem I have is that results look very different when adding a time-varying coefficient (tvc() in Stata and tt() in R) for one of the variables in my model. Nothing is the same between models (coefficients for all variables or their significance).
Stata Cox PH model with tvc()
stcox deer_hunt bear_hunt, tvc(deer_hunt) strata(winter lib_kill) efron robust cluster(wolf_ID)
failure _d: cause_endpoint2 == 4
analysis time _t: (date_endpoint-origin)
origin: time capture_date
id: wolf_ID
Iteration 0: log pseudolikelihood = -993.65252
Iteration 1: log pseudolikelihood = -990.70475
Iteration 2: log pseudolikelihood = -990.69386
Iteration 3: log pseudolikelihood = -990.69386
Refining estimates:
Iteration 0: log pseudolikelihood = -990.69386
Stratified Cox regr. -- Efron method for ties
No. of subjects = 513 Number of obs = 5,664
No. of failures = 231
Time at risk = 279430.5
Wald chi2(3) = 4.72
Log pseudolikelihood = -990.69386 Prob > chi2 = 0.1932
(Std. Err. adjusted for 513 clusters in wolf_ID)
--------------------------------------------------------------------------
| Robust
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf Interval]
-------------+------------------------------------------------------------
main |
deer_hunt | 1.043941 .2643779 0.17 0.865 .6354908 1.714915
bear_hunt | 1.204522 .2647525 0.85 0.397 .7829279 1.853138
-------------+------------------------------------------------------------
tvc |
deer_hunt | .9992683 .0004286 -1.71 0.088 .9984287 1.000109
------------------------------------------------------------------------------
Stratified by winter lib_kill
Note: Variables in tvc equation interacted with _t.
R Cox PH model with tt()
> LTF.tvc1.coxph <- coxph(Surv(`_t0`,`_t`, endpoint_r_enc=="ltf") ~ deer_hunt + bear_hunt + tt(deer_hunt) + strata(winter, lib_kill),
+ data=statadta, ties = "efron", id = wolf_ID, cluster = wolf_ID,
+ tt=function(x,t,...){x*t})
> summary(LTF.tvc1.coxph)
Call:
coxph(formula = Surv(`_t0`, `_t`, endpoint_r_enc == "ltf") ~
deer_hunt + bear_hunt + tt(deer_hunt) + strata(winter, lib_kill),
data = statadta, ties = "efron", tt = function(x, t, ...) {
x * t
}, id = wolf_ID, cluster = wolf_ID)
n= 5664, number of events= 231
coef exp(coef) se(coef) robust se z Pr(>|z|)
deer_hunt 0.4741848 1.6067039 0.2082257 0.2079728 2.280 0.02261 *
bear_hunt -0.7923208 0.4527927 0.1894531 0.1890497 -4.191 2.78e-05 ***
tt(deer_hunt)-0.0009312 0.9990692 0.0003442 0.0003612 -2.578 0.00994 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
deer_hunt 1.6067 0.6224 1.0688 2.4153
bear_hunt 0.4528 2.2085 0.3126 0.6559
tt(deer_hunt) 0.9991 1.0009 0.9984 0.9998
Concordance= 0.634 (se = 0.02 )
Likelihood ratio test= 28.29 on 3 df, p=3e-06
Wald test = 25.6 on 3 df, p=1e-05
Score (logrank) test = 26.19 on 3 df, p=9e-06, Robust = 32.6 p=4e-07
Moreover, I checked this post before posting this because I did not find it very helpful. The 'noadjust' Stata command was useful for SEs, but it does not answer my main issue of also getting different covariate coefficients between programs for the main and time-varying effects when I add those time-varying effects to the Cox model in each program (and the same formula for calculating the time-varying effects). That is really my main concern: the difference in covariate estimates seems substantial and would result in different prescriptions, I believe
I have been unable to figure out what is happening there, and am hoping the community can help.
Working with the "prostate" dataset in "ElemStatLearn" package.
set.seed(3434)
fit.lm = train(data=trainset, lpsa~., method = "lm")
fit.ridge = train(data=trainset, lpsa~., method = "ridge")
fit.lasso = train(data=trainset, lpsa~., method = "lasso")
Comparing RMSE (for bestTune in case of ridge and lasso)
fit.lm$results[,"RMSE"]
[1] 0.7895572
fit.ridge$results[fit.ridge$results[,"lambda"]==fit.ridge$bestTune$lambda,"RMSE"]
[1] 0.8231873
fit.lasso$results[fit.lasso$results[,"fraction"]==fit.lasso$bestTune$fraction,"RMSE"]
[1] 0.7779534
Comparing absolute value of coefficients
abs(round(fit.lm$finalModel$coefficients,2))
(Intercept) lcavol lweight age lbph svi lcp gleason pgg45
0.43 0.58 0.61 0.02 0.14 0.74 0.21 0.03 0.01
abs(round(predict(fit.ridge$finalModel, type = "coef", mode = "norm")$coefficients[8,],2))
lcavol lweight age lbph svi lcp gleason pgg45
0.49 0.62 0.01 0.14 0.65 0.05 0.00 0.01
abs(round(predict(fit.lasso$finalModel, type = "coef", mode = "norm")$coefficients[8,],2))
lcavol lweight age lbph svi lcp gleason pgg45
0.56 0.61 0.02 0.14 0.72 0.18 0.00 0.01
My question is: how can "ridge" RMSE be higher than that of plain "lm". Doesn't that defeat the very purpose of penalized regression vs plain "lm"?
Also, how can the absolute value of the coefficient of "lweight" be actually higher in ridge (0.62) vs that in lm (0.61)? Both coefficients are positive originally without the abs().
I was expecting ridge to perform similar to lasso, which not only reduced RMSE but also shrank the size of coefficients vs plain "lm".
Thank you!
I have a data frame with 5 variables: Lot / Wafer / Serial Number / Voltage / Amplification. In this data frame there are 1020 subsets grouped by Serial_number. Each subset has a certain number of measurement data points (Amplification against voltage).
I fit the data with
summary(fit2.lme <- lmer(log(log(Amplification)) ~ poly(Voltage, 3) + (poly(Voltage, 1) | Serial_number),
+ data = APD))
which yields:
Linear mixed model fit by REML ['lmerMod']
Formula: log(log(Amplification)) ~ poly(Voltage, 3) + (poly(Voltage, 1) | Serial_number)
Data: APD
REML criterion at convergence: 35286.1
Scaled residuals:
Min 1Q Median 3Q Max
-20.7724 -0.2438 -0.1297 0.2434 13.2663
Random effects:
Groups Name Variance Std.Dev. Corr
Serial_number (Intercept) 1.439e-02 0.1199
poly(Voltage, 1) 2.042e+03 45.1908 -0.76
Residual 8.701e-02 0.2950
Number of obs: 76219, groups: Serial_number, 1020
Fixed effects:
Estimate Std. Error t value
(Intercept) 5.944e-02 3.914e-03 15.2
poly(Voltage, 3)1 5.862e+02 1.449e+00 404.5
poly(Voltage, 3)2 -1.714e+02 3.086e-01 -555.4
poly(Voltage, 3)3 4.627e+01 3.067e-01 150.8
Correlation of Fixed Effects:
(Intr) p(V,3)1 p(V,3)2
ply(Vlt,3)1 -0.713
ply(Vlt,3)2 0.015 -0.004
ply(Vlt,3)3 0.004 0.012 0.018
and when I add a higher polynomial in the random effects I get a warning:
> summary(fit3.lme <- lmer(log(log(Amplification)) ~ poly(Voltage, 3) + (poly(Voltage, 2) | Serial_number),
+ data = APD))
Linear mixed model fit by REML ['lmerMod']
Formula: log(log(Amplification)) ~ poly(Voltage, 3) + (poly(Voltage, 2) | Serial_number)
Data: APD
REML criterion at convergence: 16285.9
Scaled residuals:
Min 1Q Median 3Q Max
-20.5042 -0.2393 -0.0697 0.3165 13.9634
Random effects:
Groups Name Variance Std.Dev. Corr
Serial_number (Intercept) 1.584e-02 0.1259
poly(Voltage, 2)1 1.777e+03 42.1536 -0.67
poly(Voltage, 2)2 1.579e+03 39.7365 0.87 -0.95
Residual 6.679e-02 0.2584
Number of obs: 76219, groups: Serial_number, 1020
Fixed effects:
Estimate Std. Error t value
(Intercept) 5.858e-02 4.062e-03 14.4
poly(Voltage, 3)1 5.938e+02 1.351e+00 439.5
poly(Voltage, 3)2 -1.744e+02 1.276e+00 -136.7
poly(Voltage, 3)3 5.719e+01 2.842e-01 201.2
Correlation of Fixed Effects:
(Intr) p(V,3)1 p(V,3)2
ply(Vlt,3)1 -0.641
ply(Vlt,3)2 0.825 -0.906
ply(Vlt,3)3 -0.001 0.030 -0.004
convergence code: 1
Model failed to converge with max|grad| = 2.22294 (tol = 0.002, component 1)
Model is nearly unidentifiable: large eigenvalue ratio
- Rescale variables?
Warning messages:
1: In optwrap(optimizer, devfun, getStart(start, rho$lower, rho$pp), :
convergence code 1 from bobyqa: bobyqa -- maximum number of function evaluations exceeded
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 2.22294 (tol = 0.002, component 1)
3: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: large eigenvalue ratio
- Rescale variables?
The data is as following (can provide the complete data of course, if desired). It includes 77479 observables of 6 variables:
'data.frame': 77479 obs. of 6 variables:
$ Serial_number: num 9.12e+08 9.12e+08 9.12e+08 9.12e+08 9.12e+08 ...
$ Lot : int 9 9 9 9 9 9 9 9 9 9 ...
$ Wafer : int 912 912 912 912 912 912 912 912 912 912 ...
$ Amplification: num 1 1 1.01 1.01 1.01 ...
$ Voltage : num 25 30 34.9 44.9 49.9 ...
and the data itself looks like:
Serial_number Lot Wafer Amplification Voltage
1 912009913 9 912 1.00252 24.9681
2 912009913 9 912 1.00452 29.9591
3 912009913 9 912 1.00537 34.9494
(...)
73 912009913 9 912 918.112 375.9850
74 912009913 9 912 1083.74 377.9990
75 912009897 9 912 1.00324 19.9895
76 912009897 9 912 1.00449 29.9777
(...)
What does the warnings mean?
According to the anova the fit3.lme model describes the data better:
> anova(fit3.lme, fit2.lme)
refitting model(s) with ML (instead of REML)
Data: APD
Models:
fit2.lme: log(log(Amplification)) ~ poly(Voltage, 3) + (poly(Voltage, 1) |
fit2.lme: Serial_number)
fit3.lme: log(log(Amplification)) ~ poly(Voltage, 3) + (poly(Voltage, 2) |
fit3.lme: Serial_number)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
fit2.lme 8 35294 35368 -17638.9 35278
fit3.lme 11 16264 16366 -8121.1 16242 19036 3 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Warning message:
In optwrap(optimizer, devfun, x#theta, lower = x#lower, calc.derivs = TRUE, :
convergence code 1 from bobyqa: bobyqa -- maximum number of function evaluations exceeded
Therefore I would like to use that model but I stuck in the warning.
update:
center and scale predictors
ss.CS<- transform(APD, Voltage=scale(Voltage))
> fit31.lme<- update(fit3.lme, data=ss.CS)
Error in poly(dots[[i]], degree, raw = raw, simple = raw) :
'degree' must be less than number of unique points
Also for the other variable (don't know for which it makes sense)
> ss.CS<- transform(APD, Amplitude=scale(Amplitude))
Error in scale(Amplitude) : object 'Amplitude' not found
> ss.CS<- transform(APD, Amplification=scale(Amplification))
> fit31.lme<- update(fit3.lme, data=ss.CS)
Warning messages:
1: In log(Amplification) : NaNs produced
2: In log(log(Amplification)) : NaNs produced
3: In log(Amplification) : NaNs produced
4: In log(log(Amplification)) : NaNs produced
5: In log(Amplification) : NaNs produced
6: In log(log(Amplification)) : NaNs produced
7: Some predictor variables are on very different scales: consider rescaling
check singularity
> diag.vals<- getME(fit3.lme, "theta")[getME(fit3.lme, "lower")==0]
> any(diag.vals<- 1e-6)
[1] TRUE
Warning message:
In any(diag.vals <- 1e-06) : coercing argument of type 'double' to logical
compute gradient and Hessian with Richardson extrapolation
> devfun<- update(fit3.lme, devFunOnly=TRUE)
> if(isLMM(fit3.lme)){
+ pars<- getME(fit3.lme, "theta")
+ } else {
+ pars<- getME(fit3.lme, c("theta", "fixef"))
+ }
> if (require("numDeriv")) {
+ cat("hess:\n"); print(hess <- hessian(devfun, unlist(pars)))
+ cat("grad:\n"); print(grad <- grad(devfun, unlist(pars)))
+ cat("scaled gradient:\n")
+ print(scgrad <- solve(chol(hess), grad))
+ }
Loading required package: numDeriv
hess:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 39137.840764 -56.189442277 -1.348127e+02 3.789427141 25.456612941 -3.806942811
[2,] -56.189442 0.508077776 6.283795e-01 -0.068882737 -0.056159369 0.003228274
[3,] -134.812704 0.628379462 1.061584e+00 -0.079620905 -0.152816413 0.007457255
[4,] 3.789427 -0.068882737 -7.962090e-02 0.516054976 0.534346634 0.001513168
[5,] 25.456613 -0.056159369 -1.528164e-01 0.534346634 0.901191745 -0.002344407
[6,] -3.806943 0.003228274 7.457255e-03 0.001513168 -0.002344407 0.179283416
grad:
[1] -22.9114985 2.2229416 -0.2959238 0.6790044 -0.2343368 -0.4020556
scaled gradient:
[1] -0.1123624 4.4764140 -0.8777938 1.3980054 -0.4223921 -0.9508207
> fit3.lme#optinfo$derivs
$gradient
[1] -22.9118920 2.2229424 -0.2959264 0.6790037 -0.2343360 -0.4020605
$Hessian
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 39137.915527 -56.20745850 -134.87176514 3.74780273 25.47540283 -3.79016113
[2,] -56.207458 0.44262695 0.61462402 -0.04736328 -0.06585693 0.02130127
[3,] -134.871765 0.61462402 1.04296875 -0.10467529 -0.23223877 0.05438232
[4,] 3.747803 -0.04736328 -0.10467529 0.52026367 0.50909424 -0.02130127
[5,] 25.475403 -0.06585693 -0.23223877 0.50909424 0.68481445 -0.02044678
[6,] -3.790161 0.02130127 0.05438232 -0.02130127 -0.02044678 0.07617188
4. restart the fit from the original value (or a slightly perturbed value):
> fit3.lme.restart <- update(fit3.lme, start=pars)
> summary(fit3.lme.restart)
Linear mixed model fit by REML ['lmerMod']
Formula: log(log(Amplification)) ~ poly(Voltage, 3) + (poly(Voltage, 2) | Serial_number)
Data: APD
REML criterion at convergence: 16250.3
Scaled residuals:
Min 1Q Median 3Q Max
-20.4868 -0.2404 -0.0697 0.3166 13.9464
Random effects:
Groups Name Variance Std.Dev. Corr
Serial_number (Intercept) 1.823e-02 0.1350
poly(Voltage, 2)1 2.124e+03 46.0903 -0.77
poly(Voltage, 2)2 1.937e+03 44.0164 0.90 -0.96
Residual 6.668e-02 0.2582
Number of obs: 76219, groups: Serial_number, 1020
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.05823 0.00434 13.4
poly(Voltage, 3)1 593.83396 1.47201 403.4
poly(Voltage, 3)2 -174.61257 1.40711 -124.1
poly(Voltage, 3)3 57.15901 0.28427 201.1
Correlation of Fixed Effects:
(Intr) p(V,3)1 p(V,3)2
ply(Vlt,3)1 -0.735
ply(Vlt,3)2 0.868 -0.927
ply(Vlt,3)3 -0.001 0.028 -0.003
5. try all available optimizers
> source(system.file("utils", "allFit.R", package="lme4"))
Loading required package: optimx
Loading required package: dfoptim
> fit3.lme.all <- allFit(fit3.lme)
bobyqa : [OK]
Nelder_Mead : [OK]
nlminbw : [OK]
nmkbw : [OK]
optimx.L-BFGS-B : [OK]
nloptwrap.NLOPT_LN_NELDERMEAD : [OK]
nloptwrap.NLOPT_LN_BOBYQA : [OK]
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
unable to evaluate scaled gradient
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge: degenerate Hessian with 1 negative eigenvalues
3: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
unable to evaluate scaled gradient
4: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge: degenerate Hessian with 1 negative eigenvalues
5: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
unable to evaluate scaled gradient
6: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge: degenerate Hessian with 1 negative eigenvalues
7: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
unable to evaluate scaled gradient
8: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge: degenerate Hessian with 1 negative eigenvalues
ss <- summary(fit3.lme.all)
> ss$ fixef ## extract fixed effects
(Intercept) poly(Voltage, 3)1 poly(Voltage, 3)2 poly(Voltage, 3)3
bobyqa 0.05822789 593.8340 -174.6126 57.15901
Nelder_Mead 0.05822787 593.8340 -174.6126 57.15902
nlminbw 0.05822787 593.8340 -174.6126 57.15902
nmkbw 0.05841966 593.7804 -174.4999 57.17107
optimx.L-BFGS-B 0.05822845 593.8336 -174.6116 57.16183
nloptwrap.NLOPT_LN_NELDERMEAD 0.05823870 593.8330 -174.6076 57.16039
nloptwrap.NLOPT_LN_BOBYQA 0.05823870 593.8330 -174.6076 57.16039
> ss$ llik ## log-likelihoods
bobyqa Nelder_Mead nlminbw nmkbw optimx.L-BFGS-B
-8125.125 -8125.125 -8125.125 -8129.827 -8125.204
nloptwrap.NLOPT_LN_NELDERMEAD nloptwrap.NLOPT_LN_BOBYQA
-8125.137
> ss$ sdcor ## SDs and correlations
Serial_number.(Intercept) Serial_number.poly(Voltage, 2)1.(Intercept) Serial_number.poly(Voltage, 2)2.(Intercept)
bobyqa 0.1350049 46.09013 44.01631
Nelder_Mead 0.1350064 46.09104 44.01705
nlminbw 0.1350065 46.09106 44.01707
nmkbw 0.1347214 46.11336 43.81219
optimx.L-BFGS-B 0.1356576 46.32849 44.27171
nloptwrap.NLOPT_LN_NELDERMEAD 0.1347638 45.97995 43.91054
nloptwrap.NLOPT_LN_BOBYQA 0.1347638 45.97995 43.91054
Serial_number.poly(Voltage, 2)1 Serial_number.poly(Voltage, 2)2.poly(Voltage, 2)1 Serial_number.poly(Voltage, 2)2
bobyqa -0.7665898 0.9042387 -0.9608662
Nelder_Mead -0.7665981 0.9042424 -0.9608680
nlminbw -0.7665980 0.9042425 -0.9608680
nmkbw -0.7658163 0.9076551 -0.9649999
optimx.L-BFGS-B -0.7713801 0.9067725 -0.9617129
nloptwrap.NLOPT_LN_NELDERMEAD -0.7645748 0.9034336 -0.9606020
nloptwrap.NLOPT_LN_BOBYQA -0.7645748 0.9034336 -0.9606020
sigma
bobyqa 0.2582156
Nelder_Mead 0.2582156
nlminbw 0.2582156
nmkbw 0.2584714
optimx.L-BFGS-B 0.2582244
nloptwrap.NLOPT_LN_NELDERMEAD 0.2582207
nloptwrap.NLOPT_LN_BOBYQA 0.2582207
> ss$ theta ## Cholesky factors
Serial_number.(Intercept) Serial_number.poly(Voltage, 2)1.(Intercept) Serial_number.poly(Voltage, 2)2.(Intercept)
bobyqa 0.5228377 -136.8323 154.1396
Nelder_Mead 0.5228438 -136.8364 154.1428
nlminbw 0.5228439 -136.8365 154.1429
nmkbw 0.5212237 -136.6278 153.8521
optimx.L-BFGS-B 0.5253478 -138.3947 155.4631
nloptwrap.NLOPT_LN_NELDERMEAD 0.5218936 -136.1436 153.6293
nloptwrap.NLOPT_LN_BOBYQA 0.5218936 -136.1436 153.6293
Serial_number.poly(Voltage, 2)1 Serial_number.poly(Voltage, 2)2.poly(Voltage, 2)1 Serial_number.poly(Voltage, 2)2
bobyqa 114.6181 -71.06063 1.578418e+01
Nelder_Mead 114.6186 -71.06067 1.578354e+01
nlminbw 114.6187 -71.06067 1.578351e+01
nmkbw 114.7270 -71.14411 3.440466e-42
optimx.L-BFGS-B 114.1731 -70.65227 1.527854e+01
nloptwrap.NLOPT_LN_NELDERMEAD 114.7688 -71.19817 1.568481e+01
nloptwrap.NLOPT_LN_BOBYQA 114.7688 -71.19817 1.568481e+01
> ss$ which.OK ## which fits worked
bobyqa Nelder_Mead nlminbw nmkbw optimx.L-BFGS-B
TRUE TRUE TRUE TRUE TRUE
nloptwrap.NLOPT_LN_NELDERMEAD nloptwrap.NLOPT_LN_BOBYQA
TRUE TRUE
Due to users's coment I add the following:
> bam(log(log(Amplification)) ~ s(Voltage) + s(Serial_number, bs="re") + s(Voltage, Serial_number, bs="re"), data=APD, discrete = TRUE)
Family: gaussian
Link function: identity
Formula:
log(log(Amplification)) ~ s(Voltage) + s(Serial_number, bs = "re") +
s(Voltage, Serial_number, bs = "re")
Estimated degrees of freedom:
9 993 987 total = 1990.18
fREML score: -226.8182
> summary(bam(log(log(Amplification)) ~ s(Voltage) + s(Serial_number, bs="re") + s(Voltage, Serial_number, bs="re"), data=APD, discrete = TRUE))
Family: gaussian
Link function: identity
Formula:
log(log(Amplification)) ~ s(Voltage) + s(Serial_number, bs = "re") +
s(Voltage, Serial_number, bs = "re")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.11500 0.01896 6.066 1.31e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(Voltage) 8.998 9 89229 <2e-16 ***
s(Serial_number) 993.441 1019 55241 <2e-16 ***
s(Voltage,Serial_number) 986.741 1019 36278 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.989 Deviance explained = 99%
fREML = -226.82 Scale est. = 0.051396 n = 76219
On https://uploadfiles.io/n7h9z you can download the r script and data.
Update
(some plots concerning the gam model):
Here are all measurement data points transformed double-logarithmically:
The physical behaviour of the device is at least exponentially and even almost double-exponentially (as I found in a book). By transforming them double-logarithmically the almost behave then linearly. A polynomial of degree described the data already well but a polynomial degree of three did it better, though. I guess this can also be seen on the plot why that.
Some additional plots (I'm not used to GAMs so I just add them):
You can download the data from the link: https://uploadfiles.io/n7h9z
The convergence warnings disappeared when I removed all data points <2. I stumbled over this by coincidence..
Probably this is somehow connected to the issue that for each subset within the range from 0 to about 50 all data points are almost exactly the same (and have values of about ~1).
I have a logistic regression model, for which I have been using the rms package. The model fits best using a log term for tn1, and for clinical interpretation I’m using log2. I ran the model using lrm from the rms package, and then to double check, I ran it using glm. The initial coefficients are the same:
h <- lrm(formula = outcomehosp ~ I(log2(tn1 + 0.001)) + apscore_ad +
emsurg + corrapiidiag, data = d, x = TRUE, y = TRUE)
Coef S.E. Wald Z Pr(>|Z|)
Intercept -3.4570 0.3832 -9.02 <0.0001
tn1 0.0469 0.0180 2.60 0.0093
apscore_ad 0.1449 0.0127 11.44 <0.0001
emsurg 0.0731 0.3228 0.23 0.8208
f <- glm(formula = outcomehosp ~ apscore_ad + emsurg + corrapiidiag +
I(log2(tn1 + 0.001)), family = binomial(), data = tn24)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.45699 0.38315 -9.023 < 2e-16
I(log2(tn1 + 0.001)) 0.04690 0.01804 2.600 0.00932
apscore_ad 0.14487 0.01267 11.438 < 2e-16
emsurg 0.07310 0.32277 0.226 0.82082
However when I try to get the odds ratios, they are noticeably different for tn1 between the two models, and this doesn’t seem to be a difference of log2 transformation.
summary(h)
Effects Response : outcomehosp
Factor Low High Diff. Effect S.E. Lower 0.95 Upper 0.95
tn1 0 0.21 0.21 0.362120 0.15417 6.5300e-02 0.673990
Odds Ratio 0 0.21 0.21 1.436400 NA 1.0675e+00 1.962100
apscore_ad 14 25.00 11.00 1.593600 0.15631 1.3605e+00 1.961000
Odds Ratio 14 25.00 11.00 4.921400 NA 3.8981e+00 7.106600
emsurg 0 1.00 1.00 0.073103 0.33051 -5.8224e-01 0.734860
Odds Ratio 0 1.00 1.00 1.075800 NA 5.5865e-01 2.085200
exp(f$coefficients)
(Intercept) 0.03152467
apscore_ad 1.15589222
emsurg 1.07584115
I(log2(tn1 + 0.001)) 1.04802
Would anyone be able to explain what the rms package is calculating the odds ratio of? Many thanks.
The tn1 effect from summary(h) is the effect on the log of the odds ratio of tn1 going from 0 to 0.21 -- the inter-quartile range. See ?summary.rms.
So, the effect from the first row of summary(h) is 0.36212 = (log2(0.211)-log2(0.001))*.0469.
I am using R to perform logistic regression on my data set. My data set has more than 50 variables.
I am running the following code:
glm(X...ResponseFlag ~ NetWorth + LOR + IntGrandChld + OccupInput, family = binomial, data = data)
When I see summary() I got the following output:
> summary(ResponseModel)
Call:
glm(formula = X...ResponseFlag ~ NetWorth + LOR + IntGrandChld +
OccupInput, family = binomial, data = data)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.2785 -0.9576 -0.8925 1.3736 1.9721
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.971166 0.164439 -5.906 3.51e-09 ***
NetWorth 0.082168 0.019849 4.140 3.48e-05 ***
LOR -0.019716 0.006494 -3.036 0.0024 **
IntGrandChld -0.021544 0.085274 -0.253 0.8005
OccupInput2 0.005796 0.138390 0.042 0.9666
OccupInput3 0.471020 0.289642 1.626 0.1039
OccupInput4 -0.031880 0.120636 -0.264 0.7916
OccupInput5 -0.148898 0.129922 -1.146 0.2518
OccupInput6 -0.481183 0.416277 -1.156 0.2477
OccupInput7 -0.057485 0.218309 -0.263 0.7923
OccupInput8 0.505676 0.123955 4.080 4.51e-05 ***
OccupInput9 -0.382375 0.821362 -0.466 0.6415
OccupInputA -12.903334 178.064831 -0.072 0.9422
OccupInputB 0.581272 1.003193 0.579 0.5623
OccupInputC -0.034188 0.294507 -0.116 0.9076
OccupInputD 0.224634 0.385959 0.582 0.5606
OccupInputE -1.292358 1.072864 -1.205 0.2284
OccupInputF 14.132144 308.212341 0.046 0.9634
OccupInputH 0.622677 1.006982 0.618 0.5363
OccupInputU 0.087526 0.095740 0.914 0.3606
OccupInputV -1.010939 0.637746 -1.585 0.1129
OccupInputW 0.262031 0.256238 1.023 0.3065
OccupInputX 0.332209 0.428806 0.775 0.4385
OccupInputY 0.059771 0.157135 0.380 0.7037
OccupInputZ 0.638520 0.711979 0.897 0.3698
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 5885.1 on 4467 degrees of freedom
Residual deviance: 5809.6 on 4443 degrees of freedom
AIC: 5859.6
Number of Fisher Scoring iterations: 12
From the output, it is seen that some new variable like OccupInput2... has arisen. Actually OccupInput had values 1,2,3,...A,B,C,D.. But it did not happen for NetWorth,LOR.
I am new to R and do not have any explanation, why there are new variables.
Can anybody give me an explanation? Thank you in advance.
I would assume that OccupInput in your model is a factor variable. R introduces so-called dummy variables, when you include factorial regressors in a linear model.
What you see as OccupInput2 and so forth in the table are the coefficients associated with the individual factor levels (the reference level OccupInput1 is covered by the intercept term).
You can verify the type of OccupInput from the output of the sapply(data, class) call, which yields the data types of the columns in your input data frame.