I am using the rms package for a Cox model:
library(rms)
rmsfit2 <- cph(Surv(time, stop, Lstate) ~ gender + momedu + sibs + L1.Spostureaway_re +
                 L1.Sgazeaw_re + L1.Sgazepic_re + frailty(id),
               x = TRUE, y = TRUE, surv = TRUE, data = data2step)
The validate() function worked without problems:
validate(rmsfit2, method = "boot", B = 100)
which returned:
      index.orig training     test optimism index.corrected   n
Dxy       0.2711   0.2698   0.2671   0.0027          0.2684 100
R2        0.0582   0.0577   0.0562   0.0015          0.0568 100
Slope     1.0000   1.0000   0.9785   0.0215          0.9785 100
D         0.0122   0.0120   0.0117   0.0003          0.0119 100
U        -0.0001  -0.0001   0.0001  -0.0001          0.0001 100
Q         0.0122   0.0121   0.0116   0.0004          0.0118 100
g         0.3162   0.3154   0.3066   0.0088          0.3075 100
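For reference, the columns of the validate() table are related in a simple way. optimism is training minus test, both averaged over the bootstrap resamples. index.corrected is index.orig minus optimism. The base-R sketch below recomputes this from the printed values, which are copied from the output above. Differences of about 0.0001 come from 4-decimal rounding.

```r
# Values copied from the validate() output above
v <- data.frame(
  index      = c("Dxy", "R2", "Slope", "D", "U", "Q", "g"),
  index.orig = c(0.2711, 0.0582, 1.0000, 0.0122, -0.0001, 0.0122, 0.3162),
  training   = c(0.2698, 0.0577, 1.0000, 0.0120, -0.0001, 0.0121, 0.3154),
  test       = c(0.2671, 0.0562, 0.9785, 0.0117,  0.0001, 0.0116, 0.3066),
  corrected  = c(0.2684, 0.0568, 0.9785, 0.0119,  0.0001, 0.0118, 0.3075)
)
v$optimism   <- v$training - v$test        # average bootstrap optimism
v$recomputed <- v$index.orig - v$optimism  # optimism-corrected index
v[, c("index", "optimism", "recomputed", "corrected")]
```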
However, I couldn't get calibrate() to work:
calibrate(rmsfit2, B = 20)
returned the error message:
Error in reliability[, "index.corrected"] : subscript out of bounds
I'm not sure how best to reproduce this error with the sample data shipped with the survival or rms packages, but does anyone have any insight into this problem and how to make calibrate() work? Thank you!
I have been trying to convert some PROC MIXED SAS code into R, but without success. The code is:
proc mixed data=rmanova4;
class randomization_arm cancer_type site wk;
model chgpf=randomization_arm cancer_type site wk;
repeated / subject=study_id;
contrast '12 vs 4' randomization_arm 1 -1;
lsmeans randomization_arm / cl pdiff alpha=0.05;
run;quit;
I have tried something like:
library(nlme)
mod4 <- lme(chgpf ~ Randomization_Arm + Cancer_Type + site + wk, data = rmanova.data,
            random = ~ 1 | Study_ID, na.action = na.exclude)
but I get different estimates.
Perhaps I am misunderstanding something basic. Any comment/suggestion would be greatly appreciated.
Edit: here is the output. Part of the output from the SAS code is below:
Least Squares Means
Effect             Randomization_Arm  Estimate  Standard Error   DF  t Value  Pr > |t|  Alpha    Lower    Upper
Randomization_Arm  12 weekly BTA       -4.5441          1.3163  222    -3.45    0.0007   0.05  -7.1382  -1.9501
Randomization_Arm  4 weekly BTA        -6.4224          1.3143  222    -4.89    <.0001   0.05  -9.0126  -3.8322

Differences of Least Squares Means
Effect             Randomization_Arm  _Randomization_Arm  Estimate  Standard Error   DF  t Value  Pr > |t|  Alpha    Lower    Upper
Randomization_Arm  12 weekly BTA      4 weekly BTA          1.8783          1.4774  222     1.27    0.2049   0.05  -1.0332   4.7898
The output from the R code is below:
Linear mixed-effects model fit by REML
Data: rmanova.data
AIC BIC logLik
6522.977 6578.592 -3249.488
Random effects:
Formula: ~1 | Study_ID
(Intercept) Residual
StdDev: 16.59143 12.81334
Fixed effects: chgpf ~ Randomization_Arm + Cancer_Type + site + wk
Value Std.Error DF t-value p-value
(Intercept) 2.332268 2.314150 539 1.0078294 0.3140
Randomization_Arm4 weekly BTA -1.708401 2.409444 222 -0.7090435 0.4790
Cancer_TypeProsta -4.793787 2.560133 222 -1.8724761 0.0625
site2 -1.492911 3.665674 222 -0.4072678 0.6842
site3 -4.002252 3.510111 222 -1.1402066 0.2554
site4 -12.013758 5.746988 222 -2.0904442 0.0377
site5 -3.823504 4.938590 222 -0.7742097 0.4396
wk2 0.313863 1.281047 539 0.2450052 0.8065
wk3 -3.606267 1.329357 539 -2.7127905 0.0069
wk4 -4.246526 1.345526 539 -3.1560334 0.0017
Correlation:
(Intr) R_A4wB Cnc_TP site2 site3 site4 site5 wk2 wk3
Randomization_Arm4 weekly BTA -0.558
Cancer_TypeProsta -0.404 0.046
site2 -0.257 0.001 -0.087
site3 -0.238 0.004 -0.163 0.201
site4 -0.255 0.031 0.151 0.101 0.095
site5 -0.172 -0.016 -0.077 0.139 0.151 0.073
wk2 -0.254 -0.008 0.010 0.011 -0.003 0.005 -0.001
wk3 -0.257 0.005 0.020 0.014 0.006 -0.001 -0.002 0.464
wk4 -0.251 -0.007 0.022 0.020 0.002 0.006 -0.002 0.461 0.461
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-5.6784364 -0.3796392 0.1050812 0.4588555 3.1055046
Number of Observations: 771
Number of Groups: 229
Further comments and observations
Since my original post, I have tried various pieces of R code, but I still get estimates that differ from those given by SAS.
More importantly, the standard errors are much larger than those given by SAS (for the arm comparison, 2.409 in R versus 1.477 in SAS).
Any suggestions would be greatly appreciated.
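I got the solution after posting the question to the R-sig-ME mailing list. It turns out that the SAS code above actually fits an ordinary linear regression model that assumes independence across observations. With no TYPE= option, the REPEATED statement uses the default TYPE=VC, which here amounts to independent residuals with a common variance. The model is therefore equivalent to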
proc glm data=rmanova4;
class randomization_arm cancer_type site wk;
model chgpf = randomization_arm cancer_type site wk;
run;
which of course in R is equivalent to
lm(chgpf ~ Randomization_Arm + Cancer_Type + site + wk, data=rmanova.data)
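To compare the lm() fit with the SAS LSMEANS output, it helps to know how an LS mean is built when all covariates are CLASS variables. It is the model prediction averaged with equal weight over the levels of the other factors. In an additive model, the '12 vs 4' difference is then just minus the arm coefficient, because R uses the first level (12 weekly) as the reference. Below is a base-R sketch on simulated data. The names d, arm, site, wk and chg are hypothetical stand-ins for the real variables.

```r
set.seed(42)
n <- 200
d <- data.frame(
  arm  = factor(sample(c("12 weekly", "4 weekly"), n, replace = TRUE)),
  site = factor(sample(1:5, n, replace = TRUE)),
  wk   = factor(sample(1:4, n, replace = TRUE))
)
# simulated outcome: arm, site and week effects plus noise
d$chg <- with(d, -4 - 2 * (arm == "4 weekly") + as.integer(site) -
                 as.integer(wk) + rnorm(n, sd = 10))

# ordinary least squares, i.e. the model PROC GLM fits
fit <- lm(chg ~ arm + site + wk, data = d)

# LS means: predict over the full grid of the other factors' levels,
# then average with equal weight within each arm
grid <- expand.grid(arm = levels(d$arm), site = levels(d$site),
                    wk = levels(d$wk))
grid$pred <- predict(fit, newdata = grid)
lsm <- tapply(grid$pred, grid$arm, mean)
lsm

# '12 vs 4' difference; in an additive model with 12 weekly as the
# reference level this equals minus the arm coefficient
diff_12_vs_4 <- unname(lsm["12 weekly"] - lsm["4 weekly"])
c(lsmeans_diff = diff_12_vs_4, minus_coef = -unname(coef(fit)["arm4 weekly"]))
```

In this additive lm() setting, the standard error and confidence limits of the difference are those of the arm coefficient from summary(fit) and confint(fit).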
I use the forecast package in R.
Hyndman says:
The arima() function in R (and Arima() and auto.arima() from the forecast package) fits a regression with ARIMA errors.
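Written out for the ARIMA(5,0,0) fit below, the model being estimated is the standard regression with ARMA errors shown here. There are k = 9 regressors. The intercept c corresponds to "intercept", the β_j to xreg1 to xreg9, the φ_i to ar1 to ar5, and σ² to sigma^2.

```latex
y_t = c + \beta_1 x_{1,t} + \cdots + \beta_k x_{k,t} + \eta_t,
\qquad
\eta_t = \phi_1 \eta_{t-1} + \cdots + \phi_5 \eta_{t-5} + \varepsilon_t,
\qquad
\varepsilon_t \sim \mathrm{WN}(0, \sigma^2)
```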
Here is my output from auto.arima():
Regression with ARIMA(5,0,0) errors
Coefficients:
         ar1     ar2     ar3     ar4     ar5 intercept   xreg1   xreg2   xreg3   xreg4
      0.0212 -0.0530  0.7005 -0.0232  0.0334  862.0474  -4e-04 -0.0303 -0.0659 -0.1657
s.e.  0.0087  0.0086  0.0062  0.0087  0.0087  285.6206   1e-04  0.0604  0.0648  1.7225
       xreg5   xreg6   xreg7   xreg8   xreg9
      4.4673  0.1958  0.3381 -0.4270  5.3478
s.e.  0.5952  0.0213  0.0138  0.1415  0.0707
sigma^2 = 15.05: log likelihood = -37334.05
AIC=74700.1 AICc=74700.14 BIC=74820.22
Training set error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set -0.0001219744 3.877156 1.434961 NaN Inf 0.3321699 -0.007453887
Can I rename the xreg variables so that their real names appear in the summary output?
Name the columns of the xreg matrix whatever you like; those names are then used in the output.
library(forecast)
xreg <- ts(matrix(rnorm(900), ncol=9))
colnames(xreg) <- LETTERS[1:9]
auto.arima(WWWusage, xreg=xreg)
#> Series: WWWusage
#> Regression with ARIMA(1,1,1) errors
#>
#> Coefficients:
#> ar1 ma1 A B C D E F G
#> 0.6418 0.6653 0.1122 0.2939 0.0958 0.0923 -0.3412 0.0706 -0.0008
#> s.e. 0.0842 0.0927 0.0986 0.1038 0.0904 0.1061 0.1218 0.0919 0.1074
#> H I
#> 0.0137 -0.2185
#> s.e. 0.0773 0.1320
#>
#> sigma^2 = 9.524: log likelihood = -247.12
#> AIC=518.24 AICc=521.87 BIC=549.38
Created on 2022-03-01 by the reprex package (v2.0.1)
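The same holds for base R's stats::arima(), which the Hyndman quote above also mentions: coefficient names are taken from colnames(xreg). Below is a base-R sketch on simulated data. price and promo are hypothetical regressor names.

```r
set.seed(1)
n <- 100
# named regressor matrix; the column names become coefficient names
xreg <- cbind(price = rnorm(n), promo = rbinom(n, 1, 0.3))
y <- 10 + 2 * xreg[, "price"] - 3 * xreg[, "promo"] +
  arima.sim(list(ar = 0.5), n)   # AR(1) errors

fit <- arima(y, order = c(1, 0, 0), xreg = xreg)
coef(fit)   # named ar1, intercept, price, promo
```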
How do I get each part of the lrm() output separately?
I use lrm() to build a logistic model and get the following result:
library(rms)
n <- 1000 # define sample size
y <- rep(0:1, 500)
age <- rnorm(n, 50, 10)
sex <- factor(sample(c('female','male'), n,TRUE))
f <- lrm(y ~ age + sex, x=TRUE, y=TRUE)
f
                     Model Likelihood     Discrimination    Rank Discrim.
                        Ratio Test           Indexes           Indexes
Obs          1000    LR chi2      1.50    R2       0.002    C       0.520
0             500    d.f.            2    g        0.088    Dxy     0.040
1             500    Pr(> chi2) 0.4714    gr       1.092    gamma   0.040
max |deriv| 2e-13                         gp       0.022    tau-a   0.020
                                          Brier    0.250

             Coef   S.E. Wald Z Pr(>|Z|)
Intercept  0.2206 0.3370   0.65   0.5127
age       -0.0030 0.0065  -0.46   0.6485
sex=male  -0.1455 0.1266  -1.15   0.2504
How can I get each part of the result above as a separate data frame, like this:
mydf$df1
                     Model Likelihood     Discrimination    Rank Discrim.
                        Ratio Test           Indexes           Indexes
Obs          1000    LR chi2      1.50    R2       0.002    C       0.520
0             500    d.f.            2    g        0.088    Dxy     0.040
1             500    Pr(> chi2) 0.4714    gr       1.092    gamma   0.040
max |deriv| 2e-13                         gp       0.022    tau-a   0.020
                                          Brier    0.250
mydf$df2
             Coef   S.E. Wald Z Pr(>|Z|)
Intercept  0.2206 0.3370   0.65   0.5127
age       -0.0030 0.0065  -0.46   0.6485
sex=male  -0.1455 0.1266  -1.15   0.2504
Try:
res <- capture.output(print(f))
# write each captured line of the printed output to res.csv
lapply(res, function(x) write.table(data.frame(x), "res.csv", append = TRUE, sep = ","))
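Another route builds real data frames instead of text. The sketch below uses a hypothetical helper, coef_table(), that rebuilds the coefficient block (Coef, S.E., Wald Z, two-sided p-value) from coef() and vcov(). It is demonstrated with base R's glm() on data simulated as in the question, because there the result can be checked against summary().

```r
# Hypothetical helper: coefficient table as a data frame, for any fit
# that has coef() and vcov() methods
coef_table <- function(fit) {
  b  <- coef(fit)
  se <- sqrt(diag(vcov(fit)))[names(b)]  # align SEs with coefficient names
  z  <- b / se
  data.frame(Coef = b, S.E. = se, `Wald Z` = z,
             `Pr(>|Z|)` = 2 * pnorm(-abs(z)),  # two-sided Wald p-value
             check.names = FALSE)
}

# Demonstration with glm() on data simulated as in the question
set.seed(1)
n <- 1000
d <- data.frame(y   = rep(0:1, 500),
                age = rnorm(n, 50, 10),
                sex = factor(sample(c("female", "male"), n, TRUE)))
g <- glm(y ~ age + sex, family = binomial, data = d)

mydf <- list(df2 = coef_table(g))
mydf$df2
```

For the lrm() fit f from the question, coef_table(f) should work the same way, since rms provides coef() and vcov() for lrm fits. The statistics in the first block are stored in f$stats, so as.data.frame(t(f$stats)) is one way to get them as a one-row data frame.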
So essentially I have two matrices: the excess returns of stocks (R) and the expected excess returns (ER).
R<-matrix(runif(47*78),ncol = 78)
ER<-matrix(runif(47*78),ncol = 78)
I then combine them by removing the first row of R and appending the first row of ER, forming a new matrix R1.
I do the same for R2, i.e. remove the first two rows of R and rbind what remains with the first two rows of ER.
I repeat this until I have n-1 new matrices, R1 to R46.
I then compute the variance-covariance matrix of each of these return matrices with cov(), giving VarCov1 to VarCov46.
n <- 47
# keep the rows of mat1 after the first nrows, then append the first nrows rows of mat2
switch_matrices <- function(mat1, mat2, nrows){
  rbind(mat1[(1 + nrows):nrow(mat1), , drop = FALSE], mat2[1:nrows, , drop = FALSE])
}
# nrows = 1, ..., n - 1
l <- lapply(1:(n - 1), function(nrows) switch_matrices(R, ER, nrows))
list2env(setNames(l, paste0("R", seq_along(l))), envir = parent.frame())
b <- lapply(l, cov)
list2env(setNames(b, paste0("VarCov", seq_along(b))), envir = parent.frame())
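Here is a small worked example of the row-switching step. It assumes the intent is n - 1 matrices, one for each number of switched rows. One pitfall to watch for: in R, 1:n-1 parses as (1:n) - 1, so it starts at 0. With nrows = 0, mat2[1:0, ] selects row 1 of mat2 (index 0 is ignored). The first matrix would then be all of R with one extra ER row appended, instead of R with its first row replaced. R_small and ER_small are hypothetical stand-ins for R and ER.

```r
# Row-switching step from the question, with drop = FALSE so that
# single-row subsets stay matrices
switch_matrices <- function(mat1, mat2, nrows) {
  rbind(mat1[(1 + nrows):nrow(mat1), , drop = FALSE],
        mat2[1:nrows, , drop = FALSE])
}

R_small  <- matrix(1:8, ncol = 2)      # hypothetical 4 x 2 stand-in for R
ER_small <- matrix(101:108, ncol = 2)  # hypothetical 4 x 2 stand-in for ER
n <- nrow(R_small)

1:n - 1     # parses as (1:n) - 1, i.e. 0 1 2 3
1:(n - 1)   # 1 2 3

l <- lapply(1:(n - 1), function(k) switch_matrices(R_small, ER_small, k))
l[[2]]      # rows 3-4 of R_small followed by rows 1-2 of ER_small
```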
I am now trying to find the asset allocation using quadprog, for example:
D_mat <- 2*VarCov1
d_vec <- rep(0,78)
A_mat <- cbind(rep(1,78),diag(78))
b_vec <- c(1,d_vec)
library(quadprog)
output <- solve.QP(Dmat = D_mat, dvec = d_vec,Amat = A_mat, bvec = b_vec,meq =1)
# The asset allocation
(round(output$solution, 4))
For some reason, running solve.QP with any of the variance-covariance matrices gives this error:
Error in solve.QP(Dmat = D_mat, dvec = d_vec, Amat = A_mat, bvec = b_vec, :
matrix D in quadratic function is not positive definite!
What am I doing wrong, and why does this not work?
The input matrix isn't positive definite, which is a necessary condition for the optimization algorithm.
Why your matrix isn't positive definite depends on your specific data (the real data, not the randomly generated example), and that is both a statistical and a subject-matter question.
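For the randomly generated example, at least, the cause is structural. Each return matrix has 47 rows (observations) and 78 columns (assets). A sample covariance matrix computed from n observations has rank at most n - 1. So the 78 x 78 matrix built from 47 observations is singular and cannot be positive definite. The same will be true of the real data if it also has fewer observations than assets. A base-R check:

```r
set.seed(123)
R <- matrix(runif(47 * 78), ncol = 78)  # 47 observations x 78 assets, as in the question
S <- cov(R)                             # 78 x 78 sample covariance matrix

qr(S)$rank                              # at most nrow(R) - 1 = 46, well below 78
ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
min(ev)                                 # zero up to floating-point error: not positive definite
```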
However, from a programming perspective there is a workaround. We can use nearPD from the Matrix package to find the nearest positive definite matrix as a viable alternative:
# Data generated by code in the question using set.seed(123)
library(quadprog)
library(Matrix)
pd_D_mat <- nearPD(D_mat)
output <- solve.QP(Dmat = as.matrix(pd_D_mat$mat),
dvec = d_vec,
Amat = A_mat,
bvec = b_vec,
meq = 1)
# The asset allocation
(round(output$solution, 4))
[1] 0.0052 0.0000 0.0173 0.0739 0.0000 0.0248 0.0082 0.0180 0.0000 0.0217 0.0177 0.0000 0.0000 0.0053 0.0000 0.0173 0.0216 0.0000
[19] 0.0000 0.0049 0.0042 0.0546 0.0049 0.0088 0.0250 0.0272 0.0325 0.0298 0.0000 0.0160 0.0000 0.0064 0.0276 0.0145 0.0178 0.0000
[37] 0.0258 0.0000 0.0413 0.0000 0.0071 0.0000 0.0268 0.0095 0.0326 0.0112 0.0381 0.0172 0.0000 0.0179 0.0000 0.0292 0.0125 0.0000
[55] 0.0000 0.0000 0.0232 0.0058 0.0000 0.0000 0.0000 0.0143 0.0274 0.0160 0.0000 0.0287 0.0000 0.0000 0.0203 0.0226 0.0311 0.0345
[73] 0.0012 0.0004 0.0000 0.0000 0.0000 0.0000
I am using the effect() function from the effects R package on a Cox model. The function has a default method, so it should presumably work for any model.
When I try to use this function I get this error:
> eff_cf <- effect("TP53:MDM2", model)
Error in mod.matrix %*% mod$coefficients[!is.na(coef(mod))] :
non-conformable arguments
My model looks like this:
> model
Call:
coxph(formula = Surv(times, patient.vital_status) ~ TP53 + MDM2 +
TP53:MDM2, data = clinForPlot)
coef exp(coef) se(coef) z p
TP53Other -0.163 0.850 0.217 -0.752 4.5e-01
TP53WILD -1.086 0.337 0.277 -3.928 8.6e-05
MDM2(1183.7,1674.7] -0.669 0.512 0.235 -2.851 4.4e-03
MDM2(1674.7,2248.5] -0.744 0.475 0.305 -2.444 1.5e-02
MDM2(2248.5,50339] -0.867 0.420 0.375 -2.308 2.1e-02
TP53Other:MDM2(1183.7,1674.7] 0.394 1.483 0.412 0.958 3.4e-01
TP53WILD:MDM2(1183.7,1674.7] 0.133 1.142 0.413 0.323 7.5e-01
TP53Other:MDM2(1674.7,2248.5] -0.192 0.825 0.517 -0.372 7.1e-01
TP53WILD:MDM2(1674.7,2248.5] 0.546 1.726 0.433 1.260 2.1e-01
TP53Other:MDM2(2248.5,50339] -0.140 0.869 0.650 -0.215 8.3e-01
TP53WILD:MDM2(2248.5,50339] 0.786 2.195 0.484 1.623 1.0e-01
Likelihood ratio test=72.8 on 11 df, p=3.54e-11 n= 1321, number of events= 258
The model and the data frame used to fit it can be reproduced with this code:
library(archivist)
model <- loadFromGitub("68eeefba87be70364eb3801cec58eb3d",
user = "MarcinKosinski",
repo = "Museum",
value = TRUE)
clinForPlot <- loadFromGitub("cfa5145e6b98964d5f8b760bf749e426",
user = "MarcinKosinski",
repo = "Museum",
value = TRUE)
Any idea how to fix this and what is wrong?