How to calculate survival probabilities in R?

I am trying to fit a parametric survival model. I think I managed to do so. However, I could not succeed in calculating the survival probabilities:
library(survival)
zaman <- c(65,156,100,134,16,108,121,4,39,143,56,26,22,1,1,5,65,
56,65,17,7,16,22,3,4,2,3,8,4,3,30,4,43)
test <- c(rep(1,17),rep(0,16))
WBC <- c(2.3,0.75,4.3,2.6,6,10.5,10,17,5.4,7,9.4,32,35,100,
100,52,100,4.4,3,4,1.5,9,5.3,10,19,27,28,31,26,21,79,100,100)
status <- c(rep(1,33))
data <- data.frame(zaman,test,WBC)
surv3 <- Surv(zaman[test==1], status[test==1])
fit3 <- survreg( surv3 ~ log(WBC[test==1]),dist="w")
On the other hand, I have no problem calculating the survival probabilities with the Kaplan-Meier estimator:
fit2 <- survfit(Surv(zaman[test==0], status[test==0]) ~ 1)
summary(fit2)$surv
Any idea why?

You can get predictions from a survreg object with predict. By default these are on the scale of the response (predicted survival times), not probabilities:
predict(fit3)
If you're interested in combining this with the original data, and also in the residuals and standard errors of the predictions, you can use the augment function in my broom package:
library(broom)
augment(fit3)
A full analysis might look something like:
library(survival)
library(broom)
data <- data.frame(zaman, test, WBC, status)
subdata <- data[data$test == 1, ]
fit3 <- survreg( Surv(zaman, status) ~ log(WBC), subdata, dist="w")
augment(fit3, subdata)
With the output:
zaman test WBC status .fitted .se.fit .resid
1 65 1 2.30 1 115.46728 43.913188 -50.467281
2 156 1 0.75 1 197.05852 108.389586 -41.058516
3 100 1 4.30 1 85.67236 26.043277 14.327641
4 134 1 2.60 1 108.90836 39.624106 25.091636
5 16 1 6.00 1 73.08498 20.029707 -57.084979
6 108 1 10.50 1 55.96298 13.989099 52.037022
7 121 1 10.00 1 57.28065 14.350609 63.719348
8 4 1 17.00 1 44.47189 11.607368 -40.471888
9 39 1 5.40 1 76.85181 21.708514 -37.851810
10 143 1 7.00 1 67.90395 17.911170 75.096054
11 56 1 9.40 1 58.99643 14.848751 -2.996434
12 26 1 32.00 1 32.88935 10.333303 -6.889346
13 22 1 35.00 1 31.51314 10.219871 -9.513136
14 1 1 100.00 1 19.09922 8.963022 -18.099216
15 1 1 100.00 1 19.09922 8.963022 -18.099216
16 5 1 52.00 1 26.09034 9.763728 -21.090343
17 65 1 100.00 1 19.09922 8.963022 45.900784
In this case, the .fitted column contains the predicted values on the response scale (survival times, in the same units as zaman), not survival probabilities.
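To turn the fitted Weibull model into actual survival probabilities, one option is to evaluate the Weibull survival function by hand. The following is a minimal sketch, not part of the original answer: it assumes the usual survreg parameterization (the linear predictor is the log of the Weibull scale, and the reported scale is the reciprocal of the Weibull shape), and the time point t0 is a hypothetical value chosen only for illustration.

```r
library(survival)

# Data for the test == 1 group, copied from the question
zaman  <- c(65, 156, 100, 134, 16, 108, 121, 4, 39, 143, 56, 26, 22, 1, 1, 5, 65)
WBC    <- c(2.3, 0.75, 4.3, 2.6, 6, 10.5, 10, 17, 5.4, 7, 9.4, 32, 35, 100, 100, 52, 100)
status <- rep(1, 17)
subdata <- data.frame(zaman, WBC, status)

fit3 <- survreg(Surv(zaman, status) ~ log(WBC), data = subdata, dist = "weibull")

# Linear predictor: the log of the Weibull scale for each subject
lp <- predict(fit3, type = "lp")

# survreg reports 'scale' as the reciprocal of the Weibull shape
shape <- 1 / fit3$scale

# Hypothetical time point at which to evaluate S(t | WBC)
t0 <- 52
surv_prob <- pweibull(t0, shape = shape, scale = exp(lp), lower.tail = FALSE)

# One survival probability per subject, next to the covariate
data.frame(WBC = subdata$WBC, surv_prob = surv_prob)
```

Passing a vector of times instead of a single t0 (one subject at a time) traces out a full predicted survival curve for that covariate value.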

Related

Using a loop to run Linear Models over multiple populations exporting AIC values and coefficients

I have data that includes multiple populations (each of which contains multiple individuals). I am trying to evaluate resource selection by using linear models to evaluate use. I want to do this at the population level, so I am hoping to use a loop that runs 6 models on each population and then provides an AIC table for each population as well as the coefficients of the best-fitting model.
Here is a sample of my data:
Population WLH_ID Used Var1 Var2 Var3 Var4 Var5
Tweed 1 1 15 2 10 21 22.1
Tweed 2 1 7 3 9 20 20
Lake 3 1 11 2 7 19 20
Lake 4 1 13 2 8 21 20
Hwy 5 1 14 1 6 12 23
Hwy 6 1 10 2 7 17 20
Jasper 7 1 12 4 7 19 22
Tweed 1 0 15 2 10 21 22.1
Tweed 2 0 7 3 9 20 20
Lake 3 0 11 2 7 19 20
Lake 4 0 11 2 8 21 20
Hwy 5 0 12 1 5 23 23
Hwy 6 0 14 7 7 17 20
Jasper 7 0 17 2 4 19 21.5
So far I have tried the following:
library(lme4)        # glmer()
library(AICcmodavg)  # aictab()
# Column names are case-sensitive: the data uses Var1..Var5
Model1 <- as.formula(Used ~ Var1+Var2+Var3+Var4+Var5+(1|WLH_ID))
Model2 <- as.formula(Used ~ Var1+Var2+Var3+Var4+(1|WLH_ID))
Model3 <- as.formula(Used ~ Var1+Var2+Var3+(1|WLH_ID))
Model4 <- as.formula(Used ~ Var1+Var2+(1|WLH_ID))
Model5 <- as.formula(Used ~ Var1+(1|WLH_ID))
Model6 <- as.formula(Used ~ 1+(1|WLH_ID)) # glmer() needs at least one random-effects term
### It will use the Model formulas entered above, so make sure you have run them
SM.split <- split(mydata,mydata$Population) #Split the data into a list with one entry for each population
Cand.models <- list() # candidate models for the current population
Modnames <- list()    # model names, one entry per population
AICTable <- list()    # AIC tables, one entry per population
for (i in seq_along(SM.split)){
poprun<-SM.split[[i]]
Cand.models[[1]]<-glmer(Model1,family = binomial,data=poprun)
Cand.models[[2]]<-glmer(Model2,family = binomial,data=poprun)
Cand.models[[3]]<-glmer(Model3,family = binomial,data=poprun)
Cand.models[[4]]<-glmer(Model4,family = binomial,data=poprun)
Cand.models[[5]]<-glmer(Model5,family = binomial,data=poprun)
Cand.models[[6]]<-glmer(Model6,family = binomial,data=poprun)
Modnames[[i]]<-paste("mod",seq_along(Cand.models),sep=" ")
AICTable[[i]]<-aictab(cand.set = Cand.models, modnames = Modnames[[i]], sort = TRUE)
}
I receive the following error:
Error in pwrssUpdate(pp, resp, tol = tolPwrss, GQmat = GQmat, compDev = compDev, :
pwrssUpdate did not converge in (maxit) iterations
What I really want is to split the data, have R loop through each population separately, and report for each population which model has the lowest AIC and what the coefficients of that model are. In reality I have 30+ populations and am comparing 6 models, so I am hoping not to have to write out each one.
NOTE: I do not have a good way to make up data to attach, and cannot attach my own.
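The split-and-loop pattern described here can be sketched with base R alone. This is only an illustration under stated assumptions: the data are simulated (hypothetical values, not the asker's), plain glm() is swapped in for glmer() so the example has no extra dependencies and no random effects, and the three candidate formulas are placeholders for the six in the question.

```r
set.seed(1)

# Hypothetical simulated data standing in for the real data set
mydata <- data.frame(
  Population = rep(c("Tweed", "Lake", "Hwy"), each = 40),
  Var1 = rnorm(120),
  Var2 = rnorm(120)
)
mydata$Used <- rbinom(120, 1, plogis(0.8 * mydata$Var1))

# Named list of candidate formulas (placeholders for Model1..Model6)
formulas <- list(
  mod1 = Used ~ Var1 + Var2,
  mod2 = Used ~ Var1,
  mod3 = Used ~ 1
)

# Fit every candidate to one population and keep the lowest-AIC model
fit_population <- function(d) {
  fits <- lapply(formulas, function(f) glm(f, family = binomial, data = d))
  aic  <- sapply(fits, AIC)
  best <- names(which.min(aic))
  list(aic = sort(aic), best = best, coef = coef(fits[[best]]))
}

# One result per population: AIC values, best model name, its coefficients
results <- lapply(split(mydata, mydata$Population), fit_population)
results$Lake$best
results$Lake$coef
```

With glmer(), the same structure should apply; wrapping each fit in tryCatch() would let the loop continue past a population where a model fails to converge.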

Flexsurvreg with treatment covariate. Separating variance covariance matrix

I am using flexsurvreg() to fit and extrapolate parametric models on survival data. I use the treatment group as a covariate to make a proportional hazard model. I need a variance covariance matrix separately for the two treatment groups but I am unable to find out how I separate the groups after fitting the parametric model.
weib <- flexsurvreg(Surv(os_mnts, censoring) ~ treat, data = date_ex, dist = "weibull")
An example of the data is below. I do have treat == control as well even though it does not show here.
#sx_date last_fup_date censoring sex treat os_mnts
# <date> <date> <dbl> <dbl> <chr> <dbl>
# 1 2010-06-03 2013-08-10 0 1 treatment 38.2
# 2 2013-06-10 2014-09-09 1 1 treatment 15.0
# 3 2014-11-05 2015-07-03 0 0 treatment 7.89
# 4 2011-03-07 2014-08-10 1 1 treatment 41.1
# 5 2010-03-06 2013-12-11 0 1 treatment 45.2
# 6 2011-09-08 2015-01-01 0 1 treatment 39.8
# 7 2008-10-09 2016-06-02 1 0 treatment 91.8
# 8 2010-02-11 2015-01-02 1 1 treatment 58.7
# 9 2009-08-06 2014-07-06 0 1 treatment 59.0
#10 2011-07-03 2016-04-03 0 0 treatment 57.0
When I call vcov(weib) to get the variance covariance matrix, I get the following.
# shape scale treattreatment
#shape 0.0218074155 -0.004631324 -0.0001595603
#scale -0.0046313242 0.007912648 -0.0068951896
#treattreatment -0.0001595603 -0.006895190 0.0138593195
However, I need two variance covariance matrices (1 for each treatment group) with shape and scale only.
I have tried searching for a way to separate the matrix itself and to subset the weib object. However, I cannot find how to do either of these things. Does anyone know how I can get separate matrices out of this?
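One possible approach is to derive the two matrices from the joint one by a linear transformation. The sketch below rests on assumptions not confirmed in the question: that vcov() is reported on the log scale for shape and scale, and that the treatment coefficient is added to log(scale). The matrix V is typed in from the printed output rather than taken from a fitted model.

```r
# Joint covariance matrix, copied from the vcov(weib) output above
nm <- c("shape", "scale", "treattreatment")
V <- matrix(
  c( 0.0218074155, -0.0046313242, -0.0001595603,
    -0.0046313242,  0.0079126480, -0.0068951896,
    -0.0001595603, -0.0068951896,  0.0138593195),
  nrow = 3, byrow = TRUE, dimnames = list(nm, nm)
)

# Control group (reference level): shape and scale as estimated
V_control <- V[c("shape", "scale"), c("shape", "scale")]

# Treatment group: log(scale_treat) = log(scale) + treattreatment,
# so its covariance follows from the transformation A %*% V %*% t(A)
A <- rbind(shape = c(1, 0, 0),
           scale = c(0, 1, 1))
V_treat <- A %*% V %*% t(A)

V_control
V_treat
```

If those assumptions hold for the fitted model, replacing the hand-typed V with vcov(weib) gives the same two matrices directly from the fit.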

How to include the interaction between a covariate and time for a non-proportional hazards model?

I often find that the proportional hazards assumption for the Cox regressions doesn’t hold.
Take the following data as an example.
> head(data2)
no np_p age_dx1 race1 mr_dx er_1 pr_1 sct_1 surv_mo km_stts1
1 20 1 2 4 1 2 2 4 52 1
2 33 1 3 1 2 1 2 1 11 1
3 67 1 2 4 4 1 1 3 20 1
4 90 1 3 1 3 3 3 2 11 1
5 143 1 2 4 3 1 1 2 123 0
6 180 1 3 1 3 1 1 2 9 1
First, I fitted a Cox regression model.
> fit2 <- coxph(Surv(surv_mo, km_stts1) ~ np_p + age_dx1 + race1 + mr_dx + er_1 + pr_1 + sct_1, data = data2)
Second, I assessed the proportional hazards assumption.
> check_PH2 <- cox.zph(fit2, transform = "km")
> check_PH2
rho chisq p
np_p 0.00946 0.0748 7.84e-01
age_dx1 -0.00889 0.0640 8.00e-01
race1 -0.03148 0.7827 3.76e-01
mr_dx -0.03120 0.7607 3.83e-01
er_1 -0.14741 18.5972 1.61e-05
pr_1 0.05906 2.9330 8.68e-02
sct_1 0.17651 23.8030 1.07e-06
GLOBAL NA 53.2844 3.26e-09
So, this means that the hazards for er_1 and sct_1 are non-proportional over time (right?).
I think I can include the interaction between each of these two covariates and time separately in the model, but I don't know how to do this in R.
Thank you.
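For reference, coxph() in the survival package has a tt argument for time-transform terms, which is one way to express a covariate-by-time interaction. The sketch below is illustrative only: it uses the veteran data set shipped with survival rather than the data above, and the log(t + 20) transform is an assumed choice, not a recommendation for er_1 or sct_1.

```r
library(survival)

# karno enters both as a main effect and as a time-varying effect:
# the tt() term contributes karno * log(t + 20) at each event time t
fit_tt <- coxph(
  Surv(time, status) ~ trt + prior + karno + tt(karno),
  data = veteran,
  tt = function(x, t, ...) x * log(t + 20)
)

# The tt(karno) coefficient measures how the karno effect changes over time
coef(fit_tt)
```

Applied to the model above, the analogous terms would be tt(er_1) and tt(sct_1), with a transform chosen to match the pattern seen in the cox.zph plots.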

Survdiff() output fields in R

My question is about the output structure of the survdiff() function from the 'survival' library in R. Namely, I have a data frame containing survival data
> dat
ID Time Treatment Gender Censored
1 E002 2.7597536 IND F 0
2 E003 4.2710472 Control M 0
3 E005 1.4784394 IND F 0
4 E006 6.8993840 Control F 1
5 E008 9.5934292 IND M 0
6 E009 2.9897331 Control F 0
7 E014 1.3470226 IND F 1
8 E016 2.1683778 Control F 1
9 E018 2.7597536 IND F 1
10 E022 1.3798768 IND F 0
11 E023 0.7227926 IND M 1
12 E024 5.5195072 IND F 0
13 E025 2.4640657 Control F 0
14 E028 7.4579055 Control M 1
15 E029 5.5195072 Control F 1
16 E030 2.7926078 IND M 0
17 E031 4.9938398 Control F 0
18 E032 2.7268994 IND M 0
19 E033 0.1642710 IND M 1
20 E034 4.1396304 Control F 0
and a model
> diff = survdiff(Surv(Time, Censored) ~ Treatment+Gender, data = dat)
> diff
Call:
survdiff(formula = Surv(Time, Censored) ~ Treatment + Gender,
data = dat)
N Observed Expected (O-E)^2/E (O-E)^2/V
Treatment=Control, Gender=M 2 1 1.65 0.255876 0.360905
Treatment=Control, Gender=F 7 3 2.72 0.027970 0.046119
Treatment=IND, Gender=M 5 2 2.03 0.000365 0.000519
Treatment=IND, Gender=F 6 2 1.60 0.100494 0.139041
Chisq= 0.5 on 3 degrees of freedom, p= 0.924
I'm wondering which field of the output object contains the values from the rightmost column, (O-E)^2/V? I'd like to use them further, but I can't obtain them from diff$obs, diff$exp, diff$var, or from combinations of them.
Your help's gonna be much appreciated.
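For (O-E)^2/V try something like
(diff$obs - diff$exp)^2 / diag(diff$var)
while for (O-E)^2/E try something like
(diff$obs - diff$exp)^2 / diff$exp
Without a strata() term, as in the model above, diff$obs and diff$exp are plain vectors. If the formula does contain strata(), they are matrices with one column per stratum, so wrap them in rowSums() first, e.g. rowSums(diff$obs - diff$exp)^2 / diag(diff$var).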
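A small self-contained check of these expressions, using the lung data set that ships with survival instead of the data frame above (an assumption made so the example runs as is). With only two groups and no strata, each (O-E)^2/V entry should equal the reported chi-square statistic.

```r
library(survival)

# Unstratified log-rank test with two groups
sd_fit <- survdiff(Surv(time, status) ~ sex, data = lung)

# obs and exp are plain vectors here because there is no strata() term
o_minus_e  <- sd_fit$obs - sd_fit$exp
oe2_over_e <- o_minus_e^2 / sd_fit$exp         # column (O-E)^2/E
oe2_over_v <- o_minus_e^2 / diag(sd_fit$var)   # column (O-E)^2/V

# Compare against the printed table and the overall statistic
sd_fit
cbind(oe2_over_e, oe2_over_v)
sd_fit$chisq
```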

Multivariate Linear Mixed Model in lme4

I wonder how to fit a multivariate linear mixed model with lme4. I fitted univariate linear mixed models with the following code:
library(lme4)
lmer.m1 <- lmer(Y1~A*B+(1|Block)+(1|Block:A), data=Data)
summary(lmer.m1)
anova(lmer.m1)
lmer.m2 <- lmer(Y2~A*B+(1|Block)+(1|Block:A), data=Data)
summary(lmer.m2)
anova(lmer.m2)
I'd like to know how to fit a multivariate linear mixed model with lme4. The data is below:
Block A B Y1 Y2
1 1 1 135.8 121.6
1 1 2 149.4 142.5
1 1 3 155.4 145.0
1 2 1 105.9 106.6
1 2 2 112.9 119.2
1 2 3 121.6 126.7
2 1 1 121.9 133.5
2 1 2 136.5 146.1
2 1 3 145.8 154.0
2 2 1 102.1 116.0
2 2 2 112.0 121.3
2 2 3 114.6 137.3
3 1 1 133.4 132.4
3 1 2 139.1 141.8
3 1 3 157.3 156.1
3 2 1 101.2 89.0
3 2 2 109.8 104.6
3 2 3 111.0 107.7
4 1 1 124.9 133.4
4 1 2 140.3 147.7
4 1 3 147.1 157.7
4 2 1 110.5 99.1
4 2 2 117.7 100.9
4 2 3 129.5 116.2
Thanks in advance for your time and cooperation.
This can sometimes be faked satisfactorily in nlme/lme4 by simply reformatting your data like
require(reshape)
Data = melt(Data, id.vars=1:3, variable_name='Y')
Data$Y = factor(gsub('Y(.+)', '\\1', Data$Y))
> Data
Block A B Y value
1 1 1 1 1 135.8
2 1 1 2 1 149.4
3 1 1 3 1 155.4
4 1 2 1 1 105.9
5 1 2 2 1 112.9
6 1 2 3 1 121.6
...
and then including the new variable Y in your linear mixed model.
However, for true Multivariate Generalized Linear Mixed Models (MGLMM), you will probably need the sabreR package or similar. There is also an entire book to accompany the package, Multivariate Generalized Linear Mixed Models Using R. If you have a proxy to a subscribing institution, you might even be able to download it for free from http://www.crcnetbase.com/isbn/9781439813270. I would refer you there for any further advice, as this is a meaty topic and I am very much a novice.
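The reshaping step described above can also be done without the reshape package. A minimal base-R sketch, using the first two blocks of the data from the question; the name long for the stacked data frame is an assumption introduced here.

```r
# First two blocks of the data from the question, in wide format
Data <- data.frame(
  Block = rep(1:2, each = 6),
  A     = rep(rep(1:2, each = 3), times = 2),
  B     = rep(1:3, times = 4),
  Y1 = c(135.8, 149.4, 155.4, 105.9, 112.9, 121.6,
         121.9, 136.5, 145.8, 102.1, 112.0, 114.6),
  Y2 = c(121.6, 142.5, 145.0, 106.6, 119.2, 126.7,
         133.5, 146.1, 154.0, 116.0, 121.3, 137.3)
)

# Stack Y1 and Y2 into one 'value' column, with factor Y marking the response
long <- data.frame(
  Data[rep(seq_len(nrow(Data)), times = 2), c("Block", "A", "B")],
  Y     = factor(rep(c("1", "2"), each = nrow(Data))),
  value = c(Data$Y1, Data$Y2)
)
rownames(long) <- NULL

head(long)
```

The long data frame has one row per observation and response, so Y can enter the mixed model as an ordinary factor.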
lmer and its elder sibling lme are inherently "one parameter left of ~". Have a look at the car package; it offers no off-the-shelf repeated-measures support, but you will find a few comments on the subject by searching the R list:
John Fox on car package
@John's answer above should be largely right. You add a dummy variable (i.e., the factor variable Y) to the model. Here you have 3 subscripts: i = 1,...,N for observations, j = 1,...,4 for blocks, and h = 1,2 for the dependent variable. But you also need to force the level 1 error term to 0 (or to near zero), which I'm not sure lme4 does. Ben Bolker might provide more information. This is described more in Goldstein (2011), Chap. 6, and in Chap. 7 for latent multivariate models.
That is,
Y_{hij} = \beta_{01} z_{1ij} + \beta_{02} z_{2ij} + \beta X + u_{1j} z_{1ij} + u_{2j} z_{2ij}
So:
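require(reshape2)
require(lme4)
# reshape2's melt() takes variable.name (variable_name belongs to the older reshape package)
Data = melt(Data, id.vars=1:3, variable.name='Y')
Data$Y = factor(gsub('Y(.+)', '\\1', Data$Y))
# (1|Block:A) matches the grouping term used in the question
m1 <- lmer(value ~ Y + A*B + (1|Block) + (1|Block:A), data= Data)
# not sure how to set the level 1 variance to 0, @BenBolker
# also unclear to me if you're requesting Y*A*B instead of Y + A*B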
