flexsurvreg with a treatment covariate: separating the variance-covariance matrix in R

I am using flexsurvreg() to fit and extrapolate parametric models on survival data, with the treatment group as a covariate to give a proportional hazards model. I need a separate variance-covariance matrix for each of the two treatment groups, but I cannot find out how to separate the groups after fitting the parametric model.
weib <- flexsurvreg(Surv(os_mnts, censoring) ~ treat, data = date_ex, dist = "weibull")
An example of the data is below. There are also treat == "control" rows, even though none appear in this excerpt.
#sx_date last_fup_date censoring sex treat os_mnts
# <date> <date> <dbl> <dbl> <chr> <dbl>
# 1 2010-06-03 2013-08-10 0 1 treatment 38.2
# 2 2013-06-10 2014-09-09 1 1 treatment 15.0
# 3 2014-11-05 2015-07-03 0 0 treatment 7.89
# 4 2011-03-07 2014-08-10 1 1 treatment 41.1
# 5 2010-03-06 2013-12-11 0 1 treatment 45.2
# 6 2011-09-08 2015-01-01 0 1 treatment 39.8
# 7 2008-10-09 2016-06-02 1 0 treatment 91.8
# 8 2010-02-11 2015-01-02 1 1 treatment 58.7
# 9 2009-08-06 2014-07-06 0 1 treatment 59.0
#10 2011-07-03 2016-04-03 0 0 treatment 57.0
When I call vcov(weib) to get the variance covariance matrix, I get the following.
# shape scale treattreatment
#shape 0.0218074155 -0.004631324 -0.0001595603
#scale -0.0046313242 0.007912648 -0.0068951896
#treattreatment -0.0001595603 -0.006895190 0.0138593195
However, I need two variance-covariance matrices (one for each treatment group) containing shape and scale only.
I have tried searching for a way to split the matrix itself and to subset the weib object, but I cannot find how to do either of these things. Does anyone know how I can get separate matrices out of this?
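One pragmatic workaround is to refit each arm on its own, so that each fit carries its own shape/scale estimates and its own 2x2 variance-covariance matrix. This is a sketch, not the original analysis: it assumes the flexsurv package is installed, uses simulated stand-in data (since date_ex is not shown in full), and note that separate fits let the shape differ between arms, unlike the single covariate model.

```r
library(flexsurv)  # assumed installed; attaches survival for Surv()
set.seed(42)
# toy stand-in for date_ex: two arms with Weibull-distributed survival times
date_ex <- data.frame(
  treat     = rep(c("control", "treatment"), each = 50),
  os_mnts   = c(rweibull(50, shape = 1.2, scale = 30),
                rweibull(50, shape = 1.2, scale = 45)),
  censoring = 1)  # 1 = event for every row, for simplicity

# fit each treatment arm separately, then take vcov() of each fit
fits <- lapply(split(date_ex, date_ex$treat), function(d)
  flexsurvreg(Surv(os_mnts, censoring) ~ 1, data = d, dist = "weibull"))
vcovs <- lapply(fits, vcov)  # two 2x2 (shape, scale) matrices
```

The matrices are on the log scale of the parameters, as in flexsurv's joint vcov output.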

Related

How to retrieve the data frame used in a GEE model fit?

I have a longitudinal data frame with multiple rows per id.
> library(geepack)  # the dietox data ships with geepack
> data("dietox")
> head(dietox, 5)
Pig Evit Cu Litter Start Weight Feed Time
1 4601 Evit000 Cu000 1 26.5 26.50000 NA 1
2 4601 Evit000 Cu000 1 26.5 27.59999 5.200005 2
3 4601 Evit000 Cu000 1 26.5 36.50000 17.600000 3
4 4601 Evit000 Cu000 1 26.5 40.29999 28.500000 4
5 4601 Evit000 Cu000 1 26.5 49.09998 45.200001 5
I am trying to fit a GEE model to predict Weight for each row of the data frame.
library(gee)
library(dplyr)
> model1 <- gee(Weight ~ Start + Feed, id=Pig, data=dietox, corstr="exchangeable")
> model1
GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
gee S-function, version 4.13 modified 98/01/27 (1998)
Model:
Link: Identity
Variance to Mean Relation: Gaussian
Correlation Structure: Exchangeable
Call:
gee(formula = Weight ~ Start + Feed, id = Pig, data = dietox,
corstr = "exchangeable")
Number of observations : 789
Maximum cluster size : 11
Coefficients:
(Intercept) Start Feed
5.1539561 0.9384232 0.4294209
I now want to be able to add a new column to the data frame- prediction, which contains the predicted weight value for each row of data. The idea is that I will then be able to compare the original Weight variable with the prediction variable at different points in the Time variable.
When I try to do this using the mutate and predict functions, I get an error saying that the number of observations used in the model fit (789) differs from the number of observations in the original data frame (861).
> new_df <- dietox %>%
+ mutate(prediction = predict(model1))
Error: Column `prediction` must be length 861 (the number of rows) or one, not 789
My questions are:
1. How do I extract the data frame for the 789 observations that
were used in the model fit?
2. Why is the number of observations
used in the model fit different to the total number of observations
in the original data frame?
The 789 observations used in model fitting were those without NA values; the Feed column contains 72 NAs:
sum(is.na(dietox$Feed))
#[1] 72
and 789 + 72 gives the complete 861 observations. To get all the predicted values you could do
dietox$Prediction <- NA
dietox$Prediction[!is.na(dietox$Feed)] <- predict(model1)
head(dietox)
# Weight Feed Time Pig Evit Cu Litter Prediction
#1 26.50000 NA 1 4601 1 1 1 NA
#2 27.59999 5.200005 2 4601 1 1 1 31.43603
#3 36.50000 17.600000 3 4601 1 1 1 36.76708
#4 40.29999 28.500000 4 4601 1 1 1 41.45324
#5 49.09998 45.200001 5 4601 1 1 1 48.63296
#6 55.39999 56.900002 6 4601 1 1 1 53.66306
The response values that were actually used in the model are also available in model1$y.
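To answer the first question directly: the rows used in the fit can be recovered with base R's complete.cases() on the model variables. A minimal sketch on toy data shaped like dietox (the column values here are illustrative):

```r
# toy data frame mimicking dietox, with an NA in Feed as in the first row
df <- data.frame(Weight = c(26.5, 27.6, 36.5, 40.3),
                 Start  = 26.5,
                 Feed   = c(NA, 5.2, 17.6, 28.5))

# keep only rows with no NA in any variable the model uses
used <- df[complete.cases(df[, c("Weight", "Start", "Feed")]), ]
nrow(used)  # rows actually available to the model fit (3 of 4 here)
```

Applied to dietox, `dietox[complete.cases(dietox[, c("Weight", "Start", "Feed")]), ]` should return the 789 rows the gee fit actually saw.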

Creating new variables after imputation with the MICE package

I have longitudinal panel data of 1000 individuals measured at two time points. Using the MICE package I have imputed values for those variables with missing data. The imputation itself works fine, generating the required 17 imputed data frames. One of the imputed variables is fitness. I would like to create a new variable of fitness scaled, scale(fitness). My understanding is that I should impute first, and then create the new variable with the imputed data. How do I access each of the 17 imputed datasets and generate a scaled fitness variable in each?
My original data frame looks like (some variables missing):
id age school sex andersen ldl_c_trad pre_post
<dbl> <dbl> <fct> <fct> <int> <dbl> <fct>
1 2 10.7 1 1 951 2.31 1
2 2 11.3 1 1 877 2.20 2
3 3 11.3 1 1 736 2.88 1
4 3 11.9 1 1 668 3.36 2
5 4 10.1 1 0 872 3.31 1
6 4 10.7 1 0 905 2.95 2
7 5 10.5 1 1 925 2.02 1
8 5 11.0 1 1 860 1.92 2
9 8 10.7 1 1 767 3.41 1
10 8 11.2 1 1 709 3.32 2
My imputation code is:
imputed <- mice(imp_vars, method = meth, predictorMatrix = predM, m = 17)
imp_vars are the variables selected for imputation.
I have pre-specified both the method and predictor matrix.
Also, my assumption is that the scaling should be performed separately for each time point, as fitness is likely to have improved over time. Is it possible to perform the scaling filtered by pre_post for each imputed dataset?
Many thanks.
To access each of the imputations, where x is a value from 1-17:
data <- complete(imputed, x)
or if you want access to the fitness variable
complete(imputed, x)$fitness
If you want to filter observations according to the value of another variable in the data frame, you could use
data[which(data$pre_post==1), "fitness"]
This returns the fitness observations for which pre_post == 1; from there it is simply a matter of scaling these observations for each level of pre_post, assigning them to a new variable fitness_scaled, and repeating for each imputation 1-17.
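The per-group scaling step can be done in one line with base R's ave(), which applies a function within each level of a grouping variable. A sketch on a toy stand-in for one completed dataset (column names follow the question; the numbers are simulated):

```r
set.seed(1)
# toy stand-in for one completed dataset from mice::complete(imputed, x)
data <- data.frame(pre_post = rep(c(1, 2), each = 5),
                   fitness  = c(rnorm(5, 50, 5), rnorm(5, 55, 5)))

# scale fitness separately within each level of pre_post
data$fitness_scaled <- ave(data$fitness, data$pre_post,
                           FUN = function(v) as.numeric(scale(v)))
```

Wrapping this in a loop over `complete(imputed, x)` for x in 1:17 would give a scaled variable in every imputed dataset.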

covariance structure for multilevel modelling

I have a multilevel repeated measures dataset of around 300 patients each with up to 10 repeated measures predicting troponin rise. There are other variables in the dataset, but I haven't included them here.
I am trying to use nlme to create a random slope, random intercept model where effects vary between patients, and effect of time is different in different patients. When I try to introduce a first-order covariance structure to allow for the correlation of measurements due to time I get the following error message.
Error in `coef<-.corARMA`(`*tmp*`, value = value[parMap[, i]]) : Coefficient matrix not invertible
I have included my code and a sample of the dataset, and I would be very grateful for any words of wisdom.
#baseline model includes only the intercept. Random slopes - intercept varies across patients
randomintercept <- lme(troponin ~ 1,
data = df, random = ~1|record_id, method = "ML",
na.action = na.exclude,
control = list(opt="optim"))
#random intercept and time as fixed effect
timeri <- update(randomintercept,.~. + day)
#random slopes and intercept: effect of time is different in different people
timers <- update(timeri, random = ~ day|record_id)
#model covariance structure. corAR1() first order autoregressive covariance structure, timepoints equally spaced
armodel <- update(timers, correlation = corAR1(0, form = ~day|record_id))
Error in `coef<-.corARMA`(`*tmp*`, value = value[parMap[, i]]) : Coefficient matrix not invertible
Data:
record_id day troponin
1 1 32
2 0 NA
2 1 NA
2 2 NA
2 3 8
2 4 6
2 5 7
2 6 7
2 7 7
2 8 NA
2 9 9
3 0 14
3 1 1167
3 2 1935
4 0 19
4 1 16
4 2 29
5 0 NA
5 1 17
5 2 47
5 3 684
6 0 46
6 1 45440
6 2 47085
7 0 48
7 1 87
7 2 44
7 3 20
7 4 15
7 5 11
7 6 10
7 7 11
7 8 197
8 0 28
8 1 31
9 0 NA
9 1 204
10 0 NA
10 1 19
You can fit this if you change your optimizer to "nlminb" (or at least it works with the reduced data set you posted).
armodel <- update(timers,
correlation = corAR1(0, form = ~day|record_id),
control=list(opt="nlminb"))
However, if you look at the fitted model, you'll see there are problems: the estimated AR1 parameter is -1 and the random intercept and slope terms are correlated with r = 0.998.
I think the problem is with the nature of the data. Most of the values are in the range 10-50, but there are excursions of one or two orders of magnitude (e.g. individual 6, up to about 45000). It might be hard to fit a model to data this spiky. I would strongly suggest log-transforming your data; the standard diagnostic plot, plot(randomintercept), shows the problem clearly (plot not shown),
whereas fitting on the log scale
rlog <- update(randomintercept,log10(troponin) ~ .)
plot(rlog)
is somewhat more reasonable, although there is still some evidence of heteroscedasticity.
The AR+random-slopes model fits OK:
ar.rlog <- update(rlog,
random = ~day|record_id,
correlation = corAR1(0, form = ~day|record_id))
## Linear mixed-effects model fit by maximum likelihood
## ...
## Random effects:
## Formula: ~day | record_id
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 0.1772409 (Intr)
## day 0.6045765 0.992
## Residual 0.4771523
##
## Correlation Structure: ARMA(1,0)
## Formula: ~day | record_id
## Parameter estimate(s):
## Phi1
## 0.09181557
## ...
A quick glance at intervals(ar.rlog) shows that the confidence intervals on the autoregressive parameter are (-0.52,0.65), so it may not be worth keeping ...
With the random slopes in the model the heteroscedasticity no longer seems problematic ...
plot(rlog,sqrt(abs(resid(.)))~fitted(.),type=c("p","smooth"))
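The optimizer fix from the top of this answer can be seen in isolation on simulated data. This is a self-contained sketch (the data are simulated, not the poster's): longitudinal series with genuine AR(1) within-patient errors, fitted with lme plus corAR1 under the "nlminb" optimizer.

```r
library(nlme)  # ships with R
set.seed(1)
# simulate 30 patients, 8 days each, with AR(1) within-patient noise
d <- do.call(rbind, lapply(1:30, function(id) {
  e <- as.numeric(arima.sim(list(ar = 0.4), n = 8))  # AR(1) errors
  data.frame(record_id = id,
             day       = 0:7,
             troponin  = 10 + 0.5 * (0:7) + rnorm(1) + e)
}))

# random intercept per patient + first-order autoregressive residuals
m <- lme(troponin ~ day,
         random      = ~ 1 | record_id,
         correlation = corAR1(form = ~ day | record_id),
         data = d, method = "ML",
         control = list(opt = "nlminb"))
```

With well-behaved (non-spiky) data like this, the fit converges and the estimated Phi should land near the simulated value of 0.4.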

How to calculate survival probabilities in R?

I am trying to fit a parametric survival model. I think I managed to do so. However, I could not succeed in calculating the survival probabilities:
library(survival)
zaman <- c(65,156,100,134,16,108,121,4,39,143,56,26,22,1,1,5,65,
56,65,17,7,16,22,3,4,2,3,8,4,3,30,4,43)
test <- c(rep(1,17),rep(0,16))
WBC <- c(2.3,0.75,4.3,2.6,6,10.5,10,17,5.4,7,9.4,32,35,100,
100,52,100,4.4,3,4,1.5,9,5.3,10,19,27,28,31,26,21,79,100,100)
status <- c(rep(1,33))
data <- data.frame(zaman,test,WBC)
surv3 <- Surv(zaman[test==1], status[test==1])
fit3 <- survreg( surv3 ~ log(WBC[test==1]),dist="w")
On the other hand, no problem at all while calculating the survival probabilities using the Kaplan-Meier Estimation:
fit2 <- survfit(Surv(zaman[test==0], status[test==0]) ~ 1)
summary(fit2)$surv
Any idea why?
You can get predictions from a survreg object with predict (note that by default these are predicted values on the response scale, i.e. event times, not probabilities):
predict(fit3)
If you're interested in combining this with the original data, and also in the residual and standard errors of the predictions, you can use the augment function in my broom package:
library(broom)
augment(fit3)
A full analysis might look something like:
library(survival)
library(broom)
data <- data.frame(zaman, test, WBC, status)
subdata <- data[data$test == 1, ]
fit3 <- survreg( Surv(zaman, status) ~ log(WBC), subdata, dist="w")
augment(fit3, subdata)
With the output:
zaman test WBC status .fitted .se.fit .resid
1 65 1 2.30 1 115.46728 43.913188 -50.467281
2 156 1 0.75 1 197.05852 108.389586 -41.058516
3 100 1 4.30 1 85.67236 26.043277 14.327641
4 134 1 2.60 1 108.90836 39.624106 25.091636
5 16 1 6.00 1 73.08498 20.029707 -57.084979
6 108 1 10.50 1 55.96298 13.989099 52.037022
7 121 1 10.00 1 57.28065 14.350609 63.719348
8 4 1 17.00 1 44.47189 11.607368 -40.471888
9 39 1 5.40 1 76.85181 21.708514 -37.851810
10 143 1 7.00 1 67.90395 17.911170 75.096054
11 56 1 9.40 1 58.99643 14.848751 -2.996434
12 26 1 32.00 1 32.88935 10.333303 -6.889346
13 22 1 35.00 1 31.51314 10.219871 -9.513136
14 1 1 100.00 1 19.09922 8.963022 -18.099216
15 1 1 100.00 1 19.09922 8.963022 -18.099216
16 5 1 52.00 1 26.09034 9.763728 -21.090343
17 65 1 100.00 1 19.09922 8.963022 45.900784
In this case, the .fitted column holds the model's fitted values: predicted survival times on the response scale.
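To get actual survival probabilities S(t0 | x) out of a Weibull survreg fit, survival::psurvreg can be evaluated at the fitted linear predictor, which is the location of the log-time distribution. A self-contained sketch on simulated data (the variable names and t0 are illustrative):

```r
library(survival)  # ships with R
set.seed(2)
# toy data: 40 uncensored times with one covariate
d <- data.frame(time = rexp(40, rate = 0.05), status = 1, x = rnorm(40))
fit <- survreg(Surv(time, status) ~ x, data = d, dist = "weibull")

# S(t0 | x_i) = 1 - P(T <= t0); psurvreg takes the linear predictor
# as its location and the survreg scale as its scale
lp <- predict(fit, type = "lp")
t0 <- 20
surv_prob <- 1 - psurvreg(t0, mean = lp, scale = fit$scale,
                          distribution = "weibull")
```

Each element of surv_prob is the model-based probability that the corresponding subject survives past t0.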

Multivariate Linear Mixed Model in lme4

I wonder how to fit a multivariate linear mixed model with lme4. I fitted univariate linear mixed models with the following code:
library(lme4)
lmer.m1 <- lmer(Y1~A*B+(1|Block)+(1|Block:A), data=Data)
summary(lmer.m1)
anova(lmer.m1)
lmer.m2 <- lmer(Y2~A*B+(1|Block)+(1|Block:A), data=Data)
summary(lmer.m2)
anova(lmer.m2)
I'd like to know how to fit a multivariate linear mixed model with lme4. The data are below:
Block A B Y1 Y2
1 1 1 135.8 121.6
1 1 2 149.4 142.5
1 1 3 155.4 145.0
1 2 1 105.9 106.6
1 2 2 112.9 119.2
1 2 3 121.6 126.7
2 1 1 121.9 133.5
2 1 2 136.5 146.1
2 1 3 145.8 154.0
2 2 1 102.1 116.0
2 2 2 112.0 121.3
2 2 3 114.6 137.3
3 1 1 133.4 132.4
3 1 2 139.1 141.8
3 1 3 157.3 156.1
3 2 1 101.2 89.0
3 2 2 109.8 104.6
3 2 3 111.0 107.7
4 1 1 124.9 133.4
4 1 2 140.3 147.7
4 1 3 147.1 157.7
4 2 1 110.5 99.1
4 2 2 117.7 100.9
4 2 3 129.5 116.2
Thanks in advance for your time and cooperation.
This can sometimes be faked satisfactorily in nlme/lme4 by simply reformatting your data like
require(reshape)
Data = melt(data, id.vars=1:3, variable_name='Y')
Data$Y = factor(gsub('Y(.+)', '\\1', Data$Y))
> Data
Block A B Y value
1 1 1 1 1 135.8
2 1 1 2 1 149.4
3 1 1 3 1 155.4
4 1 2 1 1 105.9
5 1 2 2 1 112.9
6 1 2 3 1 121.6
...
and then including the new variable Y in your linear mixed model.
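That wide-to-long reformatting can also be done without the reshape package, using base R's stats::reshape. A self-contained sketch on a few rows of the posted data:

```r
# toy version of the wide Block/A/B/Y1/Y2 layout from the question
wide <- data.frame(Block = c(1, 1, 2, 2), A = c(1, 2, 1, 2), B = 1,
                   Y1 = c(135.8, 105.9, 121.9, 102.1),
                   Y2 = c(121.6, 106.6, 133.5, 116.0))

# stack Y1/Y2 into a single 'value' column with an indicator factor Y
long <- reshape(wide, direction = "long",
                varying = c("Y1", "Y2"), v.names = "value",
                timevar = "Y", times = c("1", "2"))
long$Y <- factor(long$Y)
```

Each wide row becomes two long rows, and Y can then enter the lmer formula as a fixed-effect factor.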
However, for true Multivariate Generalized Linear Mixed Models (MGLMM), you will probably need the sabreR package or similar. There is also an entire book to accompany the package, Multivariate Generalized Linear Mixed Models Using R. If you have a proxy to a subscribing institution, you might even be able to download it for free from http://www.crcnetbase.com/isbn/9781439813270. I would refer you there for any further advice, as this is a meaty topic and I am very much a novice.
lmer and its elder sibling lme are inherently "one parameter left of ~". Have a look at the car package; it offers no off-the-shelf repeated-measures support, but you will find a few comments on the subject by searching the R list:
John Fox on car package
John's answer above should be largely right. You add a dummy variable (i.e. the factor variable Y) to the model. Here you have three subscripts: i = 1,...,N for observations, j = 1,...,4 for blocks, and h = 1,2 for the dependent variable. But you also need to force the level-1 error term to 0 (or to near zero), which I'm not sure lme4 does. Ben Bolker might provide more information. This is described more in Goldstein (2011), Chap. 6, and Chap. 7 for latent multivariate models.
i.e.
Y_{hij} = \beta_{01} z_{1ij} + \beta_{02} z_{2ij} + \beta X + u_{1j} z_{1ij} + u_{2j} z_{2ij}
So:
require(reshape2)
Data <- melt(data, id.vars = 1:3, variable.name = 'Y')
Data$Y <- factor(gsub('Y(.+)', '\\1', Data$Y))
m1 <- lmer(value ~ Y + A*B + (1|Block) + (1|Block:A), data = Data)
# not sure how to set the level 1 variance to 0, #BenBolker
# also unclear to me if you're requesting Y*A*B instead of Y + A*B