I used the MuMIn package to do model averaging based on an information criterion, following this question.
options(na.action = "na.fail")
Create a global model with all variables and two way interaction:
global.model <- lmer(yld.res ~ rain + brk +
                     onset + wid + (1 | state), data = data1, REML = FALSE)
Standardise the global model, since the variables are on different scales:
stdz.model <- standardize(global.model,standardize.y = TRUE)
Create all possible combinations of the model
model.set <- dredge(stdz.model)
Get the best model based on deltaAICc<2 criteria
top.models <- get.models(model.set, subset= delta<2)
Average the models to calculate the effect size (standardised slopes of the input variables)
s<-model.avg(top.models)
summary(s);confint(s)
The effect size of the variables are as follows:
Variable slope estimate
brk -0.28
rain 0.13
wid 0.10
onset 0.09
As you can see, I standardised my model in step 3 so that I can compare these slope estimates, i.e. I can say that the slope estimate of brk is greater (in the negative direction) than that of rain. However, since these slope estimates were standardised, I wanted to know if there is any way I can get the unstandardised slopes.
Please let me know if my question is not clear.
Thanks
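As a minimal sketch of the back-transformation being asked about: if a continuous predictor and the response were both rescaled to mean 0 and SD 0.5 (which is how arm::standardize with standardize.y = TRUE is assumed to treat continuous variables here), a standardised slope can be converted back by multiplying by sd(y)/sd(x). The example uses simulated data and a plain lm() rather than the lmer/dredge workflow above; rescale2sd is a hypothetical helper, not a function from any package.

```r
# Simulated data on very different scales (illustrative only)
set.seed(42)
n <- 200
x <- rnorm(n, mean = 50, sd = 10)
y <- 3 + 0.4 * x + rnorm(n, sd = 5)

# Hypothetical helper: centre and divide by 2 SDs (assumed standardisation)
rescale2sd <- function(v) (v - mean(v)) / (2 * sd(v))

fit_raw <- lm(y ~ x)                            # slope on the original scale
fit_std <- lm(rescale2sd(y) ~ rescale2sd(x))    # standardised slope

b_std  <- unname(coef(fit_std)[2])
b_back <- b_std * sd(y) / sd(x)                 # the factors of 2 cancel
b_raw  <- unname(coef(fit_raw)[2])

c(standardised = b_std, back_transformed = b_back, raw = b_raw)
```

If only the predictors had been rescaled (standardize.y = FALSE), the factor would presumably be 1/(2 * sd(x)) instead; binary predictors are handled differently and are not covered by this sketch.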
We are currently working with plant phenology.
We built a linear mixed model for each species present in the study area.
We set Days From Snowmelt (the number of days from snowmelt to the visit day over the summer) as the response variable and Mean phenology as the explanatory variable. Mean phenology is the mean phenological state of each plot (there are 3 in each locality), calculated from the 12 subplots into which each plot is divided; states run from 1 to 6, and the higher the number, the more advanced the cycle. Year and plot nested within locality are set as random factors.
Once the model is built and revised, we want to predict the days from snowmelt each species needs to reach the phenological phases of interest, which have means of 2, 3, 4, and 5 (corresponding to vegetative, flowering, fruit development and dispersion, respectively).
I have tried the function predict(), but I get no heterogeneity between phases for each species; the progression seems to be linear (as shown in the image file).
Could this be just because it is a linear model, so it will only give linear responses? Are there any other ways to get predictions from these kinds of models and show their CI?
How can I get predictions with CI from lmerTest models?
I think you probably mean prediction intervals. You can use the predictInterval function in the merTools package. For example:
library(lmerTest); library(merTools)
fm1 <- lmer(Reaction ~ Days + (Days|Subject), data = sleepstudy)
head(predictInterval(fm1, level = 0.95, seed = 123, n.sims = 100))
Could this be just because it is a linear model, so it will only give linear responses?
Yes! If you fit a linear model, then the predictions will be linear. Of course, you can model nonlinearity with a linear model in several ways, including transformation(s), nonlinear terms (the model is still linear in the parameters) and splines.
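To illustrate the point about nonlinear terms, here is a small sketch on simulated data (not the phenology data) using lm(); the variable names x and y are placeholders. A straight-line fit gives equally spaced predictions at x = 2, 3, 4, 5, while a natural spline from the splines package lets the steps differ. The same formula terms should carry over to lmer(), but that is not shown here.

```r
library(splines)

# Simulated curved relationship (illustrative only)
set.seed(1)
d <- data.frame(x = seq(1, 6, length.out = 60))
d$y <- 10 + 25 * log(d$x) + rnorm(60, sd = 2)

fit_lin <- lm(y ~ x, data = d)               # straight line
fit_spl <- lm(y ~ ns(x, df = 3), data = d)   # natural cubic spline, still linear in the parameters

new_d <- data.frame(x = 2:5)
p_lin <- predict(fit_lin, newdata = new_d, interval = "confidence")
p_spl <- predict(fit_spl, newdata = new_d, interval = "confidence")

step_lin <- diff(p_lin[, "fit"])   # identical steps between phases
step_spl <- diff(p_spl[, "fit"])   # steps are free to differ
cbind(new_d, p_spl)
```

The interval = "confidence" argument is specific to predict() for lm objects; for mixed models the predictInterval() approach shown above is one alternative.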
I have longitudinal data from two surveys and I want to do a pre-post analysis. Normally, I would use survey::svyglm() or svyVGAM::svy_vglm (for multinomial family) to include sampling weights, but these functions don't account for the random effects. On the other hand, lme4::lmer accounts for the repeated measures, but not the sampling weights.
For continuous outcomes, I understand that I can do
w_data_wide <- svydesign(ids = ~1, data = data_wide, weights = data_wide$weight)
svyglm((post-pre) ~ group, w_data_wide)
and get the same estimates that I would get if I could use lmer(outcome ~ group*time + (1|id), data_long) with weights [please correct me if I'm wrong].
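The equivalence assumed above can be checked for the point estimate on simulated, unweighted data. This sketch uses plain lm() in both formats, so it ignores both the sampling weights and the random intercept; it only shows that the group coefficient in the change-score model matches the group-by-time interaction in the long-format model. All object names are made up for the illustration.

```r
set.seed(1)
n <- 50
group <- factor(sample(c("Man", "Woman"), n, replace = TRUE))
pre   <- rnorm(n)
post  <- pre + 0.5 * (group == "Woman") + rnorm(n)

wide <- data.frame(id = 1:n, group, pre, post)
long <- data.frame(
  id      = rep(1:n, 2),
  group   = rep(group, 2),
  time    = factor(rep(c("Pre", "Post"), each = n), levels = c("Pre", "Post")),
  outcome = c(pre, post)
)

# Change-score model: difference in mean change between groups
b_diff <- unname(coef(lm(I(post - pre) ~ group, data = wide))["groupWoman"])
# Long-format model: group-by-time interaction
b_int  <- unname(coef(lm(outcome ~ group * time, data = long))["groupWoman:timePost"])

c(change_score = b_diff, interaction = b_int)
```

The standard errors of the two fits are not expected to agree, since the long-format lm() treats the two measurements per person as independent.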
However, for categorical variables, I don't know how to do the analyses. WeMix::mix() has a parameter weights, but I'm not sure if it treats them as sampling weights. Still, this function can't support multinomial family.
So, to summarise: can you enlighten me on how to do a pre-post analysis of categorical outcomes with 2 or more levels? Any tips about packages/functions in R and how to use/write them would be appreciated.
I give below some data sets with binomial and multinomial outcomes:
library(data.table)
set.seed(1)
data_long <- data.table(
id=rep(1:5,2),
time=c(rep("Pre",5),rep("Post",5)),
outcome1=sample(c("Yes","No"),10,replace=T),
outcome2=sample(c("Low","Medium","High"),10,replace=T),
outcome3=rnorm(10),
group=rep(sample(c("Man","Woman"),5,replace=T),2),
weight=rep(c(1,0.5,1.5,0.75,1.25),2)
)
data_wide <- dcast(data_long, id~time, value.var = c('outcome1','outcome2','outcome3','group','weight'))[, `:=` (weight_Post = NULL, group_Post = NULL)]
EDIT
As I said below in the comments, I've been using lmer and glmer with the variables used to calculate the weights as predictors. It turns out that glmer runs into a lot of problems (convergence, high eigenvalues...), so I took another look at #ThomasLumley's answer in this post and others (https://stat.ethz.ch/pipermail/r-help/2012-June/315529.html | https://stats.stackexchange.com/questions/89204/fitting-multilevel-models-to-complex-survey-data-in-r).
So, my question now is whether I can use the participants' id as clusters in svydesign
library(survey)
w_data_long_cluster <- svydesign(ids = ~id, data = data_long, weights = data_long$weight)
summary(svyglm(factor(outcome1) ~ group*time, w_data_long_cluster, family="quasibinomial"))
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.875e+01 1.000e+00 18.746 0.0339 *
groupWoman -1.903e+01 1.536e+00 -12.394 0.0513 .
timePre 5.443e-09 5.443e-09 1.000 0.5000
groupWoman:timePre 2.877e-01 1.143e+00 0.252 0.8431
and still interpret groupWoman:timePre as differences in the average rate of change/improvement in the outcome over time between sex groups, as if I was using mixed models with participants as random effects.
Thank you once again!
A linear model with svyglm does not give the same parameter estimates as lme4::lmer. It does estimate the same parameters as lme4::lmer if the model is correctly specified, though.
Generalised linear models with svyglm or svy_vglm don't estimate the same parameters as lme4::glmer, as you note. However, they do estimate perfectly good regression parameters, and if you aren't specifically interested in the variance components or in estimating the realised random effects (BLUPs), I would recommend just using svyglm or svy_vglm.
Another option if you have non-survey software for random effects versions of the models is to use that. If you scale the weights to sum to the sample size and if all the clustering in the design is modelled by random effects in the model, you will get at least a reasonable approximation to valid inference. That's what I've seen recommended for Bayesian survey modelling, for example.
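The rescaling step mentioned above is a one-liner. In this sketch the raw weights are made-up numbers standing in for survey weights; after rescaling they sum to the number of observations, and could then be passed to a mixed-model fitting function that accepts a weights argument.

```r
# Hypothetical raw sampling weights for 5 respondents
w_raw <- c(120, 60, 180, 90, 150)

# Rescale so the weights sum to the sample size
w_scaled <- w_raw * length(w_raw) / sum(w_raw)

w_scaled
sum(w_scaled)
```

Relative weights are unchanged by this step; only their total is fixed to the sample size.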
# Create the simplest test data set
test1 <- list(time=c(4,3,1,1,2,2,3),
status=c(1,1,1,0,1,1,0),
x=c(0,2,1,1,1,0,0),
sex=c(0,0,0,0,1,1,1))
# Fit a stratified model
m=coxph(Surv(time, status) ~ x + sex, test1)
y=predict(m,type="survival",by="sex")
Basically what I am doing is making fake data called test1, then I am fitting a simple coxph model and saving it as 'm'. Then what I aim to do is get the predicted probabilities and confidence bands for the survival probability separate for sexes. My hopeful dataset 'y' will include: age, survival probability, lower confidence band, upper confidence band, and sex which equals to '0' or '1'.
This can be accomplished in two ways. The first is a slight modification to your code, using the predict() function to get predictions at specific times for specific combinations of covariates. The second is by using the survfit() function, which estimates the entire survival curve and is easy to plot. The confidence intervals don't exactly agree, as we'll see, but they should match fairly closely as long as the probabilities aren't too close to 1 or 0.
Below is code to make the predictions the way your code tries to. It uses the built-in cancer data. The important difference is to create a newdata which has the covariate values you're interested in. Because of the non-linear nature of survival probabilities, it is generally a bad idea to try to make a prediction for the "average person". Because we want a survival probability, we must also specify the time at which to evaluate that probability. I've taken time = 365, age = 60, and both sex = 1 and sex = 2. So this code predicts the 1-year survival probability for a 60 year old male and a 60 year old female. Note that we must also include status in the newdata, even though it doesn't affect the result.
library(survival)
mod <- coxph(Surv(time,status) ~ age + sex, data = cancer)
pred_dat <- data.frame(time = c(365,365), status = c(2,2),
age = c(60,60), sex = c(1,2))
preds <- predict(mod, newdata = pred_dat,
type = "survival", se.fit = TRUE)
pred_dat$prob <- preds$fit
pred_dat$lcl <- preds$fit - 1.96*preds$se.fit
pred_dat$ucl <- preds$fit + 1.96*preds$se.fit
pred_dat
#> time status age sex prob lcl ucl
#> 1 365 2 60 1 0.3552262 0.2703211 0.4401313
#> 2 365 2 60 2 0.5382048 0.4389833 0.6374264
We see that for a 60 year old male the 1 year survival probability is estimated as 35.5%, while for a 60 year old female it is 53.8%.
Below we estimate the entire survival curve using survfit(). I've saved time by reusing the pred_dat from above, and because the plot gets messy I've only plotted the male curve, which is the first row. I've also added some flair, but you only need the first 2 lines.
fit <- survfit(mod, newdata = pred_dat[1,])
plot(fit, conf.int = TRUE)
title("Estimated survival probability for age 60 male")
abline(v = 365, col = "blue")
abline(h = pred_dat[1,]$prob, col = "red")
abline(h = pred_dat[1,]$lcl, col = "orange")
abline(h = pred_dat[1,]$ucl, col = "orange")
Created on 2022-06-09 by the reprex package (v2.0.1)
I've overlaid lines corresponding to the predicted probabilities from part 1. The red line is the estimated survival probability at day 365 and the orange lines are the 95% confidence interval. The predicted survival probability matches, but if you squint closely you'll see the confidence interval doesn't match exactly. That's generally not a problem, but if it is a problem you should trust the ones from survfit() instead of the ones calculated from predict().
You can also dig into the values of fit to extract fitted probabilities and confidence bands, but the programming is a little more complicated because the desired time doesn't usually match exactly.
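One way to avoid matching times by hand is to let summary() do the lookup through its times argument. The sketch below refits the same model and pulls the day-365 estimates and confidence limits for both sexes into a data frame; new_dat and out are names chosen for this illustration.

```r
library(survival)

mod <- coxph(Surv(time, status) ~ age + sex, data = cancer)
new_dat <- data.frame(age = c(60, 60), sex = c(1, 2))

# One survival curve per row of new_dat
fit <- survfit(mod, newdata = new_dat)

# summary() reports the curve at the requested time
s <- summary(fit, times = 365)

out <- data.frame(new_dat,
                  time = s$time,
                  prob = as.vector(s$surv),
                  lcl  = as.vector(s$lower),
                  ucl  = as.vector(s$upper))
out
```

With several rows in newdata, s$surv, s$lower and s$upper come back as matrices with one column per row of new_dat, which is why they are flattened with as.vector().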
Section 5 of this document by Dimitris Rizopoulos discusses how to estimate survival probabilities from a Cox model. Dimitris Rizopoulos states:
the Cox model does not estimate the baseline hazard, and therefore we cannot directly obtain survival probabilities from it. To achieve that we need to combine it with a non-parametric estimator of the baseline hazard function. The most popular method to do that is to use the Breslow estimator. For a fitted Cox model from package survival these probabilities are calculated by function survfit(). As an illustration, we would like to derive survival probabilities from the following Cox model for the AIDS dataset:
He then goes on to provide R code that shows how to estimate Survival Probabilities at specific follow-up times.
I found this useful, it may help you too.
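The quoted idea can be sketched with the survival package's built-in cancer data (not the AIDS data from the quoted document): take the estimated baseline cumulative hazard, scale it by exp(linear predictor), and exponentiate. Setting ties = "breslow" is a choice made here so that the baseline estimate is the Breslow one; the comparison against survfit() is only a sanity check.

```r
library(survival)

mod <- coxph(Surv(time, status) ~ age + sex, data = cancer, ties = "breslow")

# Baseline cumulative hazard at covariates = 0
H0 <- basehaz(mod, centered = FALSE)
H0_365 <- max(H0$hazard[H0$time <= 365])   # last value at or before day 365

# Linear predictor for a 60 year old with sex = 1
lp <- sum(coef(mod) * c(age = 60, sex = 1))

# S(t | x) = exp(-H0(t) * exp(x'beta))
S_manual <- exp(-H0_365 * exp(lp))

# The same quantity from survfit()
S_survfit <- as.numeric(
  summary(survfit(mod, newdata = data.frame(age = 60, sex = 1)), times = 365)$surv
)

c(manual = S_manual, survfit = S_survfit)
```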
A normal Cox regression is as follows:
coxph(formula = Surv(time, status) ~ v1 + v2 + v3, data = x)
I've calculated the Inverse Probability of Treatment Weighting (IPTW) weights from the propensity scores.
Propensity scores can be calculated as follows:
ps <- glm(treat ~ v1 + v2 + v3, family = "binomial", data = x)$fitted.values
Weights used for IPTW are calculated as follows:
weight <- ifelse(x$treat == 1, 1/ps, 1/(1 - ps))
Every subject in the dataset can be weighted with the aforementioned method (every subject gets a specific weight, calculated as above), but I see no place to put the weights in the 'normal' Cox regression formula.
Is there a Cox regression formula in which we can assign the calculated weights to each subject, and what R package or code is used for these calculations?
Propensity score weighting method
(inverse probability weighting method)
R was used for the following statistical analysis.
Load the following R packages:
library(ipw)
library(survival)
Estimate propensity score for each ID in your data frame (base_model), based on variables.
The propensity score is the probability of assignment of treatment in the presence of given covariates (v).
As shown in your data,
PS estimation
ps_model <- glm(treatment~v1+v2+v3...., family = binomial, data = base_model)
summary(ps_model)
# view propensity score values
pscore <- ps_model$fitted.values
base_model$propensityScore <- predict(ps_model, type = "response")
Calculate weights
#estimate weight for each patient
base_model$weight.ATE <- ifelse((base_model$treatment=="1"),(1/base_model$propensityScore), (1/(1-base_model$propensityScore)))
base_weight <- ipwpoint(exposure = treatment, family = "binomial", link="logit", numerator = ~1, denominator =~v1+v2+v3....vn, data = base_model, trunc=0.05) #truncation of 5% for few extreme weights if needed
Survival analysis: Cox regression
#time to event analysis with weights
base_model$weights.trunc <- base_weight$weights.trunc # truncated weights returned by ipwpoint()
HR5 <- coxph(Surv(time, event)~as.factor(treat_group), weights = weights.trunc, data = base_model)
summary(HR5)
The weights argument was added based on the weights estimated earlier.
The cobalt or tableone packages in R would help you check balance in characteristics before and after propensity score weighting.
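As a rough idea of what such a balance check computes, the sketch below calculates a standardised mean difference for one covariate before and after weighting, on simulated data. The smd() helper is hypothetical and written for this illustration; it is not the implementation used by cobalt or tableone.

```r
set.seed(123)
n <- 2000
v1 <- rnorm(n)
treatment <- rbinom(n, 1, plogis(0.8 * v1))   # treatment depends on v1

# Propensity scores and ATE weights, as in the steps above
ps <- fitted(glm(treatment ~ v1, family = binomial))
w  <- ifelse(treatment == 1, 1 / ps, 1 / (1 - ps))

# Hypothetical helper: (weighted) difference in means over the pooled unweighted SD
smd <- function(x, trt, w = rep(1, length(x))) {
  m1 <- weighted.mean(x[trt == 1], w[trt == 1])
  m0 <- weighted.mean(x[trt == 0], w[trt == 0])
  s  <- sqrt((var(x[trt == 1]) + var(x[trt == 0])) / 2)
  (m1 - m0) / s
}

smd_before <- smd(v1, treatment)      # imbalance in the raw sample
smd_after  <- smd(v1, treatment, w)   # imbalance after weighting
c(before = smd_before, after = smd_after)
```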
Good luck!
You can do it like this using the DIVAT dataset from the IPWsurvival package:
##Generate ID
DIVAT$ID<- 1:nrow(DIVAT)
We can calculate the IPTW weights for the average treatment effect (ATE) instead of the average treatment effect among the treated (ATT):
DIVAT$p.score <- glm(retransplant ~ age + hla, data = DIVAT,
family = "binomial")$fitted.values
DIVAT$ate.weights <- with(DIVAT, retransplant * 1/p.score + (1-retransplant)* 1/(1-p.score))
Then we can perform a Cox regression
####COX without weight
coxph(Surv(times, failures)~ retransplant, data=DIVAT)->fit
summary(fit)
Adding weight is quite easy
###COX with weight naive model
coxph(Surv(times, failures)~ retransplant, data=DIVAT, weights = ate.weights)->fit
summary(fit)
###COX with weight and robust estimation
coxph(Surv(times, failures)~ retransplant + cluster(ID), data=DIVAT, weights = ate.weights)->fit
summary(fit)
However, in this way the estimation of standard error is biased (please see Austin, Peter C. "Variance estimation when using inverse probability of treatment weighting (IPTW) with survival analysis." Statistics in medicine 35.30 (2016): 5642-5655.).
Austin suggested relying on a bootstrap estimator. However, I'm stuck too, since I'm not able to find a way to perform this kind of analysis. If you find an answer, please let me know.
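One possible shape for such a bootstrap, offered as a sketch rather than as the exact procedure in the cited paper: resample subjects with replacement, re-estimate the propensity scores and weights inside each resample, refit the weighted Cox model, and take the spread of the resulting log hazard ratios. The data are simulated so the example runs without the DIVAT data; iptw_loghr is a hypothetical helper and 200 replicates is an arbitrary choice.

```r
library(survival)

# Simulated data: treatment depends on age, event times depend on both
set.seed(2024)
n <- 300
d <- data.frame(age = rnorm(n, 50, 10))
d$treat <- rbinom(n, 1, plogis(-2.5 + 0.05 * d$age))
t_event <- rexp(n, rate = 0.1 * exp(0.5 * d$treat + 0.02 * (d$age - 50)))
t_cens  <- rexp(n, rate = 0.03)
d$time  <- pmin(t_event, t_cens)
d$event <- as.integer(t_event <= t_cens)

# Hypothetical helper: full IPTW pipeline on one data set, returns the log hazard ratio
iptw_loghr <- function(dat) {
  ps <- fitted(glm(treat ~ age, family = binomial, data = dat))
  dat$w <- ifelse(dat$treat == 1, 1 / ps, 1 / (1 - ps))
  unname(coef(coxph(Surv(time, event) ~ treat, data = dat, weights = w)))
}

est <- iptw_loghr(d)

# Resample rows and repeat the whole pipeline, including the propensity model
boot_est <- replicate(200, iptw_loghr(d[sample(nrow(d), replace = TRUE), ]))

se_boot <- sd(boot_est)                        # bootstrap standard error
ci_boot <- quantile(boot_est, c(0.025, 0.975)) # percentile interval
c(log_hr = est, se = se_boot, ci_boot)
```

Re-estimating the propensity model inside each replicate is what lets the bootstrap reflect the uncertainty in the weights themselves.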
I am encountering quite an annoying and, to me, incomprehensible problem, and I hope some of you can help me. I am trying to estimate the autoregression (influence of previous measurements of variable X on the current measurement of X) for 4 groups that have positively skewed distributions to various degrees. The theory is that more positively skewed distributions have less variance, and since the relationship between 2 variables depends on the amount of shared variance, positively skewed distributions have a smaller autoregression than more normally distributed variables.
I use simulations to investigate this, and generate data as follows: I simulate data for n people with tp time points. I use a fixed autoregressive parameter, phi (at .3 so we have a stationary process). To generate positively skewed distributions I use a chi-square distributed error. Individuals differ in the degrees of freedom that is used for the chi2 distributed errors. In other words, degrees of freedom is a level 2 variable (and is in itself chi2(1)-distributed). Individuals with a very low df get a very skewed distribution whereas individuals with a higher df get a more normal distribution.
for(i in 1:n) { # Loop over persons.
chi[i, 1] <- rchisq(1, df[i]) # Set initial value.
for(t in 2:(tp + burn)) { # Loop over time points.
chi[i, t] <- phi[i] * chi[i, t - 1] + # Autoregressive effect.
rchisq(1, df[i]) # Chi-square distributed error.
} # End loop over time points.
} # End loop over persons.
Now that I have the outcome variable generated, I put it in long format, create a lagged predictor, and person-mean center the predictor (or group-mean center, or cluster-mean center, all the same). I call this lagged and centered predictor chi.pred. I make the subgroups based on the degrees of freedom of individuals. The 25% with the lowest df go in subgroup 1, the next 25% (26% - 50%) in subgroup 2, etc.
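For concreteness, the steps described above might look like the following in base R. The values of n, tp and burn, the column names other than chi.pred, and the use of ave() and cut() are all assumptions made for this sketch; the original post does not show this part of the code.

```r
set.seed(1)
n <- 40; tp <- 50; burn <- 20          # assumed sizes
phi <- rep(0.3, n)                     # fixed autoregressive parameter
df  <- rchisq(n, df = 1)               # person-level degrees of freedom

# Simulate the AR(1) series with chi-square errors, as in the loop above
chi <- matrix(NA_real_, nrow = n, ncol = tp + burn)
for (i in 1:n) {
  chi[i, 1] <- rchisq(1, df[i])
  for (t in 2:(tp + burn)) {
    chi[i, t] <- phi[i] * chi[i, t - 1] + rchisq(1, df[i])
  }
}
chi <- chi[, -(1:burn)]                # drop the burn-in period

# Long format: one row per person and time point
sim.data <- data.frame(id   = rep(1:n, each = tp),
                       time = rep(1:tp, times = n),
                       chi  = as.vector(t(chi)))

# Lag within person, then person-mean center the lagged values
sim.data$chi.lag  <- ave(sim.data$chi, sim.data$id,
                         FUN = function(v) c(NA, head(v, -1)))
sim.data$chi.pred <- sim.data$chi.lag -
  ave(sim.data$chi.lag, sim.data$id, FUN = function(v) mean(v, na.rm = TRUE))

# Subgroups by quartile of the person-level df
sim.data$sub.df <- cut(df[sim.data$id], breaks = quantile(df, 0:4 / 4),
                       include.lowest = TRUE, labels = 1:4)
```

The first observation of each person has no lagged value, so chi.pred is NA there and those rows drop out of any model fit.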
The problem is this: fitting a multilevel (i.e. mixed or random effects) autoregressive(1) model with family = inverse.gaussian and link = 'identity', using glmer() from the lme4 package, gives me quite a lot of warnings, e.g. "degenerate Hessian", "large eigenvalue ratio", "failed to converge with max|grad", etc. I just don't get why.
The models I fit are
# Random intercept, but fixed slope with subgroups as level 2 predictor of slope.
glmer(chi ~ chi.pred + chi.pred:factor(sub.df.noise) + (1|id), data = sim.data, family = inverse.gaussian(link = 'identity'), control = glmerControl(optimizer = 'bobyqa'))
# Random intercept and slope.
glmer(chi ~ chi.pred + (1 + chi.pred|id), data = sim.data, family = inverse.gaussian(link = 'identity'), control = glmerControl(optimizer = 'bobyqa'))
The reason I use inverse gaussian is because it is said to work better on skewed data.
Does anybody have any clue why I can't fit the models? I have tried increasing sample size and time points, different optimizers, I have double-double-double checked if lagging and centering the data is correct, increased the number of iterations, added some noise to the subgroups (since otherwise they are 1 on 1 related to degree of freedom) etc.