(Need Help Please) Trend and Multilevel Model (intercept and slope and graph) - graph
I am just a newbie to R and Statistics. I desperately needs help. Here are my following questions:
I attempted to solve Q1, but I am not sure whether it is correct. But for Q2, I attempted to try too...but I truly have no clue for that.
**Q1. What is the trend of CES-D score accorss four waves? Whether the trends differ by gender and by age group. Plot a graph indicating this?
**Q2. Conduct a multilevel model to examine the basline factors associated with the trend (intercept and slope) of CES-D score. **
These are the column names Baseline Factors
Q1 Attempts
# subset data
DataCleanedSelected <- H_CHARLS_D_DataCleaned[c("id", "ragender", "r1agey", "raeduc_c", "r1shlt",
"r1dressa", "r1batha", "r1eata", "r1beda",
"r1toilta", "r1urina", "r1cesd10", "r2cesd10",
"r3cesd10", "r4cesd10", "h1rural")]
CESDSelected <- H_CHARLS_D_DataCleaned[c("id","ragender", "r1agey", "r1cesd10", "r2cesd10",
"r3cesd10", "r4cesd10")]
#Check dataset
head(CESDSelected)
names(CESDSelected)
#Attach dataset
attach(CESDSelected)
#Data preparation - Wide format
head(CESDSelected)
# Gather columns x4 into long format
# Convert id and time into factor variables
CHARLS.format <- gather(CESDSelected, time, cesd,
r1cesd10:r4cesd10, factor_key = TRUE)
CHARLS.format
# | Summary statistics
CHARLS.format %>%
group_by(time) %>%
get_summary_stats(cesd, type = "mean_sd")
ind.anova <- aov(cesd ~ time, data=CHARLS.format)
summary(ind.anova)
rm.anova <- aov(cesd ~ time + Error(id/time),
data = CHARLS.format)
summary(rm.anova)
# Visualise the Result#
time <- c("W1","W2","W3","W4")
mean.CESD <- c(mean(r1cesd10, na.rm = TRUE),mean(r2cesd10, na.rm = TRUE),
mean(r3cesd10, na.rm = TRUE),mean(r4cesd10, na.rm = TRUE))
se.CESD <- c(sd(r1cesd10, na.rm = TRUE), sd(r2cesd10, na.rm = TRUE),
sd(r3cesd10, na.rm = TRUE),sd(r4cesd10,na.rm = TRUE))/sqrt(length(id))
CESD.means.frame <- data.frame(time,mean.CESD,se.CESD)
CESD.means.frame
dev.off()
ggplot(CESD.means.frame,aes(x=time,y=mean.CESD)) +
geom_point(size=3)+
geom_errorbar(aes(ymin=mean.CESD-se.CESD,
ymax=mean.CESD+se.CESD),width=.2)
contrasts(CHARLS.format$time) <- matrix(c(-3,-1,1,3,1,-1,
-1,1,-1,3,-3,1), 4, 3)
rm.anova <- aov(cesd ~ time + Error(id/time),
data=CHARLS.format,
contrasts = contrasts(CHARLS.format$time))
summary(rm.anova,split=list(Time=list("Linear" =1,
"Quadratic"=2,"Cubic" =3)))
# Visualisation#
CHARLS.format %>%
group_by(time) %>%
identify_outliers(cesd)
# Normality Assumption#
CHARLS.format%>%
group_by(time) %>%
shapiro_test(cesd)
res.aov <- anova_test(data = CHARLS.format, dv = cesd, wid = ID, within = time)
get_anova_table(res.aov)
##Post-hoc test - pairwise comparisons##
pwc <- CHARLS.format %>%
pairwise_t_test(
cesd ~ time, paired = TRUE,
p.adjust.method = "bonferroni"
)
pwc
##Visualization: box plots with p-values##
pwc <- pwc %>% add_xy_position(x = "time")
bxp +
stat_pvalue_manual(pwc) +
labs(
subtitle = get_test_label(res.aov, detailed = TRUE),
caption = get_pwc_label(pwc)
)
Q2 Attempts
library(lme4)
library(lattice)
library(Matrix)
##Select those who are female
Female <- CHARLS.format[ragender =2, na.rm =False]
str(Female)
summary(Female)
names(CHARLS.format)
##Null hypothesis##
lmer(cesd ~ 1 + (1 | id), data=CHARLS.format)
##Fixed predictors - Gender - Random intercept##
lmer(cesd ~ ragender + (1 | id), data=CHARLS.format)
model.GenderIntercept <- lmer(cesd ~ ragender + (1 | id), data=CHARLS.format)
plot(model.GenderIntercept, main="Residual Plot of Gender Intercept")
##Fixed predictors - Age - Random intercept##
lmer(cesd ~ r1agey + (1 | id), data=CHARLS.format)
model.AgeIntercept <- lmer(cesd ~ r1agey + (1 | id), data=CHARLS.format)
plot(model.AgeIntercept, main="Residual Plot of Age Intercept")
##Random intercept, Random slope - Gender##
lmer(cesd ~ ragender + (ragender | id), data=CHARLS.format)
model.GenderSlope <- lmer(cesd ~ ragender + (ragender | id), data=CHARLS.format)
plot(model.GenderSlope, main="Residual Plot of Gender Slope")
##Random intercept, Random slope - Age##
lmer(cesd ~ r1agey + (r1agey | id), data=CHARLS.format)
model.AgeSlope <- lmer(cesd ~ r1agey + (r1agey | id), data=CHARLS.format)
plot(model.AgeSlope, main="Residual Plot of Age Slope")
##Random intercept, individual and group level predictors##
lmer(cesd ~ ragender + r1agey + (1 + ragender | id), data=CHARLS.format)
Model.IndiGrp <- lmer(cesd ~ ragender + r1agey + (1 + ragender | id), data=CHARLS.format)
plot(Model.IndiGrp, main="Random intercept, individual and group level predictors")
##Random intercept, cross-level interaction##
lmer(cesd ~ ragender * r1agey + (1 + ragender | id), data=CHARLS.format)
Model.RandomCross <- lmer(cesd ~ ragender + r1agey + (1 + ragender | id), data=CHARLS.format)
plot(Model.RandomCross, main="Random intercept, cross-level interaction")
Related
How to create an aesthetically pleasant Hazard Ratio Chart in R
I was looking at a Youtube video I found online. I'm new to survival analysis. The host mentioned that the second graph was created using a mixture of packages Broom & ggplot2. Any ideas? # Current Code: sigMod = coxph(Surv(time, DEATH_EVENT) ~ age+anaemia+creatinine_phosphokinase+ejection_fraction+ serum_creatinine+hypertension, data=HF) ggforest(sigMod, data = HF) EDIT 1 Added code so far: tidy(sigMod) %>% select(term, estimate) %>% ggplot(aes(x=estimate, y=term)) + geom_boxplot() EDIT 2 My Model's data after using Broom: | Term | Estimate | |---------------------|------------------| | Age | 0.0436065795 | | Anaemia1 | 0.3932590155 | | creatinine_phosphokinase | 0.0001964616 | | ejection_fraction | -0.0517850968 | | serum_creatinine | 0.3483455436 | | hypertensionPresent | 0.4667523759 |
Here's a fully reproducible example of how something like your target plot could be achieved, using the pbc dataset from the survival package. Just swap in your own coxph call at the start: library(survival) library(tidyverse) library(broom) coxph(Surv(time, status) ~ sex + ascites + spiders + hepato + edema, data = pbc) %>% tidy() %>% mutate(upper = estimate + 1.96 * std.error, lower = estimate - 1.96 * std.error) %>% mutate(across(all_of(c("estimate", "lower", "upper")), exp)) %>% ggplot(aes(estimate, term, color = estimate > 1)) + geom_vline(xintercept = 1, color = "gray75") + geom_linerange(aes(xmin = lower, xmax = upper), size = 1.5, alpha = 0.5) + geom_point(size = 4) + theme_minimal(base_size = 16) + scale_color_manual(values = c("green4", "red3"), guide = "none") + xlim(c(0, 5)) + labs(title = "Hazard ratio for various clinical findings in PBC", y = NULL, x = "Hazard ratio estimate (95% Confidence Intervals)") + theme(axis.text.y = element_text(hjust = 0, size = 18))
Plotting Growth Curve with Quadratic Growth
I am trying to see how I can plot quadratic growth in R for a growth curve model I've been running. Model: m1 <- lmer(score ~ Time + Group + Time_Sqaure + (1 + School | Subject), data=df, REML = FALSE) tab_model(m1) Both Time (B = 9.58, p<.01) and Time_Square (B = - 0.51, p <.01) along with Group (B = 2.77, p <.01) differences are significant. If I use plot_model, it gives me the best fit line for each group. plot_model(m1, type = "pred", terms = c("Time", "Group")) Is there a way to plot the fitted curves or quadratic growth that shows the rate of growth slowing over time? Thanks!
For sjPlot::plot_model to understand what is going on, you have to enter Time_Square as I(Time^2) not as a separate predictor. Given that df$Time_Square <- df$Time^2, the following two models should give you the same results: m1 <- lmer(score ~ Time + Group + Time_Square + (1 + School | Subject), data=df, REML = FALSE) m2 <- lmer(score ~ Time + Group + I(Time^2) + (1 + School | Subject), data=df, REML = FALSE) However, in the second model, it is clear that the predictor Time is entered twice and so it can be taken into account when plotting it with sjPlot::plot_model(...). To make sure, I tested it with the following simulated data: library(dplyr) grps <- 2 #number of groups subj <- 100 #number of subjects within group obs <- 10 #number of observations/times per subjects b_0 <- 0 #overall intercept b_1 <- 9.58 #linear time effect b_2 <- -0.51 #quadratic time effect sd_b0 <- 0.4 #SD of random intercept per subject sd_b1 <- 3 #SD of random slope per subject sd_b3 <- 1 #SD of group effect (you can simulate more than 2 groups) sd_resid <- 10 #SD of residuals df <- list(Group = factor(rep(letters[1:grps], each=obs*subj)), Subject = factor(rep(1:subj, times=grps, each=obs)), Time = rep(1:obs, times=subj*grps) ) %>% as.data.frame() df$TimeSq <- df$Time^2 subj_b0 <- rnorm(subj, b_0, sd_b0) %>% rep(times=grps, each=obs) subj_b1 <- rnorm(subj, b_1, sd_b1) %>% rep(times=grps, each=obs) grp_m <- rnorm(grps, 0, sd_b3) %>% rep(times=, each=subj*obs) df$Score <- with(df, subj_b0 + Time*subj_b1 + (Time^2)*b_2 + grp_m + rnorm(grps*subj*obs, 0, sd_resid)) fit1 <- lme4::lmer(Score ~ Time + I(Time^2) + Group + (Time | Subject), data=df) sjPlot::plot_model(fit1, type="pred", terms=c("Time"))
How to add weights parameter to Generalized Mixed Model
How do you add the weight of an observation to a Mixed Model? I thought I could add the Freq column to wt argument, but apparently not. using RDatasets MixedModels titanic = RDatasets.dataset("datasets", "Titanic") titanic.surv_flg = titanic.Survived .== "Yes"; This runs: MixedModels.fit(GeneralizedLinearMixedModel, #formula(surv_flg ~ 1 + Age + Sex + (1 | Class)), titanic, Bernoulli(), nAGQ = 2, fast = true) But this doesn't MixedModels.fit(GeneralizedLinearMixedModel, #formula(surv_flg ~ 1 + Age * Sex + (1 | Class)), titanic, wt = Freq, Bernoulli(), nAGQ = 2, fast = true)
I found this out on another forum. the parameter should be wts not wt. So it should be: MixedModels.fit(GeneralizedLinearMixedModel, #formula(surv_flg ~ 1 + Age * Sex + (1 | Class)), titanic, wts = Freq, Bernoulli(), nAGQ = 2)
How to obtain SEs or CIs for predicted probabilities for Heckman models
Question: How to get SEs (or confidence intervals) for predicted probabilities from Heckman model Reproducible example using Mroz87 dataset contained within sampleSelection package to get predicted probability of lfp (labor force participation) for women with kids and without kids: library(sampleSelection) data("Mroz87") Mroz87$kids <- (Mroz87$kids5 + Mroz87$kids618 > 0) ml1 <- selection(lfp ~ age + I(age^2) + kids + huswage + educ, log(wage) ~ educ + exper + I(exper^2) + city, method = "2step", data = Mroz87) ndata <- data.frame( age = median(Mroz87$age), kids = c(0, 1), huswage = median(Mroz87$huswage), educ = median(Mroz87$educ) ) pred_probs <- cbind(ndata, probs = predict(ml1, ndata, part = "selection", type = "response")) # Predicted prob of lfp (labor force participation): # with kids (kids = 1): 59%; # without kids (kids = 0): 73% # Upper and lower bounds?
Regression in R with grouped variables
The dependent variable Value of the data frame DF is predicted using the independent variables Mean, X, Y in the following way: DF <- DF %>% group_by(Country, Sex) %>% do({ mod = lm(Value ~ Mean + X + Y, data = .) A <- predict(mod, .) data.frame(., A) }) Data are grouped by Country and Sex. So, the fitting formula can be expressed as: Value(Country, Sex) = a0(Country, Sex) + a1(Country, Sex) Mean + a2(Country, Sex) X + a3(Country, Sex) Y However, I want to use this formula: Value(Country, Sex) = a0(Country, Sex) + a1(Country, Sex) Mean + a2(Country) X + a3(Country) Y Where a2 and a3 are independent of Sex. How can I do it?
I don't think you can when grouping by Country and Sex. You could just group by Country and add interactions with Sex: DF <- DF %>% group_by(Country) %>% do({ mod = lm(Value ~ Sex + Mean*Sex + X + Y, data = .) A <- predict(mod, .) data.frame(., A) }) or estimate your model in one go adding interactions with Sex and Country: mod <- lm(Value ~ Sex*Country*Mean + Country*X + Country*Y A <- predict(mod)