Longitudinal analysis of interacted effect nested by group and time with LMER - r

I am conducting analyses of survey data nested by country and year. The respondents surveyed are never the same, but the countries surveyed are repeated.
The data looks something like this, where y is the DV, x is the IV, and g is a group variable that I am interested in interacting with the IV x. The data is nested by country (co) and by year (t).
dat <- data.frame(y = c(1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 5, 2, 4, 3, 1),
x = c(1, 6, 3, 9, 3, 6, 4, 4, 9, 2, 8, 2, 5, 3, 7),
g = c(1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2),
t = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0),
co = c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "B"))
Basically, I want to conduct a longitudinal analysis where x*g predicts y. Given it's a longitudinal analysis, I think I need to interact the effect with year t, correct? Also, I think I need to control for the random intercepts and slopes of country co. So this is what I've done:
library(lme4)       # glmer()
library(stargazer)  # text regression tables
model1 <- glmer(y ~ x*t*g + (1+x|co) + (1|co), data = dat)
stargazer(model1, type = "text")
===============================================
                        Dependent variable:
                    ---------------------------
                                 y
-----------------------------------------------
x                             -0.390
                              (0.818)
t                             -3.317
                              (6.415)
g                             -1.971
                              (3.012)
x:t                            0.471
                              (1.218)
x:g                            0.379
                              (0.514)
t:g                            1.566
                              (4.312)
x:t:g                         -0.359
                              (0.798)
Constant                       5.517
                              (4.367)
-----------------------------------------------
Observations                    15
Log Likelihood                -22.627
Akaike Inf. Crit.              71.253
Bayesian Inf. Crit.            80.458
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01
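A quick way to see what `x*t*g` asks the model to estimate is base R's `model.matrix()`, which expands the formula into the fixed-effect design columns. They line up with the eight coefficients (constant included) in the table above. The cross-tabulation is an extra check added here for illustration only: it shows how many respondents each country contributes per year, and that there are just two levels of `co` from which the country-level random intercept and slope have to be estimated.

```r
# Same toy data as in the question
dat <- data.frame(y = c(1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 5, 2, 4, 3, 1),
                  x = c(1, 6, 3, 9, 3, 6, 4, 4, 9, 2, 8, 2, 5, 3, 7),
                  g = c(1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2),
                  t = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0),
                  co = c("A", "A", "A", "A", "A", "A", "A",
                         "B", "B", "B", "B", "B", "B", "B", "B"))

# Fixed-effect columns implied by x*t*g: intercept, 3 main effects,
# 3 two-way interactions and 1 three-way interaction
mm <- model.matrix(y ~ x * t * g, data = dat)
colnames(mm)

# Observations per country (rows) and year (columns)
obs <- table(dat$co, dat$t)
obs
```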
I am not sure if this is the correct way to conduct a longitudinal analysis like this, so I wanted to ask if someone could confirm or correct this. Thanks.

Related

Gtsummary columns for mixed model p-value and interaction

I have been conducting a crossover experiment, testing a specific treatment in a group of patients who received treatments "1" and "2" in random order.
I am fairly new to R, and I wish to generate a table with tbl_summary() with one column for each treatment's effect on various parameters, as well as a column for the p-value from the mixed-model analysis (between-group comparison) and a sequence-interaction p-value.
I have calculated the between-period difference in response to treatment within each period using a mixed-model approach with the lme4 package. Then, I compared the treatment response between groups using estimated marginal means (emmeans).
I have conducted my statistics using the following code:
library(emmeans)
library(lme4)
library(lmerTest)
df <- data.frame (record_id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12),
treatment = c(1, 2, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2, 2, 1, 2, 1, 1, 2, 1, 2, 1, 2, 1, 2),
treatment_sequence = c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1),
treatment_response = c(-43.5, 135.0, 8.4, -7.2, 99.0, 159.0, 12.0, -27.0, 3.0, 12.0, -15.0, 91.5, 6.0, -9.0, 177.0, 27.0, 52.8, -54.0, -50.7, 63.0, -9.0, 186.0, -72.0, 15.0)
)
df
df_mm <- lmer(treatment_response ~ as.factor(treatment)*treatment_sequence + (1|record_id), data=df)
anova(df_mm)
emmeans(df_mm, list(pairwise ~ treatment), adjust = "bonferroni")
This gives the following output:
> df_mm <- lmer(treatment_response ~ as.factor(treatment)*treatment_sequence + (1|record_id), data=df)
> anova(df_mm) ###show model as anova???
Type III Analysis of Variance Table with Satterthwaite's method
                                        Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
as.factor(treatment)                    1890.0  1890.0     1    10  0.4575 0.5141
treatment_sequence                       832.1   832.1     1    10  0.2014 0.6632
as.factor(treatment):treatment_sequence 7466.0  7466.0     1    10  1.8071 0.2086
> emmeans(df_mm, list(pairwise ~ treatment), adjust = "bonferroni")
NOTE: Results may be misleading due to involvement in interactions
$`emmeans of treatment`
 treatment emmean   SE   df lower.CL upper.CL
 1           1.45 19.9 19.7    -40.1       43
 2          61.83 19.9 19.7     20.3      103
Results are averaged over the levels of: treatment_sequence
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
$`pairwise differences of treatment`
 1                       estimate   SE df t.ratio p.value
 treatment1 - treatment2    -60.4 26.2 10  -2.301  0.0442
Results are averaged over the levels of: treatment_sequence
Degrees-of-freedom method: kenward-roger
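As a base-R cross-check (a sketch, not part of the original workflow): each sequence contains six patients, so in this balanced data set the emmeans shown above match the raw per-treatment means, and their difference reproduces the pairwise estimate of -60.4. The names `per_seq`, `raw_means` and `diff_1_2` are only illustrative.

```r
# Same crossover data as in the question; rep(1:12, each = 2)
# reproduces record_id = c(1, 1, 2, 2, ..., 12, 12)
df <- data.frame(
  record_id = rep(1:12, each = 2),
  treatment = c(1, 2, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2, 2, 1, 2, 1, 1, 2, 1, 2, 1, 2, 1, 2),
  treatment_sequence = c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1),
  treatment_response = c(-43.5, 135.0, 8.4, -7.2, 99.0, 159.0, 12.0, -27.0, 3.0, 12.0, -15.0, 91.5,
                         6.0, -9.0, 177.0, 27.0, 52.8, -54.0, -50.7, 63.0, -9.0, 186.0, -72.0, 15.0)
)

# Patients per sequence (keep one row per patient)
per_seq <- table(df$treatment_sequence[!duplicated(df$record_id)])
per_seq

# Raw mean response per treatment (compare with the emmean column)
raw_means <- tapply(df$treatment_response, df$treatment, mean)
raw_means

# Treatment 1 minus treatment 2 (compare with the pairwise estimate)
diff_1_2 <- unname(raw_means["1"] - raw_means["2"])
diff_1_2
```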
I would like the between-treatment comparison p-value (0.0442) along with the interaction p-value of 0.21 in the table. My aim is to create a table like this:
I have tried modifying the code from this post (Gtsummary columns for all post hoc pairwise comparisons), but I cannot seem to get it right.
Is this possible? And can someone help with the coding?
Below is a working example. However, I don't think the emmeans method you're using is correct. If you want to use it, you'll need to update the code to grab the p-value from the emmeans object (for now, it's just a random number).
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.6.1'
df <- data.frame (record_id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12),
treatment = c(1, 2, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2, 2, 1, 2, 1, 1, 2, 1, 2, 1, 2, 1, 2),
treatment_sequence = c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1),
treatment_response = c(-43.5, 135.0, 8.4, -7.2, 99.0, 159.0, 12.0, -27.0, 3.0, 12.0, -15.0, 91.5, 6.0, -9.0, 177.0, 27.0, 52.8, -54.0, -50.7, 63.0, -9.0, 186.0, -72.0, 15.0)
)
mod <- lme4::lmer(treatment_response ~ as.factor(treatment) * treatment_sequence + (1 | record_id), data=df)
tt <- emmeans::emmeans(mod, list(pairwise ~ treatment), adjust = "bonferroni")$`pairwise differences of treatment`
#> NOTE: Results may be misleading due to involvement in interactions
tt |> as.data.frame() |> dplyr::select(dplyr::last_col()) |> dplyr::pull()
#> [1] 0.04419325
car::Anova(mod) %>%
broom::tidy() %>%
dplyr::filter(dplyr::n() == dplyr::row_number()) |>
dplyr::pull(p.value)
#> [1] 0.1788567
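The interaction p-value here (0.179) differs from the 0.2086 in the question. A likely explanation, checked below with base R only: `car::Anova()` reports a Wald chi-square test by default, while lmerTest's `anova()` reports an F test with Satterthwaite denominator degrees of freedom. For this 1-df term both appear to use the same Wald statistic (F = 1.8071 in the question's output), so referring that number to the two reference distributions reproduces both p-values. This is a sketch that reuses the printed numbers rather than refitting the model.

```r
# F value and denominator df copied from the question's lmerTest anova() output
f_value <- 1.8071
den_df  <- 10

# Wald chi-square p-value with 1 df (assumed to match the car::Anova() default)
p_chisq <- pchisq(f_value, df = 1, lower.tail = FALSE)

# F-test p-value with 1 and 10 df, as in lmerTest's anova()
p_f <- pf(f_value, df1 = 1, df2 = den_df, lower.tail = FALSE)

round(c(chisq = p_chisq, F = p_f), 4)
```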
my_custom_stats <- function(data, variable, ...) {
formula <-
as.formula(glue::glue(
"{variable} ~ as.factor(treatment) * treatment_sequence + (1 | record_id)"
))
mod <- lme4::lmer(formula, data = data)
# I think this is not appropriate due to the interaction,
# but if you're confident about this approach, update pw_difference_p to be from emmeans
pw_difference_p <-
emmeans::emmeans(mod, list(pairwise ~ treatment), adjust = "bonferroni")
pw_difference_p <- runif(1)
interacton_p <-
car::Anova(mod) %>%
broom::tidy() %>%
dplyr::filter(dplyr::n() == dplyr::row_number()) |>
dplyr::pull(p.value)
dplyr::tibble(
pw_difference_p = pw_difference_p,
interacton_p = interacton_p
)
}
df %>%
tbl_summary(
by = treatment,
include = treatment_response,
statistic = all_continuous() ~ "{mean} ± {sd}"
) %>%
add_stat(fns = ~my_custom_stats) %>%
modify_header(interacton_p = "**Interaction P**",
pw_difference_p = "**Treatment P**") %>%
modify_fmt_fun(c(interacton_p, pw_difference_p) ~ style_pvalue) %>%
as_kable()
#> NOTE: Results may be misleading due to involvement in interactions
Characteristic       1, N = 12   2, N = 12   Treatment P   Interaction P
treatment_response   1 ± 60      62 ± 76     0.3           0.2
Created on 2022-08-24 by the reprex package (v2.0.1)

Sum based on range in separate columns

I want to calculate the sum of y along the x-axis. The range for summation is contained in the separate columns xmin and xmax.
df <- data.frame (group = c("A","A","A","A","A","B","B","B","B","B" ),
x = c(1,2,3,4,5,1,2,3,4,5),
y= c(1,2,3,2,1,4,5,6,5,4),
xmin=c(2,2,2,2,2,1,1,1,1,1),
xmax=c(4,4,4,4,4,5,5,5,5,5))
For group A, that is the range x from 2 to 4: sum{2+3+2} = 7.
For group B, the range x is from 1 to 5: sum{4+5+6+5+4} = 24.
Is there a way to do this?
I have tried a few things, but I'm not sure whether the following goes in the right direction:
df %>% rowwise() %>% mutate(sumX=sum(df$y[df$x>=df$min & df$x<=df$max]))
Use data.table::between() to subset, then sum within each group with tapply().
subset(df, data.table::between(x, xmin, xmax)) |>
with(tapply(y, group, sum))
# A B
# 7 24
Note: R >= 4.1 used.
Data:
df <- structure(list(group = c("A", "A", "A", "A", "A", "B", "B", "B",
"B", "B"), x = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5), y = c(1, 2, 3,
2, 1, 4, 5, 6, 5, 4), xmin = c(2, 2, 2, 2, 2, 1, 1, 1, 1, 1),
xmax = c(4, 4, 4, 4, 4, 5, 5, 5, 5, 5)), class = "data.frame", row.names = c(NA,
-10L))
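If you'd rather avoid the data.table dependency, here is a base-R sketch of the same idea. The second part uses `ave()` to put the group sum on every row of its group, which is what the `rowwise()`/`mutate()` attempt in the question seemed to be aiming for; the column name `sumX` is borrowed from that attempt, and `in_range`/`sums` are illustrative names.

```r
df <- data.frame(group = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
                 x = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
                 y = c(1, 2, 3, 2, 1, 4, 5, 6, 5, 4),
                 xmin = c(2, 2, 2, 2, 2, 1, 1, 1, 1, 1),
                 xmax = c(4, 4, 4, 4, 4, 5, 5, 5, 5, 5))

# TRUE where x lies inside that row's [xmin, xmax] range
in_range <- df$x >= df$xmin & df$x <= df$xmax

# One sum per group, using only the in-range rows
sums <- tapply(df$y[in_range], df$group[in_range], sum)
sums

# Same group sum repeated on every row (out-of-range rows contribute 0)
df$sumX <- ave(ifelse(in_range, df$y, 0), df$group, FUN = sum)
df
```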

Multiple boxplot for different variable using same facet_wrap

I am new to R.
I want to plot 4 box plots for 4 continuous variables and present them in the same plot. I am trying to present the boxplot for each variable in 2 study groups while using facet_wrap in ggplot.
The dividing variable is cognitive_groups (it has two values, 0 and 1).
The 4 variables are the memory (presented here), attention, executive and language domains.
Here is the code:
cogdb_bl%>%
filter(!is.na(cognitive_groups))%>%
ggplot(aes(x=memory))+
geom_boxplot(aes(y=""))+
facet_wrap(~cognitive_groups)+
theme_bw()+
coord_flip()+
labs(title="Cognitive domains in baseline groups",
x="Z score")
Here is the output:
How do I present the other variables alongside memory?
THANKS!
Do you mean something like this? By the way, a tribble is a nice way to create a minimal data sample.
library(tidyverse)
tribble(
~participant, ~memory, ~attention, ~language, ~executive, ~cognitive,
"A", 2, 5, 2, 2, 0,
"B", 2, 2, 5, 2, 1,
"C", 2, 2, 2, 2, 0,
"D", 2, 3, 2, 6, 1,
"E", 2, 2, 2, 2, 0,
"F", 2, 2, 8, 2, 0,
"G", 2, 4, 2, 2, 1,
"H", 2, 2, 7, 2, 1
) |>
pivot_longer(c(memory, attention, language, executive),
names_to = "domain", values_to = "score") |>
ggplot(aes(domain, score)) +
geom_boxplot() +
facet_wrap(~cognitive) +
theme_bw() +
coord_flip() +
labs(
title = "Cognitive domains in baseline groups",
y = "Z score"
)
Created on 2022-04-20 by the reprex package (v2.0.1)
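The step that makes this work is `pivot_longer()`: it turns the four domain columns into one `domain` column and one `score` column, so a single `ggplot()` call can draw one box per domain within each facet. For illustration only, here is a base-R sketch of the same reshape with `stack()` on the toy values from the tribble above, so you can inspect the long shape that `ggplot()` receives (`wide`, `long` and `domains` are illustrative names).

```r
wide <- data.frame(participant = LETTERS[1:8],
                   memory      = rep(2, 8),
                   attention   = c(5, 2, 2, 3, 2, 2, 4, 2),
                   language    = c(2, 5, 2, 2, 2, 8, 2, 7),
                   executive   = c(2, 2, 2, 6, 2, 2, 2, 2),
                   cognitive   = c(0, 1, 0, 1, 0, 0, 1, 1))

domains <- c("memory", "attention", "language", "executive")

# stack() puts the four columns end to end: values -> score, ind -> domain
long <- stack(wide[domains])
names(long) <- c("score", "domain")

# Each domain block lists the 8 participants in their original order,
# so the group indicator is simply repeated once per domain
long$cognitive <- rep(wide$cognitive, times = length(domains))

# Participants per domain in each cognitive group
table(long$domain, long$cognitive)
```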

How to format table and or chi-square for testing significance among categorical variables

I have this data set in which I want to compare the groups on various categorical variables to see whether any differences are significant. When I use the following code, R returns a warning message saying the p-value may be wrong. Am I formatting the chi-square test incorrectly?
df <- data.frame(group      = c(1, 1, 2, 2, 3, 3),
                 gender     = c(0, 1, 1, 1, 0, 0),
                 race       = c(3, 3, 3, 3, 2, 2),
                 handedness = c(0, 1, 0, 0, 1, 0))
# Is Gender significant among groups
gendertab<-table(df$gender, df$group)
chisq.test(gendertab)
# Is Race significant
racetab<-table(df$race, df$group)
chisq.test(racetab)
# Is Handedness significant
handtab<-table(df$handedness, df$group)
chisq.test(handtab)
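Try the argument simulate.p.value = TRUE and compare the results to fisher.test (R manual page or Wikipedia); they seem to agree with each other. The warning itself appears because, with only six observations, the expected cell counts are well below 5, so the chi-squared approximation is unreliable.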
# Is Gender significant among groups
gendertab<-table(df$gender, df$group)
fisher.test(gendertab)
chisq.test(gendertab, simulate.p.value = TRUE)
# Is Race significant
racetab<-table(df$race, df$group)
fisher.test(racetab)
chisq.test(racetab, simulate.p.value = TRUE)
# Is Handedness significant
handtab<-table(df$handedness, df$group)
fisher.test(handtab)
chisq.test(handtab, simulate.p.value = TRUE)
Data.
df <- read.csv(text = "
group, gender, race, handedness
1, 0, 3, 0
1, 1, 3, 1
2, 1, 3, 0
2, 1, 3, 0
3, 0, 2, 1
3, 0, 2, 0")
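The warning is easy to diagnose by looking at the expected counts that chisq.test() computes under independence (row total × column total / n). In this sketch `suppressWarnings()` is used only so the same warning is not printed again. For the gender table every expected count is 1, far below the usual rule of thumb of 5, which is why the exact test or a simulated p-value is the safer choice here.

```r
df <- data.frame(group      = c(1, 1, 2, 2, 3, 3),
                 gender     = c(0, 1, 1, 1, 0, 0),
                 race       = c(3, 3, 3, 3, 2, 2),
                 handedness = c(0, 1, 0, 0, 1, 0))

gendertab <- table(df$gender, df$group)

# Expected counts under independence: row total * column total / n
expected <- suppressWarnings(chisq.test(gendertab))$expected
expected

# Rule of thumb: chi-squared approximation is doubtful if any expected count < 5
any(expected < 5)
```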

how to plot the results of a LDA

There are quite a few answers to this question, not only on Stack Overflow but across the internet. However, none of them solved my problem. I have two problems.
I simulated some data for you:
df <- structure(list(Group = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
2, 2, 2), var1 = c(2, 3, 1, 2, 3, 2, 3, 3, 5, 6, 7, 6, 8, 5,
5), var2 = c(9, 9, 9, 8, 7, 8, 9, 3, 2, 2, 1, 1, 2, 3, 3), var3 = c(6,
7, 6, 6, 5, 6, 7, 1, 2, 1, 2, 3, 1, 1, 2)), .Names = c("Group",
"var1", "var2", "var3"), row.names = c(NA, -15L), class = "data.frame")
Then I do as follows:
library(MASS)  # provides lda()
fit <- lda(Group~., data=df)
plot(fit)
I end up with groups appearing in two different plots.
How can I plot my results in one figure, like e.g. in Linear discriminant analysis plot or Linear discriminant analysis plot using ggplot2, or any other nice-looking plot?
The plot() function actually calls plot.lda(), whose source code you can check by running getAnywhere("plot.lda"). This function does quite a lot of processing of the LDA object you pass in before plotting. As a result, if you want to customize how your plots look, you will probably have to write your own function that extracts information from the lda object and then passes it to a plot function. Here is an example (I don't know much about LDA, so I just trimmed the source code of the default plot.lda() and used the ggplot2 package, which is very flexible, to create a bunch of plots).
#If you don't have ggplot2 package, here is the code to install it and load it
install.packages("ggplot2")
library("ggplot2")
library("MASS")
#this is your code. The only thing I've changed here is the Group labels because you want a character vector instead of numeric labels
df <- structure(list(Group = c("a", "a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b", "b"),
var1 = c(2, 3, 1, 2, 3, 2, 3, 3, 5, 6, 7, 6, 8, 5, 5),
var2 = c(9, 9, 9, 8, 7, 8, 9, 3, 2, 2, 1, 1, 2, 3, 3),
var3 = c(6, 7, 6, 6, 5, 6, 7, 1, 2, 1, 2, 3, 1, 1, 2)),
.Names = c("Group","var1", "var2", "var3"),
row.names = c(NA, -15L), class = "data.frame")
fit <- lda(Group~., data=df)
#here is the custom function I made that extracts the proper information from the LDA object. You might want to write your own version of this to make sure it works with all cases (all I did here was trim the original plot.lda() function, but I might've deleted some code that might be relevant for other examples)
ggplotLDAPrep <- function(x){
if (!is.null(Terms <- x$terms)) {
data <- model.frame(x)
X <- model.matrix(delete.response(Terms), data)
g <- model.response(data)
xint <- match("(Intercept)", colnames(X), nomatch = 0L)
if (xint > 0L)
X <- X[, -xint, drop = FALSE]
}
means <- colMeans(x$means)
X <- scale(X, center = means, scale = FALSE) %*% x$scaling
rtrn <- data.frame(X,labels=as.character(g))
return(rtrn)
}
fitGraph <- ggplotLDAPrep(fit)
#Here are some examples of using ggplot to display your results. If you like what you see, I suggest learning more about ggplot2; then you can easily customize your plots
#this is similar to the result you get when you ran plot(fit)
ggplot(fitGraph, aes(LD1))+geom_histogram()+facet_wrap(~labels, ncol=1)
#Same as previous, but all the groups are on the same graph
ggplot(fitGraph, aes(LD1,fill=labels))+geom_histogram()
The following example won't work with your data: with only two groups, lda() returns a single discriminant (LD1), so there is no LD2. It is equivalent to the scatter plot in the external example you provided, so I've loaded that example here as a demo:
ldaobject <- lda(Species~., data=iris)
fitGraph <- ggplotLDAPrep(ldaobject)
ggplot(fitGraph, aes(LD1,LD2, color=labels))+geom_point()
I didn't customize the ggplot settings much, but you can make your graphs look however you want if you play around with them. Hope this helps!
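If you only need both groups in one figure and don't want ggplot2, a base-R sketch is enough, since MASS ships with standard R installations as a recommended package. It takes the discriminant scores from `predict()` and draws them on a shared LD1 axis with `stripchart()`. Note that these scores come from `predict.lda()`, not from the custom helper above, so they may be centred slightly differently; `lda_data`, `scores` and `plot_df` are illustrative names.

```r
library(MASS)  # lda() and its predict() method

lda_data <- data.frame(
  Group = rep(c("a", "b"), times = c(7, 8)),
  var1  = c(2, 3, 1, 2, 3, 2, 3, 3, 5, 6, 7, 6, 8, 5, 5),
  var2  = c(9, 9, 9, 8, 7, 8, 9, 3, 2, 2, 1, 1, 2, 3, 3),
  var3  = c(6, 7, 6, 6, 5, 6, 7, 1, 2, 1, 2, 3, 1, 1, 2)
)
fit <- lda(Group ~ ., data = lda_data)

# Discriminant scores: one column per discriminant; two groups give only LD1
scores <- predict(fit, newdata = lda_data)$x

# Both groups in a single figure on a shared LD1 axis
plot_df <- data.frame(LD1 = scores[, "LD1"], Group = lda_data$Group)
stripchart(LD1 ~ Group, data = plot_df, method = "jitter", pch = 19,
           xlab = "LD1")
```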
