R - Plot 2-way Anova Results on longitudinal data using GGplot2

I am looking to display the results of a two-way Anova analysis over several time points. This is preliminary data, and I am interested in getting an understanding of the potential relationship between time and sex on tumor burden.
My data:
ID Sex Tumor.Burden Time.Point
Cage3 female 1270800 1
Cage3 female 1237600 2
Cage3 female 1288760 3
Cage3 female 775220 4
Cage4 female 1768400 1
Cage4 female 1630200 2
Cage4 female 1606900 3
Cage4 female 1134220 4
Cage5 male 1441500 1
Cage5 male 3000750 2
Cage5 male 5930500 3
Cage5 male 6944225 4
Cage6 male 2063640 1
Cage6 male 7067600 2
Cage6 male 10460400 3
Cage6 male 18764800 4
This is the plot I am using. I'd like to point out that this wasn't made with the data I just listed, but rather with similar data. However, I plan on using the same approach here.
ggplot(Data, aes(x = Time.Point, y = Tumor.Burden, color = Sex)) +
geom_line() +
theme_minimal() +
labs(title = "Weekly Follow-up of Tumor-Bearing Mice", x = "Time points (weeks)", y="Log(Tumor Burden)") +
theme(plot.title = element_text(size = 10, hjust = 0.5))
What is the best approach to show the significance for each time point above the corresponding time point on the plot? I.e., is there a statistically significant difference between males and females at each of time points 1:5?
Currently, I am following this: https://www.datanovia.com/en/lessons/repeated-measures-anova-in-r/#two-way-repeated-measures-anova. However, I am getting an error at the end and it seems to be related to my ID variable getting flagged as NA when I run
Data %>%
group_by(Time.Point) %>%
anova_test(dv = Tumor.Burden, wid = ID, within = Sex)
Thanks!

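In your data each cage (ID) has a single sex, so Sex is a between-subjects factor rather than a within-subjects one, which is likely why passing it as within fails. To calculate the p values per time point you can group by Time.Point and use anova_test(Tumor.Burden ~ Sex), then show the resulting p values in your plot with geom_text like this: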
library(ggplot2)
library(ggpubr)
library(rstatix)
library(dplyr)
p_values <- Data %>%
group_by(Time.Point) %>%
anova_test(Tumor.Burden ~ Sex)
#> Coefficient covariances computed by hccm()
#> Coefficient covariances computed by hccm()
#> Coefficient covariances computed by hccm()
#> Coefficient covariances computed by hccm()
p_values
#> # A tibble: 4 × 8
#> Time.Point Effect DFn DFd F p `p<.05` ges
#> * <int> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
#> 1 1 Sex 1 2 0.342 0.618 "" 0.146
#> 2 2 Sex 1 2 3.11 0.22 "" 0.608
#> 3 3 Sex 1 2 8.83 0.097 "" 0.815
#> 4 4 Sex 1 2 4.05 0.182 "" 0.669
ggplot() +
geom_line(Data, mapping = aes(x = Time.Point, y = Tumor.Burden, color = Sex)) +
geom_text(data = p_values, mapping = aes(x = Time.Point, y = 15000000, label = p), size = 3) +
theme_minimal() +
labs(title = "Weekly Follow-up of Tumor-Bearing Mice", x = "Time points (weeks)", y="Log(Tumor Burden)") +
theme(plot.title = element_text(size = 10, hjust = 0.5))
Created on 2022-11-16 with reprex v2.0.2
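As a complement to the answer above, here is a minimal sketch (not the answerer's code) that computes the same per-time-point p values in base R with aov() and turns them into short labels placed just above the highest value at each time point. The names p_labels, label_y and annotated_plot, the two-significant-digit formatting, the 5% vertical offset, and the group = ID aesthetic (so that each cage gets its own line) are assumptions made for illustration. The plotting part only runs if ggplot2 is installed.

```r
# Data from the question, rebuilt as a data frame.
Data <- data.frame(
  ID = rep(c("Cage3", "Cage4", "Cage5", "Cage6"), each = 4),
  Sex = rep(c("female", "male"), each = 8),
  Time.Point = rep(1:4, times = 4),
  Tumor.Burden = c(1270800, 1237600, 1288760, 775220,
                   1768400, 1630200, 1606900, 1134220,
                   1441500, 3000750, 5930500, 6944225,
                   2063640, 7067600, 10460400, 18764800)
)

# One-way ANOVA (Tumor.Burden ~ Sex) fitted separately within each time point.
p_by_time <- sapply(split(Data, Data$Time.Point), function(d) {
  summary(aov(Tumor.Burden ~ Sex, data = d))[[1]][["Pr(>F)"]][1]
})

# Label table (hypothetical names): one row per time point.
p_labels <- data.frame(
  Time.Point = as.integer(names(p_by_time)),
  p = unname(p_by_time),
  label = paste0("p = ", signif(unname(p_by_time), 2)),
  # Place each label 5% above the largest value observed at that time point.
  label_y = 1.05 * as.numeric(tapply(Data$Tumor.Burden, Data$Time.Point, max))
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  annotated_plot <- ggplot() +
    # group = ID draws one line per cage instead of one zigzag per sex.
    geom_line(data = Data,
              mapping = aes(x = Time.Point, y = Tumor.Burden,
                            color = Sex, group = ID)) +
    geom_text(data = p_labels,
              mapping = aes(x = Time.Point, y = label_y, label = label),
              size = 3) +
    theme_minimal() +
    labs(x = "Time points (weeks)", y = "Tumor Burden")
  # print(annotated_plot) draws the figure.
}
```

With only two cages per sex at each time point, these tests have very little power, so the labels are best read as descriptive for preliminary data.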

Related

Visualize Generalized Additive Model (GAM) in R

I want to achieve a GAM plot that looks like this
Image from https://stats.stackexchange.com/questions/179947/statistical-differences-between-two-hourly-patterns/446048#446048
How can I accomplish this?
Model is
model = gam(y ~ s(t) + g, data = d)
The general way to do this is to compute model estimates (fitted values) over the range of the covariate(s) of interest for each group. The reproducible example below illustrates one way to do this using {mgcv} to fit the GAM and my {gratia} package for some helper functions to facilitate the process.
library("gratia")
library("mgcv")
library("ggplot2")
eg_data <- data_sim("eg4", n = 400, dist = "normal", scale = 2, seed = 1)
m <- gam(y ~ s(x2) + fac, data = eg_data, method = "REML")
ds <- data_slice(m, x2 = evenly(x2, n = 100), fac = evenly(fac))
fv <- fitted_values(m, data = ds)
The last line gets you fitted values from the model at the covariate combinations specified in the data slice:
> fv
# A tibble: 300 × 6
x2 fac fitted se lower upper
<dbl> <fct> <dbl> <dbl> <dbl> <dbl>
1 0.00131 1 -1.05 0.559 -2.15 0.0412
2 0.00131 2 -3.35 0.563 -4.45 -2.25
3 0.00131 3 1.13 0.557 0.0395 2.22
4 0.0114 1 -0.849 0.515 -1.86 0.160
5 0.0114 2 -3.14 0.519 -4.16 -2.13
6 0.0114 3 1.34 0.513 0.332 2.34
7 0.0215 1 -0.642 0.474 -1.57 0.287
8 0.0215 2 -2.94 0.480 -3.88 -2.00
9 0.0215 3 1.54 0.473 0.616 2.47
10 0.0316 1 -0.437 0.439 -1.30 0.424
# … with 290 more rows
# ℹ Use `print(n = ...)` to see more rows
This object is in a form suitable for plotting with ggplot():
fv |>
ggplot(aes(x = x2, y = fitted, colour = fac)) +
geom_point(data = eg_data, mapping = aes(y = y), size = 0.5) +
geom_ribbon(aes(x = x2, ymin = lower, ymax = upper, fill = fac,
colour = NULL),
alpha = 0.2) +
geom_line()
which produces a plot of the raw data with one fitted curve and confidence band per level of fac.
You can enhance and/or modify this using your ggplot skills.
The basic point with this model is that you have a common smooth effect of a covariate (here x2) plus group means (for the factor fac). Hence the curves are "parallel".
Note that there's a lot of variation around the estimated curves in this model because the simulated data are from a richer model with group-specific smooths and smooth effects of other covariates.
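The "parallel curves" point can be checked directly. The sketch below uses only {mgcv} and predict() rather than the {gratia} helpers. The simulated data, the variable names t, g and y (chosen to mirror the model in the question), and the 1.96 multiplier for an approximate 95% interval are assumptions for illustration.

```r
library(mgcv)

# Simulated data (an assumption): one shared smooth in t plus group offsets.
set.seed(1)
n <- 300
d <- data.frame(
  t = runif(n),
  g = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
offsets <- c(a = 0, b = 1, c = -1)
d$y <- sin(2 * pi * d$t) + unname(offsets[as.character(d$g)]) + rnorm(n, sd = 0.3)

# Same model form as in the question: common smooth of t plus a factor term.
model <- gam(y ~ s(t) + g, data = d, method = "REML")

# Prediction grid: 100 values of t for each level of g.
grid <- expand.grid(
  t = seq(min(d$t), max(d$t), length.out = 100),
  g = levels(d$g)
)
pr <- predict(model, newdata = grid, se.fit = TRUE)
grid$fitted <- as.numeric(pr$fit)
grid$lower <- grid$fitted - 1.96 * as.numeric(pr$se.fit)
grid$upper <- grid$fitted + 1.96 * as.numeric(pr$se.fit)

# The vertical gap between two groups is the same at every t,
# which is what "parallel" means for this model.
gap_ba <- grid$fitted[grid$g == "b"] - grid$fitted[grid$g == "a"]
range(gap_ba)
```

The resulting grid has the same role as fv above, so it can be passed to the same geom_ribbon() and geom_line() layers with t on the x axis and g as the colour and fill.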
gg.bs30 <- ggplot(data,aes(x=Predictor,y=Output,col=class))+geom_point()+
geom_smooth(method='gam',formula=y ~ splines::bs(x, 30)) + facet_grid(class ~.)
print(gg.bs30)
Code from -> https://github.com/mariocastro73/ML2020-2021/blob/master/scripts/gams-with-ggplot-classes.R

Make facet_matrix show density plot for variable with missing values

I want to plot a facet_matrix showing scatter plots and autodensity plots on the diagonal. However, for some reason it does not show the density plot for a certain variable (gini_eurostat). I assume this is because there are some missing values for gini_eurostat. How can I make it show the density plot, even though there are some missing values?
This is the code I used:
ggplot(df_Q2, aes(x = .panel_x, y = .panel_y)) +
geom_autodensity() +
geom_point(alpha = 1, shape = 16, size = 0.5) +
facet_matrix(vars(c(intraEU_trade_bymemberstate_pct, gini_eurostat, exports_currentUSD)),
layer.upper = 2, layer.diag=1, layer.lower = 2) +
theme_few()
The data frame looks like this:
head(df_Q2[,c("intraEU_trade_bymemberstate_pct", "gini_eurostat", "exports_currentUSD")])
# A tibble: 6 × 3
# intraEU_trade_bymemberstate_pct gini_eurostat exports_currentUSD
# <dbl> <dbl> <dbl>
# 1 8.6 NA 96701496330.
# 2 8.8 27.4 116638893905.
# 3 8.8 25.8 141025428359.
# 4 8.4 26.3 153625979356.
# 5 8 25.3 170827273868.
# 6 8.1 26.2 204299603066.

Plotting Cumulative Gains Curve Plot R

I am trying to generate a cumulative gains plot using ggplot2 in R. Basically, I want to replicate the following using ggplot2.
My data is this:
df
# A tibble: 10 x 6
Decile resp Cumresp Gain Cumlift
<int> <dbl> <dbl> <dbl> <dbl>
1 8301 8301 57.7 5.77
2 2449 10750 74.8 3.74
3 1337 12087 84.0 2.80
4 751 12838 89.3 2.23
5 462 13300 92.5 1.85
6 374 13674 95.1 1.58
7 252 13926 96.8 1.38
8 195 14121 98.2 1.23
9 136 14257 99.1 1.10
10 124 14381 100 1
## Cumulative Gains Plot
ggplot(df, aes(Decile, Gain)) +
geom_point() +
geom_line() +
geom_abline(intercept = 52.3 , slope = 4.77)
scale_y_continuous(breaks = seq(0, 100, by = 20)) +
scale_x_continuous(breaks = c(1:10)) +
labs(title = "Cumulative Gains Plot",
y = "Cumulative Gain %")
However, I am not able to get the diagonal line, even though I tried geom_abline, and my y-axis is not right either: I could not make it run from 0 to 100.
I would really appreciate it if someone could show me how to get the plot in the picture using ggplot2.
Thanks in advance
library(dplyr); library(ggplot2)
df2 <- df %>%
add_row(Decile = 0, Gain =0) %>%
arrange(Decile)
ggplot(df2, aes(Decile, Gain)) +
geom_point() +
geom_line() +
# This makes another geom_line that only sees the first and last row of the data
geom_line(data = df2 %>% slice(1, n())) +
scale_y_continuous(breaks = seq(0, 100, by = 20), limits = c(0,100)) +
scale_x_continuous(breaks = c(1:10)) +
labs(title = "Cumulative Gains Plot",
y = "Cumulative Gain %")
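For reference, here is a small base R sketch of how the columns in the question's table relate to each other. It also shows why the baseline is a straight line from (0, 0) to (10, 100): a model with no lift captures 10% of the responders per decile. The resp values are copied from the question; the formulas for Gain and Cumlift are inferred from those numbers rather than stated by the asker, and the names gains and Baseline are hypothetical. Separately, note that in the question's code the geom_abline() line is not followed by +, so the scale and labs calls after it are never added to the plot.

```r
# Responders per decile, copied from the question.
resp <- c(8301, 2449, 1337, 751, 462, 374, 252, 195, 136, 124)

gains <- data.frame(Decile = seq_along(resp), resp = resp)
gains$Cumresp <- cumsum(gains$resp)                   # running total of responders
gains$Gain <- 100 * gains$Cumresp / sum(gains$resp)   # % of all responders captured
gains$Baseline <- 10 * gains$Decile                   # % captured with no model
gains$Cumlift <- gains$Gain / gains$Baseline          # gain relative to baseline

# Add the origin so that both curves start at (0, 0).
origin <- data.frame(Decile = 0, resp = 0, Cumresp = 0,
                     Gain = 0, Baseline = 0, Cumlift = NA)
gains <- rbind(origin, gains)
gains
```

With a Baseline column available, the diagonal could also be drawn as an ordinary layer, for example geom_line(aes(y = Baseline)), instead of slicing the first and last rows.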

Use broom and tidyverse to run regressions on different dependent variables

I'm looking for a Tidyverse / broom solution that can solve this puzzle:
Let's say I have several DVs and a specific set of IVs, and I want to run a regression for every DV against this same set of IVs.
I know I can use something like for i in or apply family, but I really want to run that using tidyverse.
The following code works as an example
ds <- data.frame(income = rnorm(100, mean=1000,sd=200),
happiness = rnorm(100, mean = 6, sd=1),
health = rnorm(100, mean=20, sd = 3),
sex = c(0,1),
faculty = c(0,1,2,3))
mod1 <- lm(income ~ sex + faculty, ds)
mod2 <- lm(happiness ~ sex + faculty, ds)
mod3 <- lm(health ~ sex + faculty, ds)
summary(mod1)
summary(mod2)
summary(mod3)
Income, happiness, and health are DVs. Sex and Faculty are IVs and they will be used for all regressions.
That was the closest I found
Let me know if I need to clarify my question.
Thanks.
As you have different dependent variables but the same independent, you can form a matrix of these and pass to lm.
mod = lm(cbind(income, happiness, health) ~ sex + faculty, ds)
And I think broom::tidy works
library(broom)
tidy(mod)
# response term estimate std.error statistic p.value
# 1 income (Intercept) 1019.35703873 31.0922529 32.7849205 2.779199e-54
# 2 income sex -54.40337314 40.1399258 -1.3553431 1.784559e-01
# 3 income faculty 19.74808081 17.9511206 1.1001030 2.740100e-01
# 4 happiness (Intercept) 5.97334562 0.1675340 35.6545278 1.505026e-57
# 5 happiness sex 0.05345555 0.2162855 0.2471528 8.053124e-01
# 6 happiness faculty -0.02525431 0.0967258 -0.2610918 7.945753e-01
# 7 health (Intercept) 19.76489553 0.5412676 36.5159396 1.741411e-58
# 8 health sex 0.32399380 0.6987735 0.4636607 6.439296e-01
# 9 health faculty 0.10808545 0.3125010 0.3458723 7.301877e-01
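A quick way to convince yourself that the matrix-response fit is equivalent to the three separate models is to compare coefficients. This sketch regenerates data of the same shape as the question's ds (the seed is arbitrary and an assumption) and checks one column of coef(mod) against the corresponding single-response lm.

```r
# Data with the same structure as in the question; the seed is arbitrary.
set.seed(42)
ds <- data.frame(income = rnorm(100, mean = 1000, sd = 200),
                 happiness = rnorm(100, mean = 6, sd = 1),
                 health = rnorm(100, mean = 20, sd = 3),
                 sex = c(0, 1),
                 faculty = c(0, 1, 2, 3))

# One call, three responses: coef() returns a matrix with one column per DV.
mod <- lm(cbind(income, happiness, health) ~ sex + faculty, data = ds)
coef(mod)

# The income column should match the stand-alone income model.
mod1 <- lm(income ~ sex + faculty, data = ds)
same_income <- isTRUE(all.equal(coef(mod)[, "income"], coef(mod1)))
same_income
```

The estimates agree because least squares with a shared design matrix is solved column by column; the matrix form is mainly a convenience when the IVs are identical across models.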
Another method is to gather the dependent variables and use a grouped data frame to fit the models with do. This is the method explained in the broom and dplyr vignette.
library(tidyverse)
library(broom)
ds <- data.frame(
income = rnorm(100, mean = 1000, sd = 200),
happiness = rnorm(100, mean = 6, sd = 1),
health = rnorm(100, mean = 20, sd = 3),
sex = c(0, 1),
faculty = c(0, 1, 2, 3)
)
ds %>%
gather(dv_name, dv_value, income:health) %>%
group_by(dv_name) %>%
do(tidy(lm(dv_value ~ sex + faculty, data = .)))
#> # A tibble: 9 x 6
#> # Groups: dv_name [3]
#> dv_name term estimate std.error statistic p.value
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 happiness (Intercept) 6.25 0.191 32.7 3.14e-54
#> 2 happiness sex 0.163 0.246 0.663 5.09e- 1
#> 3 happiness faculty -0.172 0.110 -1.56 1.23e- 1
#> 4 health (Intercept) 20.1 0.524 38.4 1.95e-60
#> 5 health sex 0.616 0.677 0.909 3.65e- 1
#> 6 health faculty -0.653 0.303 -2.16 3.36e- 2
#> 7 income (Intercept) 1085. 32.8 33.0 1.43e-54
#> 8 income sex -12.9 42.4 -0.304 7.62e- 1
#> 9 income faculty -25.1 19.0 -1.32 1.89e- 1
Created on 2018-08-01 by the reprex package (v0.2.0).
We can loop through the column names that are dependent variables, use paste to create the formula to be passed into lm and get the summary statistics with tidy (from broom)
library(tidyverse)
library(broom)
map(names(ds)[1:3], ~
lm(formula(paste0(.x, "~",
paste(names(ds)[4:5], collapse=" + "))), data = ds) %>%
tidy)
If we want it in a single data.frame with a column identifier for dependent variable,
map_df(set_names(names(ds)[1:3]), ~
lm(formula(paste0(.x, "~",
paste(names(ds)[4:5], collapse=" + "))), data = ds) %>%
tidy, .id = "Dep_Variable")
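If pasting strings into a formula feels fragile, base R's reformulate() builds the same formula from character vectors. The sketch below is a base R variant of the loop above, not a tidyverse solution; the names dvs, ivs, fits and results are hypothetical, and the seed is arbitrary.

```r
# Data with the same structure as in the question; the seed is arbitrary.
set.seed(42)
ds <- data.frame(income = rnorm(100, mean = 1000, sd = 200),
                 happiness = rnorm(100, mean = 6, sd = 1),
                 health = rnorm(100, mean = 20, sd = 3),
                 sex = c(0, 1),
                 faculty = c(0, 1, 2, 3))

dvs <- names(ds)[1:3]  # dependent variables
ivs <- names(ds)[4:5]  # independent variables shared by every model

# reformulate(ivs, response = dv) yields e.g. income ~ sex + faculty.
fits <- lapply(setNames(dvs, dvs), function(dv) {
  lm(reformulate(ivs, response = dv), data = ds)
})

# Stack the coefficient tables into one data frame with a DV identifier.
results <- do.call(rbind, lapply(names(fits), function(dv) {
  cf <- summary(fits[[dv]])$coefficients
  data.frame(Dep_Variable = dv,
             term = rownames(cf),
             estimate = cf[, "Estimate"],
             std.error = cf[, "Std. Error"],
             statistic = cf[, "t value"],
             p.value = cf[, "Pr(>|t|)"],
             row.names = NULL)
}))
results
```

The column names were chosen to resemble the output of tidy(), so the result can be filtered or plotted in the same way as the map_df version.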

ggplot2 geom_smooth didn't work

I'm plotting two different variables on the same plot.
sex_female is chr, including 0 and 1.
epoch_36:epoch_144 are num, time variables.
Here is my code:
total %>%
select(sex_female, epoch_36:epoch_144)%>%
gather(key = time, value = ac, epoch_36:epoch_144) %>%
group_by(sex_female,time) %>%
mutate(mean = mean(ac)) %>%
ggplot(aes(x = time, y = mean,color = sex_female)) +
geom_point(alpha = .3)+
geom_smooth(method = "lm")+
theme(axis.text.x = element_text(angle = 90,hjust = 1))
After the mutation, I got the tibble:
A tibble: 45,780 x 4
# Groups: sex_female, time [218]
sex_female time ac mean
<chr> <chr> <dbl> <dbl>
1 1 epoch_36 49.8 54.96406
2 0 epoch_36 34.7 55.43448
3 0 epoch_36 70.9 55.43448
4 0 epoch_36 12.3 55.43448
5 1 epoch_36 102.7 54.96406
6 1 epoch_36 77.9 54.96406
7 0 epoch_36 1.1 55.43448
8 1 epoch_36 140.0 54.96406
9 1 epoch_36 51.3 54.96406
10 0 epoch_36 0.0 55.43448
# ... with 45,770 more rows
I've tried using the solution suggested in a similar question: Plot dashed regression line with geom_smooth in ggplot2, but no lines showed up. How do I fix my code to produce lines?
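Your time column is categorical (character values such as "epoch_36") and you should transform it into numerical. Calling as.numeric() directly on these strings would return NA, so strip the "epoch_" prefix first. Here mutTibble stands for the tibble shown above.
mutTibble$time <- as.numeric(sub("^epoch_", "", mutTibble$time))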
And for plotting you can use this:
library(ggplot2)
ggplot(mutTibble,
aes(time, mean, color = factor(sex_female))) +
geom_point(alpha = 0.3)+
geom_smooth(method = "lm")+
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(x = "Time",
y = "Mean",
color = "Gender (female)")
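Because the time values in the question look like "epoch_36", the conversion step deserves a closer look. The sketch below uses a few made-up values in the same format and shows that a direct as.numeric() yields NA, while stripping the prefix first gives usable numbers. The helper name epoch_to_number is hypothetical.

```r
# Made-up values in the same format as the question's time column.
time_chr <- c("epoch_36", "epoch_40", "epoch_144")

# Direct conversion fails: every value becomes NA (with a warning).
direct <- suppressWarnings(as.numeric(time_chr))

# Hypothetical helper: drop the "epoch_" prefix, then convert.
epoch_to_number <- function(x) as.numeric(sub("^epoch_", "", x))
time_num <- epoch_to_number(time_chr)

direct
time_num
```

A numeric time axis also puts the epochs in their natural order, which a character axis does not guarantee.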
