Suppose I have 2 data frames, one for 2015 and one for 2016. I want to run a regression for each data frame and plot one of the coefficients from each regression with its respective confidence interval. For example:
set.seed(1020022316)
library(dplyr)
library(stargazer)
df16 <- data.frame(
  x1 = rnorm(1000, 0, 2),
  t = sample(c(0, 1), 1000, T),
  e = rnorm(1000, 0, 10)
) %>% mutate(y = 0.5 * x1 + 2 * t + e) %>%
  select(-e)
df15 <- data.frame(
  x1 = rnorm(1000, 0, 2),
  t = sample(c(0, 1), 1000, T),
  e = rnorm(1000, 0, 10)
) %>% mutate(y = 0.75 * x1 + 2.5 * t + e) %>%
  select(-e)
lm16 <- lm(y ~ x1 + t, data = df16)
lm15 <- lm(y ~ x1 + t, data = df15)
stargazer(lm15, lm16, type="text", style = "aer", ci = TRUE, ci.level = 0.95)
I want to plot t=1.558 at x=2015 and t=2.797 at x=2016, each with its respective 95% CI. What is the best way of doing this?
I could do it 'by hand', but I hope there is a better way.
library(ggplot2)
df.plot <-
  data.frame(
    y = c(lm15$coefficients[['t']], lm16$coefficients[['t']]),
    x = c(2015, 2016),
    lb = c(
      confint(lm15, 't', level = 0.95)[1],
      confint(lm16, 't', level = 0.95)[1]
    ),
    ub = c(
      confint(lm15, 't', level = 0.95)[2],
      confint(lm16, 't', level = 0.95)[2]
    )
  )
df.plot %>% ggplot(aes(x, y)) + geom_point() +
geom_errorbar(aes(ymin = lb, ymax = ub), width = 0.1) +
geom_hline(aes(yintercept=0), linetype="dashed")
By "best" I mean: figure quality (it should look nice), code elegance, and ease of extension (to more than 2 regressions).
This is a bit too long for a comment, so I post it as a partial answer.
It is unclear from your post whether your main problem is getting the data into the right shape or the plotting itself. But just to follow up on one of the comments, let me show you how to run several models using dplyr and broom in a way that makes plotting easy. Consider the mtcars dataset:
library(dplyr)
library(broom)
models <- mtcars %>%
  group_by(cyl) %>%
  do(data.frame(tidy(lm(mpg ~ disp, data = .), conf.int = TRUE)))
head(models) # I have abbreviated the following output a bit
cyl term estimate std.error statistic p.value conf.low conf.high
(dbl) (chr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
4 (Intercept) 40.8720 3.5896 11.39 0.0000012 32.752 48.99221
4 disp -0.1351 0.0332 -4.07 0.0027828 -0.210 -0.06010
6 (Intercept) 19.0820 2.9140 6.55 0.0012440 11.591 26.57264
6 disp 0.0036 0.0156 0.23 0.8259297 -0.036 0.04360
You see that this gives you all coefficients and confidence intervals in one nice data frame, which makes plotting with ggplot easier. For instance, if your datasets have identical content, you could add a year identifier to them (e.g. df1$year <- 2000; df2$year <- 2001, etc.) and bind them together afterwards using bind_rows (or you can use bind_rows's .id argument). Then you can use the year identifier instead of cyl in the above example.
The plotting then is simple. To use the mtcars data again, let's plot the coefficients for disp only (though you could also use faceting, grouping, etc):
ggplot(filter(models, term=="disp"), aes(x=cyl, y=estimate)) +
geom_point() + geom_errorbar(aes(ymin=conf.low, ymax=conf.high))
To use your data:
df <- bind_rows(`2015` = df15, `2016` = df16, .id = "years")
models <- df %>% group_by(years) %>%
  do(data.frame(tidy(lm(y ~ x1 + t, data = .), conf.int = TRUE))) %>%
  filter(term == "t")
ggplot(models, aes(x = years, y = estimate)) + geom_point() +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high))
Note that you can easily add more models just by binding more data onto the main data frame. You can also use faceting, grouping, or position dodging to adjust the look of the corresponding plot if you want to plot more than one coefficient; a sketch follows below.
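For instance, here is a minimal sketch of the dodging idea (not part of the original code above). It reuses the df object created above and keeps both the x1 and t coefficients; all_terms is just a placeholder name for this sketch.

library(ggplot2)

# Sketch: tidy both coefficients per year (all_terms is a placeholder name),
# then dodge them side by side within each year.
all_terms <- df %>% group_by(years) %>%
  do(data.frame(tidy(lm(y ~ x1 + t, data = .), conf.int = TRUE))) %>%
  filter(term != "(Intercept)")

ggplot(all_terms, aes(x = years, y = estimate, colour = term)) +
  geom_point(position = position_dodge(width = 0.4)) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
                position = position_dodge(width = 0.4), width = 0.1)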
This is the solution I have right now:
gen_df_plot <- function(reg, coef_name){
  df <- data.frame(y = reg$coefficients[[coef_name]],
                   lb = confint(reg, coef_name, level = 0.95)[1],
                   ub = confint(reg, coef_name, level = 0.95)[2])
  return(df)
}
df.plot <- lapply(list(lm15,lm16), gen_df_plot, coef_name = 't')
df.plot <- data.table::rbindlist(df.plot)
df.plot$x <- as.factor(c(2015, 2016))
df.plot %>% ggplot(aes(x, y)) + geom_point(size=4) +
geom_errorbar(aes(ymin = lb, ymax = ub), width = 0.1, linetype="dotted") +
geom_hline(aes(yintercept=0), linetype="dashed") + theme_bw()
I don't love it, but it works.
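A somewhat tidier variant is sketched below (assuming broom and purrr are available; df.plot2 is just a placeholder name). Note that tidy() names the columns estimate, conf.low and conf.high rather than y, lb and ub.

library(dplyr)
library(purrr)
library(broom)
library(ggplot2)

# Sketch: tidy() with conf.int = TRUE returns the estimate and its CI directly,
# and .id labels each row with the name of the model it came from.
df.plot2 <- map_dfr(list(`2015` = lm15, `2016` = lm16),
                    ~ tidy(.x, conf.int = TRUE),
                    .id = "x") %>%
  filter(term == "t")

ggplot(df.plot2, aes(x, estimate)) + geom_point(size = 4) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.1, linetype = "dotted") +
  geom_hline(aes(yintercept = 0), linetype = "dashed") + theme_bw()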
Here is a more generalized version of the code. I have made a change to how "x" is defined so that you don't have to worry about alphabetic reordering of the factor.
#
# Paul Gronke and Paul Manson
# Early Voting Information Center at Reed College
#
# August 27, 2019
#
#
# Code to plot a single coefficient from multiple models, provided
# as an easier alternative to "coefplot" and "dotwhisker". Some users
# may find those packages more capable
#
# Code adapted from https://stackoverflow.com/questions/35582052/plot-regression-coefficient-with-confidence-intervals
# gen_df_plot function will create a tidy data frame for your plot
# Currently set up to display 95% confidence intervals
gen_df_plot <- function(reg, coef_name){
  df <- data.frame(y = reg$coefficients[[coef_name]],
                   lb = confint(reg, coef_name, level = 0.95)[1],
                   ub = confint(reg, coef_name, level = 0.95)[2])
  return(df)
}
# Populate the data frame with a list of your model results.
df.plot <- lapply(list(model1, # List your models here
model2),
gen_df_plot,
coef_name = 'x1') # Coefficient name
# Convert the list to a tidy data frame
df.plot <- data.table::rbindlist(df.plot)
# Provide the coefficient or regression labels below, in the
# order that you want them to appear. The "levels=unique(.)" parameter
# overrides R's desire to order the factor alphabetically
df.plot$x <- c("Group 1",
"Group 2") %>%
factor(., levels = unique(.),
ordered = TRUE)
# Create your plot
df.plot %>% ggplot(aes(x, y)) +
geom_point(size=4) +
geom_errorbar(aes(ymin = lb, ymax = ub), width = 0.1, linetype="dotted") +
geom_hline(aes(yintercept=0), linetype="dashed") +
theme_bw() +
ggtitle("Comparing Coefficients") +
ylab("Coefficient Value")```
sample_data = read.table("http://freakonometrics.free.fr/db.txt",
                         header = TRUE, sep = ";")
head(sample_data)
model = glm(Y ~ 0 + X1 + X2 + X3, family = binomial, data = sample_data)
summary(model)
exp(coef(model))
exp(cbind(OR = coef(model), confint(model)))
I have the above sample data for a logistic regression with a categorical predictor.
When I try the above code I get the following output:
OR 2.5 % 97.5 %
X1 1.67639337 1.352583976 2.09856514
X2 1.23377720 1.071959330 1.42496949
X3A 0.01157565 0.001429430 0.08726854
X3B 0.06627849 0.008011818 0.54419759
X3C 0.01118084 0.001339984 0.08721028
X3D 0.01254032 0.001545240 0.09539880
X3E 0.10654454 0.013141540 0.87369972
but I am wondering how to extract the OR and CI only for the factor levels. My desired output would be:
OR 2.5 % 97.5 %
X3A 0.01157565 0.001429430 0.08726854
X3B 0.06627849 0.008011818 0.54419759
X3C 0.01118084 0.001339984 0.08721028
X3D 0.01254032 0.001545240 0.09539880
X3E 0.10654454 0.013141540 0.87369972
Can anyone help me with the code to extract it?
Additionally, I want to plot the extracted ORs with their confidence intervals.
Can you also help me with the code for that plot (or a box plot)?
You could filter out the rows whose names match the variable names in your data frame; the row names with factor levels appended (like X3A) will not match and so are kept:
result <- exp(cbind(OR = coef(model ), confint(model )))
result[!rownames(result) %in% names(sample_data),]
#> OR 2.5 % 97.5 %
#> X3A 0.01157565 0.001429430 0.08726854
#> X3B 0.06627849 0.008011818 0.54419759
#> X3C 0.01118084 0.001339984 0.08721028
#> X3D 0.01254032 0.001545240 0.09539880
#> X3E 0.10654454 0.013141540 0.87369972
To extract the necessary rows and plot them, the full reproducible code would be:
library(tidyverse)
sample_data <- read.table("http://freakonometrics.free.fr/db.txt",
header = TRUE, sep = ";")
model <- glm(Y ~ 0 + X1 + X2 + X3,family = binomial, data = sample_data)
result <- exp(cbind(OR = coef(model), confint(model)))
#> Waiting for profiling to be done...
result %>%
as.data.frame(check.names = FALSE) %>%
rownames_to_column(var = "Variable") %>%
filter(!Variable %in% names(sample_data)) %>%
ggplot(aes(x = OR, y = Variable)) +
geom_vline(xintercept = 1, linetype = 2) +
geom_errorbarh(aes(xmin = `2.5 %`, xmax = `97.5 %`), height = 0.1) +
geom_point(size = 2) +
scale_x_log10(name = "Odds ratio (log scale)") +
theme_minimal(base_size = 16)
Created on 2022-06-14 by the reprex package (v2.0.1)
One possibility, using broom to extract the coefficients, dplyr::filter to select the terms you want, and dwplot to plot.
library(broom)
library(dotwhisker)
library(dplyr)
tt <- (tidy(model, exponentiate = TRUE, conf.int = TRUE)
|> filter(stringr::str_detect(term, "^X3"))
)
dwplot(tt)
In addition, I would suggest:
library(ggplot2)
dwplot(tt) + scale_x_log10() + geom_vline(xintercept = 1, lty = 2) +
labs(x="Odds ratio")
To extract all but the first 2 rows, use a negative index on the rows.
I will also coerce to data.frame and add an id, it will be needed to plot the confidence intervals.
ORCI <- exp(cbind(OR = coef(model), confint(model)))[-(1:2), ]
ORCI <- cbind.data.frame(ORCI, id = row.names(ORCI))
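To go from there to a plot, here is a minimal sketch (assuming ggplot2 is loaded; depending on how R sanitises the column names produced by confint, the CI columns may come through as `2.5 %`/`97.5 %` or in an altered form, so they are renamed first — lower and upper are names introduced just for this sketch):

library(ggplot2)

# Sketch: rename the CI columns, then draw odds ratios with horizontal error bars.
names(ORCI)[2:3] <- c("lower", "upper")

ggplot(ORCI, aes(x = OR, y = id)) +
  geom_point() +
  geom_errorbarh(aes(xmin = lower, xmax = upper), height = 0.2) +
  geom_vline(xintercept = 1, linetype = "dashed") +
  scale_x_log10()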
I'm trying to do a power analysis on a clmm2 analysis that I'm doing.
This is the code for the particular statistical model:
test <- clmm2(risk_sensitivity ~ treat + sex + dispersal +
                sex*dispersal + treat*dispersal + treat*sex,
              random = id, data = datasocial, Hess = TRUE)
Now, I have the following function:
sim_experiment_power <- function(rep) {
  s <- sim_experiment(n_sample = 1000,
                      prop_disp = 0.10,
                      prop_fem = 0.35,
                      disp_probability = 0.75,
                      nondisp_probability = 0.90,
                      fem_probability = 0.75,
                      mal_probability = 0.90)
  broom.mixed::tidy(s) %>%
    mutate(rep = rep)
}
my_power <- map_df(1:10, sim_experiment_power)
The details of the function sim_experiment are not relevant here, because it works as expected. The important thing to know is that it returns a fitted clmm2 model. My objective with the function above is to do a power analysis. However, I get the following error:
Error: No tidy method for objects of class clmm2
I'm a bit new to R, but I guess it means that tidy doesn't work with clmm2. Does anyone know a work-around for this issue?
EDIT: This is what follows the code that I posted above, which is ultimately what I'm trying to get.
You can then plot the distribution of estimates across your simulations.
ggplot(my_power, aes(estimate, color = term)) +
geom_density() +
facet_wrap(~term, scales = "free")
You can also just calculate power as the proportion of p-values less than your alpha.
my_power %>%
  group_by(term) %>%
  summarise(power = mean(p.value < 0.05))
For what you need, you can write a function to return the coefficients with the same column name:
library(ordinal)
library(dplyr)
library(purrr)
tidy_output_clmm = function(fit){
  # Pull the coefficient table from the clmm2 summary and rename its columns
  # to match broom's tidy() output
  results = as.data.frame(coefficients(summary(fit)))
  colnames(results) = c("estimate", "std.error", "statistic", "p.value")
  results %>% tibble::rownames_to_column("term")
}
Then we apply it using an example where I sample the wine dataset in ordinal:
sim_experiment_power <- function(rep) {
  idx = sample(nrow(wine), replace = TRUE)
  s <- clmm2(rating ~ temp, random = judge, data = wine[idx, ], nAGQ = 10, Hess = TRUE)
  tidy_output_clmm(s) %>% mutate(rep = rep)
}
my_power <- map_df(1:10, sim_experiment_power)
Plotting works:
ggplot(my_power, aes(estimate, color = term)) +
geom_density() +
facet_wrap(~term, scales = "free")
And so does power:
my_power %>% group_by(term) %>% summarise(power = mean(p.value < 0.05))
# A tibble: 5 x 2
term power
<chr> <dbl>
1 1|2 0.9
2 2|3 0.1
3 3|4 1
4 4|5 1
5 tempwarm 1
I have a set of data which I've re-sampled. Is there a command I can use in R to smooth the data first, and only then create the graph from the resulting data frame?
My data has a lot of noise, and after re-sampling it I want to smooth it out. I used geom_smooth to produce a graphic of the data, but that command only creates the graphical representation of the smoothed data, without giving me the values of the points it represents.
library(ggplot2)
library(dplyr)
library(plotly)
df <- read.csv("data.csv", header = T)
str(df)
rs <- sample_n(df,715)
q <-
ggplot(df,aes(x,y)) +
geom_line() +
geom_smooth(method = "loess", formula = y~log(x), span = 0.05)
This is what I used to smooth out my data. I used loess with formula = y~log(x) and span = 0.05 because, out of all the smoothing methods I've tried, it gives the result closest to what I want, which is smoothing with the least difference from the original data. I apologize for not giving a reproducible example; I am not far enough into learning R to create random data. Any help is appreciated, thanks in advance.
You can fit a loess model to the data and then use predict to determine the points to plot.
library(tidyverse)
# Generate some noisy data
x <- seq(1,100)
y <- x + rnorm(100, sd = 20)
df <- tibble(x = x, y = y)
# plot with a smooth
df %>%
ggplot(aes(x,y)) +
geom_point() +
geom_smooth(method = "loess")
# Alternatively
m_loess <- loess(y ~ x, df) #fit a loess model
m_loess_pred <- predict(m_loess) # predict for each data point
df <- df %>% # add predictions to data frame for plotting
add_column(m_loess_pred)
df %>% # plot
ggplot(aes(x,m_loess_pred)) +
geom_point()
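As a side note (not part of the answer above), if you only need the exact points that geom_smooth drew, ggplot2's ggplot_build() exposes each layer's computed data. A minimal sketch, assuming the plot object q built in the question (where geom_line is the first layer and geom_smooth the second):

# Sketch: pull the x/y values that geom_smooth computed for plot object q.
smooth_values <- ggplot_build(q)$data[[2]]
head(smooth_values[, c("x", "y")])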
This answer is based on the data at imgur.com/a/L22T4BN
library(tidyverse)
# I've reproduced a subset of your data
df <- data.frame(Date = c('21/05/2019','21/05/2019','21/05/2019','21/05/2019','21/05/2019','21/05/2019','21/05/2019','21/05/2019','21/05/2019'),
Time24 = c('15:45:22',
'15:18:11',
'15:22:10',
'15:18:38',
'15:40:50',
'15:51:42',
'15:38:29',
'15:20:20',
'15:41:34'
),
MPM25 = c(46, 34, 57, 51, 31, 32,46,33,31))
glimpse(df)
# Variables: 4
# $ Date <fct> 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019
# $ Time24 <fct> 15:18:11, 15:18:38, 15:22:10, 15:40:50, 15:45:22, 15:51:42
# $ MPM25 <dbl> 34, 51, 57, 31, 46, 32
# $ datetime <dttm> 2019-05-21 15:18:11, 2019-05-21 15:18:38, 2019-05-21 15:22:10, 2019-0
# Note that the Date and Time24 are factors <fct>
# We can use these values to create a datetime object
# Also note the dates/times are out of order because they come from a random sample
df <- df %>%
mutate(datetime = str_c(as.character(Date), as.character(Time24), sep = ' ')) %>% # join date and time
mutate(datetime = lubridate::dmy_hms(datetime)) %>% # convert to datetime object
mutate(num_datetime = as.numeric(datetime)) %>% # numerical version of datetime required for loess fitting
arrange(datetime) # put times in order
# Take care with the time zone. The function dmy_hms defaults to UTC.
# You may need to use the timezone for your area e.g. for me it would be tz = 'australia/melbourne'
# we can then plot
df %>%
ggplot(aes(x = datetime, y = MPM25)) +
geom_point() +
geom_smooth(span = 0.9) # loess smooth
# fitting a loess
m_loess <- loess(MPM25 ~ num_datetime, data = df, span = 0.9) #fit a loess model
# Create predictions
date_seq <- seq(from = 1558451891, # 100 points from the first to the last datetime (numeric values of num_datetime)
to = 1558453902,
length.out = 100)
m_loess_pred <- predict(m_loess,
newdata = data.frame(num_datetime = date_seq))
# To plot the dates they need to be in POSIXct format
date_seq <- as.POSIXct(date_seq, tz = 'UTC', origin = "1970-01-01")
# Create a dataframe with times and predictions
df_predict <- data.frame(datetime = date_seq, pred = m_loess_pred)
# Plot to show that geom_smooth and the loess predictions are the same
df %>%
ggplot(aes(x = datetime, y = MPM25)) +
geom_point() +
geom_smooth(span = 0.9, se = FALSE) +
geom_point(data = df_predict, aes(x = datetime, y = pred) , colour = 'orange')
Introduction and Current Work Done
[Note: For those interested, I have provided code at the end for reproducing my example.]
I have some data and I have conducted an ANOVA analysis and obtained Tukey's pairwise comparisons:
model1 = aov(trt ~ grp, data = df)
anova(model1)
> TukeyHSD(model1)
diff lwr upr p adj
B-A 0.03481504 -0.40533118 0.4749613 0.9968007
C-A 0.36140489 -0.07874134 0.8015511 0.1448379
D-A 1.53825179 1.09810556 1.9783980 0.0000000
C-B 0.32658985 -0.11355638 0.7667361 0.2166301
D-B 1.50343674 1.06329052 1.9435830 0.0000000
D-C 1.17684690 0.73670067 1.6169931 0.0000000
I can also plot Tukey's pairwise comparisons
> plot(TukeyHSD(model1))
We can see from Tukey's confidence intervals and the plot that A-B, B-C and A-C are not significantly different.
Problem
I have been asked to create something called an "underscore plot" which is described as follows:
We plot the group means on the real line and we draw a line segment between group means to indicate that there is no significant difference between those two particular groups.
Obtaining the means is not difficult:
> aggregate(df$trt ~ df$grp, FUN = mean)
df$grp df$trt
1 A 2.032086
2 B 2.066901
3 C 2.393491
4 D 3.570338
Desired Output
Using the data in this example, the desired plot should appear like the one below:
There is a line segment between the groups that are not significantly different (i.e. a line segment between A-B, B-C and A-C as indicated by Tukey's).
Note: Please note that the plot above is not to scale and it was created in keynote for illustrative purposes only.
Is there a way to get the "underscore plot" described above using R (using either base R or a library such as ggplot2)?
Edit
Here is the code that I used to create the example above:
library(data.table)
set.seed(3)
A = runif(20, 1,3)
A = data.frame(A, rep("A", length(A)))
B = runif(20, 1.25,3.25)
B = data.frame(B, rep("B", length(B)))
C = runif(20, 1.5,3.5)
C = data.frame(C, rep("C", length(C)))
D = runif(20, 2.75,4.25)
D = data.frame(D, rep("D", length(D)))
df = list(A, B, C, D)
df = rbindlist(df)
colnames(df) = c("trt", "grp")
Here's a ggplot version of the underscore plot. We'll load the tidyverse package, which loads ggplot2, dplyr, and a few other related packages. We create one data frame of coefficients, used to plot the group names, coefficient values, and vertical tick segments, and one data frame of non-significant pairs, used to generate the horizontal underscores.
library(tidyverse)
model1 = aov(trt ~ grp, data=df)
# Get coefficients and label coefficients with names of levels
coefs = coef(model1)
coefs[2:4] = coefs[2:4] + coefs[1]
names(coefs) = levels(model1$model$grp)
# Get non-significant pairs
pairs = TukeyHSD(model1)$grp %>%
  as.data.frame() %>%
  rownames_to_column(var="pair") %>%
  # Keep only non-significant pairs
  filter(`p adj` > 0.05) %>%
  # Add coefficients to TukeyHSD results
  separate(pair, c("pair1","pair2"), sep="-", remove=FALSE) %>%
  mutate(start = coefs[match(pair1, names(coefs))],
         end = coefs[match(pair2, names(coefs))]) %>%
  # Stagger vertical positions of segments (one position per non-significant pair)
  mutate(ypos = seq(-0.03, -0.04, length.out = n()))
# Turn coefs into a data frame
coefs = enframe(coefs, name="grp", value="coef")
ggplot(coefs, aes(x=coef)) +
geom_hline(yintercept=0) +
geom_segment(aes(x=coef, xend=coef), y=0.008, yend=-0.008, colour="blue") +
geom_text(aes(label=grp, y=0.011), size=4, vjust=0) +
geom_text(aes(label=sprintf("%1.2f", coef)), y=-0.01, size=3, angle=-90, hjust=0) +
geom_segment(data=pairs, aes(group=pair, x=start, xend=end, y=ypos, yend=ypos),
colour="red", size=1) +
scale_y_continuous(limits=c(-0.05,0.04)) +
theme_void()
Base R
d1 = data.frame(TukeyHSD(model1)[[1]])
inds = which(sign(d1$lwr) * sign(d1$upr) <= 0)
non_sig = lapply(strsplit(row.names(d1)[inds], "-"), sort)
d2 = aggregate(df$trt ~ df$grp, FUN=mean)
graphics.off()
windows(width = 400, height = 200) # Windows-only graphics device; use dev.new() on other platforms
par("mai" = c(0.2, 0.2, 0.2, 0.2))
plot(d2$`df$trt`, rep(1, NROW(d2)),
     xlim = c(min(d2$`df$trt`) - 0.1, max(d2$`df$trt`) + 0.1), lwd = 2,
     type = "l",
     ann = FALSE, axes = FALSE)
segments(x0 = d2$`df$trt`,
         y0 = rep(0.9, NROW(d2)),
         x1 = d2$`df$trt`,
         y1 = rep(1.1, NROW(d2)),
         lwd = 2)
text(x = d2$`df$trt`, y = rep(0.8, NROW(d2)), labels = round(d2$`df$trt`, 2), srt = 90)
text(x = d2$`df$trt`, y = rep(0.75, NROW(d2)), labels = d2$`df$grp`)
lapply(seq_along(non_sig), function(i){
  lines(cbind(d2$`df$trt`[match(non_sig[[i]], d2$`df$grp`)], rep(0.9 - 0.01 * i, 2)))
})
I compare categorical data from three different groups.
I wonder if it is possible to easily add p-values of chi-squared tests to facet ggplots (since I am analyzing a big data set). I just read that there is a marvelous way to do so when comparing means https://www.r-bloggers.com/add-p-values-and-significance-levels-to-ggplots/. However, I could not find a solution for other tests (like the chisq.test in my case).
d.test <- data.frame(
results = sample(c("A","B","C"), 30, replace =TRUE),
test = sample(c("test1", "test2","test3"), 30, replace = TRUE)
)
chisq.test(d.test$results,d.test$test)
ggplot(d.test, aes(results) ) +
geom_bar() + facet_grid(test ~ .)
Many thanks for your help! ;D
Store your p-value in a variable
pval <- chisq.test(d.test$results,d.test$test)$p.value
Use annotate to plot text manually
ggplot(d.test, aes(results) ) +
geom_bar() + facet_grid(test ~ .) +
annotate("text", x=1, y=5, label=pval)
Change its positioning with x and y
ggplot(d.test, aes(results) ) +
geom_bar() + facet_grid(test ~ .) +
annotate("text", x=2, y=3, label=pval)
Change significant digits displayed with signif
ggplot(d.test, aes(results) ) +
geom_bar() + facet_grid(test ~ .) +
annotate("text", x=1, y=5, label=signif(pval,4))
Add a "p-value: " label with paste0:
ggplot(d.test, aes(results) ) +
geom_bar() + facet_grid(test ~ .) +
annotate("text", x=1, y=5, label=paste0("p-value: ", signif(pval,4)))
broom has methods to create tidy dataframes of most statistical test outputs. Then you can use that output as a data = argument within geom_text.
Generate data
library(broom)
library(dplyr)
library(ggplot2)
fakedata <-
data.frame(groups = sample(c("pop1", "pop2", "pop3", "pop4"), 120, replace = T),
results = sample(c("A","B","C"), 120, replace = TRUE),
test = sample(c("test1", "test2","test3"), 120, replace = TRUE))
Conduct and tidy tests
fakedata.test <-
fakedata %>%
group_by(groups) %>%
do(fit = chisq.test(.$results, .$test)) %>%
tidy(fit)
# A tibble: 4 x 5
# Groups: groups [4]
groups statistic p.value parameter method
<fctr> <dbl> <dbl> <int> <fctr>
1 pop1 3.714286 0.44605156 4 Pearson's Chi-squared test
2 pop2 2.321429 0.67687042 4 Pearson's Chi-squared test
3 pop3 2.294897 0.68169829 4 Pearson's Chi-squared test
4 pop4 10.949116 0.02714188 4 Pearson's Chi-squared test
Visualize
fakedata %>%
ggplot(aes(results, test)) +
geom_jitter(width = 0.2, height = 0.2, shape = 1, size = 2) +
geom_text(data = fakedata.test,
aes(3, 3.5,
label = paste0("χ²(", parameter, ")=", round(statistic, 2), "; p=", round(p.value, 2))),
hjust = 1) +
facet_wrap(~groups)
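If the literal χ² character gives you encoding trouble, one hedged alternative (not part of the original answer) is to build plotmath labels and let geom_text parse them:

# Sketch: plotmath labels avoid relying on the Unicode chi-squared character.
fakedata.test <- fakedata.test %>%
  mutate(label = sprintf("list(chi^2 == %.2f, df == %.0f, p == %.2f)",
                         statistic, parameter, p.value))

fakedata %>%
  ggplot(aes(results, test)) +
  geom_jitter(width = 0.2, height = 0.2, shape = 1, size = 2) +
  geom_text(data = fakedata.test, aes(3, 3.5, label = label),
            hjust = 1, parse = TRUE) +
  facet_wrap(~groups)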