data=data.frame("grade"=c(1, 2, 3, 1, 2, 3),
"class"=c('a', 'a', 'a', 'b', 'b', 'b'),
"size"=c(1, 1, 2, 2, 2, 1),
"var"=c('q33', 'q35', 'q39', 'q33', 'q35', 'q39'),
"score"=c(5, 8, 7, 3, 7, 5))
My data have many grouping variables.
First, I want to plot just 'score' by 'grade' with a line:
library(ggplot2)
ggplot(data, aes(x = grade, y = score)) + geom_line()
It gives a strange-looking graph because 'grade' is repeated across different classes and sizes, so the single line zigzags through every score at each grade.
If I take a subset of my data then the graph looks ok.
ggplot(subset(data, size == 1), aes(x = grade, y = score)) + geom_line()
So how can I plot 'score' by 'grade' for ALL combinations without the graph combining all the values into a single line?
Here is one approach. You can plot score vs. grade and use stat_summary to add a line through the mean at each grade, plus a ribbon showing the 95% confidence interval. Is this what you had in mind?
library(ggplot2)
# mean_cl_normal() wraps Hmisc::smean.cl.normal(), so the Hmisc package must be installed
ggplot(data = data, mapping = aes(x = grade, y = score)) +
  stat_summary(geom = "line", fun = mean, linetype = "dashed") +
  stat_summary(geom = "ribbon", fun.data = mean_cl_normal, fun.args = list(conf.int = 0.95), alpha = .1) +
  scale_x_continuous(breaks = unique(data$grade))
Alternatively, you can plot a point for the mean at each grade, with standard-error bars.
library(tidyverse)
data %>%
group_by(grade) %>%
summarise(mean_score = mean(score),
SD = sd(score),
n = n(),
SE = SD/sqrt(n)) %>%
ggplot(mapping = aes(x = grade, y = mean_score)) +
geom_point() +
geom_line() +
geom_errorbar(aes(ymin = mean_score - SE, ymax = mean_score + SE), width = .1) +
scale_x_continuous(breaks = data$grade)
You could also use facet_wrap(~ class + size); this gives one panel per combination of class and size.
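A minimal sketch of the facet_wrap() suggestion, reusing the `data` frame from the question. Some class/size combinations contain only one grade, so a geom_point() layer is added to keep those panels from looking empty; that extra layer is an illustrative addition, not part of the original suggestion.

```r
library(ggplot2)

data <- data.frame(grade = c(1, 2, 3, 1, 2, 3),
                   class = c("a", "a", "a", "b", "b", "b"),
                   size  = c(1, 1, 2, 2, 2, 1),
                   var   = c("q33", "q35", "q39", "q33", "q35", "q39"),
                   score = c(5, 8, 7, 3, 7, 5))

# One panel per class/size combination; inside a panel each grade occurs once,
# so geom_line() no longer jumps between unrelated scores at the same grade
p <- ggplot(data, aes(x = grade, y = score)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ class + size) +
  scale_x_continuous(breaks = unique(data$grade))
print(p)
```

Panels that hold a single observation will trigger a geom_line() message about one-observation groups; the point layer still shows the value.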
For each treatment (tmt), I want to plot the means using stat_summary in ggplot2 with a different colour and size. I find that multiple means are being plotted over the existing points. I'm not sure how to fix this.
library(ggplot2)
df <- data.frame(x = rnorm(12, 4,1), y = rnorm(12, 6,4), tmt = rep(c("A","B","C"), each = 4))
ggplot(aes(x = x, y = y, fill = tmt), data = df) +
geom_point(shape=21, size=5, alpha = 0.6) +
scale_fill_manual(values=c("pink","blue", "purple")) +
stat_summary(aes(fill = tmt), fun = 'mean', geom = 'point', size = 5) +
scale_fill_manual(values=c("pink","blue", "purple"))
Plot without the last two lines of code
Plot with the entire code
With stat_summary you compute the mean of y for each combination of x and tmt; since every x value is unique here, each "mean" is just the original point. If you want the mean of x and the mean of y per tmt, I would suggest computing the means manually outside of ggplot and adding them with a second geom_point. In my code below I increased the size and used squares (shape 22) for the means:
df <- data.frame(x = rnorm(12, 4,1), y = rnorm(12, 6,4), tmt = rep(c("A","B","C"), each = 4))
library(ggplot2)
library(dplyr)
df_mean <- df |>
group_by(tmt) |>
summarise(across(c(x, y), mean))
ggplot(aes(x = x, y = y, fill = tmt), data = df) +
geom_point(shape=21, size=5, alpha = 0.6) +
geom_point(data = df_mean, shape=22, size=8, alpha = 0.6) +
scale_fill_manual(values=c("pink","blue", "purple"))
In my dat2 file I have a series of body measurements for various subjects. These body measurements are bicep, tricep, suprailiac, subscapular, ..., ffm (fat-free mass). I took these measurements at 0, 6 and 12 weeks. The subjects were on either treatment 'a' or treatment 'b'.
I would like to create multiple graphs for both treatments where Y = Body Measurement and X = Time. I know how to do this for a single variable, such as "weight".
line <- ggplot(dat2, aes(time, weight))
line + stat_summary(fun = mean, geom = "point") +
stat_summary(fun = mean, geom = "line", aes(group = 1)) +
stat_summary(fun.data = mean_cl_normal, geom = "errorbar", width = 0.2) +
facet_wrap(~ treatment, nrow = 4) + labs(x = "Time", y = "Weight") +
scale_x_continuous(expand = c(0, 0), breaks = c(0, 6, 12))
I tried the following code to do this for multiple variables and to print multiple graphs.
dat2 %>% select(code:solidfatstotallog, time, bicep, tricep, subscapular, suprailiac, weight, wc, bia, bmi, wthr, X.fat, fm, ffm, height, density)
dat2 %>%
select(code, treatment, time, bicep:ffm) %>%
gather(body_measure, value, -c(code, treatment, time)) %>%
ggplot(aes(time, body_measure)) +
stat_summary(fun = mean, geom = "point") +
stat_summary(fun = mean, geom = "line", aes(group = 1)) +
stat_summary(fun.data = mean_cl_normal, geom = "errorbar", width = 0.2) +
facet_wrap(~ treatment, nrow = 4) + labs(x = "Time", y = "Body_Measure") +
scale_x_continuous(expand = c(0, 0), breaks = c(0, 6, 12))
However, this just creates one graph of Y = weight and X = Time, only with the y label changed to body_measure. What am I missing here?
See the example picture of Weight by Time. I would like to create multiple such graphs, ideally on a single page.
So I was able to create the plots with facet_grid(), but the result is not nice to look at. I used the following code:
dat2 %>%
select(code, treatment, time, bicep:ffm, -c(density, height)) %>% #drop irrelevant variables
gather(measure, score, -c(code, treatment, time)) %>% #data into long form
ggplot(aes(time, score)) +
stat_summary(fun = mean, geom = "point") +
stat_summary(fun = mean, geom = "line", aes(group = 1)) +
stat_summary(fun.data = mean_cl_normal, geom = "errorbar", width = 0.2) +
facet_grid(treatment ~ measure) + labs(x = "Time", y = "Body_Measure") +
scale_x_continuous(expand = c(0, 0), breaks = c(0, 6, 12))
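One possibly tidier layout is a sketch like the one below: a single panel per measurement with facet_wrap(scales = "free_y"), so each measure gets its own y range, and the two treatments drawn in the same panel and distinguished by colour. Assumptions to note: dat2 is not shown, so the data here are invented stand-ins with only three measures; pivot_longer() replaces gather(); and mean_se() is swapped in for mean_cl_normal() so the example runs without Hmisc.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for dat2 (the real file is not shown): 10 subjects
# measured at 0, 6 and 12 weeks, with invented bicep/tricep/weight values
set.seed(1)
dat2 <- expand.grid(code = 1:10, time = c(0, 6, 12)) %>%
  mutate(treatment = ifelse(code <= 5, "a", "b"),
         bicep  = rnorm(n(), 10, 2),
         tricep = rnorm(n(), 15, 3),
         weight = rnorm(n(), 70, 8))

# Long form: one row per subject, time point and measurement
long <- dat2 %>%
  pivot_longer(c(bicep, tricep, weight), names_to = "measure", values_to = "score")

# One panel per measurement, each with its own y range; treatments share a
# panel and are told apart by colour
p <- ggplot(long, aes(time, score, colour = treatment, group = treatment)) +
  stat_summary(fun = mean, geom = "point") +
  stat_summary(fun = mean, geom = "line") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.5) +
  facet_wrap(~ measure, scales = "free_y") +
  scale_x_continuous(breaks = c(0, 6, 12)) +
  labs(x = "Time (weeks)", y = "Body measurement")
print(p)
```

With many measures, adding ncol = or nrow = to facet_wrap() controls how the panels fill a single page.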
I have a plot with 4 groups (bars 1 + 2, bars 3 + 4, etc.), but nothing in the dataset indicates these groups. How can I manually add space between the groups, add a shared group label on the x-axis, and recolour the 4 groups?
Data (melted):
Current ggplot code:
library(reshape2)  # provides melt()
library(ggplot2)
xdr <- melt(result)
ggplot(aes(x = variable, y = value), data = xdr) +
stat_summary(fun = "mean", geom = "bar") +
coord_cartesian(ylim = c(0.6, 0.85)) +
stat_summary(fun.data = mean_se, geom = "errorbar")
I think you'll want to use mutate() to add the groups, in whatever way is appropriate for your data, and then facet_wrap() to make the kind of subgraphs you are talking about.
library(tidyverse)
df <- tribble(
~variable, ~value,
"baseline1", 0.730,
"baseline2", 0.521,
"baseline3", 0.762,
"baseline4", 0.655,
"baseline5", 0.604,
"baseline6", 0.710,
"baseline7", 0.528,
"baseline8", 0.172
)
df %>%
mutate(group = (row_number() + 1) %/% 2,
group = paste("Group", group)) %>%
ggplot(aes(variable, value, fill = group)) +
geom_col(show.legend = FALSE) +
facet_wrap(~group, nrow = 1, scales = "free_x")
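To address the other two parts of the question (a shared group label on the x-axis and extra space between groups), one option is a sketch like this: move the facet strips below the panels with strip.position = "bottom" and theme(strip.placement = "outside"), widen the gaps with panel.spacing, and pick the group colours by hand with scale_fill_manual(). The colour names are arbitrary illustrative choices, and the data are the same eight values as above.

```r
library(dplyr)
library(ggplot2)

df <- tibble::tribble(
  ~variable, ~value,
  "baseline1", 0.730,
  "baseline2", 0.521,
  "baseline3", 0.762,
  "baseline4", 0.655,
  "baseline5", 0.604,
  "baseline6", 0.710,
  "baseline7", 0.528,
  "baseline8", 0.172
)

p <- df %>%
  mutate(group = paste("Group", (row_number() + 1) %/% 2)) %>%
  ggplot(aes(variable, value, fill = group)) +
  geom_col(show.legend = FALSE) +
  # Strips drawn below the panels so they act as shared x-axis group labels
  facet_wrap(~ group, nrow = 1, scales = "free_x", strip.position = "bottom") +
  # One hand-picked colour per group (illustrative choices)
  scale_fill_manual(values = c("steelblue", "darkorange", "forestgreen", "purple")) +
  theme(strip.placement = "outside",                  # strips outside the tick labels
        panel.spacing = grid::unit(1, "lines")) +     # extra space between groups
  labs(x = NULL)
print(p)
```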
When using ggplot2 to make faceted plots, I'm having trouble getting individual labels in each facet when I also specify a grouping parameter. Without specifying group = ..., things work fine, but I'm trying to make plots of paired data that emphasize the before vs. after treatment changes.
Here is an example:
library(tidyr)
library(ggplot2)
set.seed(253)
data <- data.frame(Subject = LETTERS[1:10],
Day1.CompoundA = rnorm(10, 4, 2),
Day2.CompoundA = rnorm(10, 7, 2),
Day1.CompoundB = rnorm(10, 5, 2),
Day2.CompoundB = rnorm(10, 5.5, 2))
# Compare concentration of compounds by day
A <- t.test(data$Day1.CompoundA, data$Day2.CompoundA, paired = TRUE)
B <- t.test(data$Day1.CompoundB, data$Day2.CompoundB, paired = TRUE)
data.long <- gather(data, key = DayCompound, value = Concentration, -Subject) %>%
separate(DayCompound, c("Day", "Compound"))
# text to annotate graphs
graphLabels <- data.frame(Compound = c("CompoundA", "CompoundB"),
Pval = paste("p =", c(signif(A$p.value, 2),
signif(B$p.value, 2))))
OK, now that the data are set up, I can make a boxplot just fine:
ggplot(data.long, aes(x = Day, y = Concentration)) +
geom_boxplot() +
facet_wrap(~ Compound) +
geom_text(data = graphLabels, aes(x = 1.5, y = 10, label = Pval))
But if I want to show line plots that emphasize the paired nature of the data by showing each subject in a different color, the facet labels don't work.
ggplot(data.long, aes(x = Day, y = Concentration, color = Subject, group = Subject)) +
geom_point() + geom_line() +
facet_wrap(~ Compound) +
geom_text(data = graphLabels, aes(x = 1.5, y = 10, label = Pval))
# Error in eval(expr, envir, enclos) : object 'Subject' not found
Any suggestions?
When you map aesthetics (e.g. aes(..., color = Subject)) in the top-level ggplot() call, those mappings are inherited by every layer, so each layer expects its data to contain variables with those names. graphLabels has no Subject column, hence the error.
You either need to specify the data and mapping separately in each layer, or unmap them explicitly:
ggplot(data.long, aes(x = Day, y = Concentration, color = Subject, group = Subject)) +
geom_point() + geom_line() +
facet_wrap(~ Compound) +
geom_text(data = graphLabels, aes(x = 1.5, y = 10, label = Pval, color = NULL, group = NULL))
There is also an inherit.aes argument that you can set to FALSE in any layer you don't want pulling in those other mappings, e.g.
ggplot(data.long, aes(x = Day, y = Concentration, color = Subject, group = Subject)) +
geom_point() + geom_line() +
facet_wrap(~ Compound) +
geom_text(data = graphLabels, aes(x = 1.5, y = 10, label = Pval), inherit.aes = FALSE)
Is there a way to place horizontal lines with the group means on a plot without creating the summary data set ahead of time? I know this works, but I feel there must be a way to do this with just ggplot2.
library(dplyr)
library(ggplot2)
library(cowplot)  # provides background_grid()
X <- tibble(
x = rep(1:5, 3),
y = c(rnorm(5, 5, 0.5),
rnorm(5, 3, 0.3),
rnorm(5, 4, 0.7)),
grp = rep(LETTERS[1:3], each = 5))
X.mean <- X %>%
group_by(grp) %>%
summarize(y = mean(y))
X %>%
ggplot(aes(x = x, y = y, color = grp)) +
geom_point(shape = 19) +
geom_hline(data = X.mean, aes(group = grp, yintercept = y, color = grp)) +
background_grid()
Expanding on my comment:
ggplot(X, aes(x = x, y = y, color = grp)) +
geom_point(shape = 19) +
stat_smooth(method="lm", formula=y~1, se=FALSE)+
theme_bw()
So this fits a linear model with only the constant term, which returns the mean of each group. Credit to this answer for the basic idea.
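Why the intercept-only model gives the mean: with no predictors, least squares picks the constant c that minimizes the sum of (y - c)^2, and that minimizer is mean(y). A quick check in R, using simulated values that are illustrative only:

```r
set.seed(42)
y <- rnorm(5, 5, 0.5)

# Intercept-only model: the single fitted coefficient is the least-squares constant
fit <- lm(y ~ 1)

# The estimated intercept equals the sample mean (up to floating-point tolerance)
print(all.equal(unname(coef(fit)), mean(y)))
```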
Edit: Response to OP's very clever suggestion.
It looks like you can use quantile regression to generate the medians!
library(quantreg)
ggplot(X, aes(x = x, y = y, color = grp)) +
geom_point(shape = 19) +
stat_smooth(method="rq", formula=y~1, se=FALSE)+
theme_bw()
The basic requirement for stat_smooth(method = ...) is that the method returns an object that has a predict() method. Here rq() returns an object of class rq, and quantreg provides predict.rq(). Setting se = TRUE can sometimes cause trouble, because not all predict methods return standard errors of the estimates.
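A different way to draw the group means without building a summary data set beforehand, sketched under the assumption of a reasonably recent ggplot2: a layer's data argument may be a function, which ggplot2 calls with the plot data. Here aggregate() computes the per-group means inside the geom_hline() call. geom_hline() does not inherit the top-level aesthetics, so color = grp is mapped again, as in the question's code. The data are simulated as in the question, with a seed added for reproducibility.

```r
library(ggplot2)

set.seed(1)
X <- data.frame(
  x = rep(1:5, 3),
  y = c(rnorm(5, 5, 0.5), rnorm(5, 3, 0.3), rnorm(5, 4, 0.7)),
  grp = rep(LETTERS[1:3], each = 5))

p <- ggplot(X, aes(x = x, y = y, color = grp)) +
  geom_point(shape = 19) +
  # data is a function: ggplot2 passes it the plot data (X) and draws one
  # horizontal line per row of the returned data frame of group means
  geom_hline(data = function(d) aggregate(y ~ grp, data = d, FUN = mean),
             aes(yintercept = y, color = grp)) +
  theme_bw()
print(p)
```

Unlike the stat_smooth() trick, this draws lines across the full width of the panel, exactly like the question's geom_hline() version.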