Wrong location of errorbars - r

I am trying to make a plot with multiple lines, with stat_summary marking the mean values. When I add geom_errorbar(), some of the error bars are drawn at a distance from the mean points, so they appear to be 'flying'. What is happening?
Thanks!
My code:
#First I add another data set with SE, SD and mean.
cdata <- ddply(data2, c("OGTT","Treatment"), summarise,
N = sum(!is.na(Glucose)),
mean = mean(Glucose, na.rm=TRUE),
sd = sd(Glucose, na.rm=TRUE),
se = sd / sqrt(N))
#Then I merge it with my original data
totalglu<-merge(data2,cdata)
#Then I make the ggplot
p<-ggplot(data=totalglu, aes(x = factor(OGTT), y = Glucose, group = StudyID, color=StudyID)) +
geom_line() +
facet_grid(End.start ~Treatment)+
stat_summary(aes(group = Treatment), geom = "point", fun.y = mean, shape = 16, size = 2) +
theme(legend.position="none") +
labs(x = "OGTT time points (min)",y= "Glucose (mmol/l)")+
geom_errorbar(aes(ymin=mean-se,ymax=mean+se), width=.1, colour="black")
p

It appears that you are not using End.start when calculating the bars, but it is being used by stat_summary because of the faceting.
Try:
cdata <- ddply(data2, c("OGTT","Treatment","End.start"), summarise,
N = sum(!is.na(Glucose)),
mean = mean(Glucose, na.rm=TRUE),
sd = sd(Glucose, na.rm=TRUE),
se = sd / sqrt(N))
#Then I merge it with my original data
totalglu<-merge(data2,cdata)
#Then I make the ggplot
p<-ggplot(data=totalglu, aes(x = factor(OGTT), y = Glucose, group = StudyID, color=StudyID)) +
geom_line() +
facet_grid(End.start ~Treatment)+
stat_summary(aes(group = Treatment), geom = "point", fun.y = mean, shape = 16, size = 2) +
theme(legend.position="none") +
labs(x = "OGTT time points (min)",y= "Glucose (mmol/l)")+
geom_errorbar(aes(ymin=mean-se,ymax=mean+se), width=.1, colour="black")
p
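Without the actual starting data, though, I cannot be sure what data2 looks like or how ddply is affecting things. Instead, I would suggest skipping cdata altogether and letting stat_summary compute the interval from the original data. Note that mean_cl_normal needs the Hmisc package and draws a 95% confidence interval; use fun.data = mean_se if you want mean ± SE:
ggplot(data=data2, aes(x = factor(OGTT), y = Glucose, group = StudyID, color=StudyID)) +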
geom_line() +
facet_grid(End.start ~Treatment)+
stat_summary(aes(group = Treatment), fun.data = mean_cl_normal) +
theme(legend.position="none") +
labs(x = "OGTT time points (min)",y= "Glucose (mmol/l)")
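To see why the bars can drift away from the points, here is a small base R sketch with made-up numbers (the toy data frame and its values are hypothetical, not the asker's data). A summary that ignores End.start yields one pooled mean, whereas stat_summary computes one mean per facet panel:

```r
# Toy data (hypothetical values): one time point, one treatment, two facet rows
toy <- data.frame(
  OGTT      = c(0, 0, 0, 0),
  Treatment = c("A", "A", "A", "A"),
  End.start = c("start", "start", "end", "end"),
  Glucose   = c(4, 6, 8, 10)
)

# Summary that ignores End.start: a single pooled mean shared by both panels
pooled <- aggregate(Glucose ~ OGTT + Treatment, data = toy, FUN = mean)

# Summary that matches the faceting: one mean per panel
per_panel <- aggregate(Glucose ~ OGTT + Treatment + End.start,
                       data = toy, FUN = mean)

pooled$Glucose                                       # 7
per_panel$Glucose[per_panel$End.start == "end"]      # 9
per_panel$Glucose[per_panel$End.start == "start"]    # 5
```

The pooled mean (7) matches neither panel mean (9 and 5), so error bars centred on it would sit away from the points in both panels.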

Related

Connect medians with lines in dodged boxplots within subgroups?

In the code below, which uses an interaction of two variables to define a group, why does ggplot2 correctly plot separate points for the medians, but complain that each group consists of only one observation when I try to plot lines connecting the medians within each group?
How can I do this?
dat <-
data.frame(method = c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B"),
var = c("X01","X01","X01","X01","X01","X02","X02","X02","X02","X02","X01","X01","X01","X01","X01","X02","X02","X02","X02","X02"),
value = c(-1.1,0.3,0.3,-0.6,-0.2,0.7,1.2,0.2,0.3,1.8,0.5,1.1,-0.8,0.8,0.6,1.0,0.5,0.6,1.5,1.3))
ggplot(dat, aes(var, value)) +
geom_boxplot(aes(fill = method),
position = position_dodge(width = .9),
outlier.color = NA) +
stat_summary(
fun = median,
geom = "point",
aes(group = interaction(method, var)),
position = position_dodge(width = .9)
) +
stat_summary(
fun = median,
geom = "line",
aes(group = interaction(method, var)),
position = position_dodge(width = .9)
)
geom_path: Each group consists of only one observation. Do you need to adjust
the group aesthetic?
Desired output:
Because ggplot2 thinks the groups are a combination of method and var. It doesn't know that after you've calculated the median, you'd like to drop the method part of the grouping. I don't think there is an elegant way to get the lines as you want.
Below is an inelegant but pragmatic solution: you first calculate the medians, then feed that layer data to a new line geom.
library(ggplot2)
dat <-
data.frame(method = c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B"),
var = c("X01","X01","X01","X01","X01","X02","X02","X02","X02","X02","X01","X01","X01","X01","X01","X02","X02","X02","X02","X02"),
value = c(-1.1,0.3,0.3,-0.6,-0.2,0.7,1.2,0.2,0.3,1.8,0.5,1.1,-0.8,0.8,0.6,1.0,0.5,0.6,1.5,1.3))
p <- ggplot(dat, aes(var, value)) +
geom_boxplot(aes(fill = method),
position = position_dodge(width = .9),
outlier.color = NA) +
stat_summary(
fun = median,
geom = "point",
aes(group = interaction(method, var)),
position = position_dodge(width = .9)
)
p + geom_line(
data = layer_data(p, 2),
aes(x = x, group = round(x), y = y)
)
Created on 2022-08-14 by the reprex package (v2.0.1)
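An alternative sketch, not part of the answer above: compute the medians yourself and place them at the dodged positions by hand. This assumes two methods and a dodge width of 0.9, which puts each group 0.225 to either side of the category centre; the names meds, offset and xpos are made up for illustration.

```r
library(ggplot2)

dat <- data.frame(
  method = rep(c("A", "B"), each = 10),
  var    = rep(rep(c("X01", "X02"), each = 5), times = 2),
  value  = c(-1.1, 0.3, 0.3, -0.6, -0.2, 0.7, 1.2, 0.2, 0.3, 1.8,
             0.5, 1.1, -0.8, 0.8, 0.6, 1.0, 0.5, 0.6, 1.5, 1.3)
)

# One median per method/var combination
meds <- aggregate(value ~ method + var, data = dat, FUN = median)

# Assumed dodge geometry: 2 groups, width 0.9 -> offsets of -0.225 and +0.225
offset <- c(A = -0.225, B = 0.225)
meds$xpos <- unname(as.numeric(factor(meds$var)) +
                      offset[as.character(meds$method)])

p <- ggplot(dat, aes(var, value)) +
  geom_boxplot(aes(fill = method),
               position = position_dodge(width = 0.9),
               outlier.colour = NA) +
  geom_point(data = meds, aes(x = xpos, y = value)) +
  # group by var so each line joins the two medians within one category
  geom_line(data = meds, aes(x = xpos, y = value, group = var))
p
```

The trade-off is that the offsets are hard-coded, so they have to be recomputed if the number of methods or the dodge width changes, whereas layer_data() picks up the positions ggplot2 actually used.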

Violin plot with confidence interval in r

How can I add a confidence interval to this violin plot?
df <- data.frame("Need" = c(3,4.3,4.5,2.2,5.1,5.2), "Condition" = c("A","A","A","B","B","B"))
ggplot(df,aes(x = Condition, y = Need, fill=Condition)) +
geom_violin() +
stat_summary(fun.data = "mean_cl_boot", geom = "pointrange",
colour = "red") +
ggtitle("Needs by condition violin plot")
I can't attach pictures yet, but you get the gist. With this code I can create violin plots with standard deviation lines for each violin, but I'd like to add 95% confidence interval lines.
Any ideas?
What you can do is first calculate the error bars per condition and after that add them by using geom_errorbar like this:
library(tidyverse)
stats <- df %>%
group_by(Condition) %>%
summarise(N = n(), Mean = mean(Need), SD = sd(Need),
# use the per-condition sample size, not the total row count
CI_L = Mean - (SD * 1.96)/sqrt(N),
CI_U = Mean + (SD * 1.96)/sqrt(N))
ggplot() +
geom_violin(df, mapping = aes(x = Condition, y = Need, fill=Condition)) +
stat_summary(fun.data = "mean_cl_boot", geom = "pointrange",
colour = "red") +
geom_point(stats, mapping = aes(Condition, Mean)) +
geom_errorbar(stats, mapping = aes(x = Condition, ymin = CI_L, ymax = CI_U), width = 0.2) +
ggtitle("Needs by condition violin plot")
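A variation on the interval itself, offered as a sketch rather than a correction: with only three observations per condition, the normal multiplier 1.96 gives a narrower interval than a t-based one. The helper ci_t below is a made-up name; it uses the t quantile with n - 1 degrees of freedom and only base R.

```r
df <- data.frame(Need = c(3, 4.3, 4.5, 2.2, 5.1, 5.2),
                 Condition = c("A", "A", "A", "B", "B", "B"))

# Mean and t-based confidence limits for one numeric vector
ci_t <- function(x, level = 0.95) {
  n    <- length(x)
  se   <- sd(x) / sqrt(n)
  half <- qt(1 - (1 - level) / 2, df = n - 1) * se
  c(mean = mean(x), lower = mean(x) - half, upper = mean(x) + half)
}

# One row per condition, columns mean / lower / upper
stats_t <- do.call(rbind, lapply(split(df$Need, df$Condition), ci_t))
stats_t
```

The resulting matrix can be converted with as.data.frame() and passed to geom_errorbar() in the same way as the stats table above.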

How do you replace the points on a box plot with the point's corresponding row number index?

I have a data frame that looks like this:
Train_Table_Time_Power <- data.frame(
Skew = runif(250),
Crest = runif(250),
Kurt = runif(250),
Impulse = runif(250),
TI = sample(c("0.05", "0.10", "0.15", "0.20"), 10, replace = TRUE)
)
I then created a box plot with points using the code below:
Crest_BoxPlot <- ggplot(Train_Table_Time_Power, aes(x = TI, y = Crest, color = TI)) +
geom_boxplot(notch = T, id=TRUE) +
stat_summary(fun = mean, geom="point", shape=19, color="red", size=2) +
geom_jitter(shape=16, position = position_jitter(0.2), size = 0.3) +
labs(title = "Crest_Time", x = "TI", y = "Normalized Magnitude") +
theme_minimal() + theme_Publication()
I would like to somehow have the individual points of the boxplot be replaced with their row number index, however, I can't seem to figure out a way how. Could someone direct me on how to do this, if it is indeed possible?
Just use geom_text() instead of geom_jitter(), but be aware that readability is limited because the labels overlap.
# add the row number as column
library(tibble)
Train_Table_Time_Power <- rowid_to_column(Train_Table_Time_Power)
ggplot(Train_Table_Time_Power, aes(x = TI, y = Crest, color = TI, label = rowid)) +
geom_boxplot(notch = T, id=TRUE) +
stat_summary(fun = mean, geom="point", shape=19, color="red", size=2) +
geom_text(position = position_jitter(0.2)) +
labs(title = "Crest_Time", x = "TI", y = "Normalized Magnitude") +
theme_minimal()
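If the overlapping labels are a problem, one option is to label only the outliers. The sketch below is an assumption-laden illustration: the toy data and the helper flag_outliers are made up, and boxplot.stats() uses hinges that can differ slightly from the quartiles ggplot2 computes, so the flagged rows may not always match the plotted outliers exactly.

```r
library(ggplot2)

# Hypothetical toy data: group "a" contains one clear outlier (row 6)
toy <- data.frame(
  TI    = rep(c("a", "b"), each = 6),
  Crest = c(1, 2, 3, 4, 5, 50, 2, 3, 4, 5, 6, 7)
)
toy$rowid <- seq_len(nrow(toy))   # base R equivalent of rowid_to_column()

# Flag values that boxplot.stats() reports as outliers within each TI group
flag_outliers <- function(x) x %in% boxplot.stats(x)$out
toy$outlier <- as.logical(ave(toy$Crest, toy$TI, FUN = flag_outliers))

p <- ggplot(toy, aes(x = TI, y = Crest, colour = TI)) +
  geom_boxplot(outlier.shape = NA) +                         # hide default outlier dots
  geom_text(data = toy[toy$outlier, ], aes(label = rowid))   # print row numbers instead
p
```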

Plot standard deviation

I want to plot the standard deviation for one line (one flow series; the plot will have two) as lines or as a smooth shaded area. I've seen and applied some code from other examples of SD representation, but it's not working for me.
My original data has several flow values for the same day, from which I've calculated the daily mean and sd. I'm stuck here: I don't know whether it is possible to represent the daily sd with lines from the column I created (called "sd"), or whether I should use the original data.
The code below is a general example of what I'll apply to my data. The columns flow, flow1 and sd stand in for the daily mean and sd calculated from the original data.
library(gridExtra)
library(ggplot2)
library(grid)
x <- data.frame(
date = seq(as.Date("2012-01-01"),as.Date("2012-12-31"), by="week"),
rain = sample(0:20,53,replace=T),
flow1 = sample(50:150,53,replace=T),
flow = sample(50:200,53,replace=T),
sd = sample (0:10,53, replace=T))
g.top <- ggplot(x, aes(x = date, y = rain, ymin=0, ymax=rain)) +
geom_linerange() +
scale_y_continuous(limits=c(22,0),expand=c(0,0), trans="reverse")+
theme_classic() +
theme(plot.margin = unit(c(5,5,-32,6),units="points"),
axis.title.y = element_text(vjust = 0.3))+
labs(y = "Rain (mm)")
g.bottom <- ggplot(x, aes(x = date)) +
geom_line(aes(y = flow, colour = "flow")) +
geom_line(aes(y = flow1, colour = "flow1")) +
stat_summary(geom="ribbon", fun.ymin="min", fun.ymax="max", aes(fill=sd), alpha=0.3) +
theme_classic() +
theme(plot.margin = unit(c(0,5,1,1),units="points"),legend.position="bottom") +
labs(x = "Date", y = "River flow (m/s)")
grid.arrange(g.top, g.bottom , heights = c(1/5, 4/5))
The above code gives Error: stat_summary requires the following missing aesthetics: y
Another option is geom_smooth, but as far as I understand it requires some line equation (I may be wrong, I'm new to R).
Something like this maybe?
library(dplyr)  # select(), mutate(), %>%
library(tidyr)  # gather()
g.bottom <- x %>%
select(date, flow1, flow, sd) %>%
gather(key, value, c(flow, flow1)) %>%
mutate(min = value - sd, max = value + sd) %>%
ggplot(aes(x = date)) +
geom_ribbon(aes(ymin = min, ymax = max, fill = key)) +
geom_line(aes(y = value, colour = key)) +
scale_fill_manual(values = c("grey", "grey")) +
theme_classic() +
theme(plot.margin = unit(c(0,5,1,1),units="points"),legend.position="bottom") +
labs(x = "Date", y = "River flow (m/s)")
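Since the question asks for the band around only one of the two series, a simpler sketch is to skip the reshaping and give geom_ribbon() its limits directly. This assumes the sd column belongs to flow; the fill colour and alpha are arbitrary choices.

```r
library(ggplot2)

set.seed(1)  # only so the toy numbers are reproducible
dates <- seq(as.Date("2012-01-01"), as.Date("2012-12-31"), by = "week")
n <- length(dates)
x <- data.frame(
  date  = dates,
  flow1 = sample(50:150, n, replace = TRUE),
  flow  = sample(50:200, n, replace = TRUE),
  sd    = sample(0:10, n, replace = TRUE)
)

g.bottom <- ggplot(x, aes(x = date)) +
  # band of flow +/- sd, drawn first so the lines sit on top of it
  geom_ribbon(aes(ymin = flow - sd, ymax = flow + sd),
              fill = "grey70", alpha = 0.5) +
  geom_line(aes(y = flow, colour = "flow")) +
  geom_line(aes(y = flow1, colour = "flow1")) +
  theme_classic() +
  theme(legend.position = "bottom") +
  labs(x = "Date", y = "River flow (m/s)", colour = NULL)
g.bottom
```

geom_ribbon() needs ymin and ymax rather than y, which is why no summary statistic is required when the mean and sd are already columns in the data.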

Specifying points on multiple points in R

I've set up a time series line graph for variables of ratios of housing cost and housing income in R but I'm having no success specifying different point symbols for each variable as suggested in other posts. I'm getting the error message “A continuous variable can not be mapped to shape” against the following (simplified for two variables):
ggplot(housing, aes(year)) +
geom_line(aes(y = Greenwich, colour = "Greenwich"))+
geom_point(aes(y = Greenwich, colour = "Greenwich", shape = 1)) +
scale_shape_identity() + #added missing "+"
geom_line(aes(y = median, colour = "median"))+
geom_point(aes(y = median, colour = "median", shape = 2)) + # added missing parenthesis
scale_shape_identity() + # removed extra parenthesis
ylab("house price to earnings (lower quartile)")+
theme(legend.title=element_blank())
Any suggestions most welcome.
You are pretty close:
## toy data
year <- runif(20,10,20)
Greenwich <- runif(20,30,50)
median <- runif(20,30,50)
data<-data.frame(year,Greenwich,median)
## map it
ggplot(data, aes(year)) +
geom_line(aes(y = Greenwich, colour = "Greenwich"))+ scale_shape_identity()+
geom_point(aes(y = Greenwich, colour = "Greenwich",shape = 12,size=8))+
geom_line(aes(y = median, colour = "median")) +
geom_point(aes(y = median, colour = "median",shape = 10,size=8))+
ylab("house price to earnings (lower quartile)")+
theme(legend.title=element_blank())
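Another way to avoid the "continuous variable" error, sketched here with made-up data, is to reshape to long format and map shape to the series name, which is discrete. The data frame long and its columns area and ratio are hypothetical names, and the shape codes are arbitrary.

```r
library(ggplot2)

set.seed(1)  # toy data only
housing <- data.frame(year      = 2001:2020,
                      Greenwich = runif(20, 30, 50),
                      median    = runif(20, 30, 50))

# Long format: one row per year and series
long <- data.frame(
  year  = rep(housing$year, times = 2),
  area  = rep(c("Greenwich", "median"), each = nrow(housing)),
  ratio = c(housing$Greenwich, housing$median)
)

p <- ggplot(long, aes(x = year, y = ratio, colour = area, shape = area)) +
  geom_line() +
  geom_point(size = 3) +   # constant size goes outside aes()
  scale_shape_manual(values = c(Greenwich = 1, median = 2)) +
  ylab("house price to earnings (lower quartile)") +
  theme(legend.title = element_blank())
p
```

With this layout, adding a third series only needs more rows in long and one more entry in scale_shape_manual().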
