I would like to plot the data by subject, but add a single error bar per period showing the overall mean and SE rather than one error bar for each subject. I've tried geom_errorbar and stat_summary but still failed to get my ideal plot (see the figure I drew).
Here is the code I used to draw this figure (the error bars were added by hand).
ggplot(ASD, aes(x=period, y=meanF0, group=subject, color=group)) +
geom_line(aes(color=group, size=group)) +
scale_size_manual(values=c(.6, .6, .6, .6)) +
theme_light()+
xlab("Period")+
ylab("F0 (Hz)")+
ggtitle("Mean F0 Adjustment (ASD Group)") +
geom_point()+
scale_color_manual(values=c("red")) +
theme(plot.title = element_text(size=14.5, face="bold", hjust = 0.5, family = "serif"),
axis.title.y= element_text(size=12, face = "bold", family = "serif"),
axis.title.x= element_text(size=12, face = "bold", family = "serif"),
axis.text.x = element_text(size=11, face="bold", family = "serif"),
axis.text.y = element_text(size=11, face="bold", family = "serif"))+
theme(legend.position = "none")+
geom_hline(yintercept=112.8, linetype="dashed",
color = "dark grey", size=.7)
Could anyone help? Thank you very much!
Use annotate to add the error bars. I don't have your data, so I created my own. You're going to need the confidence interval and the average for each group. My average-by-group values and confidence interval-by-group are stored in df4$meanV and df4$ci. You can replace these with your variable names. In annotate, you'll include the data frame in the call like you would in base R plots. As in base R, you can also just use raw values; multiple values can be joined with c(), as in y = c(12, 10). If you have any questions, just let me know.
ggplot(df2, aes(x = condition, y = value,
color = subject, group = subject)) +
geom_line() + geom_point() +
annotate("errorbar",
x = df4$condition,
ymin = df4$meanV - df4$ci,
ymax = df4$meanV + df4$ci,
width = .2) +
annotate("point",
x = df4$condition,
y = df4$meanV) +
ylim(min(df2$value), max(df2$value))
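The answer above assumes that df4 already exists. A minimal sketch of one way to build it in base R is shown below; the data frame df2 is simulated here purely for illustration, and the column names (condition, meanV, se, ci) are chosen to match the answer rather than taken from the asker's data. The interval is a t-based 95% half-width, which is one assumption among several reasonable choices.

```r
# Hypothetical long-format data: 5 subjects measured under 3 conditions
set.seed(42)
df2 <- data.frame(
  subject   = factor(rep(1:5, times = 3)),
  condition = rep(c("A", "B", "C"), each = 5),
  value     = rnorm(15, mean = 10, sd = 2)
)

# Per-condition mean, standard error, and t-based 95% CI half-width
stats <- lapply(split(df2$value, df2$condition), function(v) {
  n  <- length(v)
  se <- sd(v) / sqrt(n)
  c(meanV = mean(v), se = se, ci = qt(0.975, df = n - 1) * se)
})

# One row per condition, with columns condition, meanV, se, ci
df4 <- data.frame(condition = names(stats),
                  do.call(rbind, stats),
                  row.names = NULL)
df4
```

If the bars should show mean ± SE, as the question asks, df4$se can be used in place of df4$ci in the annotate call.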
I am able to make the plot correctly, but I would like to increase the line sizes to make the plot more readable. When I try size inside the geom_line, my lines get super fat.
I have three time series variables (x, y, z) in the data frame "data", which I want to plot on the y-axis. They are of different lengths, meaning the series start at different times.
How can I change the size of the lines without making them huge?
P_comp <- ggplot(data, aes(x=Date))+
geom_line(aes(y = x, colour = "green"))+
geom_line(aes(y = y, colour = "darkred"))+
geom_line(aes(y = z, colour = "steelblue"))+
theme_ipsum()+
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())+
theme(text = element_text(family = "serif"))+
xlab("Time") + ylab("Value") +
ggtitle("EPU Indices")+
theme(plot.title = element_text(hjust = 0.5, family = "serif", face = "plain", size = 16))+
theme(axis.title.x = element_text(hjust = 0.5, family = "serif", size = 12, face = "plain"))+
theme(axis.title.y = element_text(hjust = 0.5, family = "serif", size = 12, face = "plain"))
P_comp
Your code snippet doesn't show it here, but it sounds like you are setting size = 1 inside the aes() statement. This will add a size aesthetic called "1" and automatically assign a size to it.
Try this instead: geom_line(aes(y = x, colour = "green"), size = 1)
The line width can be set with one of the scale_size_* scales. In the examples below I will use scale_size_manual.
The line sizes will be set to one value per level of the categorical variable, "group".
In this first example the line size is set to the values 1:3, making the lines thicker.
library(ggplot2)
ggplot(df1, aes(Date, y, color = group)) +
geom_line(aes(size = group)) +
scale_size_manual(values = 1:3) +
theme_bw()
Now make the lines slimmer. The rest of the plot is the same.
ggplot(df1, aes(Date, y, color = group)) +
geom_line(aes(size = group)) +
scale_size_manual(values = (1:3)/5) +
theme_bw()
Data
df1 <- iris[4:5]
df1$Date <- rep(seq(Sys.Date() - 49, Sys.Date(), by = "day"), 3)
names(df1)[1:2] <- c("y", "group")
Could someone please help me? The code below runs, but there are too many countries in the graph and the labels overlap, so I would like to:
a) learn a way to "stretch" the y-axis so that the country lines are a bit further apart and, consequently, the labels don't overlap,
b) learn how to set a margin inside the graph, so that all the names from all the countries can fit in the graph frame (they are currently being cut).
r <- ggplot(df3, aes(x=year2, y=df_ji, colour=Country)) +
geom_line() +
scale_colour_discrete(guide = 'none') +
geom_dl(aes(label=Country), method =
list(dl.combine("first.points", "last.points"), cex = 1.0)) +
stat_summary(aes(y = df_ji, group=1), fun.y=mean, colour= "blue",
geom="line", alpha = .8, size = 1.5) +
theme_bw(base_size = 18, base_family = "serif") +
labs(title = "Judicial Independence across Latin America", x =
"Year", y = "Judicial Independence")
r + theme(plot.title = element_text(size=20, face="bold"),
axis.title.x = element_text(size=18),
axis.title.y = element_text(size=18))
I recently updated the ggplot2 package and am running into major issues drawing horizontal lines for averages per group using facets.
I believe this post is no longer valid?
I am creating a time series graph using the following code:
ggplot(p2p_dt_SKILL_A,aes(x=Date,y=Prod_DL)) +
geom_line(aes(colour="red"),lwd=1.3) +
geom_smooth() +
geom_line(stat = "hline", yintercept = "mean")+
scale_x_date(labels=date_format("%b-%y"),breaks ="2 month")+
geom_vline(xintercept = as.numeric(p2p_dt_SKILL_A$Date[p2p_dt_SKILL_A$Date=="2015-09-18"]))+
geom_vline(xintercept = as.numeric(p2p_dt_SKILL_A$Date[p2p_dt_SKILL_A$Date=="2015-10-02"]))+
geom_vline(xintercept = as.numeric(p2p_dt_SKILL_A$Date[p2p_dt_SKILL_A$Date=="2015-10-23"]))+
ylab("DL Prod for All Skills")+
ggtitle("BVG1 DL Prod for All Skills 2014-2015")+
theme(axis.title.y = element_text(size = 15,face="bold",color="red"),
plot.title = element_text(size = 15,lineheight = .8,face="bold",color="red"),
axis.title.x = element_blank(),
legend.position="none")+
facet_wrap(~Patch)
The number 1 issue is that I can no longer use the stat = "hline" in the geom_line(stat = "hline", yintercept = "mean") because it gives the following error: Error: No stat called StatHline.
So I changed it to:
ggplot(p2p_dt_SKILL_A,aes(x=Date,y=Prod_DL)) +
geom_line(aes(colour="red"),lwd=1.3) +
geom_smooth() +
geom_hline(yintercept = mean(p2p_dt_SKILL_A$Prod_DL))+
scale_x_date(labels=date_format("%b-%y"),date_breaks ="2 month")+
geom_vline(xintercept = as.numeric(p2p_dt_SKILL_A$Date[p2p_dt_SKILL_A$Date=="2015-09-18"]))+
geom_vline(xintercept = as.numeric(p2p_dt_SKILL_A$Date[p2p_dt_SKILL_A$Date=="2015-10-02"]))+
geom_vline(xintercept = as.numeric(p2p_dt_SKILL_A$Date[p2p_dt_SKILL_A$Date=="2015-10-23"]))+
ylab("DL Prod for All Skills")+
ggtitle("BVG1 DL Prod for All Skills 2014-2015")+
theme(axis.title.y = element_text(size = 15,face="bold",color="red"),
plot.title = element_text(size = 15,lineheight = .8,face="bold",color="red"),
axis.title.x = element_blank(),
legend.position="none")+
facet_wrap(~Patch)
But this doesn't draw the horizontal line at the mean per Patch; it just uses the overall mean of Prod_DL.
See below:
Are there any new ways now to calculate mean per group and draw horizontal lines?
Thanks
UPDATE
Here is what I did:
#first create a dataframe which holds patch and mean values for prod dl, this will then be used in geom_hline()
mean_Prod_DL <- p2p_dt_SKILL_A%>%
group_by(Patch)%>%
summarise(mean_Prod_DL_per_patch = mean(Prod_DL))
ggplot(p2p_dt_SKILL_A,aes(x=Date,y=Prod_DL)) +
scale_x_date(labels=date_format("%b-%y"),date_breaks ="2 months")+
geom_line(aes(colour="red"),lwd=1.3) +
geom_smooth() +
geom_hline(data = mean_Prod_DL,aes(yintercept = mean_Prod_DL_per_patch),lty=2)+
geom_vline(xintercept = as.numeric(p2p_dt_SKILL_A$Date[p2p_dt_SKILL_A$Date=="2015-09-18"]))+
geom_vline(xintercept = as.numeric(p2p_dt_SKILL_A$Date[p2p_dt_SKILL_A$Date=="2015-10-02"]))+
geom_vline(xintercept = as.numeric(p2p_dt_SKILL_A$Date[p2p_dt_SKILL_A$Date=="2015-10-23"]))+
geom_vline(xintercept = as.numeric(p2p_dt_SKILL_A$Date[p2p_dt_SKILL_A$Date=="2015-12-04"]))+
ylab("DL Prod for All Skills")+
ggtitle("BVG1 DL Prod for All Skills 2014-2016")+
theme(axis.title.y = element_text(size = 15,face="bold",color="red"),
plot.title = element_text(size = 15,lineheight = .8,face="bold",color="red"),
axis.title.x = element_blank(),
legend.position="none")+
facet_wrap(~Patch)
I agree with @MLavoie that just calculating the quantity of interest is the simplest solution. Not sure in what way you are looking for something 'better'.
Example:
# sample data
my_df <- data.frame(x=rep(1:100, 4),
y=cumsum(rnorm(400)),
category=rep(letters[1:4], each=100))
# calculate the hline data in one line with data.table
library(data.table)
setDT(my_df)[, cat_mean := mean(y), by=category]
# plot
library(ggplot2)
ggplot(my_df, aes(x=x, y=y, group=category)) +
geom_line(color='red') +
geom_smooth(color='blue') +
geom_hline(aes(yintercept=cat_mean)) +
facet_wrap(~category)
Result:
I have a data frame as follows:
variable=c("D","D","C","C","C","A","B","B","B","B")
value=c(80,100,70,68,65,45,33,31,36,32)
Count=as.integer(c(5,10,4,5,2,7,3,5,6,2))
mean=c(93.3,93.3,68.2,68.2,68.2,45,33.4,33.4,33.4,33.4)
sumVarVal=data.frame(variable=variable,value=value,Count=Count,mean=mean)
I can make a nice plot (where the size of the square corresponds to the count of observations with that particular x-value and y-value), as shown below:
library(ggplot2)
ggplot(sumVarVal, aes(variable, value)) +
geom_point(aes(size = Count), pch=15) +
guides(fill=guide_legend(title="New")) +
theme(legend.background = element_rect(fill="gray90",
size=.5,
colour = "black"),
legend.text=element_text(size=rel(1.3)),
legend.title=element_text(size=rel(1.3), face="plain"),
legend.position="bottom",
axis.text = element_text(size=rel(1.3)),
axis.title = element_text(size = rel(1.3))) +
labs(x="Learning Outcome", y = "Percentage Grade")
However, I used to have an additional piece of code (at the end of the syntax) that allowed me to superimpose a horizontal bar on each of the four topics, indicating the mean percentage grade. Those values are stored in sumVarVal$mean. Here is the code I was using:
ggplot(sumVarVal, aes(variable, value)) +
geom_point(aes(size = Count), pch=15) +
guides(fill=guide_legend(title="New")) +
theme(legend.background = element_rect(fill="gray90", size=.5, colour = "black"),
legend.text=element_text(size=rel(1.3)),
legend.title=element_text(size=rel(1.3), face="plain"),
legend.position="bottom",
axis.text = element_text(size=rel(1.3)),
axis.title = element_text(size = rel(1.3))) +
labs(x="Learning Outcome", y = "Percentage Grade") +
geom_errorbar(stat = "hline", width=0.6, colour = "blue", size = 1, aes(ymax=..y..,ymin=..y.., yintercept = mean))
With version 1.0.1, this gives:
With version 2.0.0, it now leads to an error:
Error: no stat called StatHline.
I know this may be connected to recent upgrades in ggplot2. I have seen other recent comments about it (geom_errorbar - "No stat called StatHline"). However, due to my code surrounding the use of stat="hline", when I tried some of these suggestions, I was not able to get my code to work either. Perhaps there is something I do not understand about my original code that is preventing me from being able to update this issue?
EDIT: I have taken into account some of the suggestions, and am currently using this code:
ggplot(sumVarVal, aes(variable, value)) +
geom_point(aes(size = Count), pch=15) +
guides(fill=guide_legend(title="New")) +
theme(legend.background = element_rect(fill="gray90", size=.5, colour = "black"),
legend.text=element_text(size=rel(1.3)),
legend.title=element_text(size=rel(1.3), face="plain"),
legend.position="bottom",
axis.text = element_text(size=rel(1.3)),
axis.title = element_text(size = rel(1.3))) +
labs(x="Learning Outcome", y = "Percentage Grade") +
geom_errorbar(stat = "summary", fun.y = "mean", width=0.6, colour = "blue", size = 1, aes(ymax=..y..,ymin=..y.., yintercept = mean))
This gives me an output that looks like this:
It seems that some of the mean blue lines are not lining up to their values, as given originally in the mean vector. For instance, for variable "D", it should have a mean value of 93.3, but the blue horizontal line seems to be displayed at a value of 90.0. The effect is even more dramatic in my real code (not this MWE). Any ideas what might be causing this discrepancy?
stat_hline got removed in ggplot2 2.0.0, but never fear; it wasn't really necessary anyway. If you remove the stat argument entirely, it will default to identity, which is fine. (summary can work, too, if you prefer.) You need to change the aes mapping, though, changing yintercept to y to account for the new stat.
All together,
ggplot(sumVarVal, aes(variable, value)) +
geom_point(aes(size = Count), pch=15) +
guides(fill=guide_legend(title="New")) +
theme(legend.background = element_rect(fill="gray90", size=.5, colour = "black"),
legend.text=element_text(size=rel(1.3)),
legend.title=element_text(size=rel(1.3), face="plain"),
legend.position="bottom",
axis.text = element_text(size=rel(1.3)),
axis.title = element_text(size = rel(1.3))) +
labs(x="Learning Outcome", y = "Percentage Grade") +
geom_errorbar(width=0.6, colour = "blue", size = 1, aes(ymax=..y.., ymin=..y.., y = mean))
produces
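On the discrepancy raised in the question's edit (the line for "D" drawn at 90.0 instead of 93.3): this is an inference from the posted numbers, not something the asker confirmed, but stat = "summary" averages the plotted value column without regard to Count, while the stored mean column matches a Count-weighted average. A small base R sketch using the question's own vectors shows the two calculations side by side:

```r
# Vectors copied from the question
variable <- c("D", "D", "C", "C", "C", "A", "B", "B", "B", "B")
value    <- c(80, 100, 70, 68, 65, 45, 33, 31, 36, 32)
Count    <- c(5, 10, 4, 5, 2, 7, 3, 5, 6, 2)

# Unweighted mean of the plotted rows (what a summary stat would compute)
plain <- tapply(value, variable, mean)

# Mean weighted by the number of observations behind each row
weighted <- sapply(split(data.frame(value, Count), variable),
                   function(d) weighted.mean(d$value, d$Count))

round(rbind(plain, weighted), 1)
```

Mapping y to the precomputed mean column, as in the answer above, avoids the mismatch because no summary is recomputed from the plotted rows.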
I would like to produce a plot using facet_wrap that has a different y scale for each row of the wrap. In other words, with fixed scales on the same row, free scales on different rows, and a fixed x scale. Free scales don't give me exactly what I'm looking for, nor does facet_grid. If possible, I'd like to avoid creating 2 separate plots and then pasting them together. I'm looking for a result like the plot below, but with a y scale max of 300 for the first row, and a y scale max of 50 in the second row. Thanks for any help!
Here is my code:
library(ggplot2)
library(reshape)
# set up data frame
dat <- data.frame(jack = c(150,160,170),
surgeon = c(155,265,175),
snapper = c(10,15,12),
grouper = c(5,12,50))
dat$island<-c("Oahu","Hawaii","Maui")
df<-melt(dat)
# plot
ggplot(df, aes(fill=variable, y=value, x=island)) +
geom_bar(width = 0.85, position= position_dodge(width=0.5),stat="identity", colour="black") +
facet_wrap(~variable, scales = "free_y",ncol=2) +
theme_bw() +
theme(strip.text = element_text(size=15, face="bold"))+
theme(legend.position="none")+
theme(panel.grid.major = element_line(colour = "white", size = 0.2))+
theme(panel.grid.minor = element_line(colour = "white", size = 0.5))+
theme(axis.text.x = element_text(angle = 90, hjust =1, vjust =0.5, size=18))+
labs(y = expression(paste("Yearly catch (kg)")))
Drawing on one of the lower ranked answers from the link Eric commented, you can add a layer that blends into the background to enforce the axes.
Here I created a second data frame (df2) that puts a single point at "Hawaii" and the max value you wanted (300 or 50) for the four variable/fish types. By manually setting the color of the geom_point white, it fades into the background.
library(ggplot2)
library(reshape)
# set up data frame
dat <- data.frame(jack = c(150,160,170),
surgeon = c(155,265,175),
snapper = c(10,15,12),
grouper = c(5,12,50))
dat$island<-c("Oahu","Hawaii","Maui")
df<-melt(dat)
#> Using island as id variables
df2 <- data.frame(island = rep("Hawaii",4), variable = c("jack","surgeon","snapper","grouper"),value = c(300,300,50,50))
ggplot(df, aes(fill=variable, y=value, x=island)) +
geom_bar(width = 0.85, position= position_dodge(width=0.5),stat="identity", colour="black") +
geom_point(data = df2, aes(x = island, y = value), colour = "white") +
facet_wrap(~variable, scales = "free_y",ncol=2) +
theme_bw() +
theme(strip.text = element_text(size=15, face="bold"))+
theme(legend.position="none")+
theme(panel.grid.major = element_line(colour = "white", size = 0.2))+
theme(panel.grid.minor = element_line(colour = "white", size = 0.5))+
theme(axis.text.x = element_text(angle = 90, hjust =1, vjust =0.5, size=18))+
labs(y = expression(paste("Yearly catch (kg)")))
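As a variation on the white-point trick, geom_blank() draws nothing but still takes part in scale training, so it can stretch each panel's free y axis without relying on the point matching the background colour. The sketch below is a simplified version of the plot, not the answerer's code: the names catch and limits_df are made up for illustration, the data are written directly in long format to avoid the reshape dependency, and most of the theme settings are left out.

```r
library(ggplot2)

fish <- c("jack", "surgeon", "snapper", "grouper")

# Same values as the question's data, written directly in long format
catch <- data.frame(
  island   = rep(c("Oahu", "Hawaii", "Maui"), times = 4),
  variable = factor(rep(fish, each = 3), levels = fish),
  value    = c(150, 160, 170, 155, 265, 175, 10, 15, 12, 5, 12, 50)
)

# One invisible row per panel, placed at the desired upper y limit
limits_df <- data.frame(
  island   = "Hawaii",
  variable = factor(fish, levels = fish),
  value    = c(300, 300, 50, 50)
)

p <- ggplot(catch, aes(x = island, y = value, fill = variable)) +
  geom_col(colour = "black") +
  geom_blank(data = limits_df) +   # extends each panel's y range, draws nothing
  facet_wrap(~variable, scales = "free_y", ncol = 2) +
  theme_bw() +
  theme(legend.position = "none")
p
```

Because the layer is invisible regardless of theme, this version keeps working if the panel background or grid colours are changed later.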