I made a ton of figures with ggplot 0.8.9 (which is what I'm still running). Now I need to modify these figures to include legends. I am running into all sorts of problems that are hard to solve because I am getting really confused about theme and opts, and many SO answers apply to later versions.
At this point, it seems like I need to update ggplot2 and rewrite all of my code just so I can have legends on my figures. Is this true? I've read the ggplot2 transition guide, and it makes it seem true.
Here is the data for the sake of reproducibility: mean10v2 and stderr10. And here's what the old code looks like (does not produce a legend):
me10<-read.table("mean10v2.txt", header=TRUE)
se10<-read.table("stderr10.txt", header=TRUE)
ggplot() +
geom_ribbon(aes(x = me10[me10$trt=="CC", "tu"], ymin=(me10[me10$trt=="CC", "biomassA"]-
se10[se10$trt=="CC", "biomassA"]), ymax=(me10[me10$trt=="CC",
"biomassA"]+se10[se10$trt=="CC", "biomassA"])), alpha=0.25) +
geom_line(aes(me10[me10$trt=="CC", "tu"], y=me10[me10$trt=="CC", "biomassA"]), size=1)+
geom_ribbon(aes(x = me10[me10$trt=="PF", "tu"], ymin=(me10[me10$trt=="PF", "biomassA"]-
se10[se10$trt=="PF", "biomassA"]), ymax=(me10[me10$trt=="PF",
"biomassA"]+se10[se10$trt=="PF", "biomassA"])), alpha=0.25) +
geom_line(aes(me10[me10$trt=="PF", "tu"], y=me10[me10$trt=="PF", "biomassA"]),
colour="red2", linetype="dashed", size=1) +
geom_ribbon(aes(x = me10[me10$trt=="P", "tu"], ymin=(me10[me10$trt=="P", "biomassA"]-
se10[se10$trt=="P", "biomassA"]), ymax=(me10[me10$trt=="P",
"biomassA"]+se10[se10$trt=="P", "biomassA"])), alpha=0.25) +
geom_line(aes(me10[me10$trt=="P", "tu"], y=me10[me10$trt=="P", "biomassA"]),
colour="blue3", linetype="dotted", size=1) +
opts(panel.grid.minor = theme_blank()) +
opts(panel.grid.major = theme_blank()) +
opts(panel.background = theme_blank()) +
opts(axis.line = theme_segment()) +
opts(legend.position=c(.5,.5)) +
opts(axis.title.x = theme_text(size=12,vjust=-0.5)) +
opts(axis.title.y = theme_text(size=12,angle=90)) +
opts(axis.text.x = theme_text(colour="black", size=16)) +
opts(axis.text.y = theme_text(colour="black", size=16)) +
annotate("text", x = -Inf, y = Inf, label = "a", face="bold", hjust = -5, vjust=2, size
= 9) +
ylab("") +
xlab("") +
ylim(0,2200)
Updating the theme parts is actually quite simple. You really just need to change opts() to theme() and replace theme_* with element_*. Some other names have changed, like you'll use element_line instead of theme_segment.
But more generally, you're using ggplot all wrong:
my_df <- me10[,c('trt','tu','biomassA')]
my_se <- setNames(se10[,c('trt','tu','biomassA')],c('trt','tu','se'))
my_df <- merge(my_df,my_se)
ggplot(data = my_df,aes(x = tu,y = biomassA)) +
geom_ribbon(aes(group = trt,ymin = biomassA - se,ymax = biomassA + se),alpha = 0.25) +
geom_line(aes(group = trt,linetype = trt,colour = trt)) +
labs(x = "",y = "") +
ylim(0,2200) +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line(),
legend.position=c(.5,.5),
axis.title.x = element_text(size=12,vjust=-0.5),
axis.title.y = element_text(size=12,angle=90),
axis.text.x = element_text(colour="black", size=16),
axis.text.y = element_text(colour="black", size=16)) +
annotate("text", x = -Inf, y = Inf, label = "a", fontface="bold", hjust = -5, vjust=2, size = 9)
Notice how much cleaner that is, and putting the data into an appropriate form only took three lines. Also note that there is absolutely no need to keep repeating the opts() or theme() calls for every....single....thing...you....set.
And then if you want to choose specific colors/linetypes for each group, you do that using the scale functions, not by setting them individually:
+ scale_colour_manual(values = c(CC = 'black', PF = 'red2', P = 'blue3')) +
scale_linetype_manual(values = c(CC = 'solid', PF = 'dashed', P = 'dotted'))
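One detail worth knowing about the manual scales: unnamed `values` are matched to the groups in level order (usually alphabetical, so CC, P, PF here), whereas a named vector pins each treatment to its colour and linetype regardless of order. The sketch below illustrates this under stated assumptions: the data frame is made up to stand in for the merged `my_df`, and `trt_colours` and `trt_linetypes` are names introduced only for this example.

```r
library(ggplot2)

# Made-up stand-in for the merged mean/SE table (values are NOT the real data)
my_df <- data.frame(
  trt      = rep(c("CC", "PF", "P"), each = 4),
  tu       = rep(c(0, 100, 200, 300), times = 3),
  biomassA = c(0, 400, 900, 1500,  0, 350, 800, 1300,  0, 300, 700, 1100),
  se       = 50
)

# Named vectors: each treatment is matched by name, not by position
trt_colours   <- c(CC = "black", PF = "red2",   P = "blue3")
trt_linetypes <- c(CC = "solid", PF = "dashed", P = "dotted")

p <- ggplot(my_df, aes(x = tu, y = biomassA)) +
  geom_ribbon(aes(group = trt, ymin = biomassA - se, ymax = biomassA + se),
              alpha = 0.25) +
  geom_line(aes(colour = trt, linetype = trt)) +
  scale_colour_manual(values = trt_colours) +
  scale_linetype_manual(values = trt_linetypes)

# print(p) draws the figure; colour and linetype share one legend because
# both are mapped to the same variable
```

Because colour and linetype are mapped to the same variable, the two scales should merge into a single legend with one key per treatment.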
I have created an R visualisation in Power BI and would like to have only one grid line, at the point where the horizontal axis crosses the vertical axis at the value 1.
I am not good with words and not sure if I have explained it well in words. Please see the screenshots below to get a better understanding of what I want to achieve.
Any help is greatly appreciated.
First screenshot is from Excel where I was able to do it and I want to replicate the same in the R chart (second screenshot)
library(ggplot2)
ggplot(unique(dataset), aes(x = reorder(Condition, Rate), y = Rate)) +
labs(x = "Condition")+
geom_point(size = 5, stroke = 0, shape = 18, colour="brown") +
geom_point() + geom_line() +
geom_errorbar(aes(ymin = LL, ymax = UL), width=.2, position=position_dodge(.9), colour="brown", alpha=0.6, size=.7) +
theme_bw()+
theme(panel.grid.major = element_blank()) +
theme(axis.text.x = element_text(angle=90, hjust = 1))+
theme(axis.text.x = element_text(size = 10))
Let p be your original ggplot object.
step 1: remove the original x axis
p + theme(axis.line.x = element_blank(),
axis.ticks.x = element_blank(),
axis.text.x = element_blank()) +
labs(x = '') -> p1
step 2: add a line at 1
p1 + geom_hline(yintercept = 1, color = "black")
As you have not provided any data, I am using the iris dataset. You can use the following code:
library(ggplot2)
ggplot(unique(iris), aes(x = Species, y = Petal.Width)) +
labs(x = "Condition")+
geom_point(size = 5, stroke = 0, shape = 18, colour="brown") +
geom_point() + geom_line() +
theme_bw()+
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank()) +
theme(axis.text.x = element_text(angle=90, hjust = 1))+
theme(axis.text.x = element_text(size = 10)) +
geom_abline(slope=0, intercept=1, col = "darkblue",lty=1,size = 0.5)
I am not able to increase the font size of the variable names in a plot made with ggplot.
I tried adding each of the following to my ggplot code, without success:
theme(text = element_text(size=20))
theme(axis.text=element_text(size=20))
theme(axis.title=element_text(size=14))
theme_grey(base_size = 20)
geom_text(size=20)
My code is :
library(ggplot2)
library(reshape2)
dataplot <- read.csv("/Documents/R.csv",header=T,sep=";")
dataPlotMelt <- melt(data = dataplot, id.vars = c("variable"),variable.name = "Method",value.name = "SMD")
varNames <- as.character(dataplot$variable)
dataPlotMelt$variable <- factor(dataPlotMelt$variable,levels = varNames)
ggplot(data=dataPlotMelt,mapping=aes(x=variable,y=SMD,group=Method, color=Method))+
ylab("Standardized mean difference (%)")+
xlab("") +
geom_point(aes(shape=Method),size=2) +
geom_hline(yintercept=15,color="black",size=0.1,linetype="dashed") +
geom_hline(yintercept=-15,color="black",size=0.1,linetype="dashed") +
coord_flip() +
theme(axis.text.x=element_blank()) +
scale_y_continuous(breaks=c(-65,-15,15,105)) +
theme_bw() +
theme(legend.text=element_text(size=12)) +
theme(legend.title=element_blank(),legend.key=element_blank()) +
scale_colour_manual(values=c("grey","black"))
I'd like to increase the font size of the variable names in the plot, enlarge the axis title "Standardized mean difference (%)", and remove the vertical grid lines between the yintercept lines and the y breaks on both sides.
new graphic
Thank you Richard for giving me the solution.
As you suggested I used theme after theme_bw
I managed to suppress the useless vertical lines as well with the command theme(panel.grid.minor = element_blank())
Here is the new code for ggplot :
ggplot(data = dataPlotMelt, mapping = aes(x = variable, y = SMD,group = Method,
color = Method)) +
ylab("Standardized mean difference (%)") + xlab("") +
geom_point(aes(shape = Method),size=2) +
geom_hline(yintercept = 15, color = "black", size = 0.1, linetype = "dashed") +
geom_hline(yintercept = -15, color = "black", size = 0.1, linetype = "dashed") +
coord_flip() +
theme(axis.text.x = element_blank()) +
scale_y_continuous(breaks=c(-65,-15,0,15,105)) +
theme_bw() + theme(legend.text = element_text(size=13)) +
scale_colour_manual(values= c("grey","black")) +
theme(axis.text.y = element_text(size=12)) +
theme(axis.title.x = element_text(size=13)) +
theme(panel.grid.minor = element_blank()) +
theme(legend.title = element_blank(), legend.key=element_blank())
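The ordering matters because theme_bw() is a complete theme: when it is added to a plot it replaces the theme settings made before it, so only the theme() calls that come after it take effect. A minimal sketch of that behaviour, using the built-in mtcars data rather than the data from the question; `tweak_then_bw` and `bw_then_tweak` are names made up for this illustration.

```r
library(ggplot2)

# Tweak first, complete theme second: theme_bw() replaces the tweak
tweak_then_bw <- theme(axis.text.y = element_text(size = 12)) + theme_bw()

# Complete theme first, tweak second: the tweak is kept
bw_then_tweak <- theme_bw() + theme(axis.text.y = element_text(size = 12))

# Only the second ordering changes the axis text size in the plot
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  bw_then_tweak

# print(p) draws the figure
```

For the same reason, the theme(axis.text.x = element_blank()) call that still sits before theme_bw() in the code above most likely has no effect.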
I recently updated the ggplot2 package and am running into major issues drawing horizontal lines for per-group averages when using facets.
I believe this post is no longer valid?
I am creating a time series graph using the following code:
ggplot(p2p_dt_SKILL_A,aes(x=Date,y=Prod_DL)) +
geom_line(aes(colour="red"),lwd=1.3) +
geom_smooth() +
geom_line(stat = "hline", yintercept = "mean")+
scale_x_date(labels=date_format("%b-%y"),breaks ="2 month")+
geom_vline(xintercept = as.numeric(p2p_dt_SKILL_A$Date[p2p_dt_SKILL_A$Date=="2015-09-18"]))+
geom_vline(xintercept = as.numeric(p2p_dt_SKILL_A$Date[p2p_dt_SKILL_A$Date=="2015-10-02"]))+
geom_vline(xintercept = as.numeric(p2p_dt_SKILL_A$Date[p2p_dt_SKILL_A$Date=="2015-10-23"]))+
ylab("DL Prod for All Skills")+
ggtitle("BVG1 DL Prod for All Skills 2014-2015")+
theme(axis.title.y = element_text(size = 15,face="bold",color="red"),
plot.title = element_text(size = 15,lineheight = .8,face="bold",color="red"),
axis.title.x = element_blank(),
legend.position="none")+
facet_wrap(~Patch)
The number 1 issue is that I can no longer use the stat = "hline" in the geom_line(stat = "hline", yintercept = "mean") because it gives the following error: Error: No stat called StatHline.
So I changed it to:
ggplot(p2p_dt_SKILL_A,aes(x=Date,y=Prod_DL)) +
geom_line(aes(colour="red"),lwd=1.3) +
geom_smooth() +
geom_hline(yintercept = mean(p2p_dt_SKILL_A$Prod_DL))+
scale_x_date(labels=date_format("%b-%y"),date_breaks ="2 month")+
geom_vline(xintercept = as.numeric(p2p_dt_SKILL_A$Date[p2p_dt_SKILL_A$Date=="2015-09-18"]))+
geom_vline(xintercept = as.numeric(p2p_dt_SKILL_A$Date[p2p_dt_SKILL_A$Date=="2015-10-02"]))+
geom_vline(xintercept = as.numeric(p2p_dt_SKILL_A$Date[p2p_dt_SKILL_A$Date=="2015-10-23"]))+
ylab("DL Prod for All Skills")+
ggtitle("BVG1 DL Prod for All Skills 2014-2015")+
theme(axis.title.y = element_text(size = 15,face="bold",color="red"),
plot.title = element_text(size = 15,lineheight = .8,face="bold",color="red"),
axis.title.x = element_blank(),
legend.position="none")+
facet_wrap(~Patch)
But this doesn't draw the horizontal line at means per Patch. It just takes the overall mean for Prod_DL
See below:
Are there any new ways now to calculate mean per group and draw horizontal lines?
Thanks
UPDATE
Here is what I did:
#first create a dataframe which holds patch and mean values for prod dl, this will then be used in geom_hline()
mean_Prod_DL <- p2p_dt_SKILL_A%>%
group_by(Patch)%>%
summarise(mean_Prod_DL_per_patch = mean(Prod_DL))
ggplot(p2p_dt_SKILL_A,aes(x=Date,y=Prod_DL)) +
scale_x_date(labels=date_format("%b-%y"),date_breaks ="2 months")+
geom_line(aes(colour="red"),lwd=1.3) +
geom_smooth() +
geom_hline(data = mean_Prod_DL,aes(yintercept = mean_Prod_DL_per_patch),lty=2)+
geom_vline(xintercept = as.numeric(p2p_dt_SKILL_A$Date[p2p_dt_SKILL_A$Date=="2015-09-18"]))+
geom_vline(xintercept = as.numeric(p2p_dt_SKILL_A$Date[p2p_dt_SKILL_A$Date=="2015-10-02"]))+
geom_vline(xintercept = as.numeric(p2p_dt_SKILL_A$Date[p2p_dt_SKILL_A$Date=="2015-10-23"]))+
geom_vline(xintercept = as.numeric(p2p_dt_SKILL_A$Date[p2p_dt_SKILL_A$Date=="2015-12-04"]))+
ylab("DL Prod for All Skills")+
ggtitle("BVG1 DL Prod for All Skills 2014-2016")+
theme(axis.title.y = element_text(size = 15,face="bold",color="red"),
plot.title = element_text(size = 15,lineheight = .8,face="bold",color="red"),
axis.title.x = element_blank(),
legend.position="none")+
facet_wrap(~Patch)
I agree with @MLavoie that just calculating the quantity of interest is the simplest solution. Not sure in what way you are looking for something 'better'.
Example:
# sample data
my_df <- data.frame(x=rep(1:100, 4),
y=cumsum(rnorm(400)),
category=rep(letters[1:4], each=100))
# calculate the hline data in one line with data.table
library(data.table)
setDT(my_df)[, cat_mean := mean(y), by=category]
# plot
ggplot(my_df, aes(x=x, y=y, group=category)) +
geom_line(color='red') +
geom_smooth(color='blue') +
geom_hline(aes(yintercept=cat_mean)) +
facet_wrap(~category)
Result:
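If loading dplyr or data.table is not an option, the same per-group means can be computed in base R. The sketch below reuses the made-up sample data from the example above and passes a separate one-row-per-facet data frame to geom_hline(); `cat_means` is a name introduced here for illustration. Because that data frame contains the faceting column, each panel should get its own line.

```r
library(ggplot2)

set.seed(1)  # only so the made-up data is reproducible

# Sample data, same shape as in the example above
my_df <- data.frame(x = rep(1:100, 4),
                    y = cumsum(rnorm(400)),
                    category = rep(letters[1:4], each = 100))

# One row per category holding that category's mean of y
cat_means <- aggregate(y ~ category, data = my_df, FUN = mean)
names(cat_means)[2] <- "cat_mean"

p <- ggplot(my_df, aes(x = x, y = y)) +
  geom_line(colour = "red") +
  # the layer's own data carries `category`, so the lines are split by facet
  geom_hline(data = cat_means, aes(yintercept = cat_mean), linetype = 2) +
  facet_wrap(~category)

# print(p) draws the figure
```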
I can not find a solution for my overplotting problem. If somebody could help me find a solution I would appreciate that a lot.
My data look like this (csv format): http://pastebin.com/embed_js.php?i=Cnfpkjsz
This is the code I am running:
library(dplyr)
library(gdata)
library(ggplot2)
library(directlabels)
all<-read.xls('all_auto_bio_adjusted.xls')
all$station<-as.factor(all$station)
all$automatic<-log(all$automatic)
all$averagebiol<-log(all$averagebiol)
all$stdevbiol<-log(all$stdevbiol)
pd <- position_dodge(.9)
allp<-ggplot(data=all, aes(y=averagebiol, x=automatic, colour=group)) +
geom_errorbar(aes(ymin=averagebiol-stdevbiol, ymax=averagebiol+stdevbiol), colour="red", width=.1, position=pd) +
geom_point(aes(size=size), show_guide = TRUE) +
geom_abline(intercept=0, slope=1) +
stat_smooth(method="loess",se=FALSE,colour='blue') +
geom_dl(aes(label=shortname),method="last.bumpup",cex = 1.3, hjust = 1) +
facet_wrap(~station,nrow=2)+
xlab("auto") +
ylab("manual") +
ggtitle("Comparison of automatic vs manual identification") +
scale_y_continuous(limits=c(0, max(all$averagebiol + all$stdevbiol))) +
theme_bw() +
theme(plot.title = element_text(lineheight=.8, face="bold", size=20,vjust=1), axis.text.x = element_text(colour="grey20",size=15,angle=0,hjust=.5,vjust=.5,face="bold"), axis.text.y = element_text(colour="grey20",size=15,angle=0,hjust=1,vjust=0,face="bold"), axis.title.x = element_text(colour="grey20",size=20,angle=0,hjust=.5,vjust=0,face="bold"), axis.title.y = element_text(colour="grey20",size=20,angle=90,hjust=.5,vjust=1,face="bold"),legend.position="right")
allp
I experimented quite a bit with different geom_dl methods but can't find the right one. Is there one that can plot above the error bars?
If there is no suitable method, what could I do to at least have the labels plotted cleanly so that I can rearrange them myself in Photoshop?
Thanks a lot for your input!
I'm not familiar with directlabels but if you want to move the labels to the top you could just do it with geom_text():
allp <- ggplot(data = all, aes(y = averagebiol, x = automatic, colour = group)) +
geom_errorbar(aes(ymin = averagebiol - stdevbiol, ymax = averagebiol + stdevbiol),
colour = "red", width = 0.1, position = pd) +
geom_point(aes(size = size), show_guide = TRUE) +
geom_abline(intercept = 0, slope = 1) +
stat_smooth(method = "loess", se = FALSE, colour = "blue") +
facet_wrap(~station, nrow = 2) +
xlab("auto") + ylab("manual") +
ggtitle("Comparison of automatic vs manual identification") +
scale_y_continuous(limits = c(0, max(all$averagebiol + all$stdevbiol)))
allp + geom_text(aes(label = shortname, y = averagebiol + stdevbiol), vjust = -0.1)
However, it still seems too busy to distinguish between the groups. How about skipping the text labels and facetting over station and group? Here's a possible start; if you like it, you would need to tune it...
allp <- ggplot(data = all, aes(y = averagebiol, x = automatic, colour = group)) +
geom_errorbar(aes(ymin = averagebiol - stdevbiol, ymax = averagebiol + stdevbiol),
colour = "grey", width = 0.5, position = pd) +
geom_point(aes(size = size), show_guide = TRUE) +
geom_abline(intercept = 0, slope = 1) +
stat_smooth(method = "loess", se = FALSE, colour = "blue") +
facet_grid(station ~ group) +
xlab("auto") + ylab("manual") +
ggtitle("Comparison of automatic vs manual identification") +
scale_y_continuous(limits = c(0, max(all$averagebiol + all$stdevbiol))) +
theme_minimal()
allp
I hope you wanted something like this.
My code:
allp<-ggplot(data=all, aes(y=averagebiol, x=automatic, colour=group)) +
geom_point(aes(size=size), show_guide = TRUE) +
geom_abline(intercept=0, slope=1) +
stat_smooth(method="loess",se=FALSE,colour='blue') +
facet_wrap(~station,nrow=2)+
xlab("auto") +
ylab("manual") +
ggtitle("Comparison of automatic vs manual identification") +
scale_y_continuous(limits=c(0, max(all$averagebiol + all$stdevbiol + 1, na.rm=T))) +
theme_bw() +
theme(plot.title = element_text(lineheight=.8, face="bold", size=20,vjust=1), axis.text.x = element_text(colour="grey20",size=15,angle=0,hjust=.5,vjust=.5,face="bold"), axis.text.y = element_text(colour="grey20",size=15,angle=0,hjust=1,vjust=0,face="bold"), axis.title.x = element_text(colour="grey20",size=20,angle=0,hjust=.5,vjust=0,face="bold"), axis.title.y = element_text(colour="grey20",size=20,angle=90,hjust=.5,vjust=1,face="bold"),legend.position="right")+
geom_errorbar(aes(ymin=averagebiol-stdevbiol, ymax=averagebiol+stdevbiol), colour="red", width=.1, position=pd) +
geom_text(aes(label = shortname, y = averagebiol+stdevbiol), vjust = -.3)
I removed your line of code for the labels and I introduced this one:
geom_text(aes(label = shortname, y = averagebiol+stdevbiol), vjust = -.3)
This is just putting the labels on top of the Error bar, with a small adjustment.
Also I modified this part:
scale_y_continuous(limits=c(0, max(all$averagebiol + all$stdevbiol + 1, na.rm=T)))
With the labels on top, some of the labels were covered by the gray bar, so I raised the upper y-axis limit a bit (+1). With that addition max() no longer returned a usable value, so I had to drop the NA values with na.rm=T.
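The na.rm step is needed because max() returns NA as soon as any element of its input is NA. A small base R sketch with made-up numbers (`upper` and `y_limits` are hypothetical names, not taken from the data set):

```r
# Made-up error-bar tops; one of them is missing
upper <- c(3.2, 4.8, NA, 2.1)

max(upper + 1)                # NA, because one element is NA
max(upper + 1, na.rm = TRUE)  # 5.8

# Limits with some headroom above the tallest error bar
y_limits <- c(0, max(upper + 1, na.rm = TRUE))
```

As far as I know, ggplot2 treats an NA limit as "use the range of the data", so without na.rm the extra headroom would be silently dropped rather than raising an error.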
Indeed I ended up faceting by station and group after input from Harrop.
Here is my code (The underlying data changed slightly)
library(dplyr)
library(gdata)
library(ggplot2)
library(directlabels)
all<-read.xls('all_auto_bio_adjusted_c.xls')
all$size.new<-sqrt(all$size.new)
all$station<-as.factor(all$station)
all$group.new<-factor(all$group, levels=c('C. hyperboreus','C. glacialis','Special Calanus','M. longa','Pseudocalanus sp.','Copepoda'))
pd <- position_dodge(w = 50)
allp <- ggplot(data = all, aes(y = averagebiol, x = automatic, colour = group.new, group=group.new)) +
geom_abline(intercept = 0, slope = 1) +
geom_point(aes(size = size.new), show_guide=TRUE, position=pd) +
scale_size_identity()+
geom_errorbar(aes(ymin = averagebiol - stdevbiol, ymax = averagebiol + stdevbiol),colour = "grey", width = 0.1, position=pd) +
facet_grid(group.new~station, scales="free") +
xlab("Automatic identification") + ylab("Manual identification") +
ggtitle("Comparison of automatic vs manual identification") +
theme_bw() +
theme(plot.title = element_text(lineheight=.8, face="bold", size=20,vjust=1), axis.text.x = element_text(colour="grey20",size=15,angle=0,hjust=.5,vjust=.5,face="bold"), axis.text.y = element_text(colour="grey20",size=15,angle=0,hjust=1,vjust=0,face="bold"), axis.title.x = element_text(colour="grey20",size=20,angle=0,hjust=.5,vjust=0,face="bold"), axis.title.y = element_text(colour="grey20",size=20,angle=90,hjust=.5,vjust=1,face="bold"), legend.position="none", strip.text.x = element_text(size = 12, face="bold", colour = "black", angle = 0), strip.text.y = element_text(size = 12, face="bold", colour = "black"))
allp
I need to gather two facet columns into one column with ggplot2.
In the following example, I need to overlay the content of the two columns DEG and RAN into one, while giving different colours to DEG and RAN data (small points and smooth line) and provide the corresponding legend (so I can distinguish them as they are overlayed).
I feel my code is not too far from what I need, but the relative complexity of the dataset is blocking me. How do I go about achieving this in ggplot2?
Here's my code so far:
require(reshape2)
library(ggplot2)
library(RColorBrewer)
fileName = paste("./4.csv", sep = "") # csv file available here: https://www.dropbox.com/s/bm9hd0t5ak74k89/4.csv?dl=0
mydata = read.csv(fileName,sep=",", header=TRUE)
dataM = melt(mydata,c("id"))
dataM = cbind(dataM,colsplit(dataM$variable,pattern = "_",names = c("NM", "ORD", "CAT")))
dataM$variable <- NULL
dataM <- dcast(dataM, ... ~ CAT, value.var = "value")
my_palette <- colorRampPalette(rev(brewer.pal(11, "Spectral")))
ggplot(dataM, aes(x=NR ,y= ASPL)) +
geom_point(size = .4,alpha = .5) +
stat_smooth(se = FALSE, size = .5) +
theme_bw() +
theme(plot.background = element_blank(),
axis.line = element_blank(),
legend.key = element_blank(),
legend.title = element_blank()) +
scale_y_continuous("ASPL", expand=c(0,0), limits = c(1, 7)) +
scale_x_continuous("NR", expand=c(0,0), limits = c(0, 100)) +
theme(legend.position="bottom") +
theme(axis.title.x = element_text(vjust=-0.3, face="bold", size=12)) +
theme(axis.title.y = element_text(vjust=1.5, face="bold", size=12)) +
ggtitle("Title") + theme(plot.title = element_text(lineheight=.8, face="bold")) +
theme(title = element_text(vjust=2)) +
facet_grid(NM ~ ORD)
Here's what it gives me right now:
Extra question: how come DEG/SF doesn't show a smooth line?
You can use the group aesthetic to define that data points with the same value of ORD belong together. You can also map aesthetics shape and color to this variable. You can also use . to specify that the facets are not split along a specific dimension.
I have made the changes to your code below after transforming NR and ASPL to numeric variables:
dataM$NR <- as.integer(dataM$NR)
dataM$ASPL <- as.numeric(dataM$ASPL)
ggplot(dataM, aes(x=NR ,y= ASPL, group=ORD, color=ORD)) +
geom_point(size = .7,alpha = .5, aes(shape=ORD)) + ## increased size
stat_smooth(se = FALSE, size = .5) +
theme_bw() +
theme(plot.background = element_blank(),
axis.line = element_blank(),
legend.key = element_blank(),
legend.title = element_blank()) +
scale_y_continuous("ASPL", expand=c(0,0), limits = c(1, 7)) +
scale_x_continuous("NR", expand=c(0,0), limits = c(0, 100)) +
theme(legend.position="bottom") +
theme(axis.title.x = element_text(vjust=-0.3, face="bold", size=12)) +
theme(axis.title.y = element_text(vjust=1.5, face="bold", size=12)) +
ggtitle("Title") + theme(plot.title = element_text(lineheight=.8, face="bold")) +
theme(title = element_text(vjust=2)) +
facet_grid(NM ~.)
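Since the CSV file may not be available to every reader, here is a self-contained sketch of the same idea on made-up data: colour and group are mapped to ORD so that DEG and RAN are overlaid with a legend, and facet_grid(NM ~ .) keeps one row per NM with a single column. The data frame `demo`, its values, and the "SW" level are assumptions for illustration only; just the column names follow the question.

```r
library(ggplot2)

set.seed(42)  # only so the made-up data is reproducible

# Made-up data with the same columns as dataM: NR, ORD, NM, ASPL
demo <- expand.grid(NR  = seq(0, 100, by = 5),
                    ORD = c("DEG", "RAN"),
                    NM  = c("SF", "SW"),
                    stringsAsFactors = FALSE)
demo$ASPL <- 2 + 0.03 * demo$NR +
  ifelse(demo$ORD == "RAN", 0.5, 0) +
  rnorm(nrow(demo), sd = 0.2)

p <- ggplot(demo, aes(x = NR, y = ASPL, group = ORD, colour = ORD)) +
  geom_point(aes(shape = ORD), size = 0.7, alpha = 0.5) +
  stat_smooth(se = FALSE) +         # one smooth per ORD group
  facet_grid(NM ~ .) +              # rows by NM, no split into columns
  theme_bw() +
  theme(legend.position = "bottom",
        legend.title = element_blank())

# print(p) draws the figure
```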