I have the following command that draws a histogram, and I would like the bars to appear in order.
The code is as follows:
ggplot(upstream, aes(x=type, y=round(..count../sum(..count..) * 100, 2))) + geom_histogram(fill= "red", color = "red") + xlab ("Vehicle Type") +
ylab("Percentage of Vehicles in the Category (%)") + ggtitle ("Percentage of Upstream Vehicles by Type") +
stat_bin(geom="text", aes(label=round(..count../sum(..count..) * 100, 2)), vjust=-0.5)
The output is:
I would like the bars arranged in order, so I used the reorder() function inside aes(), but this gives me the following error:
stat_bin requires the following missing aesthetics x
How can I use reorder() without getting this error? I couldn't figure it out from the solutions already posted.
Thanks in advance for any suggestions.
EDIT 1: Following joran's suggestion to use geom_bar(), I got what I was looking for. Here is the code in case anyone needs it:
library(ggplot2)
# Reorder the factor plotted on the x axis (descending frequency)
upstream$type <- with(upstream, reorder(type, type, function(x) -length(x)))
# Plotting: geom_bar() uses stat_count, which works with a discrete x;
# after_stat() is the current replacement for the older ..count.. notation
ggplot(upstream, aes(x = type, y = after_stat(round(count / sum(count) * 100, 2)))) +
  geom_bar(fill = "blue", color = "blue") + xlab("Vehicle Type") +
  ylab("Percentage of Vehicles in the Category (%)") + ggtitle("Percentage of Upstream Vehicles by Type") +
  # stat_count (not stat_bin, which needs a continuous x) places the percentage labels
  stat_count(geom = "text", aes(label = after_stat(round(count / sum(count) * 100, 2))), vjust = -0.5)
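For reference, here is a small base-R sketch of what the reorder() line does. The vehicle types are made-up stand-ins for upstream$type. reorder() computes the supplied function for each level and sorts the levels by that score, so scoring by -length(x) puts the most frequent type first. The table()-based line is an equivalent alternative, not something from the original post.

```r
# Made-up vehicle types standing in for upstream$type
type <- factor(c("car", "truck", "car", "bus", "car", "truck"))
levels(type)  # alphabetical by default: "bus" "car" "truck"

# reorder() scores each level with FUN and sorts the levels by that score;
# -length(x) makes the most frequent level come first (descending count)
type_desc <- reorder(type, type, function(x) -length(x))
levels(type_desc)  # "car" "truck" "bus"

# Alternative: set the levels directly from a frequency table
type_tab <- factor(type, levels = names(sort(table(type), decreasing = TRUE)))
levels(type_tab)  # "car" "truck" "bus"
```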
Here is a reproducible example of the behaviour you are looking for. It is copied from FAQ: How to order the (factor) variables in ggplot2
library(ggplot2)
# sample data.
d <- data.frame(Team1=c("Cowboys", "Giants", "Eagles", "Redskins"), Win=c(20, 13, 9, 12))
# basic layer and options
p <- ggplot(d, aes(y=Win))
# default plot (left panel)
# the variables are alphabetically reordered.
p + geom_bar(aes(x=Team1), stat="identity")
# re-order the levels in the order of appearance in the data.frame
d$Team2 <- factor(d$Team1, as.character(d$Team1))
# plot on the re-ordered variables (Team2)
p + geom_bar(aes(x=Team2), data=d, stat="identity")
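Two related base-R idioms, sketched on the same sample data: unique() is a safer way to get the levels in order of appearance, because factor() refuses duplicated levels (so as.character(d$Team1) would fail if a team appeared twice), and reorder() sorts the levels by another column, here by descending Win, if you want the bars ordered by value rather than by row order. The Team3 column is a hypothetical addition for illustration.

```r
d <- data.frame(Team1 = c("Cowboys", "Giants", "Eagles", "Redskins"),
                Win = c(20, 13, 9, 12), stringsAsFactors = FALSE)

levels(factor(d$Team1))  # default: alphabetical

# Order of appearance; unique() also copes with repeated team names
d$Team2 <- factor(d$Team1, levels = unique(d$Team1))

# Order by value: reorder() sorts levels by mean(-Win), i.e. most wins first
d$Team3 <- reorder(d$Team1, -d$Win)
levels(d$Team3)  # "Cowboys" "Giants" "Redskins" "Eagles"
```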
I am currently plotting long-format data consisting of fluorescence (RFU) on the primary y axis and growth (OD600) on the secondary y axis. I have managed to create the plots, but I find it very difficult to log-transform the secondary y axis (for OD600) without messing up the entire plot. (All the data come from the same data frame.)
My question is: is there any way to log10-transform only the secondary y axis (from 0.01 to 1), with perhaps 5 breaks, something like ("0.01","0.1","0.5","0.1")?
My code looks like this (I apologize for the ugly code):
library(ggplot2)
library(dplyr)  # for %>%, filter() and row_number()
for (i in 1:length(unique(lf_combined$media)[grepl("^.+(gfp)$",unique(lf_combined$media))])){
print(i)
coeff <- 1/max(lf_combined_test$normalized_gfp)
p1<-lf_combined_test[lf_combined_test$media %in% unique(lf_combined$media)[grepl("^.+(gfp)$",unique(lf_combined$media))][i], ] %>%
# filter(normalized_gfp>0) %>%
filter(row_number() %% 3 == 1) %>%
ggplot( aes(x=time)) +
geom_bar( aes(y=normalized_gfp), stat="identity", size=.1, fill="green", color="green", alpha=.4)+
geom_line( aes(y=od / coeff), size=2, color="tomato") +
scale_x_continuous(breaks = round(seq(0,92, by = 5),1))+
geom_vline(xintercept = 12, linetype="dotted",
color = "blue", size=1)+
scale_y_continuous(limits = c(0,80000),
name = "Relative Fluorescence [RFU]/[OD] ",
sec.axis = sec_axis(~.*coeff, name="[OD600]")
) +
scale_y_log10(limits=c(0.01,1))+
theme_grey() +
theme(
axis.title.y = element_text(color = "green", size=13),
axis.title.y.right = element_text(color = "tomato", size=13)
) +
ggtitle(paste("Relative fluorescence & OD600 time series for",unique(lf_combined$media)[grepl("^.+(gfp)$",unique(lf_combined$media))][i],sep=" "))
print(p1)
}
This currently gives plots that look like this:
Thank you very much in advance! :))
Yes, this is certainly possible. Without your data set it is difficult to give you specific code, but here is an example using the built-in mtcars data set. We plot a best-fitting line for mpg against an x axis of wt.
library(ggplot2)
p <- ggplot(mtcars, aes(wt, mpg)) + geom_smooth(aes(color = 'mpg'))
p
Suppose we want to draw the value of disp on a log scale, which we will show on a secondary y axis. To do this we need to log-transform the data ourselves, and also multiply it by 10 to get it on a similar visual scale to the mpg line:
p <- p + geom_smooth(aes(y = 10 * log10(disp), color = 'disp'))
p
To draw the secondary axis, we need to supply it with the inverse of the transformation 10 * log10(x), which is 10^(x/10), along with logarithmically spaced breaks at 10, 100 and 1000:
p + scale_y_continuous(
sec.axis = sec_axis(~ 10^(.x/10), breaks = c(10, 100, 1000), name = 'disp'))
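It seems that you are generating the values of your line with od / coeff and reversing that transform with .*coeff, which seems appropriate. To get a log10 axis, however, you will need to do something like log10(od) * constant and reverse it with 10^(. / constant) inside sec_axis(); because log10(od) is negative for OD600 values below 1, you may also need to add an offset before scaling. Also drop the scale_y_log10() call: adding a second y scale replaces the scale_y_continuous() that defines your secondary axis. Without your data it's impossible to know what the constant should be, but you can play around with different values until it looks right visually.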
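A sketch of this approach for OD600 values between 0.01 and 1, assuming the 0 to 80000 RFU primary axis from the question. Because log10(od) runs from -2 to 0 on that range, the forward transform below first shifts it up by 2 decades and then scales it onto the RFU axis; the inverse passed to sec_axis() undoes both steps. The helper names to_primary() and to_od() and the toy growth curve are hypothetical, and the plot part only runs if ggplot2 is installed.

```r
# Hypothetical helpers: put log10(OD600) on the RFU scale of the primary axis
rfu_max <- 80000                  # top of the primary (RFU) axis
od_min  <- 0.01                   # bottom of the secondary (OD600) axis
k <- rfu_max / -log10(od_min)     # RFU per decade of OD: 80000 / 2 = 40000

to_primary <- function(od) (log10(od) - log10(od_min)) * k  # OD -> plotting units
to_od      <- function(y) od_min * 10^(y / k)               # inverse, for sec_axis()

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Toy growth curve, made up purely for illustration
  growth <- data.frame(time = 0:10, od = 10^seq(-2, 0, length.out = 11))
  p_od <- ggplot(growth, aes(x = time)) +
    geom_line(aes(y = to_primary(od)), colour = "tomato") +
    # no scale_y_log10() here: a second y scale would replace this one
    scale_y_continuous(limits = c(0, rfu_max), name = "Relative Fluorescence [RFU]/[OD]",
                       sec.axis = sec_axis(~ to_od(.), breaks = c(0.01, 0.1, 0.5, 1),
                                           name = "[OD600]"))
}
```

The same two helpers would replace od / coeff and ~.*coeff in the loop from the question; the breaks are given in OD units because sec_axis() places them on the transformed axis.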
I'm trying to figure out how to add legends to my R ggplot2 graphs, but clearly I'm not getting the syntax right.
# basic plot layout
ggplot() +
labs(x="random values", y="frequency", title="Examples for F-Test") +
theme_minimal() +
# histogram of distributions
geom_histogram(data=data.frame(random.data.1), aes(x=random.data.1), fill="forestgreen", color="grey", alpha=0.5, binwidth=0.5) +
geom_histogram(data=data.frame(random.data.2), aes(x=random.data.2), fill="orange", color="black", alpha=0.5, binwidth=0.5) +
# manual text annotations
annotate("text", x=10, y=5, label=paste("F-Test p-value =", signif(F.test[[3]], digits=3)), color="firebrick", fontface="bold") +
# add legend?
scale_color_manual(name="Distributions", values=c("grey", "black"))
ggplot2 usually works better if you combine your data into long-format columns, as I've done below, with one or more additional columns indicating the variables or datasets you want to use to group formatting options. In this case, since you wanted to split by dataset, I just used "1" and "2" to label the fake datasets. That column should be a factor (if it isn't, ggplot2 will treat it as continuous and use a colour gradient rather than distinct colours). The command you are specifically looking for is, I think, guides().
Reshaping data can be done easily with either the "reshape2" package or the "tidyr" package. This post compares them.
library(ggplot2)
random.data.1 = runif(10)
random.data.2 = runif(10)
# F-test of equal variances; its p-value is shown in the annotation below
F.test <- var.test(random.data.1, random.data.2)
df = data.frame(vals = c(random.data.1,random.data.2))
df$dset<-c(rep(1,10),rep(2,10)) #Indicates the dataset
df$dset<-factor(df$dset)
df
ggplot(data=df,aes(x=vals,color=dset,fill=dset,group=dset)) +
labs(x="random values", y="frequency", title="Examples for F-Test") +
#theme_minimal() +
# histogram of distributions (now you only need one line!)
geom_histogram(position="stack",alpha=0.5, binwidth=0.5) +
# manual text annotation, placed inside the 0-1 range of the runif() data
annotate("text", x=0.5, y=5, label=paste("F-Test p-value =", signif(F.test$p.value, digits=3)), color="firebrick", fontface="bold") +
#These lines set the colors
scale_color_manual(values=c("grey", "black")) +
scale_fill_manual(values=c("forestgreen","orange")) +
#and these set the legend manually
guides(color = guide_legend(title = "Distributions")) +
guides(fill = "none") #don't show the fill legend
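If you'd rather not build the dset column by hand, base R's utils::stack() (used here in place of reshape2 or tidyr) does this particular reshape in one call: given a named list of vectors, it returns a values column plus an ind factor recording which vector each value came from. A sketch, with the list names "1" and "2" chosen to match the dataset labels above; set.seed() is added only to make the random data repeatable.

```r
set.seed(1)  # only so the random data are repeatable
random.data.1 <- runif(10)
random.data.2 <- runif(10)

# stack() returns columns 'values' and 'ind' (a factor of the list names)
df <- stack(list(`1` = random.data.1, `2` = random.data.2))
names(df) <- c("vals", "dset")

str(df)  # 20 rows; dset is a factor with levels "1" and "2"
```

The resulting df has the same vals and dset columns as the hand-built version, so it can go straight into the ggplot() call.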
I'm preparing an appendix plot for a revised manuscript in which I need to show the within-year ranges (variability) of several variables between years and sites.
I figured the tidiest way to do this (I have 7 sites, 21 years, and 5 variables...) would be to use a rose plot using coord_polar. However, I stumbled upon something that has always frustrated me about ggplot - the default ordering assumptions. While factors are easily reordered based on some value, this seems to only work in a fixed fashion: as far as I've understood, the order needs to apply throughout the data frame.
In this plot, the ordering needs to depend on a value which changes between years, and therefore the colour and fill values need to change in plotting order within the panel.
To demonstrate, I've created a reproducible example, coded below (the plot shows how it currently, and incorrectly, looks).
Basically, I always need the Site with the minimum value within a given Year to be plotted first (in the centre), followed outwards by the increase in value of the other sites, in order of the original value (see order and diff columns of the data frame). In other words, some years Site a will be at the centre, some years Site c will be in the centre, etc.
Any help would be massively appreciated.
library('ggplot2')
library('reshape2')
library("plyr")
## reproducible example of problem: create dummy data
madeup <- data.frame(Year = rep(2000:2015, each=20), Site=rep(c("a","b","c","d"), each=5, times=16),
var1 = rnorm(n=16*20, mean=20, sd=5), var2= rnorm(n=16*20, mean=50, sd=1))
## create ranges of the data by Year and Site
myRange <- function(dat) {range=max(dat, na.rm=TRUE)-min(dat,na.rm = TRUE)}
vardf <- ddply(madeup, .(Site, Year), summarise, var1=myRange(var1),
var2=myRange(var2))
varmelt <- melt(vardf, id.vars = c("Site","Year"))
varmelt$Site <- as.character(varmelt$Site) # this to preserve the new order when rbind called
varmelt <- by(varmelt, list(varmelt$Year, varmelt$variable), function(x) {x <- x[order(x$value),]
x$order <- 1:nrow(x)
return(x)})
varmelt <- do.call(rbind, varmelt)
## create difference between these values so that each site gets plotted cumulatively on the rose plot
## (otherwise areas close to the centre become uninterpretable)
vartest <- by(varmelt, list(varmelt$Year, varmelt$variable), function(x) {
x$diff <- c(x$value[1], diff(x$value))
return(x)
})
vartest <- do.call(rbind,vartest)
## plot rose plot to display how ranges in variables vary by year and between sites
## for this test example we'll just take one variable, but the idea is to facet by variable
max1 <- max(vartest$value[vartest$variable=='var1'])
yearlength <- length(2000:2015)
ggplot(vartest[vartest$variable=="var1",], aes(x=factor(Year), y=diff)) +
theme_bw() +
geom_hline(yintercept = seq(0,max1, by=1), size=0.3, col="grey60",lty=3) +
geom_vline(xintercept=seq(1,yearlength,1), size=0.3, col='grey30', lty=2) +
geom_bar(stat='identity', width=1, size=0.5, aes(col=Site, fill=Site)) +
scale_x_discrete() +
coord_polar() +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
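As an aside on the data preparation: the two by() / do.call(rbind, ...) steps that add the within-year order and diff columns can be written more compactly with base R's ave(), which applies a function within groups and returns a vector aligned with the rows. A sketch on a tiny made-up data frame (the numbers are illustrative, not from the question):

```r
# Tiny made-up example: 3 sites in 2 years
vals <- data.frame(Year  = rep(2000:2001, each = 3),
                   Site  = rep(c("a", "b", "c"), times = 2),
                   value = c(5, 2, 8, 1, 9, 4))

# Sort by year, then by value within year (smallest first)
vals <- vals[order(vals$Year, vals$value), ]

# Within each year: position in the sorted order, and step from the previous site
vals$order <- ave(vals$value, vals$Year, FUN = seq_along)
vals$diff  <- ave(vals$value, vals$Year, FUN = function(v) c(v[1], diff(v)))
vals
```

With the question's data you would sort by Year, variable and value first and then group by both, e.g. ave(varmelt$value, varmelt$Year, varmelt$variable, FUN = seq_along).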
As long as you don't use stacked bars (position = "stack", which is the default for geom_bar), ggplot2 will actually use the order of the rows in your data for the plotting order. So all you need to do is use the original values for the y axis (rather than the cumulatively differenced ones) along with position = "identity", and order your data from largest to smallest value before plotting:
ordered_data <- vartest[order(-vartest$value), ]
ggplot(ordered_data, aes(factor(Year), value)) +
geom_col(aes(fill = Site), position = "identity", width = 1) +
coord_polar() +
facet_wrap(~ variable)
Created on 2018-02-17 by the reprex package (v0.2.0).
PS. When generating random data for an example, consider using set.seed so that your results can be reproduced exactly.
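To see what the ordering step does to the data, here is a small sketch with made-up data generated after set.seed(), as the PS suggests. Sorting the whole data frame by decreasing value puts each year's largest site first and its smallest last; per the answer above, with position = "identity" the later rows are drawn on top, so the smallest range stays visible in the centre. The toy data frame is hypothetical.

```r
set.seed(2018)  # makes the random example reproducible
toy <- data.frame(Year  = rep(2000:2002, each = 4),
                  Site  = rep(c("a", "b", "c", "d"), times = 3),
                  value = round(runif(12, 1, 10), 1))

# Largest values first, so the smallest bar in each year is drawn last (on top)
ordered_toy <- toy[order(-toy$value), ]

# Per year, the sites in drawing order (outermost first)
split(ordered_toy$Site, ordered_toy$Year)
```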
You can start with a single plot of the largest site, and then layer smaller sites on top like so:
a <- ggplot(vartest[vartest$variable=="var1"& vartest$order==4,], aes(x=factor(Year), y=value,group=order)) +
theme_bw() +
geom_hline(yintercept = seq(0,max1, by=1), size=0.3, col="grey60",lty=3) +
geom_vline(xintercept=seq(1,yearlength,1), size=0.3, col='grey30', lty=2) +
geom_bar(stat='identity', width=1, size=0.5, aes(col=Site, fill=Site)) +
scale_x_discrete() +
coord_polar() +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
b <- a + geom_bar(data = vartest[vartest$variable=="var1"& vartest$order==3,],
stat='identity', width=1, size=0.5, aes(x=factor(Year), y=value,col=Site, fill=Site))
c <- b + geom_bar(data = vartest[vartest$variable=="var1"& vartest$order==2,],
stat='identity', width=1, size=0.5, aes(x=factor(Year), y=value,col=Site, fill=Site))
c + geom_bar(data = vartest[vartest$variable=="var1"& vartest$order==1,],
stat='identity', width=1, size=0.5, aes(x=factor(Year), y=value,col=Site, fill=Site))
This produces the following:
Is that what you wanted?
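The four copy-pasted layers can also be generated in a loop, drawing from the largest site (order == 4) down to the smallest so that each later layer sits on top. This is only a sketch, assuming a data frame shaped like the question's vartest (columns Year, Site, value, order); the function name layer_rose() and the toy data are hypothetical, and geom_col() is used as the shorthand for geom_bar(stat = 'identity').

```r
layer_rose <- function(dat) {
  p <- ggplot2::ggplot(dat, ggplot2::aes(x = factor(Year), y = value))
  # Largest site first; each later (smaller) layer is drawn on top of the previous ones
  for (k in sort(unique(dat$order), decreasing = TRUE)) {
    p <- p + ggplot2::geom_col(data = dat[dat$order == k, ],
                               ggplot2::aes(fill = Site), width = 1)
  }
  p + ggplot2::coord_polar()
}

# Toy data: within each year, order ranks the sites by value (1 = smallest)
toy <- data.frame(Year  = rep(2000:2001, each = 4),
                  Site  = rep(c("a", "b", "c", "d"), times = 2),
                  value = c(3, 1, 4, 2, 2, 5, 1, 3))
toy$order <- ave(toy$value, toy$Year, FUN = rank)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  rose <- layer_rose(toy)  # print(rose) to draw it
}
```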
I'm trying to add a custom legend to my ggplot, similar to the examples in: http://docs.ggplot2.org/0.9.2.1/scale_gradientn.html
I want the bars in the plot to be colored according to the df$col column and for that reason I'm using scale_fill_manual with values = coloursv.
set.seed(1)
df <- data.frame(log10.p.value = -10*log10(runif(10,0,1)), y = letters[1:10], col = rep("#E0E0FF",10), stringsAsFactors = F)
#specify color by log10.p.value
df$col[which(df$log10.p.value > 2)] <- "#EBCCD6"
df$col[which(df$log10.p.value > 4)] <- "#E09898"
df$col[which(df$log10.p.value > 6)] <- "#C74747"
df$col[which(df$log10.p.value > 8)] <- "#B20000"
#truncate bars
df$log10.p.value[which(df$log10.p.value > 10)] <- 10
coloursv <- df$col
names(coloursv) <- df$col
p <- ggplot(df, aes(y=log10.p.value,x=y,fill=as.factor(col)))+
geom_bar(stat="identity",width=0.2) +
scale_y_continuous(limits=c(0,10)) +
theme(axis.text=element_text(size=10)) +
scale_fill_manual(values = coloursv)+coord_flip()+
scale_fill_gradientn(colours=c("#EBCCD6","#E09898","#C74747","#B20000","#E0E0FF"),
breaks=c(-4,-3,-2,-1,0),guide="colorbar",labels=c(2,4,6,8,10))
And getting nothing:
You're not getting a legend because the fill scales conflict. scale_fill_manual(values = coloursv) and scale_fill_gradientn() both set the fill scale, so the second replaces the first (ggplot2 prints a message saying so), and a continuous gradient can't be applied to the discrete fill = as.factor(col) mapping.
Pick one: a manual scale for a categorical fill, or scale_fill_gradientn() for a continuous fill such as log10.p.value (not scale_color_gradientn(), since the bars use the fill aesthetic rather than color). Even with a continuous fill, you've set the breaks to values outside the range of your data: the breaks run from -4 to 0, but the data lie between 0 and 10.
Adding df$col doesn't set the colors that get plotted. It just creates a categorical variable with different category values in different ranges of log10.p.value. You could have called the category values anything, and ggplot2 has a default color palette that's the same regardless of the category names and depends only on the number of categories. If you want categorical values, you can instead use the cut function as shown below.
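A quick base-R check of what cut() produces with breaks of seq(0, 10, 2) (the values below are arbitrary examples): intervals are closed on the right by default, so a value of exactly 2 lands in (0,2], and a value of exactly 0 becomes NA unless you pass include.lowest = TRUE.

```r
x <- c(0.25, 2, 2.01, 5.8, 10)
x_cat <- cut(x, breaks = seq(0, 10, 2))
levels(x_cat)        # "(0,2]" "(2,4]" "(4,6]" "(6,8]" "(8,10]"
as.character(x_cat)  # "(0,2]" "(0,2]" "(2,4]" "(4,6]" "(8,10]"

# Exactly 0 falls outside (0,2] unless the lowest break is included
cut(0, breaks = seq(0, 10, 2))                         # NA
cut(0, breaks = seq(0, 10, 2), include.lowest = TRUE)  # [0,2]
```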
Here are a few examples to illustrate various fill options and legends:
# Create log10.p.value categories
df$log10.p.value.cat = cut(df$log10.p.value, seq(0,10,2))
# Fill bars based on log10.p.value.cat
p1=ggplot(df, aes(y=log10.p.value, x=y, fill=log10.p.value.cat)) +
geom_bar(stat="identity", width=0.2) +
scale_y_continuous(limits=c(0,10)) +
theme(axis.text=element_text(size=10)) +
coord_flip()
The plot below is the one in your question with the legend included. Note that the order of the colors in scale_fill_manual has to match the order of the levels of log10.p.value.cat to get the desired color for each category.
# Fill bars based on log10.p.value.cat with custom colors
p1a=ggplot(df, aes(y=log10.p.value, x=y, fill=log10.p.value.cat)) +
geom_bar(stat="identity", width=0.2) +
scale_y_continuous(limits=c(0,10)) +
theme(axis.text=element_text(size=10)) +
coord_flip() +
scale_fill_manual(values=c("#E0E0FF","#EBCCD6","#E09898","#C74747","#B20000"))
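If you'd rather not rely on the colors lining up with the level order, scale_fill_manual() also accepts a named values vector, in which case colors are matched to categories by name. A sketch of building such a vector (pal is a hypothetical name); it would then be passed as scale_fill_manual(values = pal).

```r
# Category labels as produced by cut(x, seq(0, 10, 2))
cats <- levels(cut(c(1, 3, 5, 7, 9), breaks = seq(0, 10, 2)))
pal <- setNames(c("#E0E0FF", "#EBCCD6", "#E09898", "#C74747", "#B20000"), cats)
pal[["(8,10]"]]  # "#B20000": the color for the highest category
```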
# Continuous fill gradient based on log10.p.value
p2=ggplot(df, aes(y=log10.p.value, x=y, fill=log10.p.value)) +
geom_bar(stat="identity", width=0.2) +
scale_y_continuous(limits=c(0,10)) +
theme(axis.text=element_text(size=10)) +
coord_flip()
# Continuous fill gradient based on log10.p.value with custom colors
p2a=ggplot(df, aes(y=log10.p.value, x=y, fill=log10.p.value)) +
geom_bar(stat="identity", width=0.2) +
scale_y_continuous(limits=c(0,10)) +
theme(axis.text=element_text(size=10)) +
coord_flip() +
scale_fill_gradientn(colours=c("#EBCCD6","#E09898","#C74747","#B20000","#E0E0FF"),
breaks=seq(0,10,2))
I am plotting a forest plot in ggplot2 and am having trouble getting the order of the labels in the legend to match the order of the labels in the data set. Here is my code.
Data:
d<-data.frame(x=c("Co-K(W) N=720", "IH-K(W) N=67", "IF-K(W) N=198", "CO-K(B)N=78", "IH-K(B) N=13", "CO=A(W) N=874","D-Sco Ad(W) N=346","DR-Ad (W) N=892","CE_A(W) N=274","CO-Ad(B) N=66","D-So Ad(B) N=215","DR-Ad(B) N=123","CE-Ad(B) N=79"),
y = rnorm(13, 0, 0.1))
d <- transform(d, ylo = y-1/13, yhi=y+1/13)
d$x <- factor(d$x, levels=rev(d$x)) # reverse ordering
Forest plot code:
credplot.gg <- function(d){
# d is a data frame with 4 columns
# d$x gives variable names
# d$y gives center point
# d$ylo gives lower limits
# d$yhi gives upper limits
require(ggplot2)
p <- ggplot(d, aes(x=x, y=y, ymin=ylo, ymax=yhi,group=x,colour=x)) +
geom_pointrange(size=1) +
theme_bw() +
scale_color_discrete(name="Sample") +
coord_flip() +
theme(legend.key=element_rect(fill='cornsilk2')) +
guides(colour = guide_legend(override.aes = list(size=0.5))) +
geom_hline(yintercept = 0, colour = 'red', lty=2) + # reference line at 0 (vertical after coord_flip)
xlab('Cohort') + ylab('CI') + ggtitle('Forest Plot')
return(p)
}
credplot.gg(d)
This is what I get. As you can see, the labels on the y axis follow the order of the labels in the data, but the legend uses a different order. I'm not sure how to correct this. This is my first time creating a plot in ggplot2, so any feedback is much appreciated. Thanks in advance!
Nice plot, especially for a first ggplot! I haven't tested it, but I think all you need is to add reverse = TRUE inside your colour guide_legend(), i.e. guides(colour = guide_legend(reverse = TRUE, override.aes = list(size = 0.5))) (found this in the Cookbook for R).
If I were to make one more comment, I'd say that ordering your vertical factor by numeric value often makes comparisons easier when alphabetical order isn't particularly meaningful. (Though maybe your alpha order is meaningful.)
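Ordering by value is a one-liner with reorder(). A sketch with a few of the cohort labels and made-up estimates: the levels come out in increasing y, and since coord_flip() draws the first level at the bottom, the largest estimate ends up at the top. The legend lists the levels top-down in the same order, so guide_legend(reverse = TRUE) would still be needed to make it read like the axis.

```r
# A few cohort labels from the question with made-up estimates
d <- data.frame(x = c("Co-K(W) N=720", "IH-K(W) N=67", "IF-K(W) N=198"),
                y = c(0.05, -0.10, 0.02))

# Levels sorted by the mean of y (one value per cohort here): lowest first
d$x <- reorder(d$x, d$y)
levels(d$x)  # "IH-K(W) N=67" "IF-K(W) N=198" "Co-K(W) N=720"
```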