Being new to R, I produced very simple horizontal bar plots using ggplot2 and coord_flip().
Notably, I insert the values of the x variable at the left side of the bar by default (or at the right side if the label does not fit) using the following command:
geom_text(aes(x=TYPE, y=COUNT, ymax=COUNT, label=COUNT,
hjust=ifelse(COUNT>1000, 1.5, -0.3)),
size=3.5, position = position_dodge(width=0.8))
The problem is that, depending on the data-sets, the x values can vary significantly (e.g. dataset_1 x values can be between 1 to 200; dataset_2 x values can be between 10,000 to 100,000; ...), which causes the label of the shortest bar to be misplaced with the ifelse statement I am using (see brown bar in figure A below).
In this case I cannot just use a constant COUNT>1000 condition for all the datasets.
Figure A:
I could modify manually the value of the hjust=ifelse(COUNT>1000,...statement for each dataset.
But I was wondering if it is possible to automatically move the label outs of the bar if it does not fit between the axis and the top of the bar without modifying the value of the ifelse condition for each dataset, like in figure B below.
Figure B :
EDIT
Workaround (not perfect but better):
Placing the label at the right of the bar if the value is less than 5% of the maximum value
MAXI <- max(data[,2])
geom_text(aes(x=TYPE, y=COUNT, ymax=COUNT, label=COUNT,
hjust=ifelse((COUNT/MAXI)<0.05, -0.3, 1.3)))
Having some labels outside the bars and some inside can distort the visual encoding of magnitude as the length of the bar. Another option is to put the values in the middle of the bar but set geom_text to skip values that are small relative to the maximum bar. Or, if you want to include text for all the bar values added, you can put them below the bars in order to keep a clean visual pattern for the bar lengths. Examples of both options are below:
# Fake data
dat = data.frame(x = LETTERS[1:5], y=c(432, 1349, 10819, 5489, 12123))
ggplot(dat, aes(x, y, fill=x)) +
geom_bar(stat="identity") +
geom_text(aes(label=ifelse(y < 0.05*max(dat$y), "", format(y, big.mark=",")), y=0.5*y),
colour="white") +
coord_flip(xlim=c(0.4,5.6), ylim=c(0, 1.03*max(dat$y)), expand=FALSE) +
guides(fill=FALSE)
ggplot(dat, aes(x, y, fill=x)) +
geom_hline(yintercept=0, lwd=0.3, colour="grey40") +
geom_bar(stat="identity") +
geom_text(aes(label=format(y, big.mark=","), y=-0.01*max(dat$y)),
size=3.5, hjust=1) +
coord_flip(ylim = c(-0.04*max(dat$y), max(dat$y))) +
guides(fill=FALSE)
Related
I have a data frame with three continuous variables (x,y,z). I want a column plot in which x defines the x-axis position of the columns, y defines the length of the columns, and the column colors (function of y) are defined by z. The test code below shows the set up.
`require(ggplot2)
require(viridis)
# Create a dummy data frame
x <- c(rep(0.0, 5),rep(0.5,10),rep(1.0,15))
y <- c(seq(0.0,-5,length.out=5),
seq(0.0,-10,length.out=10),
seq(0.0,-15,length.out=15))
z <- c(seq(10,0,length.out=5),
seq(8,0,length.out=10),
seq(6,0,length.out=15))
df <- data.frame(x=x, y=y, z=z)
pbase <- ggplot(df, aes(x=x, y=y, fill=z))
ptest <- pbase + geom_col(width=0.5, position="identity") +
scale_fill_viridis(option="turbo",
limits = c(0,10),
breaks=seq(0,10,2.5),
labels=c("0","2.5","5.0","7.5","10.0"))
print(ptest)`
The legend has the correct colors but the columns do not. Perhaps this is not the correct way to do this type of plot. I tried using geom_bar() which creates a bars with the correct colors but the y-values are incorrect.
It looks like you have 3 X values that each appear 5, 10, or 15 times. Do you want the bars to be overlaid on top of one another, as they are now? If you add an alpha = 0.5 to the geom_col call you'll see the overlapping bars.
Alternatively, you might use dodging to show the bars next to one another instead of on top of one another.
ggplot(df, aes(x=x, y=y, fill=z, group = z)) +
geom_col(width=0.5, position=position_dodge()) +
scale_fill_viridis_c(option="turbo", # added with ggplot 3.x in 2018
limits = c(0,10),
breaks=seq(0,10,2.5),
labels=c("0","2.5","5.0","7.5","10.0"))
Or you might plot the data in order of y so that the smaller bars appear on top, visibly:
ggplot(dplyr::arrange(df,y), aes(x=x, y=y, fill=z))+
geom_col(width=0.5, position="identity") +
scale_fill_viridis_c(option="turbo",
limits = c(0,10),
breaks=seq(0,10,2.5),
labels=c("0","2.5","5.0","7.5","10.0"))
I solved this by using geom_tile() in place of geom_col().
I am looking to "dodge" the bars of a barplot together. The following R code leaves white space between the bars. Other answers like this one show how to accomplish this for the bars part of a group, but that does not seem to apply for distinct bars per factor on the x axis.
require(ggplot2)
dat <- data.frame(a=c("A", "B", "C"), b=c(0.71, 0.94, 0.85), d=c(32, 99, 18))
ggplot(dat, aes(x= a, y = b, fill=d, width = d/sum(d))) +
geom_bar(position=position_dodge(width = 0.1), stat="identity")
Playing with the width variable changes the appearance, but it does not seem possible to get the bars to sit side by side while still retaining their meaningful difference in width (in this graph redundantly represented by the fill colour too).
I would generate my x-positions and widths first, then pass them in to the aesthetics and override to make your factor labels:
First, store the width
dat$width <-
dat$d / sum(dat$d)
Then, assuming that your data.frame is in the order you want it plotted, you can set the location as the cumulative sum of the widths. Note, however, that that cumulative sum is where you want the right edge of the bar to be, so to get the center you need to subtract half of the width:
dat$loc <-
cumsum(dat$width) - dat$width/2
Then, pass it all in to the ggplot call, setting your labels explictly:
ggplot(dat, aes(x= loc, y = b, fill=d, width = width)) +
geom_bar(stat="identity") +
scale_x_continuous(breaks = dat$loc
, labels = dat$a)
gives
I am not sure about the advisability of this appproach, but this should get the job done.
It is possible by using a continuous x axis and relabel it.
ggplot(dat, aes(x=cumsum(d/sum(d))) - d/sum(d)/2, y = b, fill=d, width=d/sum(d))) +
geom_bar(stat="identity", position=position_dodge()) +
scale_x_continuous(breaks=cumsum(dat$d/sum(dat$d)) - dat$d/sum(dat$d)/2, labels=dat$a)
Or isn't this what you where looking for
So I have a bar chart to make, and a log plot for y axis is warranted because of the data range. So the problem is I have the value of 0.5, which in log10 is -0.3.
Since the bar goes into negative, the "top" of the bar, which is used for placing the labels is actually the "bottom" and so my text label is "just above" the bottom, which means in the middle of the bar.
I figure I am probably not the first person with this issue, but searching for related fixes has not helped. Most notably, I tried using dodge, but this does not change that the "top" of the bar is really the "bottom".
So two questions:
Can I fix this label mishap?
This is just ugly: can I move the x axis up to y=1 to give more context to the negative value without moving the x axis labels?
.
alpha=c('A','B','C','D')
value=c(0.5,10,40,1100)
table<-as.data.frame(alpha)
table<-cbind(table, value)
library(ggplot2)
graph <- ggplot(table, aes(x=alpha)) +
geom_bar(stat="identity",aes(y=value),width=0.5) +
geom_text(aes(y=value,label=value),vjust=-0.5) +
scale_y_continuous(trans="log10",limits=c(0.5,1400))
graph + theme_classic()
A little trick modifying the y coordinate of the data labels (use ifelse() to set the value of y to 1 if the value is less than one). As for the axis, simply hide the X axis (setting it to element_blank()) and draw a new horizontal line:
graph <- ggplot(table, aes(x=alpha)) +
geom_bar(stat="identity",aes(y=value),width=0.5) +
# Modify the placing of the label using 'ifelse()':
geom_text(aes(y=ifelse(value < 1, 1, value),label=value),vjust=-0.5) +
scale_y_continuous(trans="log10",limits=c(0.5,1400)) +
theme_classic() +
# Hide the X axis:
theme(axis.line.x = element_blank()) +
# Draw the new axis
geom_hline()
print(graph)
The output:
With the following code:
library(ggplot2)
set.seed(6809)
diamonds <- diamonds[sample(nrow(diamonds), 1000), ]
diamonds$cut <- factor(diamonds$cut,
levels = c("Ideal", "Very Good", "Fair", "Good", "Premium"))
# Repeat first example with new order
p <- ggplot(diamonds, aes(carat, ..density..)) +
geom_histogram(binwidth = 1)
p + facet_grid(color ~ cut)
I can create the following figure:
My questions are:
How can I sort the right strip according to my desired order
e.g. (G, F, D, E, I, J, H)?
How can I swap the right strip to with the y-axis (density)?
Update for ggplot2 2.2.1
With ggplot2 version 2, you can switch the positions of the axis labels and facet labels. So here's updated code that takes advantage of these features:
# Reorder factor levels
diamonds$color = factor(diamonds$color, levels=c("G","F","D","E","I","J","H"))
ggplot(diamonds, aes(carat, ..density..)) +
geom_histogram(binwidth=1) +
facet_grid(color ~ cut, switch="y") + # Put the y facet strips on the left
scale_y_continuous("density", position="right") + # Put the y-axis labels on the right
theme(strip.text.y=element_text(angle=180))
Original Answer
As #joran said, you have to modify the grid object if you want full control over what goes where. That's painful.
Here's another approach that's still a hassle, but easier (for me at least) than modifying the grid object. The basic idea is that we orient the various facet and axis labels so that we can rotate the plot 90 degrees counter-clockwise (to get the facet labels on the left side) while still having all the labels oriented properly.
To make this work, you need to modify the graph in several ways: Note my addition of coord_flip, all the theme stuff, and scale_x_reverse. Note also that I've switched the order of the facet variables, so that color starts out on top (it will be on the left after we rotate the graph).
# Reorder factor levels
diamonds$color = factor(diamonds$color, levels=rev(c("G","F","D","E","I","J","H")))
p <- ggplot(diamonds, aes(carat, ..density..)) +
geom_histogram(binwidth = 1) +
facet_grid(cut ~ color) + coord_flip() +
theme(strip.text.x=element_text(angle=-90),
axis.text.y=element_text(angle=-90, vjust=0.5, hjust=0.5),
axis.text.x=element_text(angle=-90, vjust=0.5, hjust=0),
axis.title.x=element_text(angle=180),
axis.title.y=element_text(angle=-90)) +
scale_x_reverse()
One option is to save the graph and then rotate it in another program (such as Preview, if you're on a Mac). However, with the help of this SO answer, I was able to rotate the plot within R. It required some trial and error (with my limited knowledge of how to manipulate grid objects) to get the right size for the viewport. I saved it as a PNG for posting on SO, but you can of course save it as a PDF, which will look nicer.
png("example.png", 500,600)
pushViewport(viewport(width = unit(8, "inches"), height = unit(7, "inches")))
print(p, vp=viewport(angle=90))
dev.off()
And here's the result:
facet_grid has the attribute "switch".
switch: By default, the labels are displayed on the top and right of the plot. If "x", the top labels will be displayed to the bottom. If "y", the right-hand side labels will be displayed to the left. Can also be set to "both".
manual here
facet_grid(cut ~ color, switch = "y")
I have a data-frame arranged as follows:
condition,treatment,value
A , one , 2
A , one , 1
A , two , 4
A , two , 2
...
D , two , 3
I have used ggplot2 to make a grouped bar plot that looks like this:
The bars are grouped by "condition" and the colours indicate "treatment." The bar heights are the mean of the values for each condition/treatment pair. I achieved this by creating a new data frame containing the mean and standard error (for the error bars) for all the points that will make up each group.
What I would like to do is superimpose the raw jittered data to produce a bar-chart version of this box plot: http://docs.ggplot2.org/0.9.3.1/geom_boxplot-6.png [I realise that a box plot would probably be better, but my hands are tied because the client is pathologically attached to bar charts]
I have tried adding a geom_point object to my plot and feeding it the raw data (rather than the aggregated means which were used to make the bars). This sort of works, but it plots the raw values at the wrong x axis locations. They appear at the points at which the red and grey bars join, rather than at the centres of the appropriate bar. So my plot looks like this:
I can not figure out how to shift the points by a fixed amount and then jitter them in order to get them centered over the correct bar. Anyone know? Is there, perhaps, a better way of achieving what I'm trying to do?
What follows is a minimal example that shows the problem I have:
#Make some fake data
ex=data.frame(cond=rep(c('a','b','c','d'),each=8),
treat=rep(rep(c('one','two'),4),each=4),
value=rnorm(32) + rep(c(3,1,4,2),each=4) )
#Calculate the mean and SD of each condition/treatment pair
agg=aggregate(value~cond*treat, data=ex, FUN="mean") #mean
agg$sd=aggregate(value~cond*treat, data=ex, FUN="sd")$value #add the SD
dodge <- position_dodge(width=0.9)
limits <- aes(ymax=value+sd, ymin=value-sd) #Set up the error bars
p <- ggplot(agg, aes(fill=treat, y=value, x=cond))
#Plot, attempting to overlay the raw data
print(
p + geom_bar(position=dodge, stat="identity") +
geom_errorbar(limits, position=dodge, width=0.25) +
geom_point(data= ex[ex$treat=='one',], colour="green", size=3) +
geom_point(data= ex[ex$treat=='two',], colour="pink", size=3)
)
I found it is unnecessary to create separate dataframes. The plot can be created by providing ggplot with the raw data.
ex <- data.frame(cond=rep(c('a','b','c','d'),each=8),
treat=rep(rep(c('one','two'),4),each=4),
value=rnorm(32) + rep(c(3,1,4,2),each=4) )
p <- ggplot(ex, aes(cond,value,fill = treat))
p + geom_bar(position = 'dodge', stat = 'summary', fun.y = 'mean') +
geom_errorbar(stat = 'summary', position = 'dodge', width = 0.9) +
geom_point(aes(x = cond), shape = 21, position = position_dodge(width = 1))
You need just one call to geom_point() where you use data frame ex and set x values to cond, y values to value and color=treat (inside aes()). Then add position=dodge to ensure that points are dodgeg. With scale_color_manual() and argument values= you can set colors you need.
p+geom_bar(position=dodge, stat="identity") +
geom_errorbar(limits, position=dodge, width=0.25)+
geom_point(data=ex,aes(cond,value,color=treat),position=dodge)+
scale_color_manual(values=c("green","pink"))
UPDATE - jittering of points
You can't directly use positions dodge and jitter together. But there are some workarounds. If you save whole plot as object then with ggplot_build() you can see x positions for bars - in this case they are 0.775, 1.225, 1.775... Those positions correspond to combinations of factors cond and treat. As in data frame ex there are 4 values for each combination, then add new column that contains those x positions repeated 4 times.
ex$xcord<-rep(c(0.775,1.225,1.775,2.225,2.775,3.225,3.775,4.225),each=4)
Now in geom_point() use this new column as x values and set position to jitter.
p+geom_bar(position=dodge, stat="identity") +
geom_errorbar(limits, position=dodge, width=0.25)+
geom_point(data=ex,aes(xcord,value,color=treat),position=position_jitter(width =.15))+
scale_color_manual(values=c("green","pink"))
As illustrated by holmrenser above, referencing a single dataframe and updating the stat instruction to "summary" in the geom_bar function is more efficient than creating additional dataframes and retaining the stat instruction as "identity" in the code.
To both jitter and dodge the data points with the bar charts per the OP's original question, this can also be accomplished by updating the position instruction in the code with position_jitterdodge. This positioning scheme allows widths for jitter and dodge terms to be customized independently, as follows:
p <- ggplot(ex, aes(cond,value,fill = treat))
p + geom_bar(position = 'dodge', stat = 'summary', fun.y = 'mean') +
geom_errorbar(stat = 'summary', position = 'dodge', width = 0.9) +
geom_point(aes(x = cond), shape = 21, position =
position_jitterdodge(jitter.width = 0.5, jitter.height=0.4,
dodge.width=0.9))