I'm trying to do a pyramid-"like" plot in R and I think I am close. I know there are functions such as plotrix's pyramid.plot but what I want to do isn't a real pyramid plot. In a pyramid plot, there are text labels down the middle that line up with the bars on the left and on the right. Instead, what I'd like to do is have two columns of text with bars coming away from them.
I'm using ggplot (but I guess I don't have to) and the multiplot function. A minimal example would be something like this:
library(ggplot2)
mtcars$`car name` <- rownames(mtcars)
obj_a <- ggplot (mtcars, aes (x=`car name`, y=mpg))
obj_a <- obj_a + geom_bar (position = position_dodge(), stat="identity")
obj_a <- obj_a + coord_flip ()
obj_a <- obj_a + xlab ("")
USArrests$`states` <- rownames(USArrests)
obj_b <- ggplot (USArrests, aes (x=`states`, y=UrbanPop))
obj_b <- obj_b + geom_bar (position = position_dodge(), stat="identity")
obj_b <- obj_b + coord_flip ()
obj_b <- obj_b + xlab ("")
multiplot (obj_a, obj_b, cols=2)
Which looks like this:
I guess what I'd like is just to flip the left half so that each row has (from left-to-right): left bar, car model, state name, right bar. (The graph I'm making will have the same number of rows in both halves so it won't look so cramped.) However, the point is, there are two columns of text, not one.
Of course, since both halves are independent of each other, my real problem is I don't know how to make the left half. (A bar plot with bars going in the opposite direction.) But I thought I'd also explain what I'm trying to do...
Thank you in advance!
You can make the mpg values in obj_a negative and position the car-name axis on the opposite side:
ggplot (mtcars, aes (x=`car name`, y=-mpg)) + # y takes on negative values
geom_bar (position = position_dodge(), stat = "identity") +
coord_flip () +
scale_x_discrete(name = "", position = "top") + # x axis (before coord_flip) on opposite side
scale_y_continuous(name = "mpg",
breaks = seq(0, -30, by = -10), # y axis values (before coord_flip)
labels = seq(0, 30, by = 10)) # show non-negative values
It seems that pyramid.plot already does what you need. Using their example:
library(plotrix)
xy.pop<-c(3.2,3.5,3.6,3.6,3.5,3.5,3.9,3.7,3.9,3.5,3.2,2.8,2.2,1.8,
1.5,1.3,0.7,0.4)
xx.pop<-c(3.2,3.4,3.5,3.5,3.5,3.7,4,3.8,3.9,3.6,3.2,2.5,2,1.7,1.5,
1.3,1,0.8)
agelabels<-c("0-4","5-9","10-14","15-19","20-24","25-29","30-34",
"35-39","40-44","45-49","50-54","55-59","60-64","65-69","70-74",
"75-79","80-84","85+")
mcol<-color.gradient(c(0,0,0.5,1),c(0,0,0.5,1),c(1,1,0.5,1),18)
fcol<-color.gradient(c(1,1,0.5,1),c(0.5,0.5,0.5,1),c(0.5,0.5,0.5,1),18)
par(mar=pyramid.plot(xy.pop,xx.pop,labels=agelabels,
main="Australian population pyramid 2002",lxcol=mcol,rxcol=fcol,
gap=0.5,show.values=TRUE))
# three column matrices
avtemp<-c(seq(11,2,by=-1),rep(2:6,each=2),seq(11,2,by=-1))
malecook<-matrix(avtemp+sample(-2:2,30,TRUE),ncol=3)
femalecook<-matrix(avtemp+sample(-2:2,30,TRUE),ncol=3)
# *** Make agegrps a two column data frame with the labels ***
# group by age
agegrps<-data.frame(c("0","11","21","31","41","51",
"61-70","71-80","81-90","91+"),
c("10","20","30","40","50","60",
"70","80","90","91"))
oldmar<-pyramid.plot(malecook,femalecook,labels=agegrps,
unit="Bowls per month",lxcol=c("#ff0000","#eeee88","#0000ff"),
rxcol=c("#ff0000","#eeee88","#0000ff"),laxlab=c(0,10,20,30),
raxlab=c(0,10,20,30),top.labels=c("Males","Age","Females"),gap=4,
do.first="plot_bg(\"#eedd55\")")
# put a box around it
box()
# give it a title
mtext("Porridge temperature by age and sex of bear",3,2,cex=1.5)
# stick in a legend
legend(par("usr")[1],11,c("Too hot","Just right","Too cold"),
fill=c("#ff0000","#eeee88","#0000ff"))
# don't forget to restore the margins and background
par(mar=oldmar,bg="transparent")
Result:
Instead of negating the variable, you could just add + scale_y_reverse(); the y scale becomes the horizontal axis after the flip.
This way you don't have to set the axis labels manually.
You will still need to change the x axis label position, as suggested in the answer by user Z.Lin.
E.g.
library(ggplot2)
mtcars$`car name` <- rownames(mtcars)
ggplot (mtcars, aes (x=`car name`, y=mpg)) +
geom_bar (position = position_dodge(), stat="identity") +
scale_y_reverse () +
scale_x_discrete(name = "", position = "top") +
coord_flip ()
Created on 2020-04-09 by the reprex package (v0.3.0)
EDIT: An even simpler solution is not to use coord_flip() at all, but rather specify the desired mapping directly right away. I would now recommend this approach, having encountered issues with how coord_flip() behaves under certain more complex scenarios.
I give the code for this below; the plot should still look the same.
library(ggplot2)
mtcars$`car name` <- rownames(mtcars)
ggplot (mtcars, aes(x = mpg, y = `car name`)) +
geom_bar(position = position_dodge(), stat = "identity") +
scale_x_reverse() +
scale_y_discrete(name = "", position = "right")
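Putting the answers above together, here is a minimal sketch of the two-column layout the question asks for. It assumes gridExtra::grid.arrange() in place of multiplot(), trims both data sets to ten rows with head() so the halves line up, and uses made-up column names (car_name, state); treat it as a starting point rather than a finished solution.

```r
library(ggplot2)
library(gridExtra)  # assumption: grid.arrange() stands in for multiplot()

# Same number of rows in both halves, as in the asker's real data
cars <- head(mtcars, 10)
cars$car_name <- rownames(cars)
states <- head(USArrests, 10)
states$state <- rownames(states)

# Left half: bars grow to the left, text labels sit on the right edge
left <- ggplot(cars, aes(x = mpg, y = car_name)) +
  geom_col() +
  scale_x_reverse() +
  scale_y_discrete(name = NULL, position = "right")

# Right half: ordinary horizontal bars, text labels on the left edge
right <- ggplot(states, aes(x = UrbanPop, y = state)) +
  geom_col() +
  scale_y_discrete(name = NULL)

grid.arrange(left, right, ncol = 2)
```

Mapping the discrete variable to y directly (instead of using coord_flip()) relies on ggplot2 3.3.0 or later.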
I'd like to have some labels stacked on top of a geom_bar graph. Here's an example:
df <- data.frame(x=factor(c(TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE)))
ggplot(df) + geom_bar(aes(x,fill=x)) + opts(axis.text.x=theme_blank(),axis.ticks=theme_blank(),axis.title.x=theme_blank(),legend.title=theme_blank(),axis.title.y=theme_blank())
Now
table(df$x)
FALSE TRUE
3 5
I'd like to have the 3 and 5 on top of the two bars. Even better if I could have the percent values as well. E.g. 3 (37.5%) and 5 (62.5%). Like so:
Is this possible? If so, how?
To plot text on a ggplot you use geom_text. I find it helpful to summarise the data first using ddply (from the plyr package):
dfl <- ddply(df, .(x), summarize, y=length(x))
str(dfl)
Since the data is pre-summarized, you need to remember to add the stat="identity" parameter to geom_bar:
ggplot(dfl, aes(x, y=y, fill=x)) + geom_bar(stat="identity") +
geom_text(aes(label=y), vjust=0) +
opts(axis.text.x=theme_blank(),
axis.ticks=theme_blank(),
axis.title.x=theme_blank(),
legend.title=theme_blank(),
axis.title.y=theme_blank()
)
As with many tasks in ggplot, the general strategy is to put what you'd like to add to the plot into a data frame in a way such that the variables match up with the variables and aesthetics in your plot. So for example, you'd create a new data frame like this:
dfTab <- as.data.frame(table(df))
colnames(dfTab)[1] <- "x"
dfTab$lab <- as.character(100 * dfTab$Freq / sum(dfTab$Freq))
So that the x variable matches the corresponding variable in df, and so on. Then you simply include it using geom_text:
ggplot(df) + geom_bar(aes(x,fill=x)) +
geom_text(data=dfTab,aes(x=x,y=Freq,label=lab),vjust=0) +
opts(axis.text.x=theme_blank(),axis.ticks=theme_blank(),
axis.title.x=theme_blank(),legend.title=theme_blank(),
axis.title.y=theme_blank())
This example will plot just the percentages, but you can paste together the counts as well via something like this:
dfTab$lab <- paste(dfTab$Freq,paste("(",dfTab$lab,"%)",sep=""),sep=" ")
Note that in the current version of ggplot2, opts is deprecated, so we would use theme and element_blank now.
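For readers on a recent ggplot2, where the older opts()/theme_blank() calls may fail outright, here is a sketch of the same labelled bar chart using theme() and element_blank(). The sprintf() label format and the name p_labels are my own choices, not part of the original answer.

```r
library(ggplot2)

df <- data.frame(x = factor(c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)))

# One row per bar: the count plus a "count (percent%)" label
dfTab <- as.data.frame(table(x = df$x))
dfTab$lab <- sprintf("%d (%.1f%%)", dfTab$Freq, 100 * dfTab$Freq / sum(dfTab$Freq))

p_labels <- ggplot(df) +
  geom_bar(aes(x, fill = x)) +
  geom_text(data = dfTab, aes(x = x, y = Freq, label = lab), vjust = -0.3) +
  theme(axis.text.x = element_blank(),
        axis.ticks = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        legend.title = element_blank())
p_labels
```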
Another solution is to use stat_count() when dealing with discrete variables (and stat_bin() with continuous ones).
ggplot(data = df, aes(x = x)) +
geom_bar(stat = "count") +
stat_count(geom = "text", colour = "white", size = 3.5,
aes(label = ..count..),position=position_stack(vjust=0.5))
So, this is our initial plot:
library(ggplot2)
df <- data.frame(x=factor(c(TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE)))
p <- ggplot(df, aes(x = x, fill = x)) +
geom_bar()
p
As suggested by yuan-ning, we can use stat_count().
geom_bar() uses stat_count() by default. As mentioned in the ggplot2 reference, stat_count() returns two values: count for number of points in bin and prop for groupwise proportion. Since our groups match the x values, both props are 1 and aren’t useful. But we can use count (referred to as “..count..”) that actually denotes bar heights, in our geom_text(). Note that we must include “stat = 'count'” into our geom_text() call as well.
Since we want both counts and percentages in our labels, we’ll need some calculations and string pasting in our “label” aesthetic instead of just “..count..”. I prefer to add a line of code to create a wrapper percent formatting function from the “scales” package (ships along with “ggplot2”).
pct_format = scales::percent_format(accuracy = .1)
p <- p + geom_text(
aes(
label = sprintf(
'%d (%s)',
..count..,
pct_format(..count.. / sum(..count..))
)
),
stat = 'count',
nudge_y = .2,
colour = 'royalblue',
size = 5
)
p
Of course, you can further edit the labels with colour, size, nudges, adjustments etc.
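The ..count.. notation used above still works in many installations, but ggplot2 3.3.0 and later also offer after_stat(), which newer releases prefer. A sketch of the same label built that way, assuming such a version is installed (p_after is just an illustrative name):

```r
library(ggplot2)

df <- data.frame(x = factor(c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)))

p_after <- ggplot(df, aes(x = x, fill = x)) +
  geom_bar() +
  geom_text(
    # `count` is computed by stat_count(); after_stat() delays evaluation
    # of the label until that statistic is available
    aes(label = after_stat(sprintf(
      "%d (%s)",
      count,
      scales::percent(count / sum(count), accuracy = 0.1)
    ))),
    stat = "count",
    vjust = -0.5
  )
p_after
```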
I'm trying to add a single, manual bar to the existing area (ribbon) plot. Ideally I just wanted to specify the x (position) and y (value) for the bar.
ExampleData <- data.frame(myID=c(1,2,3,4,5,6,7,8,9,10),PU=c(10,20,30,40,50,60,70,80,90,100))
MyPlot <- ggplot(ExampleData,aes(x=myID))
MyPlot <- MyPlot + geom_ribbon(aes(ymin=0, ymax=PU), fill="lightgray", color="darkgray", size=1)
MyPlot <- MyPlot + geom_col(aes(x=4,y=40), color="red", linetype="solid", size=1)
MyPlot
It is almost working, but for some reason the value of 40 is becoming 400, and ideally I should be able to specify the width of the bar (should be half of what we see below).
Thank you for any help!
Maybe something more like this?
ExampleData <- data.frame(myID=c(1,2,3,4,5,6,7,8,9,10),
PU=c(10,20,30,40,50,60,70,80,90,100))
bar <- data.frame(xmin = 4,xmax= 4.5,ymin = 0,ymax = 40)
ggplot() +
geom_ribbon(data = ExampleData,
aes(x = myID,ymin=0, ymax=PU),
fill="lightgray",
color="darkgray", size=1) +
geom_rect(data = bar,
aes(xmin = xmin,xmax = xmax,ymin = ymin,ymax = ymax),
color = "red")
The 40 vs 400 issue you mention happens when you specify a data frame at the top ggplot() level and then try to add layers where all the aesthetics are intended to be "set" rather than "mapped". The most common case when this happens is when people are adding text labels and you end up with many many copies of each text label plotted on top of each other.
In this case, ggplot is trying to interpret the x and y values you give geom_col in the context of ExampleData, and so ends up repeating those single values 10 times and stacking the resulting bars.
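If you would rather keep geom_col, one way to avoid the repetition described above is to give that layer its own one-row data frame, so the bar is drawn exactly once. This is only a sketch; the data frame name single_bar is made up for illustration, and width = 0.5 is one possible choice for the bar width.

```r
library(ggplot2)

ExampleData <- data.frame(myID = 1:10, PU = seq(10, 100, by = 10))

# One row means one bar, so nothing gets stacked
single_bar <- data.frame(myID = 4, PU = 40)

MyPlot <- ggplot(ExampleData, aes(x = myID)) +
  geom_ribbon(aes(ymin = 0, ymax = PU), fill = "lightgray", colour = "darkgray") +
  geom_col(data = single_bar, aes(y = PU), width = 0.5,
           fill = NA, colour = "red")
MyPlot
```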
I have a set of code that produces multiple plots using facet_wrap:
ggplot(summ,aes(x=depth,y=expr,colour=bank,group=bank)) +
geom_errorbar(aes(ymin=expr-se,ymax=expr+se),lwd=0.4,width=0.3,position=pd) +
geom_line(aes(group=bank,linetype=bank),position=pd) +
geom_point(aes(group=bank,pch=bank),position=pd,size=2.5) +
scale_colour_manual(values=c("coral","cyan3", "blue")) +
facet_wrap(~gene,scales="free_y") +
theme_bw()
With the reference datasets, this code produces figures like this:
I am trying to accomplish two goals here:
Keep the auto scaling of the y axis, but make sure only 1 decimal place is displayed across all the plots. I have tried creating a new column of the rounded expr values, but it causes the error bars to not line up properly.
I would like to wrap the titles. I have tried changing the font size as in Change plot title sizes in a facet_wrap multiplot, but some of the gene names are too long and will end up being too small to read if I cram them on a single line. Is there a way to wrap the text, using code within the facet_wrap statement?
This probably cannot serve as a definitive answer, but here are some pointers regarding your questions:
Formatting the y-axis scale labels.
First, let's try the direct solution using the format function. Here we format all y-axis scale labels to show 1 decimal place, after rounding the values with round.
formatter <- function(...){
function(x) format(round(x, 1), ...)
}
mtcars2 <- mtcars
sp <- ggplot(mtcars2, aes(x = mpg, y = qsec)) + geom_point() + facet_wrap(~cyl, scales = "free_y")
sp <- sp + scale_y_continuous(labels = formatter(nsmall = 1))
The issue is, sometimes this approach is not practical. Take the leftmost plot from your figure, for example. Using the same formatting, all y-axis scale labels would be rounded up to -0.3, which is not preferable.
The other solution is to modify the breaks for each plot into a set of rounded values. But again, taking the leftmost plot of your figure as an example, it'll end up with just one label point, -0.3
Yet another solution is to format the labels in scientific notation. For simplicity, you can modify the formatter function as follows:
formatter <- function(...){
function(x) format(x, ..., scientific = TRUE, digits = 2)
}
Now you have a uniform format for the y-axes of all the plots. My suggestion, though, is to show the labels with 2 decimal places after rounding.
Wrap facet titles
This can be done using the labeller argument in facet_wrap.
# Modify cyl into factors
mtcars2$cyl <- c("Four Cylinder", "Six Cylinder", "Eight Cylinder")[match(mtcars2$cyl, c(4,6,8))]
# Redraw the graph
sp <- ggplot(mtcars2, aes(x = mpg, y = qsec)) + geom_point() +
facet_wrap(~cyl, scales = "free_y", labeller = labeller(cyl = label_wrap_gen(width = 10)))
sp <- sp + scale_y_continuous(labels = formatter(nsmall = 2))
Note that the wrap function only breaks labels at spaces. So, in your case, you might need to modify your variables so that the gene names contain spaces where a line break is acceptable.
This only solves the first part of the question. You can create a function to format your axis labels and pass it to scale_y_continuous.
df <- data.frame(x=rnorm(11), y1=seq(2, 3, 0.1) + 10, y2=rnorm(11))
library(ggplot2)
library(reshape2)
df <- melt(df, 'x')
# Before
ggplot(df, aes(x=x, y=value)) + geom_point() +
facet_wrap(~ variable, scales="free")
# label function
f <- function(x){
format(round(x, 1), nsmall=1)
}
# After
ggplot(df, aes(x=x, y=value)) + geom_point() +
facet_wrap(~ variable, scales="free") +
scale_y_continuous(labels=f)
scale_*_continuous(..., labels = function(x) sprintf("%0.0f", x)) worked in my case.
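Both goals from the question can also be combined in one plot. The sketch below uses mtcars as a stand-in for the asker's data, scales::label_number() for one-decimal tick labels and label_wrap_gen() for wrapped strip titles; the column cyl_label and the width of 12 characters are assumptions made for illustration.

```r
library(ggplot2)

mt <- mtcars
# Artificially long facet titles, standing in for long gene names
mt$cyl_label <- paste(mt$cyl, "cylinder engine cars")

# Formats numbers with exactly one decimal place
one_decimal <- scales::label_number(accuracy = 0.1)

p_wrapped <- ggplot(mt, aes(x = mpg, y = qsec)) +
  geom_point() +
  facet_wrap(~cyl_label, scales = "free_y",
             labeller = label_wrap_gen(width = 12)) +
  scale_y_continuous(labels = one_decimal)
p_wrapped
```

The caveat raised earlier still applies: if a facet spans a very narrow range, several ticks may show the same one-decimal label.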
Is there any way to line up the points of a line plot with the bars of a bar graph using ggplot when they have the same x-axis? Here is the sample data I'm trying to do it with.
library(ggplot2)
library(gridExtra)
data=data.frame(x=rep(1:27, each=5), y = rep(1:5, times = 27))
yes <- ggplot(data, aes(x = x, y = y))
yes <- yes + geom_point() + geom_line()
other_data = data.frame(x = 1:27, y = 50:76 )
no <- ggplot(other_data, aes(x=x, y=y))
no <- no + geom_bar(stat = "identity")
grid.arrange(no, yes)
Here is the output:
The first point of the line plot is to the left of the first bar, and the last point of the line plot is to the right of the last bar.
Thank you for your time.
Extending @Stibu's post a little: to align the plots, use gtable (or see the answers to your earlier question).
library(ggplot2)
library(gtable)
data=data.frame(x=rep(1:27, each=5), y = rep(1:5, times = 27))
yes <- ggplot(data, aes(x = x, y = y))
yes <- yes + geom_point() + geom_line() +
scale_x_continuous(limits = c(0,28), expand = c(0,0))
other_data = data.frame(x = 1:27, y = 50:76 )
no <- ggplot(other_data, aes(x=x, y=y))
no <- no + geom_bar(stat = "identity") +
scale_x_continuous(limits = c(0,28), expand = c(0,0))
gYes = ggplotGrob(yes) # get the ggplot grobs
gNo = ggplotGrob(no)
plot(rbind(gNo, gYes, size = "first")) # Arrange and plot the grobs
Edit: To change the heights of the plots:
g = rbind(gNo, gYes, size = "first") # Combine the plots
panels <- g$layout$t[grepl("panel", g$layout$name)] # Get the positions for plot panels
g$heights[panels] <- unit(c(0.7, 0.3), "null") # Replace heights with your relative heights
plot(g)
I can think of (at least) two ways to align the x-axes in the two plots:
The two axes do not align because in the bar plot the geoms cover the x-axis from roughly 0.5 to 27.5, while in the other plot the data only range from 1 to 27. The reason is that the bars have a width and the points don't. You can force the axes to align by explicitly specifying an x-axis range. Using the definitions from your plot, this can be achieved by
yes <- yes + scale_x_continuous(limits=c(0,28))
no <- no + scale_x_continuous(limits=c(0,28))
grid.arrange(no, yes)
limits sets the range of the x-axis. Note, though, that the alignment is still not quite perfect. The y-axis labels take up a little more space in the upper plot, because the numbers have two digits. The plot looks as follows:
The other solution is a bit more complicated but it has the advantage that the x-axis is drawn only once and that ggplot makes sure that the alignment is perfect. It makes use of faceting and the trick described in this answer. First, the data must be combined into a single data frame by
all <- rbind(data.frame(other_data,type="other"),data.frame(data,type="data"))
and then the plot can be created as follows:
ggplot(all,aes(x=x,y=y)) + facet_grid(type~.,scales = "free_y") +
geom_bar(data=subset(all,type=="other"),stat="identity") +
geom_point(data=subset(all,type=="data")) +
geom_line(data=subset(all,type=="data"))
The trick is to let the facets be constructed by the variable type which was used before to label the two data sets. But then each geom only gets the subset of the data that should be drawn with that specific geom. In facet_grid, I also used scales = "free_y" because the two y-axes should be independent. This plot looks as follows:
You can change the labels of the facets by giving other names when you define the data frame all. If you want to remove them altogether, then add the following to your plot:
+ theme(strip.background = element_blank(), strip.text = element_blank())
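To check the explanation of the misalignment given above, you can inspect the x-range that the bars actually occupy with layer_data(). This is a small diagnostic sketch; with ggplot2's default bar width of 0.9 the range should come out slightly narrower than 0.5 to 27.5. The coord_cartesian() call at the end is an alternative I am suggesting, not part of the original answer.

```r
library(ggplot2)

other_data <- data.frame(x = 1:27, y = 50:76)
no <- ggplot(other_data, aes(x = x, y = y)) +
  geom_bar(stat = "identity")

# Each bar spans [xmin, xmax]; the points in the line plot have no width
bars <- layer_data(no)
bar_range <- range(c(bars$xmin, bars$xmax))
bar_range

# Alternative to scale limits: zoom the view without dropping any data
no_fixed <- no + coord_cartesian(xlim = c(0, 28))
```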
I am trying to create a faceted plot with flipped coordinates where one and only one of the axes is allowed to vary for each facet:
require(ggplot2)
p <- qplot(displ, hwy, data = mpg)
p + facet_wrap(~ cyl, scales = "free_y") + coord_flip()
This plot is not satisfactory to me because the wrong tick marks and tick labels are repeated for each plot. I want tick marks on every horizontal axis not on every vertical axis.
This is unexpected behaviour because the plot implies that the horizontal axis tick marks are the same for the top panels as they are for the bottom ones, but they are not. To see this run:
p <- qplot(displ, hwy, data = mpg)
p + facet_wrap(~ cyl, scales = "fixed") + coord_flip()
So my question is: is there a way to remove the vertical axis tick marks for the right facets and add horizontal axis tick marks and labels to the top facets?
As Paul insightfully points out below, the example I gave can be addressed by swapping x and y in qplot() and avoiding coord_flip(). However, this does not work for all geoms. For example, if I want a horizontal faceted bar plot with free horizontal axes I could run:
c <- ggplot(diamonds, aes(clarity, fill=cut)) + geom_bar()
c + facet_wrap(~cut, scales = "free_y") + coord_flip()
These facets have variable horizontal axes but repeated vertical axis tick marks instead of repeated horizontal axis tick marks. I do not think Paul's trick will work here because, unlike scatter plots, bar plots are not rotationally symmetric.
I would be very interested to hear any partial or complete solutions.
Using coord_flip in conjunction with facet_wrap is the problem. First you define a certain axis to be free (the x axis) and then you swap the axes, making the y axis free. Right now this is not handled well in ggplot2.
In your first example, I would recommend not using coord_flip, but just swapping the variables around in your call to qplot, and using free_x:
p <- qplot(hwy, displ, data = mpg)
p + facet_wrap(~ cyl, scales = "free_x")
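The same idea extends to the bar-plot example from the question if ggplot2 3.3.0 or later is available, since from that version geom_bar() draws horizontal bars when the discrete variable is mapped to y. A sketch under that version assumption (p_horizontal is an illustrative name):

```r
library(ggplot2)

# Discrete variable on y gives horizontal bars, so coord_flip() is not
# needed and "free_x" frees the horizontal (count) axis per facet
p_horizontal <- ggplot(diamonds, aes(y = clarity, fill = cut)) +
  geom_bar() +
  facet_wrap(~cut, scales = "free_x")
p_horizontal
```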
This is the second or third time I have run into this problem myself. I have found that I can hack my own solution by defining a custom geom.
geom_bar_horz <- function (mapping = NULL, data = NULL, stat = "bin", position = "stack", ...) {
GeomBar_horz$new(mapping = mapping, data = data, stat = stat, position = position, ...)
}
GeomBar_horz <- proto(ggplot2:::Geom, {
objname <- "bar_horz"
default_stat <- function(.) StatBin
default_pos <- function(.) PositionStack
default_aes <- function(.) aes(colour=NA, fill="grey20", size=0.5, linetype=1, weight = 1, alpha = NA)
required_aes <- c("y")
reparameterise <- function(., df, params) {
df$width <- df$width %||%
params$width %||% (resolution(df$x, FALSE) * 0.9)
OUT <- transform(df,
xmin = pmin(x, 0), xmax = pmax(x, 0),
ymin = y - .45, ymax = y + .45, width = NULL
)
return(OUT)
}
draw_groups <- function(., data, scales, coordinates, ...) {
GeomRect$draw_groups(data, scales, coordinates, ...)
}
guide_geom <- function(.) "polygon"
})
This is just copying the geom_bar code from the ggplot2 GitHub repository and then switching the x and y references to make a horizontal barplot in standard Cartesian coordinates.
Note that you must use position='identity' and possibly also stat='identity' for this to work. If you need to use a position other than identity, then you will have to edit the collide function for it to work properly.
I've just been trying to do a horizontal barplot and ran into this problem, where I wanted to use scales = "free_x". In the end, it seemed easier to create the conventional (vertical) barplot and rotate the text so that, if you tip your head to the left, it looks like the plot that you want. Then, once your plot is completed, rotate the PDF/image output(!)
ggplot(data, aes(x, y)) +
geom_bar(stat = "identity") +
facet_grid(var ~ group, scale = "free", space = "free_x", switch = "both") +
theme(axis.text.y = element_text(angle=90), axis.text.x = element_text(angle = 90),
strip.text.x = element_text(angle = 180))
The keys to doing this are switch = "both", which moves the facet labels to the other axis, and element_text(angle = 90), which rotates the axis labels and text.