Different fills for different facet_grids using geom_bar - r

I'm new to ggplot2 and have a question I couldn't find the answer to.
I've created the following toy data to help in the explanation:
data <- data.frame(tech=c(rep(letters[1:15],2)),
sep=c(rep(c("SitutationA", "SitutationB"),each=15)),
error=c(runif(15,min=-0.2, max=0.5), runif(15, min=0.3, max=1)))
I want to plot a geom_bar graph showing the "error" (y axis) for each technique "tech" (x axis), split into two situations (SituationA and SituationB) using facet_grid. The color (fill) of each bar should represent the "error" of each technique, not the technique itself (as a factor). The errors for situations A and B are measured on different scales. However, with my code, errors of the same value get the same color in both situations. I do not want this behavior, since they were measured on different scales. I would therefore like the colors in situations A and B to be independent.
The following code plots the graph, but using the same color for both situations.
ggplot(data, aes(x=tech, y=error)) +
geom_bar(aes(fill=error), stat="identity", position="dodge") +
facet_grid(sep ~ ., scales="free_y") +
scale_fill_continuous(guide=FALSE)
How could I use different continuous fills for each facet (situationA and situationB)?
Thank you.

You can't have two different fill scales on the same plot.
One solution is to make two plots and then put them together with grid.arrange() from the gridExtra package.
The first plot contains only the values of SitutationA. I set the y scale breaks so the labels show two digits after the decimal point (to match the second plot), removed the x axis title, text, and ticks, and changed the plot margins: the bottom margin is set to -0.4 to reduce the space between the plots.
library(ggplot2)
library(grid)
library(gridExtra)
p1<-ggplot(subset(data,sep=="SitutationA"), aes(x=tech, y=error)) +
geom_bar(aes(fill=error), stat="identity", position="dodge") +
facet_grid(sep ~ ., scales="free_y") +
scale_fill_continuous(guide=FALSE)+
scale_y_continuous(breaks=c(0,0.25,0.50))+
theme(axis.text.x=element_blank(),
axis.title.x=element_blank(),
axis.ticks.x=element_blank(),
plot.margin=unit(c(1,1,-0.4,1),"lines"))
For the second plot (SitutationB), I changed the top plot margin to -0.4 to reduce the space between the plots, and gave scale_fill_continuous() new colors.
p2<-ggplot(subset(data,sep=="SitutationB"), aes(x=tech, y=error)) +
geom_bar(aes(fill=error), stat="identity", position="dodge") +
facet_grid(sep ~ ., scales="free_y") +
scale_fill_continuous(guide=FALSE,low="red",high="purple") +
theme(plot.margin=unit(c(-0.4,1,1,1),"lines"))
Now put both plots together.
grid.arrange(p1,p2)
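As an alternative to two separate plots, here is a minimal sketch of a different technique: rescale the error to [0, 1] within each situation and map fill to the rescaled value. This still uses a single fill scale (one palette for both facets), but the darkest and lightest colors are then reached independently in each facet. The column name error_scaled is an assumption introduced for illustration, and the situation labels are spelled as in the original code.

```r
library(ggplot2)

set.seed(1)
# Toy data shaped like the question's (labels spelled as in the original code)
data <- data.frame(
  tech  = rep(letters[1:15], 2),
  sep   = rep(c("SitutationA", "SitutationB"), each = 15),
  error = c(runif(15, min = -0.2, max = 0.5), runif(15, min = 0.3, max = 1))
)

# Rescale error to [0, 1] separately within each situation
# (error_scaled is a hypothetical helper column, not part of the original data)
data$error_scaled <- ave(
  data$error, data$sep,
  FUN = function(v) (v - min(v)) / (max(v) - min(v))
)

# Bar height is still the raw error; only the fill uses the per-facet rescaling
p <- ggplot(data, aes(x = tech, y = error, fill = error_scaled)) +
  geom_col() +
  facet_grid(sep ~ ., scales = "free_y") +
  scale_fill_continuous(guide = "none")
print(p)
```

If the two facets really need different palettes (as in the red-to-purple example above), the two-plot approach with grid.arrange() remains the way to go.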

Related

Redistribute columns along x axis using ggplot2

Using this code:
ggplot(total_reads, aes(x=Week, y=Reads)) +
geom_bar(position = "dodge", stat = "identity") +
scale_y_log10(breaks=breaks, minor_breaks=minor_breaks) +
scale_x_continuous() +
facet_grid(~PEDIS, scales="free_x", space = "free_x") +
theme_classic() +
ylab("Total Bacterial Reads")
I produced this graph:
How do I remove the empty spaces in the first facet (pedis1) and make sure only the relevant labels are on the x axis (i.e. 0, 3, 6, 12, 13)?
The quick answer is that your x axis values (total_reads$Week) are numeric. This automatically makes the scale continuous, so the bars are spaced according to their distance on the scale (as on any numeric scale). If you want the bars right next to one another with no white space, you'll need to make the x axis a discrete variable when plotting. It's easiest to do this by mapping factor(Week) right in the aes() declaration.
Here's an example with that modification as well as some other suggestions described below:
total_reads <- data.frame(
Week=c(0,3,6,12,13),
Reads=c(100,110,100,129,135),
PEDIS=c(rep('PEDIS1', 3), rep('PEDIS2',2))
)
ggplot(total_reads, aes(x=factor(Week), y=Reads)) +
geom_col() +
facet_grid(~PEDIS, scales="free_x", space="free_x") +
theme_classic()
A few other notes on what you see changed here:
Use geom_col(), not geom_bar(). If you check out the documentation associated with the geom_bar() function, you can see it mentions that geom_bar() is used for showing counts of observations along a single axis, whereas if you want to show value, you should use geom_col(). You get the same effect with geom_col() as if you use geom_bar(stat="identity").
Remove scale_x_continuous(). Not sure why you have this there anyway; if your column Week is numeric, ggplot would default to this scale. If you do use the scale, you ask ggplot to force a continuous scale, which is apparently not what you want here.

ggplot boxplot - length of whiskers with logarithmic axis

I'm trying to create a horizontal boxplot with a logarithmic axis using ggplot2, but the lengths of the whiskers are wrong.
A minimal reproducible example:
Some data
library(ggplot2)
library(reshape2)
set.seed(1234)
my.df <- data.frame(a = rnorm(1000,150,50), b = rnorm(1000,500,150))
my.df$a[which(my.df$a < 5)] <- 5
my.df$b[which(my.df$b < 5)] <- 5
If I plot this using base R boxplot(), everything is fine
boxplot(my.df, log="x", horizontal=TRUE)
But with ggplot,
my.df.long <- melt(my.df, value.name = "vals")
ggplot(my.df.long, aes(x=variable, y=vals)) +
geom_boxplot() +
scale_y_log10(breaks=c(5,10,20,50,100,200,500,1000), limits=c(5,1000)) +
theme_bw() + coord_flip()
I get this plot, in which the whiskers are the wrong length (see for example how there are many additional outliers below the whiskers and none above).
Note that, without log axes, ggplot has the whiskers the correct length
ggplot(my.df.long, aes(x=variable, y=vals)) +
geom_boxplot() +
theme_bw() + coord_flip()
How do I produce a horizontal logarithmic boxplot using ggplot with the correct length whiskers? Preferably with the whiskers extending to 1.5 times the IQR.
N.B. As explained here, it is possible to use coord_trans(y = "log10") instead of scale_y_log10, which causes the stats to be calculated before the data are transformed. However, coord_trans cannot be used in combination with coord_flip, so this does not solve the issue of creating horizontal boxplots with a log axis.
You can have ggplot use boxplot.stats (the same function used by base boxplot) to set the y-values for the box-and-whiskers and the outliers. For example:
# Function to use boxplot.stats to set the box-and-whisker locations
mybxp = function(x) {
bxp = boxplot.stats(x)[["stats"]]
names(bxp) = c("ymin","lower", "middle","upper","ymax")
return(bxp)
}
# Function to use boxplot.stats for the outliers
myout = function(x) {
data.frame(y=boxplot.stats(x)[["out"]])
}
Now we use those functions in stat_summary to draw the boxplot, as in the example below:
ggplot(my.df.long, aes(x=variable, y=vals)) +
stat_summary(fun.data=mybxp, geom="boxplot") +
stat_summary(fun.data=myout, geom="point") +
theme_bw() + coord_flip()
Now for the log transformation issue: The plots below show, respectively, no coordinate transformation, scale_y_log10, and coord_trans(y="log10"). In addition, I've used geom_hline to add dotted lines at each of the box-and-whisker values and I've added text to show the actual values. To reduce clutter, I've removed the outlier points, and I've faded out the boxplots a bit so that the other components will show up better.
# Set up common plot elements
p = ggplot(my.df.long, aes(x=variable, y=vals)) +
geom_hline(yintercept=mybxp(my.df$a), colour="red", lty="11", size=0.3) +
geom_hline(yintercept=mybxp(my.df$b), colour="blue", lty="11", size=0.3) +
stat_summary(fun.data=mybxp, geom="boxplot", colour="#000000A0", fatten=0.5) +
#stat_summary(fun.data=myout, geom="point") +
theme_bw() + coord_flip()
br = c(5,10,20,50,100,200,500,1000)
## Create plots
# Without log transformation
p1 = p + scale_y_continuous(breaks=br, limits=c(5,1000)) +
stat_summary(fun.y=mybxp, aes(label=round(..y..)), geom="text", size=3, colour="red") +
ggtitle("No Transformation")
# With scale_y_log10
p2 = p + scale_y_log10(breaks=br, limits=c(5,1000)) + ggtitle("scale_y_log10") +
stat_summary(fun.y=mybxp, aes(label=round(..y..,2)), geom="text", size=3, colour="red") +
stat_summary(fun.y=mybxp, aes(label=round(10^(..y..))), geom="text", size=3,
colour="blue", position=position_nudge(x=0.3))
# With coord_trans
p3 = p + scale_y_continuous(breaks=br, limits=c(5,1000)) +
stat_summary(fun.y=mybxp, aes(label=round(..y..)), geom="text", size=3, colour="red") +
coord_trans(y="log10") + ggtitle("coord_trans(y='log 10')")
The three plots are shown below. Note that the last plot, using coord_trans is not flipped, because coord_trans overrides coord_flip. You can probably use something like the code in this SO answer to flip the plot, but I haven't done that here.
The first plot, with no transformations, shows the correct values.
The third plot, using coord_trans also has everything in the correct locations. Note that coord_trans is actually changing the y-coordinate system of the plot without changing the values of the plotted points. It's the space itself that's been "distorted" to a log scale.
Now, note that in the second plot, using scale_y_log10, the boxes are in the correct locations but the ends of the whiskers are in the wrong locations. On the other hand, comparison with the other two plots shows that the locations of all the geom_hlines are correct. Also note that, unlike coord_trans, scale_y_log10 takes the log of the points themselves and just relabels the y-axis breaks with the unlogged values, while leaving the "space" in which the points are plotted unchanged. You can see this by looking at the values in red text. The values in blue text are the unlogged values.
See @dww's answer for an explanation of why scale_y_log10 results only in the whisker ends being transformed incorrectly, while the box values are plotted in the right place.
The problem is that scale_y_log10 transforms the data before calculating the stats. This does not matter for the median and percentile points, because e.g. 10^log10(median) is still the median value, which will be plotted in the correct location. But it does matter for the whiskers, which are calculated using 1.5 * IQR, because 10^(1.5 * IQR(log10(x))) is not equal to 1.5 * IQR(x). So the calculation fails for the whiskers.
This error becomes evident if we compare
boxplot.stats(my.df$b)$stats
# [1] 117.4978 407.3983 502.0460 601.2937 873.0992
10^boxplot.stats(log10(my.df$b))$stats
# [1] 231.1603 407.3983 502.0459 601.2935 975.1906
In which we see that the median and percentile points are identical, but the whisker ends (first and last elements of the stats vector) differ.
This detailed and useful answer by @eipi10 shows how to calculate the stats yourself and force ggplot to use these user-defined stats rather than its internal (and incorrect) algorithm. Using this approach, it becomes relatively simple to calculate the correct statistics and use these instead.
# Function to use boxplot.stats to set the box-and-whisker locations
mybxp = function(x) {
bxp = log10(boxplot.stats(10^x)[["stats"]])
names(bxp) = c("ymin","lower", "middle","upper","ymax")
return(bxp)
}
# Function to use boxplot.stats for the outliers
myout = function(x) {
data.frame(y=log10(boxplot.stats(10^x)[["out"]]))
}
ggplot(my.df.long, aes(x=variable, y=vals)) + theme_bw() + coord_flip() +
scale_y_log10(breaks=c(5,10,20,50,100,200,500,1000), limits=c(5,1000)) +
stat_summary(fun.data=mybxp, geom="boxplot") +
stat_summary(fun.data=myout, geom="point")
Which produces the correct plot
A note on using coord_trans as an alternative approach:
Using coord_trans(y = "log10") instead of scale_y_log10 causes the stats to be calculated (correctly) on the untransformed data. However, coord_trans cannot be used in combination with coord_flip, so this does not solve the issue of creating horizontal boxplots with a log axis. The suggestion here to use ggdraw(switch_axis_position()) from the cowplot package to flip the axes after using coord_trans did not work; it throws an error (cowplot v0.4.0 with ggplot2 v2.1.0):
Error in Ops.unit(gyl$x, grid::unit(0.5, "npc")) : both operands
must be units
In addition: Warning message: axis.ticks.margin is
deprecated. Please set margin property of axis.text instead
I think the easiest answer, if you don't need to make the boxplots horizontal, is to transform the coordinate system instead of changing the scale, using coord_trans(y = "log10") instead of scale_y_log10().
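A minimal sketch of the coord_trans route follows. It assumes ggplot2 3.3.0 or later, where geom_boxplot() can be drawn horizontally by mapping the numeric variable to x and the grouping variable to y, so coord_flip() is not needed; on the older versions discussed above this mapping is not available. The data are regenerated here so the block is self-contained, and the limits are left out so no values are dropped.

```r
library(ggplot2)

set.seed(1234)
# Same shape as my.df.long above: two groups, values floored at 5
my.df.long <- data.frame(
  variable = rep(c("a", "b"), each = 1000),
  vals     = pmax(c(rnorm(1000, 150, 50), rnorm(1000, 500, 150)), 5)
)

# Stats (including the 1.5 * IQR whiskers) are computed on the raw values;
# coord_trans() only distorts the drawing space to a log scale afterwards.
p <- ggplot(my.df.long, aes(x = vals, y = variable)) +
  geom_boxplot() +
  scale_x_continuous(breaks = c(5, 10, 20, 50, 100, 200, 500, 1000)) +
  coord_trans(x = "log10") +
  theme_bw()
print(p)
```

Because the statistics are computed before the coordinate transformation, the whisker ends should agree with those from the untransformed plot.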

Can the minimum y-value be adjusted when using scales = "free" in ggplot?

Using the following data set:
day <- gl(8,1,48,labels=c("Mon","Tues","Wed","Thurs","Fri","Sat","Sun","Avg"))
day <- factor(day, level=c("Mon","Tues","Wed","Thurs","Fri","Sat","Sun","Avg"))
month<-gl(3,8,48,labels=c("Jan","Mar","Apr"))
month<-factor(month,level=c("Jan","Mar","Apr"))
snow<-gl(2,24,48,labels=c("Y","N"))
snow<-factor(snow,levels=c("Y","N"))
count <- c(.94,.95,.96,.98,.93,.94,.99,.9557143,.82,.84,.83,.86,.91,.89,.93,.8685714,1.07,.99,.86,1.03,.81,.92,.88,.9371429,.94,.95,.96,.98,.93,.94,.99,.9557143,.82,.84,.83,.86,.91,.89,.93,.8685714,1.07,.99,.86,1.03,.81,.92,.88,.9371429)
d <- data.frame(day=day,count=count,month=month,snow=snow)
I like the y-scale in this graph, but not the bars:
ggplot()+
geom_line(data=d[d$day!="Avg",],aes(x=day, y=count, group=month, colour=month))+
geom_bar(data=d[d$day=="Avg",],aes(x=day, y=count, fill=month),position="dodge", group=month)+
scale_x_discrete(limits=levels(d$day))+
facet_wrap(~snow,ncol=1,scales="free")+
scale_y_continuous(labels = percent_format())
I like the points, but not the scale:
ggplot(data=d[d$day=="Avg",],aes(x=day, y=count, fill=month,group=month,label=month),show_guide=F)+
facet_wrap(~snow,ncol=1,scales="free")+
geom_line(data=d[d$day!="Avg",],aes(x=day, y=count, group=month, colour=month), show_guide=F)+
scale_x_discrete(limits=levels(d$day))+
scale_y_continuous(labels = percent_format())+
geom_point(aes(colour = month),size = 4,position=position_dodge(width=1.2))
How to combine the desirable qualities in the above graphs?
Essentially, I'm asking: How can I graph the points with a varied y-max while setting the y-min to zero?
Note: The solution that I'm aiming to find will apply to about 27 graphs built from one dataframe. So I'll vote up those solutions that avoid alterations to individual graphs. I'm hoping for a solution that applies to all the facet wrapped graphs.
Minor Questions (possibly for a separate post):
- How can I add a legend to each of the facet wrapped graphs?
- How can I change the title of the legend to read "Weekly Average"?
- How can the shape/color of the lines/points be varied and then reported in one single legend?
There's expand_limits(y=0), which essentially adds a dummy layer with an invisible geom_blank only to stretch the scales.
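A minimal sketch of how this could look with free scales. The data frame below is made up for illustration (two panels with very different maxima) and is not the question's data set.

```r
library(ggplot2)

# Made-up data: two panels with very different maxima
d <- data.frame(
  panel = rep(c("low", "high"), each = 4),
  day   = rep(c("Mon", "Tues", "Wed", "Thurs"), 2),
  count = c(0.82, 0.84, 0.83, 0.86, 4.1, 3.9, 4.4, 4.0)
)

p <- ggplot(d, aes(x = day, y = count)) +
  geom_point(size = 4) +
  facet_wrap(~panel, ncol = 1, scales = "free") +
  expand_limits(y = 0)  # adds a blank layer at y = 0 to every panel
print(p)
```

Each panel keeps its own y maximum, while the y range of every panel is stretched to include zero. Since it is a single added component, it applies to all facets at once rather than to individual graphs.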

How can I change the colors in a ggplot2 density plot?

Summary: I want to choose the colors for a ggplot2() density distribution plot without losing the automatically generated legend.
Details: I have a dataframe created with the following code (I realize it is not elegant but I am only learning R):
cands<-scan("human.i.cands.degnums")
non<-scan("human.i.non.degnums")
df<-data.frame(grp=factor(c(rep("1. Candidates", each=length(cands)),
rep("2. NonCands",each=length(non)))), val=c(cands,non))
I then plot their density distribution like so:
library(ggplot2)
ggplot(df, aes(x=val,color=grp)) + geom_density()
This produces the following output:
I would like to choose the colors the lines appear in and cannot for the life of me figure out how. I have read various other posts on the site but to no avail. The most relevant are:
Changing color of density plots in ggplot2
Overlapped density plots in ggplot2
After searching around for a while I have tried:
## This one gives an error
ggplot(df, aes(x=val,colour=c("red","blue"))) + geom_density()
Error: Aesthetics must either be length one, or the same length as the data
Problems: c("red", "blue")
## This one produces a single, black line
ggplot(df, aes(x=val),colour=c("red","green")) + geom_density()
The best I've come up with is this:
ggplot() + geom_density(aes(x=cands),colour="blue") + geom_density(aes(x=non),colour="red")
As you can see in the image above, that last command correctly changes the colors of the lines but it removes the legend. I like ggplot2's legend system. It is nice and simple, I don't want to have to fiddle about with recreating something that ggplot is clearly capable of doing. On top of which, the syntax is very very ugly. My actual data frame consists of 7 different groups of data. I cannot believe that writing + geom_density(aes(x=FOO),colour="BAR") 7 times is the most elegant way of coding this.
So, if all else fails, I will accept an answer that tells me how to get the legend back onto the second plot. However, if someone can tell me how to do it properly I will be very happy.
set.seed(45)
df <- data.frame(x=c(rnorm(100), rnorm(100, mean=2, sd=2)), grp=rep(1:2, each=100))
ggplot(data = df, aes(x=x, color=factor(grp))) + geom_density() +
scale_color_brewer(palette = "Set1")
ggplot(data = df, aes(x=x, color=factor(grp))) + geom_density() +
scale_color_brewer(palette = "Set3")
Both give me the same plot with different sets of colors.
Provide a vector of colours to the "values" argument to map discrete values to manually chosen visual ones:
ggplot(df, aes(x=val,color=grp)) +
geom_density() +
scale_color_manual(values=c("red", "blue"))
To choose any colour you wish, enter the hex code for it instead:
ggplot(df, aes(x=val,color=grp)) +
geom_density() +
scale_color_manual(values=c("#f5d142", "#2bd63f")) # yellow/green
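With several groups (the question mentions seven), it can help to name the colour vector so each colour is tied to a factor level rather than to its position. A minimal sketch, using made-up values in place of the scanned files and an assumed legend title "Group":

```r
library(ggplot2)

set.seed(45)
# Made-up values standing in for the files scanned in the question
df <- data.frame(
  grp = factor(rep(c("1. Candidates", "2. NonCands"), each = 100)),
  val = c(rnorm(100), rnorm(100, mean = 2, sd = 2))
)

p <- ggplot(df, aes(x = val, color = grp)) +
  geom_density() +
  scale_color_manual(
    name   = "Group",  # legend title (an assumption for illustration)
    values = c("1. Candidates" = "red", "2. NonCands" = "blue")
  )
print(p)
```

The legend is kept because colour is still mapped inside aes(); only the palette changes.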

transform y axis to percents ggplot

I have used a stacked bar chart (with coord_flip) to try to compare distributions (this is one of several techniques I'm playing with) for a control and treatment group for pre- and post-test. Here is the plot:
and here is the code (Sorry it's not reproducible with no data set. If this is a problem I'll make up a reproducible data set as I can't share the real data):
m4 <- ggplot(data=v, aes(x=trt, fill=value))
m5 <- m4 + geom_bar() + coord_flip() +
facet_grid(time~type) + scale_fill_grey()
How can I change the y axis (which is actually on the bottom due to coord_flip) to percents so every bar is equal in length? That is, I want counts to become percents. I need some sort of transformation that I'm betting ggplot has or that could easily be created and applied somehow.
You probably just want position_fill, by setting,
+ geom_bar(position = "fill")
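A minimal sketch of the full call, using a made-up stand-in for the question's data frame v (the real data were not shared). The percent labels via the scales package are an addition beyond the answer above; position = "fill" alone already makes every bar the same length.

```r
library(ggplot2)

# Made-up stand-in for the question's `v`: 6 control rows, 10 treatment rows
v <- data.frame(
  trt   = rep(c("control", "treatment"), times = c(6, 10)),
  value = c("a", "a", "b", "c", "c", "c",
            "a", "b", "b", "b", "b", "c", "c", "c", "c", "c")
)

p <- ggplot(v, aes(x = trt, fill = value)) +
  geom_bar(position = "fill") +                  # stack, then scale each bar to 1
  scale_y_continuous(labels = scales::percent) + # label 0..1 as 0%..100%
  coord_flip() +
  scale_fill_grey()
print(p)
```

Adding facet_grid(time ~ type) as in the question should work the same way, since the fill position is applied within each panel.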

Resources