Transform y axis in bar plot using scale_y_log10() - r

Using the data.frame below, I want a bar plot with a log-transformed y axis.
I got this plot
using this code
ggplot(df, aes(x=id, y=ymean , fill=var, group=var)) +
geom_bar(position="dodge", stat="identity",
width = 0.7,
size=.9)+
geom_errorbar(aes(ymin=ymin,ymax=ymax),
size=.25,
width=.07,
position=position_dodge(.7))+
theme_bw()
To log-transform the y axis so that the "low" level in B and D, which is close to zero, becomes visible, I added
+scale_y_log10()
which resulted in
Any suggestions how to transform y axis of the first plot?
By the way, some values in my data are close to zero, but none of them is exactly zero.
UPDATE
Trying the answer suggested by @computermacgyver:
ggplot(df, aes(x=id, y=ymean , fill=var, group=var)) +
geom_bar(position="dodge", stat="identity",
width = 0.7,
size=.9)+
scale_y_log10("y",
breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x)))+
geom_errorbar(aes(ymin=ymin,ymax=ymax),
size=.25,
width=.07,
position=position_dodge(.7))+
theme_bw()
I got
DATA
dput(df)
structure(list(id = structure(c(7L, 7L, 7L, 1L, 1L, 1L, 2L, 2L,
2L, 6L, 6L, 6L, 5L, 5L, 5L, 3L, 3L, 3L, 4L, 4L, 4L), .Label = c("A",
"B", "C", "D", "E", "F", "G"), class = "factor"), var = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L), .Label = c("high", "medium", "low"), class = "factor"),
ymin = c(0.189863418, 0.19131948, 0.117720496, 0.255852069,
0.139624146, 0.048182771, 0.056593774, 0.037262727, 0.001156667,
0.024461299, 0.026203592, 0.031913077, 0.040168571, 0.035235902,
0.019156667, 0.04172913, 0.03591233, 0.026405094, 0.019256055,
0.011310755, 0.000412414), ymax = c(0.268973856, 0.219709677,
0.158936508, 0.343307692, 0.205225352, 0.068857143, 0.06059596,
0.047296296, 0.002559633, 0.032446541, 0.029476821, 0.0394,
0.048959184, 0.046833333, 0.047666667, 0.044269231, 0.051,
0.029181818, 0.03052381, 0.026892857, 0.001511628), ymean = c(0.231733739333333,
0.204891473333333, 0.140787890333333, 0.295301559666667,
0.173604191666667, 0.057967681, 0.058076578, 0.043017856,
0.00141152033333333, 0.0274970166666667, 0.0273799226666667,
0.0357511486666667, 0.0442377366666667, 0.0409452846666667,
0.0298284603333333, 0.042549019, 0.0407020586666667, 0.0272998796666667,
0.023900407, 0.016336106, 0.000488014)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -21L), .Names = c("id",
"var", "ymin", "ymax", "ymean"))

As @Miff has written, bars are generally not useful on a log scale. With bar plots, we compare the heights of the bars to one another. To do this, we need a fixed point from which to compare, usually 0, but log(0) is negative infinity.
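The point about log(0) can be checked directly in base R. This is only a minimal sketch; the two sample values are taken from the ymean column of the question's data.

```r
# On a log10 scale, zero has no finite position, so a bar has no natural base.
log10(0)                                  # -Inf

# Values close to zero are still finite on the log scale.
y <- c(0.231733739333333, 0.000488014)    # two ymean values from the question
log10(y)
```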
So I would strongly suggest that you consider using geom_point() instead of geom_bar(), i.e.:
library(ggplot2)
library(scales)  # provides trans_breaks(), trans_format() and math_format()
ggplot(df, aes(x=id, y=ymean , color=var)) +
geom_point(position=position_dodge(.7))+
scale_y_log10("y",
breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x)))+
geom_errorbar(aes(ymin=ymin,ymax=ymax),
size=.25,
width=.07,
position=position_dodge(.7))+
theme_bw()
If you really, really want bars, then you should use geom_rect instead of geom_bar and set your own baseline. That is, the baseline for geom_bar is zero but you will have to invent a new baseline in a log scale. Your Plot 1 seems to use 10^-7.
This can be accomplished with the following, but again, I consider this a really bad idea.
# ymin = 1e-7 is the invented baseline, i.e. the 10^-7 mentioned above
ggplot(df, aes(xmin=as.numeric(id)-.4,xmax=as.numeric(id)+.4, x=id, ymin=1e-7, ymax=ymean, fill=var)) +
geom_rect(position=position_dodge(.8))+
scale_y_log10("y",
breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x)))+
geom_errorbar(aes(ymin=ymin,ymax=ymax),
size=.25,
width=.07,
position=position_dodge(.8))+
theme_bw()
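To illustrate why an invented baseline is problematic, here is a small sketch in base R. It assumes that the drawn length of a bar on a log10 axis is the distance, in decades, between the value and the baseline; the helper bar_length() and the second baseline of 1e-4 are hypothetical and only serve the illustration.

```r
# Two ymean values from the question's data (G/high and D/low).
y <- c(high = 0.231733739333333, low = 0.000488014)

# Length of a bar on a log10 axis, measured in decades above the chosen baseline.
bar_length <- function(y, baseline) log10(y) - log10(baseline)

len_a <- bar_length(y, baseline = 1e-7)  # baseline used in the geom_rect example
len_b <- bar_length(y, baseline = 1e-4)  # another, equally arbitrary baseline

# Ratio of the "high" bar length to the "low" bar length under each baseline.
ratio_a <- len_a[["high"]] / len_a[["low"]]
ratio_b <- len_b[["high"]] / len_b[["low"]]
c(`1e-7` = ratio_a, `1e-4` = ratio_b)
```

The two ratios differ although the data are identical, which is why the relative bar heights in such a plot carry no fixed meaning.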

If you need the bars flipped, you could compute -log10(y) yourself; see this example:
library(ggplot2)
library(dplyr)
# make your own log10
dfPlot <- df %>%
mutate(ymin = -log10(ymin),
ymax = -log10(ymax),
ymean = -log10(ymean))
# then plot
ggplot(dfPlot, aes(x = id, y = ymean, fill = var, group = var)) +
geom_bar(position = "dodge", stat = "identity",
width = 0.7,
size = 0.9)+
geom_errorbar(aes(ymin = ymin, ymax = ymax),
size = 0.25,
width = 0.07,
position = position_dodge(0.7)) +
scale_y_continuous(name = expression(-log[10](italic(ymean)))) +
theme_bw()
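One detail of the transformation above that may be worth checking: -log10() is a decreasing function, so the transformed ymin ends up larger than the transformed ymax. The sketch below only demonstrates the ordering with the first row of the question's data; whether the swap matters for the drawn error bars is not stated in the answer, and the names lower and upper are hypothetical.

```r
# First row of the question's data (G / high).
ymin <- 0.189863418
ymax <- 0.268973856

# -log10() is decreasing, so the lower bound becomes the larger number.
neg_log_ymin <- -log10(ymin)
neg_log_ymax <- -log10(ymax)
neg_log_ymin > neg_log_ymax   # TRUE

# Swapping restores lower <= upper on the transformed scale.
lower <- pmin(neg_log_ymin, neg_log_ymax)
upper <- pmax(neg_log_ymin, neg_log_ymax)
```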

Firstly, don't do it! The help file from ?geom_bar says:
A bar chart uses height to represent a value, and so the base of the
bar must always be shown to produce a valid visual comparison. Naomi
Robbins has a nice article on this topic. This is why it doesn't make
sense to use a log-scaled y axis with a bar chart.
To give a concrete example, the following is one way of producing the graph you want. Note that a larger k would be equally valid but would produce a visually different plot.
k<- 10000
ggplot(df, aes(x=id, y=ymean*k , fill=var, group=var)) +
geom_bar(position="dodge", stat="identity",
width = 0.7,
size=.9)+
geom_errorbar(aes(ymin=ymin*k,ymax=ymax*k),
size=.25,
width=.07,
position=position_dodge(.7))+
theme_bw() + scale_y_log10(labels=function(x)x/k)
[Plots: output for k = 1e4 and for k = 1e6]
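One way to read the role of k is sketched below. It assumes that ggplot2 applies the log10 transformation first and then draws each bar from 0 in transformed space, i.e. from 1 in the scaled data; under that assumption, multiplying by k is the same as choosing a baseline of 1/k on the original scale.

```r
y <- 0.000488014            # smallest ymean in the question's data
k <- c(1e4, 1e6)

# Bar height in decades when the bar starts at log10(1) = 0 in transformed space.
height <- log10(y * k)

# The same heights, expressed as distance from a baseline of 1/k on the original scale.
height_from_baseline <- log10(y) - log10(1 / k)

all.equal(height, height_from_baseline)   # TRUE
height
```

The same value therefore gets a bar two decades taller with k = 1e6 than with k = 1e4, which matches the remark that both choices are valid but look different.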

Related

Faceted ggplot boxplot with different X axes by column [duplicate]

I am trying to make a faceted plot in ggplot2 where the y axis shows labels and the x axis should show line graphs with the value for each label in two different measures (which are on different scales). So far I have this:
Data <- structure(list(label = structure(
c(1L, 1L, 2L, 2L, 3L, 3L, 4L,
4L, 5L, 5L, 6L, 6L), .Label = c("A", "B", "C", "D", "E", "F"), class = "factor"),
facet = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L), .Label = c("A", "B"), class = "factor"), value = c(0.0108889081049711,
0.37984336540103, 0.0232500876998529, 0.777756493305787,
0.0552913920022547, 0.920194681268185, 0.0370863009011373,
0.114463779143989, 0.00536034172400832, 0.469208759721369,
0.0412159096915275, 0.587875489378348), group = c(1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1)), .Names = c("label", "facet",
"value", "group"), row.names = c(NA, -12L), class = "data.frame")
ggplot(Data, aes(x = label, y = value, group = group)) + geom_line() +
facet_grid(~ facet, scales = "free") + coord_flip()
Which creates the following plot:
The problem is that the measures are on different scales and I would prefer the A plot to have x limits from 0 to 0.1 and the B plot to have x limits from 0 to 1. I thought scales = "free" should fix this but it doesn't change the plot.
I came up with something similar to df239:
ggplot(Data, aes(y = label, x = value, group=group)) + geom_path() +
facet_wrap( ~ facet, scales = "free")
Note you have to use geom_path, and take care with the ordering of your points because just switching x and y is not the same as coord_flip (which as noted in the other answer isn't supported with facet_wrap).
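The ordering caveat can be handled by sorting the rows before plotting, since geom_path connects observations in the order in which they appear in the data. The data frame d below is a hypothetical three-row example, not the question's Data.

```r
# Hypothetical data: rows deliberately out of order within one facet.
d <- data.frame(
  label = factor(c("C", "A", "B"), levels = c("A", "B", "C")),
  facet = factor(c("A", "A", "A")),
  value = c(0.055, 0.011, 0.023)
)

# Sort by facet and then by label so the path runs A, B, C within each facet.
d_sorted <- d[order(d$facet, d$label), ]
as.character(d_sorted$label)
```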
Change the axis orientation manually. The problem is that *ggplot2 does not currently support free scales with a non-cartesian coord or coord_flip.*
ggplot(Data, aes(y = label, x = value, group = group)) + geom_line() +
facet_grid(~ facet, scales = "free")

R ggplot2 reducing bar width and spacing between bars

I've been reading posts and searching for an answer to my problem but can't find one. Here's the basic idea. I'm using ggplot to produce a stacked barchart where each bar is broken down by group and the plot is flipped on the horizontal axis. I know how to change the width of the bars using the "width" option, however reducing the bar width leaves a lot of white space between the bars. Question: how do I remove the huge amounts of space between the bars?
I've cobbled together some reproducible code using a previous question & answer that has been tailored to my needs. Any help would be appreciated!
df <- structure(list(A = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L,
3L), .Label = c("0-50,000", "50,001-250,000", "250,001-Over"), class = "factor"),
B = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("0-50,000",
"50,001-250,000", "250,001-Over"), class = "factor"), Freq = c(0.507713884992987,
0.258064516129032, 0.23422159887798, 0.168539325842697, 0.525280898876405,
0.306179775280899, 0.160958904109589, 0.243150684931507,
0.595890410958904)), .Names = c("A", "B", "Freq"), class = "data.frame", row.names = c(NA,
-9L))
library(ggplot2)
bp <- ggplot(data=df, aes(x=A, y=Freq))+
geom_bar(width=0.2,stat="identity",position="fill") +
theme_bw() +
theme(axis.title.y=element_blank()) +
theme(axis.text.y=element_text(size=10)) +
theme(axis.title.x=element_blank()) +
theme(legend.text=element_text(size=10)) +
theme(legend.title=element_text(size=10)) +
scale_y_continuous(labels = percent_format())
bp + geom_bar(colour="white",width=0.2,stat="identity",position="fill",show_guide=FALSE) +
coord_flip() +
theme(panel.grid.minor=element_blank(), panel.grid.major=element_blank()) +
theme(legend.position="bottom")
You could change the aspect ratio of the whole plot using coord_equal and remove the width argument from geom_bar.
library(ggplot2)
library(scales)
ggplot(data=df, aes(x=A, y=Freq)) +
geom_bar(stat="identity",position="fill") +
theme_bw() +
theme(axis.title.y=element_blank()) +
theme(axis.text.y=element_text(size=10)) +
theme(axis.title.x=element_blank()) +
theme(legend.text=element_text(size=10)) +
theme(legend.title=element_text(size=10)) +
scale_y_continuous(labels = percent_format()) +
geom_bar(colour="white",stat="identity",position="fill",show_guide=FALSE) +
theme(panel.grid.minor=element_blank(), panel.grid.major=element_blank()) +
theme(legend.position="bottom") +
coord_equal(1/0.2) # the new command
The drawback of this approach is that it does not work with coord_flip.

Alignment of numbers on the individual bars

I need to place labels above bars in ggplot. I used to use the method found (HERE), but this does not appear to work anymore since my ggplot2 update, as I now get the error message:
Error in continuous_scale(c("y", "ymin", "ymax", "yend", "yintercept", :
unused argument(s) (formatter = "percent")
How can I again plot numeric values above the bars when using the example:
df <- structure(list(A = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L,
3L), .Label = c("0-50,000", "50,001-250,000", "250,001-Over"), class = "factor"),
B = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("0-50,000",
"50,001-250,000", "250,001-Over"), class = "factor"), Freq = c(0.507713884992987,
0.258064516129032, 0.23422159887798, 0.168539325842697, 0.525280898876405,
0.306179775280899, 0.160958904109589, 0.243150684931507,
0.595890410958904)), .Names = c("A", "B", "Freq"), class = "data.frame", row.names = c(NA,
-9L))
library(ggplot2)
ggplot(data=df, aes(x=A, y=Freq))+
geom_bar(aes(fill=B), position = position_dodge()) +
geom_text(aes(label = paste(sprintf("%.1f", Freq*100), "%", sep=""),
y = Freq+0.015, x=A),
size = 3, position = position_dodge(width=0.9)) +
scale_y_continuous(formatter = "percent") +
theme_bw()
Running R 2.15 ggplot2 0.9 on a win 7 machine
The error is from the scale_y_continuous call. Formatting of labels is now handled by the labels argument. See the ggplot2 0.9.0 transition guide for more details.
There was another problem with the labels not lining up correctly; I fixed that by adding group=B to the aesthetics for the geom_text, although I'm not quite sure why this is necessary. I also took out x=A from the geom_text aesthetics because it was not needed (it would be inherited from the ggplot call).
library("ggplot2")
library("scales")
ggplot(data=df, aes(x=A, y=Freq))+
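# stat = "identity" plots Freq as given; newer ggplot2 versions require it when y is mapped
geom_bar(aes(fill=B), stat = "identity", position = position_dodge()) +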
geom_text(aes(label = paste(sprintf("%.1f", Freq*100), "%", sep=""),
y = Freq+0.015, group=B),
size = 3, position = position_dodge(width=0.9)) +
scale_y_continuous(labels = percent) +
theme_bw()
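The label strings passed to geom_text can be checked on their own in base R. The sketch below uses the first three Freq values from the question and compares the paste() construction with a single sprintf() call; the variable names are only for illustration.

```r
freq <- c(0.507713884992987, 0.258064516129032, 0.23422159887798)

# Same label construction as in the geom_text() call above.
labels_paste <- paste(sprintf("%.1f", freq * 100), "%", sep = "")

# Equivalent single sprintf() call; "%%" produces a literal percent sign.
labels_sprintf <- sprintf("%.1f%%", freq * 100)

identical(labels_paste, labels_sprintf)   # TRUE
labels_sprintf                            # "50.8%" "25.8%" "23.4%"
```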
