Is it possible to put space between stacks in a ggplot2 stacked bar?

I took this example from here:
DF <- read.table(text="Rank F1 F2 F3
1 500 250 50
2 400 100 30
3 300 155 100
4 200 90 10", header=TRUE)
library(reshape2)
DF1 <- melt(DF, id.var="Rank")
library(ggplot2)
ggplot(DF1, aes(x = Rank, y = value, fill = variable)) +
geom_bar(stat = "identity")
Is it possible to create a stacked bar such as the following graph using ggplot2? I do not want to differentiate stacks by different colors.
EDIT: Based on Pascal's comments, I tried:
ggplot(DF1, aes(x = Rank, y = value)) +
geom_bar(stat = "identity",lwd=2, color="white")
I still have the white borders for the bars.

This is the closest I could get to your example figure. It is not much of an improvement on what you have already worked out, but on a white background (theme_classic()) the white bar borders stand out less than on the default grey one.
library(ggplot2)
p <- ggplot(DF1, aes(x = Rank, y = value, group = variable))
p <- p + geom_bar(stat = "identity", position = "stack", lwd = 1.5,
width = 0.5, colour = "white", fill = "black")
p <- p + theme_classic()
p <- p + theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
p
That produces:
If you want to keep the grey background, you can use its exact shade as the line colour while removing the background grid lines. In the default theme_grey(), the panel background is "grey92":
p <- ggplot(DF1, aes(x = Rank, y = value))
p <- p + geom_bar(stat = "identity", position = "stack", lwd = 1.5,
width = 0.5, colour = "grey92", fill = "black")
p <- p + theme(panel.grid = element_blank())
p
A drawback of this solution is that very small segments will not be visible (e.g., for Rank = 4, the F3 value of 10 is completely covered by the bar outline).
Your sample data:
DF1 <- structure(list(Rank = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L,
3L, 4L), variable = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L), .Label = c("F1", "F2", "F3"), class = "factor"),
value = c(500L, 400L, 300L, 200L, 250L, 100L, 155L, 90L,
50L, 30L, 100L, 10L)), row.names = c(NA, -12L), .Names = c("Rank",
"variable", "value"), class = "data.frame")
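A possible workaround for the small-segment problem, not taken from the answer above: instead of relying on border lines, draw each segment with geom_rect() and compute its bottom and top yourself, so that a fixed gap separates consecutive segments and even F3 = 10 keeps its full height. This is a sketch under stated assumptions: the gap size (gap <- 10, in data units) and the bar half-width of 0.25 are arbitrary choices, and F1 is placed at the bottom of each bar, which is not necessarily ggplot2's default stacking order.

```r
library(ggplot2)

# Long-format data from the question (same values as DF1 above)
DF1 <- data.frame(
  Rank = rep(1:4, times = 3),
  variable = factor(rep(c("F1", "F2", "F3"), each = 4)),
  value = c(500, 400, 300, 200, 250, 100, 155, 90, 50, 30, 100, 10)
)

gap <- 10  # illustrative gap between segments, in data units

# Sort so that, within each bar, segments are stacked F1, F2, F3 from the bottom
DF1 <- DF1[order(DF1$Rank, DF1$variable), ]

# Top of each segment: running total of (value + gap) within the bar,
# minus the gap that would follow the segment itself
DF1$ymax <- ave(DF1$value + gap, DF1$Rank, FUN = cumsum) - gap
DF1$ymin <- DF1$ymax - DF1$value

# One black rectangle per segment; the gaps show the background through
p <- ggplot(DF1) +
  geom_rect(aes(xmin = Rank - 0.25, xmax = Rank + 0.25,
                ymin = ymin, ymax = ymax),
            fill = "black") +
  labs(x = "Rank", y = "value") +
  theme_classic()
```

Print p to draw it. The gap is a constant in data units, so it has the same size on every bar regardless of bar height.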

Related

How to automatically adjust the width of each facet for facet_wrap?

I want to plot a boxplot using ggplot2. I have more than one facet, and each facet has different terms, as follows:
library(ggplot2)
p <- ggplot(
data=Data,
aes(x=trait,y=mean)
)
p <- p+facet_wrap(~SP,scales="free",nrow=1)
p <- p+geom_boxplot(aes(fill = Ref,
lower = mean - sd,
upper = mean + sd,
middle = mean,
ymin = min,
ymax = max,
width=c(rep(0.8/3,3),rep(0.8,9))),
lwd=0.5,
stat="identity")
As shown, the box width is not the same in different facets. Is there any way to draw all the boxes at the same scale? I tried facet_grid, which automatically adjusts the widths of the facets, but then all facets share the same y axis.
Data
Data <- structure(list(SP = structure(c(3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L), .Label = c("Human", "Cattle", "Horse", "Maize"
), class = "factor"), Ref = structure(c(3L, 2L, 1L, 3L, 3L, 3L,
2L, 2L, 2L, 1L, 1L, 1L), .Label = c("LMM", "Half", "Adoptive"
), class = "factor"), trait = structure(c(11L, 11L, 11L, 14L,
13L, 12L, 14L, 13L, 12L, 14L, 13L, 12L), .Label = c("cad", "ht",
"t2d", "bd", "cd", "ra", "t1d", "fpro", "mkg", "scs", "coat colour",
"ywk", "ssk", "gdd"), class = "factor"), min = c(0.324122039,
0.336486555, 0.073152049, 0.895455441, 0.849944623, 0.825248005,
0.890413591, 0.852385351, 0.826470308, 0.889139116, 0.838256672,
0.723753592), max = c(0.665536838, 0.678764774, 0.34033228, 0.919794865,
0.955018001, 0.899903826, 0.913350912, 0.957305688, 0.89843716,
0.911257005, 0.955312678, 0.817489555), mean = c(0.4919168555,
0.5360103372, 0.24320509565, 0.907436221, 0.9057516121, 0.8552899502,
0.9035394117, 0.9068819173, 0.8572309823, 0.90125638965, 0.90217769835,
0.7667208778), sd = c(0.0790133656517775, 0.09704320004497, 0.0767552215753863,
0.00611921020505611, 0.0339614482273291, 0.0199389195311925,
0.00598633573504195, 0.0332634006653858, 0.0196465508521771,
0.00592476494699222, 0.0348144156099722, 0.0271827880539459)), .Names = c("SP",
"Ref", "trait", "min", "max", "mean", "sd"), class = "data.frame", row.names = c(10L,
11L, 12L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L))
While @z-lin's answer works, there is a far simpler solution: switch from facet_wrap(...) to facet_grid(...). With facet_grid, the layout comes from the formula (here . ~ SP, a single row), so you don't need to specify nrow or ncol. You can still specify scales= (which lets each facet's axis range adjust automatically, if wanted), but you can also specify space=, which does the same thing for panel size: with space = "free", each facet's width is proportional to the range of its x scale, so facets with more categories get wider panels. This is what you want. Your function call is now something like this:
ggplot(Data, aes(x = trait, y = mean)) +
geom_boxplot(aes(
fill = Ref, lower = mean-sd, upper = mean+sd, middle = mean,
ymin = min, ymax = max),
lwd = 0.5, stat = "identity") +
facet_grid(. ~ SP, scales = "free", space='free') +
scale_x_discrete(expand = c(0, 0.5)) +
theme_bw()
Some more description of layout of facets can be found here.
As @cdtip mentioned, this does not allow independent y scales for each facet, which is what the OP asked for initially. Luckily, there is also a simple solution for this, which uses facet_row() from the ggforce package:
library(ggplot2)
library(ggforce)
# same as above, without the facet_grid() call
p <- ggplot(Data, aes(x = trait, y = mean)) +
geom_boxplot(aes(
fill = Ref, lower = mean-sd, upper = mean+sd, middle = mean,
ymin = min, ymax = max),
lwd = 0.5, stat = "identity") +
scale_x_discrete(expand = c(0, 0.5)) +
theme_bw()
p + ggforce::facet_row(vars(SP), scales = 'free', space = 'free')
You can adjust facet widths after converting the ggplot object to a grob:
# create ggplot object (no need to manipulate boxplot width here;
# we'll adjust the facet widths directly later)
p <- ggplot(Data,
aes(x = trait, y = mean)) +
geom_boxplot(aes(fill = Ref,
lower = mean - sd,
upper = mean + sd,
middle = mean,
ymin = min,
ymax = max),
lwd = 0.5,
stat = "identity") +
facet_wrap(~ SP, scales = "free", nrow = 1) +
scale_x_discrete(expand = c(0, 0.5)) + # change additive expansion from default 0.6 to 0.5
theme_bw()
# convert ggplot object to grob object
gp <- ggplotGrob(p)
# optional: take a look at the grob object's layout
gtable::gtable_show_layout(gp)
# get gtable columns corresponding to the facets (5 & 9, in this case)
facet.columns <- gp$layout$l[grepl("panel", gp$layout$name)]
# get the number of unique x-axis values per facet (1 & 3, in this case)
x.var <- sapply(ggplot_build(p)$layout$panel_scales_x,
function(l) length(l$range$range))
# change the relative widths of the facet columns based on
# how many unique x-axis values are in each facet
gp$widths[facet.columns] <- gp$widths[facet.columns] * x.var
# plot result
grid::grid.draw(gp)
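Why the expand = c(0, 0.5) comment matters: the grob approach multiplies each panel's width by its number of x values, which gives equal box widths only if each panel's x range is proportional to that number. A discrete scale with n levels at positions 1 to n spans (n - 1) + 2 * add once an additive expansion add is applied on both sides, so the range equals n exactly when add = 0.5; with the default discrete expansion of 0.6 it is n + 0.2. The base-R sketch below checks this on the SP and trait columns copied from the question's Data (a reduced copy for illustration). It also shows that the per-facet counts can be taken from the data with tapply() instead of from ggplot_build() internals, assuming the panels appear in factor-level order, as facet_wrap() lays them out by default.

```r
# Reduced copy of the question's Data: only the columns needed here
Data <- data.frame(
  SP = factor(c(rep("Horse", 3), rep("Maize", 9)),
              levels = c("Human", "Cattle", "Horse", "Maize")),
  trait = c(rep("coat colour", 3), rep(c("gdd", "ssk", "ywk"), 3))
)

# Distinct x values per facet; droplevels() removes the empty Human/Cattle
# levels so the result lines up with the panels that are actually drawn
n_x <- tapply(Data$trait, droplevels(Data$SP), function(x) length(unique(x)))

# Width of a discrete x range with n levels at positions 1..n,
# plus an additive expansion `add` on each side
discrete_range <- function(n, add) (n - 1) + 2 * add

range_05 <- discrete_range(n_x, 0.5)  # equals n_x: proportional to level count
range_06 <- discrete_range(n_x, 0.6)  # n_x + 0.2: only roughly proportional
```

In the grob answer, as.vector(n_x) should then be usable in place of x.var when rescaling gp$widths.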
In general, you can set the width of a box plot in ggplot like so (the backticked terms are placeholders):
ggplot(data = df, aes(x = `some x`, y = `some y`)) + geom_boxplot(width = `some width`)
In your case, you might consider setting the width of all the box plots to the range of x divided by the maximum number of elements (in the leftmost figure).

Transform y axis in bar plot using scale_y_log10()

Using the data.frame below, I want to have a bar plot with y axis log transformed.
I got this plot using this code:
ggplot(df, aes(x=id, y=ymean , fill=var, group=var)) +
geom_bar(position="dodge", stat="identity",
width = 0.7,
size=.9)+
geom_errorbar(aes(ymin=ymin,ymax=ymax),
size=.25,
width=.07,
position=position_dodge(.7))+
theme_bw()
To log-transform the y axis, so that the "low" level in B and D (which is close to zero) becomes visible, I added
+ scale_y_log10()
which resulted in
Any suggestions on how to log-transform the y axis of the first plot?
By the way, some values in my data are close to zero, but none of them is zero.
UPDATE
Trying this answer suggested by @computermacgyver:
library(scales)  # trans_breaks(), trans_format(), math_format()
ggplot(df, aes(x=id, y=ymean , fill=var, group=var)) +
geom_bar(position="dodge", stat="identity",
width = 0.7,
size=.9)+
scale_y_log10("y",
breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x)))+
geom_errorbar(aes(ymin=ymin,ymax=ymax),
size=.25,
width=.07,
position=position_dodge(.7))+
theme_bw()
I got
DATA
dput(df)
structure(list(id = structure(c(7L, 7L, 7L, 1L, 1L, 1L, 2L, 2L,
2L, 6L, 6L, 6L, 5L, 5L, 5L, 3L, 3L, 3L, 4L, 4L, 4L), .Label = c("A",
"B", "C", "D", "E", "F", "G"), class = "factor"), var = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L), .Label = c("high", "medium", "low"), class = "factor"),
ymin = c(0.189863418, 0.19131948, 0.117720496, 0.255852069,
0.139624146, 0.048182771, 0.056593774, 0.037262727, 0.001156667,
0.024461299, 0.026203592, 0.031913077, 0.040168571, 0.035235902,
0.019156667, 0.04172913, 0.03591233, 0.026405094, 0.019256055,
0.011310755, 0.000412414), ymax = c(0.268973856, 0.219709677,
0.158936508, 0.343307692, 0.205225352, 0.068857143, 0.06059596,
0.047296296, 0.002559633, 0.032446541, 0.029476821, 0.0394,
0.048959184, 0.046833333, 0.047666667, 0.044269231, 0.051,
0.029181818, 0.03052381, 0.026892857, 0.001511628), ymean = c(0.231733739333333,
0.204891473333333, 0.140787890333333, 0.295301559666667,
0.173604191666667, 0.057967681, 0.058076578, 0.043017856,
0.00141152033333333, 0.0274970166666667, 0.0273799226666667,
0.0357511486666667, 0.0442377366666667, 0.0409452846666667,
0.0298284603333333, 0.042549019, 0.0407020586666667, 0.0272998796666667,
0.023900407, 0.016336106, 0.000488014)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -21L), .Names = c("id",
"var", "ymin", "ymax", "ymean"))
As @Miff has written, bars are generally not useful on a log scale. With bar plots, we compare the heights of the bars to one another. To do this, we need a fixed point from which to compare, usually 0, but log(0) is negative infinity.
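The baseline argument can be made concrete. On a log10 axis, a bar drawn from a baseline b up to y has a visual height of log10(y) - log10(b), so the ratio between two bars depends on b, which is an arbitrary choice. This sketch uses two ymean values from the question's data, rounded (A/high and D/low); the baselines 1e-7 and 1e-9 are illustrative.

```r
# Two ymean values from the question's data, rounded: A/high and D/low
y <- c(A_high = 0.2953, D_low = 0.000488)

# Visual height of a bar on a log10 axis, drawn up from a chosen baseline
log_bar_height <- function(y, baseline) log10(y) - log10(baseline)

# Ratio of the two bar heights for a given (arbitrary) baseline
ratio_at <- function(baseline) {
  unname(log_bar_height(y["A_high"], baseline) /
         log_bar_height(y["D_low"], baseline))
}

ratio_1e7  <- ratio_at(1e-7)                     # about 1.75
ratio_1e9  <- ratio_at(1e-9)                     # about 1.49
true_ratio <- unname(y["A_high"] / y["D_low"])   # about 605
```

Neither height ratio comes close to the actual ratio of the values, and moving the baseline changes the picture, which is the core objection to log-scaled bars.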
So, I would strongly suggest that you consider using geom_point() instead of geom_bar(). I.e.,
library(scales)  # trans_breaks(), trans_format(), math_format()
ggplot(df, aes(x=id, y=ymean , color=var)) +
geom_point(position=position_dodge(.7))+
scale_y_log10("y",
breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x)))+
geom_errorbar(aes(ymin=ymin,ymax=ymax),
size=.25,
width=.07,
position=position_dodge(.7))+
theme_bw()
If you really, really want bars, then you should use geom_rect instead of geom_bar and set your own baseline. That is, the baseline for geom_bar is zero but you will have to invent a new baseline in a log scale. Your Plot 1 seems to use 10^-7.
This can be accomplished with the following, but again, I consider this a really bad idea.
library(scales)
# baseline of 10^-7 written as 1e-7 (in R, 10E-7 would be 1e-6)
ggplot(df, aes(xmin=as.numeric(id)-.4,xmax=as.numeric(id)+.4, x=id, ymin=1e-7, ymax=ymean, fill=var)) +
geom_rect(position=position_dodge(.8))+
scale_y_log10("y",
breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x)))+
geom_errorbar(aes(ymin=ymin,ymax=ymax),
size=.25,
width=.07,
position=position_dodge(.8))+
theme_bw()
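If you need bars anyway, one option is to compute -log10(y) yourself and plot it on a linear axis. Because all values are below 1, -log10(y) is positive, so the bars start at 0 and smaller values get taller bars. For example: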
library(ggplot2)
library(dplyr)
# make your own log10
dfPlot <- df %>%
mutate(ymin = -log10(ymin),
ymax = -log10(ymax),
ymean = -log10(ymean))
# then plot
ggplot(dfPlot, aes(x = id, y = ymean, fill = var, group = var)) +
geom_bar(position = "dodge", stat = "identity",
width = 0.7,
size = 0.9)+
geom_errorbar(aes(ymin = ymin, ymax = ymax),
size = 0.25,
width = 0.07,
position = position_dodge(0.7)) +
scale_y_continuous(name = expression(-log[10](italic(ymean)))) +
theme_bw()
Firstly, don't do it! The help file from ?geom_bar says:
A bar chart uses height to represent a value, and so the base of the
bar must always be shown to produce a valid visual comparison. Naomi
Robbins has a nice article on this topic. This is why it doesn't make
sense to use a log-scaled y axis with a bar chart.
To give a concrete example, the following produces the graph you want, but any larger k would be equally correct and yet produce a visually different plot:
k<- 10000
ggplot(df, aes(x=id, y=ymean*k , fill=var, group=var)) +
geom_bar(position="dodge", stat="identity",
width = 0.7,
size=.9)+
geom_errorbar(aes(ymin=ymin*k,ymax=ymax*k),
size=.25,
width=.07,
position=position_dodge(.7))+
theme_bw() + scale_y_log10(labels=function(x)x/k)
[Plots: the result with k = 1e4 and with k = 1e6.]

Align points summarized by a variable with points dodged by the same variable

I have data from two different data sets. The first, dat1, contains multiple points. The second, dat2, contains only the max values for each Season-Species group in dat1. I am trying to plot dat1 and then plot larger shapes that highlight the max values for each Season-Species group, that is, dat2.
Data:
library(ggplot2)
library(dplyr)
dat1 <- structure(list(Season = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L), .Label = c("Summer", "Winter"), class = "factor"),
Species = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L,
2L, 2L, 2L), .Label = c("BHS", "MTG"), class = "factor"),
CovGrain = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L), .Label = c("CanCov_30", "CanCov_500", "CanCov_1000",
"NDVI_30", "NDVI_500", "NDVI_1000", "Slope_30", "Slope_500",
"Slope_1000", "SlopeVar_30", "SlopeVar_500", "SlopeVar_1000"
), class = "factor"), Count = c(4L, 19L, 4L, 5L, 3L, 14L,
14L, 9L, 9L, 4L, 10L, 9L)), .Names = c("Season", "Species",
"CovGrain", "Count"), class = "data.frame", row.names = c(1L,
2L, 3L, 14L, 15L, 16L, 30L, 31L, 32L, 45L, 46L, 47L))
dat2 <- dat1 %>% group_by(Season, Species) %>%
filter(Count == max(Count)) %>% as.data.frame()
ggplot(dat1, aes(x = CovGrain, y = Count)) +
geom_point(aes(fill = Species, color = Species),
alpha = 0.5, stroke = 3, size = 3, position=position_dodge(0.5)) +
facet_wrap(~Season, scales = "free_x") +
scale_shape_manual(values = c(21,22,24)) +
scale_fill_manual(values=c("blue", "red")) +
geom_point(data = dat2, aes(x = CovGrain, y = Count), shape = 23,
stroke = 2, size = 6, position=position_dodge(0.5)) +
theme_bw()
In the resulting plot, I want the black diamonds (shape = 23) to be correctly dodged so that they outline the largest point of each group.
Any suggestions are appreciated!
Create a logical variable in the original data that indicates whether 'Count' is at its maximum within each 'Season'-'Species' group. Use scale_alpha_manual to set alpha to 0 for FALSE (i.e. 'Count' not at max). Dodge by 'Species' using group = Species.
dat1 <- dat1 %>% group_by(Season, Species) %>%
mutate(max_count = Count == max(Count))
pos <- position_dodge(0.5)
ggplot(dat1, aes(x = CovGrain, y = Count)) +
geom_point(aes(color = Species), position = pos) +
geom_point(aes(alpha = max_count, group = Species), shape = 23, size = 6, position = pos) +
facet_wrap(~ Season) +
scale_alpha_manual(values = c(0, 1), guide = "none") +
theme_bw()
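A likely reason the diamonds in the question stay centred: position_dodge() splits each x position among the groups present there, and in dat2 every highlighted x value has only one row (in Summer, for example, the BHS maximum is at CanCov_500 and the MTG maximum at CanCov_1000), so there is nothing to dodge. Keeping every row and hiding the non-maxima gives both layers the same two groups at each x. This sketch rebuilds dat1 compactly, flags the maxima with base R's ave() as a swap-in for the dplyr step, and uses layer_data() to check that the diamond positions coincide with the dodged points.

```r
library(ggplot2)

# Compact rebuild of the question's dat1 (same 12 rows)
grains <- c("CanCov_30", "CanCov_500", "CanCov_1000")
dat1 <- data.frame(
  Season   = factor(rep(c("Summer", "Winter"), each = 6)),
  Species  = factor(rep(rep(c("BHS", "MTG"), each = 3), times = 2)),
  CovGrain = factor(rep(grains, times = 4), levels = grains),
  Count    = c(4, 19, 4, 5, 3, 14, 14, 9, 9, 4, 10, 9)
)

# TRUE where Count equals the maximum of its Season x Species group
# (ties would all be TRUE)
dat1$max_count <- dat1$Count ==
  ave(dat1$Count, dat1$Season, dat1$Species, FUN = max)

pos <- position_dodge(0.5)
p <- ggplot(dat1, aes(x = CovGrain, y = Count)) +
  geom_point(aes(color = Species), position = pos) +
  geom_point(aes(alpha = max_count, group = Species),
             shape = 23, size = 6, position = pos) +
  facet_wrap(~ Season) +
  scale_alpha_manual(values = c(0, 1), guide = "none") +
  theme_bw()

# Positions computed for each layer after dodging
pts <- layer_data(p, 1)
dia <- layer_data(p, 2)
```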
This is a little bit hacky, and there is probably a better solution, but one way is to create a new x variable with your own randomness. The hacky part comes from first doing geom_point(size = -1) to get it to maintain the x-axis. So, not elegant, by any means, but gets you what you want, I think.
dat1$id <- 1:nrow(dat1)
dat2 <- dat1 %>%
group_by(Season, Species) %>%
filter(Count == max(Count)) %>%
as.data.frame()
set.seed(1)  # make the jitter reproducible
randomness <- rnorm(nrow(dat1), 0, 0.5)
dat1$new_x <- as.integer(dat1$CovGrain) + randomness
dat2$new_x <- as.integer(dat2$CovGrain) + randomness[dat2$id]
ggplot(dat1, aes(x = CovGrain, y = Count)) +
geom_point(size = -1) +
geom_point(aes(x = new_x, fill = Species, color = Species),
alpha = 0.5, stroke = 3, size = 3) +
facet_wrap(~Season, scales = "free_x") +
scale_shape_manual(values = c(21,22,24)) +
scale_fill_manual(values=c("blue", "red")) +
geom_point(data = dat2, aes(x = new_x, y = Count), shape = 23,
stroke = 2, size = 6) +
theme_bw()

how to change the color in geom_point or lines in ggplot [duplicate]

This question already has an answer here:
ggplot2: How to specify multiple fill colors for points that are connected by lines of different colors
Closed 5 years ago.
I have data like this:
data<- structure(list(sample = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), y = c(0.99999652, 0.99626012, 0.94070452,
0.37332406, 0.57810894, 0.37673758, 0.22784684, 0.35358141, 0.21253558,
0.17715703, 0.99999652, 0.86403956, 0.64054516, 0.18448824, 0.40362691,
0.10791682, 0.06985696, 0.07384465, 0.0433271, 0.02875159), time = c(100L,
150L, 170L, 180L, 190L, 220L, 260L, 270L, 300L, 375L, 100L, 150L,
170L, 180L, 190L, 220L, 260L, 270L, 300L, 375L), x = c(0.9999965,
0.9981008, 0.9940164, 1.0842966, 0.9412978, 1.0627907, 0.9135079,
1.1982235, 0.9194105, 0.9361713, 0.9999965, 1.0494051, 0.9526752,
1.1594711, 0.9827104, 1.0223711, 1.1419197, 1.0328598, 0.6015229,
0.3745817)), .Names = c("sample", "y", "time", "x"), class = "data.frame", row.names = c(NA,
-20L))
I am interested in plotting it with custom colors, such as black and red.
I can plot it with two default colors like this:
ggplot() +
geom_point(data = data, aes(x = time, y = y, color = sample),size=4)
But if I want to assign black to the first sample (A) and red to the second (B), how can I do that?
You could use scale_color_manual:
ggplot() +
geom_point(data = data, aes(x = time, y = y, color = sample),size=4) +
scale_color_manual(values = c("A" = "black", "B" = "red"))
Per OP's comment, to get lines with the same color as the points you could do:
ggplot(data = data, aes(x = time, y = y, color = sample)) +
geom_point(size=4) +
geom_line(aes(group = sample)) +
scale_color_manual(values = c("A" = "black", "B" = "red"))
I would do it like this (you can also use hexadecimal colors, such as "#FF0000", instead of red and black):
library(dplyr)
data <- data %>%
mutate(Color = ifelse(sample == "A", "black",
ifelse(sample == "B", "red", "none")))
ggplot() +
geom_point(data = data, aes(x = time, y = y, color = Color),size=4)+
scale_color_identity()
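A named vector can replace the nested ifelse() calls: each sample label indexes its colour directly, and adding a group only means adding one entry. This is a sketch; cols is a hypothetical palette, and the data frame is a trimmed stand-in with the first two time points per sample from the question's data. scale_color_identity() hides the legend by default, so guide = "legend" with explicit breaks and labels is set here to show one.

```r
library(ggplot2)

# Trimmed stand-in for the question's data (first two time points per sample)
data <- data.frame(
  sample = factor(c("A", "A", "B", "B")),
  time   = c(100, 150, 100, 150),
  y      = c(0.99999652, 0.99626012, 0.99999652, 0.86403956)
)

# Hypothetical palette: one entry per sample level
cols <- c(A = "black", B = "red")

# Look up each row's colour by name; as.character() matters, because
# indexing with a factor would use its integer codes instead of its labels
data$Color <- unname(cols[as.character(data$sample)])

p <- ggplot(data, aes(x = time, y = y, color = Color)) +
  geom_point(size = 4) +
  scale_color_identity(guide = "legend",
                       breaks = unname(cols), labels = names(cols))
```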

Custom legend for multi layer geom_bar plot

R version 3.1.1 (2014-07-10) Platform: i386-w64-mingw32/i386 (32-bit)
I am working on a bar plot with ggplot2. The aim is to combine a stacked and a dodged bar plot for the data. My problem is including a legend that covers both layers or shows them separately.
Data:
df <- structure(list(year = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L,
1L, 2L, 3L, 4L, 5L, 6L, 7L), .Label = c("2008", "2009", "2010",
"2011", "2012", "2013", "2014"), class = "factor"), product = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("a",
"b"), class = "factor"), total = c(1663L, 1344L, 1844L, 444L,
1336L, 897L, 655L, 3433L, 3244L, 2044L, 3344L, 1771L, 1410L,
726L), partial = c(1663L, 1344L, 1844L, 444L, 949L, 302L, 5L,
3433L, 3244L, 2044L, 3344L, 1476L, 1158L, 457L)), .Names = c("year",
"product", "total", "partial"), row.names = c(NA, -14L), class = "data.frame")
The plan was to plot two geom_bar layers to combine dodging and stacking. The first layer is the total amount, the second layer is the partial amount. The alpha value of the first layer is reduced to show the difference between the two layers. So far, this works.
Example:
ggplot(df, aes(x = year))+
geom_bar(aes(y = total, fill = product), alpha= 0.3, stat = "identity", position = "dodge", width = 0.3)+
geom_bar(aes(y = partial, fill = product), alpha= 1, stat = "identity", position = "dodge", width = 0.3)
Now the legend is not sufficient: it shows the colour of fill = product and does not reflect the alpha value of the first layer.
My approach was to use scale_fill_manual and manually add a new label with a new colour, which did not work.
My idea:
ggplot(df, aes(x = year))+
geom_bar(aes(y = total, fill = product), alpha= 0.3, stat = "identity", position = "dodge", width = 0.3)+
geom_bar(aes(y = partial, fill = product), alpha= 1, stat = "identity", position = "dodge", width = 0.3)+
scale_fill_manual(name = "",
values=c("red", "black","blue"),
labels=c("a","b","test"))
Thank you for any help on my problem!
Try using different fill values for the total and partial data.
Quick and dirty solution:
ggplot(df, aes(x = year))+
geom_bar(aes(y = total, fill = factor(as.numeric(product))), alpha= 0.3, stat = "identity", position = "dodge", width = 0.3) +
geom_bar(aes(y = partial, fill = factor(as.numeric(product) * 3)), alpha= 1, stat = "identity", position = "dodge", width = 0.3) +
scale_fill_manual(name = "", values=c("red", "black","blue", "green"), labels=c("A","B","Partial A", "Partial B"))
Tested on
R x64 3.2.2
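A variant of the same idea, sketched with illustrative key names and colours: give each layer its own descriptive fill value and name the entries of values= in scale_fill_manual(), so the colour assignment does not depend on the order of the factor levels (1, 2, 3, 6) produced by the as.numeric() trick. Both layers still have two fill groups per year, so their dodged positions should line up; layer_data() is used to confirm that.

```r
library(ggplot2)

# The question's data, rebuilt compactly (year x product)
df <- data.frame(
  year    = factor(rep(2008:2014, times = 2)),
  product = factor(rep(c("a", "b"), each = 7)),
  total   = c(1663, 1344, 1844, 444, 1336, 897, 655,
              3433, 3244, 2044, 3344, 1771, 1410, 726),
  partial = c(1663, 1344, 1844, 444, 949, 302, 5,
              3433, 3244, 2044, 3344, 1476, 1158, 457)
)

# Illustrative palette keyed by the fill labels used below
fill_cols <- c("a total" = "red", "b total" = "black",
               "a partial" = "blue", "b partial" = "green")

p <- ggplot(df, aes(x = year)) +
  geom_bar(aes(y = total, fill = paste(product, "total")),
           alpha = 0.3, stat = "identity", position = "dodge", width = 0.3) +
  geom_bar(aes(y = partial, fill = paste(product, "partial")),
           stat = "identity", position = "dodge", width = 0.3) +
  scale_fill_manual(name = "", values = fill_cols,
                    breaks = names(fill_cols))

# Computed fills and dodged x positions for each layer
totals   <- layer_data(p, 1)
partials <- layer_data(p, 2)
```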
