I am looking to "dodge" the bars of a barplot together. The following R code leaves white space between the bars. Other answers like this one show how to accomplish this for bars that are part of a group, but that does not seem to apply to distinct bars per factor level on the x axis.
require(ggplot2)
dat <- data.frame(a=c("A", "B", "C"), b=c(0.71, 0.94, 0.85), d=c(32, 99, 18))
ggplot(dat, aes(x= a, y = b, fill=d, width = d/sum(d))) +
geom_bar(position=position_dodge(width = 0.1), stat="identity")
Playing with the width variable changes the appearance, but it does not seem possible to get the bars to sit side by side while still retaining their meaningful difference in width (in this graph redundantly represented by the fill colour too).
I would generate the x-positions and widths first, then pass them to the aesthetics and override the axis labels with your factor levels:
First, store the width
dat$width <-
dat$d / sum(dat$d)
Then, assuming that your data.frame is in the order you want it plotted, you can set the location as the cumulative sum of the widths. Note, however, that that cumulative sum is where you want the right edge of the bar to be, so to get the center you need to subtract half of the width:
dat$loc <-
cumsum(dat$width) - dat$width/2
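As a quick check of the arithmetic above, here is a minimal sketch using the data from the question. The `left` and `right` columns are names introduced here for illustration only; they show that each bar's right edge coincides with the next bar's left edge:

```r
dat <- data.frame(a = c("A", "B", "C"), b = c(0.71, 0.94, 0.85), d = c(32, 99, 18))

# Widths as shares of the total, centers from the cumulative sum
dat$width <- dat$d / sum(dat$d)
dat$loc <- cumsum(dat$width) - dat$width / 2

# Hypothetical helper columns: the edges implied by center and width
dat$left <- dat$loc - dat$width / 2
dat$right <- dat$loc + dat$width / 2

# TRUE when every bar ends exactly where the next one starts
touching <- isTRUE(all.equal(dat$right[-nrow(dat)], dat$left[-1]))
```

Because the widths sum to 1, the bars should span the interval from 0 to 1 on the continuous x axis.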
Then, pass it all in to the ggplot call, setting your labels explicitly:
ggplot(dat, aes(x= loc, y = b, fill=d, width = width)) +
geom_bar(stat="identity") +
scale_x_continuous(breaks = dat$loc
, labels = dat$a)
gives
I am not sure about the advisability of this approach, but this should get the job done.
It is possible by using a continuous x axis and relabeling it.
ggplot(dat, aes(x=cumsum(d/sum(d)) - d/sum(d)/2, y = b, fill=d, width=d/sum(d))) +
geom_bar(stat="identity", position=position_dodge()) +
scale_x_continuous(breaks=cumsum(dat$d/sum(dat$d)) - dat$d/sum(dat$d)/2, labels=dat$a)
Or isn't this what you were looking for?
Related
I would like to have barplots in which the bars have the same width across different plots, no matter how many bars are shown. I do not want to show the plots on the same page, or arrange them with facets, grid.arrange or anything like that, but just have two plots with bars of the same width.
I could do this by just multiplying the width by the number of bars in the plot divided by the number of bars in the plot with the most bars (see example). But it would be more convenient and somewhat cleaner code if I could do this without any computations before the ggplot call.
Is there a way to specify the bar widths in a unit like lines, em, centimeters?
Or can I access the number of levels of the variable mapped to the x-aesthetic in the call to geom_col? (Note the variable mapped to the x-aesthetic changes between plots)
Or is there another simple solution?
ggplot(data.frame(x=factor(1:2), y=4:5), aes(x=x, y=y)) +
geom_col(width=0.7*2/3)
ggplot(data.frame(A=factor(1:3), y=3:5), aes(x=A, y=y)) +
geom_col(width=0.7*3/3)
AFAIK, you cannot set an absolute width for geom_col()/geom_bar(), so you'd either have to precalculate the proportions and aspect ratio of the bars or use geom_segment(), which takes a size argument that is absolute. Segments aren't internally parameterised as rectangles, though, and don't take separate colour and fill arguments.
library(ggplot2)
library(patchwork)
g1 <- ggplot(data.frame(x=factor(1:2), y=4:5), aes(x=x, y=y, xend = x, yend = 0)) +
geom_segment(size = 20)
g2 <- ggplot(data.frame(A=factor(1:3), y=3:5), aes(x=A, y=y, xend=A, yend = 0)) +
geom_segment(size = 20)
g1 + g2
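For the precalculation route, a minimal sketch could wrap the formula from the question in a small function. `scaled_width` is a hypothetical helper name, not part of ggplot2, and the result is only approximate: it assumes both plots have panels of the same size and it ignores any padding the x axis adds around the bars.

```r
# Hypothetical helper: scale a reference bar width by the number of x levels,
# relative to the plot with the most bars (formula taken from the question)
scaled_width <- function(n_levels, max_levels, base_width = 0.7) {
  stopifnot(n_levels >= 1, max_levels >= n_levels)
  base_width * n_levels / max_levels
}

w2 <- scaled_width(2, 3)  # width for the two-bar plot
w3 <- scaled_width(3, 3)  # width for the three-bar plot
```

The result can then be passed on as `geom_col(width = w2)` and `geom_col(width = w3)` in the respective plots.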
Probably a simple ggplot2 question.
I have a data.frame with a numeric value, a categorical (factor) value, and a character value:
library(dplyr)
set.seed(1)
df <- data.frame(log10.p.value=c(-2.5,-2.5,-2.5,-2.39,-2,-1.85,-1.6,-1.3,-1.3,-1),
direction=sample(c("up","down"),10,replace = T),
label=paste0("label",1:10),stringsAsFactors = F) %>% dplyr::arrange(log10.p.value)
df$direction <- factor(df$direction,levels=c("up","down"))
I want to plot these data as a barplot using geom_bar, where the bars are horizontal and their lengths are determined by df$log10.p.value, their color by df$direction, and the y-axis tick labels are df$label, where the bars are vertically ordered by df$log10.p.value.
As you can see df$log10.p.value are not unique, hence:
ggplot(df,aes(log10.p.value))+geom_bar(aes(fill=direction))+theme_minimal()+coord_flip()+ylab("log10(p-value)")+xlab("")
Gives me:
How do I:
Make the bars not overlap each other.
Have the same width.
Be separated by a small margin?
Have the y-axis tick labels be df$label?
Thanks
Here is one possible solution. Please note that, by default, geom_bar determines the bar length using frequency/count. So, you need to specify stat = "identity" for value mapping.
# since all of your values are negative the graph is on the left side
ggplot(df, aes(x = label, y = log10.p.value, fill = direction)) +
geom_bar(stat = "identity") +
theme_minimal() +
coord_flip() +
ylab("log10(p-value)") +
xlab("")
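The plot above does not yet order the bars by value. One way to do that, sketched here on a small made-up data frame rather than the data from the question, is to turn the labels into a factor whose levels follow the order of the values:

```r
# Made-up data for illustration only
df <- data.frame(
  log10.p.value = c(-2.5, -2.39, -2, -1.3, -1),
  label = paste0("label", c(3, 1, 5, 2, 4)),
  stringsAsFactors = FALSE
)

# A discrete axis follows the factor levels, so set the levels
# in the order of the values
df$label <- factor(df$label, levels = df$label[order(df$log10.p.value)])
```

With `label` as a factor, the same `ggplot()` call should draw the bars in that order. Mapping `x = reorder(label, log10.p.value)` inside `aes()` is a shorter alternative, and passing something like `width = 0.8` to `geom_bar()` should leave a small margin between the bars.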
Being new to R, I produced very simple horizontal bar plots using ggplot2 and coord_flip().
Notably, I insert the values of the x variable at the left side of the bar by default (or at the right side if the label does not fit) using the following command:
geom_text(aes(x=TYPE, y=COUNT, ymax=COUNT, label=COUNT,
hjust=ifelse(COUNT>1000, 1.5, -0.3)),
size=3.5, position = position_dodge(width=0.8))
The problem is that, depending on the data-sets, the x values can vary significantly (e.g. dataset_1 x values can be between 1 to 200; dataset_2 x values can be between 10,000 to 100,000; ...), which causes the label of the shortest bar to be misplaced with the ifelse statement I am using (see brown bar in figure A below).
In this case I cannot just use a constant COUNT>1000 condition for all the datasets.
Figure A:
I could modify manually the value of the hjust=ifelse(COUNT>1000,...statement for each dataset.
But I was wondering if it is possible to automatically move the label outside the bar if it does not fit between the axis and the top of the bar, without modifying the value of the ifelse condition for each dataset, as in figure B below.
Figure B :
EDIT
Workaround (not perfect but better):
Placing the label at the right of the bar if the value is less than 5% of the maximum value
MAXI <- max(data[,2])
geom_text(aes(x=TYPE, y=COUNT, ymax=COUNT, label=COUNT,
hjust=ifelse((COUNT/MAXI)<0.05, -0.3, 1.3)))
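To see why the relative threshold carries over between datasets, here is a small sketch with made-up counts (the vectors below are hypothetical, not taken from the real data). Rescaling all counts leaves the resulting `hjust` values unchanged:

```r
# Hypothetical counts standing in for one dataset
COUNT <- c(120, 15000, 98000, 45000)

# Labels of bars shorter than 5% of the longest bar go outside (hjust = -0.3),
# all others go inside the bar (hjust = 1.3)
hjust_for <- function(count) ifelse(count / max(count) < 0.05, -0.3, 1.3)

hjust_large <- hjust_for(COUNT)        # counts up to 98,000
hjust_small <- hjust_for(COUNT / 500)  # same shape, counts up to 196
```

Only the first bar falls below the 5% threshold in both cases, so only its label is moved outside.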
Having some labels outside the bars and some inside can distort the visual encoding of magnitude as bar length. Another option is to put the values in the middle of each bar but have geom_text skip values that are small relative to the longest bar. Or, if you want to show text for every bar, you can put the labels below the bars to keep a clean visual pattern for the bar lengths. Examples of both options are below:
# Fake data
dat = data.frame(x = LETTERS[1:5], y=c(432, 1349, 10819, 5489, 12123))
ggplot(dat, aes(x, y, fill=x)) +
geom_bar(stat="identity") +
geom_text(aes(label=ifelse(y < 0.05*max(dat$y), "", format(y, big.mark=",")), y=0.5*y),
colour="white") +
coord_flip(xlim=c(0.4,5.6), ylim=c(0, 1.03*max(dat$y)), expand=FALSE) +
guides(fill="none")
ggplot(dat, aes(x, y, fill=x)) +
geom_hline(yintercept=0, lwd=0.3, colour="grey40") +
geom_bar(stat="identity") +
geom_text(aes(label=format(y, big.mark=","), y=-0.01*max(dat$y)),
size=3.5, hjust=1) +
coord_flip(ylim = c(-0.04*max(dat$y), max(dat$y))) +
guides(fill="none")
geom_bar seems to work best when it has fixed width bars - even the spaces between bars seem to be determined by width, according to the documentation. When you have variable widths, however, it does not respond as I would expect, leading to overlaps or gaps between the different bars (as shown here).
To see what I mean, please try this very simple reproducible example:
x <- c("a","b","c")
w <- c(1.2, 1.3, 4) # variable widths
y <- c(9, 10, 6) # variable heights
ggplot() +
geom_bar(aes(x = x, y = y, width = w, fill=x),
stat="identity", position= "stack")
What I really want is for the different bars to be just touching, but not overlapping, like in a histogram.
I've tried adding position= "stack", "dodge", and "fill", but none work. Does the solution lie in geom_histogram or am I just not using geom_bar correctly?
P.s. to see the issue with gaps, try replacing 4 with 0.5 in the above code and see the outcome.
Seems that there isn't any straightforward solution, so we should treat x-axis as continuous in terms of w and manually compute required positions for ticks and bar centers (this is useful):
# pos is an explicit formula for bar centers that we are interested in:
# last + half(previous_width) + half(current_width)
pos <- 0.5 * (cumsum(w) + cumsum(c(0, w[-length(w)])))
ggplot() +
geom_bar(aes(x = pos, width = w, y = y, fill = x), stat = "identity") +
scale_x_continuous(labels = x, breaks = pos)
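An alternative sketch, swapping geom_bar for geom_rect, states the bar edges directly instead of centers and widths. The column names `left` and `right` are introduced here for illustration, and the plot is only built if ggplot2 is installed:

```r
x <- c("a", "b", "c")
w <- c(1.2, 1.3, 4)   # variable widths
y <- c(9, 10, 6)      # variable heights

# Each bar starts where the previous one ends
right <- cumsum(w)
left <- right - w
bars <- data.frame(x, y, left, right, pos = (left + right) / 2)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Rectangles from explicit edges, labelled at their centers
  p <- ggplot(bars) +
    geom_rect(aes(xmin = left, xmax = right, ymin = 0, ymax = y, fill = x)) +
    scale_x_continuous(breaks = bars$pos, labels = bars$x)
}
```

The centers in `bars$pos` are the same values as `pos` in the formula above, so both versions should place the tick labels identically.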
You can now do this with the mekko package: https://cran.r-project.org/web/packages/mekko/vignettes/mekko-vignette.html
I know, 3D barcharts are a sin. But I'm asked to do them, and as a trade-off I suggested only adding a border, in a slightly darker color than the bars, on the top and the right side of each bar. That way the bars would have some kind of "shadow" (urgh), but at least you would still be able to compare them.
Is there any way to do this?
ggplot(diamonds, aes(clarity)) + geom_bar()
Another possibility is to use two sets of geom_bar. The first set, the green ones, is made slightly higher and offset to the right. I borrow the data from @Didzis Elferts.
ggplot(data = df2) +
geom_bar(aes(x = as.numeric(clarity) + 0.1, y = V1 + 100),
width = 0.8, fill = "green", stat = "identity") +
geom_bar(aes(x = as.numeric(clarity), y = V1),
width = 0.8, stat = "identity") +
scale_x_continuous(name = "clarity",
breaks = as.numeric(df2$clarity),
labels = levels(df2$clarity))+
ylab("count")
As you already said - 3D barcharts are "bad". You can't do it directly in ggplot2 but here is a possible workaround for this.
First, make new data frame that contains levels of clarity and corresponding count for each level.
library(plyr)
df2<-ddply(diamonds,.(clarity),nrow)
Then, in the ggplot() call, use the new data frame with clarity as the x values and V1 (counts) as the y values, and add geom_blank(); this creates an x axis with the levels we need. Then add geom_rect() to produce the shading for the bars. Here the xmin and xmax values are computed with as.numeric() from clarity plus a constant: for xmin the constant should be less than half of the bar width, and for xmax larger than half of the bar width. ymin is 0 and ymax is V1 (counts) plus some constant. Finally, add geom_bar(stat="identity") on top of this shadow to draw the actual bars.
ggplot(df2,aes(clarity,V1)) + geom_blank()+
geom_rect(aes(xmin=as.numeric(clarity)-0.38,
xmax=as.numeric(clarity)+.5,
ymin=0,
ymax=V1+250),fill="green")+
geom_bar(width=0.8,stat="identity")