What I'm wanting to do is have non-additive, 'stacked' bars, which I understand is achieved using position_dodge, rather than "stack". However, this does not behave as I expected it to.
The closest answer to what I'm after is here, but the code on that post causes exactly the same problem I'm running into.
A very basic example:
library(ggplot2)
example <- data.frame(week_num = c(1, 1),
variable = c("x", "y"),
value = c(5, 10))
ex <- ggplot(example, aes(x = week_num, y = value, fill = variable))
ex <- ex +
geom_bar(stat = "identity", position = position_dodge(0))
print(ex)
What you get is essentially a single blue bar, representing the variable 'y' value of 10, with no sign of the 'x' value of 5:
Chart with lower value hidden
So far, the only way around this I've found is to make the width argument of position_dodge, say, 0.1, to get something like this, but that's not ideal.
Chart with lower value visible
Essentially, I want to 'front' the lower of the two values, so in this case what I'd want is a bar of height 10 (representing variable = y), but with the lower half (up to 5) filled in a different colour.
One option to fix your issue would be to reorder your data. Observations are plotted in the order in which they appear in your dataset. Hence, reorder your dataset so that lower values are at the end and will be plotted last. Moreover, you could use position = "identity" and geom_col (which is the same as geom_bar(stat = "identity")):
library(ggplot2)
# Sort so that, within each week, smaller values come last and are drawn on top
example <- dplyr::arrange(example, week_num, dplyr::desc(value))
ggplot(example, aes(x = week_num, y = value, fill = variable)) +
geom_col(position = "identity")
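For what it's worth, the same reordering can be done without dplyr. A minimal sketch in base R, assuming the `example` data frame from the question (the name `example_sorted` is mine, just for illustration):

```r
library(ggplot2)

# Same toy data as in the question
example <- data.frame(week_num = c(1, 1),
                      variable = c("x", "y"),
                      value = c(5, 10))

# Within each week, put the larger values first so the smaller ones
# come last in the data and are therefore drawn last (on top)
example_sorted <- example[order(example$week_num, -example$value), ]

# position = "identity" leaves the bars where they are instead of
# stacking or dodging them; call print(p) to draw the plot
p <- ggplot(example_sorted, aes(x = week_num, y = value, fill = variable)) +
  geom_col(position = "identity")
```

The negated `value` inside `order()` is what gives the descending sort; it only works like this because `value` is numeric.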
I have a pretty straightforward dataset consisting of a week of two totals in groups, which I'm displaying in an identity bar plot using ggplot2 (version 3.3.0).
library(ggplot2)
library(lubridate)
weeksummary <- data.frame(
Date = rep(as.POSIXct("2020-01-01") + days(0:6), 2),
Total = rpois(14, 30),
Group = c(rep("group1", 7), rep("group2", 7))
)
ggplot(data = weeksummary, mapping = aes(x = Date, y = Total, fill = Group)) +
geom_col(position = "dodge") +
geom_text(aes(label = Total), position = position_dodge(width = 0.9), size = 3)
I cannot for the life of me get this to put the numbers at the top of their own bars. I've been hunting around for an answer and trying everything I found with no luck, until I randomly tried this:
weeksummary$Date <- as.factor(weeksummary$Date)
But this seems like unnecessary manipulation, and I'd need to make sure the dates appear in the right format and order and rewrite the additional bits that currently rely on dates... I'd rather understand what I'm doing wrong.
What you're looking for is to use as.Date.POSIXct. as.factor() works to force weeksummary$Date into a factor, but it forces the conversion of your POSIXct class into a character first (thus erasing "date"). However, you need to convert to a factor so that dodging works properly - that's the question.
You can either convert before (e.g. weeksummary$Date <- as.Date.POSIXct(weeksummary$Date)), or do it right in your plot call:
ggplot(weeksummary, aes(x = as.Date.POSIXct(Date), y = Total, fill = Group)) +
geom_col(position = 'dodge') +
geom_text(aes(label = Total, y = Total + 1),
position = position_dodge(width = 0.9), size = 3)
Giving you this:
Note: the values are different than your values, since our randomization seeds are likely not the same :)
You'll notice I nudged the labels up a bit. You can normally do this with nudge_y, but you cannot specify nudge_x or nudge_y at the same time as a position= argument. In this case, you can just nudge by overwriting the y aesthetic.
This happens because geom_text inherits the x aesthetic, which is Date in this case; that inheritance is totally correct. You don't have to mutate your data frame. You can specify the behaviour when plotting instead:
aes(x = factor(Date), y = ...),
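A possible reason the labels misbehave with POSIXct, as far as I can tell: the width in position_dodge() is measured in x-axis units. For POSIXct that unit is one second, for Date it is one day, so width = 0.9 shifts the labels by less than a second on a POSIXct axis. A sketch under that assumption, which keeps the POSIXct column and expresses the dodge width in seconds instead (`day_secs` is just an illustrative name, and `set.seed(1)`, the UTC time zone and `vjust` are my additions):

```r
library(ggplot2)

set.seed(1)  # added for reproducibility, not in the original question
weeksummary <- data.frame(
  Date  = rep(as.POSIXct("2020-01-01", tz = "UTC") + 86400 * (0:6), 2),
  Total = rpois(14, 30),
  Group = rep(c("group1", "group2"), each = 7)
)

# One day expressed in the unit of a POSIXct axis (seconds)
day_secs <- 24 * 60 * 60

# Dodge the labels by 0.9 days rather than 0.9 seconds;
# call print(p) to draw the plot
p <- ggplot(weeksummary, aes(x = Date, y = Total, fill = Group)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = Total),
            position = position_dodge(width = 0.9 * day_secs),
            vjust = -0.3, size = 3)
```

The bars should get their default width of 90% of the spacing between dates, which is why 0.9 of a day is used for the labels too.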
I have the following data:
I would like to generate a bar plot that shows the frequency of each value of Var1 for each run. I want the x axis to represent each run and the y axis to represent the frequency of each Var1 value. To do that, I wrote the following R script:
df <- read.csv("/home/nasser/Desktop/data.csv")
g <- ggplot(df) +
geom_bar(aes(Run, Freq, fill = Var1, colour = Var1), position = "stack", stat = "identity")
The result that I got is:
The issue is that the x axis does not show each run separately (the axis should be 1, 2, .., etc) and the legend should show each value of Var1 separately and in a different color. Also, the bars are not very clear, since it is difficult to see the frequency of each Var1 value. In other words, the generated plot is not the normal stacked bar like the one shown in this answer.
How to solve that?
You need to convert both variables to factors. Otherwise, R sees them as numerical and not categorical data.
df <- read.csv("/home/nasser/Desktop/data.csv")
g <- ggplot(df) +
geom_bar(aes(factor(Run), Freq, fill = factor(Var1), colour = factor(Var1)),
position = "stack", stat = "identity")
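Since the data itself isn't shown, here is a small sketch with made-up numbers (the `toy` data frame and its values are purely hypothetical, not the asker's data) showing what the conversion changes: a numeric column is treated as continuous, while a factor is a set of discrete levels, each with its own axis tick and legend entry.

```r
library(ggplot2)

# Hypothetical stand-in for data.csv: 3 runs, 2 values of Var1
toy <- data.frame(Run  = c(1, 1, 2, 2, 3, 3),
                  Var1 = c(10, 20, 10, 20, 10, 20),
                  Freq = c(4, 6, 3, 7, 5, 5))

is.numeric(toy$Run)        # numeric, so ggplot2 would use a continuous scale
levels(factor(toy$Run))    # one discrete level per run
levels(factor(toy$Var1))   # one discrete level (and legend key) per Var1 value

# geom_col is shorthand for geom_bar(stat = "identity");
# call print(g) to draw the plot
g <- ggplot(toy) +
  geom_col(aes(factor(Run), Freq, fill = factor(Var1)), position = "stack")
```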
I would like to take a ggplot scatterplot and overlay on top of it the mean of the y-variable within evenly-spaced bins on the x-axis.
So far what I have is this:
library(tidyverse)
data(midwest)
ggplot(arrange(midwest,percollege),aes(x=percollege,y=percbelowpoverty))+
geom_point()+
stat_summary_bin(aes(x=percollege,y=percbelowpoverty),
bins=10,fun.y='mean',geom='point',col='red')
Which produces
which is basically perfect except instead of red points I would like horizontal red lines that extend from the beginning of the bin to the end of the bin.
I can sort of mimic what I want with
library(tidyverse)
data(midwest)
ggplot(arrange(midwest,percollege),aes(x=percollege,y=percbelowpoverty))+
geom_point()+
stat_summary_bin(aes(x=percollege,y=percbelowpoverty),
bins=10,fun.y='mean',geom='point',col='red',shape="-",size=50)
which gives
Which is kinda what I want, except (1) I have to manually set the size every time I make a new graph like this, and (2) uh, ew.
Another approach I've tried is with geom='bar',fill=NA, which seems promising if I can somehow get it to only show the top bar without the sides or bottom of the bar.
Any tips for this? I've had little luck with setting the geom to pointrange or linerange or line (the first two I've yet to get to work, and the last just connects each point with non-horizontal lines). Kind of surprised this isn't default behavior for stat_summary_bin to be honest!
Thanks!
This should work. I think the rownames_to_column line may not be necessary, and the modify_if call is necessary because the bounds extracted from the cut labels are strings rather than numeric values.
midwest_sum <- midwest %>%
mutate(coll_bins = cut(percollege, breaks = 10)) %>%
group_by(coll_bins) %>%
summarise(bin_mean = mean(percbelowpoverty)) %>%
rownames_to_column(var = "bin_num") %>%
tidyr::extract(coll_bins, c("min", "max"), "\\((.*),(.*)]") %>%
modify_if(is.character, as.numeric)
ggplot()+
geom_point(data = midwest, aes(x=percollege,y=percbelowpoverty)) +
geom_errorbarh(data = midwest_sum, aes(xmin = min, xmax = max, y = bin_mean),
col = "red", size = 1)
Hope this helps!
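If parsing the bin labels with a regex feels fragile, one alternative sketch is to compute the bin edges yourself and pass them to cut(), so the edges are already available as numbers. This assumes equal-width bins over the range of percollege; `n_bins`, `bounds`, `bin_id` and `midwest_sum2` are illustrative names, and geom_segment is swapped in for geom_errorbarh to draw plain horizontal lines.

```r
library(ggplot2)  # also provides the midwest data set

n_bins <- 10
bounds <- seq(min(midwest$percollege), max(midwest$percollege),
              length.out = n_bins + 1)

# Integer bin index (1..n_bins) for every county
bin_id <- cut(midwest$percollege, breaks = bounds,
              include.lowest = TRUE, labels = FALSE)

# Mean poverty rate per bin; empty bins simply do not appear
bin_mean <- tapply(midwest$percbelowpoverty, bin_id, mean)
ids <- as.integer(names(bin_mean))

midwest_sum2 <- data.frame(xmin = bounds[ids],
                           xmax = bounds[ids + 1],
                           bin_mean = as.numeric(bin_mean))

# One red segment per bin, from its left edge to its right edge;
# call print(p) to draw the plot
p <- ggplot(midwest, aes(x = percollege, y = percbelowpoverty)) +
  geom_point() +
  geom_segment(data = midwest_sum2,
               aes(x = xmin, xend = xmax, y = bin_mean, yend = bin_mean),
               colour = "red", inherit.aes = FALSE)
```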
I wouldn't often call this desired default behaviour; leaving out the sides of the bins necessarily makes it confusing where the bin boundaries actually are for points far above or below the bin means.
Anyway, here's a first attempt. We can calculate the bin boundaries based on some input parameter and then use geom_segment to draw them on the graph. geom_segment needs start and end coordinates, so bin_boundaries calculates the means of the y variable and the bounds of the bins for the x variable, and returns a call to geom_segment. This means we can simply add the output of our function to our ggplot call and it works as expected. Note the use of passing through ... so we can still use the geom parameters.
You can probably modify to use other bin width and dodge parameters instead of calculating from the bounds of your x variable, haven't thought too carefully about that. Note that the lines look different from your use of stat_summary_bin because they are centered differently and so use different points in each calculation. You might also consider a version that uses geom_step which would connect the ends of each horizontal line.
library(tidyverse)
bin_boundaries <- function(tbl, n_bins, x_var, y_var, ...) {
x_var <- enquo(x_var)
y_var <- enquo(y_var)
bin_bounds <- seq(
from = min(pull(tbl, !!x_var)),
to = max(pull(tbl, !!x_var)),
length.out = n_bins + 1)
bounds_tbl <- tbl %>%
mutate(bin_group = ntile(!!x_var, n_bins)) %>%
group_by(bin_group) %>%
summarise(!!y_var := mean(!!y_var)) %>%
mutate(bin_start = bin_bounds[1:n_bins], bin_end = bin_bounds[2:(n_bins + 1)])
geom_segment(
data = bounds_tbl,
mapping = aes(
x = bin_start, y = !!y_var,
xend = bin_end, yend = !!y_var
),
...
)
}
ggplot(midwest) +
geom_point(aes(x = percollege, y = percbelowpoverty)) +
bin_boundaries(midwest, 10, percollege, percbelowpoverty, colour = "red", size = 1)
Created on 2019-02-07 by the reprex package (v0.2.1)
I am looking to "dodge" the bars of a barplot together. The following R code leaves white space between the bars. Other answers like this one show how to accomplish this for the bars part of a group, but that does not seem to apply for distinct bars per factor on the x axis.
require(ggplot2)
dat <- data.frame(a=c("A", "B", "C"), b=c(0.71, 0.94, 0.85), d=c(32, 99, 18))
ggplot(dat, aes(x= a, y = b, fill=d, width = d/sum(d))) +
geom_bar(position=position_dodge(width = 0.1), stat="identity")
Playing with the width variable changes the appearance, but it does not seem possible to get the bars to sit side by side while still retaining their meaningful difference in width (in this graph redundantly represented by the fill colour too).
I would generate my x-positions and widths first, then pass them in to the aesthetics and override to make your factor labels:
First, store the width
dat$width <-
dat$d / sum(dat$d)
Then, assuming that your data.frame is in the order you want it plotted, you can set the location as the cumulative sum of the widths. Note, however, that that cumulative sum is where you want the right edge of the bar to be, so to get the center you need to subtract half of the width:
dat$loc <-
cumsum(dat$width) - dat$width/2
Then, pass it all in to the ggplot call, setting your labels explicitly:
ggplot(dat, aes(x= loc, y = b, fill=d, width = width)) +
geom_bar(stat="identity") +
scale_x_continuous(breaks = dat$loc
, labels = dat$a)
gives
I am not sure about the advisability of this approach, but this should get the job done.
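To see why subtracting half the width gives touching bars, here is the arithmetic for the three values of d as a quick sketch (`left` and `right` are illustrative names for the bar edges): each bar's right edge should coincide with the next bar's left edge.

```r
dat <- data.frame(a = c("A", "B", "C"),
                  b = c(0.71, 0.94, 0.85),
                  d = c(32, 99, 18))

dat$width <- dat$d / sum(dat$d)                  # share of the x axis per bar
dat$loc   <- cumsum(dat$width) - dat$width / 2   # bar centres

left  <- dat$loc - dat$width / 2   # left edge of each bar
right <- dat$loc + dat$width / 2   # right edge of each bar

# Compare the right edge of bar i with the left edge of bar i + 1
isTRUE(all.equal(right[-nrow(dat)], left[-1]))
```

Because the widths are shares of the total, the bars together span the x range from 0 to 1.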
It is possible by using a continuous x axis and relabelling it.
ggplot(dat, aes(x=cumsum(d/sum(d)) - d/sum(d)/2, y = b, fill=d, width=d/sum(d))) +
geom_bar(stat="identity", position=position_dodge()) +
scale_x_continuous(breaks=cumsum(dat$d/sum(dat$d)) - dat$d/sum(dat$d)/2, labels=dat$a)
Or isn't this what you were looking for?
I am making stacked bar plots with ggplot2 in R with specific bar ordering about the y-axis.
# create reproducible data
library(ggplot2)
d <- read.csv(text='Day,Location,Length,Amount
1,4,3,1.1
1,3,1,2
1,2,3,4
1,1,3,5
2,0,0,0
3,3,3,1.8
3,2,1,3.54
3,1,3,1.1',header=T)
ggplot(d, aes(x = Day, y = Length)) + geom_bar(aes(fill = Amount, order = Location), stat = "identity")
ggplot(d, aes(x = Day, y = Length)) + geom_bar(aes(fill = Amount, order = rev(Location)), stat = "identity")
The first ggplot plot shows the data in order of Location, with Location=1 nearest the x-axis and data for each increasing value of Location stacked upon the next.
The second ggplot plot shows the data in a different order, but it doesn't stack the data with the highest Location value nearest the x-axis with the data for the next highest Location stacked in the second from the x-axis position for the first bar column, like I would expect it to based on an earlier post.
This next snippet does show the data in the desired way, but I think this is an artifact of the simple and small example data set. Stacking order hasn't been specified, so I think ggplot is stacking based on values for Amount.
ggplot(d, aes(x = Day, y = Length)) + geom_bar(aes(fill = Amount), stat = "identity")
What I want is to force ggplot to stack the data in order of decreasing Location values (Location=4 nearest the x-axis, Location=3 next, ... , and Location=1 at the very top of the bar column) by calling the order = or some equivalent argument. Any thoughts or suggestions?
It seems like it should be easy because I am only dealing with numbers. It shouldn't be so hard to ask ggplot to stack the data in a way that corresponds to a column of decreasing (as you move away from the x-axis) numbers, should it?
Try:
ggplot(d, aes(x = Day, y = Length)) +
geom_bar(aes(fill = Amount, order = -Location), stat = "identity")
Notice how I swapped rev with -. Using rev does something very different: it stacks by the value for each row you happen to get if you reverse the order of values in the column Location, which could be just about anything.
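One caveat for readers on newer versions: as far as I know the order aesthetic was deprecated in ggplot2 2.0.0, so it may simply be ignored there. A hedged sketch of an alternative, assuming a recent ggplot2 in which the stacking order follows the group aesthetic, with the highest group placed nearest the x-axis (`stack_layout` is an illustrative name):

```r
library(ggplot2)

d <- read.csv(text = 'Day,Location,Length,Amount
1,4,3,1.1
1,3,1,2
1,2,3,4
1,1,3,5
2,0,0,0
3,3,3,1.8
3,2,1,3.54
3,1,3,1.1', header = TRUE)

# Group by Location so the stacking order is driven by Location
# rather than by the row order or by Amount
p <- ggplot(d, aes(x = Day, y = Length)) +
  geom_col(aes(fill = Amount, group = Location))

# Inspect the computed stack: ymin/ymax of each segment with its group id
stack_layout <- layer_data(p)[, c("x", "group", "ymin", "ymax")]
```

If the order comes out the other way round in your version, position = position_stack(reverse = TRUE) inside geom_col should flip it.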