How to display 0 value in a bar chart using ggplot2 - r

I have this data frame called data:
head(data)
date total_sold purchasability visibility
81 2014-05-01 3 3 3
82 2014-05-02 2 2 3
83 2014-05-03 1 2 3
84 2014-05-04 1 3 3
85 2014-05-05 3 2 3
86 2014-05-06 0 0 3
And I would like to make a bar chart with x = date and y = total_sold, with a fill color depending on purchasability. I use this ggplot2 code to do that:
bar <- ggplot(data = data, aes(x = date, fill=as.factor(purchasability),y = total_sold)) + geom_bar(stat = 'identity')
The output is very nice, but the problem is that where total_sold = 0 there is no bar, and thus no way to know the purchasability. Is it possible to still display a bar (maybe from 0.5 to -0.5) when total_sold = 0?
Thanks

You can just use geom_bar; have a look at this code:
library(ggplot2)
df <- data.frame(time = factor(c("Lunch", "Dinner", "breakfast", "test"), levels = c("Lunch", "Dinner", "breakfast", "test")),
total_bill = c(14.89, 0, 0.5, -0.5))
# Add a black outline
ggplot(data=df, aes(x=time, y=total_bill, fill=time)) + geom_bar(colour="black", stat="identity")

I'm not sure there's a simple way to go from 0.5 to -0.5, but you can easily show the 0 value as a small negative fraction (e.g. -0.1) by modifying the y aesthetic in your bar <- line to:
bar <- ggplot(data = data, aes(x = date, fill = as.factor(purchasability), y = ifelse(total_sold == 0, -0.1, total_sold))) + geom_bar(stat = 'identity')
It is a little misleading to show 0 as something other than 0, but I hope this solves your problem.
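If you do want the literal bar from -0.5 to 0.5 that the question asks for, one alternative sketch is to keep geom_col() for the real sales and add a geom_tile() layer only for the zero-sales days, centred on y = 0 with height = 1. The data frame below is a hypothetical stand-in rebuilt from the head(data) output in the question, and width = 0.9 is an assumption chosen to match geom_col()'s default bar width (0.9 days on a daily date axis).

```r
library(ggplot2)

# Hypothetical stand-in for `data`, rebuilt from the head(data) output above
data <- data.frame(
  date           = as.Date("2014-05-01") + 0:5,
  total_sold     = c(3, 2, 1, 1, 3, 0),
  purchasability = c(3, 2, 2, 3, 2, 0),
  visibility     = 3
)

# Rows whose bar would otherwise have zero height and be invisible
zero_days <- subset(data, total_sold == 0)

bar <- ggplot(data, aes(x = date, fill = as.factor(purchasability))) +
  # Ordinary bars for the real sales figures
  geom_col(aes(y = total_sold)) +
  # One tile per zero-sales day, centred on y = 0 so it spans -0.5 to 0.5;
  # it inherits x and fill, so it uses the same purchasability colours
  geom_tile(data = zero_days, aes(y = 0), height = 1, width = 0.9) +
  labs(fill = "purchasability", y = "total_sold")

bar
```

The tile shares the fill scale with the bars, so the zero-sales day still shows its purchasability colour. As noted above, drawing something where the value is 0 can mislead, so it may be worth saying so in a caption.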

Related

In ggplot2, how do I properly scale x-axis in histogram?

The Ask:
Please help me understand my conceptual error in the use of scale_x_binned() in ggplot2 as it relates to centering breaks beneath the appropriate bin in a geom_histogram().
Starting Example:
library(ggplot2)
df <- data.frame(hour = sample(seq(0,23), 150, replace = TRUE))
# The data is just the integer values of the 24-hour clock in a day. It is
# **NOT** continuous data.
ggplot(df, aes(x = hour)) +
geom_histogram(bins = 24, fill = "grey60", color = "red")
This produces a histogram with labels properly centered beneath the
bin to which they belong, but I want to label each hour, 0 - 23.
To do that, I thought I would assign breaks using scale_x_binned()
as demonstrated below.
Now I try to add the breaks:
ggplot(df, aes(x = hour)) +
geom_histogram(bins = 24, fill = "grey60", color = "red") +
scale_x_binned(name = "Hour of Day",
breaks = seq(0,23))
#> Warning: Removed 1 rows containing missing values (`geom_bar()`).
This returns the number of labels I wanted, but they are not centered
beneath the bins as desired. I also get the warning message for missing
values associated with geom_bar().
I believe I am overwriting the bins = 24 from the geom_histogram() call when I use the scale_x_binned() call afterward, but I don't understand exactly what is causing geom_histogram() to be centered in the first case that I am wrecking with my new call. I'd really like to have that clarified as I am not seeing my error when I read the associated help pages.
EDIT:
The "Starting Example" essentially works (bins are centered) except for the number of labels I ultimately want. If you built the ggplot2 layer differently, what is the equivalent code? That is, instead of:
ggplot(df, aes(x = hour)) +
geom_histogram(bins = 24, fill = "grey60", color = "red")
the call was instead built something like:
ggplot(df, aes(x = hour)) +
geom_histogram(fill = "grey60", color = "red") +
scale_x_binned(n.breaks = 24) # I know this isn't right, but akin to this.
or maybe
ggplot(df, aes(x = hour)) +
stat_bin(bins = 24, center = 0, fill = "grey60", color = "red")
It sounds like you are looking to use non-default labeling, where you want the labels to be aligned to the midpoint of the bins instead of their boundaries, which is what the breaks define. We could do that by using a continuous scale and hiding the main breaks, but keeping the minor breaks, like below.
scale_x_binned does not have minor breaks. It only has breaks at the boundaries of the bins, so it's not obvious to me how you could place the break labels at the midpoints of the bins.
ggplot(df, aes(x = hour)) +
geom_histogram(bins = 24, fill = "grey60", color = "red") +
scale_x_continuous(name = "Hour of Day", breaks = 0:23) +
theme(axis.ticks = element_blank(),
panel.grid.major.x = element_blank())
I thought the same as you, namely to use scale_x_discrete, but the data given to geom_histogram are assumed to be continuous, so ...
ggplot(df, aes(x = hour)) +
geom_histogram(bins = 24, fill = "grey60", color = "red") +
scale_x_continuous(breaks = 0:23)
(Doesn't require any machinations with theme.)
I wish I could tell you that I found out how geom_histogram is centering the labels, but ggproto objects exist in a cavern with too many tunnels and passages for my mind to follow.
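So I took a shot at examining the plot object, saving the plot from the code above as plt:
plt <- ggplot(df, aes(x = hour)) +
geom_histogram(bins = 24, fill = "grey60", color = "red") +
scale_x_continuous(breaks = 0:23)
ggplot_build(plt)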
# ------------
$data
$data[[1]]
y count x xmin xmax density ncount ndensity flipped_aes PANEL group ymin ymax colour fill size linetype
1 6 6 0 -0.5 0.5 0.04000000 0.6 0.6 FALSE 1 -1 0 6 red grey60 0.5 1
2 7 7 1 0.5 1.5 0.04666667 0.7 0.7 FALSE 1 -1 0 7 red grey60 0.5 1
3 4 4 2 1.5 2.5 0.02666667 0.4 0.4 FALSE 1 -1 0 4 red grey60 0.5 1
4 5 5 3 2.5 3.5 0.03333333 0.5 0.5 FALSE 1 -1 0 5 red grey60 0.5 1
5 7 7 4 3.5 4.5 0.04666667 0.7 0.7 FALSE 1 -1 0 7 red grey60 0.5 1
#snipped remainder
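So the tick marks look centered because of how the bins are constructed: each bin is one unit wide and centered on an integer (for example, x = 0 spans xmin = -0.5 to xmax = 0.5), so every integer break falls at a bin midpoint.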
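To connect this back to what centres the bins: as far as I can tell from ggplot2's binning code, when only bins is given, the bin width is range / (bins - 1) and the boundaries are offset by half a width. For hours 0 to 23 with bins = 24 that gives 1-hour bins from -0.5 to 23.5, which is what geom_histogram(binwidth = 1, center = 0) asks for explicitly, so that is one candidate for the "equivalent code" in the EDIT. The sketch below checks this with ggplot_build(). It uses a hypothetical deterministic df in which every hour appears exactly twice, so the range is exactly 0 to 23; with the random sample in the question, the two versions only match if both 0 and 23 were drawn.

```r
library(ggplot2)

# Hypothetical deterministic data: every hour 0-23 appears twice,
# so range(df$hour) is exactly c(0, 23)
df <- data.frame(hour = rep(0:23, times = 2))

# Implicit: 24 bins spread over a range of 23 hours
p_bins <- ggplot(df, aes(x = hour)) +
  geom_histogram(bins = 24, fill = "grey60", color = "red")

# Explicit: 1-hour-wide bins centred on whole hours, labelled at every hour
p_center <- ggplot(df, aes(x = hour)) +
  geom_histogram(binwidth = 1, center = 0, fill = "grey60", color = "red") +
  scale_x_continuous(name = "Hour of Day", breaks = 0:23)

b_bins   <- ggplot_build(p_bins)$data[[1]]
b_center <- ggplot_build(p_center)$data[[1]]

# Bin midpoints (x), edges (xmin, xmax) and counts for the implicit version
print(head(b_bins[, c("x", "xmin", "xmax", "count")]))

# Both layers should report the same bin edges and counts
print(all.equal(b_bins[, c("xmin", "xmax", "count")],
                b_center[, c("xmin", "xmax", "count")]))
```

Spelling out binwidth and center has the advantage of not depending on which hours happen to appear in the data.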
Further exploration of what is in the ggplot_build results:
ls(envir=ggplot_build(plt)$layout)
#[1] "coord" "coord_params" "facet" "facet_params" "layout" "panel_params"
#[7] "panel_scales_x" "panel_scales_y" "super"
ggplot_build(plt)$layout$panel_params
#-------results
[[1]]
[[1]]$x
<ggproto object: Class ViewScale, gg>
aesthetics: x xmin xmax xend xintercept xmin_final xmax_final xlower ...
break_positions: function
break_positions_minor: function
breaks: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 ...
continuous_range: -1.7 24.7
dimension: function
get_breaks: function
get_breaks_minor: function
#---- snipped remaining output

How to combine vlines from one dataframe with series from another dataframe using GGPLOT2 in R

I am trying to make a graph that plots the cumulative sum value for different customers, which resets whenever a new order is placed. A new order is indicated by DateTick = 1, and I've tried to add these to my plots as vlines. Unfortunately, the plot only shows me either the correct vlines or the correct series lines, not both.
The data I'm using looks something like this:
> head(CUSTWP)
# A tibble: 6 x 6
# Groups: Customer [1]
Customer YearWeek `Corrected Delta` `Ordered Quantity TU` DateTick ROP
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 CustLoc1 2020-01 46 NA 0 46
2 CustLoc1 2020-02 148 NA 0 194
3 CustLoc1 2020-03 150 NA 0 344
4 CustLoc1 2020-04 186 NA 0 530
5 CustLoc1 2020-05 205 NA 0 735
6 CustLoc1 2020-06 246 NA 0 981
I used the code below to create the graphs:
p <- CUSTWP[CUSTWP$DateTick==1,]
p <- p[,1:2]
vline.dat <- data.frame(z=p$Customer, vl=p$YearWeek)
ggplot(CUSTWP, aes(YearWeek,`ROP`, group=1)) + geom_line(color= 'red', size = 0.8) + geom_vline(aes(xintercept=vl), data=vline.dat, linetype=4) +
facet_grid(Customer ~ ., scales = "free_y") + theme_light() + ggtitle('Reordering Points') +
theme(axis.text.x = element_text(angle = 20, vjust = 1, hjust=0.9), text = element_text(size = 14)) +
scale_x_discrete(guide = guide_axis(check.overlap = TRUE))
When I execute the code, I get the result that can be seen in the link.
The issue with this graph is that every panel shows the order DateTicks of all customers, rather than only the DateTicks of that panel's customer. I tried a different version (below) that somehow produces the correct graphs, but also a bunch of incorrect ones:
p <- CUSTWP[CUSTWP$DateTick==1,]
p <- p[,1:2]
vline.dat <- data.frame(z=p$Customer, vl=p$YearWeek)
ggplot(CUSTWP, aes(YearWeek,`ROP`, group=1)) + geom_line(color= 'red', size = 0.8) + geom_vline(aes(xintercept=vl), data=vline.dat, linetype=4) +
facet_grid(Customer ~ z, scales = "free_y") + theme_light() + ggtitle('Reordering Points') +
theme(axis.text.x = element_text(angle = 20, vjust = 1, hjust=0.9), text = element_text(size = 14)) +
scale_x_discrete(guide = guide_axis(check.overlap = TRUE))
The above code creates a matrix of plots but the only correct ones are the plots on the diagonal line running from top left to bottom right.
I would really appreciate your input on this as I've been stuck on this for quite some time. Thank you in advance and apologies for the incorrect posting standards, this is my first post.
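A likely cause, offered as a sketch rather than a confirmed answer: facet_grid() splits each layer by the faceting variable, and a layer whose data has no column with that name is repeated in every panel. vline.dat stores the customer in a column called z rather than Customer, so facet_grid(Customer ~ .) draws every order line in every panel, and facet_grid(Customer ~ z) turns z into a second faceting dimension, which produces the matrix. Keeping the column name Customer in vline.dat should send each vline to its own panel. The toy CUSTWP below is hypothetical (two customers, six weeks, one order each); only the column names follow the question.

```r
library(ggplot2)

# Hypothetical stand-in for CUSTWP: two customers, six weeks, one order each
CUSTWP <- data.frame(
  Customer = rep(c("CustLoc1", "CustLoc2"), each = 6),
  YearWeek = rep(sprintf("2020-%02d", 1:6), times = 2),
  DateTick = c(0, 0, 1, 0, 0, 0,   0, 0, 0, 0, 1, 0),
  ROP      = c(46, 194, 0, 150, 336, 541,   30, 90, 160, 240, 0, 95)
)

# Keep the faceting variable under the same name (Customer), so facet_grid()
# assigns each order line to its own customer's panel
vline.dat <- subset(CUSTWP, DateTick == 1, select = c(Customer, YearWeek))
names(vline.dat)[2] <- "vl"

p <- ggplot(CUSTWP, aes(YearWeek, ROP, group = 1)) +
  geom_line(color = "red") +
  geom_vline(aes(xintercept = vl), data = vline.dat, linetype = 4) +
  facet_grid(Customer ~ ., scales = "free_y") +
  theme_light() +
  ggtitle("Reordering Points")

p
```

With the column named z instead, the same plot would draw both vlines in both panels, which matches the first symptom described above.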

Secondary axis in R not registering

ggplot(df) +
geom_bar(aes(x=Date, y=DCMTotalCV, fill=CampaignName), stat='identity', position='stack') +
geom_line(aes(x=Date, y=DCMCPA, color=CampaignName, group=as.factor(CampaignName)), na.rm = FALSE,show.legend=NA)+
scale_y_continuous(sec.axis = sec_axis(~./1000, name = "DCMTotalCV"))+
theme_bw()+
labs(
x= "Date",
y= "CPA",
title = "Daily Performance"
)
Hey everyone, so I have two y-axes I want to plot. geom_line registers fine on the main y-axis, but geom_bar is not registering properly on the right. I tried scaling, but it's still not plotting against the second axis; it looks like it's still appearing on the main y-axis, so I'm wondering how to tell the plot to put it on the second one. Sorry, I'm kind of a newbie. Thanks!
data <- data.frame(
day = as.Date("2020-01-01"),
conversions = seq(1,6)^2,
cpa = 100000 / seq(1,6)^2
)
head(data)
str(data)
#plot
ggplot(data, aes(x=day)) +
geom_bar( aes(y=conversions), stat='identity') +
geom_line( aes(y=cpa)) +
scale_y_continuous(sec.axis = sec_axis(~./1000))
ggplot2::sec_axis is intended only to put up the scale itself; it does nothing to try to scale the values (that you are pairing with that axis). Why? Primarily because it knows nothing about which y variable you are intending to pair with which y-axis. (Is there anywhere in sec_axis to tell it that it should be looking at a particular variable? Nope.)
As a demonstration, let's start with some random data and plot the line.
set.seed(42)
dat <- data.frame(x = rep(1:10), y1 = sample(10), y2 = sample(100, size = 10))
dat
# x y1 y2
# 1 1 1 47
# 2 2 5 24
# 3 3 10 71
# 4 4 8 89
# 5 5 2 37
# 6 6 4 20
# 7 7 6 26
# 8 8 9 3
# 9 9 7 41
# 10 10 3 97
ggplot(dat, aes(x, y1)) +
geom_line() +
scale_y_continuous(name = "Oops!")
Now you determine that you want to add the y2 variable in there, but because its values are on a completely different scale, you think to just add them (I'll use geom_text here) and then set a second axis.
ggplot(dat, aes(x, y1)) +
geom_line() +
geom_text(aes(y = y2, label = y2)) +
scale_y_continuous(name = "Oops!", sec.axis = sec_axis(~ . * 10, name = "Quux!"))
Two things wrong with this:
The primary (left) y-axis now scales from 0 to 100, scrunching the primary y values to the bottom of the plot; and
Relatedly, the secondary (right) y-axis scales from 0 to 1000?!? This is because the only thing the secondary axis "knows" is the values that go into the primary axis ... and the primary axis is scaling to fit all of the y* variables it is told to plot.
That last point is important: the layers are giving the primary axis y values from 0 to 100, so the axis reflects that. You could use lims(y = c(0, 10)), but realize you'll be truncating the y2 values ... that's not the right approach.
Instead, you need to scale the second values to be within the same range of values as the primary axis variable y1. Though not required, I'll use scales::rescale for this.
dat$y2scaled <- scales::rescale(dat$y2, range(dat$y1))
dat
# x y1 y2 y2scaled
# 1 1 1 47 5.212766
# 2 2 5 24 3.010638
# 3 3 10 71 7.510638
# 4 4 8 89 9.234043
# 5 5 2 37 4.255319
# 6 6 4 20 2.627660
# 7 7 6 26 3.202128
# 8 8 9 3 1.000000
# 9 9 7 41 4.638298
# 10 10 3 97 10.000000
Notice how y2scaled is now proportionately within y1's range?
We'll use that to position each of the text objects (though we'll still show the y2 as the label here).
ggplot(dat, aes(x, y1)) +
geom_line() +
geom_text(aes(y = y2scaled, label = y2)) +
scale_y_continuous(name = "Oops!", sec.axis = sec_axis(~ . * 10, name = "Quux!"))
Are we strictly required to make sure that the points pairing with the secondary axis perfectly fill the range of values of the primary axis? No. We could easily have thought to keep the text labels only on the bottom half of the plot, so we'd have to scale appropriately.
dat$y2scaled2 <- scales::rescale(dat$y2, range(dat$y1) / c(1, 2))
dat
# x y1 y2 y2scaled y2scaled2
# 1 1 1 47 5.212766 2.872340
# 2 2 5 24 3.010638 1.893617
# 3 3 10 71 7.510638 3.893617
# 4 4 8 89 9.234043 4.659574
# 5 5 2 37 4.255319 2.446809
# 6 6 4 20 2.627660 1.723404
# 7 7 6 26 3.202128 1.978723
# 8 8 9 3 1.000000 1.000000
# 9 9 7 41 4.638298 2.617021
# 10 10 3 97 10.000000 5.000000
ggplot(dat, aes(x, y1)) +
geom_line() +
geom_text(aes(y = y2scaled2, label = y2)) +
scale_y_continuous(name = "Oops!", sec.axis = sec_axis(~ . * 20, name = "Quux!"))
Notice that not only did I change how the y-axis values were scaled (now ranging from 1 to 5 in y2scaled2), but I also had to change the transformation within sec_axis to be *20 instead of *10.
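One way to avoid working out the multiplier by hand is to build the secondary axis from the inverse of the same rescale() call used to position the labels. This is a sketch of an alternative rather than what the answer does, and the helper name to_y2 is hypothetical. It also makes the axis exact: rescale() maps y2's range (3 to 97) onto y1's range (1 to 10), so with ~ . * 10 the label 3 sits at height 1, which reads as 10 on the right-hand axis; ~ . * 20 has the same kind of small offset.

```r
library(ggplot2)

set.seed(42)
dat <- data.frame(x = 1:10, y1 = sample(10), y2 = sample(100, size = 10))

# Forward mapping: squeeze y2 into y1's range so it can share the primary axis
dat$y2scaled <- scales::rescale(dat$y2, to = range(dat$y1))

# Inverse mapping (hypothetical helper): primary-axis units back to y2 units
to_y2 <- function(y) scales::rescale(y, to = range(dat$y2), from = range(dat$y1))

p <- ggplot(dat, aes(x, y1)) +
  geom_line() +
  geom_text(aes(y = y2scaled, label = y2)) +
  scale_y_continuous(name = "Oops!", sec.axis = sec_axis(~ to_y2(.), name = "Quux!"))

p
```

For bars, as in the question, a purely multiplicative factor (for example max(cpa) / max(conversions)) may be the better choice, because it keeps the bar baseline at 0, which a range-to-range rescale does not.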
Sometimes getting these transformations correct can be confusing, and it is easy to mess them up. However ... realize that it took many years to even get this functionality into ggplot2, mostly because the lead developers believed that even when plotted well, dual axes can confuse the viewer and potentially lead to misleading takeaways. I find that they can be useful sometimes, and there are techniques one can use to encourage correct interpretation, but ... they are hard to get right because they are easy to get wrong.
As an example of one technique that helps distinguish which axis goes with which data, see this:
ggplot(dat, aes(x, y1)) +
geom_line(color = "blue") +
geom_text(aes(y = y2scaled2, label = y2), color = "red") +
scale_y_continuous(name = "Oops!", sec.axis = sec_axis(~ . * 20, name = "Quux!")) +
theme(
axis.ticks.y.left = element_line(color = "blue"),
axis.text.y.left = element_text(color = "blue"),
axis.title.y.left = element_text(color = "blue"),
axis.ticks.y.right = element_line(color = "red"),
axis.text.y.right = element_text(color = "red"),
axis.title.y.right = element_text(color = "red")
)
(One might consider colors from viridis for a more color-blind-friendly palette.)
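Following that suggestion, a sketch that swaps blue and red for two viridis colours. scales::viridis_pal() returns a palette function, and end = 0.8 is an arbitrary choice that skips the pale yellow end of the scale, which is hard to read on a white background.

```r
library(ggplot2)

set.seed(42)
dat <- data.frame(x = 1:10, y1 = sample(10), y2 = sample(100, size = 10))
dat$y2scaled2 <- scales::rescale(dat$y2, range(dat$y1) / c(1, 2))

# Two viridis colours; end = 0.8 avoids the pale yellow end of the palette
cols <- scales::viridis_pal(end = 0.8)(2)
left_col  <- cols[1]  # primary axis and the line
right_col <- cols[2]  # secondary axis and the text labels

p <- ggplot(dat, aes(x, y1)) +
  geom_line(color = left_col) +
  geom_text(aes(y = y2scaled2, label = y2), color = right_col) +
  scale_y_continuous(name = "Oops!", sec.axis = sec_axis(~ . * 20, name = "Quux!")) +
  theme(
    axis.ticks.y.left  = element_line(color = left_col),
    axis.text.y.left   = element_text(color = left_col),
    axis.title.y.left  = element_text(color = left_col),
    axis.ticks.y.right = element_line(color = right_col),
    axis.text.y.right  = element_text(color = right_col),
    axis.title.y.right = element_text(color = right_col)
  )

p
```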

How to customise the colors in stacked bar charts

Maybe a question someone already asked.
I have a data frame (dat) that looks like this:
Sample perc cl
a 30 0
b 22 0
s 2 0
z 19 0
a 12 1
b 45 1
s 70 1
z 1 1
a 60 2
b 67 2
s 50 2
z 18 2
I would like to generate a stacked barplot. To do this I used the following:
g <- ggplot(dat, aes(x = cl, y = perc, fill = Sample))
g + geom_bar(stat = "identity", position = "fill", show.legend = FALSE) +
scale_fill_manual(name = "Samples", values = c("a" = "blue", "b" = "blue", "s" = "gray", "z" = "red"))
Fortunately, the colors are assigned correctly. My problem is that the samples are stacked from a to z, top to bottom, but I would like the gray (s) on top, without breaking the continuity in the bar from the blue to the red. Maybe there's another way to color the bars and set the desired order.
The groups are stacked in the bars in the order of the factor levels. You can change the plotting order by reordering the factor levels in your call to aes with factor(var, levels(var)[order]), like this (assuming Sample is already a factor):
library(ggplot2)
ggplot(dat, aes(x = cl, y = perc,
fill = factor(Sample, levels(Sample)[c(3,1,2,4)]))) +
geom_bar(stat="identity", position = "fill", show.legend = FALSE) +
scale_fill_manual(name = "Samples",
values=c("a"="blue","b" = "blue","s" = "gray","z" = "red"))
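A caveat and a variant: levels(Sample) is NULL when Sample is a character column (the default for data.frame() since R 4.0.0), and factor() with NULL levels turns every value into NA. A sketch that names the levels explicitly works for either type. It rebuilds dat from the table in the question and uses geom_col(), which is shorthand for geom_bar(stat = "identity"). In current ggplot2, position_fill() stacks the first level on top, so s (gray) should end up on top and z (red) at the bottom.

```r
library(ggplot2)

# dat rebuilt from the table in the question; Sample is a plain character column
dat <- data.frame(
  Sample = rep(c("a", "b", "s", "z"), times = 3),
  perc   = c(30, 22, 2, 19, 12, 45, 70, 1, 60, 67, 50, 18),
  cl     = rep(0:2, each = 4)
)

# Name the levels explicitly: the first level is stacked on top,
# the last one at the bottom
dat$Sample <- factor(dat$Sample, levels = c("s", "a", "b", "z"))

p <- ggplot(dat, aes(x = cl, y = perc, fill = Sample)) +
  geom_col(position = "fill", show.legend = FALSE) +
  scale_fill_manual(name = "Samples",
                    values = c(a = "blue", b = "blue", s = "gray", z = "red"))

p
```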

In a ggplot2 geom_tile plot, is it possible to dodge the positions of tiles?

I'm trying to produce a bar plot where the bars fade vertically according to a third variable, and I'm using geom_tile to enable this. However, I have multiple bars for a given category on the x-axis, and I'd like to dodge their positions so that bars with the same x value are grouped together without overlapping.
Is it possible to use position='dodge' or similar with geom_tile and, if so, what's wrong with my syntax?
a <- data.frame(x = factor(c(rep('a',5), rep('a',5), rep('b',5), rep('c',5))),
y = c(1:5, 1:5, 1:5, 1:5),
z = c(5:1, c(5,4,4,4,1), 5:1, 5:1)
)
ggplot(a, aes(x = x, y = y, group = x)) +
geom_tile(aes(alpha = z, fill = x, width = 1),
position = 'dodge')
The example data frame a looks like this:
x y z
1 a 1 5
2 a 2 4
3 a 3 3
4 a 4 2
5 a 5 1
6 a 1 5
7 a 2 4
8 a 3 4
9 a 4 4
10 a 5 1
11 b 1 5
12 b 2 4
13 b 3 3
14 b 4 2
15 b 5 1
16 c 1 5
17 c 2 4
18 c 3 3
19 c 4 2
20 c 5 1
...and the resulting graph from the current code has no gaps between the x values, and the two bars where x is a are drawn on top of one another.
I want those two bars where x is 'a' to be drawn as separate bars.
This is a mock-up of what I want the result to look like. The data are not correct for either of the a columns, but it shows the desired grouping on the x-axis.
EDIT 2
To get your desired effect, use geom_bar(), but be sure to change the y data to indicate the bar height, in this case 1. The reason is that the bars get stacked, so there is no need to specify each tile's y-axis position; instead, you specify its height.
Try this:
library(ggplot2)
a <- data.frame(x = factor(c(rep('a',5), rep('a',5), rep('b',5), rep('c',5))),
y = 1,
z = c(5:1, c(5,4,4,4,1), 5:1, 5:1)
)
a$bar <- rep(1:4, each=5)
ggplot(a, aes(x = factor(bar), y=y, fill=x, alpha=z)) +
geom_bar(stat="identity") +
facet_grid(~x, space = "free", scales = "free")
EDIT 1
You can get close to what you describe by:
Explicitly adding another column that differentiates different bars in the same category
Using faceting
For example:
a$bar <- rep(1:4, each=5)
ggplot(a, aes(x = factor(bar), y = y, fill=x, alpha=z)) +
geom_bar(stat="identity", position="dodge") +
facet_grid(~x, space = "free", scales = "free")
ORIGINAL
You can use geom_bar() for this, by using stat="identity":
ggplot(a, aes(x = x, y = y, fill=x, alpha=z)) +
geom_bar(stat="identity", position="dodge")
