In ggplot2, how do I properly scale x-axis in histogram? - r

The Ask:
Please help me understand my conceptual error in the use of scale_x_binned() in ggplot2 as it relates to centering breaks beneath the appropriate bin in a geom_histogram().
Starting Example:
library(ggplot2)
df <- data.frame(hour = sample(seq(0,23), 150, replace = TRUE))
# The data is just the integer values of the 24-hour clock in a day. It is
# **NOT** continuous data.
ggplot(df, aes(x = hour)) +
geom_histogram(bins = 24, fill = "grey60", color = "red")
This produces a histogram with each label properly centered beneath the
bin to which it belongs, but I want to label every hour, 0 - 23.
To do that, I thought I would assign breaks using scale_x_binned(),
as demonstrated below:
ggplot(df, aes(x = hour)) +
geom_histogram(bins = 24, fill = "grey60", color = "red") +
scale_x_binned(name = "Hour of Day",
breaks = seq(0,23))
#> Warning: Removed 1 rows containing missing values (`geom_bar()`).
This returns the number of labels I wanted, but they are not centered
beneath the bins as desired. I also get the warning message for missing
values associated with geom_bar().
I believe I am overwriting the bins = 24 from the geom_histogram() call when I use the scale_x_binned() call afterward, but I don't understand exactly what is causing geom_histogram() to be centered in the first case that I am wrecking with my new call. I'd really like to have that clarified as I am not seeing my error when I read the associated help pages.
EDIT:
The "Starting Example" essentially works (bins are centered) except for the number of labels I ultimately want. If you built the ggplot2 layer differently, what is the equivalent code? That is, instead of:
ggplot(df, aes(x = hour)) +
geom_histogram(bins = 24, fill = "grey60", color = "red")
the call was instead built something like:
ggplot(df, aes(x = hour)) +
geom_histogram(fill = "grey60", color = "red") +
scale_x_binned(n.breaks = 24) # I know this isn't right, but akin to this.
or maybe
ggplot(df, aes(x = hour)) +
stat_bin(bins = 24, center = 0, fill = "grey60", color = "red")

It sounds like you want non-default labeling, with the labels aligned to the midpoints of the bins rather than to their boundaries, which is what the breaks define. We could do that by using a continuous scale with a break at every hour, hiding the axis ticks and the major grid lines but keeping the minor grid lines, like below.
scale_x_binned does not have minor breaks. It only has breaks at the boundaries of the bins, so it's not obvious to me how you could use it to place the break labels at the midpoints of the bins.
ggplot(df, aes(x = hour)) +
geom_histogram(bins = 24, fill = "grey60", color = "red") +
scale_x_continuous(name = "Hour of Day", breaks = 0:23) +
theme(axis.ticks = element_blank(),
panel.grid.major.x = element_blank())

I thought the same as you, namely scale_x_discrete, but the data given to geom_histogram is assumed to be continuous, so ...
ggplot(df, aes(x = hour)) +
geom_histogram(bins = 24, fill = "grey60", color = "red") +
scale_x_continuous(breaks = 0:23)
(Doesn't require any machinations with theme.)
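Since the hours are integers rather than truly continuous values, another option is to treat them as categories and count them with geom_bar(). This is only a sketch: the data vector and the names hours_df, hour_f and p_bar are made up so the example is reproducible, and drop = FALSE is there so that an hour with no observations would still keep its label.

```r
library(ggplot2)

# Made-up, deterministic stand-in for the question's sampled data
hours_df <- data.frame(hour = rep(0:23, times = rep(c(3, 7, 5, 9), 6)))

# Treat each hour as its own category; fixing the levels keeps all 24 labels
hours_df$hour_f <- factor(hours_df$hour, levels = 0:23)

p_bar <- ggplot(hours_df, aes(x = hour_f)) +
  geom_bar(fill = "grey60", color = "red") +
  scale_x_discrete(name = "Hour of Day", drop = FALSE)

# One count per hour; each bar sits directly above its own label
bar_counts <- layer_data(p_bar)$count
```

With a discrete axis there are no bin boundaries to worry about, at the cost of no longer being a histogram in the strict sense.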
I wish I could tell you that I found out how geom_histogram is centering the labels, but ggproto objects exist in a cavern with too many tunnels and passages for my mind to follow.
So I took a shot at examining the plot object (saved here as plt) that I created when I produced the png graphic above:
ggplot_build(plt)
# ------------
$data
$data[[1]]
y count x xmin xmax density ncount ndensity flipped_aes PANEL group ymin ymax colour fill size linetype
1 6 6 0 -0.5 0.5 0.04000000 0.6 0.6 FALSE 1 -1 0 6 red grey60 0.5 1
2 7 7 1 0.5 1.5 0.04666667 0.7 0.7 FALSE 1 -1 0 7 red grey60 0.5 1
3 4 4 2 1.5 2.5 0.02666667 0.4 0.4 FALSE 1 -1 0 4 red grey60 0.5 1
4 5 5 3 2.5 3.5 0.03333333 0.5 0.5 FALSE 1 -1 0 5 red grey60 0.5 1
5 7 7 4 3.5 4.5 0.04666667 0.7 0.7 FALSE 1 -1 0 7 red grey60 0.5 1
#snipped remainder
So the reason the break tick-marks are centered is that the bin construction is set up so they all are centered on the breaks.
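The same centering can be requested explicitly instead of relying on how bins = 24 happens to line up with this data. The sketch below assumes a fixed, made-up data vector (hours_df) so the result is reproducible; binwidth and center are documented arguments of geom_histogram().

```r
library(ggplot2)

# Made-up, deterministic stand-in for the question's sampled data
hours_df <- data.frame(hour = rep(0:23, times = rep(c(3, 7, 5, 9), 6)))

p_hist <- ggplot(hours_df, aes(x = hour)) +
  # Bins of width 1 centred on the integers: (-0.5, 0.5], (0.5, 1.5], ...
  geom_histogram(binwidth = 1, center = 0, fill = "grey60", color = "red") +
  scale_x_continuous(name = "Hour of Day", breaks = 0:23)

# Inspect the computed bins: x is the bin midpoint, xmin/xmax its edges
bins <- layer_data(p_hist)
head(bins[, c("x", "xmin", "xmax", "count")])
```

Because the breaks of the continuous scale sit at 0:23 and the bin midpoints are the same integers, every label ends up under the middle of its bar.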
Further exploration of what's in the ggplot_build results:
ls(envir=ggplot_build(plt)$layout)
#[1] "coord" "coord_params" "facet" "facet_params" "layout" "panel_params"
#[7] "panel_scales_x" "panel_scales_y" "super"
ggplot_build(plt)$layout$panel_params
#-------results
[[1]]
[[1]]$x
<ggproto object: Class ViewScale, gg>
aesthetics: x xmin xmax xend xintercept xmin_final xmax_final xlower ...
break_positions: function
break_positions_minor: function
breaks: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 ...
continuous_range: -1.7 24.7
dimension: function
get_breaks: function
get_breaks_minor: function
#---- snipped remaining output

Related

How to draw the circle with certain radius around the point in the plot?

I'd like to draw the bubble plot with additional central point.
X1, Y1 is the x-axis and y-axis, respectively, and N1 is the radius of the circle around the point.
require(ggplot2)
df <- structure(list(X1 = c(1:10),
Y1 = c(15:24),
N1 = c(5, 2, 3, 5, 1, 2, 6, 3, 4, 1)),
class = "data.frame", row.names = c(NA, -10L))
df
X1 Y1 N1
1 1 15 5
2 2 16 2
3 3 17 3
4 4 18 5
5 5 19 1
6 6 20 2
7 7 21 6
8 8 22 3
9 9 23 4
10 10 24 1
Simply, I thought that it will be okay to overlap the bubble plot and basic plot.
ggplot(df, aes(x=X1, y=Y1, size=N1))+
theme_bw()+
geom_point(alpha=0.3, color='blue')+
scale_size(range=c(10,40))+
theme(legend.position="none")
But there were two problems.
How can I eliminate the line around the circles?
How can I add the center point on the bubble plot?
The below figure is the expected result.
I tried to mimic the code in this site: https://datavizpyr.com/how-to-add-circles-around-specific-data-points-in-r/, but the last line overlapped the previous result.
ggplot(df, aes(x=X1, y=Y1, size=N1))+
theme_bw()+
geom_point(alpha=0.3, color='blue')+
scale_size(range=c(10,40))+
geom_point(df, mapping=aes(x=X1, y=Y1))
Remember, aesthetics are by default inherited from the base ggplot() call to all subsequent layers, so the small central dots were inheriting the size aesthetic. Use the size aesthetic only in the aes call to the "bubble" layer.
To get rid of the lines around the circles, you can change the points to shape = 21, which is a filled circle. That way, you can set the fill colour to blue and make the line colour completely transparent.
ggplot(df, aes(x = X1, y = Y1))+
theme_bw()+
geom_point(aes(size = N1), shape = 21, alpha = 0.3, fill ='blue',
color = alpha("white", 0)) +
scale_size(range = c(10, 40)) +
geom_point() +
theme(legend.position = "none")
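A related variation, offered as a sketch rather than a replacement for the code above: in base R's point symbols, shape 16 is a filled circle drawn without a separate border, so it should avoid the darker ring without needing a transparent outline colour. The size mapping again lives only in the bubble layer, so the centre dots keep the default point size. The names p_bubble, bubble_sizes and centre_sizes are just for illustration.

```r
library(ggplot2)

df <- data.frame(X1 = 1:10, Y1 = 15:24,
                 N1 = c(5, 2, 3, 5, 1, 2, 6, 3, 4, 1))

p_bubble <- ggplot(df, aes(x = X1, y = Y1)) +
  theme_bw() +
  # size is mapped in this layer only, so the next layer does not inherit it
  geom_point(aes(size = N1), shape = 16, alpha = 0.3, color = "blue") +
  scale_size(range = c(10, 40)) +
  # centre dots, drawn at the default size
  geom_point() +
  theme(legend.position = "none")

# Compare the sizes each layer actually ends up with
bubble_sizes <- layer_data(p_bubble, 1)$size
centre_sizes <- layer_data(p_bubble, 2)$size
```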

Secondary axis in R not registering

ggplot(df) +
geom_bar(aes(x=Date, y=DCMTotalCV, fill=CampaignName), stat='identity', position='stack') +
geom_line(aes(x=Date, y=DCMCPA, color=CampaignName, group=as.factor(CampaignName)), na.rm = FALSE,show.legend=NA)+
scale_y_continuous(sec.axis = sec_axis(~./1000, name = "DCMTotalCV"))+
theme_bw()+
labs(
x= "Date",
y= "CPA",
title = "Daily Performance"
)
Hey everyone - so I have 2 y-axes I want to plot. geom_line is registering fine on the main y-axis but geom_bar is not registering properly on the right. I tried scaling but it's still not registering or plotting on that second axis. It looks like it's still appearing on the main y-axis, so I'm wondering how to tell the plot to plot it on the second one? Sorry, I'm kind of a newbie. Thanks!
data <- data.frame(
day = as.Date("2020-01-01") + 0:5, # six consecutive days, one per row
conversions = seq(1,6)^2,
cpa = 100000 / seq(1,6)^2
)
head(data)
str(data)
#plot
ggplot(data, aes(x=day)) +
geom_bar( aes(y=conversions), stat='identity') +
geom_line( aes(y=cpa)) +
scale_y_continuous(sec.axis = sec_axis(~./1000))
ggplot2::sec_axis is intended only to put up the scale itself; it does nothing to try to scale the values (that you are pairing with that axis). Why? Primarily because it knows nothing about which y variable you are intending to pair with which y-axis. (Is there anywhere in sec_axis to tell it that it should be looking at a particular variable? Nope.)
As a demonstration, let's start with some random data and plot the line.
set.seed(42)
dat <- data.frame(x = rep(1:10), y1 = sample(10), y2 = sample(100, size = 10))
dat
# x y1 y2
# 1 1 1 47
# 2 2 5 24
# 3 3 10 71
# 4 4 8 89
# 5 5 2 37
# 6 6 4 20
# 7 7 6 26
# 8 8 9 3
# 9 9 7 41
# 10 10 3 97
ggplot(dat, aes(x, y1)) +
geom_line() +
scale_y_continuous(name = "Oops!")
Now you determine that you want to add the y2 variable in there, but because its values are on a completely different scale, you think to just add them (I'll use geom_text here) and then set a second axis.
ggplot(dat, aes(x, y1)) +
geom_line() +
geom_text(aes(y = y2, label = y2)) +
scale_y_continuous(name = "Oops!", sec.axis = sec_axis(~ . * 10, name = "Quux!"))
Two things wrong with this:
The primary (left) y-axis now scales from 0 to 100, scrunching the primary y values to the bottom of the plot; and
Related, the secondary (right) y-axis scales from 0 to 1000?!? This is because the only thing that the secondary axis "knows" is the values that go into the primary axis ... and the primary axis is scaling to fit all of the y* variables it is told to plot.
That last point is important: this is giving y values that scale from 0 to 100, so the axis will reflect that. You can do lims(y=c(0,10)), but realize you'll be truncating y2 values ... that's not the right approach.
Instead, you need to scale the second values to be within the same range of values as the primary axis variable y1. Though not required, I'll use scales::rescale for this.
dat$y2scaled <- scales::rescale(dat$y2, range(dat$y1))
dat
# x y1 y2 y2scaled
# 1 1 1 47 5.212766
# 2 2 5 24 3.010638
# 3 3 10 71 7.510638
# 4 4 8 89 9.234043
# 5 5 2 37 4.255319
# 6 6 4 20 2.627660
# 7 7 6 26 3.202128
# 8 8 9 3 1.000000
# 9 9 7 41 4.638298
# 10 10 3 97 10.000000
Notice how y2scaled is now proportionately within y1's range?
We'll use that to position each of the text objects (though we'll still show the y2 as the label here).
ggplot(dat, aes(x, y1)) +
geom_line() +
geom_text(aes(y = y2scaled, label = y2)) +
scale_y_continuous(name = "Oops!", sec.axis = sec_axis(~ . * 10, name = "Quux!"))
Are we strictly required to make sure that the points pairing with the secondary axis perfectly fill the range of values of the primary axis? No. We could easily have thought to keep the text labels only on the bottom half of the plot, so we'd have to scale appropriately.
dat$y2scaled2 <- scales::rescale(dat$y2, range(dat$y1) / c(1, 2))
dat
# x y1 y2 y2scaled y2scaled2
# 1 1 1 47 5.212766 2.872340
# 2 2 5 24 3.010638 1.893617
# 3 3 10 71 7.510638 3.893617
# 4 4 8 89 9.234043 4.659574
# 5 5 2 37 4.255319 2.446809
# 6 6 4 20 2.627660 1.723404
# 7 7 6 26 3.202128 1.978723
# 8 8 9 3 1.000000 1.000000
# 9 9 7 41 4.638298 2.617021
# 10 10 3 97 10.000000 5.000000
ggplot(dat, aes(x, y1)) +
geom_line() +
geom_text(aes(y = y2scaled2, label = y2)) +
scale_y_continuous(name = "Oops!", sec.axis = sec_axis(~ . * 20, name = "Quux!"))
Notice that not only did I change how the y-axis values were scaled (now ranging from 1 to 5 in y2scaled2), but I also had to change the transformation within sec_axis to be *20 instead of *10.
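The fixed multipliers are only approximate, since y2 here runs from 3 to 97 rather than from 10 to 100. One way to keep both sides consistent is to define the rescaling and its inverse together and hand the inverse to sec_axis. This is a sketch, not the only way to do it; to_primary and to_secondary are hypothetical helper names, not ggplot2 functions.

```r
library(ggplot2)

set.seed(42)
dat <- data.frame(x = 1:10, y1 = sample(10), y2 = sample(100, size = 10))

# Linear map from the y2 range onto the y1 range, and its exact inverse
r1 <- range(dat$y1)
r2 <- range(dat$y2)
slope <- diff(r1) / diff(r2)
to_primary   <- function(v) (v - r2[1]) * slope + r1[1]  # y2 units -> y1 units
to_secondary <- function(v) (v - r1[1]) / slope + r2[1]  # y1 units -> y2 units

p_dual <- ggplot(dat, aes(x, y1)) +
  geom_line() +
  # position the labels in primary-axis units
  geom_text(aes(y = to_primary(y2), label = y2)) +
  # label the right-hand axis in y2 units using the inverse map
  scale_y_continuous(name = "y1",
                     sec.axis = sec_axis(~ to_secondary(.), name = "y2"))
```

Because the secondary axis uses the inverse of the function that positioned the labels, each label should line up with its own value on the right-hand axis.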
Sometimes getting these transformations correct can be confusing, and it is easy to mess them up. However ... realize that it took many years to even get this functionality into ggplot2, mostly due to the lead developers' belief that even when plotted well, dual-axis plots can be confusing to the viewer and potentially provide misleading takeaways. I find that they can be useful sometimes, and there are techniques one can use to encourage correct interpretation, but ... it's hard to get right because it's easy to get wrong.
As an example of one technique that helps distinguish which axis goes with which data, see this:
ggplot(dat, aes(x, y1)) +
geom_line(color = "blue") +
geom_text(aes(y = y2scaled2, label = y2), color = "red") +
scale_y_continuous(name = "Oops!", sec.axis = sec_axis(~ . * 20, name = "Quux!")) +
theme(
axis.ticks.y.left = element_line(color = "blue"),
axis.text.y.left = element_text(color = "blue"),
axis.title.y.left = element_text(color = "blue"),
axis.ticks.y.right = element_line(color = "red"),
axis.text.y.right = element_text(color = "red"),
axis.title.y.right = element_text(color = "red")
)
(One might consider colors from viridis for a more color-blind-friendly palette.)

ggplot2: Adding another legend to a plot (two times)

I have the following data set that is used for plotting a bubble plot of frequencies.
Freq are frequencies at time 1
Freq.1 are frequencies at time 2
id names variable value Freq Freq.1
1 1 item1 1 13 11
2 2 item2 1 9 96
3 3 item1 2 10 28
4 4 item2 2 15 8
5 5 item1 3 9 80
6 6 item2 3 9 10
7 7 item1 4 11 89
8 8 item2 4 14 8
9 9 item1 5 3 97
10 10 item2 5 25 82
I am using the following code for plotting, and I do like the plot. However I am having some troubles with the legend that I explain below:
theme_nogrid <- function (base_size = 12, base_family = "") {
theme_bw(base_size = base_size, base_family = base_family) %+replace%
theme(panel.grid = element_blank())
}
plot1<- ggplot(Data, aes(x = variable, y = value, size = Freq, color=Freq.1))+
geom_point( aes(size = Freq, stat = "identity", position = "identity"),
shape = 19, color="black", alpha=0.5) +
geom_point( aes(size = Freq.1, stat = "identity", position = "identity"),
shape = 19, color="red", alpha=0.5) +
scale_size_continuous(name= "Frequencies ", range = c(2,30))+
theme_nogrid()
1- I would like to have two legends: one for color, the other one for size, but I can't find the right arguments to do it (I have consulted the guide and theme documentation and can't solve the problem on my own).
2- After having the two legends, I would like to increase the size of the legend shapes so that they look bigger (not the text, not the background, just the shapes, and without actually changing the plot).
Here is an example of what I have and what I would like (it's an example from my real data). As you can see, it is almost impossible to distinguish the colors in the first image.
Sorry if it's a newbie question, but I can't really find an example of that.
Thanks,
Angulo
Try something like this
library(ggplot2)
library(tidyr)
d <- gather(Data, type, freq, Freq, Freq.1)
ggplot(d, aes(x = variable, y = value))+
geom_point(aes(size = freq, colour = type), shape = 19, alpha = 0.5) +
scale_size_continuous(name = "Frequencies ", range = c(2, 30)) +
scale_colour_manual(values = c("red", "blue")) +
theme_nogrid() +
guides(colour = guide_legend(override.aes = list(size = 10)))
The last line will make the circles in the "colour" legend larger.
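For reference, tidyr now documents gather() as superseded by pivot_longer(). A roughly equivalent reshaping step might look like the sketch below. The small Data frame is a hypothetical stand-in with only the columns used here, since the full data set isn't reproduced in a usable form above.

```r
library(tidyr)

# Hypothetical stand-in for the question's Data (values are illustrative)
Data <- data.frame(
  variable = c("item1", "item2", "item1", "item2"),
  value    = c(1, 1, 2, 2),
  Freq     = c(13, 9, 10, 15),
  Freq.1   = c(11, 96, 28, 8)
)

# Stack Freq and Freq.1 into one column, remembering which was which
d <- pivot_longer(Data, cols = c(Freq, Freq.1),
                  names_to = "type", values_to = "freq")
```

The resulting d has the same type and freq columns that the plotting code above maps to colour and size.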

How to display 0 value in a bar chart using ggplot2

I have this data frame called data:
head(data)
date total_sold purchasability visibility
81 2014-05-01 3 3 3
82 2014-05-02 2 2 3
83 2014-05-03 1 2 3
84 2014-05-04 1 3 3
85 2014-05-05 3 2 3
86 2014-05-06 0 0 3
And I would like to do a bar chart with x = date and y = total_sold with a color depending on the purchasability. I use ggplot2 to do that:
bar <- ggplot(data = data, aes(x = date, fill=as.factor(purchasability),y = total_sold)) + geom_bar(stat = 'identity')
The output is very nice, but the problem is that where total_sold = 0 there is no bar and thus no way to know the purchasability. Is it possible to still display a bar (maybe from 0.5 to -0.5) when total_sold = 0?
Thanks
You can just use geom_bar; please look at this code:
df <- data.frame(time = factor(c("Lunch","Dinner","breakfast","test"), levels=c("Lunch","Dinner","breakfast","test")),
total_bill = c(14.89, 0,0.5,-0.5))
# Add a black outline
ggplot(data=df, aes(x=time, y=total_bill, fill=time)) + geom_bar(colour="black", stat="identity")
I'm not sure there's a simple way to go from 0.5 to -0.5 but you can easily show the 0 value as being a fraction (eg -0.1) by modifying the value in your bar= line to:
bar <- ggplot(data = data, aes(x = date, fill=as.factor(purchasability),y = sapply(total_sold, FUN=function(x) ifelse(x==0, -0.1,x) ))) + geom_bar(stat = 'identity')
This produces:
It is a little misleading to show 0 as something other than 0, but I hope this solves your problem.
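If a bar running from -0.5 to 0.5 really is wanted for the zero days, one possible approach is to compute the bar edges yourself and draw them with geom_rect(). This is a sketch under assumptions: the sales data frame just retypes the six rows shown in the question, the ymin/ymax columns and the 0.45-day half-width are my own choices, and the same caveat about showing 0 as something other than 0 applies.

```r
library(ggplot2)

# The six rows shown in the question, retyped by hand
sales <- data.frame(
  date           = as.Date("2014-05-01") + 0:5,
  total_sold     = c(3, 2, 1, 1, 3, 0),
  purchasability = c(3, 2, 2, 3, 2, 0)
)

# Normal bars run from 0 to total_sold; zero days get a stub from -0.5 to 0.5
zero <- sales$total_sold == 0
sales$ymin <- ifelse(zero, -0.5, 0)
sales$ymax <- ifelse(zero, 0.5, sales$total_sold)

p_zero <- ggplot(sales,
                 aes(xmin = date - 0.45, xmax = date + 0.45,
                     ymin = ymin, ymax = ymax,
                     fill = as.factor(purchasability))) +
  geom_rect() +
  labs(x = "date", y = "total_sold", fill = "purchasability")
```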

Character values on a continuous axis in R ggplot2

Is there a way to include character values on the axes when plotting continuous data with ggplot2? I have censored data such as:
x y Freq
1 -3 16 3
2 -2 12 4
3 0 10 6
4 2 7 7
5 2 4 3
The last row of data are right censored. I am plotting this with the code below to produce the following plot:
a1 = data.frame(x=c(-3,-2,0,2,2), y=c(16,12,10,7,4), Freq=c(3,4,6,7,3))
fit = ggplot(a1, aes(x,y)) + geom_text(aes(label=Freq), size=5)+
theme_bw() +
scale_x_continuous(breaks = seq(min(a1$x)-1,max(a1$x)+1,by=1),
labels = seq(min(a1$x)-1,max(a1$x)+1,by=1),
limits = c(min(a1$x)-1,max(a1$x)+1))+
scale_y_continuous(breaks = seq(min(a1$y),max(a1$y),by=2))
The 3 points at (2,4) are right censored. I would like them to be plotted one unit to the right with the corresponding xaxis tick mark '>=2' instead of 3. Any ideas if this is possible?
It is quite possible. I hacked the data so that the censored point at (2, 4) is moved to (3, 4). Then I modified your labels, which can be whatever you want as long as there are as many labels as breaks.
ggplot(a1, aes(x,y)) + geom_text(aes(label=Freq), size=5)+
theme_bw() +
scale_x_continuous(breaks = seq(min(a1$x)-1,max(a1$x),by=1),
labels = c(seq(min(a1$x)-1,max(a1$x)-1,by=1), ">=2"),
limits = c(min(a1$x)-1,max(a1$x)))+
scale_y_continuous(breaks = seq(min(a1$y),max(a1$y),by=2))
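The data change itself isn't shown above, so here is a minimal sketch of what it might look like. It assumes the censored observation is the fifth row; x_breaks and x_labels are just illustrative names for the vectors passed to scale_x_continuous().

```r
a1 <- data.frame(x = c(-3, -2, 0, 2, 2), y = c(16, 12, 10, 7, 4),
                 Freq = c(3, 4, 6, 7, 3))

# Assumption: row 5 is the right-censored point; move it one unit to the right
a1$x[5] <- 3

# Breaks now run up to 3, and the last one is relabelled as ">=2"
x_breaks <- seq(min(a1$x) - 1, max(a1$x), by = 1)
x_labels <- c(head(x_breaks, -1), ">=2")
```

The labels vector has to be the same length as the breaks vector, which is why only the last numeric label is swapped out.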
