When creating a geom_histogram in ggplot, the bin labels appear directly underneath the bars. How can I make it so that they appear on either side of the bin, so that they describe the range of each bin (so that the bin that includes cases from 0 to 10 will appear between the 0 and 10 labels)?
I tried using
geom_histogram(position=position_nudge(5))
However, the histogram I'm using is stacked (to differentiate categories within each bin), and this effect is ruined when I add this position. Is there another way of doing it? Maybe moving the axis labels themselves instead of the bars?
Reproducible code:
dd<-data.frame(nums=c(1:20,15:30,40:55),cats=c(rep("a",20),rep("b",30),rep("c",2)))
ggplot(dd, aes(nums))+geom_histogram(aes(nums,fill=cats),dd,binwidth = 10)
results in this:
I want the bars to be shifted to the right by 5, so that the 0 aligns with the left-hand side of the histogram
You can try to define breaks and labels
n <- 10
ggplot(dd, aes(nums, fill=cats)) +
geom_histogram(binwidth = n, boundary = 0) +
scale_x_continuous(breaks = seq(0,55,n), labels = seq(0,55, n))
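To check that boundary = 0 really places the bin edges on multiples of the bin width, the computed bins can be read back from the built plot. This is a minimal sketch using the data from the question; the names p and edges are only illustrative.

```r
library(ggplot2)

# Data from the question
dd <- data.frame(nums = c(1:20, 15:30, 40:55),
                 cats = c(rep("a", 20), rep("b", 30), rep("c", 2)))

n <- 10
p <- ggplot(dd, aes(nums, fill = cats)) +
  geom_histogram(binwidth = n, boundary = 0)

# Layer 1 of the built plot holds one row per bin and category;
# xmin is the left edge of each bin.
edges <- sort(unique(ggplot_build(p)$data[[1]]$xmin))
print(edges)
```

If every edge is a multiple of n, the axis breaks from seq(0, 55, n) fall exactly between the bars.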
The following moves the labels of the axis. I wasn't sure how to move the ticks on the x axis so I removed them.
ggplot(dd, aes(nums))+geom_histogram(aes(nums),dd,binwidth = 10)+
theme(axis.text.x = element_text(hjust = 5),
axis.ticks.x = element_blank())
Related
I hope you can help me. I would like to visualize segments within a plot using a rectangle placed next to the y or x-axis, which means that it would sit outside of the plot area. It should look similar to the image below:
I tried to reach the mentioned output by trying two different approaches:
I created two viewports with the grid package, put the plot in one viewport placed at the bottom, and put a second viewport on top of it. The big problem here is that I need the coordinates where the grey background panel of the ggplot starts, so that I can place the top viewport exactly there and the segments coincide with the x-axis length. My code looked like the following:
container_viewport <- viewport(x=0,y=0,height=1,width=1,just = c("left","bottom"))
pushViewport(container_viewport)
grid.draw(rectGrob())
popViewport()
section_viewport <- viewport(x=0.055,y=0.99,height=0.085,width=0.935,just=c("left","top"))
pushViewport(section_viewport)
plot_obj <- ggplot_build(testplot)
plot_data <- plot_obj$data[[1]]
grid.draw(rectGrob(gp = gpar(col = "red")))
popViewport()
plot_viewport <- viewport(x=0,y=0,height=0.9,width=1,just=c("left","bottom"))
pushViewport(plot_viewport)
grid.draw(ggplotGrob(testplot))
popViewport()
This looks fine but I had to hardcode the coordinates of the viewport at the top.
I used grid.arrange() to stack the plots vertically (instead of a grob for the rectangle as in the other approach, I created a ggplot for it). Basically the same problem exists here, since I somehow need to put the plot representing the rectangle at the top in the right position on the x-axis. My code looked like the following:
p1 <- plot_data %>%
ggplot()+
geom_rect(aes(xmin=-Inf,xmax=Inf,ymin=-Inf,ymax=Inf))
p2 <- testplot
test_plot <- grid.arrange(p1,p2,heights=c(1,10))
This approach does not work well either.
I would like a solution that can be applied generally, so trial and error with the viewport coordinates is not an option: the length of the y-axis label or tick labels can vary, and with it the size and position of the background panel. Once this step is done, segmenting the rectangle should no longer be a problem.
Maybe this is simply not possible, but if it is, I would appreciate any help.
Thank you!
I would probably use patchwork here. Let's start by replicating your plot:
library(ggplot2)
library(patchwork)
p <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
geom_point(color = "red") +
labs(x = "test", y = "test")
p
That looks very similar. Now we define (in our own co-ordinates) where we want the section split to occur on the x axis.
section_split <- 5.25
Using just this number, we add rectangles and text annotations that cover a copy of our original plot, and remove its axis annotations using theme_void:
p2 <- p +
annotate("rect", xmin = c(-Inf, section_split), ymin = c(-Inf, -Inf),
xmax = c(section_split, Inf), ymax = c(Inf, Inf),
fill = c("#00a2e8", "#ff7f27")) +
annotate("text", label = c("Section A", "Section B"), size = 6,
y = rep(mean(layer_scales(p)$y$range$range), 2),
x = c((min(layer_scales(p)$x$range$range) + section_split)/2,
(max(layer_scales(p)$x$range$range) + section_split)/2)) +
theme_void()
Now we just draw this second plot above our first, adjusting the relative heights to about 1:10
p2/p + plot_layout(heights = c(1, 10))
The benefit of doing it this way is that, since we copied the original plot, the positional mapping of the x axis is identical between the two plots, and patchwork will automatically line up the panels.
Created on 2023-02-04 with reprex v2.0.2
Consider this simple example
library(ggplot2)
dat <- data.frame(number = c(5, 10, 11 ,12,12,12,13,15,15))
ggplot(dat, aes(x = number)) + geom_histogram()
See how the bars are weirdly aligned with the x axis? Why is the first bar on the left of 5.0 while the bar at 10.0 is centered? How can I get control over that? For instance, it would make more sense (to me) to have the bar starting on the right of the label.
Why are the bars "weirdly aligned"?
Let me start by explaining why your code leads to weirdly aligned bars. This has to do with the way a histogram is constructed. First, the x-axis is split up into intervals, and then the number of values in each interval is counted.
By default, ggplot splits the data up into 30 bins. It even spits out a message that says so:
stat_bin() using bins = 30. Pick better value with binwidth.
The default number of bins is not always a good choice. In your case, where all the data points are integers, one might want to choose the boundaries of the bins as 5, 6, 7, 8, ... or 4.5, 5.5, 6.5, ..., such that each bin contains exactly one integer value. You can obtain the boundaries of the bins that have been used in the plot as follows:
data <- data.frame(number = c(5, 10, 11 ,12, 12, 12, 13, 15, 15))
p <- ggplot(data, aes(x = number)) + geom_histogram()
ggplot_build(p)$data[[1]]$xmin
## [1] 4.655172 5.000000 5.344828 5.689655 6.034483 6.379310 6.724138 7.068966 7.413793
## [10] 7.758621 8.103448 8.448276 8.793103 9.137931 9.482759 9.827586 10.172414 10.517241
## [19] 10.862069 11.206897 11.551724 11.896552 12.241379 12.586207 12.931034 13.275862 13.620690
## [28] 13.965517 14.310345 14.655172
As you can see, the boundaries of the bins are not chosen in a way that would lead to a nice alignment of the bars with integers.
So, in short, the reason for the weird alignment is that ggplot simply uses a default number of 30 bins, which is not suitable, in your case, to have bars that are nicely aligned with integers.
There are (at least) two ways to get nicely aligned bars, which I will discuss in the following.
Use a bar plot instead
Since you have integer data, a histogram may just not be the appropriate choice of visualisation. You could instead use geom_bar(), which will lead to bars that are centered on integers:
ggplot(data, aes(x = number)) + geom_bar() + scale_x_continuous(breaks = 1:16)
You could move the bars to the right of the integers by adding 0.5 to number:
ggplot(data, aes(x = number + 0.5)) + geom_bar() + scale_x_continuous(breaks = 1:16)
Create a histogram with appropriate bins
If you nevertheless want to use a histogram, you can make ggplot use more reasonable bins as follows:
ggplot(data, aes(x = number)) +
geom_histogram(binwidth = 1, boundary = 0, closed = "left") +
scale_x_continuous(breaks = 1:16)
With binwidth = 1, you override the default choice of 30 bins and explicitly require that bins should have a width of 1. boundary = 0 ensures that the binning starts at an integer value, which is what you need, if you want the integers to be to the left of the bars. (If you omit it, bins are chosen such that the bars are centered on integers.)
The argument closed = "left" is a bit more tricky to explain. As I described above, the boundaries of the bins are now chosen to be 5, 6, 7, .... The question is now which bin a value such as 6 should belong to. It could be either the first or the second one. This is the choice that is controlled by closed: if you set it to "right" (the default), then the bins are closed on the right, meaning that the right boundary of the bin will be included, while the left boundary belongs to the bin to the left. So, 6 would be in the first bin. On the other hand, if you choose "left", the left boundary will be part of the bin and 6 would be in the second bin.
Since you want the bars to be to the right of the integers, you need to pick closed = "left".
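The effect of closed can be checked by reading the non-empty bins back from the built plot for both settings. This is a minimal sketch using the data from the question; non_empty_bins is a hypothetical helper name introduced only for this illustration.

```r
library(ggplot2)

dat <- data.frame(number = c(5, 10, 11, 12, 12, 12, 13, 15, 15))

# Hypothetical helper: return the non-empty bins for a given `closed` setting
non_empty_bins <- function(closed) {
  p <- ggplot(dat, aes(x = number)) +
    geom_histogram(binwidth = 1, boundary = 0, closed = closed)
  bins <- ggplot_build(p)$data[[1]]
  bins[bins$count > 0, c("xmin", "xmax", "count")]
}

left_closed  <- non_empty_bins("left")
right_closed <- non_empty_bins("right")

print(left_closed)
print(right_closed)
```

The three observations equal to 12 should land in the bin that starts at 12 with closed = "left" and in the bin that ends at 12 with closed = "right".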
Comparison of the two solutions
If you compare the histogram with the bar plot, you will notice two differences:
There is a little gap between the bars in the bar plot, while they touch in the histogram. You could make the bars touch in the former by using geom_bar(width = 1).
The right most bar is between 15 and 16 for the bar plot, while it is between 14 and 15 for the histogram. The reason is that while for all the bins only the left boundary is part of the bin, for the right most bin, both boundaries are included.
This will center the bar on the value
data <- data.frame(number = c(5, 10, 11 ,12,12,12,13,15,15))
ggplot(data,aes(x = number)) + geom_histogram(binwidth = 0.5)
Here is a trick with the tick labels to align the bars on the left.
But if you add other data to the plot, you need to shift them as well.
ggplot(data,aes(x = number)) +
geom_histogram(binwidth = 0.5) +
scale_x_continuous(
breaks=seq(0.75,15.75,1), #show x-ticks align on the bar (0.25 before the value, half of the binwidth)
labels = 1:16 #change tick label to get the bar x-value
)
other option: binwidth = 1, breaks=seq(0.5,15.5,1) (might make more sense for integer)
On top of @Stibu's great answer, note that since ggplot2 3.4.0, geom_col and geom_bar can take a new just argument to place the bars / cols to the left or right of the x-axis breaks. 0.5 (the default) will place the columns in the center, 0 on the right, and 1 on the left:
library(patchwork)
library(ggplot2)
plot1 <- ggplot(dat, aes(x = number)) +
geom_bar(just = 0) +
labs(title = "with just = 0") +
scale_x_continuous(breaks = 1:16)
plot2 <- ggplot(dat, aes(x = number)) +
geom_bar(just = 1) +
labs(title = "with just = 1") +
scale_x_continuous(breaks = 1:16)
plot1 + plot2
This worked for me
+ scale_x_continuous(limits = c(0, NA))
From ?scale_x_continuous, limits is:
One of:
NULL to use the default scale range
A numeric vector of length two providing limits of the scale. Use NA to refer to the existing minimum or maximum
A function that accepts the existing (automatic) limits and returns new limits
Note that setting limits on positional scales will remove data outside of the limits. If the purpose is to zoom, use the limit argument in the coordinate system (see coord_cartesian()).
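The difference between scale limits and coordinate limits described in the help text can be made visible by counting the binned observations. This is a minimal sketch; with limits = c(0, NA) no observation in this data set falls outside the limits, so a hypothetical lower limit of 8 is assumed here purely to show the effect.

```r
library(ggplot2)

dat <- data.frame(number = c(5, 10, 11, 12, 12, 12, 13, 15, 15))

p <- ggplot(dat, aes(x = number)) +
  geom_histogram(binwidth = 1, boundary = 0)

# Scale limits: the observation at 5 lies outside c(8, NA) and is dropped.
p_scale <- p + scale_x_continuous(limits = c(8, NA))

# Coordinate limits: all observations are binned, the view is only zoomed.
p_zoom <- p + coord_cartesian(xlim = c(8, 16))

# ggplot2 warns about the removed row when building p_scale;
# the warning is suppressed here to keep the output short.
n_scale <- sum(suppressWarnings(ggplot_build(p_scale))$data[[1]]$count)
n_zoom  <- sum(ggplot_build(p_zoom)$data[[1]]$count)

print(c(scale_limits = n_scale, coord_limits = n_zoom))
```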
library(ggplot2)
dat <- data.frame(number = c(5, 10, 11 ,12,12,12,13,15,15))
#I have added bins=10 to control too many bins, by default it takes 30
#then it is difficult to read the labels
p1 <- ggplot(dat, aes(x = number)) + geom_histogram(bins = 10, color="black")
#use ggplot_build to get access to the bin details; the xmax column holds the
#right edge of each bin (x holds the centre, xmin the left edge, etc.)
#to see all the components of this chart, you can just run
#ggplot_build(p1)$data[[1]]
binDetails <- round(ggplot_build(p1)$data[[1]]["xmax"], digits = 3)
Scalexx <- scale_x_continuous(breaks = binDetails$xmax)
#final chart
p1+Scalexx
The same method is shown in this video:
https://www.youtube.com/watch?v=Za8bTDvmPLk
With this method, there is no need to work out the bin edges by hand.
I have a large number of plots computed with ggplot, however when the y-axis has different number of digits, the left side of the plot are not aligned. They will not be inserted directly under / over each other, so a grid cannot be used. Nevertheless, I would like them to have the exact same size. How could this be achieved?
qplot(rnorm(10),1:10, colour = runif(10))
qplot(rnorm(10),1001:1010, colour = runif(10))
You can manually adjust the y-axis labels so that they have matching lengths, or just rotate them by 90 degrees, although there might be a better solution out there.
ggplot(data.frame(x=rnorm(10),y=1:10),aes(x,y, colour = x))+geom_point()+
scale_y_continuous(breaks = seq(0,10,by=2),labels=c('0.000','2.000','4.000','6.000','8.000','10.000'))
ggplot(data.frame(x=rnorm(10),y=1001:1010),aes(x,y,colour = x) )+geom_point()+
theme(axis.text.y = element_text(angle = 90))
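The manual-label approach can be generalised by formatting every tick label to the same character width, so that plots with short and long numbers reserve roughly the same space for the axis text. This is a minimal sketch, assuming base R's formatC(); pad_labels and make_plot are hypothetical helper names, and with a proportional font the resulting alignment is only approximate.

```r
library(ggplot2)

# Hypothetical helper: right-align every label in a field of `width` characters
pad_labels <- function(width = 6) {
  function(x) formatC(x, width = width, format = "g")
}

# Hypothetical helper: build the same scatter plot for any y vector
make_plot <- function(y) {
  d <- data.frame(x = rnorm(length(y)), y = y)
  ggplot(d, aes(x, y, colour = x)) +
    geom_point() +
    scale_y_continuous(labels = pad_labels(6))
}

p_small <- make_plot(1:10)       # one- and two-digit labels
p_large <- make_plot(1001:1010)  # four-digit labels
```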
Would you like to have fixed y-axis limits? You can use coord_cartesian():
qplot(rnorm(10),1:10, colour = runif(10)) +
coord_cartesian(ylim = c(min(y_var), max(y_var)))
This fixes the y-axis limits for all plots. Here y_var refers to the variable that is mapped to the y-axis.
This should be a very simple question!
I would like to make a barplot with errorbars and I'm using the following code:
ggplot(data = bars, aes(x=c("1","2","3"), y=V2, fill = names)) +
geom_bar(position=position_dodge(), stat="identity", alpha = 0.7) +
geom_errorbar(aes(ymin=V1, ymax=V3))+
theme(legend.position='none')+
coord_cartesian(ylim=c(0,10))
However, I have 2 problems:
1. I would like the bars to start at y = 0
2. I don't like the ticks on the y axis. I would like numbers with just one decimal and fewer ticks.
this is my actual plot: Bars with error bars
For the first problem (if I understand it correctly) you can use ylim
... + ylim(0, NA)
NA leaves the upper bound free.
For the second, I suggest using pretty_breaks from the scales package
library(scales)
... + scale_y_continuous(breaks=pretty_breaks(n=5))
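pretty_breaks() controls the number of ticks but not the number of decimals. As a minimal sketch, the labels can additionally be formatted with label_number(accuracy = 0.1) from scales. The bars data frame below is a hypothetical stand-in, since the original data were not posted, and geom_col() is used in place of geom_bar(stat = "identity").

```r
library(ggplot2)
library(scales)

# Hypothetical stand-in for the `bars` data frame from the question
bars <- data.frame(names = c("1", "2", "3"),
                   V1 = c(1.2, 2.1, 3.3),   # lower end of the error bar
                   V2 = c(2.0, 3.0, 4.5),   # bar height
                   V3 = c(2.8, 3.9, 5.7))   # upper end of the error bar

one_decimal <- label_number(accuracy = 0.1)

p <- ggplot(bars, aes(x = names, y = V2, fill = names)) +
  geom_col(alpha = 0.7) +
  geom_errorbar(aes(ymin = V1, ymax = V3), width = 0.2) +
  # about five ticks, each printed with exactly one decimal
  scale_y_continuous(breaks = pretty_breaks(n = 5), labels = one_decimal) +
  coord_cartesian(ylim = c(0, 10)) +
  theme(legend.position = "none")

print(one_decimal(c(0, 2.5, 5)))
```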
So I have a bar chart to make, and a log plot for y axis is warranted because of the data range. So the problem is I have the value of 0.5, which in log10 is -0.3.
Since the bar goes into negative, the "top" of the bar, which is used for placing the labels is actually the "bottom" and so my text label is "just above" the bottom, which means in the middle of the bar.
I figure I am probably not the first person with this issue, but searching for related fixes has not helped. Most notably, I tried using dodge, but this does not change that the "top" of the bar is really the "bottom".
So two questions:
Can I fix this label mishap?
This is just ugly: can I move the x axis up to y=1 to give more context to the negative value without moving the x axis labels?
alpha=c('A','B','C','D')
value=c(0.5,10,40,1100)
table<-as.data.frame(alpha)
table<-cbind(table, value)
library(ggplot2)
graph <- ggplot(table, aes(x=alpha)) +
geom_bar(stat="identity",aes(y=value),width=0.5) +
geom_text(aes(y=value,label=value),vjust=-0.5) +
scale_y_continuous(trans="log10",limits=c(0.5,1400))
graph + theme_classic()
A little trick modifying the y coordinate of the data labels (use ifelse() to set the value of y to 1 if the value is less than one). As for the axis, simply hide the X axis (setting it to element_blank()) and draw a new horizontal line:
graph <- ggplot(table, aes(x=alpha)) +
geom_bar(stat="identity",aes(y=value),width=0.5) +
# Modify the placing of the label using 'ifelse()':
geom_text(aes(y=ifelse(value < 1, 1, value),label=value),vjust=-0.5) +
scale_y_continuous(trans="log10",limits=c(0.5,1400)) +
theme_classic() +
# Hide the X axis:
theme(axis.line.x = element_blank()) +
# Draw the new axis as a horizontal line at y = 1 (log10(1) = 0):
geom_hline(yintercept = 1)
print(graph)
The output: