ggplot functions to replicate plots - r

I'm trying to replicate the theme of these graphs using ggplot. I searched online for how to assign colors and found a few articles that discussed changing the colors of two variables in a scatterplot. I tried the following:
d1<-read.csv("./data/games.csv")
p.1<-ggplot(d1, aes(x=cream_rating, y=charcoal_rating)) +
geom_point(aes(color = cream_rating))
p.1 + ggtitle("Rating of Cream vs Charcoal") +
xlab("rating of cream") + ylab("rating of charcoal")+ theme(plot.title = element_text(hjust = 0.5)) + scale_color_manual(
values=c("orange", "green"))
I get this error:
ERROR while rich displaying an object: Error: Continuous value supplied to discrete scale
Traceback:
1. FUN(X[[i]], ...)
2. tryCatch(withCallingHandlers({
. if (!mime %in% names(repr::mime2repr))
. stop("No repr_* for mimetype ", mime, " in repr::mime2repr")
. rpr <- repr::mime2repr[[mime]](obj)
. if (is.null(rpr))
. return(NULL)
. prepare_content(is.raw(rpr), rpr)
. }, error = error_handler), error = outer_handler)
3. tryCatchList(expr, classes, parentenv, handlers)
4. tryCatchOne(expr, names, parentenv, handlers[[1L]])
5. doTryCatch(return(expr), name, parentenv, handler)
6. withCallingHandlers({
. if (!mime %in% names(repr::mime2repr))
. stop("No repr_* for mimetype ", mime, " in repr::mime2repr")
. rpr <- repr::mime2repr[[mime]](obj)
. if (is.null(rpr))
. return(NULL)
. prepare_content(is.raw(rpr), rpr)
. }, error = error_handler)
7. repr::mime2repr[[mime]](obj)
8. repr_text.default(obj)
9. paste(capture.output(print(obj)), collapse = "\n")
10. capture.output(print(obj))
11. evalVis(expr)
12. withVisible(eval(expr, pf))
13. eval(expr, pf)
14. eval(expr, pf)
15. print(obj)
16. print.ggplot(obj)
17. ggplot_build(x)
18. ggplot_build.ggplot(x)
19. lapply(data, scales_train_df, scales = npscales)
20. FUN(X[[i]], ...)
21. lapply(scales$scales, function(scale) scale$train_df(df = df))
22. FUN(X[[i]], ...)
23. scale$train_df(df = df)
24. f(..., self = self)
25. self$train(df[[aesthetic]])
26. f(..., self = self)
27. self$range$train(x, drop = self$drop, na.rm = !self$na.translate)
28. f(..., self = self)
29. scales::train_discrete(x, self$range, drop = drop, na.rm = na.rm)
30. stop("Continuous value supplied to discrete scale", call. = FALSE)
I'm using the wrong function. Which one should I use, and how do I get the cross in the middle?
structure(list(rated = c(FALSE, TRUE, TRUE, TRUE, TRUE, FALSE,
TRUE, FALSE, TRUE, TRUE), turns = c(13L, 16L, 61L, 61L, 95L,
5L, 33L, 9L, 66L, 119L), victory_status = structure(c(3L, 4L,
2L, 2L, 2L, 1L, 4L, 4L, 4L, 2L), .Label = c("draw", "mate", "outoftime",
"resign"), class = "factor"), winner = structure(c(2L, 1L, 2L,
2L, 2L, 3L, 2L, 1L, 1L, 2L), .Label = c("charcoal", "cream",
"draw"), class = "factor"), increment_code = structure(c(3L,
7L, 7L, 5L, 6L, 1L, 1L, 4L, 2L, 1L), .Label = c("10+0", "15+0",
"15+2", "15+30", "20+0", "30+3", "5+10"), class = "factor"),
cream_rating = c(1500L, 1322L, 1496L, 1439L, 1523L, 1250L,
1520L, 1413L, 1439L, 1381L), charcoal_rating = c(1191L, 1261L,
1500L, 1454L, 1469L, 1002L, 1423L, 2108L, 1392L, 1209L)), row.names = c(NA,
10L), class = "data.frame")
This is what I want to achieve:
I tried Stefan's suggestion (which was a great help) with some modifications:
d1<-read.csv("./data/games.csv")
ggplot(d1, aes(x=cream_rating, y=charcoal_rating)) +
# Map winner on color. Add some transparency in case of overplotting
geom_point(aes(color = winner), alpha = 0.2) +
# Add the cross: add geom_points with one variable fixed at its mean
geom_point(aes(x = mean(cream_rating), color = winner), alpha = 0.2) +
geom_point(aes(y = mean(charcoal_rating), color = winner), alpha = 0.2) +
scale_shape_manual(values=c(16, 17)) +
# "draw"s should be dropped and removed from the title
scale_color_manual(values = c(cream = "seagreen4", charcoal = "chocolate3", draw = NA)) +
ggtitle("Rating of Cream vs Charcoal") +
xlab("rating of cream") + ylab("rating of charcoal") + theme_bw() + theme(plot.title = element_text(hjust = 0.5))
I want to filter out "draw" from the plot. Also, when I change the dot shapes to triangles and circles, they don't seem to change. In addition, I get this error:
Warning message:
“Removed 950 rows containing missing values (geom_point).”
Warning message:
“Removed 950 rows containing missing values (geom_point).”
Warning message:
“Removed 950 rows containing missing values (geom_point).”
One more thing I noticed: I get a double cross instead of one!
This is my output:

The issue is that you mapped a continuous variable (cream_rating) on a discrete color scale (scale_color_manual).
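A minimal sketch of the two ways out of this error, using the first five rows of the question's data. It assumes ggplot2 is installed. scale_colour_gradient() is swapped in here as an alternative the answer does not use, for the case where you really do want color to follow the rating itself.

```r
library(ggplot2)

# First five rows of the question's data
d1 <- data.frame(
  cream_rating    = c(1500L, 1322L, 1496L, 1439L, 1523L),
  charcoal_rating = c(1191L, 1261L, 1500L, 1454L, 1469L),
  winner          = factor(c("cream", "charcoal", "cream", "cream", "cream"))
)

# Option 1: keep the continuous variable and use a continuous colour scale
p_cont <- ggplot(d1, aes(x = cream_rating, y = charcoal_rating, colour = cream_rating)) +
  geom_point() +
  scale_colour_gradient(low = "orange", high = "green")

# Option 2: map a discrete variable (a factor) and keep scale_colour_manual()
p_disc <- ggplot(d1, aes(x = cream_rating, y = charcoal_rating, colour = winner)) +
  geom_point() +
  scale_colour_manual(values = c(cream = "green", charcoal = "orange"))

# ggplot_build() trains the scales, which is where the original error was raised
built_cont <- ggplot_build(p_cont)
built_disc <- ggplot_build(p_disc)
```

The rule of thumb: scale_*_manual() needs a factor or character variable; a numeric variable needs a continuous scale such as scale_colour_gradient().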
As the plots in your images show, there are only two colors, so we need a discrete variable. As your data is about ratings, my guess is that to achieve these plots you have to map winner onto color. One question remains: what about draws? In my code below I set the color for draws to NA, i.e. draws are dropped, but you can change that as you like.
From the image I also guess that some transparency was used to tackle overplotting. This could be achieved via the alpha argument, which I set to 0.6.
Concerning the cross in your plot: it's hard to tell, but my guess is that the data was "replicated" twice, each time with one of your rating variables fixed at its mean value. If this guess is correct, we can get the cross via two additional geom_point layers.
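The guessed construction can also be written out explicitly, which makes it easier to inspect what the cross is made of. This is a sketch under the same assumption as the answer (the cross is the data replicated twice, each time with one rating fixed at its mean). make_cross() is a hypothetical helper, and the data are the first four rows of the question's sample.

```r
library(ggplot2)

d1 <- data.frame(
  winner          = factor(c("cream", "charcoal", "cream", "cream")),
  cream_rating    = c(1500L, 1322L, 1496L, 1439L),
  charcoal_rating = c(1191L, 1261L, 1500L, 1454L)
)

# Hypothetical helper: stack two copies of the data, one with x fixed at its
# mean (the vertical bar of the cross) and one with y fixed at its mean
# (the horizontal bar)
make_cross <- function(df, x = "cream_rating", y = "charcoal_rating") {
  vertical <- df
  vertical[[x]] <- mean(df[[x]], na.rm = TRUE)
  horizontal <- df
  horizontal[[y]] <- mean(df[[y]], na.rm = TRUE)
  rbind(vertical, horizontal)
}

cross <- make_cross(d1)

p <- ggplot(d1, aes(x = cream_rating, y = charcoal_rating, colour = winner)) +
  geom_point(alpha = 0.6) +
  geom_point(data = cross, alpha = 0.6)
```

Because the means are computed on whatever data you pass in, filter out draws before calling make_cross() if the cross should be centred on decided games only.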
library(ggplot2)
d1 <- structure(list(rated = c(FALSE, TRUE, TRUE, TRUE, TRUE, FALSE,
TRUE, FALSE, TRUE, TRUE), turns = c(13L, 16L, 61L, 61L, 95L,
5L, 33L, 9L, 66L, 119L), victory_status = structure(c(3L, 4L,
2L, 2L, 2L, 1L, 4L, 4L, 4L, 2L), .Label = c("draw", "mate", "outoftime",
"resign"), class = "factor"), winner = structure(c(2L, 1L, 2L,
2L, 2L, 3L, 2L, 1L, 1L, 2L), .Label = c("charcoal", "cream",
"draw"), class = "factor"), increment_code = structure(c(3L,
7L, 7L, 5L, 6L, 1L, 1L, 4L, 2L, 1L), .Label = c("10+0", "15+0",
"15+2", "15+30", "20+0", "30+3", "5+10"), class = "factor"),
cream_rating = c(1500L, 1322L, 1496L, 1439L, 1523L, 1250L,
1520L, 1413L, 1439L, 1381L), charcoal_rating = c(1191L, 1261L,
1500L, 1454L, 1469L, 1002L, 1423L, 2108L, 1392L, 1209L)), row.names = c(NA,
10L), class = "data.frame")
ggplot(d1, aes(x=cream_rating, y=charcoal_rating)) +
# Map winner on color. Add some transparency in case of overplotting
geom_point(aes(color = winner), alpha = 0.6) +
# Just a guess to add the cross: add geom_points with one variable fixed at its mean
geom_point(aes(x = mean(cream_rating), color = winner), alpha = 0.6) +
geom_point(aes(y = mean(charcoal_rating), color = winner), alpha = 0.6) +
# Should "draw"s be colored or dropped?
scale_color_manual(values = c(cream = "green", charcoal = "orange", draw = NA)) +
ggtitle("Rating of Cream vs Charcoal") +
xlab("rating of cream") + ylab("rating of charcoal")+ theme(plot.title = element_text(hjust = 0.5))
EDIT
The shapes don't change because you didn't map winner onto the shape aesthetic.
The "errors" are simply warnings, which arise because we set the color for draws to NA: ggplot removes those rows and tells you so. To get rid of the draws, simply filter your dataset before plotting:
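If you would rather not load dplyr, here is a base-R sketch of the same filtering step, using rows 1, 2, 3 and 6 of the question's data (row 6 is the draw). droplevels() is an extra step not in the answer: it removes the now-empty "draw" level from the factor itself.

```r
d1 <- data.frame(
  winner = factor(c("cream", "charcoal", "cream", "draw"),
                  levels = c("charcoal", "cream", "draw")),
  cream_rating    = c(1500L, 1322L, 1496L, 1250L),
  charcoal_rating = c(1191L, 1261L, 1500L, 1002L)
)

# Base-R counterpart of d1 %>% filter(winner != "draw")
no_draws <- subset(d1, winner != "draw")
levels(no_draws$winner)   # "draw" is still listed as a level

# Drop factor levels that no longer occur in the data
no_draws <- droplevels(no_draws)
levels(no_draws$winner)   # only "charcoal" and "cream" remain
```

ggplot2's discrete scales drop unused levels by default (drop = TRUE), so droplevels() mainly matters if you reuse the factor elsewhere.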
library(ggplot2)
library(dplyr)
d1 %>%
filter(winner != "draw") %>%
ggplot(aes(x=cream_rating, y=charcoal_rating, color = winner, shape = winner)) +
# Map winner on color. Add some transparency in case of overplotting
geom_point(alpha = 0.6, na.rm = TRUE) +
# Just a guess to add the cross: add geom_points with one variable fixed at its mean
geom_point(aes(x = mean(cream_rating)), alpha = 0.6) +
geom_point(aes(y = mean(charcoal_rating)), alpha = 0.6) +
# Draws were filtered out above, so only two colors and shapes are needed
scale_color_manual(values = c(cream = "green", charcoal = "orange")) +
scale_shape_manual(values = c(cream = 16, charcoal = 17)) +
ggtitle("Rating of Cream vs Charcoal") +
xlab("rating of cream") + ylab("rating of charcoal")+ theme(plot.title = element_text(hjust = 0.5))


Why doesn't the x axis add value to the existing values? [duplicate]

I have the following plot:
library(ggplot2)
data2<-structure(list(IR = structure(c(4L, 3L, 2L, 1L, 4L, 3L, 2L, 1L
), .Label = c("0.13-0.16", "0.17-0.23", "0.24-0.27", "0.28-1"
), class = "factor"), variable = structure(c(1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L), .Label = c("Real queens", "Simulated individuals"
), class = "factor"), value = c(15L, 11L, 29L, 42L, 0L, 5L, 21L,
22L), Legend = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("Real queens",
"Simulated individuals"), class = "factor")), .Names = c("IR",
"variable", "value", "Legend"), row.names = c(NA, -8L), class = "data.frame")
p <- ggplot(data2, aes(x =factor(IR), y = value, fill = Legend, width=.15))
data3<-structure(list(IR = structure(c(4L, 3L, 2L, 1L, 4L, 3L, 2L, 1L
), .Label = c("0.13-0.16", "0.17-0.23", "0.24-0.27", "0.28-1"
), class = "factor"), variable = structure(c(1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L), .Label = c("Real queens", "Simulated individuals"
), class = "factor"), value = c(2L, 2L, 6L, 10L, 0L, 1L, 4L,
4L), Legend = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("Real queens",
"Simulated individuals"), class = "factor")), .Names = c("IR",
"variable", "value", "Legend"), row.names = c(NA, -8L), class = "data.frame")
q<- ggplot(data3, aes(x =factor(IR), y = value, fill = Legend, width=.15))
##the plot##
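# opts() and theme_blank() are defunct in current ggplot2: theme() and element_blank() replace them,
# and geom_col() replaces geom_bar() when the bar heights come from a y variable
q + geom_col(position = 'dodge', colour = 'black') + ylab('Frequency') + xlab('IR') + scale_fill_grey() +
  theme(axis.text.x = element_text(colour = "black"), axis.text.y = element_text(colour = "black"),
        panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        panel.border = element_blank(), panel.background = element_blank(),
        axis.ticks.x = element_blank())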
I want the y-axis to display only integers. Whether this is accomplished through rounding or through a more elegant method isn't really important to me.
If you have the scales package, you can use pretty_breaks() without having to manually specify the breaks.
q + geom_col(position = 'dodge', colour = 'black') +
scale_y_continuous(breaks = scales::pretty_breaks())
This is what I use:
ggplot(data3, aes(x = factor(IR), y = value, fill = Legend, width = .15)) +
geom_col(position = 'dodge', colour = 'black') +
scale_y_continuous(breaks = function(x) unique(floor(pretty(seq(0, (max(x) + 1) * 1.1)))))
With scale_y_continuous() and its breaks argument, you can set the y-axis breaks to exactly the integers you want to display.
ggplot(data2, aes(x =factor(IR), y = value, fill = Legend, width=.15)) +
geom_col(position='dodge', colour='black')+
scale_y_continuous(breaks=c(1,3,7,10))
You can use a custom breaks function. For example, this function guarantees to produce only integer breaks:
int_breaks <- function(x, n = 5) {
l <- pretty(x, n)
l[abs(l %% 1) < .Machine$double.eps ^ 0.5]
}
Use as
+ scale_y_continuous(breaks = int_breaks)
It works by taking the default breaks, and only keeping those that are integers. If it is showing too few breaks for your data, increase n, e.g.:
+ scale_y_continuous(breaks = function(x) int_breaks(x, n = 10))
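A quick check of what int_breaks() does with a small range, where pretty() returns half-unit steps. This only exercises the function itself (base R, no plotting).

```r
int_breaks <- function(x, n = 5) {
  l <- pretty(x, n)
  # keep only values within floating-point tolerance of a whole number
  l[abs(l %% 1) < .Machine$double.eps ^ 0.5]
}

pretty(c(0, 2), 5)   # 0.0 0.5 1.0 1.5 2.0
int_breaks(c(0, 2))  # 0 1 2
```

Note that abs(l %% 1) only tolerates values a hair above a whole number; a value a hair below (e.g. 0.9999999...) has l %% 1 close to 1 and is dropped, which may be the failure the rounded variant further down works around.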
These solutions did not work for me, and they did not explain how they work.
The breaks argument to the scale_*_continuous functions can be used with a custom function that takes the limits as input and returns breaks as output. By default, the axis limits will be expanded by 5% on each side for continuous data (relative to the range of data). The axis limits will likely not be integer values due to this expansion.
The solution I was looking for was to simply round the lower limit up to the nearest integer, round the upper limit down to the nearest integer, and then have breaks at integer values between these endpoints. Therefore, I used the breaks function:
brk <- function(x) seq(ceiling(x[1]), floor(x[2]), by = 1)
The required code snippet is:
scale_y_continuous(breaks = function(x) seq(ceiling(x[1]), floor(x[2]), by = 1))
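A worked check of the ceiling/floor idea, assuming (as this answer states) that the breaks function receives the expanded axis limits. The 5% expansion is computed by hand here for a data range of 0 to 10, which matches the range of value in data3.

```r
data_range <- c(0, 10)

# Default continuous expansion: 5% of the range added on each side
limits <- data_range + c(-1, 1) * 0.05 * diff(data_range)
limits  # -0.5 10.5

brk <- function(x) seq(ceiling(x[1]), floor(x[2]), by = 1)
brk(limits)  # 0 1 2 3 4 5 6 7 8 9 10
```

With by = 1 every integer gets a tick, so for data spanning hundreds of units this produces a crowded axis; the pretty()-based answers scale better there.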
The reproducible example from original question is:
data3 <-
structure(
list(
IR = structure(
c(4L, 3L, 2L, 1L, 4L, 3L, 2L, 1L),
.Label = c("0.13-0.16", "0.17-0.23", "0.24-0.27", "0.28-1"),
class = "factor"
),
variable = structure(
c(1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L),
.Label = c("Real queens", "Simulated individuals"),
class = "factor"
),
value = c(2L, 2L, 6L, 10L, 0L, 1L, 4L,
4L),
Legend = structure(
c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
.Label = c("Real queens",
"Simulated individuals"),
class = "factor"
)
),
row.names = c(NA,-8L),
class = "data.frame"
)
ggplot(data3, aes(
x = factor(IR),
y = value,
fill = Legend,
width = .15
)) +
geom_col(position = 'dodge', colour = 'black') + ylab('Frequency') + xlab('IR') +
scale_fill_grey() +
scale_y_continuous(
breaks = function(x) seq(ceiling(x[1]), floor(x[2]), by = 1),
expand = expansion(mult = c(0, 0.05))
) +
theme(axis.text.x=element_text(colour="black", angle = 45, hjust = 1),
axis.text.y=element_text(colour="Black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank(),
axis.ticks.x = element_blank())
I found this solution from Joshua Cook, and it worked pretty well.
integer_breaks <- function(n = 5, ...) {
fxn <- function(x) {
breaks <- floor(pretty(x, n, ...))
names(breaks) <- attr(breaks, "labels")
breaks
}
return(fxn)
}
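One thing to watch with floor(pretty(...)): for small ranges pretty() returns half steps, and flooring them produces repeated breaks. A sketch of the check, plus a hypothetical integer_breaks_unique() variant (not from the linked post) that wraps the result in unique():

```r
integer_breaks <- function(n = 5, ...) {
  fxn <- function(x) {
    breaks <- floor(pretty(x, n, ...))
    names(breaks) <- attr(breaks, "labels")
    breaks
  }
  return(fxn)
}

integer_breaks()(c(0, 2))         # 0 0 1 1 2: 0.5 and 1.5 floor to 0 and 1

# Hypothetical variant: same idea, but repeated values removed
integer_breaks_unique <- function(n = 5, ...) {
  function(x) unique(floor(pretty(x, n, ...)))
}

integer_breaks_unique()(c(0, 2))  # 0 1 2
```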
q + geom_col(position='dodge', colour='black') +
scale_y_continuous(breaks = integer_breaks())
The source is:
https://joshuacook.netlify.app/post/integer-values-ggplot-axis/
You can use the accuracy argument of scales::label_number() or scales::label_comma() for this:
fakedata <- data.frame(
x = 1:5,
y = c(0.1, 1.2, 2.4, 2.9, 2.2)
)
library(ggplot2)
# without the accuracy argument, you see .0 decimals
ggplot(fakedata, aes(x = x, y = y)) +
geom_point() +
scale_y_continuous(labels = scales::comma)
# with the accuracy argument, all displayed numbers are integers
ggplot(fakedata, aes(x = x, y = y)) +
geom_point() +
scale_y_continuous(labels = ~ scales::comma(.x, accuracy = 1))
# equivalent
ggplot(fakedata, aes(x = x, y = y)) +
geom_point() +
scale_y_continuous(labels = scales::label_comma(accuracy = 1))
# this works with scales::label_number() as well
ggplot(fakedata, aes(x = x, y = y)) +
geom_point() +
scale_y_continuous(labels = scales::label_number(accuracy = 1))
Created on 2021-08-27 by the reprex package (v2.0.0.9000)
All of the existing answers seem to require custom functions or fail in some cases.
This line makes integer breaks (bad_scale_plot stands for your existing plot whose breaks are not integers):
bad_scale_plot +
scale_y_continuous(breaks = scales::breaks_extended(Q = c(1, 5, 2, 4, 3)))
For more info, see the documentation ?labeling::extended (which is a function called by scales::breaks_extended).
Basically, the argument Q is a set of nice numbers that the algorithm tries to use for scale breaks. The original plot produces non-integer breaks (0, 2.5, 5, and 7.5) because the default value for Q includes 2.5: Q = c(1,5,2,2.5,4,3).
EDIT: as pointed out in a comment, non-integer breaks can occur when the y-axis has a small range. By default, breaks_extended() tries to make about n = 5 breaks, which is impossible when the range is too small. Quick testing shows that ranges wider than 0 < y < 2.5 give integer breaks (n can also be decreased manually).
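A small check of the Q trick outside a plot. scales::breaks_extended() returns a function of the data range, so it can be called directly; scales is installed with ggplot2. The exact breaks depend on the labeling algorithm's scoring, so the sketch only checks whether they are whole numbers.

```r
# Break function that never uses 2.5 as a "nice" step
int_extended <- scales::breaks_extended(Q = c(1, 5, 2, 4, 3))

wide   <- int_extended(c(0, 10))  # range wider than 2.5
narrow <- int_extended(c(0, 1))   # too narrow for about 5 integer breaks

all(wide %% 1 == 0)    # whole numbers only
any(narrow %% 1 != 0)  # fractional breaks can still appear, as the EDIT notes
```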
This answer builds on @Axeman's answer to address the comment by kory that if the data only goes from 0 to 1, no break is shown at 1. This seems to be caused by inaccuracy in pretty(): outputs that print as 1 are not identical to 1 (see the example at the end).
Therefore if you use
int_breaks_rounded <- function(x, n = 5) pretty(x, n)[round(pretty(x, n),1) %% 1 == 0]
with
+ scale_y_continuous(breaks = int_breaks_rounded)
both 0 and 1 are shown as breaks.
Example to illustrate difference from Axeman's
testdata <- data.frame(x = 1:5, y = c(0,1,0,1,1))
p1 <- ggplot(testdata, aes(x = x, y = y))+
geom_point()
p1 + scale_y_continuous(breaks = int_breaks)
p1 + scale_y_continuous(breaks = int_breaks_rounded)
Both will work with the data provided in the initial question.
Illustration of why rounding is required
pretty(c(0,1.05),5)
#> [1] 0.0 0.2 0.4 0.6 0.8 1.0 1.2
identical(pretty(c(0,1.05),5)[6],1)
#> [1] FALSE
Google brought me to this question. I'm trying to use real numbers on a y scale whose values are in the millions.
The comma function in the scales package adds thousands separators to large numbers. This post on R-Bloggers explains a simple approach using comma:
library(ggplot2)
library(scales)
big_numbers <- data.frame(x = 1:5, y = c(1000000:1000004))
big_numbers_plot <- ggplot(big_numbers, aes(x = x, y = y))+
geom_point()
big_numbers_plot + scale_y_continuous(labels = comma)
Enjoy R :)
One answer is indeed in the documentation of the pretty() function. As pointed out in "Setting axes to integer values in 'ggplot2'", the function already contains the solution; you just have to make it work for small values. One possibility is writing a new function, as the author does; for me, a lambda function inside the breaks argument just works:
... + scale_y_continuous(breaks = ~ unique(round(pretty(.x))))
It rounds the values generated by pretty() and keeps only the unique ones, so only integer labels are created, no matter the scale of values.
If your values are integers, here is another way of doing this with group = 1 and as.factor(value):
library(tidyverse)
data3<-structure(list(IR = structure(c(4L, 3L, 2L, 1L, 4L, 3L, 2L, 1L
), .Label = c("0.13-0.16", "0.17-0.23", "0.24-0.27", "0.28-1"
), class = "factor"), variable = structure(c(1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L), .Label = c("Real queens", "Simulated individuals"
), class = "factor"), value = c(2L, 2L, 6L, 10L, 0L, 1L, 4L,
4L), Legend = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("Real queens",
"Simulated individuals"), class = "factor")), .Names = c("IR",
"variable", "value", "Legend"), row.names = c(NA, -8L), class = "data.frame")
data3 %>%
mutate(value = as.factor(value)) %>%
ggplot(aes(x =factor(IR), y = value, fill = Legend, width=.15)) +
geom_col(position = 'dodge', colour='black', group = 1)
Created on 2022-04-05 by the reprex package (v2.0.1)
This is what I did
scale_x_continuous(labels = function(x) round(as.numeric(x)))

Removing "False"-condition scales::dollar labels on ifelse within geom_label

I'm trying to add individual bar data labels ONLY if the value is negative. I was able to do this for a variable consisting of simple integers, but for a variable that needs to be formatted as dollars with a thousands separator, I can't seem to get rid of the "NA" label.
DolSumPlot <- ggplot(data = DolSums, aes(x = Group.1, fill = Group.2)) +
geom_bar(aes(weight = x), position = position_stack(reverse = TRUE)) +
coord_flip() +
labs(title = "Dollars Billed by Technician and Shop, Between 02/01/2018 and 05/31/2018",
y = "Dollars Billed", x = "Technician", fill = "Shop") +
scale_y_continuous(limits= c(NA,NA),
labels = scales::dollar,
breaks = seq(0, 50000 + 10000, 5000*2),
minor_breaks = seq(0,50000 + 10000, by = 5000)) +
scale_fill_brewer(palette = "Set1") +
geom_label(aes(label=scales::dollar(ifelse(DolSums$x < 0, DolSums$x,NA)),
y = DolSums$x),
show.legend = FALSE, size = 2.6, colour = "white", fontface = "bold")
Data:
DolSums = structure(list(Group.1 = c((names)), Group.2 = structure(c(4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
5L, 5L, 5L, 5L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Radio",
"Video", "Engineering", "800Mhz", "PSSRP", "Other"), class = "factor"),
x = c(4646, 16008.5, 48793.1, 4040, 14468.25, 13332, 1565.5,
6060, 6549.85, 2929, 4444, 3257.25, 5904, 2029.5, 3321, 6767,
8105.25, 8105.25, 8130.5, 3131, 5075.25, 3383.5, 4418.75,
23381.5, 1363.5, -2323, 29133.45, 2550.25, 505, 26042.85,
35203.55, 35940.85, 1641.25, 45066.2, 37541.7, 606, 45439.9
)), .Names = c("Group.1", "Group.2", "x"), row.names = c(NA,
-37L), class = "data.frame")
You can do this by using the data argument in geom_label and subsetting only the rows with negative x. Also note that since DolSums is already the plot's data, there is no need to write DolSums$x; use the column name to refer to a specific column directly:
library(ggplot2)
ggplot(data = DolSums, aes(x = Group.1, fill = Group.2)) +
geom_bar(aes(weight = x), position = position_stack(reverse = TRUE)) +
coord_flip() +
labs(title = "Dollars Billed by Technician and Shop, Between 02/01/2018 and 05/31/2018",
y = "Dollars Billed", x = "Technician", fill = "Shop") +
scale_y_continuous(limits= c(NA,NA),
labels = scales::dollar,
breaks = seq(0, 50000 + 10000, 5000*2),
minor_breaks = seq(0,50000 + 10000, by = 5000)) +
scale_fill_brewer(palette = "Set1") +
geom_label(data = DolSums[DolSums$x < 0,],
aes(label=scales::dollar(x),
y = x),
show.legend = FALSE, size = 2.6, colour = "white", fontface = "bold")
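To confirm that the label layer really only receives the negative rows, here is a sketch on a small hypothetical data frame. The technician names in the question are redacted, so tech and shop below are placeholders, and the amounts are a few values taken from the question's x column. ggplot_build() exposes each layer's data.

```r
library(ggplot2)

# Hypothetical stand-in for DolSums
toy <- data.frame(
  tech = c("A", "A", "B", "B"),
  shop = factor(c("Radio", "Video", "Radio", "Video")),
  x    = c(4646, 16008.5, -2323, 29133.45)
)

p <- ggplot(toy, aes(x = tech, fill = shop)) +
  geom_bar(aes(weight = x), position = position_stack(reverse = TRUE)) +
  coord_flip() +
  # Layer-specific data: only rows with a negative amount get a label
  geom_label(data = toy[toy$x < 0, ], aes(label = scales::dollar(x), y = x),
             show.legend = FALSE, colour = "white")

built <- ggplot_build(p)
nrow(built$data[[2]])  # 1: a single label, for the negative row
```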

How to add comparison bars to a plot to denote which comparison a p value corresponds to

I'm using the following data frame:
df1 <- structure(list(Genotype = structure(c(1L, 1L, 1L, 1L, 1L,
2L,2L,2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
1L,1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
.Label= c("miR-15/16 FL", "miR-15/16 cKO"), class = "factor"),
Tissue = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L), .Label = c("iLN", "Spleen", "Skin", "Colon"), class = "factor"),
`Cells/SC/Live/CD8—,, CD4+/Foxp3+,Median,<BV421-A>,CD127` = c(518L,
715L, 572L, 599L, 614L, 881L, 743L, 722L, 779L, 843L, 494L,
610L, 613L, 624L, 631L, 925L, 880L, 932L, 876L, 926L, 1786L,
2079L, 2199L, 2345L, 2360L, 2408L, 2509L, 3129L, 3263L, 3714L,
917L, NA, 1066L, 1059L, 939L, 1269L, 1047L, 974L, 1048L,
1084L)),
.Names = c("Genotype", "Tissue", "Cells/SC/Live/CD8—,,CD4+/Foxp3+,Median,<BV421-A>,CD127"),
row.names = c(NA, -40L), class = c("tbl_df", "tbl", "data.frame"))
I'm trying to make a plot using ggplot2 where box plots and points are grouped by "Tissue" and interleaved by "Genotype". The significance values display properly, but I would like to add lines that denote the comparisons being made: each should start at the center of the "miR-15/16 FL" box plot, end at the center of the "miR-15/16 cKO" box plot, and sit directly below the significance value. Below is the code I am using to generate the plot:
library(ggplot2)
library(ggpubr)
color.groups <- c("black","red")
names(color.groups) <- unique(df1$Genotype)
shape.groups <- c(16, 1)
names(shape.groups) <- unique(df1$Genotype)
ggplot(df1, aes(x = Tissue, y = df1[3], color = Genotype, shape = Genotype)) +
geom_boxplot(position = position_dodge(), outlier.shape = NA) +
geom_point(position=position_dodge(width=0.75)) +
ylim(0,1.2*max(df1[3], na.rm = TRUE)) +
ylab('MFI CD127 (of CD4+ Foxp3+ T cells)') +
scale_color_manual(values=color.groups) +
scale_shape_manual(values=shape.groups) +
theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"),
axis.title.x=element_blank(), aspect.ratio = 1,
text = element_text(size = 9)) +
stat_compare_means(show.legend = FALSE, label = 'p.format', method = 't.test',
label.y = c(0.1*max(df1[3], na.rm = TRUE) + max(df1[3][c(1:10),], na.rm = TRUE),
0.1*max(df1[3], na.rm = TRUE) + max(df1[3][c(11:20),], na.rm = TRUE),
0.1*max(df1[3], na.rm = TRUE) + max(df1[3][c(21:30),], na.rm = TRUE),
0.1*max(df1[3], na.rm = TRUE) + max(df1[3][c(31:40),], na.rm = TRUE)))
Thanks for any help!
I've created the brackets with three calls to geom_segment. These calls use a new dmax data frame created to provide the reference y-values for positioning the brackets and the p-value labels. The values e and r are for tweaking these positions.
I've made a few other changes to your code.
Change the name of the third column to temp and use this name y=temp in the call to ggplot. Your original code uses y=df1[3], which essentially reaches outside the plot environment to the df1 object in the parent environment, which can cause problems. Also, having a short name to refer to makes it easier to generate the dmax data frame and refer to its columns.
Use the dmax data frame for the label.y positions in stat_compare_means, which reduces the amount of code needed. (Incidentally, stat_compare_means seems to require hard-coded label.y positions, rather than getting them from an aes mapping of the data.)
Position the p-value labels an absolute distance above each pair of box plots (using the value e), rather than a multiplicative distance. This makes it easier to keep spacing consistent between p-value labels, brackets, and box plots.
library(dplyr)
# Use a short column name for the third column
names(df1)[3] = "temp"
# Generate data frame of reference y-values for p-value labels and bracket positions
dmax = df1 %>% group_by(Tissue) %>%
summarise(temp=max(temp, na.rm=TRUE),
Genotype=NA)
# For tweaking position of brackets
e = 350
r = 0.6
w = 0.19
bcol = "grey30"
ggplot(df1, aes(x = Tissue, y = temp, color = Genotype, shape = Genotype)) +
geom_boxplot(position = position_dodge(), outlier.shape = NA) +
geom_point(position=position_dodge(width=0.75)) +
ylim(0,1.2*max(df1[3], na.rm = TRUE)) +
ylab('MFI CD127 (of CD4+ Foxp3+ T cells)') +
scale_color_manual(values=color.groups) +
scale_shape_manual(values=shape.groups) +
theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"),
axis.title.x=element_blank(), aspect.ratio = 1,
text = element_text(size = 9)) +
stat_compare_means(show.legend = FALSE, label = 'p.format', method = 't.test',
label.y = e + dmax$temp) +
geom_segment(data=dmax,
aes(x=as.numeric(Tissue)-w, xend=as.numeric(Tissue)+w,
y=temp + r*e, yend=temp + r*e), size=0.3, color=bcol, inherit.aes=FALSE) +
geom_segment(data=dmax,
aes(x=as.numeric(Tissue) + w, xend=as.numeric(Tissue) + w,
y=temp + r*e, yend=temp + r*e - 60), size=0.3, color=bcol, inherit.aes=FALSE) +
geom_segment(data=dmax,
aes(x=as.numeric(Tissue) - w, xend=as.numeric(Tissue) - w,
y=temp + r*e, yend=temp + r*e - 60), size=0.3, color=bcol, inherit.aes=FALSE)
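The three geom_segment() calls differ only in their start and end points, so one alternative (not what the answer does) is to build all bracket pieces into a single data frame and draw them with one layer. A sketch, where bracket_segments() is a hypothetical helper using the same e, r, w values and the hard-coded tick length of 60; the temp values are the per-tissue maxima of the question's data.

```r
# Hypothetical helper: one row per bracket piece, for every Tissue level
bracket_segments <- function(dmax, e = 350, r = 0.6, w = 0.19, tick = 60) {
  x <- as.numeric(dmax$Tissue)   # factor level -> x position on the axis
  y <- dmax$temp + r * e         # height of the horizontal bar
  rbind(
    data.frame(x = x - w, xend = x + w, y = y, yend = y),         # horizontal bar
    data.frame(x = x + w, xend = x + w, y = y, yend = y - tick),  # right tick
    data.frame(x = x - w, xend = x - w, y = y, yend = y - tick)   # left tick
  )
}

# Per-tissue maxima of the third column in the question's df1
dmax <- data.frame(
  Tissue = factor(c("iLN", "Spleen", "Skin", "Colon"),
                  levels = c("iLN", "Spleen", "Skin", "Colon")),
  temp   = c(881, 932, 3714, 1269)
)

segs <- bracket_segments(dmax)
```

In the plot, the three geom_segment() layers would then become a single geom_segment(data = segs, aes(x = x, xend = xend, y = y, yend = yend), inherit.aes = FALSE, colour = bcol, size = 0.3); with ggplot2 3.4.0 or later, linewidth = 0.3 is the preferred spelling for line thickness.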
To address your comment, here's an example to show that the method above inherently adjusts to any number of x-categories.
Let's begin by adding two new tissue categories:
library(forcats)
df1$Tissue = fct_expand(df1$Tissue, "Tissue 5", "Tissue 6")
df1$Tissue[seq(1,20,4)] = "Tissue 5"
df1$Tissue[seq(21,40,4)] = "Tissue 6"
dmax = df1 %>% group_by(Tissue) %>%
summarise(temp=max(temp, na.rm=TRUE),
Genotype=NA)
Now run exactly the same plot code listed above to get the following plot:

Therefore if you use
int_breaks_rounded <- function(x, n = 5) pretty(x, n)[round(pretty(x, n),1) %% 1 == 0]
with
+ scale_y_continuous(breaks = int_breaks_rounded)
both 0 and 1 are shown as breaks.
Example to illustrate difference from Axeman's
testdata <- data.frame(x = 1:5, y = c(0,1,0,1,1))
p1 <- ggplot(testdata, aes(x = x, y = y))+
geom_point()
p1 + scale_y_continuous(breaks = int_breaks)
p1 + scale_y_continuous(breaks = int_breaks_rounded)
Both will work with the data provided in the initial question.
Illustration of why rounding is required
pretty(c(0,1.05),5)
#> [1] 0.0 0.2 0.4 0.6 0.8 1.0 1.2
identical(pretty(c(0,1.05),5)[6],1)
#> [1] FALSE
Google brought me to this question. I'm trying to use real numbers in a y scale. The y scale numbers are in Millions.
The scales package comma method introduces a comma to my large numbers. This post on R-Bloggers explains a simple approach using the comma method:
library(scales)
big_numbers <- data.frame(x = 1:5, y = c(1000000:1000004))
big_numbers_plot <- ggplot(big_numbers, aes(x = x, y = y))+
geom_point()
big_numbers_plot + scale_y_continuous(labels = comma)
Enjoy R :)
One answer is indeed inside the documentation of the pretty() function. As pointed out here Setting axes to integer values in 'ggplot2' the function contains already the solution. You have just to make it work for small values. One possibility is writing a new function like the author does, for me a lambda function inside the breaks argument just works:
... + scale_y_continuous(breaks = ~round(unique(pretty(.))
It will round the unique set of values generated by pretty() creating only integer labels, no matter the scale of values.
If your values are integers, here is another way of doing this with group = 1 and as.factor(value):
library(tidyverse)
data3<-structure(list(IR = structure(c(4L, 3L, 2L, 1L, 4L, 3L, 2L, 1L
), .Label = c("0.13-0.16", "0.17-0.23", "0.24-0.27", "0.28-1"
), class = "factor"), variable = structure(c(1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L), .Label = c("Real queens", "Simulated individuals"
), class = "factor"), value = c(2L, 2L, 6L, 10L, 0L, 1L, 4L,
4L), Legend = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("Real queens",
"Simulated individuals"), class = "factor")), .Names = c("IR",
"variable", "value", "Legend"), row.names = c(NA, -8L), class = "data.frame")
data3 %>%
mutate(value = as.factor(value)) %>%
ggplot(aes(x =factor(IR), y = value, fill = Legend, width=.15)) +
geom_col(position = 'dodge', colour='black', group = 1)
Created on 2022-04-05 by the reprex package (v2.0.1)
This is what I did
scale_x_continuous(labels = function(x) round(as.numeric(x)))

How to display only integer values on an axis using ggplot2

I have the following plot:
library(ggplot2)
data2<-structure(list(IR = structure(c(4L, 3L, 2L, 1L, 4L, 3L, 2L, 1L
), .Label = c("0.13-0.16", "0.17-0.23", "0.24-0.27", "0.28-1"
), class = "factor"), variable = structure(c(1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L), .Label = c("Real queens", "Simulated individuals"
), class = "factor"), value = c(15L, 11L, 29L, 42L, 0L, 5L, 21L,
22L), Legend = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("Real queens",
"Simulated individuals"), class = "factor")), .Names = c("IR",
"variable", "value", "Legend"), row.names = c(NA, -8L), class = "data.frame")
p <- ggplot(data2, aes(x =factor(IR), y = value, fill = Legend, width=.15))
data3<-structure(list(IR = structure(c(4L, 3L, 2L, 1L, 4L, 3L, 2L, 1L
), .Label = c("0.13-0.16", "0.17-0.23", "0.24-0.27", "0.28-1"
), class = "factor"), variable = structure(c(1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L), .Label = c("Real queens", "Simulated individuals"
), class = "factor"), value = c(2L, 2L, 6L, 10L, 0L, 1L, 4L,
4L), Legend = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("Real queens",
"Simulated individuals"), class = "factor")), .Names = c("IR",
"variable", "value", "Legend"), row.names = c(NA, -8L), class = "data.frame")
q<- ggplot(data3, aes(x =factor(IR), y = value, fill = Legend, width=.15))
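##the plot##
# opts() and theme_blank() were deprecated and later removed from ggplot2; theme() and element_blank() replace them.
# geom_col() replaces geom_bar() here because the bar heights are already in the data (y = value).
q + geom_col(position = 'dodge', colour = 'black') + ylab('Frequency') + xlab('IR') + scale_fill_grey() +
  theme(axis.text.x = element_text(colour = "black"), axis.text.y = element_text(colour = "Black"),
        panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        panel.border = element_blank(), panel.background = element_blank(),
        axis.ticks.x = element_blank())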
I want the y-axis to display only integers. Whether this is accomplished through rounding or through a more elegant method isn't really important to me.
If you have the scales package, you can use pretty_breaks() without having to manually specify the breaks.
library(scales)
q + geom_col(position = 'dodge', colour = 'black') +
scale_y_continuous(breaks = pretty_breaks())
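pretty_breaks() is built on base R's pretty(), which chooses "nice" step sizes (1, 2 or 5 times a power of ten) rather than integer ones. Calling pretty() directly on two made-up ranges suggests why whole-number breaks are likely for wider ranges but not guaranteed for narrow ones:

```r
# A range from 0 to 10: pretty() settles on a step of 2
wide_breaks <- pretty(c(0, 10), n = 5)
print(wide_breaks)    # 0 2 4 6 8 10

# A range from 0 to 2: the same call settles on a step of 0.5
narrow_breaks <- pretty(c(0, 2), n = 5)
print(narrow_breaks)  # 0.0 0.5 1.0 1.5 2.0

# The narrow breaks that are not whole numbers
narrow_breaks[narrow_breaks %% 1 != 0]  # 0.5 1.5
```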
This is what I use:
ggplot(data3, aes(x = factor(IR), y = value, fill = Legend, width = .15)) +
geom_col(position = 'dodge', colour = 'black') +
scale_y_continuous(breaks = function(x) unique(floor(pretty(seq(0, (max(x) + 1) * 1.1)))))
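To see what this breaks function returns, it can be given a name (floor_pretty_breaks below is hypothetical) and called on made-up axis limits. It builds its sequence from 0, so it assumes the axis starts at 0; flooring the pretty() values and then dropping duplicates is what turns half-integer steps into whole numbers:

```r
# The breaks function from the answer above, under a hypothetical name
floor_pretty_breaks <- function(x) unique(floor(pretty(seq(0, (max(x) + 1) * 1.1))))

# Limits similar to data3's y range (0 to 10 plus 5% headroom)
floor_pretty_breaks(c(0, 10.5))  # 0 2 4 6 8 10 12

# Narrow range: pretty() proposes 0, 0.5, 1, 1.5, 2; floor() + unique() leave 0 1 2
floor_pretty_breaks(c(0, 1.05))  # 0 1 2
```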
With scale_y_continuous() and its breaks argument, you can set the y-axis breaks to exactly the integers you want to display:
ggplot(data2, aes(x = factor(IR), y = value, fill = Legend, width = .15)) +
geom_col(position = 'dodge', colour = 'black') +
scale_y_continuous(breaks = c(1, 3, 7, 10))
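Hard-coded breaks such as c(1, 3, 7, 10) have to be edited by hand whenever the data change. A minimal sketch, assuming evenly spaced integer breaks starting at 0 are acceptable (the step of 10 is an arbitrary choice for illustration), derives them from data2's value column with seq() instead:

```r
# The value column of data2 from the question
value <- c(15L, 11L, 29L, 42L, 0L, 5L, 21L, 22L)

# Integer breaks from 0 up to the largest value, one every 10 units
manual_breaks <- seq(0, max(value), by = 10)
manual_breaks  # 0 10 20 30 40
```

The result can then be passed as scale_y_continuous(breaks = manual_breaks).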
You can use a custom breaks function. For example, this one is guaranteed to produce only integer breaks:
int_breaks <- function(x, n = 5) {
l <- pretty(x, n)
l[abs(l %% 1) < .Machine$double.eps ^ 0.5]
}
Use it as
+ scale_y_continuous(breaks = int_breaks)
It works by taking the default breaks, and only keeping those that are integers. If it is showing too few breaks for your data, increase n, e.g.:
+ scale_y_continuous(breaks = function(x) int_breaks(x, n = 10))
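The filtering step is easiest to see by calling the function directly on a pair of axis limits (the limits below are made up; ggplot2 supplies the real ones). For a narrow range, pretty() proposes half-integer steps, and int_breaks() keeps only the whole numbers, which is why it can end up with fewer than n breaks:

```r
# Axeman's int_breaks, as defined above
int_breaks <- function(x, n = 5) {
  l <- pretty(x, n)
  # keep only the candidates that are whole numbers, up to a small tolerance
  l[abs(l %% 1) < .Machine$double.eps ^ 0.5]
}

pretty(c(0, 2))       # 0.0 0.5 1.0 1.5 2.0
int_breaks(c(0, 2))   # 0 1 2
int_breaks(c(0, 10))  # 0 2 4 6 8 10
```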
These solutions did not work for me, and they did not explain how they work.
The breaks argument to the scale_*_continuous functions can be used with a custom function that takes the limits as input and returns breaks as output. By default, the axis limits will be expanded by 5% on each side for continuous data (relative to the range of data). The axis limits will likely not be integer values due to this expansion.
The solution I was looking for was to simply round the lower limit up to the nearest integer, round the upper limit down to the nearest integer, and then have breaks at integer values between these endpoints. Therefore, I used the breaks function:
brk <- function(x) seq(ceiling(x[1]), floor(x[2]), by = 1)
The required code snippet is:
scale_y_continuous(breaks = function(x) seq(ceiling(x[1]), floor(x[2]), by = 1))
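Calling brk() on a few made-up limits shows the trade-off of this approach, assuming ggplot2's default 5% expansion on both sides: it always yields one break per integer, which suits small counts such as data3 (0 to 10) but crowds the axis for a wider range such as data2 (0 to 42), and it adds negative breaks when the lower limit is expanded below 0 (the expand = ... c(0, 0.05) in the example below avoids that part):

```r
brk <- function(x) seq(ceiling(x[1]), floor(x[2]), by = 1)

# data3's values span 0 to 10; 5% expansion on both sides gives about c(-0.5, 10.5)
brk(c(-0.5, 10.5))   # 0 1 2 ... 10 (11 breaks)

# data2's values span 0 to 42; the same expansion gives c(-2.1, 44.1)
wide <- brk(c(-2.1, 44.1))
length(wide)         # 47
head(wide, 3)        # -2 -1 0
```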
The reproducible example from the original question is:
data3 <-
structure(
list(
IR = structure(
c(4L, 3L, 2L, 1L, 4L, 3L, 2L, 1L),
.Label = c("0.13-0.16", "0.17-0.23", "0.24-0.27", "0.28-1"),
class = "factor"
),
variable = structure(
c(1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L),
.Label = c("Real queens", "Simulated individuals"),
class = "factor"
),
value = c(2L, 2L, 6L, 10L, 0L, 1L, 4L,
4L),
Legend = structure(
c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
.Label = c("Real queens",
"Simulated individuals"),
class = "factor"
)
),
row.names = c(NA,-8L),
class = "data.frame"
)
library(ggplot2)
ggplot(data3, aes(
  x = factor(IR),
  y = value,
  fill = Legend,
  width = .15
)) +
  geom_col(position = 'dodge', colour = 'black') + ylab('Frequency') + xlab('IR') +
  scale_fill_grey() +
  scale_y_continuous(
    breaks = function(x) seq(ceiling(x[1]), floor(x[2]), by = 1),
    # expansion() replaces expand_scale(), which was deprecated in ggplot2 3.3.0
    expand = expansion(mult = c(0, 0.05))
  ) +
  theme(axis.text.x = element_text(colour = "black", angle = 45, hjust = 1),
        axis.text.y = element_text(colour = "Black"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        panel.background = element_blank(),
        axis.ticks.x = element_blank())
I found this solution by Joshua Cook, and it worked pretty well.
integer_breaks <- function(n = 5, ...) {
fxn <- function(x) {
breaks <- floor(pretty(x, n, ...))
names(breaks) <- attr(breaks, "labels")
breaks
}
return(fxn)
}
q + geom_col(position = 'dodge', colour = 'black') +
scale_y_continuous(breaks = integer_breaks())
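On a narrow range, floor() maps pretty()'s half-integer candidates down onto the neighbouring integers, so the returned vector can contain repeated breaks. The sketch below shows this and a variant that also applies unique() (unique_integer_breaks is a hypothetical name, not part of Joshua Cook's post):

```r
# Joshua Cook's factory, as given above
integer_breaks <- function(n = 5, ...) {
  fxn <- function(x) {
    breaks <- floor(pretty(x, n, ...))
    names(breaks) <- attr(breaks, "labels")
    breaks
  }
  return(fxn)
}

# pretty(c(0, 2)) is 0, 0.5, 1, 1.5, 2, so floor() repeats 0 and 1
integer_breaks()(c(0, 2))         # 0 0 1 1 2

# Hypothetical variant that drops the repeats
unique_integer_breaks <- function(n = 5, ...) {
  function(x) unique(floor(pretty(x, n, ...)))
}
unique_integer_breaks()(c(0, 2))  # 0 1 2
```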
The source is:
https://joshuacook.netlify.app/post/integer-values-ggplot-axis/
You can use the accuracy argument of scales::label_number() or scales::label_comma() for this:
fakedata <- data.frame(
x = 1:5,
y = c(0.1, 1.2, 2.4, 2.9, 2.2)
)
library(ggplot2)
# without the accuracy argument, you see .0 decimals
ggplot(fakedata, aes(x = x, y = y)) +
geom_point() +
scale_y_continuous(labels = scales::comma)
# with the accuracy argument, all displayed numbers are integers
ggplot(fakedata, aes(x = x, y = y)) +
geom_point() +
scale_y_continuous(labels = ~ scales::comma(.x, accuracy = 1))
# equivalent
ggplot(fakedata, aes(x = x, y = y)) +
geom_point() +
scale_y_continuous(labels = scales::label_comma(accuracy = 1))
# this works with scales::label_number() as well
ggplot(fakedata, aes(x = x, y = y)) +
geom_point() +
scale_y_continuous(labels = scales::label_number(accuracy = 1))
Created on 2021-08-27 by the reprex package (v2.0.0.9000)
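Note that accuracy = 1 changes only how each break is printed, not where the ticks are drawn. If the automatic breaks land on half-integers, neighbouring ticks can end up with the same label. A base-R stand-in for rounding labels to the nearest integer (it uses round(), not the internals of scales) shows the effect on made-up breaks:

```r
# Made-up breaks at half-integer steps, as pretty() returns for a narrow range
breaks <- c(0, 0.5, 1, 1.5, 2)

# Round each label to an integer; R's round() sends exact halves to the even digit
labels <- as.character(round(breaks))
labels  # "0" "0" "1" "2" "2"
```

Pairing a labels function like this with integer breaks (for example one of the breaks functions above) avoids the repeated labels.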
All of the existing answers seem to require custom functions or fail in some cases.
This line makes integer breaks (bad_scale_plot stands for a plot whose default y-axis breaks are not integers):
bad_scale_plot +
scale_y_continuous(breaks = scales::breaks_extended(Q = c(1, 5, 2, 4, 3)))
For more info, see the documentation ?labeling::extended (which is a function called by scales::breaks_extended).
Basically, the argument Q is a set of nice numbers that the algorithm tries to use for scale breaks. The original plot produces non-integer breaks (0, 2.5, 5, and 7.5) because the default value for Q includes 2.5: Q = c(1,5,2,2.5,4,3).
EDIT: as pointed out in a comment, non-integer breaks can occur when the y-axis has a small range. By default, breaks_extended() tries to make about n = 5 breaks, which is impossible when the range is too small. Quick testing shows that ranges wider than 0 < y < 2.5 give integer breaks (n can also be decreased manually).
This answer builds on @Axeman's answer to address kory's comment that, if the data only go from 0 to 1, no break is shown at 1. This seems to be caused by floating-point inaccuracy in pretty(): an output that prints as 1 is not identical to 1 (see the example at the end).
Therefore if you use
int_breaks_rounded <- function(x, n = 5) pretty(x, n)[round(pretty(x, n),1) %% 1 == 0]
with
+ scale_y_continuous(breaks = int_breaks_rounded)
both 0 and 1 are shown as breaks.
Example illustrating the difference from Axeman's int_breaks():
testdata <- data.frame(x = 1:5, y = c(0,1,0,1,1))
p1 <- ggplot(testdata, aes(x = x, y = y))+
geom_point()
p1 + scale_y_continuous(breaks = int_breaks)
p1 + scale_y_continuous(breaks = int_breaks_rounded)
Both will work with the data provided in the initial question.
Illustration of why rounding is required
pretty(c(0,1.05),5)
#> [1] 0.0 0.2 0.4 0.6 0.8 1.0 1.2
identical(pretty(c(0,1.05),5)[6],1)
#> [1] FALSE
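Calling int_breaks_rounded() directly on the 0 to 1.05 range from the illustration above shows that the candidate printed as 1.0 survives the filter, because each candidate is compared after rounding to one decimal place:

```r
int_breaks_rounded <- function(x, n = 5) pretty(x, n)[round(pretty(x, n), 1) %% 1 == 0]

# pretty(c(0, 1.05), 5) proposes 0.0 0.2 0.4 0.6 0.8 1.0 1.2;
# only 0 and the value printed as 1 are whole numbers after rounding
int_breaks_rounded(c(0, 1.05))  # 0 1
```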
Google brought me to this question. I'm trying to use real numbers on a y scale whose values are in the millions.
The comma() function from the scales package adds thousands separators to large numbers. This post on R-Bloggers explains a simple approach using comma():
library(ggplot2)
library(scales)
big_numbers <- data.frame(x = 1:5, y = c(1000000:1000004))
big_numbers_plot <- ggplot(big_numbers, aes(x = x, y = y))+
geom_point()
big_numbers_plot + scale_y_continuous(labels = comma)
Enjoy R :)
One answer is already in the documentation of the pretty() function. As pointed out in Setting axes to integer values in 'ggplot2', that function already contains the solution; you just have to make it work for small values. One possibility is to write a new function, as the author does; for me, a lambda function inside the breaks argument just works:
... + scale_y_continuous(breaks = ~ unique(round(pretty(.x))))
It rounds the values generated by pretty() and keeps the unique ones, so only integer breaks are produced, whatever the scale of the values.
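The order of round() and unique() matters. On a narrow range pretty() returns half-integers, and R's round() sends exact halves to the even digit, so rounding after unique() leaves repeated breaks, while rounding first and then calling unique() does not:

```r
candidates <- pretty(c(0, 2))   # 0.0 0.5 1.0 1.5 2.0

# round() after unique(): the rounded values repeat
round(unique(candidates))       # 0 0 1 2 2

# unique() after round(): each integer break appears once
unique(round(candidates))       # 0 1 2
```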
If your values are integers, here is another way of doing this with group = 1 and as.factor(value):
library(tidyverse)
data3<-structure(list(IR = structure(c(4L, 3L, 2L, 1L, 4L, 3L, 2L, 1L
), .Label = c("0.13-0.16", "0.17-0.23", "0.24-0.27", "0.28-1"
), class = "factor"), variable = structure(c(1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L), .Label = c("Real queens", "Simulated individuals"
), class = "factor"), value = c(2L, 2L, 6L, 10L, 0L, 1L, 4L,
4L), Legend = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("Real queens",
"Simulated individuals"), class = "factor")), .Names = c("IR",
"variable", "value", "Legend"), row.names = c(NA, -8L), class = "data.frame")
data3 %>%
mutate(value = as.factor(value)) %>%
ggplot(aes(x =factor(IR), y = value, fill = Legend, width=.15)) +
geom_col(position = 'dodge', colour='black', group = 1)
Created on 2022-04-05 by the reprex package (v2.0.1)
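One consequence of converting value to a factor is that the y axis becomes discrete: each distinct value gets the next evenly spaced slot, so the gap between 4 and 6 on the axis is the same as the gap between 6 and 10. The sketch below shows the levels and slot positions for data3's value column:

```r
# The value column of data3
value <- c(2L, 2L, 6L, 10L, 0L, 1L, 4L, 4L)
value_factor <- as.factor(value)

levels(value_factor)      # "0" "1" "2" "4" "6" "10"

# Slot of each observation on the discrete axis; 6 and 10 sit in adjacent slots
as.integer(value_factor)  # 3 3 5 6 1 2 4 4
```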
This is what I did:
scale_x_continuous(labels = function(x) round(as.numeric(x)))
