Ordering bars according to the values of y [duplicate] - r

This question already has answers here:
Set the order of a stacked bar chart by the value of one of the variables
Using the code below, I created a stacked bar chart of Dx by state and age. To make the pattern easier to see, I'd like to order the states from left to right according to their y values (Dx) at age 65.
Thanks,
NM
Here is my data:
structure(list(Age = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("30", "50", "65"), class = "factor"), Dx = c(3.057, 7.847, 17.157, 2.851, 8.861, 21.885, 2.521, 7.889, 21.328), PopName = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("AK", "AL", "AR"), class = "factor")), row.names = c(NA, -9L), class = c("tbl_df", "tbl", "data.frame"))
library(tidyverse)
CAPS_2019 %>%
group_by(Age, PopName) %>%
mutate(PopName1 = sum(Dx)) %>%
ungroup() %>%
ggplot(aes(x = fct_reorder(PopName, PopName1), y = Dx, fill = factor(as.character(Age)))) +
geom_col(position = position_stack(reverse = TRUE)) +
theme_classic()+
coord_flip()+
labs(x = "State", y = "Deaths (%)", caption = (""), face = "bold", fill = "Age")

Update 2: Try this with your new dataset. Age and PopName are already factors there, so this should work as expected:
CAPS_2019_data %>%
group_by(Age, PopName) %>%
mutate(PopName1 = sum(Dx)) %>%
ungroup() %>%
ggplot(aes(x = reorder(PopName, PopName1), y = Dx, fill = Age)) +
geom_col(position = position_stack(reverse = TRUE)) +
theme_classic()+
coord_flip()+
labs(x = "State", y = "Deaths (%)", caption = (""), face = "bold", fill = "Age")
Update:
Data:
CAPS_2019 <- structure(list(Age = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L), .Label = c("30", "50", "65"), class = "factor"), Dx = c(3.057,
7.847, 17.157, 2.851, 8.861, 21.885, 2.521, 7.889, 21.328), PopName = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("AK", "AL", "AR"), class = "factor")), row.names = c(NA,
-9L), class = c("tbl_df", "tbl", "data.frame"))
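To order the segments within each stack, use position = position_stack(reverse = TRUE). This reverses the default stacking, so the first Age level (30) sits next to the axis.
To order the states along the axis, compute a per-state total with group_by() and sum(), then reorder the factor with fct_reorder() from the forcats package (loaded with the tidyverse). Because of coord_flip(), the states end up on the vertical axis.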
library(tidyverse)
CAPS_2019 %>%
group_by(PopName) %>% # one group per state
mutate(PopName1 = sum(Dx)) %>% # total Dx across all ages, used to order the states
ungroup() %>%
ggplot(aes(x = fct_reorder(PopName, PopName1), y = Dx, fill = factor(as.character(Age)))) +
geom_col(position = position_stack(reverse = TRUE)) +
theme_classic()+
coord_flip()+
labs(x = "State", y = "Deaths (%)", caption = (""), face = "bold", fill = "Age")
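The code above orders the states by a summary of Dx taken across all ages, while the question asks for the order of the age-65 values. A minimal base-R sketch, assuming the column names from the question's data (Age, Dx, PopName), reorders PopName by the age-65 Dx alone. The column name PopName_by65 is hypothetical; stats::reorder() is used here in place of fct_reorder().

```r
# Rebuild the question's data as a plain data.frame (same values as the dput() above)
CAPS_2019 <- data.frame(
  Age = factor(rep(c("30", "50", "65"), times = 3)),
  Dx = c(3.057, 7.847, 17.157, 2.851, 8.861, 21.885, 2.521, 7.889, 21.328),
  PopName = factor(rep(c("AK", "AL", "AR"), each = 3))
)

# Keep Dx only on the age-65 rows (0 elsewhere), sum it per state, and let
# stats::reorder() sort the levels of PopName by that per-state value (ascending)
CAPS_2019$PopName_by65 <- reorder(
  CAPS_2019$PopName,
  ifelse(CAPS_2019$Age == "65", CAPS_2019$Dx, 0),
  FUN = sum
)

levels(CAPS_2019$PopName_by65)
```

In the plot you would then map x = PopName_by65 instead of fct_reorder(PopName, PopName1); with coord_flip(), the last level (the largest age-65 value) ends up at the top.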

Related

Control order across factors in ggplot2

I have the plot below. I want to change the order so that the larger value comes first (so cyan would precede red), but I can't get it to work. What am I doing wrong?
This is my current code block so far:
ggplot(df, aes(x = Gene.Set.Size, y = OR, label =P.value, color = Method, group = Method)) +
geom_point(position=position_dodge(width=0.5)) +
ggrepel::geom_text_repel(size = 6, box.padding = 1, segment.angle = 20, position=position_dodge(width=0.5))+
geom_pointrange(aes(ymax = UpperCI, ymin = LowerCI),position=position_dodge(width=0.5)) +
theme_bw() +
theme(text=element_text(size=25),axis.text.x = element_text(angle = 45, hjust = 1)) +
ylab("Odds ratio") +
xlab("Gene set size") +
theme(plot.margin = unit(c(2,2,2,2), "cm"))
> dput(df)
structure(list(Method = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("MAGMA",
"Pairwise"), class = "factor"), P.value = c(8.74e-28, 1.33e-56,
5.57e-92, 1.63e-44, 4.23e-71, 2.78e-95), OR = c(1.39, 1.424668,
1.4, 1.513, 1.478208, 1.409563), UpperCI = c(1.481491, 1.487065,
1.446039, 1.601557, 1.417117, 1.455425), LowerCI = c(1.316829,
1.364601, 1.356358, 1.42, 1.541768, 1.365056), Gene.Set.Size = structure(c(1L,
2L, 3L, 1L, 2L, 3L), .Label = c("500", "1000", "2000"), class = "factor")), row.names = c(NA,
-6L), class = "data.frame")
You need to set the order of the factor levels; position_dodge() places the groups in level order.
library(ggplot2)
df <- structure(list(Method = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("MAGMA",
"Pairwise"), class = "factor"), P.value = c(8.74e-28, 1.33e-56,
5.57e-92, 1.63e-44, 4.23e-71, 2.78e-95), OR = c(1.39, 1.424668,
1.4, 1.513, 1.478208, 1.409563), UpperCI = c(1.481491, 1.487065,
1.446039, 1.601557, 1.417117, 1.455425), LowerCI = c(1.316829,
1.364601, 1.356358, 1.42, 1.541768, 1.365056), Gene.Set.Size = structure(c(1L,
2L, 3L, 1L, 2L, 3L), .Label = c("500", "1000", "2000"), class = "factor")), row.names = c(NA,
-6L), class = "data.frame")
# Reorder the factor levels so that Pairwise comes first
df$Method = factor(df$Method, levels=c("Pairwise", "MAGMA"))
ggplot(df, aes(x=Gene.Set.Size, y=OR, label=P.value,
group= Method, color=Method)) +
geom_point(position=position_dodge(width=0.5)) +
ggrepel::geom_text_repel(size = 6, box.padding = 1, segment.angle = 20, position=position_dodge(width=0.5))+
geom_pointrange(aes(ymax = UpperCI, ymin = LowerCI),position=position_dodge(width=0.5)) +
theme_bw() +
theme(text=element_text(size=25),axis.text.x = element_text(angle = 45, hjust = 1)) +
ylab("Odds ratio") +
xlab("Gene set size") +
theme(plot.margin = unit(c(2,2,2,2), "cm"))
df %>% mutate(Method = fct_relevel(Method, "Pairwise")) %>%
ggplot(aes(x = Gene.Set.Size, y = OR, label = P.value, color = Method, group = Method)) # + the rest of your ggplot2 layers
should do the job, provided the magrittr pipe %>% and the forcats package are loaded, e.g. with library(tidyverse).
You can simply reverse the ordering of the Method factor with forcats::fct_rev.
df$Method <- forcats::fct_rev(df$Method)
Alternatively, you can specify the first level when you first convert that column to a factor, through the levels argument of factor().
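If you would rather not load forcats, base R can do both reorderings. A minimal sketch, using a stand-in Method vector with the same levels as df$Method; factor(levels = rev(...)) mirrors fct_rev() and stats::relevel() mirrors fct_relevel() with a single level.

```r
# Stand-in for df$Method: levels in alphabetical order, as in the question
Method <- factor(c("MAGMA", "MAGMA", "MAGMA", "Pairwise", "Pairwise", "Pairwise"))

# Reverse the level order (same idea as forcats::fct_rev())
Method_rev <- factor(Method, levels = rev(levels(Method)))

# Move one level to the front (same idea as forcats::fct_relevel(Method, "Pairwise"))
Method_first <- relevel(Method, ref = "Pairwise")

levels(Method_rev)
levels(Method_first)
```

Only the level order changes; the values stay the same, so the plotting code does not need to change.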

connect points within position_dodged factor x-axis in ggplot2

I'm trying to add significance annotations to an errorbar plot with a factor x-axis and dodged groups within each level of the x-axis. It is a similar but NOT identical use case to this
My base errorbar plot is:
library(ggplot2)
library(dplyr)
pres_prob_pd = structure(list(x = structure(c(1, 1, 1, 2, 2, 2, 3, 3, 3), labels = c(`1` = 1,
`2` = 2, `3` = 3)), predicted = c(0.571584427222816, 0.712630712634987,
0.156061969566517, 0.0162388386564817, 0.0371877245103279, 0.0165022541901018,
0.131528946944238, 0.35927812866896, 0.0708662221985375), std.error = c(0.355802875027348,
0.471253661425626, 0.457109887762665, 0.352871728451576, 0.442646879181155,
0.425913568532558, 0.376552208691762, 0.48178172708116, 0.451758041335245
), conf.low = c(0.399141779923204, 0.496138837620712, 0.0701919316506831,
0.00819832576725402, 0.0159620304815404, 0.00722904089045731,
0.0675129352870401, 0.17905347369819, 0.030504893442457), conf.high = c(0.728233665534388,
0.861980236164486, 0.311759350126477, 0.031911364587827, 0.0842227723261319,
0.0372248587668487, 0.240584344249407, 0.590437963881823, 0.156035177669385
), group = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("certain",
"neutral", "uncertain"), class = "factor"), group_col = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("certain", "neutral",
"uncertain"), class = "factor"), language = structure(c(2L, 2L,
2L, 1L, 1L, 1L, 3L, 3L, 3L), .Label = c("english", "dutch", "german"
), class = "factor"), top = c(0.861980236164486, 0.861980236164486,
0.861980236164486, 0.0842227723261319, 0.0842227723261319, 0.0842227723261319,
0.590437963881823, 0.590437963881823, 0.590437963881823)), row.names = c(NA,
-9L), groups = structure(list(language = structure(1:3, .Label = c("english",
"dutch", "german"), class = "factor"), .rows = structure(list(
4:6, 1:3, 7:9), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, 3L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
#dodge
pd = position_dodge(.75)
#plot
p = ggplot(pres_prob_pd,aes(x=language,y=predicted,color=group,shape=group)) +
geom_point(position=pd,size=2) +
geom_errorbar(aes(ymax=conf.high,ymin=conf.low),width=.125,position=pd)
p
What I want to do is annotate the plot such that the contrasts between group within each level of language are annotated for significance. I've plotted points representing the relevant contrasts and (toy) sig. annotations as follows:
#bump function
f = function(x){
v = c()
bump=0.025
constant = 0
for(i in x){
v = c(v,i+constant+bump)
bump = bump + 0.075
}
v
}
#create contrasts
combs = data.frame(gtools::combinations(3, 2, v=c("certain", "neutral", "uncertain"), set=F, repeats.allowed=F)) %>%
mutate(contrast=c("cont_1","cont_2","cont_3"))
combs = rbind(combs %>% mutate(language = 'english'),
combs %>% mutate(language='dutch'),
combs %>% mutate(language = "german")) %>%
left_join(select(pres_prob_pd,language:top)%>%distinct(),by='language') %>%
group_by(language)
#long transform and calc y_pos
combs_long = mutate(combs,y_pos=f(top)) %>% gather(long, probability, X1:X2, factor_key=TRUE) %>% mutate(language=factor(language,levels=c("english","dutch","german"))) %>%
arrange(language,contrast)
#back to wide
combs_wide =combs_long %>% spread(long,probability)
combs_wide$p = rep(c('***',"*","ns"),3)
#plot
p +
geom_point(data=combs_long,
aes(x = language,
color=probability,
shape=probability,
y=y_pos),
inherit.aes = T,
position=pd,
size=2) +
geom_text(data=combs_wide,
aes(x=language,
label=p,
y=y_pos+.025,
group=X1),
color='black',
position=position_dodge(.75),
inherit.aes = F)
What I am failing to achieve is plotting a line connecting each of the contrasts of group within each level of language, as is standard when annotating significant group-wise differences. Any help much appreciated!
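One possible approach, sketched here under stated assumptions, is to compute the dodged x positions yourself and draw the brackets with geom_segment(). position_dodge(width) places the i-th of n groups at x + width * ((i - 0.5) / n - 0.5), so with width = 0.75 and three groups the offsets are -0.25, 0 and +0.25. The helper dodge_x() and the columns x_start and x_end are hypothetical, and the formula assumes every language contains all three groups, as in the data above.

```r
# Hypothetical helper: x position of group i (of n) after position_dodge(width)
# at discrete x position x_index (factor levels map to 1, 2, 3, ...)
dodge_x <- function(x_index, group_index, n_groups, width) {
  x_index + width * ((group_index - 0.5) / n_groups - 0.5)
}

groups <- c("certain", "neutral", "uncertain")
languages <- c("english", "dutch", "german")

# One row per pairwise contrast (X1 vs X2) within each language
pairs <- t(combn(groups, 2))
segs <- data.frame(
  language = factor(rep(languages, each = nrow(pairs)), levels = languages),
  X1 = rep(pairs[, 1], times = length(languages)),
  X2 = rep(pairs[, 2], times = length(languages))
)

# Bracket ends, in the same units ggplot2 uses for the dodged points
segs$x_start <- dodge_x(as.integer(segs$language), match(segs$X1, groups), 3, 0.75)
segs$x_end <- dodge_x(as.integer(segs$language), match(segs$X2, groups), 3, 0.75)
segs
```

After joining in y_pos as in the question, these columns could feed a layer such as geom_segment(data = segs, aes(x = x_start, xend = x_end, y = y_pos, yend = y_pos), inherit.aes = FALSE). Because the positions are already dodged, that layer should not also use position = pd.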

ggplot add aggregated summaries to a bar plot

I have the following data frame:
structure(list(StepsGroup = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L), .Label = c("(-Inf,3e+03]", "(3e+03,1.2e+04]", "(1.2e+04, Inf]"
), class = "factor"), GlucoseGroup = structure(c(1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L), .Label = c("<100", "100-180", ">180"
), class = "factor"), n = c(396L, 1600L, 229L, 787L, 4182L, 375L,
110L, 534L, 55L), freq = c(0.177977528089888, 0.719101123595506,
0.102921348314607, 0.147267964071856, 0.782559880239521, 0.0701721556886228,
0.157367668097282, 0.763948497854077, 0.0786838340486409)), class =
c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -9L), vars = "StepsGroup",
labels = structure(list(
StepsGroup = structure(1:3, .Label = c("(-Inf,3e+03]", "(3e+03,1.2e+04]",
"(1.2e+04, Inf]"), class = "factor")), class = "data.frame", row.names =
c(NA, -3L), vars = "StepsGroup", drop = TRUE), indices = list(0:2,
3:5, 6:8), drop = TRUE, group_sizes = c(3L, 3L, 3L), biggest_group_size =
3L)
I would like to create a stacked bar plot, and add a summary of each StepsGroup on top of each bar. So the first group will have 2225, the second 5344 and the third 699.
I am using the following script:
ggplot(d_stepsFastingSummary , aes(y = freq, x = StepsGroup, fill =
GlucoseGroup)) + geom_bar(stat = "identity") +
geom_text(aes(label = sum(n()), vjust = 0))
Everything before geom_text() works, but the last part gives the following error:
Error: This function should not be called directly
Any idea how to add the aggregated quantity?
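We could create a new data frame, stacked_df, holding the sum of n for each StepsGroup, and use it as the data for geom_text(). Here df is the question's d_stepsFastingSummary.
library(dplyr)
library(ggplot2)
stacked_df <- df %>% group_by(StepsGroup) %>% summarise(nsum = sum(n))
ggplot(df) +
geom_col(aes(y = freq, x = StepsGroup, fill = GlucoseGroup)) +
geom_text(data = stacked_df, aes(x = StepsGroup, y = 1.1, label = nsum)) # freq sums to 1, so y = 1.1 sits just above each bar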
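If you would rather not use dplyr, the same per-bar totals can be computed in base R. This sketch swaps summarise() for tapply() and rebuilds only the two columns it needs from the question's data; keeping the factor levels keeps the bars in their original order.

```r
# The StepsGroup and n columns from the question's data
step_levels <- c("(-Inf,3e+03]", "(3e+03,1.2e+04]", "(1.2e+04, Inf]")
steps_group <- factor(rep(step_levels, each = 3), levels = step_levels)
n <- c(396L, 1600L, 229L, 787L, 4182L, 375L, 110L, 534L, 55L)

# Sum n within each StepsGroup: one total per bar
totals <- tapply(n, steps_group, sum)

# A data frame shaped like stacked_df, ready for geom_text(data = ...)
stacked_df <- data.frame(
  StepsGroup = factor(names(totals), levels = step_levels),
  nsum = as.vector(totals)
)
stacked_df
```

The totals match the ones stated in the question: 2225, 5344 and 699.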

R: ggplot annotate geom_text with position_dodge and facets

I have an R dataframe data (made with dplyr) that I'm trying to plot with ggplot():
require(dplyr)
data <- structure(list(gGroup = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L), .Label = c("MC", "R", "UC"), class = "factor"),
Episode = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L), .Label = c("Morning", "Day", "Night", "24 hour"
), class = "factor"), variable = c("HF", "HF", "LF", "LF",
"HF", "HF", "LF", "LF", "HF", "HF", "LF", "LF"), parameter = c("RR",
"RT", "RR", "RT", "RR", "RT", "RR", "RT", "RR", "RT", "RR",
"RT"), mean = c(3.90575222833804, 4.24572828952087, 5.14491629837998,
3.88189313775535, 4.02908403079823, 3.91129824615597, 4.73913642980089,
3.63973850905423, 4.66445796048274, 4.21723744674943, 5.57765585365275,
4.01444148455851), sd = c(1.09129154084895, 1.43102672123806,
1.17782114274004, 1.33381488706382, 1.33497319178289, 1.22259231099975,
1.33329948427898, 1.09625319168102, 1.19876558625356, 1.73746797295816,
1.05862249404741, 1.91144835753868), se = c(0.199241664579179,
0.261268538538247, 0.215039736195078, 0.243520167060353,
0.471984298305965, 0.432251656867227, 0.471392553343098,
0.387584032867524, 0.215304655178374, 0.312058460044998,
0.190134212775724, 0.343306259564318)), .Names = c("gGroup",
"Episode", "variable", "parameter", "mean", "sd", "se"), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -12L), drop = TRUE, indices = list(
0:1, 2:3, 4:5, 6:7, 8:9, 10:11), group_sizes = c(2L, 2L,
2L, 2L, 2L, 2L), biggest_group_size = 2L, labels = structure(list(
gGroup = structure(c(1L, 1L, 2L, 2L, 3L, 3L), .Label = c("MC",
"R", "UC"), class = "factor"), Episode = structure(c(2L,
2L, 2L, 2L, 2L, 2L), .Label = c("Morning", "Day", "Night",
"24 hour"), class = "factor"), variable = c("HF", "LF", "HF",
"LF", "HF", "LF")), .Names = c("gGroup", "Episode", "variable"
), class = "data.frame", row.names = c(NA, -6L)))
Currently I'm using the following code to plot:
require(ggplot2)
require(ggthemes)
pd <- position_dodge(width=0.9)
p <- ggplot(data, aes(x = gGroup, y = mean, fill = variable)) +
facet_grid(parameter~Episode) +
geom_bar(stat="identity", position=pd) +
geom_errorbar(aes(ymin = mean-se, ymax = mean+se), width = .3, position=pd) +
theme_hc() + scale_fill_hc() +
labs(y = "Logit transform of spectral power (m/s2), mean±SE", x= NULL)
ann_text <- data.frame(gGroup = "MC", mean = 6, variable = "LF", parameter = "RR", Episode = "Day")
p + geom_text(aes(ymax = 6.5, width = .2), data = ann_text, label="*", position=pd)
This gives me the following plot:
I'm quite satisfied with the result, but as you can see the asterisk isn't aligned correctly. I looked it up online, and I read this and this and the manual.
Everywhere I look, people suggest using position=position_dodge(width=0.9), but that doesn't help in my case. I tried hjust to move the asterisk into place, but that doesn't work either. Oddly, my error bars are aligned correctly.
I feel like I'm overlooking something very simple, but I cannot figure out what it is.
I'm using R 3.1.3 on OSX 10.10.2, and loading the newest versions of ggplot2 and ggthemes.
position_dodge() only shifts elements when there is more than one group at the same x position. ann_text has a single row, so there is nothing to dodge and the asterisk stays centred over MC. Give ann_text one row per level of variable (variable = c("LF", "HF")) so that there is something to dodge, and supply the text through a label column. Below I assume you only want the * over the LF bar, so the HF label is empty.
ann_text <- data.frame(gGroup = rep("MC",2),
mean = 6,
variable = c("LF", 'HF'),
label = c("*", ""),
parameter = "RR",
Episode = "Day")
p + geom_text(aes(label = label), data = ann_text, position = pd) # ymax and width are not geom_text() aesthetics, so they are dropped

How to fix the following output plot by R? [duplicate]

I have the following plot:
library(reshape)
library(ggplot2)
library(gridExtra)
require(ggplot2)
data2<-structure(list(IR = structure(c(4L, 3L, 2L, 1L, 4L, 3L, 2L, 1L
), .Label = c("0.13-0.16", "0.17-0.23", "0.24-0.27", "0.28-1"
), class = "factor"), variable = structure(c(1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L), .Label = c("Real queens", "Simulated individuals"
), class = "factor"), value = c(15L, 11L, 29L, 42L, 0L, 5L, 21L,
22L), Legend = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("Real queens",
"Simulated individuals"), class = "factor")), .Names = c("IR",
"variable", "value", "Legend"), row.names = c(NA, -8L), class = "data.frame")
p <- ggplot(data2, aes(x =factor(IR), y = value, fill = Legend, width=.15))
data3<-structure(list(IR = structure(c(4L, 3L, 2L, 1L, 4L, 3L, 2L, 1L
), .Label = c("0.13-0.16", "0.17-0.23", "0.24-0.27", "0.28-1"
), class = "factor"), variable = structure(c(1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L), .Label = c("Real queens", "Simulated individuals"
), class = "factor"), value = c(2L, 2L, 6L, 10L, 0L, 1L, 4L,
4L), Legend = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("Real queens",
"Simulated individuals"), class = "factor")), .Names = c("IR",
"variable", "value", "Legend"), row.names = c(NA, -8L), class = "data.frame")
q<- ggplot(data3, aes(x =factor(IR), y = value, fill = Legend, width=.15))
##the plot##
q + geom_col(position = 'dodge', colour = 'black') + ylab('Frequency') + xlab('IR') + scale_fill_grey() +
theme(axis.text.x = element_text(colour = "black"), axis.text.y = element_text(colour = "black"),
panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.border = element_blank(), panel.background = element_blank(), axis.ticks.x = element_blank())
I want the y-axis to display only integers. Whether this is accomplished through rounding or through a more elegant method isn't really important to me.
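If you have the scales package, you can use pretty_breaks() without having to specify the breaks manually. Note that pretty_breaks() relies on pretty(), which usually, but not always, returns whole numbers; for a narrow range such as 0 to 1 it returns fractional breaks.
q + geom_col(position = 'dodge', colour = 'black') +
scale_y_continuous(breaks = scales::pretty_breaks())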
This is what I use:
ggplot(data3, aes(x = factor(IR), y = value, fill = Legend, width = .15)) +
geom_col(position = 'dodge', colour = 'black') +
scale_y_continuous(breaks = function(x) unique(floor(pretty(seq(0, (max(x) + 1) * 1.1)))))
With scale_y_continuous() and its breaks argument, you can set the y-axis breaks to exactly the integers you want to display:
ggplot(data2, aes(x = factor(IR), y = value, fill = Legend, width = .15)) +
geom_col(position = 'dodge', colour = 'black') +
scale_y_continuous(breaks = c(1, 3, 7, 10))
You can use a custom breaks function. For example, this function guarantees to only produce integer breaks:
int_breaks <- function(x, n = 5) {
l <- pretty(x, n)
l[abs(l %% 1) < .Machine$double.eps ^ 0.5]
}
Use as
+ scale_y_continuous(breaks = int_breaks)
It works by taking the default breaks, and only keeping those that are integers. If it is showing too few breaks for your data, increase n, e.g.:
+ scale_y_continuous(breaks = function(x) int_breaks(x, n = 10))
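To see why the filter is needed, compare pretty() with int_breaks() on a narrow range. This is only an illustration; the range c(0, 2) is an arbitrary stand-in for axis limits.

```r
# int_breaks() as defined above: keep only the breaks from pretty() that are whole numbers
int_breaks <- function(x, n = 5) {
  l <- pretty(x, n)
  l[abs(l %% 1) < .Machine$double.eps ^ 0.5]
}

# For a narrow range, pretty() proposes fractional steps ...
pretty(c(0, 2))
# ... and int_breaks() keeps only the integer ones
int_breaks(c(0, 2))
```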
These solutions did not work for me, and they did not explain how they work.
The breaks argument to the scale_*_continuous functions can be used with a custom function that takes the limits as input and returns breaks as output. By default, the axis limits will be expanded by 5% on each side for continuous data (relative to the range of data). The axis limits will likely not be integer values due to this expansion.
The solution I was looking for was to simply round the lower limit up to the nearest integer, round the upper limit down to the nearest integer, and then have breaks at integer values between these endpoints. Therefore, I used the breaks function:
brk <- function(x) seq(ceiling(x[1]), floor(x[2]), by = 1)
The required code snippet is:
scale_y_continuous(breaks = function(x) seq(ceiling(x[1]), floor(x[2]), by = 1))
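A small worked example of the arithmetic described above, assuming data running from 0 to 10 and the default 5% expansion on each side; the variable names are illustrative only.

```r
# Breaks function from this answer: every integer inside the (expanded) limits
brk <- function(x) seq(ceiling(x[1]), floor(x[2]), by = 1)

# Data running from 0 to 10, expanded by 5% of the range on each side
data_range <- c(0, 10)
limits <- data_range + c(-1, 1) * 0.05 * diff(data_range)

limits
brk(limits)
```

The expanded limits are -0.5 and 10.5, so rounding inwards gives one break at each integer from 0 to 10. Because it places a break at every integer, this suits small counts; for a wide range you would want a step larger than 1.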
The reproducible example from original question is:
data3 <-
structure(
list(
IR = structure(
c(4L, 3L, 2L, 1L, 4L, 3L, 2L, 1L),
.Label = c("0.13-0.16", "0.17-0.23", "0.24-0.27", "0.28-1"),
class = "factor"
),
variable = structure(
c(1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L),
.Label = c("Real queens", "Simulated individuals"),
class = "factor"
),
value = c(2L, 2L, 6L, 10L, 0L, 1L, 4L,
4L),
Legend = structure(
c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
.Label = c("Real queens",
"Simulated individuals"),
class = "factor"
)
),
row.names = c(NA,-8L),
class = "data.frame"
)
ggplot(data3, aes(
x = factor(IR),
y = value,
fill = Legend,
width = .15
)) +
geom_col(position = 'dodge', colour = 'black') + ylab('Frequency') + xlab('IR') +
scale_fill_grey() +
scale_y_continuous(
breaks = function(x) seq(ceiling(x[1]), floor(x[2]), by = 1),
expand = expansion(mult = c(0, 0.05))
) +
theme(axis.text.x=element_text(colour="black", angle = 45, hjust = 1),
axis.text.y=element_text(colour="Black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank(),
axis.ticks.x = element_blank())
I found this solution by Joshua Cook, and it worked pretty well:
integer_breaks <- function(n = 5, ...) {
fxn <- function(x) {
breaks <- floor(pretty(x, n, ...))
names(breaks) <- attr(breaks, "labels")
breaks
}
return(fxn)
}
q + geom_col(position = 'dodge', colour = 'black') +
scale_y_continuous(breaks = integer_breaks())
The source is:
https://joshuacook.netlify.app/post/integer-values-ggplot-axis/
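You can use the accuracy argument of scales::label_number() or scales::label_comma() for this. Note that accuracy only rounds the labels; the breaks themselves do not move, so a break at 0.5 is still drawn, just with a rounded label.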
fakedata <- data.frame(
x = 1:5,
y = c(0.1, 1.2, 2.4, 2.9, 2.2)
)
library(ggplot2)
# without the accuracy argument, you see .0 decimals
ggplot(fakedata, aes(x = x, y = y)) +
geom_point() +
scale_y_continuous(labels = scales::comma)
# with the accuracy argument, all displayed numbers are integers
ggplot(fakedata, aes(x = x, y = y)) +
geom_point() +
scale_y_continuous(labels = ~ scales::comma(.x, accuracy = 1))
# equivalent
ggplot(fakedata, aes(x = x, y = y)) +
geom_point() +
scale_y_continuous(labels = scales::label_comma(accuracy = 1))
# this works with scales::label_number() as well
ggplot(fakedata, aes(x = x, y = y)) +
geom_point() +
scale_y_continuous(labels = scales::label_number(accuracy = 1))
Created on 2021-08-27 by the reprex package (v2.0.0.9000)
All of the existing answers seem to require custom functions or fail in some cases.
This line makes integer breaks, where bad_scale_plot is your existing plot:
bad_scale_plot +
scale_y_continuous(breaks = scales::breaks_extended(Q = c(1, 5, 2, 4, 3)))
For more info, see the documentation ?labeling::extended (which is a function called by scales::breaks_extended).
Basically, the argument Q is a set of nice numbers that the algorithm tries to use for scale breaks. The original plot produces non-integer breaks (0, 2.5, 5, and 7.5) because the default value for Q includes 2.5: Q = c(1,5,2,2.5,4,3).
EDIT: as pointed out in a comment, non-integer breaks can occur when the y-axis has a small range. By default, breaks_extended() tries to make about n = 5 breaks, which is impossible when the range is too small. Quick testing shows that ranges wider than 0 < y < 2.5 give integer breaks (n can also be decreased manually).
This answer builds on @Axeman's answer to address kory's comment that if the data only goes from 0 to 1, no break is shown at 1. This seems to be caused by floating-point inaccuracy in pretty(): outputs that print as 1 are not identical to 1 (see the example at the end).
Therefore if you use
int_breaks_rounded <- function(x, n = 5) pretty(x, n)[round(pretty(x, n),1) %% 1 == 0]
with
+ scale_y_continuous(breaks = int_breaks_rounded)
both 0 and 1 are shown as breaks.
An example illustrating the difference from Axeman's answer:
testdata <- data.frame(x = 1:5, y = c(0,1,0,1,1))
p1 <- ggplot(testdata, aes(x = x, y = y))+
geom_point()
p1 + scale_y_continuous(breaks = int_breaks)
p1 + scale_y_continuous(breaks = int_breaks_rounded)
Both will work with the data provided in the initial question.
Illustration of why rounding is required
pretty(c(0,1.05),5)
#> [1] 0.0 0.2 0.4 0.6 0.8 1.0 1.2
identical(pretty(c(0,1.05),5)[6],1)
#> [1] FALSE
Google brought me to this question. I'm trying to show real numbers on a y scale, and the values are in the millions.
The comma() function from the scales package adds thousands separators to large numbers. This post on R-Bloggers explains a simple approach using comma():
library(scales)
big_numbers <- data.frame(x = 1:5, y = c(1000000:1000004))
big_numbers_plot <- ggplot(big_numbers, aes(x = x, y = y))+
geom_point()
big_numbers_plot + scale_y_continuous(labels = comma)
Enjoy R :)
One answer is in the documentation of the pretty() function itself. As pointed out in Setting axes to integer values in 'ggplot2', that function already contains the solution; you just have to make it work for small values. One option is to write a new function, as the author does; for me, a lambda function inside the breaks argument works:
... + scale_y_continuous(breaks = ~ unique(round(pretty(.x))))
It rounds the values generated by pretty() and then drops duplicates, so only integer breaks are created, whatever the scale of the values.
If your values are integers, here is another way of doing this with group = 1 and as.factor(value):
library(tidyverse)
data3<-structure(list(IR = structure(c(4L, 3L, 2L, 1L, 4L, 3L, 2L, 1L
), .Label = c("0.13-0.16", "0.17-0.23", "0.24-0.27", "0.28-1"
), class = "factor"), variable = structure(c(1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L), .Label = c("Real queens", "Simulated individuals"
), class = "factor"), value = c(2L, 2L, 6L, 10L, 0L, 1L, 4L,
4L), Legend = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("Real queens",
"Simulated individuals"), class = "factor")), .Names = c("IR",
"variable", "value", "Legend"), row.names = c(NA, -8L), class = "data.frame")
data3 %>%
mutate(value = as.factor(value)) %>%
ggplot(aes(x =factor(IR), y = value, fill = Legend, width=.15)) +
geom_col(position = 'dodge', colour='black', group = 1)
Created on 2022-04-05 by the reprex package (v2.0.1)
This is what I did
scale_x_continuous(labels = function(x) round(as.numeric(x)))
