I am trying to add labels (letters) above a barplot using the ggplot2 function geom_text. My bars are dodged with position=position_dodge(), so the labels need the same dodge. However, I would also like to use nudge_y to lift the labels off the bars. If I use both together, R complains that only one of the two options may be specified. I'd like to do something like this:
Tukey.labels <- geom_text(data=stats,
aes(x=factor(Treatment2), y=x.mean,
label=Tukey.dif),
size=4, nudge_y=3, # move letters in Y
position=position_dodge(0.5)) # move letters in X
The goal is a plot like the one in the image (not shown). Does anybody know a way to shift all my labels the same distance in Y while using position_dodge at the same time? I could not find an answer to this in other posts.
Hard to troubleshoot without a reproducible example. Hopefully this helps:
library(dplyr); library(tibble); library(ggplot2) # rownames_to_column() comes from tibble
ggplot(mtcars %>% rownames_to_column("car") ,
aes(as.factor(cyl), mpg, group = car)) +
geom_col(position = position_dodge(0.9)) +
geom_errorbar(aes(ymin = mpg - wt,
ymax = mpg + wt),
position = position_dodge(0.9)) +
geom_text(aes(label = gear, y = mpg + wt), vjust = -0.5,
position = position_dodge(0.9))
In the spirit of the original question, one can easily combine ggplot's position_nudge and position_dodge like this:
position_nudgedodge <- function(x = 0, y = 0, width = 0.75) {
ggproto(NULL, PositionNudgedodge,
x = x,
y = y,
width = width
)
}
PositionNudgedodge <- ggproto("PositionNudgedodge", PositionDodge,
x = 0,
y = 0,
width = 0.3,
setup_params = function(self, data) {
l <- ggproto_parent(PositionDodge,self)$setup_params(data)
append(l, list(x = self$x, y = self$y))
},
compute_layer = function(self, data, params, layout) {
d <- ggproto_parent(PositionNudge,self)$compute_layer(data,params,layout)
d <- ggproto_parent(PositionDodge,self)$compute_layer(d,params,layout)
d
}
)
Then you can use it like this:
Tukey.labels <- geom_text(data=stats,
aes(x=factor(Treatment2), y=x.mean, label=Tukey.dif),
size=4,
position=position_nudgedodge(y=3,width=0.5)
)
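If a custom position feels heavier than needed, a lighter sketch is to add the offset inside aes() and keep only position_dodge(). The `stats` table below is a made-up stand-in for the asker's data, and the `Group` column and the `offset` variable are hypothetical names used only for illustration.

```r
library(ggplot2)

# Hypothetical summary table standing in for the asker's `stats`
stats <- data.frame(
  Treatment2 = rep(c("A", "B"), each = 2),
  Group      = rep(c("g1", "g2"), times = 2),
  x.mean     = c(10, 12, 15, 9),
  Tukey.dif  = c("a", "ab", "b", "a")
)

offset <- 3  # vertical shift in data units, playing the role of nudge_y

p <- ggplot(stats, aes(x = factor(Treatment2), y = x.mean, group = Group)) +
  geom_col(position = position_dodge(0.5), width = 0.5) +
  # The shift is part of the y aesthetic, so position is free for the dodge
  geom_text(aes(y = x.mean + offset, label = Tukey.dif),
            size = 4, position = position_dodge(0.5))

# Computed label positions: x is dodged, y is shifted by `offset`
label_pos <- layer_data(p, 2)
```

Because the shift is expressed in data units, it scales with the y axis; vjust (as in the answer above) is the alternative when a shift relative to the text height is preferred.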
Similar to this question, I'm trying to add a, b, c, ... to a grid of facets so they can be referenced individually elsewhere. With scales = 'fixed', this is relatively easy, as you can even hardcode the x,y coordinates for a geom_text label. However, with scales = 'free', it's a pain to compute all the x,y coordinates for each facet so that the labels end up in the same location visually. Can this be done automatically?
A pure vanilla ggplot2 solution would be to use Inf, -Inf to snap text to the limits of each panel. To automatically get the labels you can use after_stat() to grab the PANEL internal variable. If you plot it as a label, you can have control over the offset from the panel edges by hiding the label itself and setting label.padding.
library(ggplot2)
x = data.frame(a=c('a','b','a','b'),b=c(1,1,2,2),v=runif(8))
ggplot(x,aes(x=v,y=v)) +
geom_point() +
facet_grid('a~b',scales='free') +
geom_label(
aes(x = -Inf, y = Inf, label = after_stat(ifelse(
duplicated(PANEL), "", letters[as.numeric(PANEL)]
))),
vjust = 1, hjust = 0,
fill = NA, label.size = 0, # Don't show box
label.padding = unit(5, "mm") # Control margins to panel bounds
)
If that spacing mechanism seems too finicky to you, you can use the ggpp::geom_text_npc() function to directly set relative coordinates for the text labels.
x = data.frame(a=c('a','b','a','b'),b=c(1,1,2,2),v=runif(8))
ggplot(x,aes(x=v,y=v)) +
geom_point() +
facet_grid('a~b',scales='free') +
ggpp::geom_text_npc(
aes(npcx = 0.05, npcy = 0.95,
label = after_stat(
ifelse(duplicated(PANEL), "", letters[as.numeric(PANEL)])
))
)
Created on 2022-10-18 by the reprex package (v2.0.1)
Here is a solution with letters as default, but allowing other labels too, and ... is also passed to geom_text.
library(ggplot2)
# main solution
add.letters = function(g,px,py,lab=NULL,...){
data = ggplot_build(g)$layout$layout
if (is.null(lab)){ data$lab = letters[1:nrow(data)] } else { data$lab = lab }
p.range = function(r,p){ r[1] + (r[2]-r[1])*p }
data = do.call(rbind,lapply(1:nrow(data),function(i){
ls = layer_scales(g,data[i,]$ROW,data[i,]$COL)
data.i = cbind(data[i,],
x = p.range(ls$x$range$range,px),
y = p.range(ls$y$range$range,py))
}))
g = g + geom_text(data=data,aes(label=lab,x=x,y=y),...)
}
# mwe
x = data.frame(a=c('a','b','a','b'),b=c(1,1,2,2),v=runif(8))
g = ggplot(x,aes(x=v,y=v)) +
geom_point() +
facet_grid('a~b',scales='free')
g = add.letters(g,.05,.95,lab=c('a','bb','ccc','dddd'),hjust='left')
print(g)
I'm trying to label the outliers in a geom_boxplot using ggrepel::geom_label_repel. It works nicely when there's only one grouping variable, but when I try it for multiple grouping variables I run into a problem. The position argument in ggrepel doesn't seem to work very consistently for some reason, see this example:
library(tidyverse)
library(ggrepel)
set.seed(1337)
df <- tibble(x = rnorm(500),
g1 = factor(sample(c('A','B'), 500, replace = TRUE)),
g2 = factor(sample(c('A','B'), 500, replace = TRUE)),
rownames = 1:500)
is_outlier <- function(x) {
return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
}
df_outliers <- df %>% group_by(g1, g2) %>% mutate(outlier=is_outlier(x))
ggplot(df_outliers, aes(x=g1, y=x, fill=g2)) +
geom_boxplot(width=0.3, position = position_dodge(0.5)) +
ggrepel::geom_label_repel(data=. %>% filter(outlier),
aes(label=rownames), position = position_dodge(0.8))
Is there a way to make the labels point to the accompanying dots using ggrepel?
You can try this:
ggplot(df_outliers,
aes(x=g1, y=x, fill=g2, label=rownames)) +
geom_boxplot(width = 0.3, position = position_dodge(0.5)) +
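geom_label_repel(data = . %>%
filter(outlier) %>%
mutate(rownames = as.character(rownames)) %>% # labels must be character so "" is a valid fill value
group_by(g1) %>%
complete(g2, fill = list(x = 0, rownames = "")),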
position = position_dodge(0.5),
box.padding = 1,
min.segment.length = 0,
show.legend = FALSE)
Explanations:
The data source for geom_label_repel() follows aosmith's suggestion to add the missing B-A combination, filling 0 for x (any number would do, as long as it's not the default NA) and "" for rownames (ggrepel won't plot empty labels, but will take them into account when dodging).
box.padding is set to 1 (increased from the default 0.25) to push the labels further away, so that the line segments are more visible.
min.segment.length is set to 0 (decreased from the default 0.5) to force line segments to be plotted, no matter how short they are.
(show.legend = FALSE is optional. I just don't like seeing "a" letter show up in the legend.)
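To see what the padding step does, here is a minimal sketch on a toy outlier table. The `outliers` data are invented for illustration, and for simplicity it calls complete(g1, g2) on ungrouped data rather than the grouped call used in the answer.

```r
library(tidyr)

# Toy outlier table: the B/B combination has no outliers at all
outliers <- data.frame(
  g1       = c("A", "A", "B"),
  g2       = c("A", "B", "A"),
  x        = c(3.1, -2.9, 3.3),
  rownames = c("12", "47", "301")
)

# Add the missing g1/g2 combination with a placeholder position and an empty
# label, so every dodge slot exists in every x category
padded <- complete(outliers, g1, g2, fill = list(x = 0, rownames = ""))
print(padded)
```

With every g1/g2 combination present, position_dodge() splits each x category into the same number of slots for the labels as for the boxplots, which is why the segments end up pointing at the right dots.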
Is it possible to use the ggridges package to draw sets of bars instead of ridgelines, similar to geom_col()?
I have data such as:
dt = tibble(
hr = c(1,2,3,4,1,2,3,4),
fr = c(.1,.5,.9,.1,.4,.9,.9,.4),
gr = c('Mon','Mon','Mon','Mon','Sun','Sun','Sun','Sun')
)
The code below draws a ridgeline:
ggplot(dt, aes(x=hr, y=gr, height=fr)) +
geom_ridgeline() + ylab(NULL)
As you can see it draws a line connecting the values. What I am looking for instead are individual columns, as in this plot:
ggplot(dt, aes(x=hr, y=fr)) +
geom_col() + ylab(NULL) +
facet_wrap(~gr)
Here is a solution tracing out the individual bars.
library(tidyverse)
library(ggridges)
dt = tibble(
hr = c(1,2,3,4,1,2,3,4),
fr = c(.1,.5,.9,.1,.4,.9,.9,.4),
gr = c('Mon','Mon','Mon','Mon','Sun','Sun','Sun','Sun')
)
# function that turns an x, y pair into the shape of a bar of given width
make_bar <- function(x, y, width = 0.9) {
xoff <- width/2
data.frame(x = c(x-xoff*(1+2e-8), x-xoff*(1+1e-8), x-xoff, x+xoff, x+xoff*(1+1e-8), x+xoff*(1+2e-8)),
height = c(NA, 0, y, y, 0, NA))
}
# convert data table using make_bar function
dt %>%
mutate(bars = map2(hr, fr, ~make_bar(.x, .y))) %>%
unnest(bars) -> dt_bars # name the list-column explicitly; a bare unnest() warns in tidyr >= 1.0
ggplot(dt_bars, aes(x=x, y=gr, height=height)) +
geom_ridgeline() + ylab(NULL)
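To make the trick more concrete, the sketch below repeats make_bar() from the answer and prints what it returns for a single point; the input values (x = 2, y = 0.5) are arbitrary.

```r
# Same helper as in the answer above, repeated so this snippet runs on its own
make_bar <- function(x, y, width = 0.9) {
  xoff <- width / 2
  data.frame(
    x = c(x - xoff * (1 + 2e-8), x - xoff * (1 + 1e-8), x - xoff,
          x + xoff, x + xoff * (1 + 1e-8), x + xoff * (1 + 2e-8)),
    height = c(NA, 0, y, y, 0, NA)
  )
}

bar <- make_bar(2, 0.5)
print(bar)
```

Each bar becomes six points: the 0 heights just outside the bar edges give it near-vertical sides, and the NA heights on either end break the ridgeline so that neighbouring bars are not joined.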
I am trying to write a function that uses ggplot but allows user specification of several of the plotting variables. However I'm having trouble getting it to work as a function (receiving an error message: see below).
A small example dataset and working implementation are provided below, together with my attempt at the function and the associated error. I'm sure it is to do with non-standard evaluation (NSE), but I'm unsure how to get around it given my use of filter within the function, and my various attempts have been in vain.
library(dplyr)
library(ggplot2)
df<-data.frame(Date=c(seq(1:50),seq(1:50)), SRI=runif(100,-2,2), SITE=(c(rep("A",50), rep("B", 50))))
ggplot() +
geom_linerange(aes(x = Date, ymin = 0, ymax = SRI), colour = I('blue'), data = filter(df, SRI>0)) +
geom_linerange(aes(x = Date, ymin = SRI, ymax = 0), colour = I('red'), data = filter(df, SRI<=0)) +
facet_wrap(~SITE) +
labs(x = 'Date', y = "yvar", title = "Plot title")
The above works, but when implemented as a function:
plot_fun <- function(df, x, y, ylab="y-lab", plot_title="Title", facets) {
ggplot() +
geom_linerange(aes(x = x, ymin = 0, ymax = y), colour = I('blue'), data = filter(df, y > 0)) +
geom_linerange(aes(x = x, ymin = y, ymax = 0), colour = I('red'), data = filter(df, y <= 0)) +
facet_wrap(~ facets) +
labs(x = 'Date', y = ylab, title = plot_title)
return(p)
}
plot_fun(df, x="Date", y="SRI", ylab="y-lab", plot_title="Title", facets="SITE")
I get the following "Error: Aesthetics must be either length 1 or the same as the data (1): x, ymin, max".
I've tried various approaches using as_string and filter_, but all have been unsuccessful.
Any help much appreciated.
Regards
Nick
You'll need to switch to aes_string as you expected and change your facet_wrap code to either take the facets argument as a formula or remove the tilde as in the answers to this question. You'll also need to switch to using filter_, which can be used along with interp from package lazyeval.
library(lazyeval)
Here is your function with the changes I outlined and the resulting plot:
plot_fun <- function(df, x, y, ylab = "y-lab", plot_title = "Title", facets) {
ggplot() +
geom_linerange(aes_string(x = x, ymin = 0, ymax = y), colour = I('blue'),
data = filter_(df, interp(~var > 0, var = as.name(y)))) +
geom_linerange(aes_string(x = x, ymin = y, ymax = 0), colour = I('red'),
data = filter_(df, interp(~var <= 0, var = as.name(y)))) +
facet_wrap(facets) +
labs(x = 'Date', y = ylab, title = plot_title)
}
plot_fun(df, x="Date", y="SRI", facets="SITE")
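Since aes_string() and filter_() have been deprecated in later ggplot2 and dplyr releases, a sketch of the same function using the .data pronoun may be useful. The name plot_fun2 is hypothetical, and the example data are regenerated here so the snippet runs on its own.

```r
library(dplyr)
library(ggplot2)

# Hypothetical rewrite of plot_fun(): column names are passed as strings and
# looked up with .data[[...]] in both filter() and aes()
plot_fun2 <- function(df, x, y, ylab = "y-lab", plot_title = "Title", facets) {
  pos <- filter(df, .data[[y]] > 0)
  neg <- filter(df, .data[[y]] <= 0)
  ggplot() +
    geom_linerange(aes(x = .data[[x]], ymin = 0, ymax = .data[[y]]),
                   colour = "blue", data = pos) +
    geom_linerange(aes(x = .data[[x]], ymin = .data[[y]], ymax = 0),
                   colour = "red", data = neg) +
    facet_wrap(facets) +   # a character vector of column names is accepted
    labs(x = x, y = ylab, title = plot_title)
}

set.seed(1)
df <- data.frame(Date = rep(1:50, 2),
                 SRI  = runif(100, -2, 2),
                 SITE = rep(c("A", "B"), each = 50))

p <- plot_fun2(df, x = "Date", y = "SRI", facets = "SITE")
print(p)
```

This assumes a ggplot2 version that supports the .data pronoun inside aes() (3.0.0 or later); no lazyeval dependency is needed.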
I would like to draw a hollow histogram that has no vertical bars drawn inside of it, but just an outline. I couldn't find any way to do it with geom_histogram. The geom_step+stat_bin combination seemed like it could do the job. However, the bins of geom_step+stat_bin are shifted by a half bin either to the right or to the left, depending on the step's direction= parameter value. It seems like it is doing its "steps" WRT bin centers. Is there any way to change this behavior so it would do the "steps" at bin edges?
Here's an illustration:
d <- data.frame(x=rnorm(1000))
qplot(x, data=d, geom="histogram",
breaks=seq(-4,4,by=.5), color=I("red"), fill = I("transparent")) +
geom_step(stat="bin", breaks=seq(-4,4,by=.5), color="black", direction="vh")
I propose making a new Geom like so (this uses the proto-based internals of ggplot2 versions before 2.0.0):
library(ggplot2)
library(proto)
geom_stephist <- function(mapping = NULL, data = NULL, stat="bin", position="identity", ...) {
GeomStepHist$new(mapping=mapping, data=data, stat=stat, position=position, ...)
}
GeomStepHist <- proto(ggplot2:::Geom, {
objname <- "stephist"
default_stat <- function(.) StatBin
default_aes <- function(.) aes(colour="black", size=0.5, linetype=1, alpha = NA)
reparameterise <- function(., df, params) {
transform(df,
ymin = pmin(y, 0), ymax = pmax(y, 0),
xmin = x - width / 2, xmax = x + width / 2, width = NULL
)
}
draw <- function(., data, scales, coordinates, ...) {
data <- as.data.frame(data)[order(data$x), ]
n <- nrow(data)
i <- rep(1:n, each=2)
newdata <- rbind(
transform(data[1, ], x=xmin, y=0),
transform(data[i, ], x=c(rbind(data$xmin, data$xmax))),
transform(data[n, ], x=xmax, y=0)
)
rownames(newdata) <- NULL
GeomPath$draw(newdata, scales, coordinates, ...)
}
guide_geom <- function(.) "path"
})
This also works for non-uniform breaks. To illustrate the usage:
d <- data.frame(x=runif(1000, -5, 5))
ggplot(d, aes(x)) +
geom_histogram(breaks=seq(-4,4,by=.5), color="red", fill=NA) +
geom_stephist(breaks=seq(-4,4,by=.5), color="black")
This isn't ideal, but it's the best I can come up with:
h <- hist(d$x,breaks=seq(-4,4,by=.5))
d1 <- data.frame(x = h$breaks,y = c(h$counts,NA))
ggplot() +
geom_histogram(data = d,aes(x = x),breaks = seq(-4,4,by=.5),
color = "red",fill = "transparent") +
geom_step(data = d1,aes(x = x,y = y),stat = "identity")
Yet another one. Use ggplot_build to build a plot object of the histogram for rendering. From this object x and y values are extracted, to be used for geom_step. Use by to offset x values.
by <- 0.5
p1 <- ggplot(data = d, aes(x = x)) +
geom_histogram(breaks = seq(from = -4, to = 4, by = by),
color = "red", fill = "transparent")
df <- ggplot_build(p1)$data[[1]][ , c("x", "y")]
p1 +
geom_step(data = df, aes(x = x - by/2, y = y))
Edit following comment from @Vadim Khotilovich (Thanks!)
The xmin from the plot object can be used instead (-> no need for offset adjustment)
df <- ggplot_build(p1)$data[[1]][ , c("xmin", "y")]
p1 +
geom_step(data = df, aes(x = xmin, y = y))
An alternative, also less than ideal:
qplot(x, data=d, geom="histogram", breaks=seq(-4,4,by=.5), color=I("red"), fill = I("transparent")) +
stat_summary(aes(x=round(x * 2 - .5) / 2, y=1), fun.y=length, geom="step")
This misses some bins, which you can probably add back with a bit of tweaking. The only (somewhat meaningless) advantage is that it stays more within ggplot than @Joran's answer, though even that is debatable.
Answering my own comment from earlier today: here is a modified version of @RosenMatev's answer, updated for v2 (ggplot2_2.0.0) using ggproto:
GeomStepHist <- ggproto("GeomStepHist", GeomPath,
required_aes = c("x"),
draw_panel = function(data, panel_scales, coord, direction) {
data <- as.data.frame(data)[order(data$x), ]
n <- nrow(data)
i <- rep(1:n, each=2)
newdata <- rbind(
transform(data[1, ], x=x - width/2, y=0),
transform(data[i, ], x=c(rbind(data$x-data$width/2, data$x+data$width/2))),
transform(data[n, ], x=x + width/2, y=0)
)
rownames(newdata) <- NULL
GeomPath$draw_panel(newdata, panel_scales, coord)
}
)
geom_step_hist <- function(mapping = NULL, data = NULL, stat = "bin",
direction = "hv", position = "stack", na.rm = FALSE,
show.legend = NA, inherit.aes = TRUE, ...) {
layer(
data = data,
mapping = mapping,
stat = stat,
geom = GeomStepHist,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
direction = direction,
na.rm = na.rm,
...
)
)
}
TLDR: use geom_step(..., direction = "mid")
This has become much easier since Daniel Mastropietro and Dewey Dunnington implemented "mid" as an additional option for the direction argument of geom_step in ggplot2 v3.3.0:
library(ggplot2)
set.seed(1)
d <- data.frame(x = rnorm(1000))
ggplot(d, aes(x)) +
geom_histogram(breaks = seq(-4, 4, by=.5), color="red", fill = "transparent") +
geom_step(stat="bin", breaks=seq(-4, 4, by=.5), color = "black", direction = "mid")
Below, for reference, the code from the question formatted like above answer:
ggplot(d, aes(x)) +
geom_histogram(breaks = seq(-4, 4, by=.5), color = "red", fill = "transparent") +
geom_step(stat="bin", breaks = seq(-4, 4, by=.5), color = "black", direction = "vh")
Created on 2020-09-02 by the reprex package (v0.3.0)
A simple way to do something similar to @Rosen Matev's answer (which does not work with ggplot2_2.0.0, as mentioned by @julou) is to:
1) calculate the bin counts manually (using a small function as shown below)
2) use geom_step()
Hope this helps!
library(ggplot2)
# d must be a data frame with a numeric column named y; binw is the bin width
geom_step_hist <- function(d, binw) {
  bin <- min(d$y)             # start one bin early so the first count is 0 (draws the left vertical edge)
  max <- max(d$y) + binw * 2  # run past the data so the last count is 0 (draws the right vertical edge)
  xx <- NULL
  yy <- NULL
  while (bin <= max) {
    n <- sum(d$y < bin & d$y >= (bin - binw))  # number of values in [bin - binw, bin)
    yy <- c(yy, n)
    xx <- c(xx, bin - binw)
    bin <- bin + binw
  }
  data.frame(xx, yy)
}
# example call on simulated data
dd <- geom_step_hist(data.frame(y = rnorm(1000)), binw = 0.5)
ggplot(dd, aes(x = xx, y = yy)) +
  geom_step()