How can I control the x position of boxplots in ggplot2?

First, a quick example to set the stage:
library(ggplot2)
set.seed(123)
dat <- data.frame(
x=rep( c(1, 2, 4, 7), times=25 ),
y=rnorm(100),
gp=rep(1:2, each=50)
)
p <- ggplot(dat, aes(x=factor(x), y=y))
p + geom_boxplot(aes(fill = factor(gp)))
I would like to produce a similar plot, except with control over the x position of each set of boxplots. My first guess was using a non-factor x aesthetic that controls the position along the x-axis of these box plots. However, once I try to do this it seems like geom_boxplot doesn't interpret the aesthetics as I would hope.
p + geom_boxplot( aes(x=x, y=y, fill=factor(gp)) )
In particular, when x is not a factor, geom_boxplot seems to collapse all the x values together and draws only one box per fill group.
Is there a way to control the x position of boxplots with ggplot2? Either through specifying a distance between each level of a factor aesthetic, some more clever use of non-factor aesthetics, or otherwise?

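You can use the limits argument of scale_x_discrete() to set positions (ticks) for the x axis. Limits with no data (3, 5 and 6 here) are kept as empty slots, so each set of boxes lands at its numeric position.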
p <- ggplot(dat, aes(x=factor(x), y=y))
p + geom_boxplot(aes(fill = factor(gp))) +
scale_x_discrete(limits=1:7)
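A related sketch, assuming x should stay a factor: declaring all seven positions as levels with factor(x, levels = 1:7) and keeping the empty ones with scale_x_discrete(drop = FALSE) gives the same spacing without listing limits. The helper column xf is only for illustration.

```r
library(ggplot2)

set.seed(123)
dat <- data.frame(
  x  = rep(c(1, 2, 4, 7), times = 25),
  y  = rnorm(100),
  gp = rep(1:2, each = 50)
)

# Hypothetical helper column: every position 1..7 is a level, even empty ones
dat$xf <- factor(dat$x, levels = 1:7)

p <- ggplot(dat, aes(x = xf, y = y, fill = factor(gp))) +
  geom_boxplot() +
  # Keep unused levels so that 3, 5 and 6 still occupy a slot on the axis
  scale_x_discrete(drop = FALSE)

print(p)
```

The boxes for x = 4 and x = 7 then sit in the 4th and 7th slots, with gaps where there is no data.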

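You can also do this with the group aesthetic. However, passing x alone to group doesn't work: each x value then contains both gp levels, so only one box is drawn per x and it can't be filled by gp.
ggplot() +
geom_boxplot(data=dat, aes(x=x, y=y, fill=factor(gp), group=x))
Giving each x/gp combination its own group does work: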
ggplot() +
geom_boxplot(data=dat, aes(x=x, y=y, fill=factor(gp), group=paste(x, gp)))
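To see why group = x alone is not enough, here is a minimal sketch comparing the two groupings. interaction(x, gp) is used in place of paste(x, gp); both give one group per x/gp pair. Counting the rows in layer_data() shows how many boxes each version computes.

```r
library(ggplot2)

set.seed(123)
dat <- data.frame(
  x  = rep(c(1, 2, 4, 7), times = 25),
  y  = rnorm(100),
  gp = rep(1:2, each = 50)
)

# group = x: each group mixes both gp levels, so fill is not constant
# within a group and only one box per x value is computed
p_x <- ggplot(dat, aes(x = x, y = y, fill = factor(gp), group = x)) +
  geom_boxplot()

# group = interaction(x, gp): one group (one box) per x/gp pair, placed at
# the numeric x value and dodged side by side
p_xgp <- ggplot(dat, aes(x = x, y = y, fill = factor(gp),
                         group = interaction(x, gp))) +
  geom_boxplot()

print(p_xgp)
```

Expect 4 boxes for p_x (recent ggplot2 versions also warn that fill was dropped) and 8 for p_xgp.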

Related

Is it possible to make a column plot using ggplot in which the column fill is controlled by a third variable?

I have a data frame with three continuous variables (x, y, z). I want a column plot in which x defines the x-axis position of the columns, y defines the length of the columns, and the column colors (function of y) are defined by z. The test code below shows the setup.
library(ggplot2)
library(viridis)
# Create a dummy data frame
x <- c(rep(0.0, 5),rep(0.5,10),rep(1.0,15))
y <- c(seq(0.0,-5,length.out=5),
seq(0.0,-10,length.out=10),
seq(0.0,-15,length.out=15))
z <- c(seq(10,0,length.out=5),
seq(8,0,length.out=10),
seq(6,0,length.out=15))
df <- data.frame(x=x, y=y, z=z)
pbase <- ggplot(df, aes(x=x, y=y, fill=z))
ptest <- pbase + geom_col(width=0.5, position="identity") +
scale_fill_viridis(option="turbo",
limits = c(0,10),
breaks=seq(0,10,2.5),
labels=c("0","2.5","5.0","7.5","10.0"))
print(ptest)
The legend has the correct colors but the columns do not. Perhaps this is not the correct way to do this type of plot. I tried using geom_bar(), which creates bars with the correct colors, but the y-values are incorrect.
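It looks like you have three x values that appear 5, 10 and 15 times respectively. With position="identity" the bars at each x are drawn on top of one another, so only the last-drawn bar (the longest one, with z = 0) is visible, which is why the column colors look wrong. Do you want the bars to be overlaid like this? If you add alpha = 0.5 to the geom_col() call, you'll see the overlapping bars.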
Alternatively, you might use dodging to show the bars next to one another instead of on top of one another.
ggplot(df, aes(x=x, y=y, fill=z, group = z)) +
geom_col(width=0.5, position=position_dodge()) +
scale_fill_viridis_c(option="turbo", # built into ggplot2 since 3.0.0 (2018)
limits = c(0,10),
breaks=seq(0,10,2.5),
labels=c("0","2.5","5.0","7.5","10.0"))
Or you might plot the data in order of y so that the smaller bars appear on top, visibly:
ggplot(dplyr::arrange(df,y), aes(x=x, y=y, fill=z))+
geom_col(width=0.5, position="identity") +
scale_fill_viridis_c(option="turbo",
limits = c(0,10),
breaks=seq(0,10,2.5),
labels=c("0","2.5","5.0","7.5","10.0"))
I solved this by using geom_tile() in place of geom_col().
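The poster doesn't show their geom_tile() code, so this is only a sketch of the idea, assuming each row should become one colored tile stacked down its column. The tile height (the hypothetical column h) is the spacing between consecutive y values within each x, computed with base ave(), so the tiles of one column touch without overlapping.

```r
library(ggplot2)

# Same dummy data as in the question
x <- c(rep(0.0, 5), rep(0.5, 10), rep(1.0, 15))
y <- c(seq(0.0, -5,  length.out = 5),
       seq(0.0, -10, length.out = 10),
       seq(0.0, -15, length.out = 15))
z <- c(seq(10, 0, length.out = 5),
       seq(8,  0, length.out = 10),
       seq(6,  0, length.out = 15))
df <- data.frame(x = x, y = y, z = z)

# Hypothetical helper column: spacing between consecutive y values per x
df$h <- ave(df$y, df$x, FUN = function(v) diff(range(v)) / (length(v) - 1))

p <- ggplot(df, aes(x = x, y = y, fill = z)) +
  # Each row becomes one tile centred on (x, y), colored by its own z
  geom_tile(aes(height = h), width = 0.5) +
  scale_fill_viridis_c(option = "turbo",
                       limits = c(0, 10),
                       breaks = seq(0, 10, 2.5),
                       labels = c("0", "2.5", "5.0", "7.5", "10.0"))

print(p)
```

Because the tiles are centred on y, the top tile of each column pokes half a tile above 0; shifting y by h / 2 would align the tops at 0 if that matters.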

How to change alignment of bars in ggplot

I'm trying to make a bar plot of length class on the x axis against frequency. Lengths are rounded down, so I'd like the bar for 0cm to plot to the right of 0, rather than centred on 0.
Consider the code:
library(ggplot2)
set.seed(0)
d <- data.frame(x=seq(0,50,5), y=runif(11))
ggplot(d, aes(x=x, y=y)) + geom_col()
which centres each bar on its x value.
I'd like it to be similar to
ggplot(d, aes(x=x+mean(diff(x)/2), y=y)) + geom_col()
Is there a way to do this by changing the position argument for geom_col rather than manipulating the data directly?
Since ggplot2 3.4.0 you can use the just argument in geom_col() or geom_bar(). From the documentation:
Adjustment for column placement. Set to 0.5 by default, meaning that
columns will be centered about axis breaks. Set to 0 or 1 to place
columns to the left/right of axis breaks. Note that this argument may
have unintended behaviour when used with alternative positions, e.g.
position_dodge().
Here is a reproducible example:
library(ggplot2)
set.seed(0)
d <- data.frame(x=seq(0,50,5), y=runif(11))
ggplot(d, aes(x=x, y=y)) + geom_col(just = 1)
Created on 2022-11-09 with reprex v2.0.2
Update
A workaround that should work even if the data changes:
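library(ggplot2)
# Random spacing of 1 to 10 units (a step of 0 would make seq() fail)
d <- data.frame(x= seq(0, 50, by = sample(1:10, 1)))
d$y <- runif(nrow(d))
# Width = 90% of the smallest gap; nudging right by half the width puts each bar's left edge on its x value
w <- min(diff(sort(d$x)))*0.9
ggplot(d, aes(x=x, y=y)) +
geom_col(width = w, position = position_nudge(x = w/2))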
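The same nudge trick can be wrapped in a small function that also copes with duplicated or unsorted x values. This is a minimal sketch; the name geom_col_right() and its frac argument are made up for illustration and are not part of ggplot2.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): columns whose left edge sits
# on each x value, with width = frac * smallest gap between distinct x
geom_col_right <- function(x, frac = 0.9, ...) {
  gap <- min(diff(sort(unique(x))))  # unique() avoids a zero gap for repeated x
  w <- gap * frac
  # Shift each column right by half its width so that xmin lands on x
  geom_col(width = w, position = position_nudge(x = w / 2), ...)
}

set.seed(0)
d <- data.frame(x = seq(0, 50, 5), y = runif(11))

p <- ggplot(d, aes(x = x, y = y)) + geom_col_right(d$x)
print(p)
```

Passing the x vector explicitly keeps the helper simple; the trade-off is that it must be the same column you map to x.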
Original answer
Hope this helps:
library(ggplot2)
set.seed(0)
d <- data.frame(x=seq(0,50,5), y=runif(11))
ggplot(d, aes(x=x, y=y)) +
geom_col(position = position_nudge(x = 2.25)) # half the default width (0.9 * 5 = 4.5)

ggplot2: Varying facet width with independent `Y` axes

Dummy data
d = data.frame(
x = factor(LETTERS[c(1,2,3,4,1,2,3,4,1,2,1,2,1,2,1,2)]),
y = c(100,80,70,60,130,90,65,60,2,3,3,3,2,2,1,2),
grid = rep(letters[1:2], each=8)
)
Issue
ggplot(d, aes(x=x, y=y)) + facet_grid(~grid, scales="free",space="free_x") + geom_point()
I like this graph. My only issue is that both grids use the same Y axis. So, I tried using facet_wrap instead of facet_grid and got
ggplot(d, aes(x=x, y=y)) + facet_wrap(~grid, scales="free") + geom_point()
But unfortunately, facet_wrap does not have a "space" parameter and as a result the right and the left graph are of the same width.
Question
How can I make the spacing between the levels of d$x the same in both facets (so that the facets have different widths) AND give each facet its own Y axis? I would also like to keep the facets aligned horizontally.
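Convert the plot to a gtable with ggplotGrob() and modify the panel widths in the table. Because facet_grid() shares one y axis across a row of panels, start from the facet_wrap() version to keep a separate y axis per facet:
library(ggplot2)
library(grid)
# Capture the plot
q = ggplot(d, aes(x=x, y=y)) + facet_wrap(~grid, scales="free") + geom_point()
gt = ggplotGrob(q)
# Modify the widths (the panel column indices depend on the plot and on the
# ggplot2 version; check gt$layout to find them)
gt$widths[5] = unit(8, "cm")
gt$widths[9] = unit(4, "cm")
# Plot the graph
grid.newpage()
grid.draw(gt)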
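A sketch of a less fragile version, assuming the facet_wrap() plot from the question: it looks up the panel columns in gt$layout instead of hard-coding their indices, and sets relative "null" widths proportional to the number of x levels in each facet (4 and 2 here), so the panels still resize with the device. The names panel_cols and n_levels are illustrative.

```r
library(ggplot2)
library(grid)

d <- data.frame(
  x = factor(LETTERS[c(1,2,3,4,1,2,3,4,1,2,1,2,1,2,1,2)]),
  y = c(100,80,70,60,130,90,65,60,2,3,3,3,2,2,1,2),
  grid = rep(letters[1:2], each = 8)
)

# facet_wrap() gives each panel its own y axis; scales = "free" also drops
# the x levels a panel does not use
q <- ggplot(d, aes(x = x, y = y)) +
  facet_wrap(~grid, scales = "free") +
  geom_point()
gt <- ggplotGrob(q)

# Find the gtable columns that hold panels instead of hard-coding them
panel_cols <- sort(unique(gt$layout$l[grepl("^panel", gt$layout$name)]))

# Relative widths proportional to the number of x levels in each facet
n_levels <- tapply(as.character(d$x), d$grid, function(v) length(unique(v)))
gt$widths[panel_cols] <- unit(as.numeric(n_levels), "null")

grid.newpage()
grid.draw(gt)
```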

Add label to abline ggplot2 [duplicate]

I'd like to label a horizontal line on a ggplot with multiple series, without associating the label with a series. R ggplot2: Labelling a horizontal line on the y axis with a numeric value asks about the single-series case, which geom_text solves. However, geom_text associates the label with one of the series via color and legend.
Consider the same example from that question, with another color column:
library(ggplot2)
df <- data.frame(y=1:10, x=1:10, col=c("a", "b")) # Added col
h <- 7.1
plot1 <- ggplot(df, aes(x=x, y=y, color=col)) + geom_point()
plot2 <- plot1 + geom_hline(aes(yintercept=h))
# Applying top answer https://stackoverflow.com/a/12876602/1840471
plot2 + geom_text(aes(0, h, label=h, vjust=-1))
How can I label the line without associating the label to one of the series?
Is this what you had in mind?
library(ggplot2)
df <- data.frame(y=1:10, x=1:10, col=c("a", "b")) # Added col
h <- 7.1
ggplot(df, aes(x=x,y=y)) +
geom_point(aes(color=col)) +
geom_hline(yintercept=h) +
geom_text(data=data.frame(x=0,y=h), aes(x, y), label=h, vjust=-1)
First, you can make the color mapping local to the points layer. Second, you do not have to put all the aesthetics into calls to aes(...), only those you want mapped to columns of the dataset. Third, you can give a layer its own dataset using data=... in the call to a specific geom_*.
You can use annotate instead:
plot2 + annotate(geom="text", label=h, x=1, y=h, vjust=-1)
Edit: Removed drawback that x is required, since that's also true of geom_text.
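A variation on the annotate() answer, as a sketch: x = -Inf pins the label to the left edge of the panel whatever the x range, so no data-dependent x is needed, and hjust = 0 keeps the text inside the panel. Because annotate() does not inherit the plot's aesthetics, the label gets no color and no legend entry.

```r
library(ggplot2)

df <- data.frame(y = 1:10, x = 1:10, col = c("a", "b"))
h <- 7.1

p <- ggplot(df, aes(x = x, y = y, color = col)) +
  geom_point() +
  geom_hline(yintercept = h) +
  # annotate() ignores the plot-level color mapping; x = -Inf means
  # "left edge of the panel", and hjust = 0 starts the text there
  annotate("text", x = -Inf, y = h, label = h, hjust = 0, vjust = -1)

print(p)
```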

How to fix the geom_text label position so it is always on the middle of the plot?

I would like to create a function that produce a ggplot graph.
library(ggplot2)
library(data.table)
data1 <- data.table(x=1:5, y=1:5, z=c(1,2,1,2,1))
data2 <- data.table(x=1:5, y=11:15, z=c(1,2,1,2,1))
myfun <- function(data){
ggplot(data, aes(x=x, y=y)) +
geom_point() +
geom_text(aes(label=y), y=3) +
facet_grid(z~.)
}
myfun(data2)
It is supposed to add some text labels to the graph. However, without knowing the data in advance I cannot adjust the vertical position of the text manually. In particular, I don't want the label to move with the data: I want it to always stay about 1/4 of the way down from the top of each plot (top-middle).
How can I do that?
Is there a function that returns the upper and lower y limits, so that I can assign y = (upper + lower) / 2 or something similar?
Setting either the x or y position in geom_text(...) relative to the plot scale in a facet is actually a pretty big problem. @agstudy's solution works if the y scale is the same for all facets. This is because, when calculating range (or max, min, etc.), ggplot uses the unsubsetted data, not the data subsetted for the appropriate facet (see this question).
You can achieve what you want using auxiliary tables, though.
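library(ggplot2)
library(data.table)
data1 <- data.table(x=1:5, y=1:5, z=c(1,2,1,2,1))
data2 <- data.table(x=1:5, y=11:15, z=c(1,2,1,2,1))
myfun <- function(data){
# copy() stops := from adding ypos to the caller's table by reference
label.pos <- copy(data)[,ypos:=min(y)+0.75*diff(range(y)),by=z] # 75% to the top...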
ggplot(data, aes(x=x, y=y)) +
geom_point() +
# geom_text(aes(label=y), y=3) +
geom_text(data=label.pos, aes(y=ypos, label=y)) +
facet_grid(z~., scales="free") # note scales = "free"
}
myfun(data2)
This places the labels 75% of the way up each facet.
If you want scales="fixed", then @agstudy's solution is the way to go.
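If a label near the top of each panel is close enough, another option (a sketch, not what the answers here do) is y = Inf, which always resolves to the top edge of the panel, for fixed and free scales alike; vjust then moves the text down from the edge in multiples of its height. It does not place the label at exactly 75% of the range. A plain data.frame is used here instead of a data.table.

```r
library(ggplot2)

data2 <- data.frame(x = 1:5, y = 11:15, z = c(1, 2, 1, 2, 1))

p <- ggplot(data2, aes(x = x, y = y)) +
  geom_point() +
  # y = Inf is the top edge of every panel, whatever its y scale;
  # vjust = 1.5 pushes the text down from that edge
  geom_text(aes(label = y), y = Inf, vjust = 1.5) +
  facet_grid(z ~ ., scales = "free")

print(p)
```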
You can do this for example:
ggplot(data2, aes(x=x)) +
geom_point(aes(y=y)) +
geom_text(aes(label=y, y=mean(range(y)))) +
facet_grid(z~.)
Or fix y limits manually:
scale_y_continuous(limits = c(10, 15))
@user890739:
with density estimates you can compute a ypos variable for each facet like this:
data <- dplyr::mutate(dplyr::group_by(data, z), ypos=max(density(y)$y)*.75*nrow(data))
Then plot the result:
ggplot(data, aes(x=x)) +
stat_density(aes(y=after_stat(density))) + # after_stat() replaces ..density.. (ggplot2 >= 3.3.0)
geom_text(aes(label=y, y=ypos)) +
facet_grid(z~., scales="free")
