Filling under a curve with ggplot graphs - r

I would like to create a graph with the normal function from x=-2 to x=2 filled under the curve from -2 to 0.
I've tried with ggplot2
qplot(c(-2, 2), stat="function", fun=dnorm, geom="line") +
  geom_area(aes(xlim=c(-2,0)),stat="function", fun=dnorm)
But instead I get this graph with the whole area under the curve filled (in black).
How can I get a plot filled only from -2 to 0?
Other options or packages are welcome.
I've also tried a single ggplot command with a fill option, but I can't get that to work either.
I know some people do it using polygons, but the result is not as smooth and nice.
PS: I repeat, the solution I'm looking for does not generate x,y coordinates beforehand but uses the function directly with stat="function", fun=dnorm or similar. Thus, my question is not a duplicate.
I've also tried
ggplot(NULL,aes(x=c(-2,2))) +
  geom_area(aes(x=c(-2,0)),stat="function", fun=dnorm, fill="red") +
  geom_area(aes(x=c(0,2)),stat="function", fun=dnorm, fill="blue")
But again it fills the whole curve with a single color, blue. The red half seems to be overwritten. The same happens with geom_ribbon and other options.

Try this:
ggplot(data.frame(x = c(-2, 2)), aes(x)) +
  stat_function(fun = dnorm) +
  stat_function(fun = dnorm,
                xlim = c(-2, 0),
                geom = "area")
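The same xlim argument also handles the two-colour attempt from the question: give each stat_function() layer its own range and fill, and draw the full curve last so its outline sits on top. A minimal sketch, assuming a ggplot2 version whose stat_function() accepts xlim (the answer above already relies on this); the colours are just the ones from the question:

```r
library(ggplot2)

# One area layer per half of the curve, each restricted with xlim,
# followed by the outline of the full density from -2 to 2.
p <- ggplot(data.frame(x = c(-2, 2)), aes(x)) +
  stat_function(fun = dnorm, xlim = c(-2, 0), geom = "area", fill = "red") +
  stat_function(fun = dnorm, xlim = c(0, 2), geom = "area", fill = "blue") +
  stat_function(fun = dnorm)

print(p)
```

Because each layer computes its own x grid from its own xlim, neither area overwrites the other.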

Can't you generate your distribution data with dnorm instead?
library(ggplot2)
x <- seq(-2, 2, 0.01)
y <- dnorm(x, 0, 1)
xddf <- data.frame(x = x, y = y)
ggplot(xddf, aes(x, y)) +
  geom_line() +
  # >= and <= so that the ribbon reaches both -2 and 0
  geom_ribbon(data = subset(xddf, x >= -2 & x <= 0), aes(ymax = y), ymin = 0,
              fill = "red", colour = NA, alpha = 0.5) +
  scale_y_continuous(limits = c(0, .4))

These days, with after_stat() and after_scale(), you could also use
a more flexible approach that lets you explicitly map ranges of x values
to filled sections.
For example, filling some normal distribution quantiles:
library(ggplot2)
breaks <- qnorm(c(0, .05, .2, .5, .8, .95, 1))
ggplot(data.frame(x = c(-2, 2)), aes(x)) +
  scale_fill_brewer("x") +
  stat_function(
    n = 512,
    fun = dnorm,
    geom = "area",
    colour = "gray30",
    aes(
      fill = after_stat(x) |> cut(!!breaks),
      group = after_scale(fill)
    )
  )
This approach also works with other statistics, e.g. stat_density() for kernel density estimates:
set.seed(42)
ggplot(data.frame(x = rnorm(1000)), aes(x)) +
  scale_fill_brewer("x") +
  stat_density(
    n = 512,
    geom = "area",
    colour = "gray30",
    aes(
      fill = after_stat(x) |> cut(!!breaks),
      group = after_scale(fill)
    )
  )
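The mapping works because cut() assigns every x value at which the function (or density) is evaluated to one of the intervals between consecutive breaks, and each interval then becomes its own filled group. A quick check of that binning on a few hand-picked values (the values are only for illustration):

```r
# Quantile breaks from the answer above; the 0 and 1 quantiles are -Inf and Inf,
# so every finite x value falls into some band.
breaks <- qnorm(c(0, .05, .2, .5, .8, .95, 1))

x <- c(-2, -1, 0, 1, 2)
bands <- cut(x, breaks)  # right-closed intervals (a, b] by default

print(nlevels(bands))    # 6 bands, one per pair of consecutive breaks
print(as.integer(bands)) # 1 2 3 5 6
```

No value lands in band 4 here because it spans (0, 0.842], between the sample points 0 and 1.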

Related

Add new geom as new row in ggplot2, preventing layering of plots

I am pretty sure that this is easy to do, but I can't find a proper way to phrase this question for Google or Stack Overflow, so here we are:
I have a plot made in ggplot2 which uses geom_jitter(), effectively creating one row for each element in a factor and plotting its values.
I would like to add a complementary geom_violin() to the plot, but just adding the extra geom_ function to the plot code gives two layers, the jitter and the violin, one on top of the other (as expected).
EDIT:
This is what the plot looks like:
How can I have the violin as a separate row, without generating a second plot?
Side quest: how can I have the jitter and the violin geoms interleaved? (i.e. element A's jitter row followed by element A's violin row, then element B's jitter row followed by element B's violin row)
This is the minimum required code to make it (without all the theme() embellishments):
P1 <- ggplot(data=TEST_STACK_SUB, aes(x=E, y=C, col=A)) +
  theme(... , aspect.ratio=0.3) +
  geom_point(position = position_jitter(w = 0.30, h = 0), alpha=0.2, size=0.5) +
  geom_violin(data=TEST_STACK_SUB, mapping=aes(x=E, y=C), position="dodge") +
  scale_x_discrete() +
  scale_y_continuous(limits=c(0,1), breaks=seq(0,1,0.1),
                     labels=c(seq(0,1,0.1))) +
  scale_color_gradient2(breaks=seq(0,100,20),
                        limits=c(0,100),
                        low="green3",
                        high="darkorchid4",
                        midpoint=50,
                        name="") +
  coord_flip()
options(repr.plot.width=8, repr.plot.height=2)
plot(P1)
Here is a subset of the data to generate it (for you to try):
data
How about treating your factor as a continuous variable and offsetting the positions in each layer's aes() call, like so:
library(dplyr)
library(ggplot2)
set.seed(42)
tibble(x = rep(c(1, 3), each = 10),
       y = c(rnorm(10, 2), rnorm(10))) -> plot_data
ggplot(plot_data) +
  geom_jitter(aes(x = x - 0.5, y = y), width = 0.25) +
  geom_violin(aes(x = x + 0.5, y = y, group = x), width = 0.5) +
  coord_flip() +
  labs(x = "x") +
  scale_x_continuous(breaks = c(1, 3),
                     labels = paste("Level", 1:2),
                     trans = scales::reverse_trans())
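The same offset idea extends to a factor with any number of levels, which also covers the side quest about interleaving: give level i a base position of 2i - 1, draw its jitter row half a unit below and its violin row half a unit above. A minimal sketch, assuming a hypothetical data frame with a factor grp and a numeric val (not the question's TEST_STACK_SUB):

```r
library(ggplot2)

set.seed(42)
# Hypothetical example data: three groups with shifted means
df <- data.frame(
  grp = factor(rep(c("A", "B", "C"), each = 20)),
  val = c(rnorm(20, 0), rnorm(20, 1), rnorm(20, 2))
)

# Base position per level: 1, 3, 5, ... leaves room for two rows per level
df$pos <- 2 * as.integer(df$grp) - 1

p <- ggplot(df) +
  # jitter row just below the base position, violin row just above it
  geom_jitter(aes(x = pos - 0.5, y = val), width = 0.2, height = 0) +
  geom_violin(aes(x = pos + 0.5, y = val, group = grp), width = 0.8) +
  scale_x_continuous(breaks = 2 * seq_along(levels(df$grp)) - 1,
                     labels = levels(df$grp)) +
  labs(x = NULL) +
  coord_flip()

print(p)
```

The explicit group = grp is needed because x is now continuous, so ggplot2 cannot infer the violin groups from it.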

ggplot2 add offset to jitter positions

I have data that looks like this:
df = data.frame(x=sample(1:5,100,replace=TRUE),
                y=rnorm(100),
                assay=sample(c('a','b'),100,replace=TRUE),
                project=rep(c('primary','secondary'),50))
and am producing a plot using this code:
ggplot(df,aes(project,x)) +
  geom_violin(aes(fill=assay)) +
  geom_jitter(aes(shape=assay,colour=y),height=.5) +
  coord_flip()
which gives me this
This is 90% of the way to what I want. But I would like each point to be plotted only on top of the violin for its matching assay type. That is, the jittered positions of the points should be set so that the triangles only ever land on the upper teal violin and the circles on the lower red violin for each project type.
Any ideas how to do this?
In order to get the desired result, it is probably best to use position_jitterdodge as this gives you the best control over the way the points are 'jittered':
ggplot(df, aes(x = project, y = x, fill = assay, shape = assay, color = y)) +
  geom_violin() +
  geom_jitter(position = position_jitterdodge(dodge.width = 0.9,
                                              jitter.width = 0.5,
                                              jitter.height = 0.2),
              size = 2) +
  coord_flip()
which gives:
You can use the interaction between assay and project:
p <- ggplot(df, aes(x = interaction(assay, project), y = x)) +
  geom_violin(aes(fill = assay)) +
  geom_jitter(aes(shape = assay, colour = y), height = .5, size = 4)
p + coord_flip()
The labels can be adjusted by using a numeric x axis:
# add the interaction codes (1 to 4) as a numeric column
df$group <- as.numeric(interaction(df$assay, df$project))
# plot
p <- ggplot(df, aes(x = group, y = x, group = cut_interval(group, n = 4))) +
  geom_violin(aes(fill = assay)) +
  geom_jitter(aes(shape = assay, colour = y), height = .5, size = 4)
# project is a character column in R >= 4.0, so take its levels via factor()
p + coord_flip() + scale_x_continuous(breaks = c(1.5, 3.5), labels = levels(factor(df$project)))
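The breaks at 1.5 and 3.5 follow from how interaction() orders its levels: by default the first factor varies fastest, so the codes are a.primary = 1, b.primary = 2, a.secondary = 3 and b.secondary = 4, and each project's pair of violins is centred halfway between its two codes. A small check on one row per combination (the vectors are illustrative, not the question's data):

```r
assay <- c("a", "b", "a", "b")
project <- c("primary", "primary", "secondary", "secondary")

ia <- interaction(assay, project)
print(levels(ia))  # "a.primary" "b.primary" "a.secondary" "b.secondary"
codes <- as.numeric(ia)

# Midpoint of the two codes that belong to each project
centres <- tapply(codes, project, mean)
print(centres)     # primary 1.5, secondary 3.5
```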

ggplot2: how to create correct legend after using scale_xx_manual

I have a plot with three different lines. I want one of those lines to have points on it as well. I also want the two lines without points to be thicker than the one with points. I have managed to get the plot I want, but the legend isn't keeping up.
library(ggplot2)
y <- c(1:10, 2:11, 3:12)
x <- c(1:10, 1:10, 1:10)
testnames <- c(rep('mod1', 10), rep('mod2', 10), rep('meas', 10))
df <- data.frame(testnames, y, x)
ggplot(data=df, aes(x=x, y=y, colour=testnames)) +
  geom_line(aes(size=testnames)) +
  scale_size_manual("", values=c(0.5,1,1)) +
  geom_point(aes(alpha=testnames), size=5, shape=4) +
  scale_alpha_manual("", values=c(1, 0, 0))
I can remove the second (black) legend:
ggplot(data = df, aes(x=x, y=y, colour=testnames)) +
  geom_line(aes(size=testnames)) +
  scale_size_manual("", values=c(0.5,1,1), guide='none') +
  geom_point(aes(alpha=testnames), size=5, shape=4) +
  scale_alpha_manual("", values=c(1, 0.05, 0.05), guide='none')
But what I really want is to merge the two legends into one: a legend with colours, a cross only on the first variable (meas), and the lines of mod1 and mod2 thicker than the first line. I have tried guide and override, but with little luck.
You don't need transparency to hide the shapes for mod1 and mod2. You can omit these points from the plot and legend by setting their shape to NA in scale_shape_manual:
ggplot(data = df, aes(x = x, y = y, colour = testnames, size = testnames)) +
  geom_line() +
  geom_point(aes(shape = testnames), size = 5) +
  scale_size_manual(values=c(0.5, 2, 2)) +
  scale_shape_manual(values=c(8, NA, NA))
This gives the following plot:
NOTE: I used more distinct values in the size scale and a different shape to better illustrate the effect.
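In ggplot2 3.4.0 and later, line thickness moved from the size aesthetic to a separate linewidth aesthetic, so mapping size on geom_line() now triggers a deprecation warning. A sketch of the same answer with linewidth, assuming ggplot2 >= 3.4.0; the single merged legend still relies on colour, linewidth and shape sharing the same title and labels:

```r
library(ggplot2)

y <- c(1:10, 2:11, 3:12)
x <- rep(1:10, 3)
testnames <- rep(c("mod1", "mod2", "meas"), each = 10)
df <- data.frame(testnames, y, x)

# Levels sort alphabetically (meas, mod1, mod2), so the first value in each
# manual scale applies to meas.
p <- ggplot(df, aes(x = x, y = y, colour = testnames, linewidth = testnames)) +
  geom_line() +
  # NA shapes hide the points for mod1 and mod2; na.rm silences the warning
  geom_point(aes(shape = testnames), size = 5, na.rm = TRUE) +
  scale_linewidth_manual(values = c(0.5, 2, 2)) +
  scale_shape_manual(values = c(8, NA, NA))

print(p)
```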

How to smartly place text labels beside points of different sizes in ggplot2?

I am trying to make a labeled bubble plot with ggplot2 in R. Here is the simplified scenario:
I have a data frame with 4 variables: 3 quantitative variables, x, y, and z, and another variable that labels the points, lab.
I want to make a scatter plot, where the position is determined by x and y, and the size of the points is determined by z. I then want to place text labels beside the points (say, to the right of the point) without overlapping the text on top of the point.
If the points did not vary in size, I could try to simply modify the aesthetic of the geom_text layer by adding a scaling constant (e.g. aes(x=x+1, y=y+1)). However, even in this simple case, I am having a problem with positioning the text correctly because the points do not scale with the output dimensions of the plot. In other words, the size of the points remains constant in a 500x500 plot and a 1000x1000 plot; they do not scale up with the dimensions of the output plot.
Therefore, I think I have to scale the position of the label by the size (e.g. dimensions) of the output plot, or I have to get the radius of the points from ggplot somehow and shift my text labels. Is there a way to do this in ggplot2?
Here is some code:
library(ggplot2)
# Stupid data
df <- data.frame(x=c(1,2,3),
                 y=c(1,2,3),
                 z=c(1,2,1),
                 lab=c("a","b","c"), stringsAsFactors=FALSE)
# Plot with bad label placement
ggplot(aes(x=x, y=y), data=df) +
  geom_point(aes(size=z)) +
  geom_text(aes(label=lab),
            colour="red") +
  scale_size_continuous(range=c(5, 50), guide="none")
EDIT: I should mention, I tried hjust and vjust inside of geom_text, but it does not produce the desired effect.
# Trying hjust and vjust, but it doesn't look nice
ggplot(aes(x=x, y=y), data=df) +
  geom_point(aes(size=z)) +
  geom_text(aes(label=lab), hjust=0, vjust=0.5,
            colour="red") +
  scale_size_continuous(range=c(5, 50), guide="none")
EDIT: I managed to get something that works for now, thanks to Henrik and shujaa. I will leave the question open just in case someone shares a more general solution.
Just a blurb of what I am using this for: I am plotting a map, and indicating the amount of precipitation at certain stations with a point that is sized proportionally to the amount of precipitation observed. I wanted to add a station label beside each point in an aesthetically pleasing manner. I will be making more of these plots for different regions, and my output plot may have a different resolution or scale (e.g. due to different projections) for each plot, so a general solution is desired. I might try my hand at creating a custom position_jitter, like baptiste suggested, if I have time during the weekend.
It appears that the position_* functions don't have access to the scales used by other layers, so that's a no-go. You could make a clone of GeomText that shifts the labels according to the mapped size, but it's a lot of effort for a very kludgy and fragile solution:
geom_shiftedtext <- function (mapping = NULL, data = NULL, stat = "identity",
                              position = "identity",
                              parse = FALSE, ...) {
  GeomShiftedtext$new(mapping = mapping, data = data, stat = stat, position = position,
                      parse = parse, ...)
}
require(proto)
GeomShiftedtext <- proto(ggplot2:::GeomText, {
  objname <- "shiftedtext"
  draw <- function(., data, scales, coordinates, ..., parse = FALSE, na.rm = FALSE) {
    data <- remove_missing(data, na.rm,
                           c("x", "y", "label"), name = "geom_shiftedtext")
    lab <- data$label
    if (parse) {
      lab <- parse(text = lab)
    }
    with(coord_transform(coordinates, data, scales),
         textGrob(lab, unit(x, "native") + unit(0.375* size, "mm"),
                  unit(y, "native"),
                  hjust=hjust, vjust=vjust, rot=angle,
                  gp = gpar(col = alpha(colour, alpha),
                            fontfamily = family, fontface = fontface, lineheight = lineheight))
    )
  }
})
df <- data.frame(x=c(1,2,3),
                 y=c(1,2,3),
                 z=c(1.2,2,1),
                 lab=c("a","b","c"), stringsAsFactors=FALSE)
ggplot(aes(x=x, y=y), data=df) +
  geom_point(aes(size=z), shape=1) +
  geom_shiftedtext(aes(label=lab, size=z),
                   hjust=0, colour="red") +
  scale_size_continuous(range=c(5, 100), guide="none")
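The proto-based code above only works with ggplot2 versions before 2.0.0, which replaced proto with the ggproto system. A rough port to ggproto might look like the following; it keeps the answer's empirical 0.375 mm shift per size unit and, like the original, leaves the font size at its default. This is a sketch under those assumptions, not a tested general solution, and the names GeomShiftedText and geom_shiftedtext are hypothetical:

```r
library(ggplot2)
library(grid)

# Hypothetical text geom: shifts each label right by an amount proportional
# to its mapped size (same empirical 0.375 factor as the proto version).
GeomShiftedText <- ggproto("GeomShiftedText", GeomText,
  draw_panel = function(data, panel_params, coord, parse = FALSE, na.rm = FALSE) {
    data <- coord$transform(data, panel_params)
    lab <- data$label
    if (parse) lab <- parse(text = as.character(lab))
    textGrob(
      lab,
      x = unit(data$x, "native") + unit(0.375 * data$size, "mm"),
      y = unit(data$y, "native"),
      hjust = data$hjust, vjust = data$vjust, rot = data$angle,
      gp = gpar(col = alpha(data$colour, data$alpha),
                fontfamily = data$family, fontface = data$fontface,
                lineheight = data$lineheight)
    )
  }
)

# Hypothetical constructor wrapping the geom in a layer
geom_shiftedtext <- function(mapping = NULL, data = NULL, stat = "identity",
                             position = "identity", ..., parse = FALSE,
                             na.rm = FALSE, show.legend = NA, inherit.aes = TRUE) {
  layer(data = data, mapping = mapping, stat = stat, geom = GeomShiftedText,
        position = position, show.legend = show.legend, inherit.aes = inherit.aes,
        params = list(parse = parse, na.rm = na.rm, ...))
}

df <- data.frame(x = c(1, 2, 3), y = c(1, 2, 3), z = c(1.2, 2, 1),
                 lab = c("a", "b", "c"))

p <- ggplot(df, aes(x = x, y = y)) +
  geom_point(aes(size = z), shape = 1) +
  geom_shiftedtext(aes(label = lab, size = z), hjust = 0, colour = "red") +
  scale_size_continuous(range = c(5, 100), guide = "none")

print(p)
```

The shift is still tied to the output size through the mm unit, so it shares the fragility described above.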
This isn't a very general solution, because you'll need to tweak it every time, but you should be able to add to the text's x value an offset that depends linearly on z.
I had luck with
ggplot(aes(x=x, y=y), data=df) +
  geom_point(aes(size=z)) +
  geom_text(aes(label=lab, x = x + .06 + .14 * (z - min(z))),
            colour="red") +
  scale_size_continuous(range=c(5, 50), guide="none")
but, as the font size depends on your window size, you would need to decide on your output size and tweak accordingly. I started with x = x + .05 + 0 * (z-min(z)) and calibrated the intercept based on the smallest point, then when I was happy with that I adjusted the linear term for the biggest point.
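The calibration described above amounts to a straight line through two chosen offsets: one that looks right for the smallest point and one for the largest. A small helper makes that explicit; label_offset() and its arguments are hypothetical, and the two offsets still have to be tuned by eye for a given output size:

```r
library(ggplot2)

# Hypothetical helper: offset equals off_min at min(z) and off_max at max(z),
# linear in between (constant if all z are equal).
label_offset <- function(z, off_min, off_max) {
  rng <- range(z)
  if (diff(rng) == 0) return(rep(off_min, length(z)))
  off_min + (off_max - off_min) * (z - rng[1]) / diff(rng)
}

df <- data.frame(x = c(1, 2, 3), y = c(1, 2, 3), z = c(1, 2, 1),
                 lab = c("a", "b", "c"))
df$x_lab <- df$x + label_offset(df$z, 0.06, 0.20)

p <- ggplot(df, aes(x = x, y = y)) +
  geom_point(aes(size = z)) +
  geom_text(aes(x = x_lab, label = lab), colour = "red") +
  scale_size_continuous(range = c(5, 50), guide = "none")

print(p)
```

With this data (z from 1 to 2) the offsets reproduce the .06 + .14 * (z - min(z)) formula above.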
Another alternative. It looks OK with your test data, but you need to check how general it is.
# standardise z, take absolute values and scale down to get an x offset
df$dodge <- abs(as.numeric(scale(df$z))) / 4
ggplot(data = df, aes(x = x, y = y)) +
  geom_point(aes(size = z)) +
  geom_text(aes(x = x + dodge, label = lab), colour = "red") +
  scale_size_continuous(range = c(5, 50), guide = "none")
Update
I just tried position_jitter, but its width argument only takes a single value, so right now I am not sure how useful that function would be. I would be happy to be proven wrong, though. Example with another small data set:
df3 <- mtcars[1:10, ]
ggplot(data = df3, aes(x = wt, y = mpg)) +
  geom_point(aes(size = qsec), alpha = 0.1) +
  geom_text(label = df3$carb, position = position_jitter(width = 0.1, height = 0)) +
  scale_size_continuous(range = c(5, 50), guide = "none")

Extend x-limits using ggplot in R

I'm currently trying to plot a histogram with an overlay (given by my_fun) using the following code.
dfr = data.frame(x)
ggplot(dfr,aes(x)) +
  geom_histogram(colour="darkblue", binwidth = 0.1,
                 aes(y =..density..), size=1, fill="blue", freq = TRUE) +
  stat_function(fun = my_fun, colour = "red")
The x-axis in ggplot is from 1 to 2 (which is the range of my data). However, I would like my plot to have an x-axis from 0 to 3, so that the overlay can be drawn over the range (0, 3).
I've tried adding coord_cartesian(xlim=c(0, 3)), but this does not work. Could you please give me some suggestions on changing the range? Thank you.
Just guessing here since you provided only a little useful information in your question, but this works for me:
dat <- data.frame(x=rnorm(100))
ggplot(dat, aes(x=x)) +
  # freq is not a geom_histogram() parameter, so it is dropped here
  geom_histogram(aes(y=after_stat(density))) +
  stat_function(fun = dnorm, colour="red") +
  xlim(c(-4,4))
This uses xlim rather than coord_cartesian. But since you haven't provided any details on your data or function, I can't guarantee that this will work in your case.
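The reason coord_cartesian() does not help is that it only zooms the view: stat_function() still evaluates the function over the x scale's range, which is trained on the data (1 to 2 in the question). Setting limits on the scale, as above, or passing xlim directly to stat_function() both extend that range. A sketch of the latter, with hypothetical data and a stand-in for my_fun since neither was given in the question:

```r
library(ggplot2)

set.seed(1)
dfr <- data.frame(x = runif(200, 1, 2))               # hypothetical data in [1, 2]
my_fun <- function(x) dnorm(x, mean = 1.5, sd = 0.5)  # stand-in for the real overlay

p <- ggplot(dfr, aes(x)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.1,
                 colour = "darkblue", fill = "blue") +
  # xlim makes the function be evaluated (and the axis extended) over [0, 3]
  stat_function(fun = my_fun, colour = "red", xlim = c(0, 3))

print(p)
```

Unlike xlim() on the scale, this leaves the histogram's data untouched, so no bars are dropped at the limits.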
