I want to use ggplot to plot three curves, each made with stat_function and with its own parameters.
This is done with the code below:
library(ggplot2)
ggplot(data.frame(x = c(0, 25)), aes(x)) +
stat_function(fun = function(x) plogis(x, location = 5, scale = 2), colour = "red") +
stat_function(fun = function(x) plogis(x, location = 9, scale = 3), colour = "blue") +
stat_function(fun = function(x) plogis(x, location = 9, scale = 4), colour = "green")
which gives the figure below:
What I want to achieve is to shift the blue and green curves, exactly as they are, to the right along the horizontal axis (each by an arbitrary amount).
I don't know of an explicit way to do it in ggplot, so I tried to specify a different frame for the second and third geometric objects, as below:
ggplot(data.frame(x = c(0, 25)), aes(x)) +
stat_function(fun = function(x) plogis(x, location = 5, scale = 2), colour = "red") +
stat_function(data = data.frame(x = c(3, 28)), fun = function(x) plogis(x, location = 9, scale = 3), colour = "blue") +
stat_function(data = data.frame(x = c(5, 30)), fun = function(x) plogis(x, location = 9, scale = 4), colour = "green")
But the resulting image is the same as the one above.
Your solution is almost correct, but you need to subtract the same constant within the function itself, so that the y-values still correspond.
c1 <- 4
c2 <- 4
p2 <- ggplot(data.frame(x = c(0, 25)), aes(x)) +
stat_function(fun = function(x) plogis(x, location = 5, scale = 2), colour = "red") +
stat_function(data = data.frame(x = c(0+c1, 25+c1)),
fun = function(x) plogis(x - c1, location = 9, scale = 3), colour = "blue") +
stat_function(data = data.frame(x = c(0+c2, 25+c2)),
fun = function(x) plogis(x - c2, location = 9, scale = 4), colour = "green")
p2
PS: In the answer, I have also added the constants to the data.frame itself, so that the shift is visible (you can remove them from the data.frame if you only want the original x-range shown).
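For the logistic CDF specifically, shifting the argument by a constant should be equivalent to adding that constant to `location`, so the shift can also be folded into the parameters. A quick base-R check of that identity follows; the grid `x` and the name `shift` are illustrative choices, not part of the original question.

```r
# Illustrative grid and shift; neither comes from the original question.
x <- seq(0, 30, by = 0.5)
shift <- 4

# Shift the curve by moving the argument ...
by_argument <- plogis(x - shift, location = 9, scale = 3)
# ... or by moving the location parameter instead.
by_location <- plogis(x, location = 9 + shift, scale = 3)

# Both approaches should describe the same curve (up to floating-point tolerance).
identical_curves <- isTRUE(all.equal(by_argument, by_location))
print(identical_curves)
```

Recent ggplot2 releases also document an `xlim` argument to `stat_function()`; if layer-specific `data` appears to have no effect in your version, that argument may be worth trying.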
After executing the code here below, I was wondering:
1- Why do the "A.line" and "B.line" categories appear in the geom_point() legend?
2- Why are there four colors in the legend?
I guess both answers are related, but I cannot tell what is going on.
I would like to have the legend just with "A.points" and "B.points".
I would also like the same colors in both lines and points (I guess this I can do manually).
Thanks in advance for your help.
Best,
David
data.frame(x = rep(1:2,2),
names.points = rep(c("A.point","B.point"), 2),
y.point = c(2, 4, 7, 9),
names.lines = rep(c("A.line","B.line"), each = 2),
y.line = c(3, 3, 8, 8)) %>%
ggplot() +
geom_point(aes(x = x, y = y.point, group = names.points, colour = names.points), size = 5) +
geom_line(aes(x = x, y = y.line, group = names.lines, colour = names.lines), show.legend = FALSE)
Legends are not related to geoms but to scales: they display the categories (or the range of values) mapped to an aesthetic. Hence, you get four colors because you have four categories mapped to the color aesthetic. The geoms only determine how each legend key is drawn, via the so-called key glyph, which is a point for geom_point and a line for geom_line. And show.legend=FALSE only means that the key glyph for geom_line is not drawn in the legend keys, i.e. the legend keys show only a point but no line.
To remove the categories related to the lines from your legend use e.g. the breaks argument of scale_color_discrete instead.
library(ggplot2)
library(dplyr)
data.frame(
x = rep(1:2, 2),
names.points = rep(c("A.point", "B.point"), 2),
y.point = c(2, 4, 7, 9),
names.lines = rep(c("A.line", "B.line"), each = 2),
y.line = c(3, 3, 8, 8)
) %>%
ggplot() +
geom_point(aes(x = x, y = y.point, group = names.points, colour = names.points), size = 5) +
geom_line(aes(x = x, y = y.line, group = names.lines, colour = names.lines), show.legend = FALSE) +
scale_color_discrete(breaks = c("A.point", "B.point"))
UPDATE To fix your issue with the colors you could use a named color vector:
pal_col <- rep(c("darkblue","darkred"), 2)
names(pal_col) <- c("A.point", "B.point", "A.line", "B.line")
data.frame(
x = rep(1:2, 2),
names.points = rep(c("A.point", "B.point"), 2),
y.point = c(2, 4, 7, 9),
names.lines = rep(c("A.line", "B.line"), each = 2),
y.line = c(3, 3, 8, 8)
) %>%
ggplot() +
geom_point(aes(x = x, y = y.point, group = names.points, colour = names.points), size = 5) +
geom_line(aes(x = x, y = y.line, group = names.lines, colour = names.lines), show.legend = FALSE) +
scale_color_manual(breaks = c("A.point", "B.point"),
values = pal_col)
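As an alternative sketch, both layers could be mapped to a shared key so that the color scale only ever sees two categories. The columns `key.points` and `key.lines` are hypothetical helpers introduced here, and the plot is only drawn if ggplot2 is installed.

```r
# Same toy data as above, plus hypothetical key columns ("A"/"B")
# obtained by dropping the ".point" / ".line" suffix.
df <- data.frame(
  x = rep(1:2, 2),
  names.points = rep(c("A.point", "B.point"), 2),
  y.point = c(2, 4, 7, 9),
  names.lines = rep(c("A.line", "B.line"), each = 2),
  y.line = c(3, 3, 8, 8)
)
df$key.points <- sub("\\..*$", "", df$names.points)
df$key.lines  <- sub("\\..*$", "", df$names.lines)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Points and lines share the two-level key, so they share colors
  # and the legend has two entries.
  p <- ggplot(df) +
    geom_point(aes(x = x, y = y.point, colour = key.points), size = 5) +
    geom_line(aes(x = x, y = y.line, group = names.lines, colour = key.lines),
              show.legend = FALSE) +
    labs(colour = NULL)
  print(p)
}
```

With this layout the legend entries read "A" and "B"; the `labels` argument of the color scale could be used if the original "A.point" and "B.point" labels are preferred.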
Sample data
set.seed(123)
par(mfrow = c(1,2))
dat <- data.frame(years = rep(1980:2014, each = 8), x = sample(1000:2000, 35 * 8, replace = TRUE))
boxplot(dat$x ~ dat$years, ylim = c(500, 4000))
I have another dataset that has a single value for some selected years
ref.dat <- data.frame(years = c(1991:1995, 2001:2008), x = sample(1000:2000, 13, replace = T))
plot(ref.dat$years, ref.dat$x, type = "b")
How can I add the line plot on top of the boxplot?
With ggplot2 you could do this:
ggplot(dat, aes(x = years, y = x)) +
geom_boxplot(data = dat, aes(group = years)) +
geom_line(data = ref.dat, colour = "red") +
geom_point(data = ref.dat, colour = "red", shape = 1) +
coord_cartesian(ylim = c(500, 4000)) +
theme_bw()
The trick here is to figure out the x-axis on the boxplot. You have 35 boxes and they are plotted at the x-coordinates 1, 2, 3, ..., 35 - i.e. year - 1979. With that, you can add the line with lines as usual.
set.seed(123)
dat <- data.frame(years = rep(1980:2014, each = 8),
x = sample(1000:2000, 35*8 ,replace = T))
boxplot(dat$x ~ dat$years, ylim = c(500, 2500))
ref.dat <- data.frame(years = c(1991:1995, 2001:2008),
x = sample(1000:2000, 13, replace = T))
lines(ref.dat$years-1979, ref.dat$x, type = "b", pch=20)
The points were a bit hard to see, so I changed the point style to 20 (pch = 20). Also, I used a smaller range on the y-axis to leave less blank space.
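If hardcoding the offset 1979 feels fragile, the box positions can also be looked up from the years themselves. A minimal sketch, assuming one box per distinct year in sorted order (which is how boxplot() treats a numeric grouping variable); `box_levels` and `ref.pos` are names introduced here:

```r
set.seed(123)
dat <- data.frame(years = rep(1980:2014, each = 8),
                  x = sample(1000:2000, 35 * 8, replace = TRUE))
ref.dat <- data.frame(years = c(1991:1995, 2001:2008),
                      x = sample(1000:2000, 13, replace = TRUE))

# boxplot() draws one box per group at positions 1, 2, 3, ...
box_levels <- sort(unique(dat$years))
# Position of each reference year among the boxes.
ref.pos <- match(ref.dat$years, box_levels)

boxplot(x ~ years, data = dat, ylim = c(500, 2500))
lines(ref.pos, ref.dat$x, type = "b", pch = 20, col = "red")
```

This keeps working if the boxplot data later starts in a different year, as long as every reference year also occurs in `dat`.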
I am trying to plot the density curve of a t-distribution with mean = 3 and df = 1.5 using ggplot2. However, it is supposed to be symmetric around 3, so I cannot use the noncentrality parameter.
ggplot(data.frame(x = c(-4, 10)), aes(x = x)) +
stat_function(fun = dt, args = list(df = 1.5))
Is there a way to simply shift the distribution along the x-axis?
You could also make a custom function for your shifted t-distribution:
custom <- function(x) {dt(x - 3, 1.5)}
ggplot(data.frame(x = c(-4, 10)), aes(x = x)) +
stat_function(fun = custom)
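A quick numerical check that the shifted density is symmetric around 3, as the question requires. The `offsets` vector is an arbitrary illustrative choice.

```r
# Shifted t density from the answer above.
custom <- function(x) dt(x - 3, df = 1.5)

# Illustrative distances from the centre; any positive values would do.
offsets <- c(0.5, 1, 2, 5)

left  <- custom(3 - offsets)
right <- custom(3 + offsets)

# Equal heights at mirrored points, and the maximum sits at x = 3.
symmetric <- isTRUE(all.equal(left, right))
peak_at_3 <- all(custom(3) > custom(3 + offsets))
print(c(symmetric = symmetric, peak_at_3 = peak_at_3))
```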
A simple solution is to just change the labels instead:
ggplot(data.frame(x = c(-4, 10)), aes(x = x)) +
stat_function(fun = dt, args = list(df = 1.5)) +
scale_x_continuous(breaks = c(0, 5, 10), labels = c(3, 8, 13))
There is also a function dt.scaled in the metRology package, which in addition to the df, lets you specify the mean and scale.
Relevant code:
dt.scaled <- function(x, df, mean = 0, sd = 1, ncp, log = FALSE) {
if (!log) stats::dt((x - mean)/sd, df, ncp = ncp, log = FALSE)/sd
else stats::dt((x - mean)/sd, df, ncp = ncp, log = TRUE) - log(sd)
}
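To see why the scaled version divides by `sd`, here is a small base-R sketch. It uses a local stand-in named `dt_loc_scale` (a hypothetical name, without the `ncp` and `log` arguments) rather than the metRology function itself, and checks that the result still integrates to roughly 1.

```r
# Local stand-in for a location-scale t density; `dt_loc_scale` is a
# hypothetical name and omits the ncp/log arguments shown above.
dt_loc_scale <- function(x, df, mean = 0, sd = 1) {
  stats::dt((x - mean) / sd, df) / sd
}

x <- seq(-4, 10, by = 0.5)

# With sd = 1 this reduces to the plain shift dt(x - 3, 1.5).
same_as_shift <- isTRUE(all.equal(dt_loc_scale(x, df = 1.5, mean = 3),
                                  dt(x - 3, df = 1.5)))

# Dividing by sd keeps the total area at (approximately) 1 after stretching.
area <- integrate(dt_loc_scale, -Inf, Inf, df = 1.5, mean = 3, sd = 2)$value
print(same_as_shift)
print(area)
```

Such a function can be handed to stat_function() through `fun` and `args`, in the same way as `dt` in the question.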
My question is similar to this but the answers there will not work for me. Basically, I'm trying to produce a regression discontinuity plot with a "fuzzy" design that uses all the data for the treatment and control groups, but only plots the regression line within the "range" of the treatment and control groups.
Below, I've simulated some data and produced the fuzzy RD plot with base graphics. I'm hoping to replicate this plot with ggplot2. Note that the most important part of this is that the light blue regression line is fit using all the blue points, while the peach colored regression line is fit using all the red points, despite only being plotted over the ranges in which individuals were intended to receive treatment. That's the part I'm having a hard time replicating in ggplot.
I'd like to move to ggplot because I'd like to use faceting to produce this same plot across various units in which participants were nested. In the code below, I show a non-example using geom_smooth. When there's no fuzziness within a group, it works fine, but otherwise it fails. If I could get geom_smooth to be limited to only specific ranges, I think I'd be set. Any and all help is appreciated.
Simulate data
library(MASS)
mu <- c(0, 0)
sigma <- matrix(c(1, 0.7, 0.7, 1), ncol = 2)
set.seed(100)
d <- as.data.frame(mvrnorm(1e3, mu, sigma))
# Create treatment variable
d$treat <- ifelse(d$V1 <= 0, 1, 0)
# Introduce fuzziness
d$treat[d$treat == 1][sample(100)] <- 0
d$treat[d$treat == 0][sample(100)] <- 1
# Treatment effect
d$V2[d$treat == 1] <- d$V2[d$treat == 1] + 0.5
# Add grouping factor
d$group <- gl(9, 1e3/9)
Produce regression discontinuity plot with base
library(RColorBrewer)
pal <- brewer.pal(5, "RdBu")
color <- d$treat
color[color == 0] <- pal[1]
color[color == 1] <- pal[5]
plot(V2 ~ V1,
data = d,
col = color,
bty = "n")
abline(v = 0, col = "gray", lwd = 3, lty = 2)
# Fit model
m <- lm(V2 ~ V1 + treat, data = d)
# predicted achievement for treatment group
pred_treat <- predict(m,
newdata = data.frame(V1 = seq(-3, 0, 0.1),
treat = 1))
# predicted achievement for control group
pred_no_treat <- predict(m,
newdata = data.frame(V1 = seq(0, 4, 0.1),
treat = 0))
# Add predicted achievement lines
lines(seq(-3, 0, 0.1), pred_treat, col = pal[4], lwd = 3)
lines(seq(0, 4, 0.1), pred_no_treat, col = pal[2], lwd = 3)
# Add legend
legend("bottomright",
legend = c("Treatment", "Control"),
lty = 1,
lwd = 2,
col = c(pal[4], pal[2]),
box.lwd = 0)
non-example with ggplot
d$treat <- factor(d$treat, labels = c("Control", "Treatment"))
library(ggplot2)
ggplot(d, aes(V1, V2, group = treat)) +
geom_point(aes(color = treat)) +
geom_smooth(method = "lm", aes(color = treat)) +
facet_wrap(~group)
Notice the regression lines extending past the treatment range for groups 1 and 2.
There's probably a more graceful way to make the lines with geom_smooth, but it can be hacked together with geom_segment. Munge the data.frames outside of the plotting call if you like.
ggplot(d, aes(x = V1, y = V2, color = factor(treat, labels = c('Control', 'Treatment')))) +
geom_point(shape = 21) +
scale_color_brewer(NULL, type = 'qual', palette = 6) +
geom_vline(aes(xintercept = 0), color = 'grey', size = 1, linetype = 'dashed') +
geom_segment(data = data.frame(t(predict(m, data.frame(V1 = c(-3, 0), treat = 1)))),
aes(x = -3, xend = 0, y = X1, yend = X2), color = pal[4], size = 1) +
geom_segment(data = data.frame(t(predict(m, data.frame(V1 = c(0, 4), treat = 0)))),
aes(x = 0, xend = 4, y = X1, yend = X2), color = pal[2], size = 1)
Another option is geom_path:
df <- data.frame(V1 = c(-3, 0, 0, 4), treat = c(1, 1, 0, 0))
df <- cbind(df, V2 = predict(m, df))
ggplot(d, aes(x = V1, y = V2, color = factor(treat, labels = c('Control', 'Treatment')))) +
geom_point(shape = 21) +
geom_vline(aes(xintercept = 0), color = 'grey', size = 1, linetype = 'dashed') +
scale_color_brewer(NULL, type = 'qual', palette = 6) +
geom_path(data = df, size = 1)
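Two endpoints per group are enough for a straight line. For a curved fit, a denser prediction grid restricted to each range would be needed. A self-contained sketch on simulated toy data (the data-generating values and the names `toy`, `fit`, and `grid` are assumptions made for illustration):

```r
set.seed(100)
n <- 200
V1 <- rnorm(n)
treat <- as.integer(V1 <= 0)
# Flip 20 assignments to mimic a fuzzy design.
flip <- sample(n, 20)
treat[flip] <- 1L - treat[flip]
V2 <- 0.7 * V1 + 0.5 * treat + rnorm(n, sd = 0.5)
toy <- data.frame(V1 = V1, V2 = V2, treat = treat)

# The model is fit on ALL observations ...
fit <- lm(V2 ~ V1 + treat, data = toy)

# ... but predictions are only requested inside each intended range.
grid <- rbind(
  data.frame(V1 = seq(-3, 0, by = 0.1), treat = 1L),
  data.frame(V1 = seq(0, 4, by = 0.1), treat = 0L)
)
grid$V2 <- predict(fit, newdata = grid)
head(grid)
```

A data frame like `grid` could be passed as the `data` of geom_path() in the same way as `df` above.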
For the edit with facets, if I understand what you want correctly, you can fit a model for each group with lapply and predict for each group. Here I recombine with dplyr::bind_rows instead of do.call(rbind, ...) so that the .id parameter inserts the group number from the list element name, though there are other ways to do the same thing.
df <- data.frame(V1 = c(-3, 0, 0, 4), treat = c('Treatment', 'Treatment', 'Control', 'Control'))
m_list <- lapply(split(d, d$group), function(x){lm(V2 ~ V1 + treat, data = x)})
df <- dplyr::bind_rows(lapply(m_list, function(x){cbind(df, V2 = predict(x, df))}), .id = 'group')
ggplot(d, aes(x = V1, y = V2, color = treat)) +
geom_point(shape = 21) +
geom_vline(aes(xintercept = 0), color = 'grey', size = 1, linetype = 'dashed') +
geom_path(data = df, size = 1) +
scale_color_brewer(NULL, type = 'qual', palette = 6) +
facet_wrap(~group)
I have a plot from the following script.
require(ggplot2)
df.shape <- data.frame(
AX = runif(10),
AY = runif(10),
BX = runif(10, 2, 3),
BY = runif(10, 2, 3)
)
p <- ggplot(df.shape)
p <- p + geom_point(aes(x = AX, y = AY, shape = 15)) +
geom_point(aes(x = BX, y = BY, shape = 19)) +
scale_shape_identity() +
guides(shape = guide_legend(override.aes = list(shape = 15, shape = 19)) )
print(p)
This doesn't produce a legend describing which shape is "A" and which shape is "B". Note that the squares and circles may be close to one another, so I can't generally define the variable based on location. How do I display a "shape" legend?
I would reshape my data in the long format using reshape:
dt <- reshape(df.shape, direction = 'long', varying = list(c(1, 3), c(2, 4)),
v.names = c('X', 'Y'), times = c('A', 'B'))
Then I plot it simply like this:
ggplot(dt) +
geom_point(aes(x = X, y = Y, shape = time),size=5) +
scale_shape_manual(values=c(15,19))
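If the reshape() call is hard to read, the same long layout can be stacked by hand. A minimal sketch, where the column name `set` and the seed are assumptions added for illustration, and the plot is only drawn if ggplot2 is installed:

```r
set.seed(1)  # illustrative seed; the original script does not set one
df.shape <- data.frame(
  AX = runif(10),
  AY = runif(10),
  BX = runif(10, 2, 3),
  BY = runif(10, 2, 3)
)

# Stack the A and B coordinates and record which set each row came from.
long <- rbind(
  data.frame(X = df.shape$AX, Y = df.shape$AY, set = "A"),
  data.frame(X = df.shape$BX, Y = df.shape$BY, set = "B")
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Mapping shape to a column is what makes ggplot2 draw a shape legend.
  p <- ggplot(long) +
    geom_point(aes(x = X, y = Y, shape = set), size = 5) +
    scale_shape_manual(values = c(A = 15, B = 19))
  print(p)
}
```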