My question is similar to this but the answers there will not work for me. Basically, I'm trying to produce a regression discontinuity plot with a "fuzzy" design that uses all the data for the treatment and control groups, but only plots the regression line within the "range" of the treatment and control groups.
Below, I've simulated some data and produced the fuzzy RD plot with base graphics. I'm hoping to replicate this plot with ggplot2. Note that the most important part of this is that the light blue regression line is fit using all the blue points, while the peach colored regression line is fit using all the red points, despite only being plotted over the ranges in which individuals were intended to receive treatment. That's the part I'm having a hard time replicating in ggplot.
I'd like to move to ggplot because I'd like to use faceting to produce this same plot across various units in which participants were nested. In the code below, I show a non-example using geom_smooth. When there's no fuzziness within a group, it works fine, but otherwise it fails. If I could get geom_smooth to be limited to only specific ranges, I think I'd be set. Any and all help is appreciated.
Simulate data
library(MASS)
mu <- c(0, 0)
sigma <- matrix(c(1, 0.7, 0.7, 1), ncol = 2)
set.seed(100)
d <- as.data.frame(mvrnorm(1e3, mu, sigma))
# Create treatment variable
d$treat <- ifelse(d$V1 <= 0, 1, 0)
# Introduce fuzziness
d$treat[d$treat == 1][sample(100)] <- 0
d$treat[d$treat == 0][sample(100)] <- 1
# Treatment effect
d$V2[d$treat == 1] <- d$V2[d$treat == 1] + 0.5
# Add grouping factor
d$group <- gl(9, 1e3/9)
Produce regression discontinuity plot with base
library(RColorBrewer)
pal <- brewer.pal(5, "RdBu")
color <- d$treat
color[color == 0] <- pal[1]
color[color == 1] <- pal[5]
plot(V2 ~ V1,
data = d,
col = color,
bty = "n")
abline(v = 0, col = "gray", lwd = 3, lty = 2)
# Fit model
m <- lm(V2 ~ V1 + treat, data = d)
# predicted achievement for treatment group
pred_treat <- predict(m,
newdata = data.frame(V1 = seq(-3, 0, 0.1),
treat = 1))
# predicted achievement for control group
pred_no_treat <- predict(m,
newdata = data.frame(V1 = seq(0, 4, 0.1),
treat = 0))
# Add predicted achievement lines
lines(seq(-3, 0, 0.1), pred_treat, col = pal[4], lwd = 3)
lines(seq(0, 4, 0.1), pred_no_treat, col = pal[2], lwd = 3)
# Add legend
legend("bottomright",
legend = c("Treatment", "Control"),
lty = 1,
lwd = 2,
col = c(pal[4], pal[2]),
box.lwd = 0)
non-example with ggplot
d$treat <- factor(d$treat, labels = c("Control", "Treatment"))
library(ggplot2)
ggplot(d, aes(V1, V2, group = treat)) +
geom_point(aes(color = treat)) +
geom_smooth(method = "lm", aes(color = treat)) +
facet_wrap(~group)
Notice the regression lines extending past the treatment range for groups 1 and 2.
There's probably a more graceful way to make the lines with geom_smooth, but it can be hacked together with geom_segment. Munge the data.frames outside of the plotting call if you like.
ggplot(d, aes(x = V1, y = V2, color = factor(treat, labels = c('Control', 'Treatment')))) +
geom_point(shape = 21) +
scale_color_brewer(NULL, type = 'qual', palette = 6) +
geom_vline(aes(xintercept = 0), color = 'grey', size = 1, linetype = 'dashed') +
geom_segment(data = data.frame(t(predict(m, data.frame(V1 = c(-3, 0), treat = 1)))),
aes(x = -3, xend = 0, y = X1, yend = X2), color = pal[4], size = 1) +
geom_segment(data = data.frame(t(predict(m, data.frame(V1 = c(0, 4), treat = 0)))),
aes(x = 0, xend = 4, y = X1, yend = X2), color = pal[2], size = 1)
Another option is geom_path:
df <- data.frame(V1 = c(-3, 0, 0, 4), treat = c(1, 1, 0, 0))
df <- cbind(df, V2 = predict(m, df))
ggplot(d, aes(x = V1, y = V2, color = factor(treat, labels = c('Control', 'Treatment')))) +
geom_point(shape = 21) +
geom_vline(aes(xintercept = 0), color = 'grey', size = 1, linetype = 'dashed') +
scale_color_brewer(NULL, type = 'qual', palette = 6) +
geom_path(data = df, size = 1)
For the edit with facets, if I understand what you want correctly, you can fit a model for each group with lapply and predict for each group. Here I recombine the results with dplyr::bind_rows instead of do.call(rbind, ...) because its .id parameter inserts the group number from the list element names, though there are other ways to do the same thing.
df <- data.frame(V1 = c(-3, 0, 0, 4), treat = c('Treatment', 'Treatment', 'Control', 'Control'))
m_list <- lapply(split(d, d$group), function(x){lm(V2 ~ V1 + treat, data = x)})
df <- dplyr::bind_rows(lapply(m_list, function(x){cbind(df, V2 = predict(x, df))}), .id = 'group')
ggplot(d, aes(x = V1, y = V2, color = treat)) +
geom_point(shape = 21) +
geom_vline(aes(xintercept = 0), color = 'grey', size = 1, linetype = 'dashed') +
geom_path(data = df, size = 1) +
scale_color_brewer(NULL, type = 'qual', palette = 6) +
facet_wrap(~group)
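The hard-coded endpoints (-3, 0, 0, 4) above can also be taken from the data. Below is a minimal sketch of that idea, assuming a simulated stand-in data set (`toy`, not the question's `d`) and a cutoff at 0. Each group's model is fit on all of that group's rows, but predictions are made only on the intended side of the cutoff for each arm. The plotting step is guarded so the sketch does nothing if ggplot2 is not installed.

```r
# Simulated stand-in data (hypothetical; not the question's `d`)
set.seed(1)
n <- 900
toy <- data.frame(V1 = rnorm(n), group = gl(9, n / 9))
toy$treat <- ifelse(toy$V1 <= 0, "Treatment", "Control")
# Fuzziness: switch the arm for a random 10% of rows
flip <- sample(n, n / 10)
toy$treat[flip] <- ifelse(toy$treat[flip] == "Treatment", "Control", "Treatment")
toy$treat <- factor(toy$treat, levels = c("Control", "Treatment"))
toy$V2 <- 0.7 * toy$V1 + 0.5 * (toy$treat == "Treatment") + rnorm(n, sd = 0.7)

# Prediction grid: each arm only on its intended side of the cutoff
grid <- data.frame(
  V1    = c(min(toy$V1), 0, 0, max(toy$V1)),
  treat = factor(c("Treatment", "Treatment", "Control", "Control"),
                 levels = levels(toy$treat))
)

# One model per group, fit on every row of that group (both arms, both sides)
pred <- do.call(rbind, lapply(split(toy, toy$group), function(g) {
  fit <- lm(V2 ~ V1 + treat, data = g)
  cbind(group = g$group[1], grid, V2 = predict(fit, newdata = grid))
}))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(V1, V2, color = treat)) +
    geom_point(shape = 21) +
    geom_vline(xintercept = 0, color = "grey", linetype = "dashed") +
    geom_line(data = pred) +   # two endpoints per arm and facet
    facet_wrap(~group)
  print(p)
}
```

Because the model is linear in V1, two endpoints per arm are enough to draw each line; a curved fit would need a denser grid.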
Related
I have a large number of variables and would like to create scatterplots comparing all variables to a single variable. I have been able to do this in base R using lapply, but I cannot complete the same task in ggplot2 using lapply.
Below is an example dataset.
df <- data.frame("ID" = 1:16)
df$A <- c(1,2,3,4,5,6,7,8,9,10,11,12,12,14,15,16)
df$B <- c(5,6,7,8,9,10,13,15,14,15,16,17,18,18,19,20)
df$C <- c(11,12,14,16,10,12,14,16,10,12,14,16,10,12,14,16)
I define the variables I would like to generate scatterplots with, using the code below:
library(dplyr)
df_col_names <- df %>% select(A:C) %>% colnames()
Below is how I have been able to successfully complete the task of plotting all variables against variable A, using lapply in base R:
lapply(df_col_names, function(x) {
tiff(filename=sprintf("C:\\Documents\\%s.tiff", x),
width = 1000, height = 1000, res=200)
plot(df$A, df[[x]],
pch=19,
cex = 1.5,
ylab = x,
ylim = c(0, 20),
xlim = c(0, 20))
dev.off()
})
Below is my attempt at completing the task in ggplot2 without any success. It generates the tiff images, although they are empty.
lapply(df_col_names, function(x) {
tiff(filename=sprintf("C:\\Documents\\%s.tiff", x),
width = 1000, height = 1000, res=200)
ggplot(df) +
geom_point(data = df,
aes(x = A, y = df_col_names[[x]], size = 3)) +
geom_smooth(aes(x = A, y = df_col_names[[x]], size = 0), method = "lm", size=0.5) +
coord_fixed(ratio = 1, xlim = c(0, 20), ylim = c(0, 20)) +
guides(size = FALSE, color = FALSE) +
theme_bw(base_size = 14)
dev.off()
})
It works for me with ggsave. Also note that you are passing the column names to ggplot as strings, so use the .data pronoun (.data[[x]]) to refer to the actual column values.
library(ggplot2)
lapply(df_col_names, function(x) {
ggplot(df) +
geom_point( aes(x = A, y = .data[[x]], size = 3)) +
geom_smooth(aes(x = A, y = .data[[x]], size = 0), method = "lm", size=0.5) +
coord_fixed(ratio = 1, xlim = c(0, 20), ylim = c(0, 20)) +
guides(size = "none", color = "none") +
theme_bw(base_size = 14) -> plt
ggsave(sprintf("%s.tiff", x), plt)
})
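On why the question's tiff files came out empty: inside a function (and therefore inside lapply), a ggplot object is not auto-printed, so nothing is drawn on the open device unless print() is called. ggsave avoids the issue by rendering the plot object itself. Here is a minimal sketch of the explicit print() route, using the question's columns. pdf() and tempdir() are stand-ins chosen so the sketch runs on any R build; tiff(...) from the question slots in the same way.

```r
files <- character(0)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Example data from the question
  df <- data.frame(
    A = c(1:12, 12, 14:16),
    B = c(5, 6, 7, 8, 9, 10, 13, 15, 14, 15, 16, 17, 18, 18, 19, 20),
    C = c(11, 12, 14, 16, 10, 12, 14, 16, 10, 12, 14, 16, 10, 12, 14, 16)
  )

  files <- vapply(c("B", "C"), function(x) {
    path <- file.path(tempdir(), sprintf("%s.pdf", x))  # stand-in output location
    pdf(path, width = 5, height = 5)                    # stand-in for tiff(...)
    on.exit(dev.off())                                  # close the device even on error

    p <- ggplot(df, aes(x = A, y = .data[[x]])) +
      geom_point(size = 3) +
      labs(y = x)
    print(p)  # required: plots are not auto-printed inside a function

    path
  }, character(1))
}
```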
I have some data and would like to plot lines while controlling the order in which the lines are drawn on top of each other.
I would like to use scale_color_viridis() as my palette. I have no idea how to plot the lighter (yellow) line on top of the darker ones.
Here is my toy data frame and my code:
toy_data <- data.frame(x = c(1,3,1,2,5,0), y = c(0, 01, 1, 0.6, 1, .7),
col = rep(c("r", "b", "g"), each = 2), group = seq(0,1, by = 0.2))
ggplot(toy_data, aes(x = x, y = y, group = col, color = group)) +
geom_line(size = 2) +
scale_color_viridis()
Any idea how I can do this?
The group aesthetic determines the plotting order, in this case, the col variable which is character data. It will normally plot in alphabetical order (b g r), so to get the yellow line from col "g" to print last, you could convert it to a factor ordered in order of appearance, like with forcats::fct_inorder:
ggplot(toy_data,
aes(x = x, y = y, group = col %>% forcats::fct_inorder(), color = group)) +
geom_line(size = 2) +
scale_color_viridis_c() # added in ggplot2 3.0 in July 2018.
# scale_color_viridis for older ggplot2 versions
If col is numeric, you could achieve the same thing by giving your "top" series the biggest number.
toy_data2 <- data.frame(x = c(1,3,1,2,5,0), y = c(0, 01, 1, 0.6, 1, .7),
col = rep(c(3, 1, 2), each = 2), group = seq(0,1, by = 0.2))
ggplot(toy_data2,
aes(x = x, y = y, group = if_else(col == 2, 1e10, col), color = group)) +
geom_line(size = 2) +
scale_color_viridis_c()
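The reordering in the first snippet of this answer can also be done without forcats. The sketch below builds the factor in base R with levels taken in order of first appearance, which is what fct_inorder does. The column name col_ord is made up for illustration, and linewidth assumes ggplot2 3.4 or later (use size = 2 on older versions).

```r
toy_data <- data.frame(x = c(1, 3, 1, 2, 5, 0),
                       y = c(0, 1, 1, 0.6, 1, 0.7),
                       col = rep(c("r", "b", "g"), each = 2),
                       group = seq(0, 1, by = 0.2))

# Levels in order of first appearance: r, b, g, so "g" is drawn last (on top)
toy_data$col_ord <- factor(toy_data$col, levels = unique(toy_data$col))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy_data, aes(x = x, y = y, group = col_ord, color = group)) +
    geom_line(linewidth = 2) +
    scale_color_viridis_c()
  print(p)
}
```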
I am helping someone translate hand-drawn economics supply and demand functions into image files that can be included in a Word document. These have been going well using Hmisc::bezier and geom_path modeled after Andrew Heiss's recon plots and using his curve_intersect function. That is, until the author asked that one of the supply curves should pass through a specified set of coordinates. The Hmisc::bezier function only uses the first and last control point as absolute, and bends toward intermediate points so the specified intersection point does not match the curve. I tried creating a spline of 2 bezier curves with the bezier function from the bezier package (v1.1.2, https://cran.r-project.org/web/packages/bezier/bezier.pdf), but this fails with "Error in FUN(X[[i]], ...) : object 'x' not found", which I do not understand or know how to fix.
Please let me know where I am going wrong or if there is a better method! I will include the commented out attempts using various functions. Please excuse the amateurish code, as I am a relative newb at R and ggplot2.
This section not directly relevant to my question
# Graph figures for physical economics, negative oil prices paper
library(reconPlots)
library(dplyr)
library(ggplot2)
library(patchwork)
library(ggrepel)
library(bezier)
library(ggforce)
options(ggrepel.max.time = 1)
options(ggrepel.max.iter = 20000)
#Set seed value for ggrepel
set.seed(52)
# panel (a)
#Set values of curves using the bezier function, each pair of c() values
# is an xy coordinate, and the sets of coordinates control the shape of the
# curve
supply <- Hmisc::bezier(c(1, 5, 6), c(3, 4, 9)) %>%
as_data_frame()
demand <- Hmisc::bezier(c(0, 9, 9), c(6, 6, 6)) %>%
as_data_frame()
label_height <- Hmisc::bezier(c(0, 9, 9), c(8, 8, 8)) %>%
as_data_frame()
# Calculate the intersections of the two curves
intersections <- bind_rows(curve_intersect(supply, demand))
# Calculate point where the curve label(s) intersect a specified height
supply_label <- bind_rows(curve_intersect(supply, label_height))
labels <- data_frame(label = expression("PS"[CR]^DRL),
x = supply_label$x,
y = supply_label$y)
production <- ggplot(mapping = aes(x = x, y = y)) +
#Draw the supply curve. Demand is not drawn in this figure, but the
# intersections of an imaginary demand curve are used to illustrate P0
# and Q0, the intersection point, and the dotted lines
geom_path(data = supply, color = "#0073D9", size = 1) +
geom_segment(data = intersections,
aes(x = x, y = 0, xend = x, yend = y), lty = "dotted") +
geom_segment(data = intersections,
aes(x = 0, y = y, xend = x, yend = y), lty = "dotted") +
#Draw the supply curve label using the intersection calculated above, using
# GGrepel so that the labels do not overlap the curve line
geom_text_repel(data = labels
,aes(x = x, y = y, label = label)
,parse = TRUE
,direction = "x"
,force = 3
,force_pull = 0.1
,hjust = 0
,min.segment.length = 0
) +
#Draw the intersection point based on intersection function between supply
# and the phantom flat demand curve at height y=6
geom_point(data = intersections, size = 3) +
#Use scale functions to set y-axis label, axis intersection point labels,
# and limits of the viewing area
scale_x_continuous(expand = c(0, 0), breaks = intersections$x
,labels = expression(Q[CR]^{DRL-PS})
,limits=c(0,9)
) +
scale_y_continuous(expand = c(0, 0), breaks = c(intersections$y, 9)
,labels = c(expression(P[CR]==frac("$",brl))
,expression(P[CR]))
,limits=c(0,9)
) +
#Use labs function to set x-axis title and title of each graph using the
# caption function so that it displays on the bottom
labs(x = expression(frac(Barrels,Week)),
caption = expression(atop("(a) Driller Production Supply", "of Crude Oil"))
) +
#Set classic theme, x-axis title on right-hand side using larger font of
# relative size 1.2, graph title on left-hand side using same larger font
theme_classic() +
theme(axis.title.y = element_blank(),
axis.title.x = element_text(hjust = 1),
axis.text = element_text(size=rel(1.2)),
plot.caption = element_text(hjust = 0.5, size=rel(1.2))
) +
coord_equal()
# Save the intersections so we can set the same quantity, price for panel (c)
specified_intersections = intersections
# Panel (b)
supply <- Hmisc::bezier(c(3.99, 4), c(0, 9)) %>%
as_data_frame()
demand <- Hmisc::bezier(c(2, 3, 4, 5), c(9, 6.5, 6, 5.5)) %>%
as_data_frame()
demand_capacity <- Hmisc::bezier(c(5, 5), c(0, 5.5)) %>%
as_data_frame()
supply_capacity <- Hmisc::bezier(c(4.999, 5), c(0, 9)) %>%
as_data_frame()
supply_label_height <- Hmisc::bezier(c(0, 9), c(9, 9)) %>%
as_data_frame()
demand_label_height <- Hmisc::bezier(c(0, 9), c(8, 8)) %>%
as_data_frame()
capacity_label_height <- Hmisc::bezier(c(0, 9), c(9, 9)) %>%
as_data_frame()
# Calculate the intersections of the two curves
intersections <- bind_rows(curve_intersect(supply,
demand))
supply_label <- bind_rows(curve_intersect(supply
,supply_label_height))
demand_label <- bind_rows(curve_intersect(demand
,demand_label_height))
capacity_label <- bind_rows(curve_intersect(supply_capacity
,capacity_label_height))
labels <- data_frame(label = c(expression("OD"[CR]^DRL),expression("OS"[CR]^DRL)
,expression("Q"[CR]^CAP)
),
x = c(demand_label$x, supply_label$x
, capacity_label$x
),
y = c(demand_label$y, supply_label$y
, capacity_label$y
)
)
inventory <- ggplot(mapping = aes(x = x, y = y)) +
geom_path(data = supply, color = "#0073D9", size = 1) +
geom_path(data = demand, color = "#FF4036", size = 1) +
geom_path(data = demand_capacity, color = "#FF4036", size = 1) +
geom_path(data = supply_capacity, color = "#0073D9", size = 1, lty = "dashed") +
geom_segment(data = intersections,
aes(x = 0, y = y, xend = x, yend = y), lty = "dotted") +
geom_text_repel(data = labels
,aes(x = x, y = y, label = label)
,parse = TRUE
,direction = "x"
,force = 3
,force_pull = 0.1
,hjust = c(0, 0, 1)
,min.segment.length = 0
) +
geom_point(data = intersections, size = 3) +
scale_x_continuous(expand = c(0, 0), breaks = c(intersections$x
, 5),
labels = c(expression(paste(Q[CR]^{DRL-OS},phantom(12345)))
,expression(Q[CR]^CAP)
)
, limits=c(0,9)) +
scale_y_continuous(expand = c(0, 0), breaks = c(intersections$y, 9),
labels = c(expression(P[CR]),expression(P[CR]))
, limits=c(0,9)) +
labs(x = "Barrels",
caption = expression(atop("(b) Driller Storage / Ownership", "of Crude Oil"))
) +
theme_classic() +
theme(axis.title.y = element_blank(),
axis.title.x = element_text(hjust = 1),
axis.text = element_text(size=rel(1.2)),
plot.caption = element_text(hjust = 0.5, size=rel(1.2))
) +
coord_equal()
Relevant section
# panel (c)
# ggforce package method
#supply <- list(c(1, 4, specified_intersections$x, 5, 7),
# c(3, 4, specified_intersections$y, 7, 9)) %>%
# as_data_frame()
# bezier package method: Fails with "Error in FUN(X[[i]], ...) : object 'x' not found"
t <- seq(0, 2, length=10)
p <- list(c(1, 4, specified_intersections$x, 7, 8),
c(3, 4, specified_intersections$y, 6, 9))
#p <- matrix(c(1,3, 4,4, specified_intersections$x,specified_intersections$y,
# 7,6, 8,9), nrow=5, ncol=2, byrow=TRUE)
supply <- bezier(t=t, p=p) %>%
as_data_frame()
# Original: Fails because it does not pass through the specified intersection
#supply <- Hmisc::bezier(c(1, specified_intersections$x, 8),
# c(3, specified_intersections$y, 9)) %>%
# as_data_frame()
# Hmisc method: Fails because there is no way to get the two curves to appear
# contiguous
#supply1 <- Hmisc::bezier(c(1, 4, specified_intersections$x),
# c(3, 4, specified_intersections$y)) %>%
# as_data_frame()
#supply2 <- Hmisc::bezier(c(specified_intersections$x, 6, 7),
# c(specified_intersections$y, 8, 9)) %>%
# as_data_frame()
#demand <- Hmisc::bezier(c(0, 9), c(specified_intersections$y, specified_intersections$y)) %>%
# as_data_frame()
label_height <- Hmisc::bezier(c(0, 9), c(8, 8)) %>%
as_data_frame()
# Calculate the intersections of the two curves
#intersections <- bind_rows(curve_intersect(supply, demand))
#supply_label <- bind_rows(curve_intersect(supply,
# label_height))
#labels <- data_frame(label = expression("SS"[CR]^DRL),
# x = supply_label$x,
# y = supply_label$y)
sales <- ggplot(mapping = aes(x = x, y = y)) +
# ggforce package method
# geom_bspline(data = supply, color = "#0073D9", size = 1) +
# Original geom_path method
geom_path(data = supply, color = "#0073D9", size = 1) +
# Supply 1 and 2 for Hmisc method
# geom_path(data = supply1, color = "#0073D9", size = 1) +
# geom_path(data = supply2, color = "#0073D9", size = 1) +
geom_segment(data = specified_intersections,
aes(x = x, y = 0, xend = x, yend = y), lty = "dotted") +
geom_segment(data = specified_intersections,
aes(x = 0, y = y, xend = x, yend = y), lty = "dotted") +
# geom_text_repel(data = labels
# ,aes(x = x, y = y, label = label)
# ,parse = TRUE
# ,direction = "x"
# ,force = 3
# ,force_pull = 0.1
# ,hjust = 0
# ,min.segment.length = 0
# ) +
geom_point(data = specified_intersections, size = 3) +
scale_x_continuous(expand = c(0, 0), breaks = specified_intersections$x,
labels = expression(Q[CR]^{DRL-SS}), limits=c(0,9)) +
scale_y_continuous(expand = c(0, 0), breaks = c(specified_intersections$y, 9),
labels = c(expression(P[CR]),expression(P[CR]))) +
labs(x = expression(frac(Barrels,Week)),
caption = expression(atop("(c) Driller Sales Supply", "of Crude Oil"))
) +
theme_classic() +
theme(axis.title.y = element_blank(),
axis.title.x = element_text(hjust = 1),
axis.text = element_text(size=rel(1.2)),
plot.caption = element_text(hjust = 0.5, size=rel(1.2))
) +
coord_equal()
patchwork <- (production | inventory | sales)
patchwork
Graphs before implementation of fixed coordinates. Need to move panel (c) intersection point to match panel (a)
I solved the "Error in FUN(X[[i]], ...) : object 'x' not found" by printing the supply variable and noticing that the result of the bezier function ends up with columns named V1, V2 rather than x, y. I needed to set the aesthetics of the geom_path to the correct mapping.
Relevant Section, trimmed to only the bezier method
# panel (c)
# bezier package method
t <- seq(0, 2, length = 100)
p <- matrix(c(1,3, 4,4, specified_intersections$x,specified_intersections$y,
7,6, 8,9), nrow=5, ncol=2, byrow=TRUE)
supply <- bezier::bezier(t=t, p=p, deg=2) %>%
as_data_frame()
sales <- ggplot(mapping = aes(x = x, y = y)) +
# Original geom_path method
geom_path(data = supply, mapping = aes(x = V1, y = V2),
color = "#0073D9", size = 1, inherit.aes = FALSE) +
geom_segment(data = specified_intersections,
aes(x = x, y = 0, xend = x, yend = y), lty = "dotted") +
geom_segment(data = specified_intersections,
aes(x = 0, y = y, xend = x, yend = y), lty = "dotted") +
geom_point(data = specified_intersections, size = 3) +
scale_x_continuous(expand = c(0, 0), breaks = specified_intersections$x,
labels = expression(Q[CR]^{DRL-SS}), limits=c(0,9)) +
scale_y_continuous(expand = c(0, 0), breaks = c(specified_intersections$y, 9),
labels = c(expression(P[CR]),expression(P[CR]))) +
labs(x = expression(frac(Barrels,Week)),
caption = expression(atop("(c) Driller Sales Supply", "of Crude Oil"))
) +
theme_classic() +
theme(axis.title.y = element_blank(),
axis.title.x = element_text(hjust = 1),
axis.text = element_text(size=rel(1.2)),
plot.caption = element_text(hjust = 0.5, size=rel(1.2))
) +
coord_equal()
patchwork <- (production | inventory | sales)
patchwork
This does not solve my larger problem of needing a smooth curve that passes through a specified set of coordinates, as it produces two bezier curves that do not match.
I will do some research on using functions to specify bezier curves and find out if there is some mathematical or programmatic way to specify a bezier curve that passes through a set of fixed coordinates. If I find one, I'll edit this answer.
If anyone knows how to accomplish this, I would appreciate any help!
Kinked bezier curves
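One possible way to get a single smooth curve through a fixed point, sketched here under stated assumptions rather than taken from any of the packages above: a quadratic Bezier curve satisfies B(0.5) = 0.25 * P0 + 0.5 * P1 + 0.25 * P2, so choosing the middle control point as P1 = 2 * P - (P0 + P2) / 2 forces the curve through P at t = 0.5. The function name quad_bezier_through and the point (5, 6) are hypothetical; in the real figure the point would come from specified_intersections.

```r
# Quadratic Bezier from p0 to p2 that passes through p_mid at t = 0.5.
# Each point is a length-2 numeric vector c(x, y).
quad_bezier_through <- function(p0, p_mid, p2, n = 101) {
  p1 <- 2 * p_mid - (p0 + p2) / 2   # control point solved from B(0.5) = p_mid
  t <- seq(0, 1, length.out = n)
  data.frame(
    x = (1 - t)^2 * p0[1] + 2 * (1 - t) * t * p1[1] + t^2 * p2[1],
    y = (1 - t)^2 * p0[2] + 2 * (1 - t) * t * p1[2] + t^2 * p2[2]
  )
}

# Hypothetical fixed point (5, 6); the endpoints follow the question's panel (c)
supply <- quad_bezier_through(p0 = c(1, 3), p_mid = c(5, 6), p2 = c(8, 9))

# With n = 101, row 51 corresponds to t = 0.5 and lies on the fixed point
supply[51, ]
```

The result has columns x and y, so it can be passed to geom_path(data = supply, ...) with the existing aes(x = x, y = y) mapping. A single quadratic segment only offers one bend, so the achievable shapes are limited compared with the hand-drawn originals.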
I'm plotting a discrete CDF. I have a few questions regarding geom_step that I can't find answers to on Google.
Is it possible to make the line segment representing the jump dashed rather than solid, to better show what's going on?
Is it possible to add geom_point more efficiently than I do (less copy/paste)?
Below is my current solution:
library(tidyverse)
library(ggthemes)
theme_set(theme_few())
x0 <- seq(-0.5, -0.01, by = 0.01)
x1 <- seq(0, 0.99, by = 0.02)
x2 <- seq(1, 1.99, by = 0.02)
x3 <- seq(2, 2.99, by = 0.02)
x35 <- seq(3, 3.49, by = 0.01)
x4 <- seq(3.5, 3.99, by = 0.01)
tibble_ex <- tibble(
x0 = x0,
x1 = x1,
x2 = x2,
x3 = x3,
x35 = x35,
x4 = x4
)
tibble_ex %>%
gather(x, xax, x0:x4) %>%
mutate(cdf = case_when(x == 'x0' ~ 0,
x == 'x1' ~ 1/2,
x == 'x2' ~ 3/5,
x == 'x3' ~ 4/5,
x == 'x35' ~ 9/10,
x == 'x4' ~ 1)) %>%
ggplot(aes(x = xax, y = cdf)) +
geom_step() +
geom_point(aes(x = 0, y = 0), size = 3, shape = 21, fill = 'white') +
geom_point(aes(x = 1, y = 0.5), size = 3, shape = 21, fill = 'white') +
geom_point(aes(x = 2, y = 3/5), size = 3, shape = 21, fill = 'white') +
geom_point(aes(x = 3, y = 4/5), size = 3, shape = 21, fill = 'white') +
geom_point(aes(x = 3.5, y = 9/10), size = 3, shape = 21, fill = 'white') +
geom_point(aes(x = 0, y = 0.5), size = 3, shape = 21, fill = 'black') +
geom_point(aes(x = 1, y = 3/5), size = 3, shape = 21, fill = 'black') +
geom_point(aes(x = 2, y = 4/5), size = 3, shape = 21, fill = 'black') +
geom_point(aes(x = 3, y = 9/10), size = 3, shape = 21, fill = 'black') +
geom_point(aes(x = 3.5, y = 1), size = 3, shape = 21, fill = 'black') +
labs(x = 'x', y = 'F(x)')
ggplot will be more powerful to use if you can put your data into a data frame and structure it so that the characteristics of your data can be mapped directly.
Here's a way to take your data and augment it with additional rows that represent the connecting points, by matching each x with the prior cdf value. I added a column, type, to keep track of which is which. I also arrange df so that geom_segment plots the points in the right order.
new_steps <-
tibble(x = c(0:3, 3.5, 4),
cdf = c(0, .5, .6, .8, .9, 1))
df <- new_steps %>%
mutate(type = "cdf") %>%
bind_rows(new_steps %>%
mutate(type = "prior",
cdf = lag(cdf))) %>%
drop_na() %>%
arrange(x, desc(type))
Then we can map the points' fill and the geom_segments' linetype to type.
ggplot(df) +
geom_point(aes(x, cdf, fill = type),
shape = 21) +
scale_fill_manual(values = c("black", "white")) +
geom_segment(aes(x = lag(x), y = lag(cdf),
xend = x, yend = cdf,
lty = type)) +
scale_linetype_manual(values = c("dashed", "solid"))
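A variation on the same idea, as a rough sketch rather than a replacement for the code above: keep one row per jump and store the value just before the jump (prev) and the end of the following flat piece (xend) as extra columns. The flat pieces and the dashed jumps are then two separate geom_segment layers. The column names prev and xend, and the plot edges at -0.5 and 4, are assumptions taken from the question's x range.

```r
# One row per jump: location, value after the jump
steps <- data.frame(x   = c(0, 1, 2, 3, 3.5),
                    cdf = c(0.5, 0.6, 0.8, 0.9, 1))
steps$prev <- c(0, head(steps$cdf, -1))  # value just before each jump
steps$xend <- c(steps$x[-1], 4)          # where each flat piece ends

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(steps) +
    # flat piece to the left of the first jump
    annotate("segment", x = -0.5, xend = 0, y = 0, yend = 0) +
    # solid flat pieces
    geom_segment(aes(x = x, xend = xend, y = cdf, yend = cdf)) +
    # dashed vertical jumps
    geom_segment(aes(x = x, xend = x, y = prev, yend = cdf),
                 linetype = "dashed") +
    # open circle below each jump, filled circle above it
    geom_point(aes(x, prev), shape = 21, fill = "white", size = 3) +
    geom_point(aes(x, cdf), shape = 21, fill = "black", size = 3) +
    labs(x = "x", y = "F(x)")
  print(p)
}
```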
(1) No, there is not a built-in way to make the geom_step half-dashed. But if you post this as a separate question, perhaps someone will help create a new geom for this.
(2) The answer is to put the points you want plotted in a data frame, like anything else you might want to plot:
point_data = data.frame(x = rep(c(0, 1, 2, 3, 3.5), 2),
y = c(0, rep(c(.5, .6, .8, .9), 2), 1),
z = rep(c("a", "b"), each = 5))
# calling your gathered/mutated version of tibble_ex df
ggplot(df, aes(x = xax, y = cdf)) +
geom_step() +
geom_point(data = point_data, aes(x = x, y = y, fill = z), shape = 21) +
scale_fill_manual(values = c("white", "black"), guide = "none") +
labs(x = 'x', y = 'F(x)')
For the second part of your question, you can put all the coordinates in a separate data frame and call geom_point only once:
ddf <- data.frame(xax = rep(c(0:3, 3.5), 2),
cdf = c(0, .5, .6, .8, .9, .5, .6, .8, .9, 1),
col = rep(c("white", "black"), each = 5))
dev.new()
tibble_ex %>%
gather(x, xax, x0:x4) %>%
mutate(cdf = case_when(x == 'x0' ~ 0,
x == 'x1' ~ 1/2,
x == 'x2' ~ 3/5,
x == 'x3' ~ 4/5,
x == 'x35' ~ 9/10,
x == 'x4' ~ 1)) %>%
ggplot(aes(x = xax, y = cdf)) +
geom_step() +
geom_point(data = ddf, aes(fill = I(col)), size = 3, shape = 21) +
labs(x = 'x', y = 'F(x)')
I have the following data structure:
y <- rep(1:10, 2)
group <- rep(c('a', 'b'), each = 10)
dens <- c(c(seq(from = 0, to = 0.8, by = 0.1), 0),
c(seq(from = -0, to = -0.8, by = -0.1), 0))
my_dat <- data.frame(group, dens, y, stringsAsFactors = FALSE )
These are calculated density distributions, in order to make a grouped violin plot, such as in "Split violin plot with ggplot2".
# Plot 1:
require(ggplot2)
ggplot(my_dat, aes(x = dens, y = y, fill = group)) +
geom_polygon(color = 'black', show.legend = FALSE)
This is simplified: my real data contains hundreds of rows for a smooth outline, but the central vertical line is present there too. I would now like to remove exactly this central vertical line.
(I guess the problem is removing any specified part of the polygon.)
An idea in my example was to overplot this with a vertical line:
#Plot 2
ggplot(my_dat, aes(x = dens, y = y, fill = group)) +
geom_polygon(color = 'black', show.legend = FALSE) +
geom_segment(x = 0,
xend = 0,
y = min(y) + 0.2,
yend = max(y) - 0.2,
color = '#00BFC4')
But getting the ends of the overplotted segment exactly right is tricky. (I have purposely left the line a bit too short for demonstration.)
Edit
The groups are not distributed symmetrically, although my example strongly suggests so.
You can always just plot another polygon on top:
x <- with(my_dat, chull(dens, y))
my_dat2 <- my_dat[c(x, x[1L]), ]
ggplot(my_dat, aes(x = dens, y = y, fill = group)) +
geom_polygon(show.legend = FALSE) +
geom_polygon(data = my_dat2, aes(group = 1), size = 1,
fill = 'transparent',
# fill = NA, ## or this
color = 'black')
I think the simpler solution is to first draw all the outlines and then all the filled areas. This should work for any arbitrary polygon shapes.
y <- rep(1:10, 2)
group <- rep(c('a', 'b'), each = 10)
dens <- c(c(seq(from = 0, to = 0.8, by = 0.1), 0),
c(seq(from = -0, to = -0.8, by = -0.1), 0))
my_dat <- data.frame(group, dens, y, stringsAsFactors = FALSE )
require(ggplot2)
ggplot(my_dat, aes(x = dens, y = y)) +
geom_polygon(color = 'black', fill = NA, size = 2) +
geom_polygon(aes(fill = group), color = NA)
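Another possibility, sketched under one assumption: each group's rows are ordered along the outline so that the first and last rows sit on the central line (true for the example data, but that would need checking for the real data). geom_polygon closes the shape, which is what strokes the central edge, whereas geom_path draws the points in data order and leaves the path open. Filling with an unstroked polygon and outlining with geom_path therefore leaves the central line undrawn.

```r
y <- rep(1:10, 2)
group <- rep(c("a", "b"), each = 10)
dens <- c(seq(0, 0.8, by = 0.1), 0,
          seq(0, -0.8, by = -0.1), 0)
my_dat <- data.frame(group, dens, y)

# Check the assumption: first and last row of every group lie on the centre line
ends <- do.call(rbind, lapply(split(my_dat, my_dat$group),
                              function(g) g[c(1, nrow(g)), ]))
stopifnot(all(ends$dens == 0))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(my_dat, aes(x = dens, y = y)) +
    # fill only, no stroke
    geom_polygon(aes(fill = group), color = NA, show.legend = FALSE) +
    # open outline per group: the closing edge along dens = 0 is never drawn
    geom_path(aes(group = group), color = "black")
  print(p)
}
```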