I have a figure like the one below, and I would like to change the order of the colored lines from blue->green->red to red->green->blue, so that it matches the legend's order. I have found several tutorials on changing the order of legends, but in this case I want to keep the legend order (since the sets are 1, 2, and 3).
Here is the code to generate the data and figure.
library(dplyr)
library(tidyr)
library(broom)
library(ggplot2)
population_num <- 100
population <- tibble(
gender = as.numeric(rbinom(population_num, 1, 0.5)),
age=rnorm(population_num, mean=50, sd=20),
score=rnorm(population_num, mean=80, sd=30),
setid=sample(c(1,2,3), size=population_num, replace=TRUE)
)
temp <- population %>%
group_by(setid) %>%
do(model1 = tidy(lm(score ~ age, data = .)),
model2 = tidy(lm(score ~ age + gender, data = .))) %>%
gather(model_name, model, -setid) %>%
unnest(cols = model) %>%
filter(term == "age")
# multiplier for a two-sided 90% normal interval (about 1.645)
interval1 <- -qnorm((1-0.9)/2)
ggplot(temp, aes(colour = as.factor(setid))) +
geom_hline(yintercept = 0, colour = gray(1/2), lty = 2) +
geom_linerange(aes(x = model_name, ymin = estimate - std.error*interval1,
ymax = estimate + std.error*interval1),
lwd = 1, position = position_dodge(width = 1/2)) +
scale_x_discrete(limits=c("model2", "model1"), labels=c("M2", "M1")) +
coord_flip()
(This question was once asked on Japanese Stack Overflow but received no answers.)
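Side note on the data step: do() and gather() are superseded in current dplyr/tidyr. Below is a base-R sketch of the same extraction, using simulated data shaped like the question's (coef_rows is a hypothetical helper name, not part of any package). For each setid it fits both models, keeps the age row (estimate and standard error), and forms the 90% interval with the same multiplier as interval1, i.e. qnorm(0.95), about 1.645.

```r
# Base-R sketch of the per-setid extraction (assumes the question's columns).
set.seed(1)
population_num <- 100
population <- data.frame(
  gender = rbinom(population_num, 1, 0.5),
  age    = rnorm(population_num, mean = 50, sd = 20),
  score  = rnorm(population_num, mean = 80, sd = 30),
  setid  = sample(1:3, size = population_num, replace = TRUE)
)

# Hypothetical helper: one term of coef(summary(fit)) as a one-row data frame
coef_rows <- function(fit, model_name, setid, term = "age") {
  cs <- coef(summary(fit))
  data.frame(setid = setid, model_name = model_name, term = term,
             estimate = cs[term, "Estimate"], std.error = cs[term, "Std. Error"])
}

# Fit both models within each set and stack the "age" rows
temp <- do.call(rbind, lapply(split(population, population$setid), function(d) {
  s <- d$setid[1]
  rbind(coef_rows(lm(score ~ age, data = d), "model1", s),
        coef_rows(lm(score ~ age + gender, data = d), "model2", s))
}))
rownames(temp) <- NULL

# Two-sided 90% normal multiplier: same value as -qnorm((1 - 0.9) / 2)
interval1 <- qnorm(1 - (1 - 0.9) / 2)
temp$lower <- temp$estimate - temp$std.error * interval1
temp$upper <- temp$estimate + temp$std.error * interval1
```

The lower and upper columns are what the question computes inline as ymin and ymax.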
You can make the width parameter of position_dodge negative. This produces a warning:
Warning message:
position_dodge requires non-overlapping x intervals
but the plot renders fine:
ggplot(temp, aes(colour = as.factor(setid))) +
geom_hline(yintercept = 0, colour = gray(1/2), lty = 2) +
geom_linerange(aes(x = model_name, ymin = estimate - std.error*interval1,
ymax = estimate + std.error*interval1),
lwd = 1, position = position_dodge(width = -1/2)) +
scale_x_discrete(limits=c("model2", "model1"), labels=c("M2", "M1")) +
coord_flip()
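Why the sign flip reverses the order: as far as I can tell from the ggplot2 source, position_dodge() shifts group i (of n groups) by width * ((i - 0.5) / n - 0.5). The sketch below only evaluates that expression (dodge_offsets is a hypothetical name, not a ggplot2 function). A negative width mirrors the offsets, so set 1 moves to the opposite end; after coord_flip() that puts set 1 (red) on top, matching the legend.

```r
# Offset of each group's centre from its x position, mirroring the
# expression in ggplot2's dodge code (a simplified model, not a ggplot2 API).
dodge_offsets <- function(n, width) {
  i <- seq_len(n)
  width * ((i - 0.5) / n - 0.5)
}

dodge_offsets(3, 1/2)   # offsets -1/6, 0, 1/6: set 1 lowest
dodge_offsets(3, -1/2)  # offsets 1/6, 0, -1/6: set 1 highest
```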
Problem: My dataset has a shared baseline (time point 1) and 2 repeated measures. The follow-up points are repeated under 2 different conditions (i.e., a cross-over). When plotting the results, the error bars and data points overlap.
library(tidyverse)
set.seed(11)
df_test <-
tibble(group = factor(rep(c("A", "B"), each = 3)),
timepoint = factor(rep(1:3, 2)),
y = c(0, rnorm(2, mean = 0, sd = .2), 0, rnorm(2, mean = .7, sd = 0.35))) %>%
mutate(ymax = y + .2,
ymin = y - .2)
# plot_nododge <-
df_test %>%
ggplot(aes(timepoint, y,
group = group,
shape = group,
fill = group)) +
geom_linerange(linetype = 1,
aes(ymin = ymin,
ymax = ymax)) +
geom_point() +
geom_line()
Solution with position_dodge(): this fixes the overlap, but the "Baseline" point is actually a single measure. To avoid confusion, I would like it to be a single point while still dodging the follow-up points.
Is there a simple solution to this that I'm missing?
I would like to use a custom dodge for each point (e.g., something like scale_position_dodge_identity), but position is not accepted as an aesthetic.
# plot1 <-
df_test %>%
ggplot(aes(timepoint, y,
group = group,
shape = group,
fill = group)) +
geom_linerange(linetype = 1,
position = position_dodge(.2),
aes(ymin = ymin,
ymax = ymax)) +
geom_point(position = position_dodge(.2)) +
geom_line(position = position_dodge(.2))
Solution and expansion
Maybe something better will be developed in the future, but manually adjusting the position seems best for now. Since the point shape and color should also reflect that "Baseline" is a different measure, and the x-axis should have proper labels, I've edited the plot to add some finishing touches.
df_test %>%
mutate(shape_var = case_when(timepoint == 1 ~ 22,
group == "A" ~ 21,
group == "B" ~ 24),
fill_var = case_when(timepoint == 1 ~ "black",
group == "A" ~ "red",
group == "B" ~ "blue"),
timepoint2 = as.numeric(as.character(timepoint)),
timepoint2 = timepoint2 + 0.05*(timepoint2 > 1 & group == "B"),
timepoint2 = timepoint2 - 0.05*(timepoint2 > 1 & group == "A")) %>%
ggplot(aes(timepoint2, y,
group = group,
shape = shape_var,
fill = fill_var)) +
geom_linerange(aes(ymin = ymin,
ymax = ymax)) +
geom_line() +
geom_point(size = 2) +
scale_x_continuous(breaks = 1:3, # limit to selected time points
labels = c("Baseline", "time 1", "time 2"), # label like a discrete scale
limits = c(.75, 3.25)) + # give it some room
scale_shape_identity(guide = "legend",
name = "Treatment",
breaks = c(21, 24), # Defines legend order too; only label the treatment groups
labels = c("A", "B" )) + # this part is tricky - could easily reverse them
scale_fill_identity(guide = "legend",
name = "Treatment",
breaks = c("red", "blue"),# Defines legend order too; only label the treatment groups
labels = c("A", "B" )) + # this part is tricky - could easily reverse them
labs(x=NULL)
Created on 2021-09-19 by the reprex package (v2.0.1)
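The manual offsets in timepoint2 follow a simple rule: shift the follow-up points by a per-group amount and leave the shared baseline alone. A small base-R helper (nudge_followup is a hypothetical name) makes that rule explicit and reusable; the offsets of -0.05 and 0.05 are the ones used above.

```r
# Hypothetical helper: move follow-up time points apart by group while
# leaving the shared baseline undodged.
nudge_followup <- function(timepoint, group, offsets, baseline = 1) {
  t <- as.numeric(as.character(timepoint))       # factor levels -> numeric time
  shift <- unname(offsets[as.character(group)])  # per-row shift looked up by group
  t + ifelse(t > baseline, shift, 0)             # baseline rows get no shift
}

tp  <- factor(rep(1:3, 2))
grp <- factor(rep(c("A", "B"), each = 3))
nudge_followup(tp, grp, offsets = c(A = -0.05, B = 0.05))
# returns 1.00 1.95 2.95 1.00 2.05 3.05
```

Inside the pipeline this would be mutate(timepoint2 = nudge_followup(timepoint, group, c(A = -0.05, B = 0.05))).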
After some experimentation, I found a fairly good way to do this. It seems you can pass a vector to position_dodge2(width = c(...)) to specify individual dodge widths.
For your example, specify width = c(0.000001, 0.2, 0.2):
df_test %>%
ggplot(aes(timepoint, y, group = group, shape = group, fill = group)) +
geom_linerange(linetype = 1, position = position_dodge2(width = c(0.000001, 0.2, 0.2)), aes(ymin = ymin, ymax = ymax)) +
geom_point(position = position_dodge2(width = c(0.000001, 0.2, 0.2))) +
geom_line(position = position_dodge2(width = c(0.000001, 0.2, 0.2)))
Initially, I tried setting width = c(0, 0.2, 0.2), but that didn't behave as expected.
?position_dodge doesn't explicitly cover passing a vector of values as the dodge width, so I don't know why it fails when one of the widths is zero. A value of 0.000001 is close enough to zero that you can't tell the difference in the figure, so hopefully this will suffice.
One solution could be to just change the position of the original points directly in the data set for the purpose of the plot:
df_test %>%
mutate(timepoint = as.numeric(as.character(timepoint))) %>%
mutate(timepoint = timepoint + 0.1*(timepoint > 1 & group == "B")) %>%
ggplot(aes(timepoint, y,
group = group,
shape = group,
fill = group)) +
geom_linerange(linetype = 1,
aes(ymin = ymin,
ymax = ymax)) +
scale_x_continuous(breaks = unique(as.numeric(as.character(df_test$timepoint))))+
geom_point() +
geom_line()
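This answer shifts only group B. With more than two conditions, one option (my assumption, not part of the answer) is to spread the follow-up points symmetrically around each time point, much as a dodge would. centred_offsets below is a hypothetical helper; with two groups and step = 0.1 it reproduces the 0.1 total separation used here.

```r
# Hypothetical helper: symmetric offsets for k groups, spaced `step` apart,
# centred on zero and named by group level.
centred_offsets <- function(levels, step = 0.1) {
  k <- length(levels)
  setNames((seq_len(k) - (k + 1) / 2) * step, levels)
}

centred_offsets(c("A", "B"))       # A = -0.05, B = 0.05
centred_offsets(c("A", "B", "C"))  # A = -0.1, B = 0, C = 0.1
```

Each group's offset would then be added only where timepoint > 1, as in the code above.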
I have been struggling with this for hours now. I have the following script:
library(ggplot2)
sims = replicate(1000, sample(c(0,0,0,0,1,1,1,2,2,2), size=3, replace=FALSE))
df = data.frame(x=colSums(sims == 0),
y=colSums(sims == 1))
df$count <- 1
total_counts = aggregate(count ~ ., df, FUN = sum)
min_count = min(total_counts$count)
max_count = max(total_counts$count)
p = (ggplot(df, aes(x=x, y=y))
+ geom_count(aes(color=after_stat(n), size=after_stat(n)), alpha=0.8)
+ guides(color = 'legend', size = 'none')
+ labs(color='Count')
+ scale_colour_gradient(limits = c(min_count, max_count),
breaks = round(seq(min_count, max_count, length.out=5)),
labels = round(seq(min_count, max_count, length.out=5)))
+ scale_size_continuous(range = c(3, 7.5))
)
So far so good. The problem is that I want to add two additional sets of points:
df2 = data.frame(x=c(0, 1, 2, 3),
y=c(1.5253165, 1.0291262, 0.4529617, 0))
df3 = data.frame(x=c(0, 1, 2, 3),
y=c(1.5, 1, 0.5, 0))
To get something like this:
p2 = (p
+ geom_point(data=df2, aes(x=x, y=y), alpha=0.4, color="red", size = 2.5)
+ geom_point(data=df3, aes(x=x, y=y), alpha=0.4, color="green", size = 2.5)
)
The problem is that I can't get these new points into the legend. I would like them in a separate legend "section": with an empty title (to distinguish these points from the "Count" legend), and with text labels instead of numbers ("Simulated means" and "Theoretical means", for example).
Is there any way to achieve this?
A trick I learned from @tjebo is that you can use the ggnewscale package to create additional legends. The point in plot construction at which you call the new scale matters: first add a geom/stat layer together with its desired scale. Once these are declared, you can call new_scale_colour(), and all subsequent geom/stat layers will use a new colour scale.
library(ggplot2)
library(ggnewscale)
sims = replicate(1000, sample(c(0,0,0,0,1,1,1,2,2,2), size=3, replace=FALSE))
df = data.frame(x=colSums(sims == 0),
y=colSums(sims == 1))
df$count <- 1
total_counts = aggregate(count ~ ., df, FUN = sum)
min_count = min(total_counts$count)
max_count = max(total_counts$count)
df2 = data.frame(x=c(0, 1, 2, 3),
y=c(1.5253165, 1.0291262, 0.4529617, 0))
df3 = data.frame(x=c(0, 1, 2, 3),
y=c(1.5, 1, 0.5, 0))
ggplot(df, aes(x, y)) +
geom_count(aes(colour = after_stat(n), size = after_stat(n)),
alpha = 0.5) +
scale_colour_gradient(
limits = c(min_count, max_count),
breaks = round(seq(min_count, max_count, length.out = 5)),
labels = round(seq(min_count, max_count, length.out = 5)),
guide = "legend"
) +
new_scale_colour() +
geom_point(aes(colour = "Simulated means"),
data = df2, alpha = 0.4) +
geom_point(aes(colour = "Theoretical means"),
data = df3, alpha = 0.4) +
scale_colour_discrete(
name = ""
) +
scale_size_continuous(range = c(3, 7.5), guide = "none")
Created on 2021-04-22 by the reprex package (v1.0.0)
(P.S. Sorry for reformatting your code; it was just easier for me to read this way.)
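The two geom_point() layers after new_scale_colour() can also be collapsed into one by stacking df2 and df3 into a single data frame with a label column; the label then becomes the legend text. A base-R sketch of that data step (the column name source is my choice):

```r
df2 <- data.frame(x = c(0, 1, 2, 3), y = c(1.5253165, 1.0291262, 0.4529617, 0))
df3 <- data.frame(x = c(0, 1, 2, 3), y = c(1.5, 1, 0.5, 0))

# Stack the reference points; the label column drives the legend entries
means <- rbind(
  transform(df2, source = "Simulated means"),
  transform(df3, source = "Theoretical means")
)
# Factor levels fix the order of the legend entries
means$source <- factor(means$source,
                       levels = c("Simulated means", "Theoretical means"))
```

After new_scale_colour(), a single geom_point(aes(colour = source), data = means, alpha = 0.4) layer should then stand in for the two separate layers.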
I am working with a dataset called HappyDB for a class presentation, analyzing demographic differences in word frequency. I'm using tidytext for most of the analyses and following their online guide for most of my visuals. However, I'm running into a problem with the code that creates the frequency plot of words with labels. My dataset is structured differently from theirs, and I thought I was accounting for that, but evidently I was not. This is their sample code to generate the graph (comparing Jane Austen with the Brontë sisters and H.G. Wells):
library(tidyr)
frequency <- bind_rows(mutate(tidy_bronte, author = "Brontë Sisters"),
mutate(tidy_hgwells, author = "H.G. Wells"),
mutate(tidy_books, author = "Jane Austen")) %>%
mutate(word = str_extract(word, "[a-z']+")) %>%
count(author, word) %>%
group_by(author) %>%
mutate(proportion = n / sum(n)) %>%
select(-n) %>%
spread(author, proportion) %>%
gather(author, proportion, `Brontë Sisters`:`H.G. Wells`)
library(scales)
# expect a warning about rows with missing values being removed
ggplot(frequency, aes(x = proportion, y = `Jane Austen`, color = abs(`Jane Austen` - proportion))) +
geom_abline(color = "gray40", lty = 2) +
geom_jitter(alpha = 0.1, size = 2.5, width = 0.3, height = 0.3) +
geom_text(aes(label = word), check_overlap = TRUE, vjust = 1.5) +
scale_x_log10(labels = percent_format()) +
scale_y_log10(labels = percent_format()) +
scale_color_gradient(limits = c(0, 0.001), low = "darkslategray4", high = "gray75") +
facet_wrap(~author, ncol = 2) +
theme(legend.position="none") +
labs(y = "Jane Austen", x = NULL)
And that code generates this plot:
I'm hoping to emulate this with demographics in my dataset, but keep getting errors. Here is my code, which uses a dataset that I have already tidied:
library(dplyr)
library(tidyr)
library(ggplot2)
library(tidytext)
library(stringr)
windowsFonts(Franklin=windowsFont("Franklin Gothic Demi"))
marriedmen <- tidy_hm[which(tidy_hm$marital =="married" &
tidy_hm$gender == "m"),]
marriedwomen <- tidy_hm[which(tidy_hm$marital =="married" &
tidy_hm$gender == "f"),]
singlemen <- tidy_hm[which(tidy_hm$marital =="single" &
tidy_hm$gender == "m"),]
frequency <- bind_rows(mutate(marriedmen, status = "Married men"),
mutate(marriedwomen, status = "Married women"),
mutate(singlemen, status = "Single men")) %>%
count(status, word) %>%
group_by(status) %>%
mutate(proportion = n / sum(n)) %>%
select(-n) %>%
spread(status, proportion) %>%
gather(status, proportion, `Married women`:`Single men`)
library(scales)
# expect a warning about rows with missing values being removed
ggplot(frequency, aes(x = proportion, y = 'Married men', color = abs(`Married men` - proportion)) +
geom_abline(color = "gray40", lty = 2) +
geom_jitter(alpha = 0.1, size = 2.5, width = 0.3, height = 0.3) +
geom_text(aes(label = word), check_overlap = TRUE, vjust = 1.5) +
scale_x_log10(labels = percent_format()) +
scale_y_log10(labels = percent_format()) +
scale_color_gradient(limits = c(0, 0.001), low = "darkslategray4", high = "gray75") +
facet_wrap(~status, ncol = 2) +
theme(legend.position="none") +
labs(y = NULL, x = NULL)
But I keep getting this error:
Error in log(x, base) : non-numeric argument to mathematical function
I tried removing the scale lines, but then a lot of data was dropped and the plot looked nothing like it should: it had no line, labels, or colors. I'm pretty new to R and coding in general, so any help is appreciated.
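A likely cause (my reading of the code; no answer is included here): in the ggplot() call, y = 'Married men' uses single quotes, so y is the constant string "Married men" rather than the column, and scale_y_log10() then tries to take the log of text. The tidytext example wraps the column name Jane Austen in backticks instead. That line also appears to be missing the closing parenthesis of ggplot(. The base-R sketch below reproduces the error message with a made-up two-row data frame.

```r
# Toy frame with a column name containing a space (illustrative values only)
frequency <- data.frame(word = c("happy", "family"),
                        `Married men` = c(0.01, 0.002),
                        check.names = FALSE)

# Single quotes give a character string, so log10() fails the same way:
msg <- tryCatch(log10("Married men"), error = function(e) conditionMessage(e))
msg  # "non-numeric argument to mathematical function"

# Backticks refer to the column itself, which is numeric:
log10(with(frequency, `Married men`))
```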
I know that this is possible with other types of models (e.g., this), but I haven't come across a way to do it for models fit with "brms".
Has anyone had luck plotting the coefficients of multiple models, with different DVs, that were fit through brms?
Edit:
This is as far as I've gotten trying to use the method from that link for brms.
m1h<-fixef(mcmcm1_h1)
m1e<-fixef(mcmcm1_e1) #these extract fixed effect info from a model fit with brms; below is an example of their output#
m1h
Estimate Est.Error 2.5%ile 97.5%ile
Intercept 0.2615716 0.1482702 -0.01995366 0.5593722
m1hframe <- data.frame(Variable = colnames(m1h), Coefficient = m1h[, 1],
SE = m1h[, 2], min = m1h[, 3], max = m1h[, 4], modelName = "HH")
m1eframe <- data.frame(Variable = colnames(m1e), Coefficient = m1e[, 1],
SE = m1e[, 2], min = m1e[, 3], max = m1e[, 4], modelName = "Em")
allModelFrame <- data.frame(rbind(m1hframe, m1eframe))
library(ggplot2)
zp1 <- ggplot(allModelFrame, aes(colour = modelName))
zp1 <- zp1 + geom_hline(yintercept = 0, colour = gray(1/2), lty = 2)
zp1 <- zp1 + geom_linerange(aes(x = Variable, ymin = min,ymax = max),lwd = 1,
position = position_dodge(width = 1/2))
zp1 <- zp1 + geom_pointrange(aes(x = Variable, y = Coefficient, ymin = min,
ymax = max, lwd = 1/2, position = position_dodge(width = 1/2), shape = 21,
fill = "WHITE"))
zp1 <- zp1 + coord_flip() + theme_bw()
zp1 <- zp1 + ggtitle("two models")
print(zp1)
This is the error I am getting:
Don't know how to automatically pick scale for object of type PositionDodge/Position/ggproto. Defaulting to continuous.
Error: A continuous variable can not be mapped to shape
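Two things stand out in the code above. The error comes from placing position, shape, fill and lwd inside aes(): aes() is for mappings to data, so position_dodge() is treated as a mapped value (hence the "PositionDodge/Position/ggproto" message) and shape = 21 becomes a continuous mapping. Those arguments belong outside aes(), as plain layer arguments. Separately, fixef() returns the parameter names as row names, so Variable should come from rownames(m1h); colnames() gives the four column headers instead. A base-R sketch of the data step follows (fixef_frame is a hypothetical helper; columns are taken by position because their names may differ across brms versions, and the same matrix is reused for both models purely to show the stacking).

```r
# Hypothetical helper: turn a fixef()-style matrix (rows = parameters,
# columns = estimate, error, lower, upper) into a plotting data frame.
fixef_frame <- function(m, model_name) {
  out <- data.frame(Variable = rownames(m),  # parameter names are row names
                    Coefficient = m[, 1], SE = m[, 2],
                    min = m[, 3], max = m[, 4],
                    modelName = model_name)
  rownames(out) <- NULL
  out
}

# Stand-in for fixef(mcmcm1_h1), built from the values printed above
m1h <- matrix(c(0.2615716, 0.1482702, -0.01995366, 0.5593722), nrow = 1,
              dimnames = list("Intercept",
                              c("Estimate", "Est.Error", "2.5%ile", "97.5%ile")))

allModelFrame <- rbind(fixef_frame(m1h, "HH"), fixef_frame(m1h, "Em"))
```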
A slightly simpler solution using broom:
multiplot <- function(x) {
x %>% purrr::map(function(.) {
broom::tidy(., conf.int = TRUE, par_type = "non-varying") }) %>%
dplyr::bind_rows(.id = "model") %>%
ggplot(aes(term, estimate, ymin = lower, ymax = upper, color = model)) +
geom_pointrange(position = position_dodge(width = 0.3)) + coord_flip()
}
Use it like this: multiplot(list(m1, m2, m3)).
My question is similar to this but the answers there will not work for me. Basically, I'm trying to produce a regression discontinuity plot with a "fuzzy" design that uses all the data for the treatment and control groups, but only plots the regression line within the "range" of the treatment and control groups.
Below, I've simulated some data and produced the fuzzy RD plot with base graphics. I'm hoping to replicate this plot with ggplot2. The most important part is that the light blue regression line is fit using all the blue points, while the peach-colored regression line is fit using all the red points, even though each line is only plotted over the range in which individuals were intended to receive treatment. That's the part I'm having a hard time replicating in ggplot.
I'd like to move to ggplot because I'd like to use faceting to produce this same plot across various units in which participants were nested. In the code below, I show a non-example using geom_smooth. When there's no fuzziness within a group, it works fine, but otherwise it fails. If I could get geom_smooth to be limited to only specific ranges, I think I'd be set. Any and all help is appreciated.
Simulate data
library(MASS)
mu <- c(0, 0)
sigma <- matrix(c(1, 0.7, 0.7, 1), ncol = 2)
set.seed(100)
d <- as.data.frame(mvrnorm(1e3, mu, sigma))
# Create treatment variable
d$treat <- ifelse(d$V1 <= 0, 1, 0)
# Introduce fuzziness
d$treat[d$treat == 1][sample(100)] <- 0
d$treat[d$treat == 0][sample(100)] <- 1
# Treatment effect
d$V2[d$treat == 1] <- d$V2[d$treat == 1] + 0.5
# Add grouping factor
d$group <- gl(9, 1e3/9)
Produce the regression discontinuity plot with base graphics
library(RColorBrewer)
pal <- brewer.pal(5, "RdBu")
color <- d$treat
color[color == 0] <- pal[1]
color[color == 1] <- pal[5]
plot(V2 ~ V1,
data = d,
col = color,
bty = "n")
abline(v = 0, col = "gray", lwd = 3, lty = 2)
# Fit model
m <- lm(V2 ~ V1 + treat, data = d)
# predicted achievement for treatment group
pred_treat <- predict(m,
newdata = data.frame(V1 = seq(-3, 0, 0.1),
treat = 1))
# predicted achievement for control group
pred_no_treat <- predict(m,
newdata = data.frame(V1 = seq(0, 4, 0.1),
treat = 0))
# Add predicted achievement lines
lines(seq(-3, 0, 0.1), pred_treat, col = pal[4], lwd = 3)
lines(seq(0, 4, 0.1), pred_no_treat, col = pal[2], lwd = 3)
# Add legend
legend("bottomright",
legend = c("Treatment", "Control"),
lty = 1,
lwd = 2,
col = c(pal[4], pal[2]),
box.lwd = 0)
non-example with ggplot
d$treat <- factor(d$treat, labels = c("Control", "Treatment"))
library(ggplot2)
ggplot(d, aes(V1, V2, group = treat)) +
geom_point(aes(color = treat)) +
geom_smooth(method = "lm", aes(color = treat)) +
facet_wrap(~group)
Notice the regression lines extending past the treatment range for groups 1 and 2.
There's probably a more graceful way to make the lines with geom_smooth, but it can be hacked together with geom_segment. Munge the data.frames outside of the plotting call if you like.
ggplot(d, aes(x = V1, y = V2, color = factor(treat, labels = c('Control', 'Treatment')))) +
geom_point(shape = 21) +
scale_color_brewer(NULL, type = 'qual', palette = 6) +
geom_vline(aes(xintercept = 0), color = 'grey', size = 1, linetype = 'dashed') +
geom_segment(data = data.frame(t(predict(m, data.frame(V1 = c(-3, 0), treat = 1)))),
aes(x = -3, xend = 0, y = X1, yend = X2), color = pal[4], size = 1) +
geom_segment(data = data.frame(t(predict(m, data.frame(V1 = c(0, 4), treat = 0)))),
aes(x = 0, xend = 4, y = X1, yend = X2), color = pal[2], size = 1)
Another option is geom_path:
df <- data.frame(V1 = c(-3, 0, 0, 4), treat = c(1, 1, 0, 0))
df <- cbind(df, V2 = predict(m, df))
ggplot(d, aes(x = V1, y = V2, color = factor(treat, labels = c('Control', 'Treatment')))) +
geom_point(shape = 21) +
geom_vline(aes(xintercept = 0), color = 'grey', size = 1, linetype = 'dashed') +
scale_color_brewer(NULL, type = 'qual', palette = 6) +
geom_path(data = df, size = 1)
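The endpoints -3, 0, 0 and 4 are hard-coded to the simulated range. A sketch that builds the same path data from the observed range instead, assuming the cutoff is at 0 and treatment is intended for V1 <= 0 as in the simulation (rd_path is a hypothetical helper; it uses the numeric 0/1 treat that m was fit with, and the toy data are illustrative only):

```r
# Hypothetical helper: predicted RD line endpoints, with the treatment line
# running from min(V1) to the cutoff and the control line from the cutoff to max(V1).
rd_path <- function(model, data, cutoff = 0) {
  ends <- data.frame(V1    = c(min(data$V1), cutoff, cutoff, max(data$V1)),
                     treat = c(1, 1, 0, 0))
  ends$V2 <- predict(model, newdata = ends)
  ends
}

# Small toy data set (not the question's simulation)
set.seed(100)
toy <- data.frame(V1 = rnorm(200))
toy$treat <- as.numeric(toy$V1 <= 0)
toy$V2 <- 0.7 * toy$V1 + 0.5 * toy$treat + rnorm(200, sd = 0.5)
m_toy <- lm(V2 ~ V1 + treat, data = toy)

path <- rd_path(m_toy, toy)
```

With the question's objects, geom_path(data = rd_path(m, d), size = 1) could then replace the hand-built df.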
For the edit with facets, if I understand what you want correctly, you can fit a model for each group with lapply and predict for each group. Here I recombine with dplyr::bind_rows instead of do.call(rbind, ...) because its .id parameter inserts the group number from the list element names, though there are other ways to do the same thing.
df <- data.frame(V1 = c(-3, 0, 0, 4), treat = c('Treatment', 'Treatment', 'Control', 'Control'))
m_list <- lapply(split(d, d$group), function(x){lm(V2 ~ V1 + treat, data = x)})
df <- dplyr::bind_rows(lapply(m_list, function(x){cbind(df, V2 = predict(x, df))}), .id = 'group')
ggplot(d, aes(x = V1, y = V2, color = treat)) +
geom_point(shape = 21) +
geom_vline(aes(xintercept = 0), color = 'grey', size = 1, linetype = 'dashed') +
geom_path(data = df, size = 1) +
scale_color_brewer(NULL, type = 'qual', palette = 6) +
facet_wrap(~group)
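As the answer notes, there are other ways to attach the group label than bind_rows(.id = ...). A base-R sketch using Map() and do.call(rbind, ...) follows, on a small toy data set with the same column names (the data are illustrative; treat is a factor, as in the question after conversion).

```r
set.seed(1)
toy <- data.frame(V1 = rnorm(300), group = gl(3, 100))
toy$treat <- factor(ifelse(toy$V1 <= 0, "Treatment", "Control"),
                    levels = c("Control", "Treatment"))
toy$V2 <- 0.7 * toy$V1 + 0.5 * (toy$treat == "Treatment") + rnorm(300, sd = 0.5)

ends <- data.frame(V1 = c(-3, 0, 0, 4),
                   treat = c("Treatment", "Treatment", "Control", "Control"))

# One model per group, as in the answer
m_list <- lapply(split(toy, toy$group), function(x) lm(V2 ~ V1 + treat, data = x))

# Base-R alternative to dplyr::bind_rows(..., .id = "group"):
# Map() pairs each model with its list name, which becomes the group column
preds <- do.call(rbind, Map(function(fit, g) {
  cbind(ends, V2 = predict(fit, ends), group = g)
}, m_list, names(m_list)))
rownames(preds) <- NULL
```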