I am trying to find a way to colour the background after a specific value.
Here in this example, I want to colour the spaces after the value 5 (here shown with a vertical line).
#
library(lme4)
library(tidyverse)
data("sleepstudy")
#
sleepstudy = sleepstudy %>% mutate(days = ifelse(Days > 5, 1, 0))
#
m1 = sleepstudy %>% group_by(Days, days) %>% summarise(m = mean(Reaction))
m1
m1 %>% ggplot(aes(Days, m)) +
geom_point() +
geom_vline(xintercept = 6) +
theme_minimal()
I want to achieve something like this
However, when I use the following line, I get an error message.
m1 %>% ggplot(aes(Days, m)) +
geom_point() +
geom_vline(xintercept = 6) +
theme_minimal() +
geom_ribbon(data = m1, aes(x = c(6,9), ymin=0, ymax = 400), fill = 'khaki', alpha = 0.2)
Maybe the following does what the question asks for.
First of all, if error bars are to be plotted, the data preparation code must change.
There is no need to compute an extra variable, days, that tells whether Days is greater than 5.
A measure of spread for the error bars must be computed; the code below uses the standard deviation.
All of this can be done in a single pipe.
library(lme4)
library(tidyverse)
data("sleepstudy")
m1 <- sleepstudy %>%
group_by(Days) %>%
summarise(m = mean(Reaction),
s = sd(Reaction))
Now the plot.
I have changed the order of the geoms, to have the points, error bars and vertical line over the ribbon.
I have also increased the alpha level to 0.30.
There is no need to reset the x aesthetic; it is already set at the beginning of the plot.
It's the latter point that caused the error in the question's code: aes(x = c(6,9)) has length 2, while the data has 10 rows.
Error: Aesthetics must be either length 1 or the same as the data (10): x
m1 %>% ggplot(aes(Days, m)) +
theme_minimal() +
geom_ribbon(data = m1 %>% filter(Days > 5),
aes(ymin = 0, ymax = 400),
fill = 'khaki',
alpha = 0.30) +
geom_vline(xintercept = 6) +
geom_point() +
geom_errorbar(aes(ymin = m - s, ymax = m + s))
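If the shaded band does not need to follow the data, a fixed rectangle may be simpler than a ribbon. The sketch below is an alternative, not the approach used above. It assumes that a band running from Days = 6 to the right edge of the panel is what is wanted, and it rebuilds the same m1 summary so that it runs on its own.

```r
library(lme4)      # provides the sleepstudy data
library(dplyr)
library(ggplot2)

data("sleepstudy")

m1 <- sleepstudy %>%
  group_by(Days) %>%
  summarise(m = mean(Reaction),
            s = sd(Reaction))

p <- ggplot(m1, aes(Days, m)) +
  theme_minimal() +
  # annotate() takes fixed values, so nothing has to match the data length;
  # Inf / -Inf stretch the rectangle to the panel edges
  annotate("rect", xmin = 6, xmax = Inf, ymin = -Inf, ymax = Inf,
           fill = "khaki", alpha = 0.30) +
  geom_vline(xintercept = 6) +
  geom_point() +
  geom_errorbar(aes(ymin = m - s, ymax = m + s))
p
```

Because the rectangle is not tied to m1, the y limits of the plot are still determined by the points and error bars only.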
Related
I have a very specific situation where I am trying to get a regression line to start at the origin and fit to one set of points on the x axis. Ideally I wouldn't have to do this but the line from the origin to these points is actually meaningful in my case.
I have come up with a simple example
library(dplyr)
library(ggplot2)
y<-c(1,2,3,4,5,6,7,8)
x<-c(3,3,3,3,3,3,3,3)
z<-as.data.frame(cbind(x,y))
z %>% ggplot(aes(x,y)) + geom_point() +
geom_smooth(formula = y ~ x + 0, method = "lm") +
theme_bw() + expand_limits(x = 0, y = 0) +
theme(aspect.ratio = 1)
Here, geom_smooth will not fit a line from the origin to the points at x = 3. I'm assuming that something internal tells geom_smooth not to plot a line where there is no variation along the x axis. I've tested this somewhat, and by changing one of the x values to 0 I can indeed get a line from the origin (though the y value I choose influences the confidence interval, which is not ideal).
y<-c(1,2,3,4,5,6,7,8)
x<-c(3,3,3,3,3,3,3,0)
z<-as.data.frame(cbind(x,y))
z %>% ggplot(aes(x,y)) + geom_point() +
geom_smooth(formula = y ~ x + 0, method = "lm") +
theme_bw() + expand_limits(x = 0, y = 0) +
theme(aspect.ratio = 1)
I don't want to fiddle with the dataset and add a point at y = 0, x = 0, as I'm worried about that influencing some error estimate (however small). I'm assuming that there is some condition that I can set within geom_smooth, or some other command, to force the line to fit. Any help is appreciated, thanks.
Remember that linear regression just tells you the conditional mean of y for a given x. The "regression" at x = 3 is simply the best estimate of the mean of y at x = 3. Since all of your points are at x = 3, the conditional mean of y when x = 3 is just mean(y).
So all you need should be a line going from (0, 0) to (3, mean(y)). It really doesn't make any sense to have a standard error around this line, though perhaps it might be justified depending on the context.
library(dplyr)
library(ggplot2)
y<-c(1,2,3,4,5,6,7,8)
x<-c(3,3,3,3,3,3,3,3)
z<-as.data.frame(cbind(x,y))
z %>% ggplot(aes(x,y)) + geom_point() +
geom_smooth(formula = y ~ x + 0, method = "lm") +
theme_bw() + expand_limits(x = 0, y = 0) +
theme(aspect.ratio = 1) +
geom_line(data = data.frame(x = c(0, 3), y = c(0, mean(y))))
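The claim that the through-origin fit at x = 3 is just mean(y) can be checked numerically. This is a small sketch using the question's data; the names b and fitted_at_3 are only illustrative.

```r
y <- c(1, 2, 3, 4, 5, 6, 7, 8)
x <- rep(3, 8)

# Regression through the origin: y = b * x
fit <- lm(y ~ x + 0)
b <- unname(coef(fit))

# With every x equal to 3, least squares gives b = mean(y) / 3,
# so the fitted value at x = 3 is the plain mean of y
fitted_at_3 <- b * 3
all.equal(fitted_at_3, mean(y))
```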
Kind of silly but workable solution that I have figured out.
If I add an incredibly small amount of random variation to values in the x axis and specify fullrange = TRUE within geom_smooth then I can get the line to fit with an error estimate.
y<-c(1,2,3,4,5,6,7,8)
x<-c(3,3,3,3,3,3,3,3)
z<-as.data.frame(cbind(x,y))
z %>% mutate(rand = rnorm(8, mean=0.0000000001, sd=0.000000000001), x = x + rand) %>%
ggplot(aes(x,y)) + geom_point() +
geom_smooth(formula = y ~ x + 0, fullrange = TRUE ,method = "lm") +
theme_bw() + expand_limits(x = 0, y = 0) +
theme(aspect.ratio = 1)
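A possible alternative that avoids perturbing the data is to fit the model with lm() and draw the line and band from predict(). This is only a sketch: it assumes the through-origin linear model is the right one for the confidence band, and the names mod, grid and band are made up for the example.

```r
library(ggplot2)

z <- data.frame(x = rep(3, 8), y = c(1, 2, 3, 4, 5, 6, 7, 8))

# Through-origin fit on the unmodified data
mod <- lm(y ~ x + 0, data = z)

# Predict on a grid running from the origin to the observed x value;
# predict() returns the columns fit, lwr and upr
grid <- data.frame(x = seq(0, 3, length.out = 50))
band <- cbind(grid, predict(mod, newdata = grid, interval = "confidence"))

p <- ggplot(z, aes(x, y)) +
  geom_ribbon(data = band, aes(x = x, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, alpha = 0.2) +
  geom_line(data = band, aes(y = fit)) +
  geom_point() +
  theme_bw() +
  expand_limits(x = 0, y = 0) +
  theme(aspect.ratio = 1)
p
```

The band has zero width at the origin and widens towards x = 3, because the only uncertainty in this model is in the slope.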
I'm trying to recreate some of the very nice animations showing the behavior of loess from David Robinson found at Variance Explained. When I try to recreate the animations there I get a different behavior with my groups than Robinson shows. In the animation the points are jumping about in a way I don't expect. I'm wondering if there is different behavior from tidyr::crossing than in the deprecated inflate function he uses from broom. Any advice on how to make the animation in last plot below appreciated.
library(lattice)
library(ggplot2)
library(broom)
theme_set(theme_bw())
mod <- loess(NOx ~ E, ethanol, degree = 1, span = .75)
fit <- broom::augment(mod)
# plot to animate with lm showing moving loess
ggplot(fit, aes(E, NOx)) +
geom_point() +
geom_line(aes(y = .fitted), color = "red")
library(dplyr)
dat <- ethanol %>%
# note use of crossing over inflate
tidyr::crossing(center = unique(ethanol$E)) %>%
mutate(dist = abs(E - center)) %>%
filter(rank(dist) / n() <= .75) %>%
mutate(weight = (1 - (dist / max(dist)) ^ 3) ^ 3)
# animate plot -- awry
p <- ggplot(dat, aes(x=E, y=NOx)) +
geom_point(aes(alpha = weight)) +
geom_smooth(aes(group = center, weight = weight), method = "lm", se = FALSE) +
geom_vline(aes(xintercept = center), lty = 2) +
geom_line(aes(y = .fitted), data = fit, color = "red")
# why so many lm fits in middle range of E that are below loess line?
# something is wrong with the groups defined by center?
p
# make the animation
library(gganimate)
p + labs(title = 'E={frame_time}') + transition_time(center)
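Answering my own question: I was missing a group_by(center). Without it, rank(dist) / n() is computed over the whole crossed data set rather than within each center, so the filter does not keep the nearest 75% of points for each center.
E.g.,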
dat <- ethanol %>%
tidyr::crossing(center = unique(ethanol$E)) %>%
group_by(center) %>%
mutate(dist = abs(E - center)) %>%
filter(rank(dist) / n() <= .75) %>%
mutate(weight = (1 - (dist / max(dist)) ^ 3) ^ 3)
p <- ggplot(dat, aes(x=E, y=NOx)) +
geom_point(aes(alpha = weight)) +
geom_smooth(aes(group = center, weight = weight), method = "lm", se = FALSE) +
geom_vline(aes(xintercept = center), lty = 2) +
geom_line(aes(y = .fitted), data = fit, color = "red")
library(gganimate)
p + transition_states(center)
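The effect of the missing group_by can be seen by counting how many rows survive the filter for each center. This is a rough check rather than part of the original answer; kept_ungrouped and kept_grouped are names invented for it.

```r
library(dplyr)
library(tidyr)

data(ethanol, package = "lattice")

crossed <- ethanol %>%
  crossing(center = unique(ethanol$E)) %>%
  mutate(dist = abs(E - center))

# Without group_by(): rank() and n() run over all rows at once
kept_ungrouped <- crossed %>%
  filter(rank(dist) / n() <= .75) %>%
  count(center)

# With group_by(): rank() and n() run separately within each center
kept_grouped <- crossed %>%
  group_by(center) %>%
  filter(rank(dist) / n() <= .75) %>%
  ungroup() %>%
  count(center)

range(kept_ungrouped$n)  # window size differs a lot between centers
range(kept_grouped$n)    # about 75% of the rows for every center
```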
I'm aware there are similar posts but I could not get those answers to work in my case.
e.g. Here and here.
Example:
diamonds %>%
ggplot(aes(scale(price) %>% as.vector)) +
geom_density() +
xlim(-3, 3) +
facet_wrap(vars(cut))
Returns a plot:
Since I used scale, the numbers on the x axis are z-scores, i.e. the number of standard deviations away from the mean at each break.
I would like to add as a row underneath the equivalent non scaled raw number that corresponds to each.
Tried:
diamonds %>%
ggplot(aes(scale(price) %>% as.vector)) +
geom_density() +
xlim(-3, 3) +
facet_wrap(vars(cut)) +
geom_text(aes(label = price))
Gives:
Error: geom_text requires the following missing aesthetics: y
My primary question is how can I add the raw values underneath -3:3 of each break? I don't want to change those breaks, I still want 6 breaks between -3:3.
Secondary question, how can I get -3 and 3 to actually show up in the chart? They have been trimmed.
[edit]
I've been trying to make it work with geom_text but keep hitting errors:
diamonds %>%
ggplot(aes(x = scale(price) %>% as.vector)) +
geom_density() +
xlim(-3, 3) +
facet_wrap(vars(cut)) +
geom_text(label = price)
Error in layer(data = data, mapping = mapping, stat = stat, geom = GeomText, :
object 'price' not found
I then tried changing my call to geom_text()
geom_text(data = diamonds, aes(price), label = price)
This results in the same error message.
You can make a custom labeling function for your axis. This takes each label on the axis and performs a custom transform for you. In your case you could paste the z-score, a line break, and the z-score times the standard deviation plus the mean. Because of the distribution of prices in the diamonds data set, this means that z-scores below about -1 represent negative prices. This may not be a problem in your own data. For clarity I have drawn in a vertical line representing $0.
labeller <- function(x) {
paste0(x,"\n", scales::dollar(sd(diamonds$price) * x + mean(diamonds$price)))
}
diamonds %>%
ggplot(aes(scale(price) %>% as.vector)) +
geom_density() +
geom_vline(aes(xintercept = -0.98580251364833), linetype = 2) +
facet_wrap(vars(cut)) +
scale_x_continuous(labels = labeller, limits = c(-3, 3)) +
xlab("price")
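The hard-coded position of the $0 line above is the z-score at which the back-transformed price is zero, which can be computed instead of typed in. The sketch below shows that, plus the label function applied to a few z-scores outside of a plot. The names z_zero and z_to_price are illustrative, not from the answer.

```r
library(ggplot2)  # for the diamonds data

mean_price <- mean(diamonds$price)
sd_price   <- sd(diamonds$price)

# z-score at which sd_price * z + mean_price equals 0
z_zero <- -mean_price / sd_price

# Label function: z-score on the first line, raw price underneath
z_to_price <- function(z) {
  paste0(z, "\n", scales::dollar(sd_price * z + mean_price))
}

z_zero
z_to_price(c(-1, 0, 1))
```

Passing xintercept = z_zero to geom_vline would then keep the line correct if the data changes.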
We can use the sec_axis functionality in scale_x_continuous. To use this functionality we need to manually scale your data. This will add a secondary axis at the top of the plot, not underneath. So it's not quite exactly what you're looking for.
library(tidyverse)
# manually scale the data
mean_price <- mean(diamonds$price)
sd_price <- sd(diamonds$price)
diamonds$price_scaled <- (diamonds$price - mean_price) / sd_price
# make the plot
ggplot(diamonds, aes(price_scaled))+
geom_density()+
facet_wrap(~cut)+
scale_x_continuous(sec.axis = sec_axis(~ mean_price + (sd_price * .)),
limits = c(-3, 4), breaks = -3:3)
You could cheat a bit by passing some dummy data to geom_text:
geom_text(data = tibble(label = round(((-3:3) * sd_price) + mean_price),
y = -0.25,
x = -3:3),
aes(x, y, label = label))
Aim
I am trying to fill the area between two lines in a plot generated with ggplot in R. I would like to fill everything between the lines above the horizontal line with a different color than below the horizontal line.
I succeeded in filling everything between the two lines with a single color; however, I did not manage to differentiate above and below the horizontal line with two different colors.
Code
set.seed(123)
# Load packages
library(tidyverse)
# Create sample dataframe
df <- data.frame(x=seq(1,50,1),y=runif(50, min = 0, max = 10))
# Generate plot
ggplot(data = df, aes(x = x, y = y)) +
geom_line() +
geom_hline(yintercept = 5) +
theme_classic() +
geom_ribbon(aes(ymin=5,ymax=y), fill="blue")
Question
How do I fill the space above and below the horizontal line with a different color?
You can calculate the coordinates of the points where the two lines intersect & add them to your data frame:
m <- 5 # replace with desired y-intercept value for the horizontal line
# identify each run of points completely above (or below) the horizontal
# line as a new section
df.new <- df %>%
arrange(x) %>%
mutate(above.m = y >= m) %>%
mutate(changed = is.na(lag(above.m)) | lag(above.m) != above.m) %>%
mutate(section.id = cumsum(changed)) %>%
select(-above.m, -changed)
# calculate (by linear interpolation) the x-coordinate where the line crosses
# y = m between adjacent sections, & add this point to the data frame
df.new <- rbind(
df.new,
df.new %>%
group_by(section.id) %>%
filter(x %in% c(min(x), max(x))) %>%
ungroup() %>%
mutate(mid.x = ifelse(section.id == 1 |
section.id == lag(section.id),
NA,
x - (x - lag(x)) /
(y - lag(y)) * (y - m))) %>%
select(mid.x, y, section.id) %>%
rename(x = mid.x) %>%
mutate(y = m) %>%
na.omit())
With this data frame, you can then define two separate geom_ribbon layers with different colours. Comparison of results below (note: I also added a geom_point layer for illustration, & changed the colours because the blue in the original is a little glaring on the eyes...)
p1 <- ggplot(df,
aes(x = x, y = y)) +
geom_ribbon(aes(ymin = m, ymax = y), fill = "dodgerblue") +
geom_line() +
geom_hline(yintercept = m) +
geom_point() +
theme_classic()
p2 <- ggplot(df.new, aes(x = x, y = y)) +
geom_ribbon(data = . %>% filter(y >= m),
aes(ymin = m, ymax = y),
fill="dodgerblue") +
geom_ribbon(data = . %>% filter(y <= m),
aes(ymin = y, ymax = m),
fill = "firebrick1") +
geom_line() +
geom_hline(yintercept = m) +
geom_point() +
theme_classic()
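The interpolation inside the mutate() call above can be looked at in isolation. For two consecutive points (x1, y1) and (x2, y2) on opposite sides of m, the line between them crosses y = m at x2 - (x2 - x1) / (y2 - y1) * (y2 - m). A minimal sketch with a hypothetical helper, intersect_x, which is not part of the answer's code:

```r
# Hypothetical helper: x-coordinate where the segment from (x1, y1)
# to (x2, y2) crosses the horizontal line y = m
intersect_x <- function(x1, y1, x2, y2, m) {
  x2 - (x2 - x1) / (y2 - y1) * (y2 - m)
}

# The segment from (1, 3) to (2, 7) crosses y = 5 halfway along
intersect_x(1, 3, 2, 7, m = 5)  # 1.5
```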
Alternatively, fill three layers in order, for a particularly ugly result:
# Generate plot
ggplot(data = df, aes(x = x, y = y)) +
geom_line() +
geom_hline(yintercept = 5) +
theme_classic() +
geom_ribbon(aes(ymin=y,ymax=10), fill="green")+
geom_ribbon(aes(ymin=0,ymax=y), fill="yellow")+
geom_ribbon(aes(ymin=5,ymax=y), fill="blue")
I am attempting to produce a graph that shows two groups of error bars, but the different error bars represent different estimates of central tendency/variability (e.g., mean with sd and median with quantiles). I'm trying to use position_dodge, but it's not working, and I suspect this is because I'm feeding it values from a different dataset. Here's a reproducible example:
#### simulate dosages
dose = factor(rep(c("small", "medium", "large"), times=10))
dose = relevel(dose, "small")
#### simulate fevers, based on dosage (but highly skewed)
fever = rnorm(length(dose), 100, 1)
betas = matrix(c(0, -3, -6), nrow=1)
fever = fever + as.numeric(betas%*%t(model.matrix(fever~dose)))
#### put into data frame
d = data.frame(dose=dose, fever=fever)
#### compute means and standard errors
means = d %>% group_by(dose) %>% summarise(mean=mean(fever), lower=mean - sd(fever), upper = mean + sd(fever))
medians = d %>% group_by(dose) %>% summarise(median=median(fever), lower=quantile(fever, .25), upper = quantile(fever, .75))
#### put all into a ggplot
ggplot(d, aes(x=dose, y=fever)) +
geom_jitter(alpha=.2, width=.2) +
geom_point(data=means, aes(x=dose, y=mean)) +
geom_point(data=medians, aes(x=dose, y=median), col="red") +
geom_errorbar(data=means, aes(y=mean, ymin=lower, ymax=upper), width=.2, position=position_dodge(width=.2)) +
geom_errorbar(data= medians, aes(y=median, ymin=lower, ymax=upper), width=.2, position=position_dodge(width=.2), col="red")
Which gives the results of the following image:
Notice dodging isn't working.
Let's assume I can't just use stat_summary (I can't...I'm actually comparing means with some robust estimates from another package). Is there any way to offset the error bars/dots so they can be better seen?
Combine your dataframes for both statistics so you can map the kind of statistic on group:
# the question's data frame is called d
means <- d %>%
  group_by(dose) %>%
  summarise(Statistic = "Mean", Value = mean(fever),
            lower = mean(fever) - sd(fever), upper = mean(fever) + sd(fever))
medians <- d %>%
  group_by(dose) %>%
  summarise(Statistic = "Median", Value = median(fever),
            lower = quantile(fever, 0.25), upper = quantile(fever, 0.75))
df2 <- bind_rows(means, medians)
#### put all into a ggplot
ggplot(d, aes(x = dose, y = fever)) +
  geom_jitter(alpha = .2, width = .2) +
  # dodge the points by the same amount as the error bars so they line up
  geom_point(data = df2, aes(x = dose, y = Value, color = Statistic),
             position = position_dodge(width = .2)) +
  geom_errorbar(data = df2, aes(y = Value, ymin = lower, ymax = upper,
                                group = Statistic, color = Statistic),
                width = .2, position = position_dodge(width = .2))
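If the two summaries really have to stay in separate data frames and separate layers, another option is to shift each layer by a fixed amount with position_nudge(). The sketch below is an alternative to the combined data frame, not a replacement for it. The simulated data is simplified from the question, and the offsets of -0.1 and 0.1 are arbitrary choices.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
d <- data.frame(
  dose  = factor(rep(c("small", "medium", "large"), times = 10),
                 levels = c("small", "medium", "large")),
  fever = rnorm(30, 100, 1)
)

means <- d %>%
  group_by(dose) %>%
  summarise(value = mean(fever),
            lower = mean(fever) - sd(fever),
            upper = mean(fever) + sd(fever))

medians <- d %>%
  group_by(dose) %>%
  summarise(value = median(fever),
            lower = quantile(fever, .25),
            upper = quantile(fever, .75))

# One fixed offset per summary, reused for its points and error bars
left  <- position_nudge(x = -0.1)
right <- position_nudge(x = 0.1)

p <- ggplot(d, aes(x = dose, y = fever)) +
  geom_jitter(alpha = .2, width = .2) +
  geom_point(data = means, aes(y = value), position = left) +
  geom_errorbar(data = means, aes(x = dose, ymin = lower, ymax = upper),
                inherit.aes = FALSE, width = .1, position = left) +
  geom_point(data = medians, aes(y = value), colour = "red",
             position = right) +
  geom_errorbar(data = medians, aes(x = dose, ymin = lower, ymax = upper),
                inherit.aes = FALSE, width = .1, colour = "red",
                position = right)
p
```

Unlike position_dodge, position_nudge does not need several groups inside one layer, which is why it works when each layer holds a single statistic.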