I want to create a line plot in ggplot2 in which the panel background color alternates between white and grey based on the X axis values.
In this case DOY is day of year, and I would like the background to switch at each day boundary.
I have included some basic sample code. Basically, I want the region between DOY 1-2 to be white, the region between DOY 2-3 to be grey, and so forth.
Any help is appreciated, thanks in advance.
DOY <- c(1, 2, 3, 4, 5)
Max <- c(200, 225, 250, 275, 300)
sample <- data.frame(DOY, Max)
ggplot()+
geom_line(data=sample, aes(x=DOY, y=Max), color = "black")
One way to approach this is to add a new variable (called e.g. stripe) to the data, which alternates based on the value of DOY. Then you can use that variable as the basis for filled, transparent rectangles.
I'm assuming that DOY is a sequence of integers with interval = 1, so we can assign on the basis of whether DOY is odd or even.
(Note: sample is not a great variable name, as there's already a base R function of that name.)
library(dplyr)
library(ggplot2)
sample %>%
mutate(stripe = factor(ifelse(DOY %% 2 == 0, 1, 0))) %>%
ggplot(aes(DOY, Max)) +
geom_point() +
geom_rect(aes(xmax = DOY + 1,
xmin = DOY,
ymin = min(Max),
ymax = Inf,
fill = stripe), alpha = 0.4) +
scale_fill_manual(values = c("white", "grey50")) +
theme_bw() +
guides(fill = "none")
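As a variation on the same idea, the stripes can be kept in their own small data frame, with one rectangle per interval between consecutive DOY values. This is only a sketch: the names doy_data and stripes are made up for illustration, and it assumes DOY is sorted in increasing order.

```r
library(ggplot2)

# Same values as the question, under a name that does not mask base::sample()
doy_data <- data.frame(DOY = c(1, 2, 3, 4, 5),
                       Max = c(200, 225, 250, 275, 300))

# One rectangle per interval between consecutive DOY values (assumes sorted DOY)
stripes <- data.frame(
  xmin = head(doy_data$DOY, -1),
  xmax = tail(doy_data$DOY, -1)
)
# Alternate the fill, starting with white for DOY 1-2
stripes$fill <- rep(c("white", "grey"), length.out = nrow(stripes))

ggplot(doy_data, aes(DOY, Max)) +
  # ymin/ymax of -Inf/Inf make each stripe span the full panel height
  geom_rect(data = stripes,
            aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf, fill = fill),
            inherit.aes = FALSE, alpha = 0.4) +
  geom_line(color = "black") +
  scale_fill_identity() +  # use the colour names in `fill` as they are
  theme_bw()
```

Because the rectangles come from a separate data frame, the last stripe stops at the final DOY instead of extending one unit past it.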
I'm new to data science and would appreciate any help.
I need to put labels from a column (integers) onto the graphed line at certain x points, like x=20, then x=50, then x=80, or at least every 20-30 steps. I am using geom_text, but it puts a label on every point it draws, and I need labels only at certain points so that the plot is readable.
the code is:
ggp<-cut_offs %>%
ggplot(mapping=aes(x=IL6, y=RhoTSHT3_in_less))+
geom_line(color="blue")+
geom_point(col=ifelse(cut_offs$pvalTSHT3_in_less<0.05, "red", "black"))+
ylim(0.1,0.7)+
geom_text(aes(label=n_more))
So, I guess I need to change my last line of code to something like this:
geom_text(aes(label=ifelse(x in labels, cut_offs$n_more, "")))
where labels is a list of the points where I want to put labels.
currently, my graph looks like this, which is unreadable:
I tried this
geom_text(aes(label=ifelse(x in labels, cut_offs$n_more, "")))
and of course it's not working, how do I write it in R?
We don't have your actual data to demonstrate an answer, but I have constructed a very similar set with the same names, range and approximate shape as your own (see footnote).
Using this, we see that your code produces much the same set of problems:
library(tidyverse)
cut_offs %>%
ggplot(aes(IL6, RhoTSHT3_in_less)) +
geom_line(color = "blue")+
geom_point(col = ifelse(cut_offs$pvalTSHT3_in_less < 0.05, "red", "black"))+
ylim(0.1, 0.7) +
geom_text(aes(label = n_more))
To label, say, only every 25th measurement along the x axis, we can do:
cut_offs %>%
ggplot(aes(IL6, RhoTSHT3_in_less)) +
geom_line(color = "blue")+
geom_point(col = ifelse(cut_offs$pvalTSHT3_in_less < 0.05, "red", "black"))+
ylim(0.1, 0.7) +
geom_text(data = . %>% filter(row_number() %% 25 == 1), aes(label = n_more),
nudge_y = 0.05)
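If you would rather label specific x values, as the question attempted with `x in labels`, R's membership operator is %in%. The following is a sketch under assumptions: label_at is a hypothetical vector of x positions, and the data is a simplified stand-in like the footnote data. Exact matching with %in% only works when those x values actually occur in IL6, so be careful with non-integer values.

```r
library(ggplot2)

# Simplified stand-in data (assumed; same column names as the question)
set.seed(1)
cut_offs <- data.frame(IL6 = seq(0, 500, length.out = 251),
                       RhoTSHT3_in_less = runif(251, 0.2, 0.25),
                       n_more = sample(300, 251))

# Hypothetical x positions that should receive a label
label_at <- c(20, 50, 80)

# Keep the label where IL6 is one of the chosen positions, blank elsewhere
cut_offs$label <- ifelse(cut_offs$IL6 %in% label_at,
                         as.character(cut_offs$n_more), "")

ggplot(cut_offs, aes(IL6, RhoTSHT3_in_less)) +
  geom_line(color = "blue") +
  geom_text(aes(label = label), nudge_y = 0.05)
```

Inside aes() you refer to the column by its name (IL6), not by x, which is why the original attempt could not work even with %in%.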
Footnote - data used
set.seed(1)
cut_offs <- data.frame(IL6 = seq(0, 500, len = 251),
RhoTSHT3_in_less = c(seq(0.45, 0.22, len = 20) +
rnorm(20, 0, 0.02),
runif(231, .2, .25)),
n_more = sample(300, 251),
pvalTSHT3_in_less = runif(251, 0, 0.2))
I am plotting a time series of returns and would like to use NBER recession dating to shade recessions, like FRED graphs do.
The recession variable is in the same data frame and is a 1, 0 variable for: 1 = Recession, 0 = Expansion.
The idea is to use geom_rect and alpha = (Recession == 1) to shade the areas where Recession == 1.
The code for the ggplot is below. Thanks for the help!
ERVALUEplot <- ggplot(data = Alldata)+
geom_line(aes(x = Date, y = ERVALUE), color = 'red')+
geom_rect(aes(x = Date, alpha = (Alldata$Recession ==1)), color = 'grey')
I think your case might be slightly simplified by using geom_tile() instead of geom_rect(). The output is the same but the parametrisation is easier.
I have presumed your data had a structure roughly like this:
library(ggplot2)
set.seed(2)
Alldata <- data.frame(
Date = Sys.Date() + 1:10,
ERVALUE = cumsum(rnorm(10)),
Recession = sample(c(0, 1), 10, replace = TRUE)
)
With this data, we can make grey rectangles wherever Recession == 1 as follows. Here, I've mapped it to a scale to generate a legend automatically.
ggplot(Alldata, aes(Date)) +
geom_tile(aes(alpha = Recession, y = 1),
fill = "grey", height = Inf) +
geom_line(aes(y = ERVALUE), colour = "red") +
scale_alpha_continuous(range = c(0, 1), breaks = c(0, 1))
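If you prefer to stay with geom_rect(), one option is to collapse the 0/1 indicator into start and end dates first. This is a sketch using base R's rle(); the recessions data frame and its start/end columns are names introduced here for illustration, and it assumes the rows are sorted by Date.

```r
library(ggplot2)

set.seed(2)
Alldata <- data.frame(
  Date = Sys.Date() + 1:10,
  ERVALUE = cumsum(rnorm(10)),
  Recession = sample(c(0, 1), 10, replace = TRUE)
)

# Run-length encode the indicator to find consecutive runs of 0s and 1s
runs <- rle(Alldata$Recession)
ends <- cumsum(runs$lengths)        # last row index of each run
starts <- ends - runs$lengths + 1   # first row index of each run
is_rec <- runs$values == 1          # keep only the recession runs

recessions <- data.frame(
  start = Alldata$Date[starts[is_rec]],
  end = Alldata$Date[ends[is_rec]]
)

ggplot(Alldata) +
  geom_rect(data = recessions,
            aes(xmin = start, xmax = end, ymin = -Inf, ymax = Inf),
            fill = "grey", alpha = 0.5) +
  geom_line(aes(x = Date, y = ERVALUE), colour = "red")
```

A recession that lasts a single observation has start equal to end and so produces a zero-width rectangle; widening end by one period is one way to make those visible.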
Created on 2021-08-25 by the reprex package (v1.0.0)
I am trying to compare the distributions of a continuous variable across groups using violin plots. Pretty easy. However, I would like to make comparisons across distributions easier by showing the distribution for one of the groups (the reference) in grey with a low alpha value in the background. Something like this but with a violin plot:
My current approach plots the data twice. For the first geom_violin, I duplicate the data for the reference group and plot it in grey. For the second geom_violin, I use the actual data d. In this example, the two violin plots in grey and blue should look the same for the group "blue". However, they are NOT the same even though they are based on exactly the same data for group "blue".
How can I resolve this problem? Or is there another better approach to do this?
library(tidyverse)
d <- tibble(
group = sample(c("green", "blue"), 1000, replace = TRUE, prob = c(0.7, 0.3)),
x = ifelse(group == "green", rnorm(1000, 1, 1), rnorm(1000, 0, 3))
)
dblue <- filter(d, group == "blue")
dblue <- bind_rows(dblue, mutate(dblue, group = "green"))
ggplot(d, aes(x = factor(group), y = x)) +
geom_violin(data = dblue, fill = alpha("#333333", 0.2), color = alpha("#333333", 0)) +
geom_violin(fill = alpha("#0072B2", 0.8), color = alpha("#0072B2", 0))
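Add scale = "width" to the second geom_violin. By default geom_violin uses scale = "area", so the violins within a layer are scaled relative to one another. In the second layer the blue violin is therefore drawn narrower next to the more peaked green one, while in the grey layer both violins come from the same data and get the same width. With scale = "width", every violin has the same maximum width.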
ggplot(d, aes(x = factor(group), y = x)) +
geom_violin(data = dblue, fill = alpha("#333333", 0.2), color = alpha("#333333", 0)) +
geom_violin(fill = alpha("#0072B2", 0.8), color = alpha("#0072B2", 0),
scale = "width")
I'm trying to create a line graph depicting different trajectories over time for two groups/conditions. I have two groups for which the data 'eat' was collected at five time points (1,2,3,4,5).
I'd like the lines to connect the mean point for each group at each of five time points, so I'd have two points at Time 1, two points at Time 2, and so on.
Here's a reproducible example:
#Example data
library(tidyverse)
library(ggplot2)
eat <- sample(1:7, size = 30, replace = TRUE)
df <- data.frame(id = rep(c(1, 2, 3, 4, 5, 6), each = 5),
Condition = rep(c(0, 1), each = 15),
time = c(1, 2, 3, 4, 5),
eat = eat
)
df$time <- as.factor(df$time)
df$Condition <- as.factor(df$Condition)
#Create the plot.
library(ggplot2)
ggplot(df, aes(x = time, y = eat, fill = Condition)) + geom_line() +
geom_point(size = 4, shape = 21) +
stat_summary(fun.y = mean, colour = "red", geom = "line")
The problem is, I need my lines to go horizontally (ie to show two different colored lines moving across the x-axis). But this code just connects the dots vertically:
If I don't convert Time to a factor, but only convert Condition to a factor, I get a mess of lines. The same thing happens in my actual data, as well.
I'd like it to look like this aesthetically, with the transparent error envelopes wrapping each line. However, I don't want it to be curvy, I want the lines to be straight, connecting the means at each point.
Here are the lines running in straight segments through the means at each time, with the ribbon spanning the mean plus or minus one standard error of the points at that time. One stat_summary makes the mean line with the colour aesthetic, the other makes the area using the inherited fill aesthetic. ggplot2::mean_se is a convenient function that takes a vector and returns a data frame with the mean and +/- some number of standard errors (one by default). This is the right format for the fun.data argument to stat_summary, which passes these values to the geom specified. Here, geom_ribbon accepts ymin and ymax values to plot a ribbon across the graph.
library(tidyverse)
set.seed(12345)
eat <- sample(1:7, size = 30, replace = TRUE)
df <- data.frame(
Condition = rep(c(0, 1), each = 15),
time = c(1, 2, 3, 4, 5),
eat = eat
)
df$Condition <- as.factor(df$Condition)
ggplot(df, aes(x = time, y = eat, fill = Condition)) +
geom_point(size = 4, shape = 21, colour = "black") +
stat_summary(geom = "ribbon", fun.data = mean_se, alpha = 0.2) +
stat_summary(
mapping = aes(colour = Condition),
geom = "line",
fun = mean, # named fun.y in ggplot2 versions before 3.3.0
show.legend = FALSE
)
Created on 2018-07-09 by the reprex package (v0.2.0).
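To see what the two stat_summary layers compute, the same means and standard errors can be worked out explicitly and passed to geom_line and geom_ribbon. This is a sketch, not a replacement for the answer above: the summ data frame and its mean/se columns are names introduced for illustration, and the standard error is taken as sd / sqrt(n), which matches what mean_se uses.

```r
library(dplyr)
library(ggplot2)

set.seed(12345)
df <- data.frame(
  Condition = factor(rep(c(0, 1), each = 15)),
  time = rep(1:5, times = 6),
  eat = sample(1:7, size = 30, replace = TRUE)
)

# Mean and standard error of eat for each Condition/time combination
summ <- df %>%
  group_by(Condition, time) %>%
  summarise(mean = mean(eat),
            se = sd(eat) / sqrt(n()),
            .groups = "drop")

ggplot(summ, aes(x = time, y = mean, fill = Condition)) +
  geom_ribbon(aes(ymin = mean - se, ymax = mean + se), alpha = 0.2) +
  geom_line(aes(colour = Condition))
```

Keeping time numeric is what lets the line and ribbon run horizontally across the x axis within each Condition.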
Here's my best guess at what you want:
# keep time as numeric
df$time = as.numeric(as.character(df$time))
ggplot(df, aes(x = time, y = eat, group = Condition)) +
geom_smooth(
aes(fill = Condition, linetype = Condition),
method = "lm",
level = 0.65,
color = "black",
size = 0.3
) +
geom_point(aes(color = Condition))
Setting level = 0.65 gives a band of roughly +/- 1 standard error around the linear model fit.
I think this code will get you most of the way there
library(tidyverse)
eat <- sample(1:7, size = 30, replace = TRUE)
tibble(id = rep(c(1, 2, 3, 4, 5, 6), each = 5),
Condition = factor(rep(c(0, 1), each = 15)),
time = factor(rep(c(1, 2, 3, 4, 5), 6)),
eat = eat) %>%
ggplot(aes(x = time, y = eat, fill = Condition, group = Condition)) +
geom_point(size = 4, shape = 21) +
geom_smooth()
geom_smooth is what you were looking for, I think. With no method argument it fits a loess smoother to data sets this small (pass method = "lm" if you want a linear model), and with group = Condition set it draws one smoothed line per condition.
I'm doing multiple plots split by one variable and in each plot, colour code based on another variable.
library(data.table)
library(ggplot2)
set.seed(12345)
dates = seq(as.Date("2000-01-01"), as.Date("2016-01-01"), by = 1)
dd = data.table(date = dates, value = rnorm(length(dates)))
dd[, year := lubridate::year(date)]
dd[, c := cut(value, c(-Inf, -3, 3, Inf))]
for (thisyear in 2000:2015) {
ggplot(dd[year == thisyear]) +
geom_ribbon(aes(x = date, ymin = -Inf, ymax = Inf, fill = c), alpha = 0.1)
}
dd[, length(unique(c)), by = year]
year V1
1: 2000 1
2: 2001 2
3: 2002 2
4: 2003 3
5: 2004 3
....
Now the colours in different plots will be inconsistent, since not every year has the same number of unique cut values. Even worse, if one year has only (-Inf,-3] values (unlikely here, of course) and another year has only (3, Inf] values, both will be coloured red in their respective plots.
How can I specify that (-Inf,-3] always takes blue and (-3,3] always takes green?
One way to manually specify the colors would be to create a column in your data frame holding the plot color for each row.
For example:
# scatter plot
dd$color <- ifelse(dd$value <= 3, 'blue', 'green')
ggplot(dd, aes(date, value)) + geom_point(colour=dd$color)
# ribbon plot
thisyear <- '2001'
dd_year <- dd[year == thisyear,]
ggplot(dd_year, aes(date, group=color, colour=color)) +
geom_ribbon(aes(ymin=value - 1, ymax=value + 1, fill=color), alpha=0.5) +
# the color column already holds color names, so use them as they are
scale_fill_identity() +
scale_color_identity()
This would result in all points <= 3 being colored blue, and the remaining ones green.
Not the most interesting example perhaps, since there is only one data point that gets colored green here, but it should look like this:
You can create a named vector of colors to pass to scale_fill_manual. This allows you to choose the colors of each group as well as ensuring that each plot has the same colors among groups.
colors = c("blue", "green", "red")
names(colors) = levels(dd$c)
(-Inf,-3] (-3,3] (3, Inf]
"blue" "green" "red"
Now the same plot, but with scale_fill_manual added.
for (thisyear in 2000:2015) {
print(ggplot(dd[year == thisyear]) +
geom_ribbon(aes(x = date, y = value, ymin = -Inf, ymax = Inf, fill = c), alpha = 0.1) +
scale_fill_manual(values = colors))
}
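A compact way to check the named-vector approach without data.table is sketched below. It assumes a plain data frame (called dd_df here to avoid confusion with the data.table above) and uses points instead of ribbons for simplicity. The drop = FALSE argument is an optional addition that keeps unused levels in the legend, so every plot shows the same three entries.

```r
library(ggplot2)

set.seed(12345)
dates <- seq(as.Date("2000-01-01"), as.Date("2001-12-31"), by = 1)
dd_df <- data.frame(date = dates, value = rnorm(length(dates)))
dd_df$c <- cut(dd_df$value, c(-Inf, -3, 3, Inf))

# Name each colour after the factor level it belongs to
colors <- setNames(c("blue", "green", "red"), levels(dd_df$c))

# Subsetting keeps all three factor levels, even if some are absent that year
one_year <- dd_df[format(dd_df$date, "%Y") == "2000", ]

ggplot(one_year, aes(date, value, colour = c)) +
  geom_point() +
  scale_colour_manual(values = colors, drop = FALSE)
```

Because colours are matched to levels by name, a year that contains only one of the three intervals still gets the colour assigned to that interval.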