Shade ggplot2 background according to factor level [duplicate] - r

This question already has answers here:
Make the background of a graph different colours in different regions
(3 answers)
Closed 4 years ago.
I have a dataset with two variables (Tb and Ta) over time (Date). Looks like this:
Date Tb Ta Light
1 2015-02-15 01:13:00 36.103 22.751 nightime
2 2015-02-15 01:55:00 36.103 22.626 nightime
3 2015-02-15 02:37:00 35.605 22.626 nightime
4 2015-02-15 03:19:00 35.605 22.751 nightime
5 2015-02-15 04:01:00 36.103 23.001 nightime
6 2015-02-15 04:43:00 35.605 22.876 nightime
I am trying to make a plot with different shading for levels in the factor 'Light'. So all points with 'nightime' in 'Light' would be shaded in grey while the 'daytime' would be white. Something like this:
Is there a way to get geom_rect() to work with a factor level? I need all points coded as 'nightime' shaded with a grey background ...
I tried the following based on Using ggplot2 in R, how do I make the background of a graph different colours in different regions?
ggplot() + geom_rect(data=tbdf, (aes(xmin=Date,
xmax=Date, ymin=min(Tb), ymax=max(Tb), fill=Light))) +
geom_line(data = tbdf, aes(x = Date, y = Tb)) +
geom_line(data = tbdf, aes(x = Date, y = Ta), colour='grey') +
xlab('Date') +
ylab('Temperature (°C)')
and it ends up with a legend for Light but still the usual grey shading:
Any suggestions?

Due to a lack of sample data I created some.
library(ggplot2)
library(dplyr)
daytime <- rep(rep(c("day", "night"), each = 12),10)
df <- data.frame(date = 1:length(daytime), daytime, value = rnorm(length(daytime)))
> head(df)
date daytime value
1 1 day -0.7016900
2 2 day -0.5886091
3 3 day -0.1962264
4 4 day 1.3621115
5 5 day -1.5810459
6 6 day -0.6598885
Then I determined the start and end of each day and each night. For the sample data this would not be necessary but I guess the real data are not that simple.
period <- case_when(daytime != lead(daytime) & daytime == "day" ~ "endDay",
daytime != lead(daytime) & daytime == "night" ~ "endNight",
daytime != lag(daytime) & daytime == "day" ~ "beginDay",
daytime != lag(daytime) & daytime == "night" ~ "beginNight")
period[1] <- "beginDay"
period[length(period)] <- "endNight"
Combining both:
rect <- cbind(df,period)
names(rect)[1] <- "date"
and creating 2 dataframes, one for night and one for daytime with the corresponding x values of each period.
rect_night <- na.omit(rect[rect$daytime == "night", ])[ ,-2:-3]
rect_night <- data.frame(start = rect_night[rect_night$period == "beginNight", 1],
end = rect_night[rect_night$period == "endNight", 1])
rect_day <- na.omit(rect[rect$daytime == "day", ])
rect_day <- data.frame(start = rect_day[rect_day$period == "beginDay", 1],
end = rect_day[rect_day$period == "endDay", 1])
Putting it all together in a plot:
# alpha has to be set on the layers; ggplot() itself ignores it
ggplot() +
geom_rect(data = rect_night, aes(xmin = start, xmax = end, ymin = -5, ymax = 5), fill = "grey", alpha = 0.3) +
geom_rect(data = rect_day, aes(xmin = start, xmax = end, ymin = -5, ymax = 5), fill = "yellow", alpha = 0.3) +
geom_line(data = df, aes(x = date, y = value))
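As an alternative to the begin/end bookkeeping above, the start and end of each day or night block can be derived with run-length encoding. This is only a minimal sketch in base R on the same toy data; the names `runs`, `run_start`, `run_end` and `blocks` are made up for illustration.

```r
# Toy data mirroring the sample above: 12 "day" rows, then 12 "night" rows, repeated.
daytime <- rep(rep(c("day", "night"), each = 12), 10)
date <- seq_along(daytime)

# Run-length encoding: one entry per uninterrupted block of identical labels.
runs <- rle(daytime)
run_end <- cumsum(runs$lengths)            # last row index of each block
run_start <- run_end - runs$lengths + 1    # first row index of each block

# One row per block, with the x range it covers and its label.
blocks <- data.frame(
  start = date[run_start],
  end   = date[run_end],
  light = runs$values
)
head(blocks, 3)
```

A single geom_rect(data = blocks, aes(xmin = start, xmax = end, ymin = -Inf, ymax = Inf, fill = light)) layer could then replace the two separate night and day layers.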

It seems to have been a problem with the Date format. Changing it from the 2015-02-15 01:13:00 format to a number (e.g. 42050.05) worked great. Here's the code I used:
ggplot() + geom_rect(data=tbdf, (aes(xmin=Date-0.5, xmax=Date+0.5,
ymin=min(Ta)-0.5, ymax=max(Tb)+0.5, fill=factor(Light)))) +
scale_fill_manual(values=c("grey", "white"), guide=FALSE) +
geom_line(data = tbdf, aes(x = Date, y = Tb), size=0.8) +
geom_line(data = tbdf, aes(x = Date, y = Ta), colour='grey50', size=0.8) +
ylab('Temperature (°C)')
Which gave me this
Which cleans up nicely.
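A possible variation that keeps the timestamps as date-times instead of converting them to numbers: give every reading an explicit xmin and xmax, so that no rectangle has zero width. This is a rough sketch on made-up data (the `Light` values and the 42-minute spacing are assumptions based on the sample rows); whether it would have fixed the original plot is not confirmed by the thread.

```r
# Hypothetical miniature of the question's data (timestamps 42 minutes apart).
tbdf <- data.frame(
  Date  = as.POSIXct("2015-02-15 01:13:00", tz = "UTC") + (0:5) * 42 * 60,
  Light = c("nightime", "nightime", "nightime", "daytime", "daytime", "nightime")
)

# Each reading "owns" the interval up to the next reading; the last one
# reuses the previous gap so that its rectangle has a non-zero width too.
gap <- as.numeric(diff(tbdf$Date), units = "secs")
tbdf$xmin <- tbdf$Date
tbdf$xmax <- tbdf$Date + c(gap, gap[length(gap)])

# Width of every rectangle in seconds.
as.numeric(tbdf$xmax - tbdf$xmin, units = "secs")
```

The new columns would then go into geom_rect(aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf, fill = Light)) in place of xmin = Date, xmax = Date.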

Related

Change ggplot2 point color based on date occurring less than 4 weeks after previous date

I have an example dataframe composed of:
[image: example dataframe]
I have used ggplot2 to plot dates on the x-axis with a count on the y-axis:
df_ggplot <- read.csv("ggplot_ex.csv", header = T, na.strings = "", fileEncoding = "UTF-8-BOM")
df_ggplot$Date <- mdy(df_ggplot$Date)
df_ggplot$Ccount <- as.numeric(as.character(df_ggplot$Ccount))
ggplot(df_ggplot, aes(x=Date, y = Ccount)) +
geom_line() +
geom_point()
[image: ggplot example output]
I want points that occur less than 4 weeks after the previous point to be red. Can anyone help? In this example, the second point would be red, as it occurs about 2 weeks after the previous point.
You probably have to do the calculation in the dataframe before the plot (make sure your Date column is in the correct date format).
One option you can try:
df_ggplot <- df_ggplot %>%
mutate(time_diff = difftime(time1 = Date, time2 = lag(x = Date, n = 1), units = "weeks"),
is_red = as.factor(time_diff < 4))
will give you the points that must be flagged.
Date Ccount time_diff is_red
1 2019-08-17 20000 NA weeks <NA>
2 2019-08-30 15000 1.857143 weeks TRUE
3 2019-09-30 25000 4.285714 weeks FALSE
Then you can plot, using the colors you want.
ggplot(df_ggplot, aes(x = Date, y = Ccount)) +
geom_line() +
geom_point(aes(color = is_red)) +
scale_color_manual(values = c("black", "red"), na.value = "black")
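The same flag can also be computed without dplyr. The following is a small base R sketch using the three dates shown in the output above; `gap_days` and `point_colour` are illustrative names, and 28 days is used as the equivalent of 4 weeks.

```r
dates <- as.Date(c("2019-08-17", "2019-08-30", "2019-09-30"))

# Days since the previous observation; the first point has no predecessor.
gap_days <- c(NA, as.numeric(diff(dates)))

# TRUE when a point follows the previous one by less than 4 weeks.
is_red <- gap_days < 28

# Treat the first point (NA) like a non-flagged point.
point_colour <- ifelse(is.na(is_red) | !is_red, "black", "red")
point_colour
```

A vector like `point_colour` could be stored as a column and mapped with scale_color_identity(), which avoids having to handle the NA level separately.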

Connect geom_line only between specified factors

I have a dataset that has diameter values for 4 treatment groups for several different months. I am plotting Diameter ~ Treatment for each month, as well as the Diameter changes between months ~ Treatment.
Dataset looks like this:
# the data that contains diameter for each month and diameter differences between months
> head(gatheredDiameterAndTreatmentData)
Treatment Month Diameter
1 Aux_Drop Diameter_mm.Sep01 55.88
2 Aux_Spray Diameter_mm.Sep01 63.50
3 DMSO Diameter_mm.Sep01 66.04
4 Water Diameter_mm.Sep01 43.18
5 Aux_Drop Diameter_mm.Sep01 38.10
6 Aux_Spray Diameter_mm.Sep01 76.20
# data that contains mean diameter and mean diameter changes for each month
> head(subMeansDiameter)
Treatment Month Diameter SEdiam
1 Aux_Drop Diameter_mm.Dec 83.63857 29.62901
2 Aux_Drop Diameter_mm.Feb01 101.20923 24.84024
3 Aux_Drop Diameter_mm.Feb02 110.00154 22.51364
4 Aux_Drop Diameter_mm.Jan 93.00308 25.13485
5 Aux_Drop Diameter_mm.Mar 116.84000 22.19171
6 Aux_Drop Diameter_mm.Nov01 74.50667 17.40454
Here is my code:
# assign the factors name to pick
factorsOnXaxis.DiameterByMonth = c(
"Diameter_mm.Sep01", "DiameterDiff.Sep01ToDec", "Diameter_mm.Dec", "DiameterDiff.DecToMar", "Diameter_mm.Mar")
# assign name to above factors
factorsOnXaxisName = c('Sep','Dec-Sep','Dec', 'Mar-Dec', 'Mar')
# start plotting
gatheredDiameterAndTreatmentData %>%
subset(Diameter != "NA") %>%
ggplot(aes(x = factor(Month), y = Diameter)) +
geom_point(aes(colour = Treatment), na.rm = TRUE,
position = position_dodge(width = 0.2)) +
geom_point(data = subMeansDiameter, size = 4, aes(colour = Treatment),
na.rm = TRUE, position = position_dodge(width = 0.2)) +
theme_bw() + # remove background
# add custom color to the "Treatment" levels
scale_colour_manual(
values = c("Aux_Drop" = "Purple", "Aux_Spray" = "Red",
"DMSO" = "Orange", "Water" = "Green")) +
# rearrange the x-axis
scale_x_discrete(limits = factorsOnXaxis.DiameterByMonth, labels = factorsOnXaxisName) +
# to connect the "subMeans - Diameter" values across time points
geom_line(data = subMeansDiameter, aes(
x = Month, y = Diameter, group = Treatment, colour = Treatment),
position = position_dodge(width = 0.2))
Which gives me a plot like this:
Instead of geom_line connecting every time point, I want the line to join only specified x-axis factors, i.e.
between Sep, Dec, March
between Dec-Sep to Mar-Dec
I tried to manipulate the code line that uses geom_line as:
geom_line(data = subMeansDiameter, aes(
x = c("DiameterDiff.Sep01ToDec", "DiameterDiff.DecToMar"), y = Diameter, group = Treatment, colour = Treatment),
position = position_dodge(width = 0.2))
to connect the line between Dec-Sep to Mar-Dec.
But, this is not working. How can I change my code?
Here are the data files, which I stored as *.tsv.
gatheredDiameterAndTreatmentData = http://s000.tinyupload.com/index.php?file_id=38251290073324236098
subMeans = http://s000.tinyupload.com/index.php?file_id=93947954496987393129
Here you need to define groups explicitly as color is not enough.
Your example is not reproducible but here's something that will give you the idea, here's a plot with no explicit group:
ggplot(iris,aes(Sepal.Width, Sepal.Length, color = Species)) + geom_line()
And now here's one with a group aesthetic. I have split the data using Sepal.Length's values, but you'll most likely use an ifelse depending on the month:
ggplot(iris,aes(Sepal.Width, Sepal.Length, color = Species,
group = interaction(Species, Sepal.Length > 5.5))) +
geom_line()
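For the month data in the question, the grouping variable might be built along these lines. This is a sketch only: `sub_means` is a hypothetical stand-in for subMeansDiameter (the real file is not available here), and `series` and `line_group` are invented column names.

```r
# Hypothetical stand-in for subMeansDiameter: two treatments, five x-axis levels.
months <- c("Diameter_mm.Sep01", "DiameterDiff.Sep01ToDec", "Diameter_mm.Dec",
            "DiameterDiff.DecToMar", "Diameter_mm.Mar")
sub_means <- expand.grid(Month = months, Treatment = c("DMSO", "Water"),
                         stringsAsFactors = FALSE)

# Each row belongs to one of two series: raw diameters or between-month differences.
sub_means$series <- ifelse(grepl("^DiameterDiff", sub_means$Month), "diff", "diameter")

# One line per Treatment x series combination.
sub_means$line_group <- interaction(sub_means$Treatment, sub_means$series)
table(sub_means$line_group)
```

Passing group = line_group to geom_line() should then connect Sep, Dec and Mar within a treatment, and separately Dec-Sep to Mar-Dec, instead of joining every x-axis level.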

Scale the x-axes with quarterly date format

I created a plot in R using the ggplot library:
library(ggplot2)
ggplot(df, aes(x = yQ, y = value, group =1)) +
geom_line(aes(color = variable), size = 1) +
scale_color_manual(values = c("#00AFBB", "#E7B800"))
I got the plot that I want but the only problem is that variable, yQ values have the format:
1990Q1
1990Q2
1990Q3
1990Q4
......
......
2017Q1
2017Q2
2017Q3
2017Q4
and because there are many years, the x-axis label cannot show all the dates clearly (they overlapped).
Therefore, I want the x-axis label to show only Q1 and Q3 for every 5 years.
So I want the x-axis to be something like this:
1990Q1 1990Q3 1995Q1 1995Q3 ...... 2015Q1 2015Q3
I tried to use scale_x_date but my dates are not in date format (e.g. 1990Q1) and therefore this does not work. How can I fix it?
The question does not provide reproducible input but using df from the Note below with the autoplot.zoo method of ggplot's autoplot generic we can write:
library(ggplot2)
library(zoo)
z <- read.zoo(df, index = "yQ", FUN = as.yearqtr)
autoplot(z) + scale_x_yearqtr()
Note
Test input--
df <- data.frame(yQ = c("1990Q1", "1990Q2", "1990Q3", "1990Q4"), value = 1:4)
The zoo::format.yearqtr() function is quite easy to use with ggplot2.
Try
scale_x_date(labels = function(x) zoo::format.yearqtr(x, "%YQ%q"))
Use function zoo::as.yearqtr (zoo package) to work with quarterly dates.
Generate example data:
year <- 1990:2000
quar <- paste0("Q", 1:4)
foo <- as.vector(outer(year, quar, paste0))
data <- data.frame(dateQ = foo, Y = rnorm(length(foo)))
head(data)
dateQ Y
1 1990Q1 -0.09944705
2 1991Q1 0.14493910
3 1992Q1 0.54856787
4 1993Q1 1.12966224
5 1994Q1 -0.93539302
6 1995Q1 0.24772265
Transform quarterly date to "normal" date:
data$dateNorm <- as.Date(zoo::as.yearqtr(data$dateQ))
head(data)
dateQ Y dateNorm
1 1990Q1 -0.09944705 1990-01-01
2 1991Q1 0.14493910 1991-01-01
3 1992Q1 0.54856787 1992-01-01
4 1993Q1 1.12966224 1993-01-01
5 1994Q1 -0.93539302 1994-01-01
6 1995Q1 0.24772265 1995-01-01
It sets Q1/2/3/4 as the first day of January/April/July/October.
data[grep("1991", data$dateQ), ]
dateQ Y dateNorm
2 1991Q1 0.1449391 1991-01-01
13 1991Q2 1.5878678 1991-04-01
24 1991Q3 -0.1071823 1991-07-01
35 1991Q4 2.2905729 1991-10-01
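If installing zoo is not an option, the same first-day-of-quarter mapping can be reproduced in base R. This is a sketch that assumes every label has exactly the form YYYYQn; the variable names are illustrative.

```r
yq <- c("1991Q1", "1991Q2", "1991Q3", "1991Q4")

year    <- as.integer(substr(yq, 1, 4))   # "1991Q3" -> 1991
quarter <- as.integer(substr(yq, 6, 6))   # "1991Q3" -> 3

# Q1, Q2, Q3, Q4 start in months 1, 4, 7, 10.
first_month <- (quarter - 1) * 3 + 1
as_date <- as.Date(sprintf("%d-%02d-01", year, first_month))
as_date
```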
Now you can plot it or perform other calculations as it's in Date format.
library(ggplot2)
ggplot(data, aes(dateNorm, Y)) +
geom_line()
You can
manipulate x-axis breaks and labels with scale_x_discrete(breaks = ..., labels = ...)
change the angle of text with theme(axis.text.x = element_text(angle = ...))
I generated some data
Combs <- expand.grid(1990:2017, c("Q1", "Q2", "Q3", "Q4"))
df <- data.frame(
yQ = sort(apply(Combs, 1, paste, collapse="")),
value = runif(112)
)
In the first example, I subset yQ values you want with a logical vector - and change the angle of text
library(ggplot2)
pattern <- c(T, F, T, F, rep(F, 16))
ggplot(df, aes(x = yQ, y = value, group =1)) +
geom_line(aes(color = "red"), size = 1) +
scale_x_discrete(breaks = df$yQ[pattern], labels = df$yQ[pattern]) +
theme(axis.text.x = element_text(angle=90))
But notice that tick marks not specified by breaks are not shown, so the alternative is to copy the yQ values into a vector and set the non-relevant ones to "":
xVec <- as.character(df$yQ)
xVec[pattern==F] <- ""
ggplot(df, aes(x = yQ, y = value, group =1)) +
geom_line(aes(color = "red"), size = 1) +
scale_x_discrete(breaks = df$yQ, labels = xVec) +
theme(axis.text.x = element_text(angle=90))
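Rather than hard-coding a repeating logical pattern, the labels to keep can be derived from the year and quarter themselves. A minimal sketch, assuming labels of the form YYYYQn and that "every 5 years" is counted from the first year in the data; `keep` and `wanted_breaks` are illustrative names.

```r
years <- 1990:2017
yQ <- paste0(rep(years, each = 4), "Q", rep(1:4, times = length(years)))

year    <- as.integer(substr(yQ, 1, 4))
quarter <- as.integer(substr(yQ, 6, 6))

# Keep Q1 and Q3 of every fifth year, counted from the first year.
keep <- (year - min(year)) %% 5 == 0 & quarter %in% c(1, 3)
wanted_breaks <- yQ[keep]
wanted_breaks
```

`wanted_breaks` can be passed to scale_x_discrete(breaks = ...), and ifelse(keep, yQ, "") gives the blank-padded label vector for the second variant.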

R ggplot set colour for specific value [duplicate]

This question already has an answer here:
Manually setting group colors for ggplot2
(1 answer)
Closed 6 years ago.
I'm doing multiple plots split by one variable and in each plot, colour code based on another variable.
set.seed(12345)
dates = seq(as.Date("2000-01-01"), as.Date("2016-01-01"), by = 1)
dd = data.table(date = dates, value = rnorm(length(dates)))
dd[, year := lubridate::year(date)]
dd[, c := cut(value, c(-Inf, -3, 3, Inf))]
for (thisyear in 2000:2015) {
ggplot(dd[year == thisyear]) +
geom_ribbon(aes(x = date, ymin = -Inf, ymax = Inf, fill = c), alpha = 0.1)
}
dd[, length(unique(c)), by = year]
year V1
1: 2000 1
2: 2001 2
3: 2002 2
4: 2003 3
5: 2004 3
....
Now the colours in different plots will be inconsistent, since not every year has the same number of unique cut values. Even worse, when one year has only (-Inf,-3] values (unlikely here, of course) and another year has only (3, Inf] values, they will both be coloured red in the two plots.
How can I specify that (-Inf,-3] always takes blue and (-3,3] always takes green?
One way to manually specify the colors to use, would be to simply create a column in your data frame specifying the plot color to use.
For example:
# scatter plot
dd$color <- ifelse(dd$value <= 3, 'blue', 'green')
ggplot(dd, aes(date, value)) + geom_point(colour=dd$color)
# ribbon plot
thisyear <- '2001'
dd_year <- dd[year == thisyear,]
ggplot(dd_year, aes(date, group=color, colour=color)) +
geom_ribbon(aes(ymin=value - 1, ymax=value + 1, fill=color), alpha=0.5) +
scale_fill_manual(values = c(blue = "blue", green = "green")) +  # named, so the match does not depend on row order
scale_color_manual(values = c(blue = "blue", green = "green"))
This would result in all points <= 3 being colored blue, and the remaining ones green.
Not the most interesting example perhaps, since there is only one data point that gets colored green here, but it should look like this:
You can create a named vector of colors to pass to scale_fill_manual. This allows you to choose the colors of each group as well as ensuring that each plot has the same colors among groups.
colors = c("blue", "green", "red")
names(colors) = levels(dd$c)
(-Inf,-3] (-3,3] (3, Inf]
"blue" "green" "red"
Now the same plot, but with scale_fill_manual added.
for (thisyear in 2000:2015) {
print(ggplot(dd[year == thisyear]) +
geom_ribbon(aes(x = date, y = value, ymin = -Inf, ymax = Inf, fill = c), alpha = 0.1) +
scale_fill_manual(values = colors))
}
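Why the named vector helps can be seen without plotting: colours are looked up by level name, so a level keeps its colour even when other levels are absent from a subset. A small sketch with made-up values; `f` and `subset_f` are illustrative names.

```r
value <- c(-4, 0, 2, 5)
f <- cut(value, c(-Inf, -3, 3, Inf))   # three levels, as in the question

colors <- c("blue", "green", "red")
names(colors) <- levels(f)

# A subset in which the lowest level never occurs.
subset_f <- f[value > 0]

# Lookup is by name, so the remaining levels keep their colours.
unname(colors[as.character(subset_f)])
```

With an unnamed vector, a subset whose unused levels had been dropped would instead receive the first colours in the vector regardless of which levels remain.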

Bar plot with variable bar widths as date ranges on the x-axis

I wish to make a bar graph where the response variable (weight change) is measured over time periods of different length, defined by a start and an end date. The width of the bars should correspond to the length of the period. A small example of my data:
wtchange.data <- structure(list(start.date = structure(1:3, .Label = c("2015-04-01",
"2015-04-15", "2015-04-30"), class = "factor"), end.date = structure(1:3, .Label = c("2015-04-15",
"2015-04-30", "2015-05-30"), class = "factor"), wtchange = c(5L,
10L, 15L), se = c(1.2, 2.5, 0.8)), .Names = c("start.date", "end.date",
"wtchange", "se"), class = "data.frame", row.names = c(NA, -3L
))
wtchange.data
# start.date end.date wtchange se
# 1 2015-04-01 2015-04-15 5 1.2
# 2 2015-04-15 2015-04-30 10 2.5
# 3 2015-04-30 2015-05-30 15 0.8
wtchange.data$start.date <- as.Date(wtchange.data$start.date)
wtchange.data$end.date <- as.Date(wtchange.data$end.date)
Attempting to use geom_bar:
library(ggplot2)
ggplot(wtchange.data, aes(x = start.date, y = wtchange)) +
geom_bar(stat = "identity", color = "black") +
geom_errorbar(aes(ymin = wtchange-se, ymax = wtchange+se), width = 1)
(not allowed >2 links with <10 reputation, so can unfortunately not show the first plot)
The main problem is that when aesthetics of the plot area are defined (x = start.date, y = wtchange), I can use only one variable (start.date in this example) for the x-axis, but I really need to somehow use both start.date and end.date to delimit bar widths corresponding to each period. The graph should look something like this (drawn in Paint):
A secondary problem is that the bars should touch without gaps, but I am not sure if it is even possible, given that the bars have to be of different widths, so you cannot set one bar width for all bars. Would it be possible to set width for each bar manually?
Edit:
Thank you Henrik for the links. I have made some further progress.
I calculated date midpoints for centering the bars at:
wtchange.data$date.midpoint <- wtchange.data$start.date +
(wtchange.data$end.date - wtchange.data$start.date)/2
And then calculated period lengths for using as bar widths:
wtchange.data$period.length <- wtchange.data$end.date - wtchange.data$start.date
The updated graph code is now:
ggplot(wtchange.data, aes(x = date.midpoint, y = wtchange)) +
geom_bar(stat = "identity", color = "black", width = wtchange.data$period.length) +
geom_errorbar(aes(ymin = wtchange-se, ymax = wtchange+se), width = 1)
The only problem remaining is that there still is a small gap between bars in one place. I guess this is due to the way R rounds date difference calculation to the nearest number of days?
You are right: it's the calculation of difference between end and start dates which is the reason for the gap. We need to use numeric periods instead of difftime (see explanation below) when calculating the width and the midpoint.
# the question's data under a shorter name
df <- wtchange.data
# length of periods, width of bars as numeric
df$width <- as.numeric(df$end.date - df$start.date)
# mid-points
df$mid <- df$start.date + df$width / 2
# dates for breaks
dates <- unique(c(df$start.date, df$end.date))
ggplot(df, aes(x = mid, y = wtchange)) +
geom_bar(stat = "identity", color = "black", width = df$width) +
geom_errorbar(aes(ymin = wtchange - se, ymax = wtchange + se), width = 1) +
scale_x_date(breaks = dates)
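One way to check that the bars really touch is to reconstruct each bar's left and right edge from the midpoint and width and compare neighbours. This is a sketch on the three periods from the question; `left`, `right` and `gaps` are illustrative names.

```r
df <- data.frame(
  start.date = as.Date(c("2015-04-01", "2015-04-15", "2015-04-30")),
  end.date   = as.Date(c("2015-04-15", "2015-04-30", "2015-05-30"))
)

# Numeric width and (possibly fractional) midpoint, as in the answer.
df$width <- as.numeric(df$end.date - df$start.date)
df$mid   <- df$start.date + df$width / 2

# Bar edges on the numeric date scale.
left  <- as.numeric(df$mid) - df$width / 2
right <- as.numeric(df$mid) + df$width / 2

# Distance between the right edge of one bar and the left edge of the next.
gaps <- left[-1] - right[-nrow(df)]
gaps
```

A zero in every position of `gaps` means adjacent bars share an edge.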
Corresponding geom_rect code:
# mid-points
df$mid <- df$start.date + as.numeric(df$end.date - df$start.date) / 2
# dates for breaks
dates <- unique(c(df$start.date, df$end.date))
ggplot(df, aes(x = mid, y = wtchange)) +
geom_rect(aes(xmin = start.date, xmax = end.date, ymin = 0, ymax = wtchange), color = "black") +
geom_errorbar(aes(ymin = wtchange - se, ymax = wtchange + se), width = 1) +
scale_x_date(breaks = dates)
And slightly less ink demanding with geom_step:
# need to add an end date to the last period
df2 <- tail(df, 1)
df2$start.date <- df2$end.date
df2 <- rbind(df, df2)
# mid-points
df$mid <- df$start.date + as.numeric(df$end.date - df$start.date) / 2
ggplot() +
geom_step(data = df2, aes(x = start.date, y = wtchange)) +
geom_errorbar(data = df, aes(x = mid, ymin = wtchange - se, ymax = wtchange + se), width = 1) +
scale_x_date(breaks = dates) +
ylim(0, 16) +
theme_bw()
On the "difftime issue":
Values of class Date can be represented internally as fractional days (see ?Date and ?Ops.Date; try: Sys.Date(); Sys.Date() + 0.5; Sys.Date() + 0.5 + 0.5). However, when adding a difftime object to a Date, the difftime object is rounded to the nearest whole day (see x argument in ?Ops.Date).
Let's check the calculations using your start date 2015-04-15 and end date 2015-04-30:
mid <- (as.Date("2015-04-30") - as.Date("2015-04-15")) / 2
mid
# Time difference of 7.5 days
str(mid)
# Class 'difftime' atomic [1:1] 7.5
# ..- attr(*, "units")= chr "days"
# calculate the midpoint using the difftime object
as.Date("2015-04-15") + mid
# [1] "2015-04-23"
# calculating midpoint using numeric object yields another date...
as.Date("2015-04-15") + as.numeric(mid)
# [1] "2015-04-22"
# But is "2015-04-22" above in fact fractional, i.e. "2015-04-22 point 5"?
# Let's try and add 0.5
as.Date("2015-04-15") + as.numeric(mid) + 0.5
# [1] "2015-04-23"
# Yes.
Thus, we use the numeric period, instead of the difftime period.
