I am having a problem aligning my graphics using facet_wrap(). I have multiple years of data but I am subsetting to display only two years. For some unknown reason the upper plot is shifted to the left and the lower plot is shifted to the right (see attached snapshot). My dataset is too large to post here, so it can be downloaded from the link below if someone is willing to help me align the plots. https://login.filesanywhere.com/fs/v.aspx?v=8c6b678a5c61707ab0ae
Here is the snapshot:
This is what I have tried:
library(writexl)
library(readxl)
library(tidyverse)
library(lubridate)
mydat <- read_excel("all.xlsx", sheet="Sheet1")
#subset 2 years
start <- 1998
end <- 1999
a <- dplyr::filter(mydat, year %in% start:end)
ggplot(a,aes(date,Salinity,color=Box)) +
geom_line(size=.8) +
theme(legend.position = 'none') +
scale_x_date(date_breaks = "1 month",date_labels = "%b",expand=c(0,0.5)) +
labs(x="",y= "test") +
facet_wrap(~year,ncol=1)
I will be subsetting up to 10 years in the future. I am also wondering what the best way is to subset multiple years, using dplyr or base R. Thanks beforehand.
I get the following error after trying your suggestion:
# subset 2 years
start <- 1998
end <- 2000
a <- mydat %>%
dplyr::filter(year %in% start:end) %>% # if you need to subset the years
mutate(date = as.Date(gsub("\\d{4}", "0000", date)))
ggplot(a,aes(date, Salinity,color=Box)) +
geom_line(size=.8) +
scale_x_date(date_breaks = "1 month",date_labels = "%b",expand=c(0,0.5)) +
labs(x="",y= "test") +
facet_wrap(~year,ncol=1,scales="free_x")
Error in charToDate(x) :
character string is not in a standard unambiguous format
Maybe 'date' doesn't like the zeros?
Question: Does it work for you using the dataset from the link?
The plot from the OP appears shifted because the date column in the data contains separate years. ggplot displays those dates chronologically along the x-axis, and because facet_wrap() shares one x-axis across panels by default, both panels span 1998 to 1999 and each year's line fills only half of its panel.
A solution is to change the year in all the dates in the date column to be the same (e.g., "1111"), while the actual year information is retained in the year column. The years in the year column are used to stratify the plot.
# example data
mydat <- data.frame(
date = as.Date(
c("1998-10-1", "1998-11-1", "1999-10-1", "1999-11-1", "1999-12-1")),
value = 1:5,
year = c(1998, 1998, 1999, 1999, 1999))
mydat
# date value year
#1 1998-10-01 1 1998
#2 1998-11-01 2 1998
#3 1999-10-01 3 1999
#4 1999-11-01 4 1999
#5 1999-12-01 5 1999
# solution by changing the year in all the dates the same
library(ggplot2)
mydat$date <- as.Date(gsub("\\d{4}", "1111", mydat$date))
ggplot(mydat, aes(date, value)) +
geom_line(size=.8) +
scale_x_date(date_breaks = "1 month",date_labels = "%b",expand=c(0,0.5)) +
labs(x="",y= "test") +
facet_wrap(~year, ncol = 1)
In gsub("\\d{4}", "1111", mydat$date), a regular expression is used for the pattern. It finds four consecutive digits and substitutes them with "1111".
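As a quick check of what the substitution does, the pattern can be tried on a single string. This is only a minimal sketch; the `^` anchor in the second call is an addition (not part of the answer) that restricts the match to the start of the string.

```r
# Replace the four-digit year with a placeholder year
x <- "1998-10-01"
replaced <- gsub("\\d{4}", "1111", x)
replaced

# Anchored variant (assumption: the year always comes first, as in ISO dates)
anchored <- sub("^\\d{4}", "1111", x)
anchored
```

Both calls return the same string here, because an ISO date contains only one run of four digits.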
EDIT
If you don't want to make a hard change to the actual date data, you can use pipe operators (%>%) and other functions from the dplyr package:
library(ggplot2)
library(dplyr) # or library(tidyverse) works, too
# subset 2 years
start <- 1998
end <- 1999
mydat %>%
filter(year %in% start:end) %>% # if you need to subset the years
mutate(date = as.Date(gsub("\\d{4}", "1111", date))) %>%
ggplot(aes(date, value)) +
geom_line(size = 0.8) +
scale_x_date(date_breaks = "1 month", date_labels = "%b", expand = c(0,0.5)) +
labs(x = "", y = "test") +
facet_wrap(~year, ncol = 1)
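On the side question of subsetting several years with dplyr or base R: the filter() call above has direct base R equivalents. The sketch below uses a made-up `toy` data frame (hypothetical values, not the OP's data) to show three forms that should select the same rows.

```r
# Toy data standing in for the real dataset (values are made up)
toy <- data.frame(year = c(1997, 1998, 1999, 2000),
                  Salinity = c(30, 31, 32, 33))
start <- 1998
end <- 1999

# Logical indexing with %in%, the base counterpart of filter(year %in% start:end)
by_index <- toy[toy$year %in% start:end, ]

# subset() reads closest to the dplyr version
by_subset <- subset(toy, year %in% start:end)

# A range comparison also works if year is ever non-integer
by_range <- toy[toy$year >= start & toy$year <= end, ]
```

Extending to ten years only means changing `start` and `end`; none of the three forms depends on the number of years.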
UPDATE NOTE
I noticed that the year 0000 (or just 0) and any year of 9999 or later do not work with the OP's dataset. The year may have to be an integer between 1 and 9998. As long as the arbitrary year is the same throughout the date column, the plot will work as intended.
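One more thing that may matter when picking the arbitrary year: if the data contain 29 February, a placeholder year that is not a leap year cannot represent that day. A minimal sketch, assuming an explicit format string so that an invalid date yields NA:

```r
fmt <- "%Y-%m-%d"

# 1111 is not a leap year, so 29 February cannot be represented
non_leap <- as.Date("1111-02-29", format = fmt)

# 2000 is a leap year, so the same day parses normally
leap <- as.Date("2000-02-29", format = fmt)

is.na(non_leap)
is.na(leap)
```

Choosing a leap year such as 2000 as the common year would avoid losing those rows.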
Related
Similar to this question: Split up time series per year for plotting, which was done in Python, I want to display the daily time series as multiple lines by year. How can I achieve this in R?
library(ggplot2)
library(dplyr)
# Dummy data
df <- data.frame(
day = as.Date("2017-06-14") - 0:364,
value = runif(365) + seq(-140, 224)^2 / 10000
)
# Basic line plot
p <- ggplot(df, aes(x=day, y=value)) +
geom_line() +
xlab("")
p
Out:
One solution is using ggplot2, but date_labels are displayed incorrectly:
library(tidyverse)
library(lubridate)
p <- df %>%
mutate(date = as.Date(day)) %>% # the dummy data stores its dates in the day column
mutate(
year = factor(year(date)), # use year to define separate curves
date = update(date, year = 1) # use a constant year for the x-axis
) %>%
ggplot(aes(date, value, color = year)) +
scale_x_date(date_breaks = "1 month", date_labels = "%b")
# Raw daily data
p + geom_line()
Out:
An alternative solution is to use gg_season from the feasts package:
library(feasts)
library(tsibble)
library(dplyr)
tsibbledata::aus_retail %>%
filter(
State == "Victoria",
Industry == "Cafes, restaurants and catering services"
) %>%
gg_season(Turnover)
Out:
References:
Split up time series per year for plotting
R - How to create a seasonal plot - Different lines for years
If you want your x axis to represent the months from January to December, then perhaps getting the yday of the date and adding it to the first of January of an arbitrary year would be simplest:
library(tidyverse)
library(lubridate)
df <- data.frame(
day = as.Date("2017-06-14") - 0:364,
value = runif(365) + seq(-140, 224)^2 / 10000
)
df %>%
mutate(year = factor(year(day)),
date = as.Date('2017-01-01') + yday(day) - 1) %>% # yday() starts at 1, so subtract 1
ggplot(aes(date, value, color = year)) +
geom_line() +
scale_x_date(breaks = seq(as.Date('2017-01-01'), by = 'month', length = 12),
date_labels = '%b')
Created on 2023-02-07 with reprex v2.0.2
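The day-of-year arithmetic can be checked without plotting. The sketch below uses base R's `%j` format as a stand-in for lubridate::yday() (an assumption made to keep it dependency-free). Since the day of the year counts from 1, one day is subtracted so that 1 January stays on 1 January.

```r
jan1 <- as.Date("2017-01-01")
d <- as.Date(c("2017-01-01", "2017-06-14", "2017-12-31"))

# Day of the year, counted from 1 (base R equivalent of lubridate::yday())
doy <- as.integer(format(d, "%j"))

# Shift onto the common year; without the "- 1" every date lands one day late
shifted <- jan1 + doy - 1
format(shifted, "%b %d")
```

Note that 31 December of a leap year has day-of-year 366, which spills into the following January when the common year is not itself a leap year.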
I tend to think simple is better:
transform(df, year = format(day, "%Y")) |>
ggplot(aes(x=day, y=value, group=year, color=year)) +
geom_line() +
xlab(NULL)
optionally removing the year legend with + guides(colour = "none").
I am trying to plot average daily trip counts by month. However, I am struggling in finding how I can only include the mean number of trips per day by month in the plot instead of the total monthly trips.
The days of the week and months have already been converted from numeric type to abbreviations and have also been ordered (type: ).
Here's what I've tried for the plot.
by_day <- df_temp %>%
group_by(Start.Day)
ggplot(by_day, aes(x=Start.Month,
fill=Start.Month)) +
geom_bar() +
scale_fill_brewer(palette = "Paired") +
labs(title="Number of Daily Trips by Month",
x=" ",
y="Number of Daily Trips")
Here's the plot I am trying to replicate:
You are almost there. Since you did not share a reproducible example, I simulate your data. You may need to adapt the variable naming and/or correct my assumptions.
{lubridate} is a powerful package for date-time crunching. It comes in handy when working with dates and binning dates for summaries, etc.
library(dplyr) # %>%, mutate(), group_by(), summarise()
library(ggplot2)
# simulating your data
## a series of dates from June through October
days <- seq(from = lubridate::ymd("2020-06-01")
,to = lubridate::ymd("2020-10-30")
,by = "1 day")
## random trips on each day
set.seed(666)
trips <- sample(2000:5000, length(days), replace = TRUE)
# putting things together in a data frame
df_temp <- data.frame(date = days, counts = trips) %>%
# I assume the variable Start.Month is the monthly bin
# let's use lubridate to "bin" the month from the date
mutate(Start.Month = lubridate::floor_date(date, unit = "month"))
# aggregate trips for each month, calculate average daily trips
by_month <- df_temp %>%
group_by(Start.Month) %>% # group by the binning variable
summarise(Avg.Trips = mean(counts)) # calculate the mean for each group
ggplot( data = by_month
, aes(x = Start.Month, y = Avg.Trips
, fill=as.factor(Start.Month)) # to work with a discrete palette, factorise
) +
# ------------ bar layer -----------------------------------------
## instead of geom_bar(... stat = "identity"), you can use geom_col()
## and define the fill colour
geom_col() +
scale_fill_brewer(palette = "Paired") +
# ------------ if you like provide context with annotation -------
geom_text(aes(label = Avg.Trips %>% round(2)), vjust = 1) +
# ------------ finalise plot with labels, theme, etc.
labs(title="Number of Daily Trips by Month",
x=NULL, # setting an unused lab to NULL is better than printing empty " "!
y="Number of Daily Trips"
) +
theme_minimal() +
theme(legend.position = "none") # to suppress colour legend
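If the real data have one row per trip rather than one row per day (an assumption, since no data were shared), the daily counts need to be built first and only then averaged within each month. A base R sketch with a made-up `trip_log`:

```r
# Hypothetical trip-level data: one row per trip
trip_log <- data.frame(
  start_date = as.Date(c("2020-06-01", "2020-06-01", "2020-06-02",
                         "2020-07-01", "2020-07-01", "2020-07-01",
                         "2020-07-02"))
)

# Step 1: count trips per day
trip_log$day <- format(trip_log$start_date, "%Y-%m-%d")
trips_per_day <- table(trip_log$day)
daily <- data.frame(day = names(trips_per_day),
                    trips = as.vector(trips_per_day))

# Step 2: average the daily counts within each month
daily$month <- substr(daily$day, 1, 7)
monthly <- aggregate(trips ~ month, data = daily, FUN = mean)
monthly
```

Calling geom_bar() on the trip-level rows counts every trip in the month, which is why the original attempt showed monthly totals instead of daily averages.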
I have a data frame like this (in date order):
freq date
3 Jan-18
2 Feb-18
42 Mar-18
2 Apr-18
4 May-18
However, when I plot this with the following code, it doesn't order by the order saved in the data frame. Instead it plots them in alphabetical order (see x-axis). How can this be fixed so that the plot is done in the order saved in the data frame?
Note that the date column is of type character which is likely why, but changing this to date format is tricky since there is no day, and when you do so it changes e.g. Jun-18 to 01-1918-06, which doesn't look nice on a graph. So, I'm trying to do this without changing it to date format if possible.
ggplot(df, aes(x = date, y = freq)) +
geom_point()
1) Assuming the data shown reproducibly in the Note at the end, convert the data to a zoo series with a yearmon index (which can represent a year and month without a day), in which case it is straightforward using autoplot.zoo. Omit the geom argument if you want a line plot.
library(ggplot2)
library(zoo)
z <- read.zoo(df, index = "date", FUN = as.yearmon, format = "%b-%y")
autoplot(z, geom = "point") + scale_x_yearmon()
2) This also works:
library(dplyr)
library(ggplot2)
library(zoo)
df %>%
mutate(date = as.yearmon(date, format = "%b-%y")) %>%
ggplot(aes(date, freq)) + geom_point() + scale_x_yearmon()
Note
Lines <- "
freq date
3 Jan-18
2 Feb-18
42 Mar-18
2 Apr-18
4 May-18"
df <- read.table(text = Lines, header = TRUE)
Another way, if the data are already stored in the order shown in the example, could be:
library(dplyr)
library(ggplot2)
#Code
df %>%
mutate(date=factor(date,levels = unique(date),ordered = TRUE)) %>%
ggplot(aes(x=date,y=freq))+
geom_point()
Output:
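Why this works can be seen from the factor levels alone: by default factor() sorts the unique values, while passing levels = unique(...) keeps the order of first appearance. A small sketch:

```r
months <- c("Jan-18", "Feb-18", "Mar-18", "Apr-18", "May-18")

# Default: levels are sorted alphabetically, which is what the plot showed
default_levels <- levels(factor(months))

# Explicit levels: order of appearance in the data
data_levels <- levels(factor(months, levels = unique(months)))

default_levels
data_levels
```

ggplot2 places discrete x values in level order, so the second form gives the axis order saved in the data frame.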
Or formatting the date variable:
#Code2
df %>%
mutate(date=as.Date(paste0(date,'-01'),'%b-%y-%d')) %>%
ggplot(aes(x=date,y=freq))+
geom_point()+
scale_x_date(date_labels = '%b-%y')+
ggtitle('My title')
Output:
Some data used:
#Data
structure(list(freq = c(3L, 2L, 42L, 2L, 4L), date = c("Jan-18",
"Feb-18", "Mar-18", "Apr-18", "May-18")), class = "data.frame", row.names = c(NA,
-5L))
If you don't want to rely on the zoo package, you could simply pick a year (e.g. 2021) and the conversion of the date column in your example works fine. You can then specify how the date is displayed in ggplot2's scale_x_date(). Here is what it looks like.
library(ggplot2)
df <- read.table(header = T, text = "
freq date
3 Jan-18
2 Feb-18
42 Mar-18
2 Apr-18
4 May-18")
df$date <- as.Date(paste0(df$date, "-2021"), format = "%B-%d-%Y")
ggplot(df, aes(date, y = freq)) +
geom_point() +
theme_bw() +
labs(x = "Date", y = "Frequency") +
scale_x_date(date_breaks = "2 weeks", date_labels = "%d-%b") +
theme(axis.text.x = element_text(angle = 45, vjust = 0.5))
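One thing worth checking with this approach: in the `%B-%d-%Y` format, the `18` in `Jan-18` is read as the day of the month, so the rows become the 18th of each month in 2021. If `18` is meant as the year 2018, as the question suggests, a day can be prepended instead. A minimal sketch, assuming an English locale for the month abbreviations:

```r
# "18" parsed as the day of the month, with 2021 appended as the year
as_day <- as.Date("Jan-18-2021", format = "%B-%d-%Y")

# "18" parsed as a two-digit year, with the 1st prepended as the day
# (assumption: month abbreviations match the current locale)
as_year <- as.Date("01-Jan-18", format = "%d-%b-%y")

format(as_day)
format(as_year)
```

With the second form, scale_x_date(date_labels = "%b-%y") would print the labels in the same style as the original column.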
I have county level data recording the year an invasive insect pest was first detected in that county between 2002 and 2018. I created a map using ggplot2 and the maps package that fills the county polygons with a color according to the year the pest was detected.
**Is there a way to use the gganimate package to animate this map with the first frame filling in only polygons with a detection date of 2002, the second frame filling polygons with a detection date of 2003 or earlier (so 2002 and 2003), a third frame for detection dates of 2004 or earlier (2002, 2003, 2004), etc.?**
Clarification: I'd like all the county polygons to be always visible and filled in with white initially, with each frame of the animation filling in counties based on the year of detection.
I've tried using transition_reveal(data$detect_year) with the static plot but get an error that "along data must either be integer, numeric, POSIXct, Date, difftime, or hms".
Here's some code for a reproducible example:
library(dplyr)
library(purrr)
library(maps)
library(ggplot2)
library(gganimate)
# Reproducible example
set.seed(42)
map_df <- map_data("county") %>%
filter(region == "minnesota")
# Add random detection year to each county
years <- 2002:2006
map_list <- split(map_df, f = map_df$subregion)
map_list <- map(map_list, function(.x) {
# return the county's rows with one randomly drawn year attached
mutate(.x, detection_years = sample(years, 1))
})
# collapse list back to data frame
map_df <- bind_rows(map_list)
map_df$detection_years <- as.factor(map_df$detection_years)
# Make plot
static_plot <- ggplot(map_df,
aes(x = long,
y = lat,
group = group)) +
geom_polygon(data = map_df, color = "black", aes(fill = detection_years)) +
scale_fill_manual(values = terrain.colors(n = length(unique(map_df$detection_years))),
name = "Year EAB First Detected") +
theme_void() +
coord_fixed(1.3)
animate_plot <- static_plot +
transition_reveal(detection_years)
If it's possible to do this with gganimate, I'd like to but I'm also open to other solutions if anyone has ideas.
After getting an answer from @RLave that almost did what I wanted and spending a little time with the documentation, I was able to figure out a way to do what I want. It doesn't seem very clean, but it works.
Essentially, I created a copy of my data frame for each year that needed a frame in the animation. Then for each year of detection I wanted to animate, I edited the detection_year variable in that copy of the data frame so that any county that had a detection in the year of interest or earlier retained their values and any county that had no detection yet was converted to the value I plotted as white. This made sure all the counties were always plotted. Then I needed to use transition_manual along with a unique ID I gave to each copy of the original data frame to determine the order of the animation.
library(dplyr)
library(purrr)
library(maps)
library(ggplot2)
library(gganimate)
# Reproducible example
set.seed(42)
years <- 2002:2006
map_df <- map_data("county") %>%
filter(region == "minnesota")
map_df <- map_df %>%
group_by(subregion) %>%
mutate(detection_year = sample(years,1))
animate_data <- data.frame()
for(i in 2002:2006){
temp_dat <- map_df %>%
mutate(detection_year = as.numeric(as.character(detection_year))) %>%
mutate(detection_year = case_when(
detection_year <= i ~ detection_year,
detection_year > i ~ 2001
),
animate_id = i - 2001
)
animate_data <- bind_rows(animate_data, temp_dat)
}
animate_data$detection_year <- as.factor(as.character(animate_data$detection_year))
# Make plot
static_plot <- ggplot(animate_data,
aes(x = long,
y = lat,
group = group)) +
geom_polygon(data = animate_data, color = "black", aes(fill = detection_year)) +
scale_fill_manual(values = c("white",
terrain.colors(n = 5)),
name = "Year First Detected") +
theme_void() +
coord_fixed(1.3)
# facet_wrap(~animate_id) # uncomment (and add + above) to inspect each frame as a static panel
animate_plot <- static_plot +
transition_manual(frames = animate_id)
animate_plot
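The frame-building step can be illustrated on a tiny table without maps or gganimate. The sketch below uses a hypothetical three-county data frame and base R only; 2001 plays the same role as in the code above, marking "not yet detected" (plotted as white).

```r
# Hypothetical stand-in for the county data: one row per county
counties <- data.frame(county = c("a", "b", "c"),
                       detection_year = c(2002, 2004, 2003))
frame_years <- 2002:2004

# One copy of the data per frame; later detections are masked as 2001
frames <- do.call(rbind, lapply(frame_years, function(i) {
  frame <- counties
  frame$detection_year <- ifelse(frame$detection_year <= i,
                                 frame$detection_year, 2001)
  frame$animate_id <- i - 2001
  frame
}))
frames
```

Every county appears in every frame, which is what keeps all polygons drawn throughout the animation.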
Possibly this, but I'm not sure that this is the expected output.
I changed your code, probably you don't need to split. I used group_by to assign a year to each region.
set.seed(42)
years <- 2002:2006
map_df <- map_data("county") %>%
filter(region == "minnesota")
map_df <- map_df %>%
group_by(subregion) %>%
mutate(detection_year = sample(years,1))
For the transition you need to define the id, here the same as the grouping (subregion or group) and a correct date format for the transition (along) variable (I used lubridate::year())
# Make plot
static_plot <- ggplot(map_df,
aes(x = long,
y = lat,
group = group)) +
geom_polygon(color = "black", aes(fill = as.factor(detection_year))) +
scale_fill_manual(values = terrain.colors(n = length(unique(map_df$detection_year))),
name = "Year EAB First Detected") +
theme_void() +
coord_fixed(1.3)
animate_plot <- static_plot +
transition_reveal(subregion, # same as the group variable
lubridate::year(paste0(detection_year, "-01-01"))) # move along years
Does this do it for you?
Building on this question and the use of "water year" in R, I have a question about plotting in ggplot2 with a common date axis over many years. A water year is defined as starting on October 1 and ending on September 30, which is a little more sensible for the hydrological cycle.
So say I have this data set:
library(dplyr)
library(ggplot2)
library(lubridate)
df <- data.frame(Date=seq.Date(as.Date("1910/1/1"), as.Date("1915/1/1"), "days"),
y=rnorm(1827,100,1))
Then here is the wtr_yr function:
wtr_yr <- function(dates, start_month=10) {
# Convert dates into POSIXlt
dates.posix = as.POSIXlt(dates)
# Year offset
offset = ifelse(dates.posix$mon >= start_month - 1, 1, 0)
# Water year
adj.year = dates.posix$year + 1900 + offset
# Return the water year
adj.year
}
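A quick check of the function at the water-year boundary may help. The sketch repeats the definition so it runs on its own; the three test dates are arbitrary.

```r
wtr_yr <- function(dates, start_month = 10) {
  dates.posix <- as.POSIXlt(dates)
  # $mon is zero-based, so October is 9
  offset <- ifelse(dates.posix$mon >= start_month - 1, 1, 0)
  dates.posix$year + 1900 + offset
}

boundary <- as.Date(c("1910-09-30", "1910-10-01", "1911-01-15"))
water_years <- wtr_yr(boundary)
water_years
```

Dates from 1 October onwards are assigned to the following calendar year, so 1 October 1910 and 15 January 1911 share water year 1911.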
What I would like to do is use colour as a grouping variable, then make an x-axis that consists only of month and day information. Usually I've done it like so (using the lubridate package):
ymd(paste0("1900","-",month(df$Date),"-",day(df$Date)))
This works fine if the year is arranged normally. However, in this water year scenario, each water year spans two calendar years. So ideally I'd like a plot that goes from October 1 to September 30 and plots a separate line for each water year, keeping all the water years correct. Here is where I am so far:
df1 <- df %>%
mutate(wtr_yrVAR=factor(wtr_yr(Date))) %>%
mutate(CDate=as.Date(paste0("1900","-",month(Date),"-",day(Date))))
df1 %>%
ggplot(aes(x=CDate, y=y, colour=wtr_yrVAR)) +
geom_point()
Plotting that, the date axis obviously spans from Jan to Dec. Any ideas how I can force ggplot2 to plot these along the water year lines?
Here is a method that works:
df3 <- df %>%
mutate(wtr_yrVAR=factor(wtr_yr(Date))) %>%
#seq along dates starting with the beginning of your water year
mutate(CDate=as.Date(paste0(ifelse(month(Date) < 10, "1901", "1900"),
"-", month(Date), "-", day(Date))))
Then:
df3 %>%
ggplot(., aes(x = CDate, y = y, colour = wtr_yrVAR)) +
geom_point() +
scale_x_date(date_labels = "%b %d")
Which gives:
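The same mapping can be checked in base R. Because the example data include 29 February 1912, the choice of placeholder years matters: 1901 is not a leap year, so that day cannot be represented. The helper below is hypothetical (not part of the answer) and takes the first placeholder year as an argument.

```r
to_common_axis <- function(d, first_year) {
  # Hypothetical helper: Oct-Dec go to first_year, Jan-Sep to first_year + 1
  m <- as.integer(format(d, "%m"))
  yr <- ifelse(m >= 10, first_year, first_year + 1)
  as.Date(paste0(yr, format(d, "-%m-%d")), format = "%Y-%m-%d")
}

d <- as.Date(c("1911-10-01", "1912-02-29", "1912-09-30"))

with_1900 <- to_common_axis(d, 1900)  # 29 Feb becomes NA (1901 is not a leap year)
with_1903 <- to_common_axis(d, 1903)  # 1904 is a leap year, so 29 Feb survives

with_1900
with_1903
```

In both cases October sorts before September on the resulting axis; using 1903/1904 as the placeholder pair would additionally keep the leap day.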
Not very elegant, but this should work:
df1 <- df %>%
mutate(wtr_yrVAR=factor(wtr_yr(Date))) %>%
mutate(CDdate= as.Date(as.numeric(Date - as.Date(paste0(wtr_yrVAR,"-10-01"))), origin = "1900-10-01"))
df1 %>%
  ggplot(aes(x = CDdate, y = y, colour = wtr_yrVAR)) +
  geom_line() +
  theme_bw() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b",
               limits = c(as.Date("1899-09-30"), as.Date("1900-10-01")))