I have seen many questions on SO about converting daily data to weekly using the xts, zoo or lubridate packages, but none of the answers fit my problem. I have tried the following code:
library(zoo)
library(lubridate)
library(xts)
library(tidyverse)
#Calculation for multistation
set.seed(123)
df <- data.frame("date"= seq(from = as.Date("1970-1-1"), to = as.Date("2000-12-31"), by = "day"),
"Station1" = runif(length(seq.Date(as.Date("1970-1-1"), as.Date("2000-12-31"), "days")), 10, 30),
"Station2" = runif(length(seq.Date(as.Date("1970-1-1"), as.Date("2000-12-31"), "days")), 11, 29),
"Station3" = runif(length(seq.Date(as.Date("1970-1-1"), as.Date("2000-12-31"), "days")), 9, 28))
head(df)
# Aggregate over week
df %>%
mutate(Week = week(ymd(date)),
Year = year(ymd(date))) %>%
pivot_longer(-c(Week, date, Year), values_to = "value", names_to = "Station") %>%
group_by(Year, Week, Station) %>%
summarise(Weekly = mean(value)) %>%
arrange(Station) %>%
print(n = 55)
From the output you can see that 1970 contains 53 weeks, which I don't want. I want every year to contain exactly 52 weeks, with week 1 starting on the first date of the year. In a non-leap year the 52nd week should have 8 days; in a leap year both the 9th and the 52nd week should have 8 days. How can I do that in R?
Why not just write a function that gives the meteorological week from the definition you gave? Package lubridate will give you the day of the year with yday, which can act as the index for a vector of the correct week labels. These are straightforward to construct with simple modular math and concatenation.
You then only need to figure out if you are in a leap year, which again is possible using lubridate::leap_year. Combine these in an ifelse and you have an easy-to-use function:
met_week <- function(dates)
{
normal_year <- c((0:363 %/% 7 + 1), 52)
leap_year <- c(normal_year[1:59], 9, normal_year[60:365])
year_day <- lubridate::yday(dates)
return(ifelse(lubridate::leap_year(dates), leap_year[year_day], normal_year[year_day]))
}
and you can do
df %>% mutate(week = met_week(date))
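As a quick sanity check, here is a minimal sketch that repeats the function (with the lookup vectors renamed to `normal_weeks` and `leap_weeks`, my own names, to avoid shadowing `lubridate::leap_year`) and counts the days per week in one normal and one leap year. The choice of 1971 and 1972 is only for illustration.

```r
library(lubridate)

# Same logic as met_week() above, repeated so this snippet runs on its own
met_week <- function(dates) {
  normal_weeks <- c(0:363 %/% 7 + 1, 52)                         # 365 labels, week 52 has 8 days
  leap_weeks <- c(normal_weeks[1:59], 9, normal_weeks[60:365])   # extra day goes to week 9
  year_day <- lubridate::yday(dates)
  ifelse(lubridate::leap_year(dates), leap_weeks[year_day], normal_weeks[year_day])
}

days_1971 <- seq(as.Date("1971-01-01"), as.Date("1971-12-31"), by = "day")  # normal year
days_1972 <- seq(as.Date("1972-01-01"), as.Date("1972-12-31"), by = "day")  # leap year

counts_1971 <- table(met_week(days_1971))
counts_1972 <- table(met_week(days_1972))

# Weeks that do not have exactly 7 days
print(counts_1971[counts_1971 != 7])
print(counts_1972[counts_1972 != 7])
```

If the definition is implemented as intended, only week 52 should show up for 1971, and weeks 9 and 52 for 1972.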
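You could just compute it manually from the day of the year; I'm not sure there is a function already built for that. Note that capping the week at 52 puts all the extra days in the last week, so week 52 has 8 days in a normal year and 9 days in a leap year: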
df %>%
mutate(Week = pmin(52, ceiling(yday(date) / 7)),
Year = year(ymd(date)))
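To go from these week labels to the weekly means asked for in the question, something along these lines should work. This is only a sketch on a shortened date range with two stations; the `n_days` column and the `weeks_per_year` check are my additions for illustration.

```r
library(dplyr)
library(tidyr)
library(lubridate)

set.seed(123)
dates <- seq(as.Date("1970-01-01"), as.Date("1972-12-31"), by = "day")
df <- data.frame(date = dates,
                 Station1 = runif(length(dates), 10, 30),
                 Station2 = runif(length(dates), 11, 29))

weekly <- df %>%
  mutate(Year = year(date),
         Week = pmin(52, ceiling(yday(date) / 7))) %>%   # cap at 52 weeks
  pivot_longer(c(Station1, Station2), names_to = "Station", values_to = "value") %>%
  group_by(Station, Year, Week) %>%
  summarise(Weekly = mean(value), n_days = n(), .groups = "drop")

# Every station-year combination should now have exactly 52 rows
weeks_per_year <- count(weekly, Station, Year)
print(weeks_per_year)
```

Swapping the `pmin()` line for `met_week(date)` from the other answer gives the 8-day week 9 in leap years instead.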
Let's say I have a list of dates from March 1st to July 15th:
daterange = as.data.frame(seq(as.Date("2020-3-1"), as.Date("2020-7-15"), "days"))
I want to group the dates by 1-15 and 16-30/31 for each month. So the dates in March will be separated into two groups: Mar 1-15 and Mar 16-31. Then keep doing this for every month.
I know the lubridate package can sort by week, but I don't know how to set a custom range.
Thanks
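We can group by year-month (via as.yearmon) and by an indicator of whether the day of the month is past the 15th, then build the label from the smallest and largest day in each group: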
library(dplyr)
library(zoo)
library(lubridate)
library(stringr)
daterange2 <- daterange %>%
setNames('Date') %>%  # base R; set_names() would need purrr or magrittr attached
group_by(yearmon = as.yearmon(Date),
Daygroup = (day(Date) > 15) + 1) %>%
mutate(Label = str_c(format(Date, '%b'),
str_c(min(day(Date)), max(day(Date)), sep='-'), sep= ' '))
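Using base R, you can create two groups in each month by pasting the month abbreviation of each date together with 1 or 2, depending on whether the day falls in the first or second half of the month.
# the single column has an auto-generated name, so give it a usable one first
names(daterange) <- "date"
newdaterange <- transform(daterange, group = paste0(format(date, "%b"), '-group-',
ifelse(as.integer(format(date, "%d")) > 15, 2, 1)))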
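To see what the grouping produces, a small base R sketch like the following counts the dates per half-month. The key format (`"2020-03-H1"` etc.) is my own choice; it uses the numeric month so the result does not depend on the locale.

```r
dates <- seq(as.Date("2020-03-01"), as.Date("2020-07-15"), by = "day")

day_of_month <- as.integer(format(dates, "%d"))
half <- ifelse(day_of_month > 15, 2L, 1L)           # 1 = days 1-15, 2 = days 16-end

# Year-month plus half gives one key per half-month
key <- paste(format(dates, "%Y-%m"), half, sep = "-H")

counts <- table(key)
print(counts)
```

July only gets a first half here because the range stops on the 15th.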
I'm trying to back into a fake birthdate based on the age of a consumer. I'm using lubridate package. Here is my code:
ymd(today) - years(df$age) - months(sample(1:12, 1)) - days(sample(1:31, 1)).
I want to use this to generate a different dob that equals the age. When I run this inline it gives every row the same month and day and different year. I want the month and day to vary as well.
You can make a date with the year of birth at 1st of January and then add random duration of days to it.
library(lubridate)
library(dplyr)
set.seed(5)
df <- data.frame(age = c(18, 33, 58, 63))
df %>%
mutate(dob = make_date(year(Sys.Date()) - age, 1, 1) +
duration(sample(0:364, n(), replace = TRUE), unit = "days"))  # replace = TRUE so it also works with more than 365 rows
In base R, we can subtract the age column from the current year to get the birth year, select a random month and day, paste the values together and create a Date object.
set.seed(123)
df <- data.frame(age = sample(100, 5))
as.Date(paste(as.integer(format(Sys.Date(), "%Y")) - df$age,
sprintf("%02d", sample(12, nrow(df))),
sprintf("%02d", sample(30, nrow(df))), sep = "-"))
#[1] "1990-01-29" "1940-06-14" "1978-09-19" "1933-05-16" "1928-04-03"
However, in this case you might need to make an extra check for month of February, or to be safe you might want to sample dates only from 28 instead of 30 here.
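One way to sidestep the February problem entirely is to draw a random day offset within the birth year instead of pasting month and day together. A minimal sketch, assuming (as in the answers above) that the birth year is simply the current year minus the age; the helper names are mine:

```r
set.seed(123)
df <- data.frame(age = c(18, 33, 58, 63, 18))

birth_year <- as.integer(format(Sys.Date(), "%Y")) - df$age
year_start <- as.Date(paste0(birth_year, "-01-01"))
year_end <- as.Date(paste0(birth_year, "-12-31"))

# 365 or 366 depending on whether the birth year is a leap year
days_in_year <- as.integer(year_end - year_start) + 1L

# One independent offset per row, between 0 and days_in_year - 1
offset <- floor(runif(nrow(df)) * days_in_year)

df$dob <- year_start + offset
print(df)
```

Every generated date is valid by construction, and each row gets its own month and day.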
I would like to retain my current date column in year-month format as date. It currently gets converted to chr format. I have tried as_datetime but it coerces all values to NA.
The format I am looking for is: "2017-01"
library(lubridate)
df<- data.frame(Date=c("2017-01-01","2017-01-02","2017-01-03","2017-01-04",
"2018-01-01","2018-01-02","2018-02-01","2018-03-02"),
N=c(24,10,13,12,10,10,33,45))
df$Date <- as_datetime(df$Date)
df$Date <- ymd(df$Date)
df$Date <- strftime(df$Date,format="%Y-%m")
Thanks in advance!
lubridate only handles dates, and dates have days. However, as alistaire mentions, you can floor them to the month if you want to work monthly:
library(tidyverse)
df_month <-
df %>%
mutate(Date = floor_date(as_date(Date), "month"))
If you e.g. want to aggregate by month, just group_by() and summarize().
df_month %>%
group_by(Date) %>%
summarize(N = sum(N)) %>%
ungroup()
#> # A tibble: 4 x 2
#> Date N
#> <date> <dbl>
#>1 2017-01-01 59
#>2 2018-01-01 20
#>3 2018-02-01 33
#>4 2018-03-01 45
You can solve this with the zoo::as.yearmon() function. The solution follows:
library(tidyquant)
library(magrittr)
library(dplyr)
df <- data.frame(Date=c("2017-01-01","2017-01-02","2017-01-03","2017-01-04",
"2018-01-01","2018-01-02","2018-02-01","2018-03-02"),
N=c(24,10,13,12,10,10,33,45))
df %<>% mutate(Date = zoo::as.yearmon(Date))
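A yearmon value prints as something like "Jan 2017" by default. If the "2017-01" form from the question is needed for display, it can be produced with format() while the column itself stays a sortable yearmon. A small sketch with made-up dates:

```r
library(zoo)

dates <- as.Date(c("2018-02-01", "2017-01-03", "2017-01-01"))
ym <- as.yearmon(dates)

# yearmon is stored as year + fraction of year, so it sorts chronologically
ym_sorted <- ym[order(as.numeric(ym))]

# Character labels in the requested "YYYY-MM" form, for display only
labels <- format(ym_sorted, "%Y-%m")
print(labels)
```

Keep the yearmon column for grouping and sorting, and only format it when printing or plotting.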
You can use the cut function with breaks = "month" to map every date to the first day of its month, so all dates within the same month share one value in the newly created column.
This is useful for grouping the other variables in your data frame by month (essentially what you are trying to do). Note that cut creates a factor, but it can be converted back to a date, so you can still have the date class in your data frame.
You just can't get rid of the day in a date (because then it is no longer a date). Afterwards you can create a nice format for axes or tables. For example:
true_date <-
as.POSIXlt(
c(
"2017-01-01",
"2017-01-02",
"2017-01-03",
"2017-01-04",
"2018-01-01",
"2018-01-02",
"2018-02-01",
"2018-03-02"
),
format = "%F"
)
df <-
data.frame(
Date = cut(true_date, breaks = "month"),
N = c(24, 10, 13, 12, 10, 10, 33, 45)
)
## here df$Date is a 'factor'. You could use substr to create a formatted column
df$formated_date <- substr(df$Date, start = 1, stop = 7)
## and you can convert back to date class. format = "%F", is ISO 8601 standard date format
df$true_date <- strptime(x = as.character(df$Date), format = "%F")
str(df)
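For the grouping step mentioned above, a compact base R sketch could look like this. It uses Date instead of POSIXlt and a shortened version of the data; `month_start` and `monthly` are names I made up.

```r
dates <- as.Date(c("2017-01-01", "2017-01-04", "2018-02-01", "2018-03-02"))
n <- c(24, 12, 33, 45)

# cut() returns a factor whose labels are the first day of each month
month_factor <- cut(dates, breaks = "month")

# Convert the factor labels back to real Date values
month_start <- as.Date(as.character(month_factor))

# Sum n within each month
monthly <- aggregate(n, by = list(month = month_start), FUN = sum)
print(monthly)
```

Going through as.character() matters: calling as.Date() directly on the factor's integer codes would not give the intended dates.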
How can I add a column price.wk.average to the data that equals the average price of the previous week, and a column price.mo.average that equals the average price of the previous month? The price.wk.average value will be the same for the entire week.
Dates Price Demand Price.wk.average Price.mo.average
2010-1-1 x x
2010-1-2 x x
......
2015-1-1 x x
jkl,
try to post reproducible examples. It will make it easier to help you. You can use dplyr:
library(dplyr)
library(lubridate)  # for week() and month()
df <- data.frame(date = seq(as.Date("2017-1-1"),by="day",length.out = 100), price = round(runif(100)*100+50,0))
df <- df %>%
group_by(week = week(date)) %>%
mutate(Price.wk.average = mean(price)) %>%
ungroup() %>%
group_by(month = month(date)) %>%
mutate(Price.mo.average = mean(price))
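Note that this gives the average of the week each row falls in. If the average of the previous week is what is wanted, as the question says, one possible approach is to summarise per week, shift the weekly means by one with lag(), and join them back. A sketch on toy data where the prices are constant within each week, so the result is easy to read:

```r
library(dplyr)
library(lubridate)

df <- data.frame(date = seq(as.Date("2017-01-01"), by = "day", length.out = 21),
                 price = c(rep(10, 7), rep(20, 7), rep(30, 7)))

# One row per (year, week) with that week's mean, then shift by one week
weekly <- df %>%
  mutate(year = year(date), week = week(date)) %>%
  group_by(year, week) %>%
  summarise(wk_mean = mean(price), .groups = "drop") %>%
  arrange(year, week) %>%
  mutate(Price.wk.average = dplyr::lag(wk_mean))

# Attach the previous week's mean to every daily row
out <- df %>%
  mutate(year = year(date), week = week(date)) %>%
  left_join(weekly, by = c("year", "week"))

print(out[c(1, 8, 15), c("date", "price", "Price.wk.average")])
```

The first week has no predecessor, so its value is NA. The same pattern with month() in place of week() should work for the monthly column.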
(Since I don't have enough points to comment)
I wanted to point out that Eric's answer will not distinguish average weekly price by year. Therefore, if you are interested in unique weeks (Week 1 of 2012 != Week 1 of 2015 ), you will need to do extra work to group by unique weeks.
df <- data.frame( Dates = c("2010-1-1", "2010-1-2", "2015-01-3"),
Price = c(50, 20, 40) )
Dates Price
1 2010-1-1 50
2 2010-1-2 20
3 2015-01-3 40
Just to keep your data frame tidy, I suggest converting dates to POSIX format then sorting the data frame:
library(dplyr)  # for %>%, mutate(), arrange() and group_by()
library(lubridate)
df <- df %>%
mutate(Dates = lubridate::parse_date_time(Dates,"ymd")) %>%
arrange( Dates )
To group by unique weeks:
df <- df %>%
group_by( yw = paste( year(Dates), week(Dates)))
Then mutate and ungroup.
To group by unique months:
df <- df %>%
group_by( ym = paste( year(Dates), month(Dates)))
and mutate and ungroup.
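Put together, the year-week version might look like the following sketch. I group on separate `yr` and `wk` columns (my names) rather than a pasted string, which should be equivalent.

```r
library(dplyr)
library(lubridate)

df <- data.frame(Dates = c("2010-1-1", "2010-1-2", "2015-01-3"),
                 Price = c(50, 20, 40))

out <- df %>%
  mutate(Dates = ymd(Dates)) %>%
  arrange(Dates) %>%
  # year + week identifies a unique week across years
  group_by(yr = year(Dates), wk = week(Dates)) %>%
  mutate(Price.wk.average = mean(Price)) %>%
  ungroup()

print(out)
```

All three dates are in week 1 of their year, but the two 2010 rows are averaged together while the 2015 row keeps its own value.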
I have a tibble of financial data and would like to filter it so that only the first non-Monday of every week is kept. Usually that will be a Tuesday, but sometimes it will be a Wednesday if Tuesday is a holiday.
Here is my code that works in most cases
XLF <- quantmod::getSymbols("XLF", from = "2000-01-01", auto.assign = FALSE)
library(tibble)
library(lubridate)
library(dplyr)
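# recent tibble versions drop row names in as_tibble(), so go through a data frame first
xlf <- as.data.frame(XLF) %>% rownames_to_column(var = "date") %>%
as_tibble() %>% select(date, XLF.Adjusted)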
xlf$date <- ymd(xlf$date)
# We create Month, Week number and Days of the week columns
# Then we remove all the Mondays
xlf <- xlf %>% mutate(Year = year(date), Month = month(date),
IsoWeek = isoweek(date), WDay = wday(date)) %>%
filter(WDay != 2)
# Creating another tibble just for ease of comparison
xlf2 <- xlf %>%
group_by(Year, IsoWeek) %>%
filter(row_number() == 1) %>%
ungroup()
That said, there are some issues that I have not been able to solve so far.
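For instance, it skips "2002-12-31", which is a Tuesday, because that date belongs to the first ISO week of 2003 while its calendar year is still 2002. It therefore lands in the same (Year, IsoWeek) group as the first days of January 2002 and is never the first row of its group.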
There are a few similar issues.
My question is: how can I select the first non-Monday of every week without such issues while staying in the tidyverse (i.e. not having to use the xts / zoo classes)?
You can create a consistently increasing week number yourself. Perhaps not the most elegant solution but it works fine for me.
as.data.frame(XLF) %>%
rownames_to_column(var = "date")%>%
as_tibble() %>%
select(date, XLF.Adjusted)%>%
mutate(date = ymd(date),
Year = year(date),
Month = month(date),
WDay = wday(date),
WDay_label = wday(date, label = TRUE))%>%
# if the weekday number is higher in the line above or
# if the date in the previous line is more than 6 days ago
# the week number should be incremented
mutate(week_increment = (WDay < lag(WDay) | difftime(date, lag(date), unit = 'days') > 6))%>%
# the previous line causes the first element to be NA due to
# the fact that the lag function can't find a line above
# we correct this here by setting the first element to TRUE
mutate(week_increment = ifelse(row_number() == 1,
TRUE,
week_increment))%>%
# we can sum the boolean elements in a cumulative way to get a week number
mutate(week_number = cumsum(week_increment))%>%
filter(WDay != 2)%>%
group_by(Year, week_number) %>%
filter(row_number() == 1)
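Another option that might be simpler is to pair isoweek() with isoyear() instead of year(), so that dates at the turn of the year are grouped with the ISO year their week belongs to. The sketch below uses synthetic weekdays rather than the XLF data (no holidays are removed), and `week_start = 1` is passed to wday() so that Monday is 1 regardless of local settings.

```r
library(dplyr)
library(lubridate)

# All Monday-Friday dates from 2002-01-01 to 2003-01-10 (synthetic "trading days")
all_days <- seq(as.Date("2002-01-01"), as.Date("2003-01-10"), by = "day")
days <- tibble(date = all_days[wday(all_days, week_start = 1) <= 5])

non_mondays <- days %>%
  filter(wday(date, week_start = 1) != 1) %>%
  arrange(date)

# Grouping on calendar year + ISO week mixes early January and late December 2002
naive <- non_mondays %>%
  group_by(Year = year(date), IsoWeek = isoweek(date)) %>%
  slice(1) %>%
  ungroup()

# Grouping on ISO year + ISO week keeps every week distinct
fixed <- non_mondays %>%
  group_by(IsoYear = isoyear(date), IsoWeek = isoweek(date)) %>%
  slice(1) %>%
  ungroup()

as.Date("2002-12-31") %in% naive$date
as.Date("2002-12-31") %in% fixed$date
```

With the ISO year as the grouping key, 2002-12-31 falls in week 1 of ISO year 2003 and is picked as that week's first non-Monday.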