I have a dataframe with a date column as follows:
library(tidyverse)
df <- data.frame(
id = c(1, 2, 4, 65, 77, 5, 4),
date = c("2020-04-18", "2020-04-20", "2020-04-01", "2020-04-19",
"2020-04-02", "2020-04-01", "2020-04-20")
) %>% mutate(date = as.Date(date))
I would like to systematically filter the date column to keep only the dates on or before the most recent Sunday.
Today is 2020-04-20, and it's a Monday.
The most recent Sunday is 2020-04-19:
last_sunday <- as.Date("2020-04-19")
df %>% filter(date <= last_sunday)
id date
1 1 2020-04-18
2 4 2020-04-01
3 65 2020-04-19
4 77 2020-04-02
5 5 2020-04-01
How can I define the variable last_sunday programmatically?
One option could be:
df %>%
filter(date <= min(date[as.POSIXlt(date)$wday == 0]))
id date
1 1 2020-04-18
2 4 2020-04-01
3 65 2020-04-19
4 77 2020-04-02
5 5 2020-04-01
Using lubridate:
previous_sunday <- lubridate::floor_date(Sys.Date(), "week")
previous_sunday
[1] "2020-04-19"
df %>% filter(date <= previous_sunday)
id date
1 1 2020-04-18
2 4 2020-04-01
3 65 2020-04-19
4 77 2020-04-02
5 5 2020-04-01
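Both answers have a caveat: the first one takes the earliest Sunday that happens to be in the data, so it needs at least one Sunday in `date`, and floor_date() depends on the week start (Sunday by default, via the `lubridate.week.start` option). As a sketch assuming only base R, the previous Sunday can also be computed directly from a date; `previous_sunday()` is a hypothetical helper name, not an existing function.

```r
# Hypothetical helper (not part of base R or lubridate): the most recent
# Sunday on or before `today`; a Sunday input is returned unchanged.
previous_sunday <- function(today = Sys.Date()) {
  # POSIXlt$wday is 0 for Sunday, 1 for Monday, ..., 6 for Saturday,
  # so subtracting it steps back to the Sunday of the same week.
  today - as.POSIXlt(today)$wday
}

last_sunday <- previous_sunday(as.Date("2020-04-20"))
last_sunday
#> [1] "2020-04-19"
```

With the default argument it can replace the hard-coded date, e.g. `df %>% filter(date <= previous_sunday())`.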
I have the following data:
library(tidyverse)
library(lubridate)
df <- tibble(date = as_date(c("2019-11-20", "2019-11-27", "2020-04-01", "2020-04-15", "2020-09-23", "2020-11-25", "2021-03-03")))
# A tibble: 7 x 1
date
<date>
1 2019-11-20
2 2019-11-27
3 2020-04-01
4 2020-04-15
5 2020-09-23
6 2020-11-25
7 2021-03-03
I also have an ordered comparison vector of dates:
comparison <- seq(as_date("2019-12-01"), today(), by = "months") - 1
I now want to compare my dates in df to those comparison dates and do something like:
if date in df is < comparison[1], then assign a 1
if date in df is < comparison[2], then assign a 2
and so on.
I know I could do it with a case_when, e.g.
df %>%
mutate(new_var = case_when(date < comparison[1] ~ 1,
date < comparison[2] ~ 2))
(with all the comparisons filled in, of course).
However, this would require manually writing out all the sequential conditions, and I'm wondering whether I could automate it. I thought about first creating a lookup table (i.e. take the comparison vector and add the respective new_var number: 1, 2, and so on) and then matching it against my data, but I only know how to do that for exact matches and don't know how to add the "smaller than" condition.
Expected result:
# A tibble: 7 x 2
date new_var
<date> <dbl>
1 2019-11-20 1
2 2019-11-27 1
3 2020-04-01 6
4 2020-04-15 6
5 2020-09-23 11
6 2020-11-25 13
7 2021-03-03 17
You can use findInterval as follows:
df %>% mutate(new_var = findInterval(date, comparison) + 1)
# A tibble: 7 x 2
date new_var
<date> <dbl>
1 2019-11-20 1
2 2019-11-27 1
3 2020-04-01 6
4 2020-04-15 6
5 2020-09-23 11
6 2020-11-25 13
7 2021-03-03 17
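findInterval(x, vec) returns, for each element of x, how many elements of the sorted vector vec are less than or equal to it, so adding 1 gives the position of the first comparison date that is strictly greater. A small check with made-up dates (not from the question):

```r
# Made-up, sorted comparison dates (month ends) for illustration
breaks <- as.Date(c("2019-11-30", "2019-12-31", "2020-01-31"))
x <- as.Date(c("2019-11-20", "2019-11-30", "2020-01-15", "2020-02-10"))

# Number of breaks <= x, plus 1 = position of the first break > x
idx <- findInterval(x, breaks) + 1
idx
#> [1] 1 2 3 4
```

A date equal to a comparison date moves to the next group, which matches the strict `<` in the case_when version. A date after the last comparison date gets length(comparison) + 1, whereas case_when would return NA for it.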
I wish to calculate the intervals between dates, excluding weekends from the difference in days. I have over 200 date stamps.
For example, the currently displayed time difference between the 6th (Wednesday) and the 11th (Monday) of January is 5 days. I would like to obtain 3 days.
I managed to get a solution that does not exclude Saturday and Sunday, using the following code and the packages lubridate and dplyr.
Could you please guide me on how to exclude weekends from the calculation?
Thank you.
library(lubridate)
library(dplyr)
dates <- c("2021-01-01", "2021-01-04", "2021-01-05", "2021-01-06", "2021-01-11", "2021-01-13", "2021-01-14", "2021-01-18", "2021-01-25", "2021-01-29")
dateoverview <- data.frame(Dates = lubridate::ymd(dates))
datecalculation <- dateoverview %>%
mutate(Days = Dates - lag(Dates)) %>%
mutate(Weekday = wday(Dates, label = FALSE))
datecalculation
## Dates Days Weekday
## 1 2021-01-01 NA days 6
## 2 2021-01-04 3 days 2
## 3 2021-01-05 1 days 3
## 4 2021-01-06 1 days 4
## 5 2021-01-11 5 days 2
## 6 2021-01-13 2 days 4
## 7 2021-01-14 1 days 5
## 8 2021-01-18 4 days 2
## 9 2021-01-25 7 days 2
## 10 2021-01-29 4 days 6
There is probably a function somewhere that already does this, but here is a custom one that calculates the date difference excluding weekends.
library(dplyr)
library(purrr)
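date_diff_excluding_weekends <- function(x, y) {
# the first row has no previous date
if(is.na(x) || is.na(y)) return(NA_real_)
# count days from x up to y - 1 whose weekday number (%u) is not 6 (Sat) or 7 (Sun)
sum(!format(seq(x, y - 1, by = '1 day'), '%u') %in% 6:7)
}
datecalculation %>%
mutate(Days = map2_dbl(lag(Dates), Dates, date_diff_excluding_weekends))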
# Dates Days Weekday
#1 2021-01-01 NA 6
#2 2021-01-04 1 2
#3 2021-01-05 1 3
#4 2021-01-06 1 4
#5 2021-01-11 3 2
#6 2021-01-13 2 4
#7 2021-01-14 1 5
#8 2021-01-18 2 2
#9 2021-01-25 5 2
#10 2021-01-29 4 6
seq(x, y - 1, by = '1 day') creates the sequence of dates from the previous date up to the day before the current date.
format(..., "%u") returns the day of the week as a number: 1 for Monday through 7 for Sunday.
sum(!format(...) %in% 6:7) then counts how many of those days fall on a weekday.
Another possible solution:
library(dplyr)
library(lubridate)
# sample data
df = data.frame(Dates = seq(ymd('2021-01-01'),ymd('2021-12-31'),by='days'))
# weekdays() returns day names in the current locale; this assumes English names
df_weekdays = df %>% filter(!weekdays(Dates) %in% c('Saturday','Sunday'))
#Application to your data
datecalculation = datecalculation %>%
filter(!weekdays(Dates) %in% c('Saturday','Sunday'))
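Calling a function once per row works, but with many dates a vectorized sketch may be preferable: build every calendar day between the first and last date once, keep a running count of weekdays, and difference that count at the observed dates. This gives the same counts as the per-row custom function above, assuming the dates are sorted and that only Saturday and Sunday are excluded (no holiday calendar); the variable names are illustrative.

```r
# The question's dates (assumed sorted in ascending order)
dates <- as.Date(c("2021-01-01", "2021-01-04", "2021-01-05", "2021-01-06",
                   "2021-01-11", "2021-01-13", "2021-01-14", "2021-01-18",
                   "2021-01-25", "2021-01-29"))

# Every calendar day between the first and last date
cal <- seq(min(dates), max(dates), by = "1 day")

# TRUE for Monday to Friday; format(..., "%u") gives "1" (Mon) to "7" (Sun)
is_weekday <- !format(cal, "%u") %in% c("6", "7")

# Number of weekdays strictly before each calendar day
weekdays_before <- cumsum(c(0, head(is_weekday, -1)))

# Weekdays from the previous date up to the day before the current one
pos <- match(dates, cal)
business_days <- c(NA, diff(weekdays_before[pos]))
business_days
#> [1] NA  1  1  1  3  2  1  2  5  4
```

In the dplyr pipeline the last step becomes `mutate(Days = c(NA, diff(weekdays_before[match(Dates, cal)])))`.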
I have data like this:
library(lubridate)
library(dplyr)
set.seed(2021)
gen_date <- seq(ymd_h("2021-01-01-00"), ymd_h("2021-09-30-23"), by = "hours")
hourx <- hour(gen_date)
datex <- date(gen_date)
sales <- round(runif(length(datex), 10, 50), 0)*100
mydata <- data.frame(datex, hourx, sales)
How do I get the last three months of data using dplyr? Or the last six months? What I want is the full data from "2021-06-01" to "2021-09-30". Thank you.
We can take the max value of 'datex', create a sequence of 6 (or 3) month starts going backwards with seq, and filter 'datex' against the earliest date in that sequence. Wrapping the sequence in min() matters: comparing 'datex' against the whole sequence would recycle it across the rows and silently drop rows that should be kept.
library(dplyr)
library(lubridate)
n <- 6
out <- mydata %>%
filter(datex >= min(seq(floor_date(max(datex), 'month'),
length.out = n + 1, by = '-1 month')))
Checking:
> range(out$datex)
[1] "2021-03-01" "2021-09-30"
> max(mydata$datex)
[1] "2021-09-30"
For 3 months
n <- 3
out2 <- mydata %>%
filter(datex >= min(seq(floor_date(max(datex), 'month'),
length.out = n + 1, by = '-1 month')))
> range(out2$datex)
[1] "2021-06-01" "2021-09-30"
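You may try xts::last() on the unique month numbers (month() ignores the year, so this assumes all the data fall within one year):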
library(xts)
x <- mydata %>%
mutate(month = month(datex)) %>%
filter(month %in% last(unique(month), 3))
unique(x$month)
[1] 7 8 9
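A shorter sketch of the seq()-based boundary uses lubridate's month arithmetic: floor_date() gives the first day of the latest month, and `%m-%` subtracts whole calendar months from it. `keep_last_months()` is a hypothetical helper, and the data below is a reduced daily stand-in for `mydata` (the hourly rows would be filtered the same way).

```r
library(dplyr)
library(lubridate)

# Reduced stand-in for the question's data: one row per day
mydata <- data.frame(datex = seq(as.Date("2021-01-01"), as.Date("2021-09-30"),
                                 by = "day"))

# Hypothetical helper: keep the latest month plus the n months before it,
# i.e. the same cut-off as the seq()/min() answer
keep_last_months <- function(data, n) {
  start <- floor_date(max(data$datex), "month") %m-% months(n)
  filter(data, datex >= start)
}

out3 <- keep_last_months(mydata, 3)
range(out3$datex)
#> [1] "2021-06-01" "2021-09-30"
```

Unlike the month-number approach, this cut-off is a real date, so it keeps working when the data span several years.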
I have a df indicating the start and end dates of a certain observation. Often an observation lasts longer than one day, giving it a value > 0 in the "duration" column. I want to add the days that lie between "start" and "end" (the "duration") as new rows in my df. How can I do this?
Example df
df <- data.frame(start_date = as.Date(c("1/1/2020", "1/25/2020", "2/11/2020"), "%m/%d/%Y"),
end_date = as.Date(c("1/5/2020", "1/26/2020", "2/13/2020"), "%m/%d/%Y"),
duration = c(4, 1, 2))
Are you looking for such a solution?
library(dplyr)
library(lubridate)
df %>%
mutate(start_date = mdy(start_date),
end_date = mdy(end_date)) %>%
mutate(duration = end_date - start_date)
data:
df <- data.frame(start_date = c("1/1/2020", "1/25/2020", "2/11/2020"),
end_date = c("1/5/2020", "1/26/2020", "2/13/2020"))
Output:
start_date end_date duration
1 2020-01-01 2020-01-05 4 days
2 2020-01-25 2020-01-26 1 days
3 2020-02-11 2020-02-13 2 days
You can simply subtract df$start_date from df$end_date:
df$end_date - df$start_date
#Time differences in days
#[1] 4 1 2
or use difftime:
difftime(df$end_date, df$start_date, units = "days")
#Time differences in days
#[1] 4 1 2
To get a sequence of dates use seq:
do.call(c, Map(seq, df$start_date, df$end_date, by=1))
# [1] "2020-01-01" "2020-01-02" "2020-01-03" "2020-01-04" "2020-01-05"
# [6] "2020-01-25" "2020-01-26" "2020-02-11" "2020-02-12" "2020-02-13"
Data:
df <- data.frame(start_date = as.Date(c("1/1/2020", "1/25/2020", "2/11/2020"), "%m/%d/%Y"),
end_date = as.Date(c("1/5/2020", "1/26/2020", "2/13/2020"), "%m/%d/%Y"),
duration = c(4, 1, 2))
Are you looking for this solution?
library(tidyverse)
df %>%
mutate(date = map2(start_date, end_date, seq, by = '1 day')) %>%
unnest(date) -> result
result
# start_date end_date duration date
# <date> <date> <dbl> <date>
# 1 2020-01-01 2020-01-05 4 2020-01-01
# 2 2020-01-01 2020-01-05 4 2020-01-02
# 3 2020-01-01 2020-01-05 4 2020-01-03
# 4 2020-01-01 2020-01-05 4 2020-01-04
# 5 2020-01-01 2020-01-05 4 2020-01-05
# 6 2020-01-25 2020-01-26 1 2020-01-25
# 7 2020-01-25 2020-01-26 1 2020-01-26
# 8 2020-02-11 2020-02-13 2 2020-02-11
# 9 2020-02-11 2020-02-13 2 2020-02-12
#10 2020-02-11 2020-02-13 2 2020-02-13
You can drop the columns that you don't need using select.
data
df <- structure(list(start_date = structure(c(18262, 18286, 18303),class = "Date"),
end_date = structure(c(18266, 18287, 18305), class = "Date"),
duration = c(4, 1, 2)), class = "data.frame", row.names = c(NA, -3L))
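If you would rather not depend on purrr and tidyr, a base R sketch of the same row expansion repeats each row once per day with rep() and offsets the start date with sequence(). It assumes end_date is never before start_date.

```r
# The question's data, parsed with an explicit month/day/year format
df <- data.frame(
  start_date = as.Date(c("1/1/2020", "1/25/2020", "2/11/2020"), "%m/%d/%Y"),
  end_date   = as.Date(c("1/5/2020", "1/26/2020", "2/13/2020"), "%m/%d/%Y"),
  duration   = c(4, 1, 2)
)

# Days per observation, counting both the start and the end date
n_days <- as.integer(df$end_date - df$start_date) + 1

# Repeat each row once per day, then add 0, 1, 2, ... days to the start date
result <- df[rep(seq_len(nrow(df)), n_days), ]
result$date <- result$start_date + sequence(n_days) - 1
rownames(result) <- NULL
result
```

sequence(n_days) produces 1..5, 1..2, 1..3 here, so each block of repeated rows walks from its own start date to its own end date.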
I have a large dataset with thousands of dates in ymd format. I want to split this column into three separate columns for year, month, and day. Since there are thousands of dates, I am looking for a single piece of code that handles the entire dataset.
You can use the year(), month(), and day() extractors in lubridate for this. Here's an example:
library('dplyr')
library('tibble')
library('lubridate')
## create some data
df <- tibble(date = seq(ymd(20190101), ymd(20191231), by = '7 days'))
which yields
> df
# A tibble: 53 x 1
date
<date>
1 2019-01-01
2 2019-01-08
3 2019-01-15
4 2019-01-22
5 2019-01-29
6 2019-02-05
7 2019-02-12
8 2019-02-19
9 2019-02-26
10 2019-03-05
# … with 43 more rows
Then mutate df using the relevant extractor function:
df <- mutate(df,
year = year(date),
month = month(date),
day = day(date))
This results in:
> df
# A tibble: 53 x 4
date year month day
<date> <dbl> <dbl> <int>
1 2019-01-01 2019 1 1
2 2019-01-08 2019 1 8
3 2019-01-15 2019 1 15
4 2019-01-22 2019 1 22
5 2019-01-29 2019 1 29
6 2019-02-05 2019 2 5
7 2019-02-12 2019 2 12
8 2019-02-19 2019 2 19
9 2019-02-26 2019 2 26
10 2019-03-05 2019 3 5
# … with 43 more rows
If you only want the new three columns, use transmute() instead of mutate().
Using lubridate but without having to specify a separator:
library(tidyverse)
df <- tibble(d = c('2019/3/18','2018/10/29'))
df %>%
mutate(
date = lubridate::ymd(d),
year = lubridate::year(date),
month = lubridate::month(date),
day = lubridate::day(date)
)
Note that you can replace ymd() with another parser, such as dmy() or mdy(), to fit other formats.
A slightly different tidyverse solution that requires less code could be:
Code
tibble(date = "2018-05-01") %>%
mutate_at(vars(date), lst(year, month, day))
Result
# A tibble: 1 x 4
date year month day
<chr> <dbl> <dbl> <int>
1 2018-05-01 2018 5 1
#Data
d = data.frame(date = c("2019-01-01", "2019-02-01", "2012/03/04"))
library(lubridate)
cbind(d,
read.table(header = FALSE,
sep = "-",
text = as.character(ymd(d$date))))
# date V1 V2 V3
#1 2019-01-01 2019 1 1
#2 2019-02-01 2019 2 1
#3 2012/03/04 2012 3 4
OR
library(dplyr)
library(tidyr)
library(lubridate)
d %>%
mutate(date2 = as.character(ymd(date))) %>%
separate(date2, c("year", "month", "day"), "-")
# date year month day
#1 2019-01-01 2019 01 01
#2 2019-02-01 2019 02 01
#3 2012/03/04 2012 03 04
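For comparison, a base R sketch without lubridate or tidyr: format() with the %Y, %m and %d codes extracts each component of a Date, and as.integer() turns the pieces into numbers (whereas separate() keeps them as text such as "01"). It assumes the column has already been converted to Date, e.g. with as.Date() or lubridate::ymd().

```r
# Example dates, already of class Date
d <- data.frame(date = as.Date(c("2019-01-01", "2019-02-01", "2012-03-04")))

# format() returns each component as text; as.integer() drops leading zeros
d$year  <- as.integer(format(d$date, "%Y"))
d$month <- as.integer(format(d$date, "%m"))
d$day   <- as.integer(format(d$date, "%d"))
d
```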