I have data like this:
library(lubridate)
library(dplyr)
set.seed(2021)
gen_date <- seq(ymd_h("2021-01-01-00"), ymd_h("2021-09-30-23"), by = "hours")
hourx <- hour(gen_date)
datex <- date(gen_date)
sales <- round(runif(length(datex), 10, 50), 0)*100
mydata <- data.frame(datex, hourx, sales)
How do I get the last three months of data using dplyr, or the last six months? What I want is the full data from "2021-06-01" to "2021-09-30". Thank you.
We can get the max value of 'datex', floor it to the start of its month, create a sequence of n + 1 month starts going backwards with seq, and take the earliest one as a single cutoff to filter 'datex' against. Comparing 'datex' against the whole sequence would recycle it row by row and keep only a scattered subset of rows.
library(dplyr)
library(lubridate)
n <- 6
out <- mydata %>%
  filter(datex >= min(seq(floor_date(max(datex), 'month'),
                          length.out = n + 1, by = '-1 month')))
-checking
> range(out$datex)
[1] "2021-03-01" "2021-09-30"
> max(mydata$datex)
[1] "2021-09-30"
For 3 months
n <- 3
# min() collapses the sequence of month starts to one cutoff date
out2 <- mydata %>%
  filter(datex >= min(seq(floor_date(max(datex), 'month'),
                          length.out = n + 1, by = '-1 month')))
> range(out2$datex)
[1] "2021-06-01" "2021-09-30"
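The same cutoff can be sketched in base R, without dplyr or lubridate, which makes it easy to check that a single scalar date is being compared. The names dates, latest_month_start, month_starts, cutoff, and kept are illustrative and not part of the answer above.

```r
# Daily dates over the same span as the question (illustrative data)
dates <- seq(as.Date("2021-01-01"), as.Date("2021-09-30"), by = "day")

n <- 3  # number of months to go back from the start of the latest month

# First day of the latest month, then n further month starts going backwards
latest_month_start <- as.Date(format(max(dates), "%Y-%m-01"))
month_starts <- seq(latest_month_start, by = "-1 month", length.out = n + 1)

# The earliest month start is the single cutoff date
cutoff <- min(month_starts)
kept <- dates[dates >= cutoff]

range(kept)
```

With n <- 3 the cutoff is 2021-06-01, so the kept dates run from 2021-06-01 to 2021-09-30, which is the range the question asks for.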
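You may also try the following, which keeps the last three distinct month numbers. It relies on the data being sorted by date and falling within a single calendar year.
library(xts) # for last(x, n); base tail(x, n) would do the same here
x <- mydata %>%
  mutate(month = month(datex)) %>%
  filter(month %in% last(unique(month), 3))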
unique(x$month)
[1] 7 8 9
I have the following data:
library(tidyverse)
library(lubridate)
df <- tibble(date = as_date(c("2019-11-20", "2019-11-27", "2020-04-01", "2020-04-15", "2020-09-23", "2020-11-25", "2021-03-03")))
# A tibble: 7 x 1
date
<date>
1 2019-11-20
2 2019-11-27
3 2020-04-01
4 2020-04-15
5 2020-09-23
6 2020-11-25
7 2021-03-03
I also have an ordered comparison vector of dates:
comparison <- seq(as_date("2019-12-01"), today(), by = "months") - 1
I now want to compare my dates in df to those comparison dates and do something like:
if date in df is < comparison[1], then assign a 1
if date in df is < comparison[2], then assign a 2
and so on.
I know I could do it with a case_when, e.g.
df %>%
mutate(new_var = case_when(date < comparison[1] ~ 1,
date < comparison[2] ~ 2))
(of course filling this up with all comparisons).
However, this would require manually writing out all the sequential conditions, and I'm wondering whether I could automate it. I thought about creating a match lookup first (i.e. take the comparison vector, then add the respective new_var number (i.e. 1, 2, and so on)) and then matching it against my data, but I only know how to do that for exact matches and don't know how to add the "smaller than" condition.
Expected result:
# A tibble: 7 x 2
date new_var
<date> <dbl>
1 2019-11-20 1
2 2019-11-27 1
3 2020-04-01 6
4 2020-04-15 6
5 2020-09-23 11
6 2020-11-25 13
7 2021-03-03 17
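You can use findInterval, which returns, for each date, the number of comparison dates that are less than or equal to it. Adding 1 gives the position of the first comparison date that is strictly greater:
df %>% mutate(new_var = findInterval(date, comparison) + 1)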
# A tibble: 7 x 2
date new_var
<date> <dbl>
1 2019-11-20 1
2 2019-11-27 1
3 2020-04-01 6
4 2020-04-15 6
5 2020-09-23 11
6 2020-11-25 13
7 2021-03-03 17
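A minimal base-R sketch of the same lookup, using a fixed end date instead of today() so that the result is reproducible. The end date "2021-04-01" and the variable names dates, new_var, and boundary are assumptions made for illustration.

```r
# Fixed comparison vector (month ends), so the result does not depend on today()
comparison <- seq(as.Date("2019-12-01"), as.Date("2021-04-01"), by = "month") - 1

dates <- as.Date(c("2019-11-20", "2019-11-27", "2020-04-01", "2020-04-15",
                   "2020-09-23", "2020-11-25", "2021-03-03"))

# findInterval() counts how many comparison dates are <= each date;
# adding 1 gives the position of the first comparison date that is > the date
new_var <- findInterval(dates, comparison) + 1
new_var

# A date that falls exactly on a boundary is not < that boundary,
# so it lands in the next bucket
boundary <- findInterval(as.Date("2019-11-30"), comparison) + 1
boundary
```

One difference from the case_when version: a date later than the last comparison date gets length(comparison) + 1 here, whereas case_when without a fallback would return NA.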
I wish to calculate the intervals between dates. The differences in days should take weekends into account, i.e. exclude them. I have over 200 date stamps.
For example, the currently displayed time difference between the 6th (Wednesday) and the 11th (Monday) of January is 5 days. I would like to obtain 3 days.
I managed to get a solution that does not exclude Saturday and Sunday with the following code and the packages lubridate and dplyr.
Could you please guide me on how to exclude the weekends from the calculation?
Thank you.
library(lubridate)
library(dplyr)
dates <- c("2021-01-01", "2021-01-04", "2021-01-05", "2021-01-06", "2021-01-11", "2021-01-13", "2021-01-14", "2021-01-18", "2021-01-25", "2021-01-29")
d <- do.call(rbind, lapply(dates, as.data.frame))
dateoverview <- rename(d, Dates = 1)
dateoverview$Dates <- lubridate::ymd(dateoverview$Dates)
datecalculation <- dateoverview %>%
mutate(Days = Dates - lag(Dates)) %>%
mutate(Weekday = wday(Dates, label = FALSE))
datecalculation
## Dates Days Weekday
## 1 2021-01-01 NA days 6
## 2 2021-01-04 3 days 2
## 3 2021-01-05 1 days 3
## 4 2021-01-06 1 days 4
## 5 2021-01-11 5 days 2
## 6 2021-01-13 2 days 4
## 7 2021-01-14 1 days 5
## 8 2021-01-18 4 days 2
## 9 2021-01-25 7 days 2
## 10 2021-01-29 4 days 6
Probably, there is a function somewhere already doing this but here is a custom one which can help you calculate date difference excluding weekends.
library(dplyr)
library(purrr)
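date_diff_excluding_weekends <- function(x, y) {
  # the first row has no previous date
  if (is.na(x) || is.na(y)) return(NA_real_)
  # count the days from x up to (not including) y that are not Saturday ("6") or Sunday ("7")
  sum(!format(seq(x, y - 1, by = '1 day'), '%u') %in% c('6', '7'))
}
datecalculation %>%
  mutate(Days = map2_dbl(lag(Dates), Dates, date_diff_excluding_weekends))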
# Dates Days Weekday
#1 2021-01-01 NA 6
#2 2021-01-04 1 2
#3 2021-01-05 1 3
#4 2021-01-06 1 4
#5 2021-01-11 3 2
#6 2021-01-13 2 4
#7 2021-01-14 1 5
#8 2021-01-18 2 2
#9 2021-01-25 5 2
#10 2021-01-29 4 6
seq(x, y - 1, by = '1 day') creates a sequence of dates from the previous date up to the day before the current date.
format(..., "%u") returns the day of the week as a number: 1 is Monday, 7 is Sunday.
Using sum(!format(...) %in% 6:7), we count the days in that sequence that fall on weekdays.
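The three steps above can be sketched in base R without purrr. The helper name count_weekdays and the guard for an empty interval are assumptions added for illustration and are not part of the answer's function.

```r
# Count the weekdays from `from` up to (but not including) `to`
count_weekdays <- function(from, to) {
  if (is.na(from) || is.na(to)) return(NA_real_)
  if (to <= from) return(0)  # empty interval; seq() would fail here
  days <- seq(from, to - 1, by = "1 day")
  # "%u" gives the weekday as text: "1" = Monday, ..., "7" = Sunday
  sum(!format(days, "%u") %in% c("6", "7"))
}

d <- as.Date(c("2021-01-01", "2021-01-04", "2021-01-05", "2021-01-06",
               "2021-01-11", "2021-01-13"))

# Index-based loop so that each element keeps its Date class
gaps <- vapply(seq_along(d)[-1],
               function(i) count_weekdays(d[i - 1], d[i]),
               numeric(1))
gaps
```

The interval from 2021-01-06 to 2021-01-11 covers Wednesday, Thursday, and Friday plus a weekend, so it counts as 3.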
Another possible approach is to drop the weekend dates themselves (note that this removes rows rather than computing differences):
library(dplyr)
library(lubridate)
# sample data
df <- data.frame(Dates = seq(ymd('2021-01-01'), ymd('2021-12-31'), by = 'days'))
# weekdays() returns locale-dependent names, so this assumes an English locale
df_weekdays <- df %>% filter(!weekdays(Dates) %in% c('Saturday', 'Sunday'))
# application to your data
datecalculation <- datecalculation %>%
  filter(!weekdays(Dates) %in% c('Saturday', 'Sunday'))
Questions:
Load the brexit_polls data frame from dslabs:
How many polls had a start date (startdate) in April (month number 4)?
The startdate column of the brexit_polls data set spans multiple years, but I only want to filter for the month of April.
I have tried using a regex: april <- brexit_polls %>% regex(startdate,"....-04-..")
I also tried using the tibbletime package, but it wouldn't load in my R. Any suggestions?
# sample data standing in for brexit_polls
dat <- data.frame(startdate = seq(as.Date("2021-01-01"), length.out = 30, by = "week"))
head(dat)
# startdate
# 1 2021-01-01
# 2 2021-01-08
# 3 2021-01-15
# 4 2021-01-22
# 5 2021-01-29
# 6 2021-02-05
library(dplyr)
dat %>%
filter("04" == format(startdate, format="%m"))
# startdate
# 1 2021-04-02
# 2 2021-04-09
# 3 2021-04-16
# 4 2021-04-23
# 5 2021-04-30
dat %>%
group_by(month = format(startdate, format="%m")) %>%
tally()
# # A tibble: 7 x 2
# month n
# <chr> <int>
# 1 01 5
# 2 02 4
# 3 03 4
# 4 04 5
# 5 05 4
# 6 06 4
# 7 07 4
dat %>%
group_by(month = format(startdate, format="%m")) %>%
tally() %>%
filter(month == "04")
# # A tibble: 1 x 2
# month n
# <chr> <int>
# 1 04 5
I inferred dplyr, but this works in base as well:
subset(dat, format(startdate, format="%m") == "04")
# startdate
# 14 2021-04-02
# 15 2021-04-09
# 16 2021-04-16
# 17 2021-04-23
# 18 2021-04-30
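Since the question asks how many start dates fall in April, the count can also be taken directly from a logical vector. This is a small base-R sketch on the same kind of weekly sample data. The names startdate, month_num, and n_april are illustrative, and the data stand in for brexit_polls$startdate.

```r
# Weekly start dates standing in for brexit_polls$startdate (illustrative data)
startdate <- seq(as.Date("2021-01-01"), length.out = 30, by = "week")

# Month as an integer, so the comparison is with 4 rather than the string "04"
month_num <- as.integer(format(startdate, "%m"))

# TRUE counts as 1, so sum() gives the number of April start dates
n_april <- sum(month_num == 4)
n_april
```

Matching on the month alone counts April in every year present, which is what the question asks for.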
I have a dataframe with a date column as follows:
library(tidyverse)
df <- data.frame(
id = c(1, 2, 4, 65, 77, 5, 4),
date = c("2020-04-18", "2020-04-20", "2020-04-01", "2020-04-19",
"2020-04-02", "2020-04-01", "2020-04-20")
) %>% mutate(date = as.Date(date))
I would like to systematically filter the date column, keeping the elements on or before the last Sunday.
Today is 2020-04-20, and it's a Monday.
The last Sunday was 2020-04-19.
last_sunday <- as.Date("2020-04-19")
df %>% filter(date <= last_sunday)
id date
1 1 2020-04-18
2 4 2020-04-01
3 65 2020-04-19
4 77 2020-04-02
5 5 2020-04-01
How can I programmatically define the variable "last_sunday"?
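One option, which takes the most recent Sunday present in the data (so it assumes the data contain that Sunday), could be:
df %>%
  filter(date <= max(date[as.POSIXlt(date)$wday == 0]))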
id date
1 1 2020-04-18
2 4 2020-04-01
3 65 2020-04-19
4 77 2020-04-02
5 5 2020-04-01
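Using lubridate, floor_date() rounds today's date down to the start of the week, which is Sunday by default (output shown for 2020-04-20):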
previous_sunday <- lubridate::floor_date(Sys.Date(), "week")
previous_sunday
[1] "2020-04-19"
df %>% filter(date <= previous_sunday)
id date
1 1 2020-04-18
2 4 2020-04-01
3 65 2020-04-19
4 77 2020-04-02
5 5 2020-04-01
I have a large dataset with thousands of dates in the ymd format. I want to split this column into three individual columns for year, month, and day. There are literally thousands of dates, so I am trying to do this with a single piece of code for the entire dataset.
You can use the year(), month(), and day() extractors in lubridate for this. Here's an example:
library('dplyr')
library('tibble')
library('lubridate')
## create some data
df <- tibble(date = seq(ymd(20190101), ymd(20191231), by = '7 days'))
which yields
> df
# A tibble: 53 x 1
date
<date>
1 2019-01-01
2 2019-01-08
3 2019-01-15
4 2019-01-22
5 2019-01-29
6 2019-02-05
7 2019-02-12
8 2019-02-19
9 2019-02-26
10 2019-03-05
# … with 43 more rows
Then mutate df using the relevant extractor function:
df <- mutate(df,
year = year(date),
month = month(date),
day = day(date))
This results in:
> df
# A tibble: 53 x 4
date year month day
<date> <dbl> <dbl> <int>
1 2019-01-01 2019 1 1
2 2019-01-08 2019 1 8
3 2019-01-15 2019 1 15
4 2019-01-22 2019 1 22
5 2019-01-29 2019 1 29
6 2019-02-05 2019 2 5
7 2019-02-12 2019 2 12
8 2019-02-19 2019 2 19
9 2019-02-26 2019 2 26
10 2019-03-05 2019 3 5
# … with 43 more rows
If you only want the new three columns, use transmute() instead of mutate().
Using lubridate but without having to specify a separator:
library(tidyverse)
df <- tibble(d = c('2019/3/18','2018/10/29'))
df %>%
mutate(
date = lubridate::ymd(d),
year = lubridate::year(date),
month = lubridate::month(date),
day = lubridate::day(date)
)
Note that you can change the first entry from ymd to fit other formats.
A slightly different tidyverse solution that requires less code (mutate_at() is superseded by across() in current dplyr, but still works) could be:
Code
tibble(date = "2018-05-01") %>%
mutate_at(vars(date), lst(year, month, day))
Result
# A tibble: 1 x 4
date year month day
<chr> <dbl> <dbl> <int>
1 2018-05-01 2018 5 1
#Data
d = data.frame(date = c("2019-01-01", "2019-02-01", "2012/03/04"))
library(lubridate)
cbind(d,
read.table(header = FALSE,
sep = "-",
text = as.character(ymd(d$date))))
# date V1 V2 V3
#1 2019-01-01 2019 1 1
#2 2019-02-01 2019 2 1
#3 2012/03/04 2012 3 4
OR
library(dplyr)
library(tidyr)
library(lubridate)
d %>%
mutate(date2 = as.character(ymd(date))) %>%
separate(date2, c("year", "month", "day"), "-")
# date year month day
#1 2019-01-01 2019 01 01
#2 2019-02-01 2019 02 01
#3 2012/03/04 2012 03 04
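If the column is already of class Date, a base-R sketch without lubridate or tidyr is also possible. It returns integer columns rather than the character columns that separate() produces. The data frame below is illustrative.

```r
d <- data.frame(date = as.Date(c("2019-01-01", "2019-02-01", "2012-03-04")))

# format() extracts each component as text; as.integer() makes it numeric
d$year  <- as.integer(format(d$date, "%Y"))
d$month <- as.integer(format(d$date, "%m"))
d$day   <- as.integer(format(d$date, "%d"))
d
```

These are vectorised assignments, so the same three lines apply to a column with thousands of dates.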