Match Days of the Week within an Interval to Create Specific Dates - r

I am working with a data set that has the following structure for its dates:
Week DateStart DateEnd Day
1 5-Aug-16 11-Aug-16 Monday
2 12-Aug-16 18-Aug-16 Thursday
Where "Week" corresponds to a study week number, "DateStart" and "DateEnd" are the first and last days of that week, and "Day" represents the specific day from within that week. I would like to use the "DateStart", "DateEnd", and "Day" fields to create a new field, "Date", that assigns a specific date to each "Day" that falls within the "DateStart" and "DateEnd" interval.
I've used the %--% operator to turn DateStart and DateEnd into an interval:
Week_Interval <- DateStart %--% DateEnd
but then I haven't had much luck on figuring out how to match the Day field to a date within the resulting interval. I've tried reading through the lubridate documentation, but it didn't seem like there was anything in there that could specifically solve my problem. I'm hoping someone here might have some experience with this and could help point me in the right direction.
My ideal output would be something like:
Week DateStart DateEnd Day Date
1 5-Aug-16 11-Aug-16 Monday 08-08-2016
2 12-Aug-16 18-Aug-16 Thursday 18-08-2016
Where the date follows the standard dd-mm-yyyy format.

Compute how many days lie between DateStart and the requested weekday: subtract DateStart's day-of-week number from Day's day-of-week number, take the result modulo 7, and add that many days to DateStart.
No packages are used.
dow <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
transform(DF, Date =
DateStart + (match(Day, dow) - 1 - as.POSIXlt(DateStart)$wday) %% 7)
giving:
Week DateStart DateEnd Day Date
1 1 2016-08-05 2016-08-11 Monday 2016-08-08
2 2 2016-08-12 2016-08-18 Thursday 2016-08-18
Note 1
An alternative to writing out the days of the week, provided you are in an English locale, is:
dow <- weekdays(as.Date("1950-01-01") + 0:6)
Note 2
In the example the Start Date is Friday on both rows. If it were known that that is always the case we could shorten the code by hard coding it as 5:
transform(DF, Date = DateStart + (match(Day, dow) - 1 - 5) %% 7)
Note 3
The input, in reproducible form, is:
Lines <- "Week DateStart DateEnd Day
1 5-Aug-16 11-Aug-16 Monday
2 12-Aug-16 18-Aug-16 Thursday"
DF <- read.table(text = Lines, header = TRUE)
fmt <- "%d-%b-%y"
DF <- transform(DF, DateStart = as.Date(DateStart, fmt),
DateEnd = as.Date(DateEnd, fmt))
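For reference, the modulo step can be wrapped in a small helper and the result formatted as dd-mm-yyyy, which is the format the question asked for. This is only a minimal sketch in base R: the function name day_in_week is hypothetical, and the weekday names are written out in English so that the lookup does not depend on the locale.

```r
# Weekday names in the order used by POSIXlt's $wday (0 = Sunday ... 6 = Saturday)
dow <- c("Sunday", "Monday", "Tuesday", "Wednesday",
         "Thursday", "Friday", "Saturday")

# Hypothetical helper: first date on or after `start` that falls on weekday `day`
day_in_week <- function(start, day) {
  target <- match(day, dow) - 1L                    # weekday number of `day`
  offset <- (target - as.POSIXlt(start)$wday) %% 7  # days to move forward, 0..6
  start + offset
}

start <- as.Date(c("2016-08-05", "2016-08-12"))
res <- day_in_week(start, c("Monday", "Thursday"))

# Dates print as yyyy-mm-dd by default; format() produces dd-mm-yyyy strings
format(res, "%d-%m-%Y")
# "08-08-2016" "18-08-2016"
```

Because the offset is always between 0 and 6, the result stays inside a seven-day week that begins on DateStart.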

# example data
df = read.table(text = "
Week DateStart DateEnd Day
1 5-Aug-16 11-Aug-16 Monday
2 12-Aug-16 18-Aug-16 Thursday
", header=T, stringsAsFactors=F)
library(tidyverse)
library(lubridate)
df %>%
group_by(Week, Day) %>% # for each week and day
mutate(Date = list(seq(dmy(DateStart), dmy(DateEnd), "1 day")), # get sequence of dates between start and end
Day2 = map(Date, weekdays)) %>% # get name of days for each date in the sequence
unnest(c(Date, Day2)) %>% # unnest dates and day names together (tidyr >= 1.0 expects the columns to be named)
ungroup() %>% # forget the grouping
filter(Day == Day2) %>% # keep days that match
select(-Day2) # remove unnecessary column
# # A tibble: 2 x 5
# Week DateStart DateEnd Day Date
# <int> <chr> <chr> <chr> <date>
# 1 1 5-Aug-16 11-Aug-16 Monday 2016-08-08
# 2 2 12-Aug-16 18-Aug-16 Thursday 2016-08-18

Related

Computing age today from date of birth variable (R)

I have a dataframe with information on date of birth by individual id.
mydf <- data.frame(id=c(1,2),
dtbirth=as.Date(c("2012-01-01","2013-04-01")))
I would like to compute the age of the individuals as of today. The code below seems to work but outputs "days" to the new variable age
mydf %>%
mutate(age=floor((today()-dtbirth)/365))
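Wrap the result in as.integer (or as.numeric) to drop the difftime class. Note that dividing by 365 ignores leap days, so the age can be off by one close to a birthday.
library(dplyr)
library(lubridate) # for today()
mydf %>%
mutate(age = as.integer(floor((today() - dtbirth) / 365)))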
-output
# id dtbirth age
#1 1 2012-01-01 9
#2 2 2013-04-01 8
By default, subtracting with - returns a difftime whose units are picked automatically ("auto"); for two Date objects that means days
mydf %>%
mutate(age = today() - dtbirth)
# id dtbirth age
#1 1 2012-01-01 3430 days
#2 2 2013-04-01 2974 days
For finer control, call difftime directly and specify the units
mydf %>%
mutate(age = difftime(today(), dtbirth, units = 'weeks'))
# id dtbirth age
#1 1 2012-01-01 490.0000 weeks
#2 2 2013-04-01 424.8571 weeks
Units larger than 'weeks' are not available; the options are
difftime(time1, time2, tz,
units = c("auto", "secs", "mins", "hours",
"days", "weeks"))
and the documentation (?difftime) notes:
Units such as "months" are not possible as they are not of constant length. To create intervals of months, quarters or years use seq.Date or seq.POSIXt.
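Since difftime cannot return years, one way to get an exact age in completed years is to compare the calendar fields directly. The following is a sketch in base R rather than part of the answer above; the function name age_years and the fixed reference date are assumptions made for reproducibility. Under this rule a 29 February birthday counts as reached on 1 March in non-leap years.

```r
# Hypothetical helper: completed years between `dob` and the reference date `on`
age_years <- function(dob, on = Sys.Date()) {
  d <- as.POSIXlt(dob)
  t <- as.POSIXlt(on)
  age <- t$year - d$year
  # Subtract one where the birthday has not yet occurred in the reference year
  not_yet <- (t$mon < d$mon) | (t$mon == d$mon & t$mday < d$mday)
  as.integer(age - not_yet)
}

mydf <- data.frame(id = c(1, 2),
                   dtbirth = as.Date(c("2012-01-01", "2013-04-01")))

# Fixed reference date so the result does not change from day to day
mydf$age <- age_years(mydf$dtbirth, on = as.Date("2021-05-23"))
mydf$age
# 9 8
```

Calling age_years(mydf$dtbirth) without the second argument uses the current date.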

Filter data by last 12 Months of the total data available in R

R:
I have a data set with sales values for N products, running from some yyyy-mm-dd to some yyyy-mm-dd. I want to filter the data to the last 12 months for each product in the data set.
Eg:
Say, I have values from 2016-01-01 to 2020-02-01
So now I want to filter the sales values for the last 12 months that is from 2019-02-01 to 2020-02-01
I cannot simply hard-code something like "filter(Month >= as.Date("2019-04-01") & Month <= as.Date("2020-04-01"))", because the end date moves forward as each month passes, so the filter needs to be automatic.
You can use :
library(dplyr)
library(lubridate)
data %>%
group_by(Product) %>%
filter(between(date, max(date) - years(1), max(date)))
#filter(date >= (max(date) - years(1)) & date <= max(date))
You can test whether each date is greater than or equal to the latest date for that product minus 365 days:
library(dplyr)
df %>%
group_by(Products) %>%
filter(Date >= max(Date)-365)
# A tibble: 6 x 2
# Groups: Products [3]
Products Date
<dbl> <date>
1 1 2002-01-21
2 1 2002-02-10
3 2 2002-02-24
4 2 2002-02-10
5 2 2001-07-01
6 3 2005-03-10
Data
df <- data.frame(
Products = c(1,1,1,1,2,2,2,3,3,3),
Date = as.Date(c("2000-02-01", "2002-01-21", "2002-02-10",
"2000-06-01", "2002-02-24", "2002-02-10",
"2001-07-01", "2003-01-02", "2005-03-10",
"2002-05-01")))
If your aim is to just capture entries from today back to the same day last year, then:
The function Sys.Date() returns the current date as an object of class Date. You can then convert that to POSIXlt form to adjust the year and get the start date. For example:
end.date <- Sys.Date()
end.date.lt <- as.POSIXlt(end.date)
start.date.lt <- end.date.lt
start.date.lt$year <- start.date.lt$year - 1 # same month and day, previous year
start.date <- as.Date(start.date.lt) # back to Date for comparison with date columns
Now this does have one potential fail-state: if today is February 29th. One way to deal with that would be to write a "today.last.year" function to do the above conversion, but give an explicit treatment for leap years - possibly including an option to count "today last year" as either February 28th or March 1st, depending on which gives you the desired behaviour.
Alternatively, if you wanted to filter based on a start-of-month date, you can make your function also set start.date.lt$mday <- 1 (the day-of-month field of a POSIXlt object is mday), and so forth if you need to adjust in different ways.
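The "today.last.year" idea could look roughly like the sketch below. It is only an illustration in base R: the function name today_last_year and the feb29 argument are hypothetical, and it relies on as.Date normalising an out-of-range 29 February to 1 March.

```r
# Hypothetical helper: the same calendar day one year earlier, with an
# explicit choice of how 29 February should be mapped
today_last_year <- function(x = Sys.Date(), feb29 = c("feb28", "mar1")) {
  feb29 <- match.arg(feb29)
  lt <- as.POSIXlt(x)
  is_feb29 <- lt$mon == 1 & lt$mday == 29   # mon is 0-based, so 1 = February
  lt$year <- lt$year - 1
  # For "feb28", pull leap days back to the 28th; for "mar1", leave the
  # out-of-range 29 Feb in place so that as.Date() rolls it to 1 March
  if (feb29 == "feb28") lt$mday[is_feb29] <- 28L
  as.Date(lt)
}

today_last_year(as.Date("2020-02-01"))                  # "2019-02-01"
today_last_year(as.Date("2020-02-29"))                  # "2019-02-28"
today_last_year(as.Date("2020-02-29"), feb29 = "mar1")  # "2019-03-01"
```

The result can then be used as the lower bound of the filter, for example date >= today_last_year().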
Input:
product date
1: a 2017-01-01
2: b 2017-04-01
3: a 2017-07-01
4: b 2017-10-01
5: a 2018-01-01
6: b 2018-04-01
7: a 2018-07-01
8: b 2018-10-01
9: a 2019-01-01
10: b 2019-04-01
11: a 2019-07-01
12: b 2019-10-01
Code:
library(lubridate)
library(data.table)
DT <- data.table(
product = rep(c("a", "b"), 6),
date = seq(as.Date("2017-01-01"), as.Date("2019-12-31"), by = "quarter")
)
yearBefore <- function(x){
year(x) <- year(x) - 1
x
}
date_DT <- DT[, .(last_date = last(date)), by = product]
date_DT[, year_before := yearBefore(last_date)]
result <- DT[, date_DT[DT, on = .(product, year_before <= date), nomatch=0]]
result[, last_date := NULL]
setnames(result, "year_before", "date")
Output:
product date
1: a 2018-07-01
2: b 2018-10-01
3: a 2019-01-01
4: b 2019-04-01
5: a 2019-07-01
6: b 2019-10-01
Is this what you are looking for?

Assigning values to all rows within a specific hour range using monthly data

I have a dataframe in the following format:
temp:
id time date
1 06:22:30 2018-01-01
2 08:58:00 2018-01-15
3 09:30:21 2018-01-30
The actual data set continues for 9000 rows, with observations at times throughout the month of January. I want to write code that assigns each row a new value depending on which hour range the time variable falls into.
A couple of example hour ranges would be:
Morning peak: 06:00:00 - 08:59:00
Morning: 09:00:00 - 11:59:00
The desired output would look like this:
id time date time_of_day
1 06:22:30 2018-01-01 MorningPeak
2 08:58:00 2018-01-15 MorningPeak
3 09:30:21 2018-01-30 Morning
I have tried playing around with time objects using the chron package using the following code to specify different time ranges:
MorningPeak <- temp[temp$time >= "06:00:00" & temp$time <= "08:59:59",]
MorningPeak$time_of_day <- "MorningPeak"
Morning <- temp[temp$time >= "09:00:00" & temp$time <= "11:59:59",]
Morning$time_of_day <- "Morning"
The results could then be merged and then manipulated to get everything in the same column. Is there a way to do this such that the desired result is generated and no extra data manipulation is required? I am interested in learning how to make my code more efficient.
You are comparing character strings, not time or date-time objects, so convert the values to date-times before comparing. Here it seems enough to compare the hour of the day to get the appropriate labels.
library(dplyr)
df %>%
mutate(hour = as.integer(format(as.POSIXct(time, format = "%T"), "%H")),
time_of_day = case_when(hour >= 6 & hour < 9 ~ "MorningPeak",
hour >= 9 & hour < 12 ~ "Morning",
TRUE ~ "Rest of the day"))
# id time date hour time_of_day
#1 1 06:22:30 2018-01-01 6 MorningPeak
#2 2 08:58:00 2018-01-15 8 MorningPeak
#3 3 09:30:21 2018-01-30 9 Morning
You can add more hourly criteria if needed.
We can also use cut
cut(as.integer(format(as.POSIXct(df$time, format = "%T"), "%H")),
breaks = c(-Inf, 6, 9, 12, Inf), right = FALSE,
labels = c("Rest of the day", "MorningPeak", "Morning", "Rest of the day"))
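If the ranges ever need boundaries that are not on the hour, the same cut idea can be applied to seconds since midnight instead of the hour. This is a sketch in base R under the assumption that time is a character column in HH:MM:SS form; the helper name to_seconds is hypothetical.

```r
# Hypothetical helper: convert "HH:MM:SS" strings to seconds since midnight
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

temp <- data.frame(
  id   = 1:3,
  time = c("06:22:30", "08:58:00", "09:30:21"),
  date = as.Date(c("2018-01-01", "2018-01-15", "2018-01-30")),
  stringsAsFactors = FALSE
)

secs <- to_seconds(temp$time)

# Half-open intervals [06:00, 09:00) and [09:00, 12:00); anything else is NA
lab <- cut(secs, breaks = c(6, 9, 12) * 3600,
           labels = c("MorningPeak", "Morning"), right = FALSE)
temp$time_of_day <- as.character(lab)
temp$time_of_day[is.na(temp$time_of_day)] <- "Rest of the day"

temp$time_of_day
# "MorningPeak" "MorningPeak" "Morning"
```

Further ranges can be added by extending breaks and labels together.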

How to convert numerical value to time in hours and minutes?

I have an excel dataset in which there are dates and time points as follows:
record_id date_E1 time_E1 date_E2 time_E2 ...
1 2019/8/24 09:00:00 2019/8/25 18:00:00
I would like to construct a variable which contains the number of hours past the first time and date, (09:00 a.m 2019/8/24). When I read the excel file with
read_excel("C:/visit.xlsx")
the time_E1 .. appears as 0.3750000 0.7736111 0.4131944 0.4131944,
and the date appears as 43640 43640 43641 43642, in R. I use visit_dates<-as.Date(as.numeric(visit_date_L$Day), origin = "1899-12-30")
to convert dates to 2019-8-24 and .. but do not know how to convert time of the day and convert to the hours past the first time point. What I expect is a vector like: 0, 42, ... hours past first time point.
I have used the following code:
as.POSIXct(visit_times, format = " %H-%M", origin = "09:00:00"),
but it returns a NULL vector. After that I could use the following code to transpose and combine date and time data:
visit_time <- subset(MY_visit, select = c(record_id, time_E1, ...))
visit_date <- subset(MY_visit, select = c(record_id, date_E1,...))
visit_time_L <- melt(visit_time, id.vars=c("record_id"))
visit_date_L <- melt(visit_date, id.vars=c("record_id"))
names(visit_time_L)[names(visit_time_L)=="value"] <- "time"
names(visit_date_L)[names(visit_date_L)=="value"] <- "Day"
visit_all <- cbind(visit_time_L, visit_date_L)
Any ideas how can I solve this problem?
Here is an approach you can try. I have dates and times stored in an Excel file. Read it in and keep the columns as character. Convert the dates to their proper format, as you did. Convert the time-of-day fractions to numeric and multiply by 24 to get hours. Paste the dates and hours together and parse them as date-times, then take the difference between the two in hours. Use difftime with units = 'hours' rather than t2 - t1, because plain subtraction picks its units automatically (days here, but hours or minutes for shorter gaps). Also note that ymd_h is meant for whole hours, so times with minutes and seconds likely need a different conversion.
library(dplyr); library(readxl); library(lubridate)
df <- read_excel('Book1.xlsx', col_types = c('text'))
# A tibble: 1 x 4
date1 time1 date2 time2
<chr> <chr> <chr> <chr>
1 43466 0.375 43467 0.41666666666666669
df %>% mutate_at(c('date1','date2'), ~ as.Date(as.numeric(.), origin = '1899-12-30')) %>%
mutate_at(c('time1','time2'), ~ as.numeric(.) * 24) %>%
mutate(t1 = ymd_h(paste(date1, time1)),
t2 = ymd_h(paste(date2, time2)),
diff = as.numeric(difftime(t2, t1, units = 'hours')))
# A tibble: 1 x 7
date1 time1 date2 time2 t1 t2 diff
<date> <dbl> <date> <dbl> <dttm> <dttm> <dbl>
1 2019-01-01 9 2019-01-02 10 2019-01-01 09:00:00 2019-01-02 10:00:00 25
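For times that are not on the hour, one possible alternative is to add the Excel date serial and the day fraction and convert the sum to a date-time in one step. This is a sketch in base R, not the method used above; the function name excel_to_posix is hypothetical, and UTC is assumed so that daylight saving shifts do not affect the differences.

```r
# Hypothetical helper: Excel date serial + fraction of a day -> POSIXct (UTC)
excel_to_posix <- function(date_serial, day_fraction) {
  # Days since 1899-12-30, converted to seconds and rounded to absorb
  # floating-point noise in the stored fractions
  secs <- round((date_serial + day_fraction) * 86400)
  as.POSIXct(secs, origin = "1899-12-30", tz = "UTC")
}

# Same values as in the example file above
t <- excel_to_posix(c(43466, 43467), c(0.375, 0.41666666666666669))

# Hours elapsed since the first time point
hours_since_first <- as.numeric(difftime(t, t[1], units = "hours"))
hours_since_first
# 0 25
```

With the columns in long format, the same call works on the whole Day and time vectors at once.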

calculate statistical week, starting 1st January, as used in fisheries data

Fisheries data is often collected by statistical weeks that start January 1st every year. The second week starts on the following Sunday each year.
So in 2013, Jan. 1 to Jan. 5 was week 1 and Jan. 6 to Jan. 12 was week 2. I am trying to calculate the statistical week for a given date across a number of years. My data is just dates in d-m-y format (e.g. 16-6-1990) and I want the statistical week as output in R.
An example would be:
> d <- as.Date(c("01-01-2013","06-01-2013","01-01-2006","08-01-2006"),"%d-%m-%Y")
And the desired result would be:
> statweek(d)
[1] 1 2 1 2
Try this:
> d <- as.Date("01-01-2013", "%d-%m-%Y") + 0:7 # first 8 days of 2013
> d
[1] "2013-01-01" "2013-01-02" "2013-01-03" "2013-01-04" "2013-01-05"
[6] "2013-01-06" "2013-01-07" "2013-01-08"
>
> ufmt <- function(x) as.numeric(format(as.Date(x), "%U"))
> ufmt(d) - ufmt(cut(d, "year")) + 1
[1] 1 1 1 1 1 2 2 2
Note: The first Sunday in the year is defined as the start of week 1 by %U which means that if the year does not start on Sunday then we must add 1 to the week so that the first week is week 1 rather than week 0. ufmt(cut(d, "year")) equals one if d's year starts on Sunday and zero otherwise so the formula above reduces to ufmt(d) if d's year starts on Sunday and ufmt(d)+1 if not.
UPDATE: corrections so Jan starts at week 1 even if year starts on a Sunday, e.g. 2006.
Here is the statweek function. The main argument can be a character vector of dates (the default after reading a data.frame, for example). You can specify the format of the dates (has a default: format="%d-%m-%Y")
d1 <- c("01-01-2013","06-01-2013","01-01-2006","08-01-2006") # format="%d-%m-%Y"
d2 <- c("01/01/2013","06/01/2013","01/01/2006","08/01/2006") # format="%d/%m/%Y"
statweek = function(dates, format="%d-%m-%Y", ...) {
# convert to Date
dates = as.Date(dates, format=format, ...)
# correction for the first week of the year: 0 if 1 Jan is a Sunday, 1 otherwise
firstweek = 1 - as.numeric(format(as.Date(cut(dates, "year")), "%U"))
output = as.numeric(format(dates, "%U")) + firstweek
return(output)
}
And the examples:
statweek(d1)
[1] 1 2 1 2
statweek(d1, format="%d-%m-%Y")
[1] 1 2 1 2
statweek(d2, format="%d/%m/%Y")
[1] 1 2 1 2
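The same week number can also be computed arithmetically, without the %U format code, by counting how many Sundays have passed since 1 January. This is a sketch in base R, not taken from the answers above; the function name statweek_arith is hypothetical and it expects Date input.

```r
# Hypothetical helper: statistical week from day-of-year and weekday fields
statweek_arith <- function(dates) {
  lt <- as.POSIXlt(dates)
  # Weekday of 1 January in the same year (0 = Sunday ... 6 = Saturday)
  jan1_wday <- (lt$wday - lt$yday) %% 7
  # Week 1 runs from 1 January up to the first Saturday; every Sunday
  # after 1 January starts a new week
  (lt$yday + jan1_wday) %/% 7 + 1
}

d <- as.Date(c("01-01-2013", "06-01-2013", "01-01-2006", "08-01-2006"),
             "%d-%m-%Y")
statweek_arith(d)
# 1 2 1 2
```

Here yday is the zero-based day of the year, so 1 January always lands in week 1 regardless of its weekday.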
