Need to create a sequence of date intervals - R

I need to create shift timings for the next 15 days, so I am trying this:
library(lubridate)
c = matrix(nrow=360, ncol=45)
date1 <- ymd_hms("2000-01-01 00:00:00",tz = "US/Eastern")
date2 <- ymd_hms("2000-01-01 08:00:00",tz = "US/Eastern")
date3<- ymd_hms("2000-01-01 16:00:00",tz = "US/Eastern")
date4<- ymd_hms("2000-01-02 00:00:00",tz = "US/Eastern")
I created three shift intervals for day 1 as follows:
shift1<-interval(date1,date2)
shift2<-interval(date2,date3)
shift3<-interval(date3,date4)
I want to create similar intervals for the next 14 days. I am trying this:
end<-as.matrix(rep(c(shift1,shift2,shift3)+days(1),14))
This is the error
Error in per@.Data + num :
Arithmetic operators undefined for 'numeric' and 'Interval' classes:
convert one to numeric or a matching time-span class.

How about this apply family solution?
library(lubridate)
date1 <- ymd_hms("2000-01-01 00:00:00", tz = "US/Eastern")
end <- lapply(0:14, function(x) {
  lapply(c(0, 8, 16), function(y) {
    interval(date1 + days(x) + hours(y), date1 + days(x) + hours(y + 8))
  })
})
The outer lapply(0:14, ...) loops over the day offsets 0 to 14 from date1.
The inner lapply(c(0, 8, 16), ...) creates, for day offset x, three 8-hour intervals that start 0, 8 and 16 hours after date1 + days(x).
The result is a list of lists: for example, end[[8]][[3]] is the third shift of the eighth day counting date1 as day 1 (2000-01-08 16:00 to 2000-01-09 00:00).
Best!
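As a variation on the nested-list approach (a sketch, not part of the answer above): lubridate's days(), hours() and interval() all accept vectors, so the 45 shifts can also be kept in a single Interval vector. This assumes the same three 8-hour shifts starting at 00:00, 08:00 and 16:00; the names starts and shifts are illustrative.

```r
library(lubridate)

date1 <- ymd_hms("2000-01-01 00:00:00", tz = "US/Eastern")

# One start time per shift: the 15 day offsets (0..14), each combined
# with the three shift offsets (0, 8 and 16 hours)
starts <- date1 + days(rep(0:14, each = 3)) + hours(rep(c(0, 8, 16), times = 15))

# Every shift lasts 8 hours
shifts <- interval(starts, starts + hours(8))

length(shifts)  # 45
shifts[1:3]     # the three shifts of the first day
```

Shift k of day n, counting date1 as day 1, is then shifts[(n - 1) * 3 + k], so shifts[24] corresponds to end[[8]][[3]] in the answer above.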

Related

Categorizing data using date variable in R

I am having trouble using the date variable in my dataset to create six-month time-period categories. I want to create these categories for the years between 2017-01-01 and 2020-06-30: 2017-01-01 to 2017-06-30, 2017-07-01 to 2017-12-31, and so on up to 2020-06-30.
I have tried the following two pieces of code to create the date categories, but both give a similar error:
#CODE1
#checking for date class
str(myData)
myData #date in factor class
#convert to date class
date_class <- as.Date(myData$date, format = "%m/%d/%Y")
myData$date_class <- as.Date(myData$date, format = "%m/%d/%Y")
myData
#creating timeperiod category 1
date_cat <- NA
myData$date_cat[which(myData$date_class >= "2017-1-1" & myData$date_class < "2017-7-1")] <- 1
#CODE2
#converting to date format
myData$date <- strptime(myData$date,format="%m/%d/%Y")
myData$date <- as.POSIXct(myData$date)
myData
#creating timeperiod category 1
date_cat <- NA
myData$date_cat[which(myData$date >= "2017-1-1" & myData$date < "2017-7-1")] <- 1
With both versions I get a similar error:
Error in `$<-.data.frame`(`*tmp*`, date_cat, value = numeric(0)) :
replacement has 0 rows, data has 1123
Please help me with understanding where I am going wrong.
Thanks,
Priya
Here's a function (to.interval) that returns a time-interval index {0, 1, 2, 3, ...}, given the anchor date, the event date and the interval width in days. It's probably a good idea to add error checking to the function so that, for example, it returns NA when the event date is before the anchor date.
df <- data.frame(event.date = as.Date(c("2017-01-01", "2017-08-01", "2018-04-30")))
# floor() (rather than round()) so that each date maps to the interval it falls in
to.interval <- function(anchor.date, future.date, interval.days) {
  floor(as.integer(future.date - anchor.date) / interval.days)
}
df$interval <- to.interval(as.Date("2017-01-01"), df$event.date, 180)
df
Output
  event.date interval
1 2017-01-01        0
2 2017-08-01        1
3 2018-04-30        2
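Two notes, offered as a sketch rather than a definitive fix. First, in the question's code, date_cat <- NA creates a separate variable rather than a column (myData$date_cat <- NA would create the column), and the "replacement has 0 rows" error most likely means which() matched no rows, for example because as.Date() returned NA when the format string did not match the data. Second, a fixed 180-day width only approximates half-years: 2017-06-30 is exactly 180 days after 2017-01-01, so it lands in interval 1. Base R's cut() with calendar break points gives exact half-year categories. The dates below are made-up examples; in the question they would be myData$date_class.

```r
# Made-up example dates
dates <- as.Date(c("2017-01-01", "2017-06-30", "2017-07-01",
                   "2019-12-31", "2020-06-30"))

# Half-year boundaries: 2017-01-01, 2017-07-01, ..., 2020-07-01
breaks <- seq(as.Date("2017-01-01"), as.Date("2020-07-01"), by = "6 months")

# right = FALSE makes each category [start, next start), e.g.
# [2017-01-01, 2017-07-01); labels = FALSE returns integer codes 1..7
period <- cut(dates, breaks = breaks, right = FALSE, labels = FALSE)
period
#> [1] 1 1 2 6 7
```

Dates outside 2017-01-01 to 2020-06-30 get NA. Without labels = FALSE, cut() returns a factor whose levels are the start dates of the periods.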

R: How to split given time periods into intervals of 30 days

I have data with Order Id, Start Date and End Date. I have to split the period from each Start Date to its End Date into intervals of 30 days and derive two new variables, “split start date” and “split end date”.
Example: the example below illustrates how split dates are created when the Start Date is “01/05/2017” and the End Date is “06/07/2017” (day/month/year).
Suppose an order has the start and end dates below:
[Image: example of the split dates for one order]
How can I do this in R?
Here is a solution that should generalize to multiple order IDs. I have created sample data with two order IDs. The basic idea is to calculate the number of intervals between start_date and end_date, then repeat each order's row that many times and create a sequence that tells which interval each row is. This is the purpose of the functions f and g and the use of Map.
The rest is just vector manipulation to define split_start_date and split_end_date. The last statement sets the split_end_date of each order's last interval to that order's end_date, so it never runs past end_date.
df <- data.frame(
order_id = c(1, 2),
start_date = c(as.Date("2017-05-01"), as.Date("2017-08-01")),
end_date = c(as.Date("2017-07-06"), as.Date("2017-09-15"))
)
df$diff_days <- as.integer(df$end_date - df$start_date)
df$num_int <- ceiling(df$diff_days / 30)  # number of 30-day intervals per order
# f: repeat a row index once per interval of that order
f <- function(rowindex) {
  rep(rowindex, each = df[rowindex, "num_int"])
}
# g: number the intervals of an order 1, 2, ..., num_int
g <- function(rowindex) {
  1:df[rowindex, "num_int"]
}
rowindex_rep <- unlist(Map(f, 1:nrow(df)))
df2 <- df[rowindex_rep, ]
df2$seq <- unlist(Map(g, 1:nrow(df)))
df3 <- df2
df3$split_start_date <- df3$start_date + (df3$seq - 1) * 30
df3$split_end_date <- df3$split_start_date + 29
# the last interval of each order ends on the order's end_date
df3[which(df3$seq == df3$num_int), ]$split_end_date <-
  df3[which(df3$seq == df3$num_int), ]$end_date
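The same splitting can be written without the helper functions f and g: rep() repeats each row and base R's sequence() numbers the chunks within each order. A sketch that follows the same rules as the answer above (30-day chunks, the last chunk ending on end_date); the names n, out, k and last are illustrative.

```r
df <- data.frame(
  order_id = c(1, 2),
  start_date = as.Date(c("2017-05-01", "2017-08-01")),
  end_date = as.Date(c("2017-07-06", "2017-09-15"))
)

# Number of 30-day chunks per order (same rule as above)
n <- ceiling(as.integer(df$end_date - df$start_date) / 30)

# Repeat each order's row once per chunk
out <- df[rep(seq_len(nrow(df)), n), ]

# sequence(n) numbers the chunks 1..n within each order
k <- sequence(n)
out$split_start_date <- out$start_date + 30 * (k - 1)
out$split_end_date <- out$split_start_date + 29

# The last chunk of each order ends on that order's end_date
last <- k == rep(n, n)
out$split_end_date[last] <- out$end_date[last]

out[, c("order_id", "split_start_date", "split_end_date")]
```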

R how to avoid a loop. Counting weekends between two dates in a row for each row in a dataframe

I have two columns of dates. Two example dates are:
Date1= "2015-07-17"
Date2="2015-07-25"
I am trying to count the number of Saturdays and Sundays between the two dates, each of which is in its own column (columns 5 and 7 in this example code). I need to repeat this for each row of my data frame. The end result should be one column containing the number of Saturdays and Sundays within the date range defined by the two date columns.
I can get the code to work for one row:
sum(weekdays(seq(Date1[1,5],Date2[1,7],"days")) %in% c("Saturday",'Sunday')*1)
The answer to this will be 3. But, if I take out the "1" in the row position of date1 and date2 I get this error:
Error in seq.Date(Date1[, 5], Date2[, 7], "days") :
'from' must be of length 1
How can I get one vector that lists, for every row, the number of Saturdays and Sundays between the dates in columns 5 and 7, without using a loop? I also have 2 million rows, so I am looking for something faster than a loop.
Thank you!!
The map2*() functions from the purrr package are a good way to go. They take two vector inputs (e.g. two date columns) and apply a function to each pair of elements. They're pretty fast too (e.g. previous post)!
Here's an example. Note that the _int suffix in map2_int() requests an integer vector back.
library(purrr)
# Example data
d <- data.frame(
Date1 = as.Date(c("2015-07-17", "2015-07-28", "2015-08-15")),
Date2 = as.Date(c("2015-07-25", "2015-08-14", "2015-08-20"))
)
# Wrapper function to compute number of weekend days between dates
n_weekend_days <- function(date_1, date_2) {
sum(weekdays(seq(date_1, date_2, "days")) %in% c("Saturday",'Sunday'))
}
# Iterate row wise
map2_int(d$Date1, d$Date2, n_weekend_days)
#> [1] 3 4 2
If you want to add the results back to your original data frame, mutate() from the dplyr package can help:
library(dplyr)
d <- mutate(d, end_days = map2_int(Date1, Date2, n_weekend_days))
d
#> Date1 Date2 end_days
#> 1 2015-07-17 2015-07-25 3
#> 2 2015-07-28 2015-08-14 4
#> 3 2015-08-15 2015-08-20 2
Here is a solution that uses dplyr to clean things up. It's not too difficult to use with() to assign the columns in the data frame directly instead.
Essentially, use a reference date that falls on a Friday, calculate the number of full weeks (with floor or ceiling) from it to the start date and to the end date, then take the difference between the two. The count covers the days after the start date up to and including the end date, so a start date that itself falls on a Saturday or Sunday is not counted.
library(dplyr)
# weekdays(as.Date("1970-01-02")) -> "Friday"
startDate <- as.Date("1970-01-02") # this is a Friday (1970-01-01 was a Thursday)
df <- data.frame(start = "2015-07-17", end = "2015-07-25")
df$start <- as.Date(df$start, format = "%Y-%m-%d")
df$end <- as.Date(df$end, format = "%Y-%m-%d")
# you can use with() to define the columns directly instead of %>%
df <- df %>%
  mutate(originDate = startDate) %>%
  mutate(startDayDiff = as.numeric(start - originDate), endDayDiff = as.numeric(end - originDate)) %>%
  mutate(startWeekDiff = floor(startDayDiff / 7), endWeekDiff = floor(endDayDiff / 7)) %>%
  # days after the Friday origin: %% 7 == 1 is a Saturday, %% 7 == 2 is a Sunday
  mutate(NumSatsStart = startWeekDiff + ifelse(startDayDiff %% 7 >= 1, 1, 0),
         NumSunsStart = startWeekDiff + ifelse(startDayDiff %% 7 >= 2, 1, 0),
         NumSatsEnd = endWeekDiff + ifelse(endDayDiff %% 7 >= 1, 1, 0),
         NumSunsEnd = endWeekDiff + ifelse(endDayDiff %% 7 >= 2, 1, 0)
  ) %>%
  mutate(NumSats = NumSatsEnd - NumSatsStart, NumSuns = NumSunsEnd - NumSunsStart)
Dates are stored as the number of days since 1970-01-01, which was a Thursday.
So the following gives the number of Saturdays and Sundays after that date, up to and including d:
f <- function(d) {d <- as.numeric(d); r <- d %% 7; 2*(d %/% 7) + (r>=2) + (r>=3)}
For the number of Saturdays or Sundays between two dates, just subtract, after decrementing the start date to have an inclusive count.
g <- function(d1, d2) f(d2) - f(d1-1)
These are all vectorized functions so you can just call directly on the columns.
# Example data, as in Simon Jackson's answer
d <- data.frame(
Date1 = as.Date(c("2015-07-17", "2015-07-28", "2015-08-15")),
Date2 = as.Date(c("2015-07-25", "2015-08-14", "2015-08-20"))
)
For example:
within(d, end_days<-g(Date1,Date2))
# Date1 Date2 end_days
# 1 2015-07-17 2015-07-25 3
# 2 2015-07-28 2015-08-14 4
# 3 2015-08-15 2015-08-20 2
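Because g() counts by arithmetic rather than by listing days, a quick comparison against a brute-force count can build confidence in it. A minimal sketch: f() and g() are repeated from the answer so the block runs on its own, brute() is a hypothetical helper, the random test dates are arbitrary, and format(x, "%u") (6 = Saturday, 7 = Sunday) is used instead of weekdays() because weekday names depend on the locale.

```r
# f() and g() as in the answer above
f <- function(d) {d <- as.numeric(d); r <- d %% 7; 2*(d %/% 7) + (r>=2) + (r>=3)}
g <- function(d1, d2) f(d2) - f(d1-1)

# Brute force: list every day in [d1[i], d2[i]] and count Saturdays/Sundays
brute <- function(d1, d2) {
  vapply(seq_along(d1), function(i) {
    days <- seq(d1[i], d2[i], by = "day")
    sum(format(days, "%u") %in% c("6", "7"))
  }, integer(1))
}

# Arbitrary random date pairs with d2 >= d1
set.seed(1)
d1 <- as.Date("2015-01-01") + sample(0:365, 200, replace = TRUE)
d2 <- d1 + sample(0:60, 200, replace = TRUE)

all(g(d1, d2) == brute(d1, d2))
```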

Count the number of Fridays or Mondays in Month in R

I would like a function that counts the number of specific weekdays per month, e.g. Nov 2013 -> 5 Fridays, while Dec 2013 would return 4 Fridays.
Is there an elegant function that would return this?
library(lubridate)
num_days <- function(date){
x <- as.Date(date)
start = floor_date(x, "month")
count = days_in_month(x)
d = wday(start)
sol = ifelse(d > 4, 5, 4) # rough estimate: if the month starts on a Thursday, Friday or Saturday (wday() > 4), assume it has 5 Fridays
sol
}
num_days("2013-08-01")
num_days(today())
What would be a better way to do this?
1) Here d is the input, a Date class object, e.g. d <- Sys.Date(). The result gives the number of Fridays in the year/month that contains d. Replace 5 with 1 to get the number of Mondays:
first <- as.Date(cut(d, "month"))
last <- as.Date(cut(first + 31, "month")) - 1
sum(format(seq(first, last, "day"), "%w") == 5)
2) Alternately replace the last line with the following line. Here, the first term is the number of Fridays from the Epoch to the next Friday on or after the first of the next month and the second term is the number of Fridays from the Epoch to the next Friday on or after the first of d's month. Again, we replace all 5's with 1's to get the count of Mondays.
ceiling(as.numeric(last + 1 - 5 + 4) / 7) - ceiling(as.numeric(first - 5 + 4) / 7)
The second solution is slightly longer (although it has the same number of lines) but it has the advantage of being vectorized, i.e. d could be a vector of dates.
UPDATE: Added second solution.
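The second formula generalizes to any weekday and to a vector of dates. A sketch, assuming weekdays are numbered as in format(x, "%w") (0 = Sunday, ..., 6 = Saturday); the function name count_wday_in_month is hypothetical. n_before(x) counts occurrences of the chosen weekday before x relative to a fixed starting point, so the difference between the first of the next month and the first of d's month is the count within the month.

```r
count_wday_in_month <- function(d, wd) {
  d <- as.Date(d)
  first <- as.Date(cut(d, "month"))               # first day of d's month
  next_first <- as.Date(cut(first + 31, "month")) # first day of the next month
  # Occurrences of weekday wd before x, counted from a fixed reference;
  # day 0 of the Date scale (1970-01-01) was a Thursday, i.e. %w = 4
  n_before <- function(x) ceiling((as.numeric(x) - wd + 4) / 7)
  n_before(next_first) - n_before(first)
}

count_wday_in_month(c("2013-11-15", "2013-12-03"), 5)  # Fridays: 5 4
count_wday_in_month(c("2013-11-15", "2013-12-03"), 1)  # Mondays: 4 5
```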
There are a number of ways to do it. Here is one:
countFridays <- function(y, m) {
fr <- as.Date(paste(y, m, "01", sep="-"))
to <- fr + 31
dt <- seq(fr, to, by="1 day")
df <- data.frame(date=dt, mon=as.POSIXlt(dt)$mon, wday=as.POSIXlt(dt)$wday)
df <- subset(df, df$wday==5 & df$mon==df[1,"mon"])
return(nrow(df))
}
It creates the first day of the month and a date 31 days later, which is always in the following month.
It then creates a data frame of the month index (on a 0 to 11 range, but we only use it for comparison) and the weekday.
We then subset to the rows that a) are in the same month and b) fall on a Friday. That is your result set, and
we return the number of rows as your answer.
Note that this only uses base R code.
Without using lubridate -
#arguments to pass to function:
whichweekday <- 5
whichmonth <- 11
whichyear <- 2013
#function code:
firstday <- as.Date(paste('01',whichmonth,whichyear,sep="-"),'%d-%m-%Y')
# last day of the month: the first of the next month minus one day (this also works for December)
lastday <- seq(firstday, length.out = 2, by = "1 month")[2] - 1
sum(
strftime(
seq.Date(
from = firstday,
to = lastday,
by = "day"),
'%w'
) == whichweekday)

Problems adding a month to X using POSIXlt in R - need to reset value using as.Date(X)

This works for me in R:
# Setting up the first inner while-loop controller, the start of the next water year
NextH2OYear <- as.POSIXlt(firstDate)
NextH2OYear$year <- NextH2OYear$year + 1
NextH2OYear<-as.Date(NextH2OYear)
But this doesn't:
# Setting up the first inner while-loop controller, the start of the next water month
NextH2OMonth <- as.POSIXlt(firstDate)
NextH2OMonth$mon <- NextH2OMonth$mon + 1
NextH2OMonth <- as.Date(NextH2OMonth)
I get this error:
Error in as.Date.POSIXlt(NextH2OMonth) :
zero length component in non-empty POSIXlt structure
Any ideas why? I need to systematically add one year (for one loop) and one month (for another loop) and am comparing the resulting changed variables to values with a class of Date, which is why they are being converted back using as.Date().
Thanks,
Tom
Edit:
Below is the entire section of code. I am using RStudio (version 0.97.306). The code below is a function that is passed an array of two columns, Date (class Date) and Discharge Data (class numeric), which are used to calculate the monthly averages. So firstDate and lastDate are of class Date and are determined from the passed array. This code is adapted from working code that calculates the yearly averages. There may be one or two things I still need to change over, but the early errors I get in my use of POSIXlt prevent me from checking the later parts. Here is the code:
MonthlyAvgDischarge<-function(values){
#determining the number of values - i.e. the number of rows
dataCount <- nrow(values)
# Determining first and last dates
firstDate <- (values[1,1])
lastDate <- (values[dataCount,1])
# Setting up vectors for results
WaterMonths <- numeric(0)
class(WaterMonths) <- "Date"
numDays <- numeric(0)
MonthlyAvg <- numeric(0)
# while loop variables
loopDate1 <- firstDate
loopDate2 <- firstDate
# Setting up the first inner while-loop controller, the start of the next water month
NextH2OMonth <- as.POSIXlt(firstDate)
NextH2OMonth$mon <- NextH2OMonth$mon + 1
NextH2OMonth <- as.Date(NextH2OMonth)
# Variables used in the loops
dayCounter <- 0
dischargeTotal <- 0
dischargeCounter <- 1
resultsCounter <- 1
loopCounter <- 0
skipcount <- 0
# Outer while-loop, controls the progression from one year to another
while(loopDate1 <= lastDate)
{
# Inner while-loop controls adding up the discharge for each water year
# and keeps track of day count
while(loopDate2 < NextH2OMonth)
{
if(is.na(values[resultsCounter,2]))
{
# Skip this date
loopDate2 <- loopDate2 + 1
# Skip this value
resultsCounter <- resultsCounter + 1
#Skipped counter
skipcount<-skipcount+1
} else{
# Adding up discharge
dischargeTotal <- dischargeTotal + values[resultsCounter,2]
}
# Adding a day
loopDate2 <- loopDate2 + 1
#Keeping track of days
dayCounter <- dayCounter + 1
# Keeping track of Discharge position
resultsCounter <- resultsCounter + 1
}
# Adding the results/water years/number of days into the vectors
WaterMonths <- c(WaterMonths, as.Date(loopDate2, format="%mm/%Y"))
numDays <- c(numDays, dayCounter)
MonthlyAvg <- c(MonthlyAvg, round((dischargeTotal/dayCounter), digits=0))
# Resetting the left hand side variables of the while-loops
loopDate1 <- NextH2OMonth
loopDate2 <- NextH2OMonth
# Resetting the right hand side variable of the inner while-loop
# moving it one year forward in time to the next water year
NextH2OMonth <- as.POSIXlt(NextH2OMonth)
NextH2OMonth$year <- NextH2OMonth$Month + 1
NextH2OMonth<-as.Date(NextH2OMonth)
# Resetting variables that need to be reset
dayCounter <- 0
dischargeTotal <- 0
loopCounter <- loopCounter + 1
}
WaterMonths <- format(WaterMonths, format="%m/%Y")
# Uncomment the line below and return AvgAnnualDailyAvg if you want the water years also
# AvgAnnDailyAvg <- data.frame(WaterYears, numDays, YearlyDailyAvg)
return((MonthlyAvg))
}
The same error occurs in plain R. Running the code line by line is not a problem; running it as a script is.
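Judging from the full function, a likely cause is the line NextH2OMonth$year <- NextH2OMonth$Month + 1 inside the loop. A POSIXlt object has a mon component but no Month component, so NextH2OMonth$Month is NULL, NULL + 1 is a zero-length numeric, and assigning it to $year leaves a zero-length component, which is consistent with the error message. That would also explain why the first lines work when run one at a time while the whole script fails. A minimal sketch of the month step as presumably intended (the date is an arbitrary example):

```r
# Arbitrary example date
x <- as.POSIXlt(as.Date("2013-01-15"))

# POSIXlt has a "mon" component but no "Month" component, so x$Month is NULL
# and NULL + 1 is numeric(0), a zero-length value
is.null(x$Month)   # TRUE
length(NULL + 1)   # 0

# Intended month step: update the "mon" component, then convert back
x$mon <- x$mon + 1
as.Date(x)         # "2013-02-15"
```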
Plain R
seq(Sys.Date(), length = 2, by = "month")[2]
seq(Sys.Date(), length = 2, by = "year")[2]
Note that this works with POSIXlt too, e.g.
seq(as.POSIXlt(Sys.Date()), length = 2, by = "month")[2]
mondate.
library(mondate)
now <- mondate(Sys.Date())
now + 1 # date in one month
now + 12 # date in 12 months
mondate is a bit smarter about cases like mondate("2013-01-31") + 1, which gives the last day of February, whereas seq(as.Date("2013-01-31"), length = 2, by = "month")[2] gives March 3rd.
yearmon. If you don't need the day part, yearmon may be preferable:
library(zoo)
now.ym <- as.yearmon(Sys.Date())
now.ym + 1/12 # add one month
now.ym + 1 # add one year
ADDED comment on POSIXlt and section on yearmon.
Here is how you can add one month to a date in R using the lubridate package:
library(lubridate)
x <- as.POSIXlt("2010-01-31 01:00:00")
month(x) <- month(x) + 1
> x
[1] "2010-03-03 01:00:00 PST"
(Note that because 31 February doesn't exist, the addition rolls over to 3 March.)
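If the result should instead stay within February, lubridate's %m+% operator adds months and rolls back to the last day of the month when the target day does not exist, whereas plain + months(1) returns NA for such a date. A short sketch using a Date:

```r
library(lubridate)

x <- as.Date("2010-01-31")
x + months(1)      # NA: 2010-02-31 does not exist
x %m+% months(1)   # "2010-02-28": rolled back to the last day of February
x %m+% months(2)   # "2010-03-31"
```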
Can you perhaps provide a reproducible example? What's in firstDate, and what version of R are you using? I do this kind of manipulation of POSIXlt dates quite often and it seems to work:
Sys.Date()
# [1] "2013-02-13"
date = as.POSIXlt(Sys.Date())
date$mon = date$mon + 1
as.Date(date)
# [1] "2013-03-13"
