Time between Vector dates in R

Given a vector of dates, V.dates, write a function that determines the time, in days, from present day for each element. Next determine the quarter, as defined by 91 day segments, from present day in reverse chronological order. Define quarter '0' to be the time between present day and (present day - 91), quarter '1' to be (present day - 91) to (present day - 182), etc. Lastly, return a data frame that contains the original date, the duration from present day, and the quarter to which the date belongs. Keep in mind that dates may be before or after present day. For example, assuming present day is '10/27/2010' and an input date of '6/20/2009', the function should return that the input date is 494 days from present day and belongs in quarter 5.
I have currently worked out the time between the two vectors using:
V.dates <- as.Date(c("27-10-2010","20-6-2009"),format="%d-%m-%Y")
difftime(V.dates[1],V.dates[2],units="days")
I am lost on how to determine the quarter.

You can use gl to get the quarter
v2 <- as.vector(difftime(V.dates[1],V.dates[2],units="days"))
max(as.numeric(gl(v2, 91, v2))-1)
#[1] 5
Or
v2 %/% 91
#[1] 5
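The question asks for a function that returns a data frame, while the snippets above handle a single pair of dates. A minimal vectorised sketch, assuming floor division is the intended rule, so dates after present day get negative durations and negative quarters (-1 for the first 91 days ahead). The function name date_quarters, the today argument, and the column names are illustrative and do not come from the original post:

```r
# Hypothetical helper: days from "present day" and the 91-day quarter
# for every element of a Date vector.
date_quarters <- function(V.dates, today = Sys.Date()) {
  # Date - Date gives a difftime in days; positive means the date is in the past
  days <- as.numeric(today - V.dates)
  data.frame(date = V.dates,
             days_from_today = days,
             quarter = days %/% 91)  # floor division, so future dates are negative
}

# The example from the question, with present day fixed at 2010-10-27
res <- date_quarters(as.Date(c("2009-06-20", "2010-11-01")),
                     today = as.Date("2010-10-27"))
print(res)
```

Passing today explicitly keeps the result reproducible; leaving it out uses Sys.Date().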

Related

Convert from character to date in a "YYYY-WW" format in R

I have a hard time converting character to date in R.
I have a file where the dates are given as "2014-01", where the first is the year and the second is the week of the year. I want to convert this to a date type.
I have tried the following
z <- as.Date('2014-01', '%Y-%W')
print(z)
Output: "2014-12-05"
This is not what I want. I want to get the same format out, i.e. the output should be "2014-01", but now as a date type.
It sounds like you are dealing with some version of year week, which exists in three forms in lubridate:
week() returns the number of complete seven day periods that have occurred between the date and January 1st, plus one.
isoweek() returns the week as it would appear in the ISO 8601 system, which uses a reoccurring leap week.
epiweek() is the US CDC version of epidemiological week. It follows same rules as isoweek() but starts on Sunday. In other parts of the world the convention is to start epidemiological weeks on Monday, which is the same as isoweek.
Lubridate has functions to extract these from a date, but I don't know of a built-in way to go the other direction, from a week to one representative day (out of 7 possible). If you're dealing with the first version, one simple way is to add 7 * (Week - 1) days to January 1 of the year.
library(dplyr)
data.frame(yearweek = c('2014-01', '2014-03')) %>%
  tidyr::separate(yearweek, c("Year", "Week"), convert = TRUE) %>%
  mutate(Date = as.Date(paste0(Year, "-01-01")) + 7 * (Week-1))
#  Year Week       Date
#1 2014    1 2014-01-01
#2 2014    3 2014-01-15
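If the week numbers are ISO 8601 weeks instead (the isoweek() case), the same idea works with a different anchor: January 4 always falls in ISO week 1, so the Monday of week 1 can be found from it. A base R sketch under that assumption; the helper name iso_week_start is illustrative, not a lubridate function:

```r
# Hypothetical helper: Monday of a given ISO 8601 week, from "YYYY-WW" strings.
iso_week_start <- function(yearweek) {
  parts <- strsplit(yearweek, "-", fixed = TRUE)
  year  <- as.integer(vapply(parts, `[`, character(1), 1))
  week  <- as.integer(vapply(parts, `[`, character(1), 2))
  jan4  <- as.Date(paste0(year, "-01-04"))
  # %u is the ISO weekday (Monday = 1), so this steps back to the Monday of week 1
  week1_monday <- jan4 - (as.integer(format(jan4, "%u")) - 1)
  week1_monday + 7 * (week - 1)
}

iso_dates <- iso_week_start(c("2014-01", "2014-03"))
print(iso_dates)
```

Note that the Monday of ISO week 1 can fall in the previous calendar year, as it does for 2014.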

Year-month-week expression

I have data written in a specific format. To simplify, here is an example I made:
df<-data.frame(date=c(2012034,2012044,2012051,2012063,2012074),
math=c(100,100,23,46,78))
2012034 means the 4th week of March 2012. Likewise, 2012044 means the 4th week of April 2012. I am trying to make the values of date express an order. I need this because, unless I convert them to a time representation, the x axis of the scatter plot looks really weird.
My goal is this:
Find the oldest date in the date column and label it 1. In this case, 2012034 should be 1. Next, find the second oldest date in the date column and calculate how many weeks passed between the oldest date and that date. The second oldest date is 2012044, which is 5 weeks after the oldest date 2012034, so it should become 1+5=6. Likewise, I want to number every date by how many weeks have passed since the oldest date.
One way to do it is to also specify the day of the week and subtract it at the end, i.e.
as.Date(paste0(df$date, '-1'), '%Y%m%U-%u') - 1
#[1] "2012-03-22" "2012-04-22" "2012-05-01" "2012-06-15" "2012-07-22"
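The answer above stops at the dates, while the question asks for a week index counted from the oldest date. A sketch that splits the code arithmetically and then counts whole 7-day periods, assuming week w of a month starts on day 1 + 7 * (w - 1) (which matches the dates printed above). The column name week_index is illustrative. Under this calendar-based convention the second date comes out as 5, not the 6 given in the question, so a different rule is needed if every month is meant to contribute a fixed number of weeks:

```r
df <- data.frame(date = c(2012034, 2012044, 2012051, 2012063, 2012074),
                 math = c(100, 100, 23, 46, 78))

# Split YYYYMMW into its parts with integer arithmetic
year  <- df$date %/% 1000
month <- (df$date %/% 10) %% 100
week  <- df$date %% 10

# Assumed convention: week w of a month starts on day 1 + 7 * (w - 1)
start <- as.Date(sprintf("%d-%02d-%02d", year, month, 1 + 7 * (week - 1)))

# Whole weeks elapsed since the oldest date, with the oldest date labelled 1
df$week_index <- as.numeric(start - min(start)) %/% 7 + 1
print(df)
```

The resulting week_index column is numeric and can be used directly as the x axis of the scatter plot.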

How to create intervals of 1 hour

How can I create hourly timestamps for every date?
For example, from 00:00 until 23:59. The result of the function could be 10:00. I read online that a loop could work, but we couldn't make it fit.
Data sample:
df = data.frame( id = c(1, 2, 3, 4), Date = c(2021-04-18, 2021-04-19, 2021-04-21 07:07:08.000, 2021-04-22))
A few points:
The input shown in the question is not valid R syntax so we assume what we have is the data frame shown reproducibly in the Note at the end.
The question did not describe the desired output, so we assume that what is wanted is a POSIXct vector of hourly values in the current time zone. In (1) below, it runs from the first hour of the minimum date to the last hour of the maximum date; in (2), it covers only the dates that appear in df.
We assume that any times in the input should be dropped.
We assume that the id column of the input should be ignored.
No packages are used.
1) This calculates hour 0 of the first date and hour 0 of the day after the last date giving rng. The as.Date takes the Date part, range extracts out the smallest and largest dates into a vector of two components, adding 0:1 adds 0 to the first date leaving it as is and 1 to the second date converting it to the date after the last date. The format ensures that the Dates are converted to POSIXct in the current time zone rather than UTC. Then it creates an hourly sequence from those and uses head to drop the last value since it would be the day after the input's last date.
rng <- as.POSIXct(format(range(as.Date(df$Date)) + 0:1))
head(seq(rng[1], rng[2], "hour"), -1)
2) Another possibility is to paste together each date with each hour from 0 to 23 and then convert that to POSIXct. This will give the same result if the input dates are sequential; otherwise, it will give the hours only for those dates provided.
with(expand.grid(Date = as.Date(df$Date), hour = paste0(0:23, ":00:00")),
sort(as.POSIXct(paste(Date, hour))))
Note
df <- data.frame( id = c(1, 2, 3, 4),
Date = c("2021-04-18", "2021-04-19", "2021-04-21 07:07:08.000", "2021-04-22"))
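To make the difference between (1) and (2) concrete, here is a small check on the Note's data, where 2021-04-20 is missing. It is a sketch that follows the same two approaches but pins the time zone to UTC, an assumption made only so the counts do not depend on the local time zone; the names h1, h2 and g are illustrative:

```r
df <- data.frame(id = c(1, 2, 3, 4),
                 Date = c("2021-04-18", "2021-04-19",
                          "2021-04-21 07:07:08.000", "2021-04-22"))
days <- as.Date(df$Date)  # drops the time part

# (1) every hour from the first date through the last date
rng <- as.POSIXct(format(range(days) + 0:1), tz = "UTC")
h1 <- head(seq(rng[1], rng[2], by = "hour"), -1)

# (2) 24 hours only for the dates that occur in df
g <- expand.grid(hour = sprintf("%02d:00:00", 0:23), Date = format(days),
                 stringsAsFactors = FALSE)
h2 <- as.POSIXct(paste(g$Date, g$hour), tz = "UTC")

length(h1)  # 5 days * 24 hours
length(h2)  # 4 days * 24 hours
setdiff(format(h1, "%Y-%m-%d"), format(h2, "%Y-%m-%d"))  # the day only (1) covers
```

The two results differ by exactly the 24 hours of the missing day, so the choice depends on whether gaps in the input should be filled.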

How to subtract a number of weeks from a yearweek/weeknumber in R?

I have a couple of week numbers of interest. Let's take '202124' (this week) as an example. How can I subtract x weeks from this week number?
Let's say I want to know the week number of 2 weeks prior. Ideally I would like to do 202124 - 2, which would give me 202122. This is fine for most of the year; however, 202101 - 2 gives 202099, which is obviously not a valid week number. This would happen on a large scale, so a more elegant solution is required. How could I go about this?
Convert the year-week values to dates, subtract the offset in days, and format the output.
x <- c('202124', '202101')
format(as.Date(paste0(x, 1), '%Y%W%u') - 14, '%Y%V')
#[1] "202122" "202052"
To convert a year-week value to a date we also need a day of the week; I used 1, the first day of the week.
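One caveat: the snippet above reads the week with %W and writes it with %V, which follow different numbering rules, and %Y is the calendar year and not the ISO week-based year (%G), so results can drift near a year boundary. If the inputs are ISO 8601 weeks, a sketch that stays within ISO numbering could look like this. It assumes six-character "YYYYWW" strings; the function name shift_iso_week is illustrative:

```r
# Hypothetical helper: shift ISO year-week strings ("YYYYWW") by a number of weeks.
shift_iso_week <- function(x, weeks) {
  year <- as.integer(substr(x, 1, 4))
  week <- as.integer(substr(x, 5, 6))
  jan4 <- as.Date(paste0(year, "-01-04"))           # always in ISO week 1
  week1_monday <- jan4 - (as.integer(format(jan4, "%u")) - 1)
  monday <- week1_monday + 7 * (week - 1)           # Monday of the given week
  format(monday + 7 * weeks, "%G%V")                # ISO year + ISO week
}

shifted <- shift_iso_week(c("202124", "202101", "202003"), -2)
print(shifted)
```

The third input shows the boundary case: two weeks before 202003 is the week starting 2019-12-30, which belongs to ISO year 2020.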

R: How to lag xts column by one day of the set

Imagine an intra-day set of data, e.g. hourly intervals. Thanks to Google and valuable Joshua's answers to other people, I managed to create new columns in the xts object carrying DAILY Open/High/Low/Close values. These are daily values applied on intra-day intervals so all rows of the same day have the same value in particular column. Since the HLC values are look-ahead biased, I want to move them to the next day. Let's focus on just one column called Prev.Day.Close.
Actual status:
My Prev.Day.Close column carries the proper values for the current day. All "2010-01-01 ??:??" rows have the same value: the Close of the 2010-01-01 trading session. So at the moment it does not hold the PREVIOUS day's value, as the column name says.
What I need:
Lag the Prev.Day.Close column to the NEXT DAY OF THE SET.
I cannot lag it using lag() because it works on row (not day) basis. It must not be fixed calendar day like:
C <- ave(x$Close, .indexday(x), FUN = last)
index(C) <- index(C) + 86400
x$Prev.Day.Close <- C
Because this solution does not care about real data in the set. For example it adds new rows because the original data set has holes on weekends and holidays. Moreover, two particular days may not have the same number of intervals (rows) so the shifted data will not fit.
Desired result:
All rows of the first day in the set have NA in Prev.Day.Close because there is no previous day to get data from.
All rows of the second day have the same value in Prev.Day.Close - Any of the values I actually have in Prev.Day.Close of previous day.
The same for every next row.
If I understand correctly, here's one way to do it:
require(xts)
# sample data
dt <- .POSIXct(seq(1, 86400*4, 3600), tz="UTC")-1
x <- xts(seq_along(dt), dt)
# get the last value for each calendar day
daily.last <- apply.daily(x, last)
# merge the last value of the day with the original data set
y <- merge(x, daily.last)
# now lag the last value of the day and carry the NA forward
# y$daily.last <- na.locf(lag(y$daily.last))
y$daily.last <- lag(y$daily.last)
y$daily.last <- na.locf(y$daily.last)
Basically, you want to get the end of day values, merge them with the original data, then lag them. That will align the previous end of day values with the beginning of the day.
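For comparison, the same "previous day of the set" logic can be sketched in base R without xts. This is a different technique from the answer above, shown only to make the alignment explicit. The sample data is made up: three trading days with a weekend gap and an uneven number of rows, and all variable names are illustrative:

```r
# Made-up hourly data: Jan 1 (24 rows), Jan 4 (24 rows), Jan 5 (15 rows)
times <- as.POSIXct("2010-01-01", tz = "UTC") + 3600 * c(0:23, 72:95, 96:110)
close <- seq_along(times)
day <- format(times, "%Y-%m-%d", tz = "UTC")

# Last value of each day that actually occurs in the set (names sort chronologically)
last_by_day <- tapply(close, day, function(v) v[length(v)])

# Shift by one day *of the set*, not by 24 hours: the first day gets NA
prev_last <- setNames(c(NA, head(as.vector(last_by_day), -1)), names(last_by_day))

# Look up the previous day's last value for every intraday row
prev_close <- unname(prev_last[day])
```

Because the shift happens on the per-day summary, gaps such as weekends and days with different row counts do not matter, and no rows are added.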
