How to generate specific periods during several days in R

I want to generate the same period during several days, e.g. from 09:30:00 to 16:00:00 every day, and I know that
dates<- seq(as.POSIXct("2000-01-01 9:00",tz='UTC'), as.POSIXct("2000-04-9 16:00",tz='UTC'), by=300)
can help me obtain the time series observed every 5 minutes around the clock over 100 days. But what I want is only the 09:30:00 to 16:00:00 window on each of the 100 days.
Thanks in advance

Here is one way. We can create a date sequence with one entry per day, then build a list holding each day's five-minute sequence, and finally combine the list into one vector. final_seq is the final output.
date_seq <- seq(as.Date("2000-01-01"), as.Date("2000-04-09"), by = 1)
hour_seq <- lapply(date_seq, function(x){
  temp_date <- as.character(x)
  temp_seq <- seq(as.POSIXct(paste(temp_date, "09:30"), tz = "UTC"),
                  as.POSIXct(paste(temp_date, "16:00"), tz = "UTC"),
                  by = 300)
})
final_seq <- do.call("c", hour_seq)
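Another way to get the same vector, sketched in base R on the assumption that building the full around-the-clock sequence first is acceptable: generate every 5-minute stamp for the 100 days, then keep only those whose time of day falls inside the window. The names all_times, minute_of_day and windowed_seq are illustrative only.

```r
# Every 5-minute stamp across the 100 days
# (2000 is a leap year, so Jan 1 to Apr 9 is 100 days)
all_times <- seq(as.POSIXct("2000-01-01 00:00", tz = "UTC"),
                 as.POSIXct("2000-04-09 23:55", tz = "UTC"),
                 by = "5 min")

# Minutes elapsed since midnight for each stamp
minute_of_day <- as.integer(format(all_times, "%H")) * 60L +
  as.integer(format(all_times, "%M"))

# Keep 09:30 (570 min) through 16:00 (960 min), inclusive
windowed_seq <- all_times[minute_of_day >= 570L & minute_of_day <= 960L]
```

Each day contributes 79 stamps, so length(windowed_seq) should be 7900.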

An option using tidyr::crossing() (which I love) and the lubridate package:
library(dplyr)
library(tidyr)
library(lubridate)
# 0:99 keeps 2000-01-01 as the first of the 100 days
crossing(c1 = paste(dmy("01/01/2000") + 0:99, "09:30"),
         c2 = seq(0, 390, 5)) %>%
  mutate(time_series = ymd_hm(c1) + minutes(c2)) %>%
  pull(time_series)

Related

Excel days since 1899-12-30 from POSIX in R

I have a POSIX date in R and need to supply a data frame with it in Excel's decimal day format, that is, days since 1899-12-30 with the time of day as a fraction of 24 hours.
I can only find examples of the other way around.
I have had a go below, but it is a bit long-winded, and I am sure there must be an existing function in lubridate or openxlsx?
library(lubridate)
start_date <- "1899-12-31"
end_date <- "2022-11-15 07:00:00"
d8_df <- as.character(seq(
from=as.POSIXct(start_date, tz="UTC"),
to=as.POSIXct(end_date, tz="UTC"),
by="day"
))
day_decimal <- NROW(d8_df)+hour(ymd_hms(end_date))/24
If you need whole days, it's seq(0, 44883). In Excel, serial number 1 is 1900-01-01 (0 is displayed as 1900-01-00) and 44883 is 2022-11-18. The current day is simply the number of days counted from the start of 1900.
library(openxlsx2)
dat <- data.frame(days_since_31Dec1899 = c(0, 44883))
wb <- wb_workbook() %>%
wb_add_worksheet() %>% wb_add_data(x = dat) %>%
wb_add_numfmt(dims = "A2:A3", numfmt = 22) %>%
wb_open()
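For the conversion the question asks about, a minimal base R sketch is to take the elapsed time from 1899-12-30 in days. posix_to_excel is a hypothetical helper name, not a function from lubridate or openxlsx, and the sketch assumes the timestamps should be read as UTC wall-clock time, since Excel serial numbers carry no time zone.

```r
# Hypothetical helper: POSIXct -> Excel serial day number (fractional days)
posix_to_excel <- function(x) {
  # 1899-12-30 is the origin commonly used for Excel's 1900 date system
  origin <- as.POSIXct("1899-12-30 00:00:00", tz = "UTC")
  as.numeric(difftime(x, origin, units = "days"))
}

end_date <- as.POSIXct("2022-11-15 07:00:00", tz = "UTC")
serial <- posix_to_excel(end_date)   # 44880 + 7/24

# Round trip of the whole-day part back to a Date
as.Date(floor(serial), origin = "1899-12-30")
```

For timestamps stored in another time zone, the local clock time would have to be re-expressed in UTC first, otherwise the fractional part is shifted by the UTC offset.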

Create n different dates in consecutive months from a starting year-month

I have a starting time specified as a year-month character, e.g. "2020-12". From the start, for each of T consecutive months, I need to generate n different dates (year-month-day), where the day is random.
Any help will be useful!
The data I'm working on:
data <- data.frame(
data = sample(seq(as.Date('2000/01/01'), as.Date('2020/01/01'), by="day"), 500),
price = round(runif(500, min = 10, max = 20),2),
quantity = round(rnorm(500,30),0)
)
func <- function(start, months, n) {
startdate <- as.Date(paste0(start, "-01"))
enddate <- seq(startdate, by = "month", length.out = months)
months <- seq_len(months)
enddate_lt <- as.POSIXlt(enddate)
enddate_lt$mon <- enddate_lt$mon + 1
enddate_lt$mday <- enddate_lt$mday - 1
days_per_month <- as.integer(format(enddate_lt, format = "%d"))
days <- lapply(days_per_month, sample, size = n)
dates <- Map(`+`, enddate, days)
do.call(c, dates)
}
set.seed(2021)
func("2020-12", 4, 3)
# [1] "2020-12-08" "2020-12-07" "2020-12-15" "2021-01-27" "2021-01-08" "2021-01-13" "2021-02-21" "2021-02-07" "2021-02-28"
# [10] "2021-03-28" "2021-03-07" "2021-03-15"
func("2020-12", 5, 2)
# [1] "2020-12-06" "2020-12-16" "2021-01-08" "2021-01-10" "2021-02-24" "2021-02-13" "2021-03-20" "2021-03-29" "2021-04-19"
# [10] "2021-04-28"
func("2020-12", 2, 10)
# [1] "2020-12-29" "2020-12-30" "2020-12-04" "2020-12-15" "2020-12-09" "2020-12-27" "2020-12-05" "2020-12-06" "2020-12-23"
# [10] "2020-12-17" "2021-01-03" "2021-01-20" "2021-01-05" "2021-01-22" "2021-01-23" "2021-01-06" "2021-01-10" "2021-01-07"
# [19] "2021-01-19" "2021-01-12"
Most of the dancing with POSIXlt objects is there because it gives us clean (base R) access to the number of days in a month, which makes sampling the days in a month rather simple. It can also be done (code-golf shorter) using the lubridate package, but I don't know that that is any more correct than this code is.
This just dumps out a sequence of random dates, with n days per month. It does not sort within each month, though it does output the months in order. (That's not a difficult extension; there just wasn't a requirement for it.) It doesn't return a data frame, but you can easily extend it to fit in one, or call data.frame(date = do.call(c, dates)) on the last line, depending on what you need to do with the output.
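One detail of func worth checking: it adds a day number between 1 and the month length to the first of the month, so the 1st can never be drawn and the largest draw lands on the 1st of the following month. A minimal sketch of a variant that samples zero-based offsets instead; random_month_dates is a hypothetical name, not part of the original answer, and like the original it assumes n is no larger than the shortest month.

```r
# Hypothetical variant: n distinct random dates in each of `months` consecutive months
random_month_dates <- function(start, months, n) {
  first_days <- seq(as.Date(paste0(start, "-01")), by = "month", length.out = months)
  # First day of the following month, minus the first day, gives the month length
  next_first <- seq(first_days[1], by = "month", length.out = months + 1)[-1]
  days_in_month <- as.integer(next_first - first_days)
  # Offsets 0 .. (month length - 1), so every day of the month is reachable
  offsets <- lapply(days_in_month, function(d) sample.int(d, size = n) - 1L)
  rep(first_days, each = n) + unlist(offsets)
}

set.seed(2021)
random_month_dates("2020-12", 4, 3)
```

The months still come out in order and the days within a month are unsorted, as in the original function.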
You could convert the start time to a class for monthly data, zoo::yearmon. Then use as.Date.yearmon and its frac argument ("a number between 0 and 1 inclusive that indicates the fraction of the way through the period that the result represents") with random values from runif (uniform between 0 and 1) to convert to a random date within each year-month.
start = "2020-12"
T = 3
n = 2
library(zoo)
set.seed(1)
as.Date(as.yearmon(start) + rep((1:T)/12, each = n), frac = runif(T * n))
# [1] "2021-01-08" "2021-01-12" "2021-02-16" "2021-02-25" "2021-03-07" "2021-03-27"

Match events with weather data

I have a list of events and times, a little like this df:
event <- c("x", "y")
date <- c("12-12-2014", "13-12-2014")
time <- c("11:00", "14:00")
df_event <- data.frame(event, date, time)
What I would like to do now is match these events with weather data. The problem, however, is that the timestamps of the weather data I have do not match the event timestamps. They look like this:
date <- c("12-12-2014", "12-12-2015")
time <- c("12:00", "14:00")
degrees <- c(12, 13)
df_weather <- data.frame(date,time, degrees)
Does anybody have suggestions on how I can easily match them, so that I get the weather data that is closest to each event?
Looks like a duplicate of this question. Adapting one of those answers for you:
# First, convert your date + time into POSIXct so that we have an index to search
df_event$date2 <- as.POSIXct(strptime(paste(df_event$date, df_event$time),
                                      format = "%d-%m-%Y %H:%M"))
df_weather$datePXct <- as.POSIXct(strptime(paste(df_weather$date, df_weather$time),
                                           format = "%d-%m-%Y %H:%M"))
# For each event, pick the row of df_weather whose timestamp is closest
df_event <- cbind(df_event, event.degrees = df_weather[ unlist(sapply((df_event$date2),
                  function(x) which.min(abs(x - df_weather$datePXct))) ), c("degrees")])
df_event
# event date time date2 event.degrees
#1 x 12-12-2014 11:00 2014-12-12 11:00:00 12
#2 y 13-12-2014 14:00 2014-12-13 14:00:00 12
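Because the nearest reading can still be far away (event y above is matched to a reading from the previous day), it may help to keep the time gap next to the match. A small sketch of the same nearest-timestamp idea on the sample data; nearest_index and gap_hours are illustrative names, and the timestamps are assumed to be in UTC.

```r
event_time   <- as.POSIXct(c("2014-12-12 11:00", "2014-12-13 14:00"), tz = "UTC")
weather_time <- as.POSIXct(c("2014-12-12 12:00", "2015-12-12 14:00"), tz = "UTC")
degrees      <- c(12, 13)

# For each element of x, the position of the closest element of table
nearest_index <- function(x, table) {
  vapply(seq_along(x),
         function(i) which.min(abs(as.numeric(x[i]) - as.numeric(table))),
         integer(1))
}

idx       <- nearest_index(event_time, weather_time)
matched   <- degrees[idx]
# Absolute distance between each event and its matched reading, in hours
gap_hours <- abs(as.numeric(difftime(event_time, weather_time[idx], units = "hours")))

data.frame(event = c("x", "y"), degrees = matched, gap_hours = gap_hours)
```

A threshold on gap_hours could then be used to discard matches that are too far from the event to be meaningful.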

Create vector of non-weekend time intervals for part of a day in R

I have a raw dataset of observations taken at 5 minute intervals between 6am and 9pm during weekdays only. These do not come with date-time information for plotting etc., so I am attempting to create a vector of date-times to add to my data, i.e. to go from this:
X425 X432 X448
1 0.07994814 0.1513559 0.1293103
2 0.08102852 0.1436480 0.1259074
to this
X425 X432 X448
2010-05-24 06:00 0.07994814 0.1513559 0.1293103
2010-05-24 06:05 0.08102852 0.1436480 0.1259074
I have gone about this as follows:
# using lubridate and xts
library(xts)
library(lubridate)
# sequence of 5 min intervals from 06:00 to 21:00
sttime <- hms("06:00:00")
intervals <- sttime + c(0:180) * minutes(5)
# sequence of days from 2010-05-24 to 2010-11-05
dayseq <- timeBasedSeq("2010-05-24/2010-11-05/d")
# add intervals to dayseq
dayPlusTime <- function(days, times) {
dd <- NULL
for (i in 1:2) {
dd <- c(dd,(days[i] + times))}
return(dd)
}
obstime <- dayPlusTime(dayseq, intervals)
But obstime is coming out as a list. days[1] + times works, so I guess it has something to do with the way the POSIXct objects are concatenated to make dd, but I can't figure out what I am doing wrong or where to go next.
Any help appreciated
A base alternative:
# create some dummy dates
dates <- Sys.Date() + 0:14
# select non-weekend days
wd <- dates[as.integer(format(dates, format = "%u")) %in% 1:5]
# create times from 06:00 to 21:00 by 5 min interval
times <- format(seq(from = as.POSIXct("2015-02-18 06:00"),
to = as.POSIXct("2015-02-18 21:00"),
by = "5 min"),
format = "%H:%M")
# create all date-time combinations, paste, convert to as.POSIXct and sort
wd_times <- sort(as.POSIXct(do.call(paste, expand.grid(wd, times))))
One of the issues is that your interval vector does not change the hour when the minutes go over 60.
Here is one way you could do this:
#create the interval vector
intervals<-c()
for(p in 6:20){
for(j in seq(0,55,by=5)){
intervals<-c(intervals,paste(p,j,sep=":"))
}
}
intervals<-c(intervals,"21:0")
#get the days
dayseq <- timeBasedSeq("2010-05-24/2010-11-05/d")
#concatenate everything and convert to POSIXct at the end (strptime alone returns POSIXlt)
obstime<-as.POSIXct(strptime(unlist(lapply(dayseq,function(x){paste(x,intervals)})),format="%Y-%m-%d %H:%M", tz="GMT"))
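A loop-free sketch of the same idea in base R, which also drops weekends as the question requires: take midnight of each weekday and add the within-day offsets in seconds. It assumes UTC timestamps; weekdays_only, offsets and day_start are illustrative names.

```r
# All days in the observation period, then weekdays only (%u: 1 = Monday ... 7 = Sunday)
days <- seq(as.Date("2010-05-24"), as.Date("2010-11-05"), by = "day")
weekdays_only <- days[!format(days, "%u") %in% c("6", "7")]

# Seconds after midnight for 06:00 to 21:00 in 5-minute steps (181 values)
offsets <- seq(6 * 3600, 21 * 3600, by = 300)

# Midnight of each weekday, repeated once per offset, plus the offsets
day_start <- as.POSIXct(format(weekdays_only), tz = "UTC")
obstime <- rep(day_start, each = length(offsets)) +
  rep(offsets, times = length(day_start))
```

The result is already in chronological order, so no sorting step is needed before attaching it to the data.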

Subset xts object by time of day

A simple question: I know how to subset time series in xts for years, months and days from the help: x['2000-05/2001'] and so on.
But how can I subset my data by hours of the day? I would like to get all data between 07:00 am and 06:00 pm. I.e., I want to extract the data during business hours, irrespective of the day (I take care of weekends later on). Help has an example of the form:
.parseISO8601('T08:30/T15:00')
But this does not work in my case. Does anybody have a clue?
If your xts object is called x then something like y <- x["T09:30/T11:00"] works for me to get a slice of the morning session, for example.
For some reason, cutting an xts object by time of day with x["T09:30/T11:00"] is pretty slow. I used the methods from "R: Efficiently subsetting dataframe based on time of day" and "data.table time subset vs xts time subset" to write a faster function with similar syntax:
cut_time_of_day <- function(x, t_str_begin, t_str_end){
  tstr_to_sec <- function(t_str){
    # "09:00:00" to second of day
    as.numeric(as.POSIXct(paste("1970-01-01", t_str), "UTC")) %% (24*60*60)
  }
  # POSIX ignores leap seconds
  # sec_of_day = as.numeric(index(x)) %% (24*60*60) # GMT only
  sec_of_day = {lt = as.POSIXlt(index(x)); lt$hour*60*60 + lt$min*60 + lt$sec} # handles tzone
  sec_begin = tstr_to_sec(t_str_begin)
  sec_end = tstr_to_sec(t_str_end)
  return(x[sec_of_day >= sec_begin & sec_of_day <= sec_end, ])
}
Test:
n = 100000
dtime <- seq(ISOdate(2001,1,1), by = 60*60, length.out = n)
attributes(dtime)$tzone <- "CET"
x = xts((1:n), order.by = dtime)
y2 <- cut_time_of_day(x,"07:00:00", "09:00:00")
y1 <- x["T07:00:00/T09:00:00"]
identical(y1,y2)
