Consider this simple example:
library(lubridate)
bogus <- function(start_time, end_time) {
  print(paste('hey this starts on', start_time, 'until', end_time))
}
start_time <- ymd('2018-01-01')
end_time <- ymd('2018-05-01')
bogus(start_time, end_time)
[1] "hey this starts on 2018-01-01 until 2018-05-01"
Unfortunately, doing so with a long time range does not work with my real-life bogus function, so I need to break my original time range into monthly pieces.
In other words the first call would be bogus(ymd('2018-01-01'), ymd('2018-01-31')), the second one bogus(ymd('2018-02-01'), ymd('2018-02-28')), etc.
Is there a simple way to do this using purrr and lubridate?
Thanks
Are you looking for something like:
library(lubridate)
seq_dates <- seq(start_time, end_time - 1, by = "month")
lapply(seq_dates, function(x) print(paste('hey this starts on', x, 'until', ceiling_date(x, unit = "month") - 1)))
You could also write a short bogus function like this:
library(lubridate)
bogus <- function(start_var, end_var) {
  # first day of every month from start_var up to (but excluding) end_var
  seq_dates <- seq(as.Date(start_var), as.Date(end_var) - 1, by = "month")
  printed_statement <- lapply(seq_dates, function(x) paste('hey this starts on', x, 'until', ceiling_date(x, unit = "month") - 1))
  for (i in printed_statement) print(i)
}
And call it like:
bogus("2018-01-01", "2018-05-01")
Output:
[1] "hey this starts on 2018-01-01 until 2018-01-31"
[1] "hey this starts on 2018-02-01 until 2018-02-28"
[1] "hey this starts on 2018-03-01 until 2018-03-31"
[1] "hey this starts on 2018-04-01 until 2018-04-30"
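This way you can just give the earliest start date and the latest end date and get every month in between. Both dates are assumed to be the first day of a month, and the end date itself is excluded.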
With base R:
seqdate <- seq.Date(start_time, end_time, by = "1 month")
# each range runs from one month start to the day before the next month start
dateranges <- data.frame(start.dates = head(seqdate, -1),
                         end.dates = seqdate[-1] - 1)
dateranges
start.dates end.dates
1 2018-01-01 2018-01-31
2 2018-02-01 2018-02-28
3 2018-03-01 2018-03-31
4 2018-04-01 2018-04-30
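The answers above assume the range starts on the first day of a month. If it may start or end mid-month, one option is to clip the first and last pieces. Below is a minimal base R sketch under that assumption: month_chunks() is a hypothetical helper name, both dates are treated as inclusive, and bogus() is redefined as a hypothetical stand-in that returns its message instead of printing it. Map() plays the role that purrr::map2() would.

```r
# Split [start, end] (both inclusive) into calendar-month pieces.
# The first piece starts at `start` and the last piece is clipped at `end`,
# so the range does not have to begin or end on a month boundary.
month_chunks <- function(start, end) {
  start <- as.Date(start)
  end <- as.Date(end)
  first_of_month <- as.Date(format(start, "%Y-%m-01"))
  # first day of every month that overlaps the range
  starts <- seq(first_of_month, end, by = "month")
  # last day of each of those months: next month's first day minus one
  ends <- seq(first_of_month, by = "month", length.out = length(starts) + 1)[-1] - 1
  starts[1] <- start
  ends[length(ends)] <- min(ends[length(ends)], end)
  data.frame(start = starts, end = ends)
}

# Hypothetical stand-in for the real bogus() function
bogus <- function(start_time, end_time) {
  paste("hey this starts on", start_time, "until", end_time)
}

chunks <- month_chunks("2018-01-15", "2018-04-10")
msgs <- Map(bogus, chunks$start, chunks$end)
```

Here the pieces run from 2018-01-15 to 2018-01-31, then cover February and March in full, and end with 2018-04-01 to 2018-04-10. Because the end date is inclusive, the question's range would be passed as "2018-01-01" to "2018-04-30" to get exactly January to April.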
I'm having trouble converting character values into date-times (hours and minutes). I have the following code:
start <- c("2022-01-10 9:35PM","2022-01-10 10:35PM")
end <- c("2022-01-11 7:00AM","2022-01-11 8:00AM")
dat <- data.frame(start,end)
These are all character strings. I would like to convert all of them into date-times in 24-hour format, e.g. "2022-01-10 9:35PM" into "2022-01-10 21:35" and "2022-01-11 7:00AM" into "2022-01-11 7:00", because I want to calculate the difference between start and end in hours.
I would also like to add an ID column with a specific ID. The desired data would look like this:
ID <- c(101,101)
start <- c("2022-01-10 21:35","2022-01-10 22:35")
end <- c("2022-01-11 7:00","2022-01-11 8:00")
diff <- c(9,10) # I'm not sure how the calculations would turn out to be
dat <- data.frame(ID,start,end,diff)
I would appreciate all the help there is! Thanks!!!
You can use lubridate::ymd_hm. Don't use floor if you want the exact value.
library(dplyr)
library(lubridate)
dat %>%
mutate(ID = 101,
across(c(start, end), ymd_hm),
diff = floor(end - start))
start end ID diff
1 2022-01-10 21:35:00 2022-01-11 07:00:00 101 9 hours
2 2022-01-10 22:35:00 2022-01-11 08:00:00 101 9 hours
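The base R approach with strptime needs %I (hour on a 12-hour clock) instead of %H, because %p (the AM/PM marker) is only used together with %I; the format should also mirror the strings, which have no space before AM/PM:
strptime(dat$start, "%Y-%m-%d %I:%M%p")
This parses the two start times as 21:35 and 22:35 in your local time zone. Since %p is locale-dependent, this assumes an English or C locale.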
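For completeness, here is a base R sketch of the whole task (parse, add the ID, compute the difference in hours) without lubridate. It assumes the AM/PM markers can be read in the current locale (true for English and C locales), and it uses tz = "UTC" so that daylight-saving changes cannot distort the differences; both are assumptions, not part of the question.

```r
# Sample data from the question
start <- c("2022-01-10 9:35PM", "2022-01-10 10:35PM")
end <- c("2022-01-11 7:00AM", "2022-01-11 8:00AM")
dat <- data.frame(start, end)

# %I = hour on a 12-hour clock, %p = AM/PM marker (locale-dependent)
fmt <- "%Y-%m-%d %I:%M%p"
dat$start <- as.POSIXct(dat$start, format = fmt, tz = "UTC")
dat$end <- as.POSIXct(dat$end, format = fmt, tz = "UTC")

dat$ID <- 101
# difference in hours as a plain number (9 h 25 min is about 9.42)
dat$diff <- as.numeric(difftime(dat$end, dat$start, units = "hours"))
dat <- dat[, c("ID", "start", "end", "diff")]
dat
```

Wrap diff in floor() if you want whole hours, as in the desired output.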
I have a data frame Y with a column time, which stores timestamps in YYYYMMDDHHMMSS form.
For example, 20211201000010 means 2021-12-01 00:00:10.
time <- strptime(Y$time, format = "%Y%m%d%H%M%S")
start_time <- min(time)
In this code, start_time is 2021-12-01 00:00:02.
But I want to round start_time up to 2021-12-01 00:00:10, since start_time should fall on the 10-second grid of my data.
How can I round up 2021-12-01 00:00:02 as 2021-12-01 00:00:10 ?
The lubridate package is always our friend for date-time work.
library(lubridate)
xx1 <- '20211201010002'
# parse, then round up to the next 10-second boundary
ceiling_date(ymd_hms(xx1), unit = '10s')
[1] "2021-12-01 01:00:10 UTC"
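If you would rather not depend on lubridate, a base R sketch is to round the number of seconds since the epoch up to a multiple of 10. ceiling_seconds() is a hypothetical helper, and the sketch assumes the timestamps are in UTC.

```r
# Round a POSIXct time up to the next multiple of `step` seconds.
# POSIXct counts seconds since 1970-01-01 00:00:00 UTC, so in UTC
# multiples of 10 seconds line up with :00, :10, :20, ... on the clock.
ceiling_seconds <- function(t, step = 10) {
  secs <- ceiling(as.numeric(t) / step) * step
  as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
}

time <- as.POSIXct(strptime(c("20211201000002", "20211201000010"),
                            format = "%Y%m%d%H%M%S", tz = "UTC"))
start_time <- ceiling_seconds(min(time))
format(start_time, "%Y-%m-%d %H:%M:%S")
```

A time that is already on a 10-second boundary is left unchanged.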
You may need to calculate the remainder after dividing by 10 before you format the data, e.g.
20211201000002 %% 10  # 2
20211201000010 %% 10  # 0
Then find the first 0 in your list.
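The remainder idea can be sketched as follows, assuming Y$time is stored as numbers (not text) in YYYYMMDDHHMMSS form; the values in raw are made up for illustration. Note that this picks the first timestamp that already lies on a 10-second boundary rather than rounding the earliest one up.

```r
# Hypothetical raw values of Y$time (YYYYMMDDHHMMSS stored as numbers)
raw <- c(20211201000002, 20211201000004, 20211201000010, 20211201000020)

# the last digit is the units digit of the seconds, so %% 10 == 0
# marks timestamps that fall exactly on a 10-second boundary
first_aligned <- raw[raw %% 10 == 0][1]

# print the number without scientific notation before parsing it
start_time <- strptime(sprintf("%.0f", first_aligned), "%Y%m%d%H%M%S", tz = "UTC")
start_time
```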
I need to perform walk-forward optimization on a time series. The attached image shows a diagram of how this should be done. I have to run my data-processing function on each period, and the number of periods should follow from a variable (for example, I assign a start and an end date, and each test period should be 1 month). My problem is this: I do not know how to shift the dates by the length of the out-of-sample period and get, as the output of the function, a sheet with the results of the calculations for each period. The out-of-sample period will be 30% of the total length of the selected period. What tools in R can I use to solve this?
start date: 2019-01-01, end date: 2019-12-31
first period: from 2019-01-01 to 2019-03-31
second period: from 2019-02-01 to 2019-04-30 etc...
Assuming that the question is asking how to form the sequence of start, end and start-of-testing (oos) dates given st and en shown below, first form the months sequence and then transform it to append the start-of-test date. To do that, seq can generate a sequence of beginning-of-month Dates. Also, adding an integer to a Date object (or subtracting one from it) moves it forward (or back) by that many days, so we can get the end of a month by subtracting one day from the start of the next month.
We have allocated 70% of the three month period to training and 30% to the test making use of the fact that the difference between two Date objects is the number of days between them. 70/30 is what the question asks for; however, that means that there will be a few days not in any test in each period whereas the diagram has no days that are not in any test except at the beginning. If all days are to be in a test then we might instead use the third month in the period as the test period and the first two months as the training period. In that case uncomment the commented out transform line. We also show this variation at the end.
Finally define a function f (we have shown a dummy calculation to make it possible to run the code) with arguments start, end and test to perform whatever calculation is needed. It can produce any sort of output object for one train/test instance. We can use either Map or by as shown below. The output list of results will have one component per row of d.
# input
st <- as.Date("2019-01-01")
en <- as.Date("2019-12-31")
months <- seq(st, en, by = "month")
d <- data.frame(start = head(months, -2), end = c(tail(months, -3) - 1, en))
# append date that test starts -- d shown at end
d <- transform(d, test = start + .7 * (end - start + 1))
# d <- transform(d, test = tail(months, -2))
# replace this with your function. Can be many lines.
f <- function(start, end, test) {
data.frame(start, end, test) # dummy calc - just show dates
}
# use `Map` or `by` to run f nrow(d) times giving a list of results,
# one component per row of d
with(d, Map(f, start, end, test))
# or
by(d, 1:nrow(d), with, f(start, end, test))
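To make the window length and the train/test split adjustable, as the question asks, the same steps can be wrapped in a function. This is a sketch, not part of the answer above: walk_forward() and its arguments window_months and train_frac are hypothetical names, st is assumed to be the first day of a month, and the test start is rounded to a whole day. With the defaults it should reproduce the rows of d shown below.

```r
# Rolling windows of `window_months` calendar months, shifted by one month
# each time; the first `train_frac` of each window is training data and the
# rest is the out-of-sample test. Assumes `st` is the first day of a month.
walk_forward <- function(st, en, window_months = 3, train_frac = 0.7) {
  st <- as.Date(st)
  en <- as.Date(en)
  months <- seq(st, en, by = "month")
  n <- length(months) - window_months + 1
  if (n < 1) stop("the range is shorter than one window")
  # one extra month start marks the end of the last window
  month_starts <- seq(st, by = "month", length.out = length(months) + 1)
  start <- month_starts[seq_len(n)]
  end <- month_starts[seq_len(n) + window_months] - 1
  end[end > en] <- en
  n_days <- as.numeric(end - start) + 1
  # whole-day test start (rounding is a choice made for this sketch)
  test <- start + round(train_frac * n_days)
  data.frame(start, end, test)
}

w <- walk_forward("2019-01-01", "2019-12-31")
# hypothetical placeholder for the real per-window calculation
f <- function(start, end, test) data.frame(start, end, test)
results <- with(w, Map(f, start, end, test))
```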
The data frame d above is:
> d
start end test
1 2019-01-01 2019-03-31 2019-03-05
2 2019-02-01 2019-04-30 2019-04-04
3 2019-03-01 2019-05-31 2019-05-04
4 2019-04-01 2019-06-30 2019-06-04
5 2019-05-01 2019-07-31 2019-07-04
6 2019-06-01 2019-08-31 2019-08-04
7 2019-07-01 2019-09-30 2019-09-03
8 2019-08-01 2019-10-31 2019-10-04
9 2019-09-01 2019-11-30 2019-11-04
10 2019-10-01 2019-12-31 2019-12-04
If we had used the commented out version of d then it would look like this (same except last column):
start end test
1 2019-01-01 2019-03-31 2019-03-01
2 2019-02-01 2019-04-30 2019-04-01
3 2019-03-01 2019-05-31 2019-05-01
4 2019-04-01 2019-06-30 2019-06-01
5 2019-05-01 2019-07-31 2019-07-01
6 2019-06-01 2019-08-31 2019-08-01
7 2019-07-01 2019-09-30 2019-09-01
8 2019-08-01 2019-10-31 2019-10-01
9 2019-09-01 2019-11-30 2019-11-01
10 2019-10-01 2019-12-31 2019-12-01
Graphics
We can display these as Gantt charts using ggplot2.
library(ggplot2)
library(gridExtra)
library(scales)
n <- nrow(d)
Plot <- function(x, main) {
ggplot(x, aes(size = I(15))) +
geom_segment(aes(x = start, xend = test, y = n:1, yend = n:1), col = "green") +
geom_segment(aes(x = test, xend = end, y = n:1, yend = n:1), col = "blue") +
scale_x_date(labels = date_format("%b\n%Y"), breaks = date_breaks("month")) +
ggtitle(main) +
theme(legend.position = "none",
axis.title = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
panel.grid.major = element_line(colour = "#808080"))
}
d <- transform(d, test = start + .7 * (end - start + 1))
g1 <- Plot(d, "70/30")
d <- transform(d, test = tail(months, -2))
g2 <- Plot(d, "2 months/1 month")
grid.arrange(g1, g2, ncol = 2)
Thanks everyone for the help. I found a way to solve it by writing a small function.
library(data.table)
dates <- function(startDate, endDate, periodLength, lag){
  start <- as.Date(startDate)
  end <- as.Date(endDate)
  data <- start
  # keep adding period starts, each `lag` days after the previous one
  while(data[length(data)] < end){
    x <- data[length(data)] + lag
    data <- c(data, x)
  }
  end <- data + periodLength
  data <- data.table(start = data, end = end)
  # one list element (V1, V2, ...) per period, holding its start and end dates
  data <- as.list(as.data.table(t(data)))
  return(data)
}
where
startDate - this is the start date of the testing period,
endDate - this is the end date of the testing period,
periodLength - this is the length of one period in days,
lag - this is the offset (the length of the OOS period)
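A shorter base R variant of the same idea is sketched below; dates_base() is a hypothetical name. seq() with a numeric by steps in days, so no loop or data.table is needed. One difference: the while loop can add a final start date after endDate when endDate is not an exact multiple of lag days away, whereas seq() stops at or before endDate.

```r
# Base R variant: period starts every `lag` days from startDate up to
# (and including) endDate, each paired with start + periodLength days.
dates_base <- function(startDate, endDate, periodLength, lag) {
  starts <- seq(as.Date(startDate), as.Date(endDate), by = lag)
  periods <- lapply(starts, function(s) format(c(s, s + periodLength)))
  names(periods) <- paste0("V", seq_along(periods))
  periods
}

p <- dates_base("2019-01-01", "2019-06-30", periodLength = 30, lag = 10)
p$V1
p$V19
```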
dates(startDate = '2019-01-01', endDate = '2019-06-30', periodLength = 30, lag = 10)
$V1
[1] "2019-01-01" "2019-01-31"
$V2
[1] "2019-01-11" "2019-02-10"
$V3
[1] "2019-01-21" "2019-02-20"
$V4
[1] "2019-01-31" "2019-03-02"
$V5
[1] "2019-02-10" "2019-03-12"
$V6
[1] "2019-02-20" "2019-03-22"
$V7
[1] "2019-03-02" "2019-04-01"
$V8
[1] "2019-03-12" "2019-04-11"
$V9
[1] "2019-03-22" "2019-04-21"
$V10
[1] "2019-04-01" "2019-05-01"
$V11
[1] "2019-04-11" "2019-05-11"
$V12
[1] "2019-04-21" "2019-05-21"
$V13
[1] "2019-05-01" "2019-05-31"
$V14
[1] "2019-05-11" "2019-06-10"
$V15
[1] "2019-05-21" "2019-06-20"
$V16
[1] "2019-05-31" "2019-06-30"
$V17
[1] "2019-06-10" "2019-07-10"
$V18
[1] "2019-06-20" "2019-07-20"
$V19
[1] "2019-06-30" "2019-07-30"
The rnoaa package only allows you to gather 30 days' worth of air pressure data at a time (https://cran.r-project.org/web/packages/rnoaa/rnoaa.pdf). I'm looking to create a function or for loop to pull data from the package a month at a time. It is strict about the date format it requires: YYYYMMDD, with no - or /. I started with a function, but lapply doesn't seem to apply the function to fetch the air pressure data.
I have tried loops in many ways, and I can't seem to get it. Here's an example.
for (i in dates)) {
air_pressure[i] <- coops_search(begin_date = start[i], end_date = end[i],
station_name = 8727520, product= "air_pressure", units = "metric", time_zone = "gmt")
print(air_pressure[i])
}
start<-seq(as.Date("2015/01/01"), by = "month", length.out = 100)
start <- as.numeric(gsub("-","",start))
end<-seq(as.Date("2015/02/01"), by = "month", length.out = 100)
end <- as.numeric(gsub("-","",end))
pressure_function<- function(air_pressure) {
coops_search(station_name = 8727520, begin_date = starting,
end_date = ending, product = "air_pressure")
}
lapply(pressure_function, starting= start, ending= end, FUN= sum)
There are no error messages; the results just don't populate, or the function doesn't run.
There are some pretty fundamental things wrong here. First, your for loop has too many closing parentheses. Second, your lapply call passes a function as the first argument; that does not work, the function has to go in the second slot. And there is more.
Anyway, try this:
library(rnoaa)
fun <- function(begin, end) {
coops_search(station_name = 8727520, begin_date = gsub("-", "", begin),
end_date = gsub("-", "", end), product = "air_pressure")
}
start_dates <- seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by = "month")
end_dates <- seq(as.Date("2015-02-01"), as.Date("2016-01-01"), by = "month") - 1
res <- Map(fun, start_dates, end_dates)
df <- dplyr::bind_rows(lapply(res, "[[", "data"))
head(df)
#> t v f
#> 1 2015-01-01 00:00:00 1025.3 0,0,0
#> 2 2015-01-01 00:06:00 1025.4 0,0,0
#> 3 2015-01-01 00:12:00 1025.5 0,0,0
#> 4 2015-01-01 00:18:00 1025.6 0,0,0
#> 5 2015-01-01 00:24:00 1025.6 0,0,0
#> 6 2015-01-01 00:30:00 1025.6 0,0,0
NROW(df)
#> [1] 87600
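An alternative to gsub() is to let format() produce the YYYYMMDD strings directly. The sketch below builds the monthly begin and end dates for 2015; the rnoaa call is left as a comment because it needs the package and a network connection, and its arguments are copied from the answer above.

```r
# Calendar-month boundaries for 2015, formatted as YYYYMMDD strings
month_starts <- seq(as.Date("2015-01-01"), by = "month", length.out = 12)
month_ends <- seq(as.Date("2015-02-01"), by = "month", length.out = 12) - 1
begin_dates <- format(month_starts, "%Y%m%d")
end_dates <- format(month_ends, "%Y%m%d")

# With rnoaa installed, the download step would then be, e.g.:
# res <- Map(function(b, e) rnoaa::coops_search(station_name = 8727520,
#              begin_date = b, end_date = e, product = "air_pressure"),
#            begin_dates, end_dates)
```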
I have a data frame with start and stop times for an experiment and I want to calculate the duration of each experiment (one line per experiment). Data frame:
start_t stop_t
7:35 7:48
23:50 00:15
11:22 12:06
I created a function to convert the times to POSIXct and calculate the duration, testing whether start and stop cross midnight:
TimeDiff <- function(t1,t2) {
if (as.numeric(as.POSIXct(paste("2016-01-01", t1))) > as.numeric(as.POSIXct(paste("2016-01-01", t2)))) {
t1n <- as.numeric(as.POSIXct(paste("2016-01-01", t1)))
t2n <- as.numeric(as.POSIXct(paste("2016-01-02", t2)))
}
if (as.numeric(as.POSIXct(paste("2016-01-01", t1))) < as.numeric(as.POSIXct(paste("2016-01-01", t2)))) {
t1n <- as.numeric(as.POSIXct(paste("2016-01-01", t1)))
t2n <- as.numeric(as.POSIXct(paste("2016-01-01", t2)))
}
#calculate time-difference in seconds
t2n - t1n
}
Then I wanted to apply this function to my data frame using either the 'mutate' function in 'dplyr' or an 'apply' function, e.g.:
mutate(df, dur = TimeDiff(start_t, stop_t))
But the result is that the 'dur' column is filled with the same value in every row. I ended up using a clunky for loop to apply my function to the data frame, but I would like a more elegant solution. Help wanted!
The day can be incremented when the timestamp passes midnight, but I am not sure that is necessary if you just want to test whether start and stop cross midnight. Hope this helps!
df = data.frame(start_t = c("7:35", "23:50", "11:22"), stop_t = c("7:48", "00:15", "12:06"), stringsAsFactors = FALSE)
myfun = function(tvec1, tvec2, units_args="secs") {
tvec1_t = as.POSIXct(paste("2016-01-01", tvec1))
tvec2_t = as.POSIXct(paste("2016-01-01", tvec2))
time_diff = difftime(tvec2_t, tvec1_t, units = units_args)
return( time_diff )
}
# append new columns (base R)
df$time_diff = myfun(df$start_t, df$stop_t)
df$cross = ifelse(df$time_diff < 0, 1, 0)
output:
start_t stop_t time_diff cross
1 7:35 7:48 780 secs 0
2 23:50 00:15 -84900 secs 1
3 11:22 12:06 2640 secs 0
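The idea of incrementing the day when the times cross midnight can be sketched in base R: put all times on the same arbitrary day and, wherever the stop time is earlier than the start time, move the stop time to the next day. This assumes no experiment lasts 24 hours or more; tz = "UTC" is an added assumption that avoids daylight-saving effects.

```r
df <- data.frame(start_t = c("7:35", "23:50", "11:22"),
                 stop_t = c("7:48", "00:15", "12:06"),
                 stringsAsFactors = FALSE)

# put every time on the same arbitrary day (UTC avoids DST jumps)
t1 <- as.POSIXct(paste("2016-01-01", df$start_t), tz = "UTC")
t2 <- as.POSIXct(paste("2016-01-01", df$stop_t), tz = "UTC")

# a stop time earlier than its start time means the run crossed midnight,
# so move that stop time to the next day (86400 seconds later)
t2 <- t2 + ifelse(t2 < t1, 86400, 0)
df$time_diff <- difftime(t2, t1, units = "secs")
df
```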
Since you don't have dates but only times, experiments crossing midnight are indeed a problem. Your function does not work because it is not vectorized: the if statement only checks the first element of its condition (and since R 4.2.0 a condition longer than one is an error), so it cannot choose a branch for each row on its own.
The following works but is still not perfectly elegant:
If the start happened before the end, we simply subtract to get the duration.
If we cross midnight (the heuristic for this is not very stable), we calculate the difference until midnight and add the duration on the next day.
library(tidyverse)
diff_time <- function(start, end) {
case_when(start < end ~ end - start,
start > end ~ parse_time("23:59") - start + end + parse_time("0:01")
)
}
df %>%
mutate_all(parse_time) %>%
mutate(duration = diff_time(start_t, stop_t))
#> start_t stop_t duration
#> 1 07:35:00 07:48:00 780 secs
#> 2 23:50:00 00:15:00 1500 secs
#> 3 11:22:00 12:06:00 2640 secs
If you had dates, you could simply do:
df %>%
mutate(duration = stop_t - start_t)
Data
df <- read.table(text = "start_t stop_t
7:35 7:48
23:50 00:15
11:22 12:06", header = TRUE)
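A vectorized base R alternative that avoids date-times altogether is to convert each time to minutes since midnight and take the difference modulo 24 * 60, which wraps negative differences around midnight. to_minutes() and duration_minutes() are hypothetical helpers, and the sketch again assumes every experiment lasts less than 24 hours.

```r
# minutes since midnight for "H:MM" / "HH:MM" strings
to_minutes <- function(hhmm) {
  parts <- strsplit(as.character(hhmm), ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]), numeric(1))
}

# vectorized duration in minutes; %% wraps negative differences past midnight
duration_minutes <- function(start, stop) {
  (to_minutes(stop) - to_minutes(start)) %% (24 * 60)
}

df <- data.frame(start_t = c("7:35", "23:50", "11:22"),
                 stop_t = c("7:48", "00:15", "12:06"))
df$dur_min <- duration_minutes(df$start_t, df$stop_t)
df
```

Because both helpers work on whole vectors, the same call also works inside mutate().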
The simplest way I can think of involves lubridate:
library(lubridate)
library(dplyr)
#make a fake df
df <- data.frame(start = c('7:35', '23:50', '11:22'), stop = c('7:48', '00:15', '12:06'), stringsAsFactors = FALSE)
#convert to lubridate hours/minutes format, then subtract
df %>%
  mutate(start = hm(start), stop = hm(stop)) %>%
  mutate(dur = stop - start)
Output:
start stop dur
1 7H 35M 0S 7H 48M 0S 13M 0S
2 23H 50M 0S 15M 0S -23H -35M 0S
3 11H 22M 0S 12H 6M 0S 1H -16M 0S
The problem in your case is the second row, which will confuse lubridate: it assumes all of these times fall on the same day, so it returns a negative duration instead of 25 minutes. You should probably add the day:
library(lubridate)
library(dplyr)
#make a fake df
df <- data.frame(start = c('2017/10/08 7:35', '2017/10/08 23:50', '2017/10/08 11:22'), stop = c('2017/10/08 7:48', '2017/10/09 00:15', '2017/10/08 12:06'), stringsAsFactors = FALSE)
#parse as date-times, then subtract
df %>%
mutate(start = ymd_hm(start), stop = ymd_hm(stop)) %>%
mutate(dur= stop - start)
Output:
start stop dur
1 2017-10-08 07:35:00 2017-10-08 07:48:00 13 mins
2 2017-10-08 23:50:00 2017-10-09 00:15:00 25 mins
3 2017-10-08 11:22:00 2017-10-08 12:06:00 44 mins