I want to create a single column with a sequence of dates/times increasing every hour for one year or one month (for example). I was using code like this to generate the sequence:
start.date<-"2012-01-15"
start.time<-"00:00:00"
interval<-60 # 60 minutes
increment.mins<-interval*60
x<-paste(start.date,start.time)
for(i in 1:365){
  print(strptime(x, "%Y-%m-%d %H:%M:%S") + i*increment.mins)
}
However, I am not sure how to specify the range of the sequence of dates and hours. I have also been having problems dealing with the first hour, "00:00:00". What is the best way to specify the length of the date/time sequence for a month, a year, etc.? Any suggestion will be appreciated.
I would strongly recommend using the POSIXct data type. That way you can use seq without any problems and work with the data however you want.
start <- as.POSIXct("2012-01-15")
interval <- 60
end <- start + as.difftime(1, units="days")
seq(from=start, by=interval*60, to=end)
Now you can do whatever you want with your vector of timestamps.
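If you want the sequence to cover a whole month or a whole year rather than a fixed number of days, one option (a sketch building on the above) is to let seq compute the calendar end point and then step through it by hour. Note that the result includes the starting timestamp, so the first hour "00:00:00" is not skipped:
start <- as.POSIXct("2012-01-15")
# one month of hourly timestamps
end.month <- seq(start, by = "1 month", length.out = 2)[2]
hours.month <- seq(from = start, to = end.month, by = "1 hour")
# one year of hourly timestamps
end.year <- seq(start, by = "1 year", length.out = 2)[2]
hours.year <- seq(from = start, to = end.year, by = "1 hour")
head(hours.month)  # begins at "2012-01-15 00:00:00"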
Try this. mondate is very clever about advancing by a month. For example, it will advance the last day of January to the last day of February, whereas other date/time classes tend to overshoot into March. chron does not use time zones, so you can't get the time zone bugs with this code that you can using POSIXct. Here x is from the question.
library(chron)
library(mondate)
start.time.num <- as.numeric(as.chron(x))
# +1 means one month. Use +12 if you want one year.
end.time.num <- as.numeric(as.chron(paste(mondate(x)+1, start.time)))
# 1/24 means one hour. Change as needed.
hours <- as.chron(seq(start.time.num, end.time.num, 1/24))
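To see the month-end overshoot referred to above, compare what base R does when asked to advance the last day of January by one month (an illustrative sketch; the exact printed output will depend on your time zone settings):
# base POSIXct arithmetic rolls "2012-01-31" plus one month over into March
seq(as.POSIXct("2012-01-31"), by = "month", length.out = 2)
# the second element lands on 2012-03-02
# mondate advances to the last day of February instead
library(mondate)
mondate("2012-01-31") + 1
# 2012-02-29 (2012 is a leap year)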
I'm trying to get the next weekday for a vector of dates in R. My approach was to create a vector of weekdays (business days) and then find the date closest to the weekend date I have. The problem is that for Saturdays and some holidays (of which there are a lot in my country) I end up getting the previous weekday, which doesn't work.
This is an example of my problem:
vecDates = as.Date(c("2011-01-11","2011-01-12","2011-01-13","2011-01-14","2011-01-17","2011-01-18",
"2011-01-19","2011-01-20","2011-01-21","2011-01-24"))
testDates = as.Date(c("2011-01-22","2011-01-23"))
findInterval(testDates,vecDates)
For both dates the correct answer should be 10, which is "2011-01-24", but I get 9.
I thought of a solution where I remove all the dates previous to the date I'm analyzing and then use findInterval. It works, but it is not vectorized and therefore rather slow, which does not work for my actual purpose.
Does this do what you want?
vecDates = as.Date(c("2011-01-11","2011-01-12",
"2011-01-13","2011-01-14",
"2011-01-17","2011-01-18",
"2011-01-19","2011-01-20",
"2011-01-21","2011-01-24"))
testDates = as.Date(c("2011-01-20","2011-01-22","2011-01-23"))
get_next_biz_day <- function(testdays, bizdays){
  # findInterval gives the index of the last business day on or before each
  # test date; adding 1 moves to the next business day strictly after it
  o <- findInterval(testdays, bizdays) + 1
  bizdays[o]
}
get_next_biz_day(testDates, vecDates)
#[1] "2011-01-21" "2011-01-24" "2011-01-24"
I have a dataset in .csv format, and I have added my own column to the CSV containing the total time taken for a task to be completed. There are two other columns with the start time and the end time, which is what I calculated the total-time-taken column from. The start time and end time columns are in a datetime format like 5/7/2018 16:13, while the total-time-taken column is in the format 0:08:20 (H:MM:SS).
I understand that for datetimes it is possible to use as.Date or as.POSIXlt to change the variable type from a factor to a date. Is there a function I can use to convert my total-time-taken column (from a factor) so that I can use it for scatterplots/plots in general? I tried as.numeric, but the numbers that come out are gibberish and do not correspond to the original times.
If you want to plot the total time taken for each row, then I would suggest just plotting that difference as seconds. Here is a code snippet which shows how you can convert your start or end date into a numerical value:
start <- "5/7/2018 16:13"
start_date <- as.POSIXct(start, format="%d/%m/%Y %H:%M")
as.numeric(start_date)
[1] 1530799980
The above is a UNIX timestamp, which is the number of seconds since the epoch (January 1, 1970). But since you want a difference between start and end times, this detail does not really matter for you, and the difference you get should be valid.
If you want to use minutes, hours, or some other time unit, then you can easily convert.
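If you would rather convert the existing total-time-taken column directly instead of re-deriving it from the start and end columns, one sketch (assuming the column holds values like "0:08:20") is to go through as.difftime. Note that calling as.numeric on a factor returns the underlying level codes, which is why the numbers looked like gibberish; convert to character first:
total <- factor(c("0:08:20", "1:02:05"))  # stand-in for your column read in as a factor
secs <- as.numeric(as.difftime(as.character(total), format = "%H:%M:%S", units = "secs"))
secs
# [1]  500 3725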
Is there a time interval data (variable) type in R? I have a CSV file with datetime and timeinterval columns. The data type of the datetime column can be POSIXlt, but I don't know how to set a time interval data type for the other column. Is it possible, or what is the best way to handle time intervals in R?
The time interval values in my CSV file look like this [<number of days> %H:%M:%S]:
'0 20:32:59'
In Python pandas, there is a timedelta64[ns] data type for time intervals.
Thank you!
Split the strings into the number of days and the time, using stringi, then use lubridate to manipulate the components.
library(stringi)
library(lubridate)
In the following example:
([0-9]+) means capture one or more digits.
' +' (a space followed by +) means one or more spaces (not captured).
([0-9]{2}:[0-9]{2}:[0-9]{2}) means capture 2 digits, a colon, 2 digits, another colon, and 2 more digits.
x <- "0 20:32:59"
matches <- stri_match_first_regex(x, "([0-9]+) +([0-9]{2}:[0-9]{2}:[0-9]{2})")
The number of days is the second column, and the hours/minutes/seconds are in the third column.
days creates a period of the number of days; hms creates a period of hours, minutes, and seconds.
n_days <- days(as.integer(matches[, 2]))
time <- hms(matches[, 3])
Now your total is just n_days + time, though presumably you want this relative to some origin, for example:
Sys.time() + n_days + time
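If you eventually need the interval as a single number (for sorting or arithmetic), lubridate's period_to_seconds can flatten the period; a self-contained sketch of the whole pipeline:
library(stringi)
library(lubridate)
x <- "0 20:32:59"
matches <- stri_match_first_regex(x, "([0-9]+) +([0-9]{2}:[0-9]{2}:[0-9]{2})")
period_to_seconds(days(as.integer(matches[, 2])) + hms(matches[, 3]))
# [1] 73979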
Yes, see ?difftime.
If your csv is already split into columns, apply as.difftime to one and as.POSIXlt to the other.
For example:
as.difftime(0, units="days") + as.POSIXlt("20:32:59", format="%H:%M:%S")
[Edit]
If the entire result is to be an interval, this would do it:
as.difftime(0, units="days") + as.difftime("20:32:59", format="%H:%M:%S")
I have some timedelta strings which were exported from Python. I'm trying to import them for use in R, but I'm getting some weird results.
When the timedeltas are small, I get results that are off by 2 days, e.g.:
> as.difftime('26 days 04:53:36.000000000',format='%d days %H:%M:%S.000000000')
Time difference of 24.20389 days
When they are larger, it doesn't work at all:
> as.difftime('36 days 04:53:36.000000000',format='%d days %H:%M:%S.000000000')
Time difference of NA secs
I also read some timedelta objects I had processed with Python into R and had a similar issue with the 26 days 04:53:36.000000000 format. As Gregor said, %d in strptime is the day of the month as a zero-padded decimal number, so it won't work with numbers > 31, and there doesn't seem to be an option for cumulative days (probably because strptime is meant for datetime objects, not timedelta objects).
My solution was to convert the objects to strings and extract the numerical data, as Gregor suggested, using the gsub function.
# convert to strings
data$tdelta <- as.character(data$tdelta)
# extract numerical data
days <- as.numeric(gsub('^ *([0-9]+) days.*$','\\1',data$tdelta))
hours <- as.numeric(gsub('^.*ys ([0-9]+):.*$','\\1',data$tdelta))
minutes <- as.numeric(gsub('^.*:([0-9]+):.*$','\\1',data$tdelta))
seconds <- as.numeric(gsub('^.*:([0-9]+)\\..*$','\\1',data$tdelta))
# add up numerical components to whatever units you want
time_diff_seconds <- seconds + minutes*60 + hours*60*60 + days*24*60*60
# add column to data frame
data$tdelta <- time_diff_seconds
That should allow you to do computations with the time differences. Hope that helps.
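As a quick sanity check of the arithmetic on the example string '26 days 04:53:36.000000000' (assuming the regexes above extract 26, 4, 53 and 36):
36 + 53*60 + 4*60*60 + 26*24*60*60
# [1] 2264016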
I have some numbers that represent dates in milliseconds since the epoch (00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970):
1365368400000, 1365973200000, 1366578000000
I'm converting them to date format:
as.Date(as.POSIXct(my_dates/1000, origin="1970-01-01", tz="GMT"))
answer:
[1] "2013-04-07" "2013-04-14" "2013-04-21"
How to convert these strings back to milliseconds since epoch?
Here are your JavaScript dates:
x <- c(1365368400000, 1365973200000, 1366578000000)
You can convert them to R dates more easily by dividing by the number of milliseconds in one day.
y <- as.Date(x / 86400000, origin = "1970-01-01")
To convert back, just convert to numeric and multiply by this number.
z <- as.numeric(y) * 86400000
Finally, check that the answer is what you started with.
stopifnot(identical(x, z))
As per the comment, you may sometimes get numerical rounding errors leading to x and z not being identical. For numerical comparisons like this, use:
library(testthat)
expect_equal(x, z)
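Note that the numbers in the question are not at UTC midnight (they correspond to 21:00 GMT), so the printed date strings in the question ("2013-04-07" and so on) have already lost the time of day and cannot give back the original values exactly. If you need an exact millisecond round trip, a sketch that stays at second precision via POSIXct:
ms <- c(1365368400000, 1365973200000, 1366578000000)
t <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "GMT")
back <- as.numeric(t) * 1000
all.equal(ms, back)  # TRUE; compare with all.equal rather than identical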
I will provide a simple framework for handling various kinds of date encodings and for going back and forth between them. Using the R package lubridate, this is made very easy with the period and interval classes.
When dealing with days, it can be easy, as one can use as.numeric(Date) to get the number of days since the epoch. To get any unit of time smaller than a day, one can convert using the various factors (24 for hours, 24 * 60 for minutes, etc.). However, for months the math can get a bit trickier, and so in many instances I prefer to use this method.
library(lubridate)
as.period(interval(start = epoch, end = Date), unit = 'month')  # month
This can be used for years, months, days, hours, minutes, and smaller units by applying the appropriate factors.
Going the other way, such as being given months since the epoch:
library(lubridate)
epoch %m+% as.period(Date, unit = 'months')
I presented this approach with months as it is probably the most complicated case. An advantage of using periods and intervals is that they can be adjusted to any epoch and unit very easily.
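A concrete sketch of the round trip, assuming an epoch of 1970-01-01 (the numbers are simply what fall out of this particular example date):
library(lubridate)
epoch <- ymd("1970-01-01")
d <- ymd("2013-04-07")
# forward: whole months (plus leftover days) since the epoch
as.period(interval(start = epoch, end = d), unit = 'month')
# roughly "519m 6d 0H 0M 0S"
# back: 519 whole months after the epoch
epoch %m+% months(519)
# [1] "2013-04-01"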