I'm having trouble creating a time series (POSIXct or dttm column) with a row every 15 minutes.
Something that will look like this for every 15 minutes between Jan 1st 2015 and Dec 31st 2016 (here as month/day/year hour:minutes):
1/15/2015 0:00
1/15/2015 0:15
1/15/2015 0:30
1/15/2015 0:45
1/15/2015 1:00
Should I use a loop that starts at 01/01/2015 0:00 and keeps adding 15 minutes until 12/31/2016 23:45?
Does anyone have an idea of how this can be done easily?
A little bit easier to read, using lubridate:
library(lubridate)
seq(ymd_hm('2015-01-01 00:00'),ymd_hm('2016-12-31 23:45'), by = '15 mins')
# 731 days: 365 in 2015 plus 366 in 2016 (a leap year), 96 quarter-hours per day
intervals.15.min <- 0 : (731 * 24 * 60 / 15)
res <- as.POSIXct("2015-01-01", tz = "GMT") + intervals.15.min * 15 * 60
# Drop the single value that falls on 2017-01-01 00:00
res <- res[res < as.POSIXct("2017-01-01", tz = "GMT")]
head(res, 3)
# "2015-01-01 00:00:00 GMT" "2015-01-01 00:15:00 GMT" "2015-01-01 00:30:00 GMT"
tail(res, 3)
# "2016-12-31 23:15:00 GMT" "2016-12-31 23:30:00 GMT" "2016-12-31 23:45:00 GMT"
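For comparison, here is a minimal base R sketch of the same sequence without any package, assuming UTC is an acceptable time zone (the names start, end and stamps are just illustrative). seq() on POSIXct values accepts a step such as "15 min":

```r
# Assumption: UTC is used so that no daylight saving gaps appear in the sequence
start <- as.POSIXct("2015-01-01 00:00", tz = "UTC")
end   <- as.POSIXct("2016-12-31 23:45", tz = "UTC")

# One value every 15 minutes, both endpoints included
stamps <- seq(start, end, by = "15 min")

# 731 days * 96 quarter-hours per day
length(stamps)

# Put the result in a data frame column if a table is needed
df <- data.frame(time = stamps)
```

No explicit loop is needed; the step string does the work.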
Related
Could you please tell me how to rearrange the datetime of data set A so that it is compatible with the datetime of data set B (which is in GMT+10 format)?
Thank you.
**data set A**
sitecode status start end
ANS0009 spike 11/09/2013 04:45:00 PM (GMT+11) 11/09/2013 05:00:00 PM (GMT+11)
ARM0064 spike 05/03/2014 11:00:00 AM (GMT+10) 05/03/2014 11:15:00 AM (GMT+10)
BAS0059 dry 13/01/2013 00:00:00 AM (GMT+11) 29/03/2013 11:45:00 PM (GMT+11)
BAS0059 spike 11/03/2014 10:15:00 AM (GMT+10) 11/03/2014 10:30:00 AM (GMT+10)
BLC0097 failure 12/20/2012 05:00:00 PM (GMT+11) 12/31/2012 11:45:00 PM (GMT+11)
BLC0097 spike 24/12/2015 04:59:45 PM (GMT+10) 24/12/2015 05:01:50 PM (GMT+10)
**data set B**
sitecode status start end
EUM0056 record 2012-12-01 11:00:00 2013-10-06 01:45:00
EUM0056 missing 2013-10-06 01:45:00 2013-10-06 03:00:00
EUM0056 record 2013-10-06 03:00:00 2014-03-11 20:15:00
MDL0026 record 2012-12-07 11:00:00 2013-04-04 19:45:00
MDL0026 missing 2013-04-04 19:45:00 2014-02-27 23:00:00
MDL0026 record 2014-02-27 23:00:00 2014-10-05 01:45:00
We could use lubridate to parse multiple formats after splitting the string in two to remove the (GMT+...) part.
library(lubridate)
library(stringr)
v1 <- strsplit(str1, "\\s+(?=\\()", perl = TRUE)[[1]]
parse_date_time(v1[1], c("%d/%m/%Y %I:%M:%S %p", "%m/%d/%Y %I:%M:%S %p"),
tz= "GMT", exact = TRUE) + lubridate::hours(as.numeric(str_extract(v1[2], "\\d+")))
#[1] "2013-09-12 03:45:00 GMT"
Using the full dataset example
datA[c("start", "end")] <- lapply(datA[c("start", "end")], function(x){
m1 <- do.call(rbind, strsplit(x, "\\s+(?=\\()", perl = TRUE))
parse_date_time(m1[,1], c("%d/%m/%Y %I:%M:%S %p", "%m/%d/%Y %I:%M:%S %p"),
tz = "GMT", exact = TRUE) + lubridate::hours(as.numeric(str_extract(m1[,2], "\\d+")))
})
data
str1 <- "11/09/2013 04:45:00 PM (GMT+11)"
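If the goal is to put every timestamp on the GMT+10 clock used by data set B, the shift is the difference between the stated offset and 10, subtracted from the wall-clock time. Below is a minimal base R sketch under that assumption. The helper name to_gmt10 is hypothetical, the result is stored with a UTC label but should be read as GMT+10 wall-clock time, and parsing %p assumes an English-style AM/PM locale:

```r
# Hypothetical helper: express "dd/mm/yyyy hh:mm:ss AM (GMT+n)" strings on the GMT+10 clock
to_gmt10 <- function(x) {
  # Split off the "(GMT+n)" suffix and keep the numeric offset
  stamp  <- sub("\\s*\\(GMT[+-]\\d+\\)$", "", x)
  offset <- as.numeric(sub(".*\\(GMT([+-]\\d+)\\)$", "\\1", x))

  # Try day/month/year first, then fall back to month/day/year where that fails
  out <- as.POSIXct(stamp, format = "%d/%m/%Y %I:%M:%S %p", tz = "UTC")
  alt <- as.POSIXct(stamp, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
  out[is.na(out)] <- alt[is.na(out)]

  # A GMT+11 clock runs one hour ahead of a GMT+10 clock, so subtract the difference
  out - (offset - 10) * 3600
}

res <- to_gmt10(c("11/09/2013 04:45:00 PM (GMT+11)",
                  "05/03/2014 11:00:00 AM (GMT+10)"))
```

Strings that are ambiguous (day and month both 12 or less) are read as day/month/year here, and values that %I cannot parse may come back as NA, so the output is worth checking against known rows.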
require(lubridate)
exampleA <- c("11/09/2013 04:45:00 PM (GMT+11)",
"11/09/2013 04:45:00 PM (GMT+10)")
exampleA <- as.data.frame(exampleA)
exampleA$flag <- 0
exampleA$flag[grep(" PM \\(GMT\\+11\\)", exampleA$exampleA)] <- 1
exampleA$exampleA <- gsub(" PM \\(GMT\\+11\\)","", exampleA$exampleA)
exampleA$exampleA <- gsub(" PM \\(GMT\\+10\\)","", exampleA$exampleA)
exampleA$exampleA <- mdy_hms(exampleA$exampleA)
exampleA$exampleA[exampleA$flag == 1] <- exampleA$exampleA[exampleA$flag == 1] - 3600
exampleB <- c("2013-11-09 03:45:00", "2013-11-09 04:45:00")
exampleB <- ymd_hms(exampleB)
# Proof it works
exampleA$exampleA == exampleB
[1] TRUE TRUE
If you have a mix of formats in one data set (i.e. mdy, ydm, etc.) you can deal with this by using if statements, either in a function which you can apply or in a for loop, that test whether a certain position has a value >12 to determine the format, and then use the appropriate lubridate function to convert it.
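The "value >12" test can be sketched as follows in base R. This is only an illustration: the function name parse_mixed is hypothetical, the input is assumed to be slash-separated with a 24-hour time, and strings where both leading fields are 12 or less are ambiguous and are read as month/day/year here:

```r
# Hypothetical helper: pick day-first or month-first parsing per element
parse_mixed <- function(x) {
  # First numeric field of each string
  first <- as.integer(sub("^(\\d+)/.*", "\\1", x))
  day_first <- first > 12   # a value above 12 cannot be a month

  # Default to month/day/year, then overwrite the rows that must be day/month/year
  out <- as.POSIXct(x, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
  out[day_first] <- as.POSIXct(x[day_first], format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
  out
}

parsed <- parse_mixed(c("13/01/2013 10:00:00", "12/20/2012 17:00:00"))
```

The vectorised indexing replaces the explicit if statements or for loop, but the decision rule is the same.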
I have recently updated R to version 3.2.3 and now I found a problem using seq with dates:
date1<-as.POSIXct("2014-01-30 02:00:00")
date2<-as.POSIXct("2014-12-24 11:00:00")
seq(date1,date2,by="month")
#[1] "2014-01-30 02:00:00 CET" "2014-03-02 02:00:00 CET"
#[3] NA "2014-04-30 02:00:00 CEST"
#[5] "2014-05-30 02:00:00 CEST" "2014-06-30 02:00:00 CEST"
#[7] "2014-07-30 02:00:00 CEST" "2014-08-30 02:00:00 CEST"
#[9] "2014-09-30 02:00:00 CEST" "2014-10-30 02:00:00 CET"
#[11] "2014-11-30 02:00:00 CET"
I don't understand where the NA comes from. I have tried on different machines, with either the same R version as mine or a previous one, and in place of that NA they correctly give "2014-03-30". Furthermore, if I change the year in the dates from 2014 to 2015, no NAs are returned!
I guess that during the installation something in my locale was modified but I cannot understand how to fix the problem.
Sys.getlocale() returns:
"en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"
and my system is a MacBook Pro with Mavericks.
Thanks for any help!
I was guessing Germany and here's what the situation was in that CET timezone on Mar 30 (a Sunday)
http://www.timetemperature.com/utc-converter/utc-20140330-germany-12.html
UTC or GMT Time Germany
Sunday 30th March 2014 00:00:00 Sun 01:00 AM
Sunday 30th March 2014 01:00:00 Sun 03:00 AM*
Changing the setting to Italy, I get the same result:
UTC or GMT Time Italy
Sunday 30th March 2014 00:00:00 Sun 01:00 AM
Sunday 30th March 2014 01:00:00 Sun 03:00 AM*
The key here is to be suspicious of weirdness when the time is in the early morning hours of a spring or autumn date, or when interval calculations cross such dates. The rules change from year to year, and since countries often make the switch on a Saturday or Sunday morning, the exact dates jump around.
The changes vary by country, and in the US they may vary by state or even by "sub-state" boundaries. In Washington State in 2014, the change happened on the second Sunday of March:
http://www.timetemperature.com/utc-converter/utc-20140309-us-washington+state-12.html
UTC or GMT Time US-Washington State
(several rows snipped)
Sunday 9th March 2014 07:00:00 Sat 11:00 PM
Sunday 9th March 2014 08:00:00 Sun 12:00 AM
Sunday 9th March 2014 09:00:00 Sun 01:00 AM
Sunday 9th March 2014 10:00:00 Sun 03:00 AM*
Sunday 9th March 2014 11:00:00 Sun 04:00 AM*
I'm in the same TZ as Washington State. With that timezone set, one can reproduce the NA, at least on a Mac. The implementation of times and timezones is OS-specific, so these oddities may show up differently on other systems:
> Sys.timezone(location = TRUE)
[1] "America/Los_Angeles"
> date1<-as.POSIXct("2014-01-09 02:00:00")
> date2<-as.POSIXct("2014-12-09 11:00:00")
> seq(date1,date2,by="month")
[1] "2014-01-09 02:00:00 PST" "2014-02-09 02:00:00 PST"
[3] NA "2014-04-09 02:00:00 PDT"
[5] "2014-05-09 02:00:00 PDT" "2014-06-09 02:00:00 PDT"
[7] "2014-07-09 02:00:00 PDT" "2014-08-09 02:00:00 PDT"
[9] "2014-09-09 02:00:00 PDT" "2014-10-09 02:00:00 PDT"
[11] "2014-11-09 02:00:00 PST" "2014-12-09 02:00:00 PST"
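One way to sidestep the gap, sketched here under the assumption that the sequence can be built in UTC (which has no daylight saving switch) and only converted to local time for display:

```r
# Assumption: building the sequence in UTC is acceptable for the analysis
date1 <- as.POSIXct("2014-01-30 02:00:00", tz = "UTC")
date2 <- as.POSIXct("2014-12-24 11:00:00", tz = "UTC")

monthly <- seq(date1, date2, by = "month")

# No element lands on a wall-clock time that does not exist
anyNA(monthly)

# Convert for display only; the underlying instants stay unchanged
local_view <- format(monthly, tz = "Europe/Rome", usetz = TRUE)
```

The second element still rolls over from the nonexistent 30 February to 2 March, as in the original output; that is calendar arithmetic, not a time zone effect.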
Inspecting the relevant code in seq.POSIXt, it appears that a call to seq with by="month" works as follows:
1. [some manipulation of the data]
2. conversion of date1 and date2 to POSIXlt
3. creation of a sequence of month numbers spanning the interval from date1 to date2 (in this case 0,...,11)
4. manual update of date1$mon to this sequence of months (up to this point the dates are all handled properly)
5. finally, conversion of the resulting dates to POSIXct, which is where the NA shows up
While the resulting NA is technically correct, since the code is trying to convert an invalid date-time ("2014-03-30 02:00:00 CET", which does not exist) to POSIXct, could the issue be worked around by going through difftimes? [*]
Not sure it is worth it, though...
[*] By difftimes I mean adding the correct number of seconds to the dates instead of just adding the months.
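A minimal sketch of the difftime idea, assuming a fixed 30-day step is an acceptable stand-in for "one month" and that the "Europe/Rome" zone is available on the system:

```r
# Assumption: Europe/Rome stands in for the asker's CET/CEST locale
date1 <- as.POSIXct("2014-01-30 02:00:00", tz = "Europe/Rome")

# Fixed steps of 30 days, expressed as a difftime (converted to seconds internally)
steps <- as.difftime(30 * (0:3), units = "days")

# Adding elapsed time to a POSIXct can never hit a nonexistent wall-clock time
fixed <- date1 + steps
anyNA(fixed)
```

The price is that the dates drift away from calendar months, and the displayed hour shifts by one after a daylight saving switch, so this is only a workaround when exact month boundaries do not matter.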
I have a list of dates like this:
"2014-01-20 18:47:09 GMT"
"2014-01-20 18:46:59 GMT"
"2014-01-20 18:46:59 GMT"
"2014-01-20 18:46:59 GMT"
"2014-01-20 18:46:41 GMT"
I used this code to split the dates in four-hour intervals
data.frame(table(cut(datenormord, breaks = "4 hour")))
Results are these:
2013-07-22 06:00:00 144
2013-07-22 11:00:00 268
2013-07-22 16:00:00 331
2013-07-22 21:00:00 332
What I want is to see how many observations there are in each four-hour interval, without taking days, months and years into account. For example, I would like to see how many observations there are from 00:00 to 04:00, adding up the observations of every day of every year contained in my dataset.
For example, I want something like this:
01:00:00 1230
06:00:00 2430
11:00:00 3230
You can try removing the dates from your date-times using strftime and then converting the result back to a date-time, which will just add the current day, month and year to all the data points. You can then break and count as in your post.
datenormord<-c("2014-01-20 01:47:09 GMT", "2014-01-20 07:46:59 GMT","2014-01-20 13:46:59 GMT" ,"2014-01-20 18:46:59 GMT" ,"2014-01-20 18:46:41 GMT")
datenormord<-strftime(as.POSIXlt(datenormord), format="%H:%M:%S")
datenormord<-as.POSIXlt(datenormord, format="%H:%M:%S")
result<-data.frame(table(cut(datenormord, breaks = "4 hour")))
You can remove the date in the final data frame as well:
result$Var1<-with(result,format(strftime(Var1,format="%H:%M")))
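An alternative sketch that skips the round trip through strings: take the hour of day directly and bin it. The sample values below are made up for illustration and span several days on purpose; the names stamps, bins and counts are not from the question:

```r
# Made-up sample timestamps on different days (assumption: times are in UTC)
stamps <- as.POSIXct(c("2014-01-20 01:47:09", "2014-01-21 07:46:59",
                       "2014-02-03 13:46:59", "2014-02-03 18:46:59",
                       "2015-06-20 18:46:41"), tz = "UTC")

# Hour of day, 0-23, ignoring the date part
hour_of_day <- as.POSIXlt(stamps)$hour

# Six bins: [0,4), [4,8), ..., [20,24), labelled by their start time
bins <- cut(hour_of_day, breaks = seq(0, 24, by = 4), right = FALSE,
            labels = sprintf("%02d:00", seq(0, 20, by = 4)))

counts <- as.data.frame(table(bins))
```

Because the bins are defined on the hour alone, observations from every day, month and year are pooled automatically, and empty intervals still appear with a count of zero.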
I want to split x (which is a factor)
dd = data.frame(x = c("29-4-2014 06:00:00", "9-4-2014 12:00:00", "9-4-2014 00:00:00", "6-5-2014 00:00:00" ,"7-4-2014 00:00:00" , "29-5-2014 00:00:00"))
x
29-4-2014 06:00:00
9-4-2014 12:00:00
9-4-2014 00:00:00
6-5-2014 00:00:00
7-4-2014 00:00:00
29-5-2014 00:00:00
at the horizontal space and get two columns as:
x.date x.time
29-4-2014 06:00:00
9-4-2014 12:00:00
9-4-2014 00:00:00
6-5-2014 00:00:00
7-4-2014 00:00:00
29-5-2014 00:00:00
Any suggestion is appreciated!
strsplit is typically used here, but you can also use read.table:
read.table(text = as.character(dd$x))
# V1 V2
# 1 29-4-2014 06:00:00
# 2 9-4-2014 12:00:00
# 3 9-4-2014 00:00:00
# 4 6-5-2014 00:00:00
# 5 7-4-2014 00:00:00
# 6 29-5-2014 00:00:00
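For completeness, a short sketch of the strsplit route mentioned above, using two of the question's values and assuming every entry contains exactly one space (the column names x.date and x.time follow the question):

```r
dd <- data.frame(x = c("29-4-2014 06:00:00", "9-4-2014 12:00:00"),
                 stringsAsFactors = TRUE)

# strsplit needs character input, so convert the factor first
parts <- do.call(rbind, strsplit(as.character(dd$x), " ", fixed = TRUE))

# Each row of 'parts' is one entry: column 1 is the date, column 2 the time
dd$x.date <- parts[, 1]
dd$x.time <- parts[, 2]
```

Both new columns are plain character vectors, which is fine for display but not for date arithmetic.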
Other option (better)
# Convert to POSIXct objects
times <- as.POSIXct(dd$x, format="%d-%m-%Y %T")
# You may also want to specify the time zone
times <- as.POSIXct(dd$x, format="%d-%m-%Y %T", tz="GMT")
Then, to extract times
strftime(times, "%T")
[1] "06:00:00" "12:00:00" "00:00:00" "00:00:00" "00:00:00" "00:00:00"
or dates
strftime(times, "%D")
[1] "04/29/14" "04/09/14" "04/09/14" "05/06/14" "04/07/14" "05/29/14"
or, any format you want, really
strftime(times, "%d %b %Y at %T")
[1] "29 Apr 2014 at 06:00:00" "09 Apr 2014 at 12:00:00"
[3] "09 Apr 2014 at 00:00:00" "06 May 2014 at 00:00:00"
[5] "07 Apr 2014 at 00:00:00" "29 May 2014 at 00:00:00"
See, for more info: ?as.POSIXct and ?strftime
Here is another approach using lubridate:
dd = data.frame(x = c("29-4-2014 06:00:00", "9-4-2014 12:00:00", "9-4-2014 00:00:00", "6-5-2014 00:00:00" ,"7-4-2014 00:00:00" , "29-5-2014 00:00:00"),
stringsAsFactors = FALSE)
Note the use of stringsAsFactors = FALSE, which prevents your dates from being read as factors.
library(lubridate)
dd2 <- transform(dd,x2 = dmy_hms(x))
transform(dd2, the_year = year(x2))
x x2 the_year
1 29-4-2014 06:00:00 2014-04-29 06:00:00 2014
2 9-4-2014 12:00:00 2014-04-09 12:00:00 2014
3 9-4-2014 00:00:00 2014-04-09 00:00:00 2014
4 6-5-2014 00:00:00 2014-05-06 00:00:00 2014
5 7-4-2014 00:00:00 2014-04-07 00:00:00 2014
6 29-5-2014 00:00:00 2014-05-29 00:00:00 2014
This is an example of my dataset:
> head(daily[,c(6,7)])->test
> head(test)
timeMin min
316 2013-05-02 13:45:00 3239
317 2013-05-03 12:30:00 3260
318 2013-05-04 12:30:00 3165
319 2013-05-05 12:30:00 3404
320 2013-05-06 12:30:00 3514
321 2013-05-07 13:15:00 3626
I need mean(timeMin), in order to know the time of day (hour:minute) at which the event usually happens. I have tried this:
library(lubridate)
> test$hourMin<-paste(hour(test$timeMin),minute(test$timeMin),sep=":")
> test$hourMin <- hm(test$hourMin)
And I got this:
> head(test)
timeMin min hourMin
316 2013-05-02 13:45:00 3239 13H 45M 0S
317 2013-05-03 12:30:00 3260 12H 30M 0S
318 2013-05-04 12:30:00 3165 12H 30M 0S
319 2013-05-05 12:30:00 3404 12H 30M 0S
320 2013-05-06 12:30:00 3514 12H 30M 0S
321 2013-05-07 13:15:00 3626 13H 15M 0S
However, when I try to calculate the mean I get no useful result:
> mean(test$hourMin)
[1] 0
It should be straightforward, but I don't know how to do it, since I am a beginner. I would appreciate any help. Thanks
It's really not elegant, but the only way I found for now is to change the date components to the same day and to compute the mean on the result. With lubridate :
time <- test$timeMin
time <- update(time, year=2000, month=1, mday=1)
mean(time)
# [1] "2000-01-01 12:50:00 CET"
Hopefully someone will provide something better...
I'm calculating seconds past 1st Jan, 2013 midnight and then taking mean of that and adding it back to 1st Jan, 2013 midnight.
I guess there are packages that can do this from just one command but if you, like me, don't wish to rely too much on packages then this solution should work for you.
library(data.table)
timetable <- data.table(TimeMin = c("2013-05-02 13:45:00",
"2013-05-03 12:30:00",
"2013-05-04 12:30:00",
"2013-05-05 12:30:00",
"2013-05-06 12:30:00",
"2013-05-07 13:15:00")
)
timetable <- timetable[, TimePastMin :=
difftime(
"2013-01-01 00:00:00",
TimeMin,
units = "secs"
)
]
meanTimePastMin <- mean(timetable[, TimePastMin])
meanTimeMin <- strptime("2013-01-01 00:00:00", "%Y-%m-%d %H:%M:%S") - meanTimePastMin
meanTimeMin
# "2013-05-05 00:50:00 IST"
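Note that the data.table result above is the mean of the full date-times, not of the time of day. To average the time of day alone without extra packages, one possible sketch is to convert each timestamp to seconds since midnight first. The values are those from the question; UTC is an assumption, and the variable names are illustrative:

```r
timeMin <- as.POSIXct(c("2013-05-02 13:45:00", "2013-05-03 12:30:00",
                        "2013-05-04 12:30:00", "2013-05-05 12:30:00",
                        "2013-05-06 12:30:00", "2013-05-07 13:15:00"),
                      tz = "UTC")

# Seconds elapsed since midnight for each observation
lt   <- as.POSIXlt(timeMin)
secs <- lt$hour * 3600 + lt$min * 60 + lt$sec

# Average the seconds, then format back to hour:minute
mean_secs <- mean(secs)
mean_time <- sprintf("%02d:%02d",
                     as.integer(mean_secs %/% 3600),
                     as.integer((mean_secs %% 3600) %/% 60))
mean_time
```

A plain arithmetic mean like this is misleading when events cluster around midnight (23:50 and 00:10 would average to noon), so it only suits data like this where all times fall in the same part of the day.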