Create ITime intervals in data.table - r

I have a datetime variable (vardt) stored as a character in a large data table, e.g. "21/07/2011 15:54:57".
I can turn it into ITime class (e.g. 15:54:57) with DT[,newtimevar:=as.ITime(substr(DT$vardt,12,19))] but I would like to create groups of minutes, so from 21/07/2011 15:54:57 I would obtain 15:54:00 or 15:54.
I have tried: DT[,cuttime := as.ITime(cut(DT$vardt, breaks = "1 min",))]
but it didn't work. I am reading the zoo package documentation but I haven't found anything yet. Any idea/function that could be useful for this case in a large data table?

Here are two possible approaches:
library(data.table)
##
x <- Sys.time()+sample(seq(0,24*3600,60),101,TRUE)
x <- gsub(
"(\\d+)\\-(\\d+)\\-(\\d+)",
"\\3/\\2/\\1",
x)
##
DT <- data.table(vardt=x)
##
DT[,time:=as.ITime(substr(vardt,12,19))]
##
DT[,hour_min:=as.ITime(
gsub("(\\d+)\\:(\\d+)\\:(\\d+)",
"\\1\\:\\2\\:00",time))]
DT[,c_hour_min:=substr(time,1,5)]
##
R> head(DT)
vardt time hour_min c_hour_min
1: 28/01/2015 05:38:30 05:38:30 05:38:00 05:38
2: 27/01/2015 14:15:30 14:15:30 14:15:00 14:15
3: 28/01/2015 06:03:30 06:03:30 06:03:00 06:03
4: 28/01/2015 00:37:30 00:37:30 00:37:00 00:37
5: 27/01/2015 17:59:30 17:59:30 17:59:00 17:59
6: 28/01/2015 03:46:30 03:46:30 03:46:00 03:46
R> str(DT,vec.len=2)
Classes ‘data.table’ and 'data.frame': 101 obs. of 4 variables:
$ vardt : chr "28/01/2015 05:38:30" "27/01/2015 14:15:30" ...
$ time :Class 'ITime' int [1:101] 20310 51330 21810 2250 64770 ...
$ hour_min :Class 'ITime' int [1:101] 20280 51300 21780 2220 64740 ...
$ c_hour_min: chr "05:38" "14:15" ...
- attr(*, ".internal.selfref")=<externalptr>
The first case, hour_min, preserves the ITime class, while the second case, c_hour_min, is just a character vector.
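The regex step can also be avoided by parsing the whole string as a date-time and truncating it to the minute. The following is a minimal base-R sketch, assuming the dd/mm/yyyy HH:MM:SS layout from the question; the sample values, the variable names and the UTC time zone are illustrative assumptions.

```r
# Sample strings in the layout shown in the question (values are made up)
vardt <- c("21/07/2011 15:54:57", "21/07/2011 15:55:03", "21/07/2011 23:59:59")

# Parse the character column into POSIXct
dt <- as.POSIXct(vardt, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# trunc() drops the seconds and returns POSIXlt, so convert back to POSIXct
minute_floor <- as.POSIXct(trunc(dt, units = "mins"))

# Time-of-day part only, as character
minute_label <- format(minute_floor, "%H:%M:%S")
print(minute_label)
```

If the ITime class is needed, minute_floor should be usable as input to as.ITime(), since data.table converts POSIXct values to ITime.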

Related

Why does cut() turn my POSIXct vector into a factor vector and what can I do to stop this?

How can I use cut while maintaining the POSIXct class of my date.time vector?
library(data.table)
library(lubridate)
Some data:
air.temp <- c(-1.7202,-1.6524,-1.5689,-1.6785,-1.6060,-1.8843)
soil.temp <- c(3.6972,3.6839,3.6716,3.6586,3.6460,3.6701)
date.time <- c('2007-01-01 00:05:00','2007-01-01 00:10:00',
'2007-01-01 00:15:00','2007-01-01 00:20:00',
'2007-01-01 00:25:00','2007-01-01 00:30:00')
DT <- data.table(date.time, air.temp, soil.temp)
DT[, date.time := parse_date_time(date.time, 'YmdHMS')]
Structure shows the date.time column is in the desired POSIXct format:
str(DT)
Classes ‘data.table’ and 'data.frame': 6 obs. of 3 variables:
$ date.time: POSIXct, format: "2007-01-01 00:05:00" ...
$ air.temp : num -1.72 -1.65 -1.57 -1.68 -1.61 ...
$ soil.temp: num 3.7 3.68 3.67 3.66 3.65 ...
- attr(*, ".internal.selfref")=<externalptr>
Now I cut five minute data to fifteen minute:
DT_15_min <- DT[, lapply(.SD, mean), by=(date.time = cut(date.time, "15 min"))]
Structure shows the conversion to factor vector:
str(DT_15_min)
Classes ‘data.table’ and 'data.frame': 2 obs. of 3 variables:
$ date.time: Factor w/ 2 levels "2007-01-01 00:05:00",..: 1 2
$ air.temp : num -1.65 -1.72
$ soil.temp: num 3.68 3.66
- attr(*, ".internal.selfref")=<externalptr>
Is it possible to cut while maintaining POSIXct vector class?
My desired result is to have my data aggregated from a five minute interval to a fifteen minute interval while maintaining the original class of the vector (POSIXct in this case).
As always, I am grateful for any advice.
cut is designed to return factors. If you want to group by 15 min intervals, you could try using the rounding functions from lubridate, e.g.
DT_15_min <- DT[, lapply(.SD, mean), by=(date.time = floor_date(date.time, "15 mins"))]
str(DT_15_min)
Classes ‘data.table’ and 'data.frame': 3 obs. of 3 variables:
$ date.time: POSIXct, format: "2007-01-01 00:00:00" "2007-01-01 00:15:00" ...
$ air.temp : num -1.69 -1.62 -1.88
$ soil.temp: num 3.69 3.66 3.67
- attr(*, ".internal.selfref")=<externalptr>
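The difference between the two groupings can be reproduced without any packages. This is a base-R sketch, assuming UTC timestamps; from_cut and floored are hypothetical names. It shows that cut() anchors its bins at the first observation and returns a factor, that the factor labels can be converted back to POSIXct, and that flooring the underlying seconds to multiples of 900 aligns the bins to clock quarter-hours instead.

```r
date.time <- as.POSIXct(c("2007-01-01 00:05:00", "2007-01-01 00:10:00",
                          "2007-01-01 00:15:00", "2007-01-01 00:20:00",
                          "2007-01-01 00:25:00", "2007-01-01 00:30:00"),
                        tz = "UTC")

# cut() returns a factor whose labels are the bin start times
binned <- cut(date.time, "15 min")
print(class(binned))

# Option 1: turn the factor labels back into POSIXct
from_cut <- as.POSIXct(as.character(binned), tz = "UTC")

# Option 2: floor the seconds since the epoch to a multiple of 15 * 60
floored <- as.POSIXct(floor(as.numeric(date.time) / 900) * 900,
                      origin = "1970-01-01", tz = "UTC")

# The two options group the rows differently
print(table(format(from_cut, "%H:%M")))
print(table(format(floored, "%H:%M")))
```

Which anchoring is appropriate depends on whether the bins should start at the first observation or at fixed clock times.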
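You can also use dplyr (note that ceiling_date rounds each timestamp up to the next 15-minute boundary, whereas floor_date above rounds down):
library(dplyr)
df <- tibble(date.time, air.temp, soil.temp) %>%
mutate(date.time = ceiling_date(ymd_hms(date.time), unit = "15 mins")) %>%
group_by(date.time) %>% summarise_all(mean)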

Error in converting character [duplicate]

I have data like this.
> head(new3)
Date Hour Dayahead Actual Difference
1 2015-01-01 0:00 42955 42425 530
2 2015-01-01 0:15 42412 42021 391
3 2015-01-01 0:30 41901 42068 -167
4 2015-01-01 0:45 41355 41874 -519
5 2015-01-01 1:00 40710 41230 -520
6 2015-01-01 1:15 40204 40810 -606
Their characteristics are as below:
> str(new3)
'data.frame': 35044 obs. of 5 variables:
$ Date : Date, format: "2015-01-01" "2015-01-01" "2015-01-01" "2015-01-01" ...
$ Hour : chr "0:00" "0:15" "0:30" "0:45" ...
$ Dayahead : chr "42955" "42412" "41901" "41355" ...
$ Actual : int 42425 42021 42068 41874 41230 40810 40461 40160 39958 39671 ...
$ Difference: chr "530" "391" "-167" "-519" ...
I tried to convert Hour and Dayahead to numeric with as.numeric, but it shows me this:
> new3$Dayahead<-as.numeric(new3$Dayahead)
Warning message:
NAs introduced by coercion
> new3$Hour<-as.numeric(new3$Hour)
Warning message:
NAs introduced by coercion
So when I checked with str again, it showed me this.
> str(new3)
'data.frame': 35044 obs. of 5 variables:
$ Date : Date, format: "2015-01-01" "2015-01-01" "2015-01-01" "2015-01-01" ...
$ Hour : num NA NA NA NA NA NA NA NA NA NA ...
$ Dayahead : num 42955 42412 41901 41355 40710 ...
$ Actual : int 42425 42021 42068 41874 41230 40810 40461 40160 39958 39671 ...
$ Difference: chr "530" "391" "-167" "-519" ...
My questions are:
1) why do I have 'NAs introduced by coercion' warning message?
2) How can I solve the problem above?
3) Why do I get NA data for Hour and how can I solve it?
Thank you.
As already mentioned in the comments, a string that contains a non-numeric character (such as the ":" in your Hour column) cannot be converted to numeric, which is why you get NA. The warning for Dayahead has the same cause: at least one of its values is not a plain number.
I am not sure why you want to convert your times to numeric, but if you would like to perform operations on them (e.g., calculate time differences), you should convert your dates to a POSIX date-time class. In your case, run:
new3$fulldate <- as.POSIXlt(paste(new3$Date, new3$Hour, sep = " "))
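To find out which values trigger the coercion warning, it can help to list the entries that become NA. This is a small base-R sketch; the dayahead vector below is invented for illustration (the real offending values in new3$Dayahead are unknown), and bad is a hypothetical name.

```r
# Invented sample: two clean numbers and three values that cannot be parsed
dayahead <- c("42955", "42412", "41,901", "", "n/a")

# Convert quietly, then look at what failed
num <- suppressWarnings(as.numeric(dayahead))
bad <- dayahead[is.na(num) & !is.na(dayahead)]

# Distinct strings that could not be converted
print(unique(bad))
```

Once the offending strings are known (thousands separators, blanks, text markers for missing data), they can be cleaned before calling as.numeric.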
Try this:
hour <- c("0:00", "0:15", "0:30", "0:45", "1:00", "1:15")
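Replace the : with a . and then convert to numeric. Note that the result is not decimal hours: 0.15 here stands for 0 hours 15 minutes, not 0.15 of an hour.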
hour <- gsub(":", ".", hour)
hour <- as.numeric(hour)
hour
[1] 0.00 0.15 0.30 0.45 1.00 1.15
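If true decimal hours are wanted (so that 0:15 becomes a quarter of an hour), the hour and minute parts can be split and combined arithmetically. A minimal base-R sketch, assuming every value has the H:MM form shown in the question; decimal_hours is a hypothetical name.

```r
hour <- c("0:00", "0:15", "0:30", "0:45", "1:00", "1:15")

# Split each string at the colon into hours and minutes
parts <- strsplit(hour, ":", fixed = TRUE)

# hours + minutes / 60 gives decimal hours
decimal_hours <- vapply(
  parts,
  function(p) as.numeric(p[1]) + as.numeric(p[2]) / 60,
  numeric(1)
)
print(decimal_hours)
```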

R -- base::apply - dimension errors (?) [duplicate]

I want to run a function on all periods of an xts matrix. apply() is very fast but the returned matrix has transposed dimensions compared to the original object:
> dim(myxts)
[1] 7429 48
> myxts.2 = apply(myxts, 1 , function(x) { return(x) })
> dim(myxts.2)
[1] 48 7429
> str(myxts)
An 'xts' object from 2012-01-03 09:30:00 to 2012-01-30 16:00:00 containing:
Data: num [1:7429, 1:48] 4092500 4098500 4091500 4090300 4095200 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:48] "Open" "High" "Low" "Close" ...
Indexed by objects of class: [POSIXlt,POSIXt] TZ:
xts Attributes:
NULL
> str(myxts.2)
num [1:48, 1:7429] 4092500 4098500 4091100 4098500 0 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:48] "Open" "High" "Low" "Close" ...
..$ : chr [1:7429] "2012-01-03 09:30:00" "2012-01-03 09:31:00" "2012-01-03 09:32:00" "2012-01-03 09:33:00" ...
> nrow(myxts)
[1] 7429
> head(myxts)
Open High Low Close
2012-01-03 09:30:00 4092500 4098500 4091100 4098500
2012-01-03 09:31:00 4098500 4099500 4092000 4092000
2012-01-03 09:32:00 4091500 4095000 4090000 4090200
2012-01-03 09:33:00 4090300 4096400 4090300 4094900
2012-01-03 09:34:00 4095200 4100000 4095200 4099900
2012-01-03 09:35:00 4100000 4100000 4096500 4097500
How can I preserve myxts dimensions?
That's what apply is documented to do. From ?apply:
Value:
If each call to ‘FUN’ returns a vector of length ‘n’, then ‘apply’
returns an array of dimension ‘c(n, dim(X)[MARGIN])’ if ‘n > 1’.
In your case, 'n'=48 (because you're looping over rows), so apply will return an array of dimension c(48, 7429).
Also note that myxts.2 is not an xts object. It's a regular array. You have a couple options:
Transpose the results of apply before re-creating your xts object:
data(sample_matrix)
myxts <- as.xts(sample_matrix)
dim(myxts) # [1] 180 4
myxts.2 <- apply(myxts, 1 , identity)
dim(myxts.2) # [1] 4 180
myxts.2 <- xts(t(apply(myxts, 1 , identity)), index(myxts))
dim(myxts.2) # [1] 180 4
Vectorize your function so it operates on all the rows of an xts
object and returns an xts object. Then you don't have to worry
about apply at all.
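Both options can be illustrated on a plain matrix, without xts. This is a minimal base-R sketch; the matrix m and the function double_it are made-up stand-ins for the real data and the real per-row function.

```r
# Small stand-in for the price matrix: 4 periods, 3 columns
m <- matrix(1:12, nrow = 4,
            dimnames = list(NULL, c("Open", "High", "Low")))

double_it <- function(x) x * 2

# apply() over rows returns one column per row, so dimensions are swapped
by_row <- apply(m, 1, double_it)
print(dim(by_row))

# Option 1: transpose to get the original orientation back
restored <- t(by_row)
print(dim(restored))

# Option 2: call the vectorised function on the whole matrix, no apply needed
vectorised <- double_it(m)
print(all.equal(restored, vectorised))
```

When the function is already vectorised, the second option also avoids the loss of the object's class, because apply() always returns a plain array.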
Finally, please start providing reproducible examples. It's not that hard and it makes it a lot easier for people to help. I've provided an example above and I hope you can use it in your following questions.

ifelse Statement Returning Number Instead Of Date

I have a series of dates in my code that are in an ifelse statement, that are returning a single numerical value instead of a date.
library(lubridate)
osa <- read.delim("C:/RMathew/RScripts/osaevents/osaevents.txt", stringsAsFactors=TRUE)
#
osa$datetime <- ymd_hms(osa$datetime)
osa$date <- as.Date(osa$datetime)
sixoclock <- 6*60*60
osa$daystart <- ymd_hms(ymd(osa$date) + sixoclock)
osa$dateplus <- osa$date + 1
osa$dateminus <- osa$date - 1
osa$dayend <- ymd_hms(ymd(osa$dateplus) + sixoclock)
osa$dateloca <- osa$datetime >= osa$daystart
osa$datelocb <- osa$datetime < osa$dayend
osa$milldate <- ifelse(osa$dateloca==TRUE & osa$datelocb==TRUE,
osa$date,osa$dateminus)
The place where this data originates considers the time between 6 AM on any given day to 6 AM the following day, as one day. The code above is trying to compare the date to the question of is it after 6 AM on a particular day, but before 6 AM on the following day, to assign it the earlier day's date (for whatever day it might be).
So far so good, but it returns a single number for the osa$milldate instead of the dates in the ifelse columns.
'data.frame': 897 obs. of 16 variables:
$ datetime : POSIXct, format: "2015-08-13 15:11:53" "2015-08-13 14:53:26" "2015-08-13 14:34:58" "2015-08-13 14:16:18" ...
$ stream : Factor w/ 1 level "fc": 1 1 1 1 1 1 1 1 1 1 ...
$ fe : num 18.1 18 17.6 18.1 18.5 ...
$ ni : num 2.97 2.99 2.92 3.2 3.32 ...
$ cu : num 3.41 3.35 2.99 3.58 3.73 ...
$ pd : num 138 157 139 166 183 ...
$ mg : num 13.8 13.8 14.4 14.3 13.9 ...
$ so : num 9.67 9.81 9.65 10.58 11.37 ...
$ date : Date, format: "2015-08-13" "2015-08-13" "2015-08-13" "2015-08-13" ...
$ daystart : POSIXct, format: "2015-08-13 06:00:00" "2015-08-13 06:00:00" "2015-08-13 06:00:00" "2015-08-13 06:00:00" ...
$ dateplus : Date, format: "2015-08-14" "2015-08-14" "2015-08-14" "2015-08-14" ...
$ dateminus: Date, format: "2015-08-12" "2015-08-12" "2015-08-12" "2015-08-12" ...
$ dayend : POSIXct, format: "2015-08-14 06:00:00" "2015-08-14 06:00:00" "2015-08-14 06:00:00" "2015-08-14 06:00:00" ...
$ dateloca : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
$ datelocb : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
$ milldate : num 16660 16660 16660 16660 16660 ...
Thoughts? Also, there is likely to be a more elegant way to do this.
See the help file for ifelse
Warning:
The mode of the result may depend on the value of ‘test’ (see the
examples), and the class attribute (see ‘oldClass’) of the result
is taken from ‘test’ and may be inappropriate for the values
selected from ‘yes’ and ‘no’.
Sometimes it is better to use a construction such as
(tmp <- yes; tmp[!test] <- no[!test]; tmp)
, possibly extended to handle missing values in ‘test’.
This describes precisely what is going on in your example (the Date class attribute is lost) and suggests a workaround, a multi-step approach:
osa$milldate <- osa$date
ind <- osa$dateloca & osa$datelocb
# subset the replacement values with the same index as the target
osa$milldate[!ind] <- osa$dateminus[!ind]
Another option is replace.
A. Webb set me on the right path: ifelse was stripping the Date class from the result. The index-based solution above seemed to jumble the date order when I tried it (the replacement values need the same [!ind] subset as the target). As A. Webb pointed out from the help file, restoring the class fixed it immediately:
class(osa$milldate) <- class(osa$date)
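The behaviour is easy to reproduce on two timestamps. The sketch below is base R only and assumes UTC; the second timestamp and the names after_six, lost and shifted are invented for illustration. It shows the class being dropped by ifelse, the index-based fix, and a possibly simpler alternative for the 6 AM rule: subtract six hours and take the date.

```r
datetime <- as.POSIXct(c("2015-08-13 15:11:53", "2015-08-14 03:20:00"),
                       tz = "UTC")
date <- as.Date(datetime, tz = "UTC")

# TRUE when the event falls at or after 6 AM on its calendar day
after_six <- as.numeric(format(datetime, "%H")) >= 6

# ifelse() keeps the values but drops the Date class
lost <- ifelse(after_six, date, date - 1)
print(class(lost))

# Index-based replacement keeps the class; both sides use the same subset
milldate <- date
milldate[!after_six] <- (date - 1)[!after_six]
print(milldate)

# Alternative: shift the clock back six hours, then take the date
shifted <- as.Date(datetime - 6 * 3600, tz = "UTC")
print(identical(milldate, shifted))
```

The shifted version needs no comparison columns at all, though it assumes the timestamps and the 6 AM boundary are expressed in the same time zone.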

Why does range(index(x)) inside rollapply behave differently to outside (and just for my artificial data!)

I was about to blog about a useful R function I'd made, went to create some dummy data, but the dummy data behaves differently! Help!
library(xts)
data=xts(1:139,Sys.Date()-139:1)
Looking at it, it all looks good:
> head(data)
[,1]
2012-03-07 1
2012-03-08 2
2012-03-09 3
2012-03-10 4
2012-03-11 5
2012-03-12 6
> tail(data)
[,1]
2012-07-18 134
2012-07-19 135
2012-07-20 136
2012-07-21 137
2012-07-22 138
2012-07-23 139
> head(index(data))
[1] "2012-03-07" "2012-03-08" "2012-03-09" "2012-03-10" "2012-03-11" "2012-03-12"
> tail(index(data))
[1] "2012-07-18" "2012-07-19" "2012-07-20" "2012-07-21" "2012-07-22" "2012-07-23"
> range(index(data))
[1] "2012-03-07" "2012-07-23"
But, rollapply is weird. The range(index()) gives "1 40" instead of the strings.
> rollapply(data,width=40,by=30,FUN=function(x){print(range(index(x)));length(x)})
[1] 1 40
[1] 1 40
[1] 1 40
[1] 1 40
2012-03-26 40
2012-04-25 40
2012-05-25 40
2012-06-24 40
This is officially weird, because on my real data rollapply outputs a date range as strings. Comparing str on my real data and on the above artificial data shows they are identical. In particular, both say 'Indexed by objects of class: [Date] TZ:' and both say 'tclass: chr "Date"'.
Well, no, I exaggerate; the following artificial data has identical structure to my real data:
data=xts(data.frame(a=1:139,b=seq(3.14,by=0.01,length.out=139)),Sys.Date()-139:1)
It has exactly the same weird rollapply issue.
P.S. The useful function I mentioned is a rollapply wrapper; I've not shown it above because I don't need to: the core xts rollapply shows the problem too. But I'll post a link to it, in a comment, when I finally blog about it :-)
UPDATE
Here is some output with an xts object where it works:
> rollapply(data,width=40,by=30,FUN=function(x){print(class(x));print(range(index(x)));length(x)})
[1] "xts" "zoo"
[1] "2012-01-02" "2012-02-24"
...
> class(data)
[1] "xts" "zoo"
> str(data)
An ‘xts’ object from 2012-01-02 to 2012-07-18 containing:
Data: num [1:139, 1] 76.9 76.7 76.7 77.1 76.9 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr "Close"
Indexed by objects of class: [Date] TZ:
xts Attributes:
List of 2
$ tclass: chr "Date"
$ tzone : chr ""
Here is some output with my artificial xts object (except I've added: colnames(data)=c("Close"))
> rollapply(data,width=40,by=30,FUN=function(x){print(class(x));print(range(index(x)));length(x)})
[1] "integer"
[1] 1 40
...
> class(data)
[1] "xts" "zoo"
> str(data)
An ‘xts’ object from 2012-03-07 to 2012-07-23 containing:
Data: int [1:139, 1] 1 2 3 4 5 6 7 8 9 10 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr "Close"
Indexed by objects of class: [Date] TZ:
xts Attributes:
List of 2
$ tclass: chr "Date"
$ tzone : chr ""
I.e. identical str/class, identical function call, but different result. The xts object where it works is read from a csv file using this code:
d=read.table(fname,sep=',',header=T,stringsAsFactors=F)
x=as.xts(subset(d,select=-datestamp),order.by=as.Date(d$datestamp))
Observe the following:
rollapply(data,width=40,by=30,FUN=function(x){class(x)})
2012-03-26 integer
2012-04-25 integer
2012-05-25 integer
2012-06-24 integer
rollapply is passing the subsets of data as integer rather than xts objects.
The code for zoo:::rollapply.zoo appears to only use standard [ subsetting so it's not clear why the class information is being lost.
Edit
Actually there is a line:
dat <- mapply(f, seq_along(time(data)), width, MoreArgs = list(data = coredata(data),
...), SIMPLIFY = FALSE)
So only the coredata is being passed to the eventual function. This means you can't use rollapply to get these partial ranges.
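One way around this is to roll over positions rather than over the data, and to look up the dates for each window yourself. The following is a base-R sketch of that idea, not part of the zoo or xts API; dates and values stand in for index(data) and coredata(data), the start date is taken from the printed example, and ends and ranges are hypothetical names.

```r
# Stand-ins for index(data) and coredata(data) from the question
dates  <- as.Date("2012-03-07") + 0:138
values <- 1:139

width <- 40
by    <- 30

# Last position of each window: 40, 70, 100, 130
ends <- seq(width, length(values), by = by)

# Date range covered by each window, with the Date class intact
ranges <- lapply(ends, function(e) range(dates[(e - width + 1):e]))

# Window lengths, matching what the question computes with length(x)
lengths_per_window <- vapply(
  ends, function(e) length(values[(e - width + 1):e]), integer(1)
)

for (r in ranges) print(format(r))
```

The same positions could presumably be used to subset the xts object directly, which would keep both the data and the index together inside the function.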
