R: Aggregating by dates with POSIXct? - r

I have some zoo series that use POSIXct index.
In order to aggregate by days I've tried these two ways:
aggregate(myzoo, format(index(myzoo), "%Y-%m-%d"), sum)
aggregate(myzoo,as.Date(index(myzoo)),sum)
I don't know why they don't give the same output.
The myzoo series had the weekends removed. The "as.Date way" seems to be OK, but the "format way" aggregation gives me data on the weekends.
Why?
Which one is right?
I've even tried as.POSIXct(format(...)).

As I mentioned in my comment, you need to be careful when changing the format of a timestamp that includes time with a time zone, because it can get shifted between days. Without any data, it's hard to say exactly what your problem is, but you might also try apply.daily from xts:
apply.daily(myzoo, sum)
Here's a working example:
> library(xts)  # also loads zoo
> x <- zoo(2:20, as.POSIXct("2003-02-01") + (2:20) * 7200)
> apply.daily(x, sum)
2003-02-01 22:00:00 2003-02-02 16:00:00
                 65                 144
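To illustrate the day shift mentioned above, here is a minimal base-R sketch. The timestamp and the two time zones are made up for illustration and are not taken from the question's data. The only point is that the same instant can fall on different calendar days depending on the zone used for the conversion, which may be why the two aggregations disagree.

```r
# One instant: late Friday evening in New York, already Saturday in UTC.
# (Timestamp and zones are illustrative assumptions.)
x <- as.POSIXct("2003-01-31 23:30:00", tz = "America/New_York")

# format() with the local zone keeps the local calendar day.
day_local <- format(x, "%Y-%m-%d", tz = "America/New_York")

# as.Date() with tz = "UTC" uses the UTC calendar day instead.
day_utc <- format(as.Date(x, tz = "UTC"))

print(c(local = day_local, utc = day_utc))
```

Passing the same tz explicitly to both format() and as.Date() is one way to keep the two groupings consistent.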

Related

Formatting 24-hour time variable to capture observations in different ranges

I currently have a data frame with a column for Start.Time (imported from a *.csv file), and the format is in 24 hour format (e.g., 20:00:00 equals 8pm). My goal is to capture observations with a start time in various intervals (e.g., between 9:00:00 and 10:00:00), which also meet other criteria. However, it seems that R sorts this 'character' variable in a way that does not align with how our day goes (e.g., 14:00:00 is considered a lower value than 9:00:00).
For example, below is a line of code that works as intended, where I am capturing observations on two different trail segments, which had a start time between 8:00:00 and 9:00:00.
RLLtoMist8.9<-sum((dataset1$Trail.Segment==52|dataset1$Trail.Segment==55) &
(dataset1$Start.Time>="8:00" & dataset1$Start.Time < "9:00"),
na.rm=TRUE)
RLLtoMist8.9
But, this code below does not work as intended, as R is 'valuing' 9:00:00 as greater than 10:00:00.
RLLtoMist9.10 <-
sum((dataset1$Trail.Segment==52|dataset1$Trail.Segment==55) &
(dataset1$Start.Time>="9:00:00 AM" & dataset1$Start.Time < "10:00:00 AM"),
na.rm=TRUE)
It's certainly true that character types are sorted so that "14:00" is less than "9:00". However R has a datetime class which would sort times correctly once a character representation has been parsed.
a <- as.POSIXct("14:00", format="%H:%M")
b <- as.POSIXct("8:00", format="%H:%M")
# test
> a < b
[1] FALSE
You would be able to convert an entire column with:
dataset1$Start.Time <- as.POSIXct(dataset1$Start.Time, format="%H:%M")
The dates of a and b were the system date at the time of conversion, so if you printed them you would see dates and times in the default format. There are packages, such as chron, that let you use just times, but POSIXt objects necessarily have both dates and times. See ?DateTimeClasses. The lubridate package also has an 'interval' class, and there is a difftime function in base R.
There are also seq.POSIXt and cut.POSIXt functions, either of which could be used to create multiple time or date boundaries for categorical transformations of datetimes.
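As a rough illustration of the points above, the sketch below parses a few made-up time strings (not the asker's data), counts those falling in a half-open interval, and bins them by hour with cut(). The variable names and the fixed tz = "UTC" are assumptions chosen so that all values share the same date and zone.

```r
# Made-up start times in 24-hour format (illustrative only).
start_chr <- c("8:15:00", "9:30:00", "14:00:00", "9:59:59")

# Parse once; every value gets the current date, so comparisons stay consistent.
start <- as.POSIXct(start_chr, format = "%H:%M:%S", tz = "UTC")
lower <- as.POSIXct("9:00:00", format = "%H:%M:%S", tz = "UTC")
upper <- as.POSIXct("10:00:00", format = "%H:%M:%S", tz = "UTC")

# Count observations with 9:00 <= start < 10:00.
n_9_to_10 <- sum(start >= lower & start < upper, na.rm = TRUE)
print(n_9_to_10)

# Alternatively, bin all times into hourly intervals in one step.
hour_bin <- cut(start, breaks = "hour")
print(table(hour_bin))
```

The cut() variant may be more convenient when counts are needed for every hour of the day, since it avoids writing one comparison per interval.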
Using the data.table library:
library(data.table)
# convert to a data.table
dataset1 <- as.data.table(dataset1)
# parse the character column as date-times rather than character
dataset1[, Start.Time := as.POSIXct(Start.Time, format = "%H:%M:%S")]
# now do your filtering (note that between() includes both endpoints by default)
dataset1[between(Start.Time, as.POSIXct("09:00:00", format = "%H:%M:%S"), as.POSIXct("10:00:00", format = "%H:%M:%S")) & (Trail.Segment == 52 | Trail.Segment == 55)]

NA difference in as.difftime R

This might look like a duplicate of "Find time difference in days with R", but I don't think it is.
The problem is simple. I have two time stamps (format='%H:%M:%S'):
times <- c('02:51:43', '02:45:52')
and I want to calculate the time difference; however, my attempt results in unwanted behaviour:
as.difftime(times[1], times[2])
# Time difference of NA secs
I tried to specify format along with the units='secs', but I get the error that the argument time2 is not used.
Can someone give me a hint where I make a mistake?
(Sorry in advance, but I ain't even sure if it is reproducible.)
We can convert the times into POSIXct format and then subtract
x1 <- as.POSIXct(times, format = "%H:%M:%S", tz = "UTC")
x1[1] - x1[2]
#Time difference of 5.85 mins
which is also equivalent to
difftime(x1[1], x1[2])
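A possible explanation for the NA, offered as a sketch: the second positional argument of as.difftime() is format, not a second time, so times[2] was being used as a format string. The code below passes the format explicitly, in which case each string is treated as a duration measured from 00:00:00, and then compares the result with the POSIXct route using explicit units.

```r
times <- c("02:51:43", "02:45:52")

# Each string becomes a duration measured from 00:00:00.
d <- as.difftime(times, format = "%H:%M:%S", units = "secs")

# Subtracting the two durations gives the gap between the time stamps.
gap <- d[1] - d[2]
print(gap)

# The same gap via POSIXct and difftime(), with explicit units.
x1 <- as.POSIXct(times, format = "%H:%M:%S", tz = "UTC")
gap2 <- difftime(x1[1], x1[2], units = "secs")
print(gap2)
```

Setting units explicitly avoids depending on the "auto" choice, which otherwise picks secs, mins, hours or days based on the size of the difference.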
I also encountered this problem; assigning the date class to the objects again made it work.
Suppose I have two Date objects:
day1<-as.Date('2018-12-31')
day2<-as.Date('2019-12-31')
But this 'Time difference of NA secs' occurred, so I simply do this:
day1<-as.Date(day1)
day2<-as.Date(day2)
Then it worked fine:
difftime(day2,day1,units="days")
#Time difference of 365 days
Hope this helps.

Calculating Time Differences using xts in R

I was wondering if there was a way to calculate time differences using the xts package without having to convert time values etc. if possible. I have an xts object with a time format given as 2010-02-15 13:35:59.123 (where the .123 is the milliseconds).
Now, I would like to find the number of milliseconds until the end of the day (i.e. 17:00:00). The problem however is that I basically have to do a few conversions of the data before I can do this (such as using as.POSIXct) and this becomes more complicated since I have to do it for several different days and possibly even different times. For this reason, I would prefer to not have to convert the "end of day time" and leave it as 17:00:00 such that in order to find the number of milliseconds between the present time and the end of day time I can just have a fairly simple operation such as 17:00:00.000 - 13:35:59.123 = ...
Is there a simple way to do this with minimal conversions? I'm certain xts has a function which I don't know of but I couldn't find anything in the documentation :/
EDIT: I forgot to mention, I tried the more 'straightforward' route by trying to compute the time differences by first trying to use the function as.POSIXct(16:00:00, format = "%H:%M:%S") but this gives an error, and I'm honestly not sure why...
You should be able to do this using a combination of ave(), .indexDate(), and a custom function. You didn't provide a reproducible example, so here's one using the daily data that comes with xts.
library(xts)
data(sample_matrix)
x <- as.xts(sample_matrix)
secsRemaining <- function(x) { end(x) - index(x) }
tdiff <- ave(x[,1], as.yearmon(index(x)), FUN = secsRemaining)
tdiff[86:92,]
# Open
# 2007-03-28 259200
# 2007-03-29 172800
# 2007-03-30 86400
# 2007-03-31 0
# 2007-04-01 2505600
# 2007-04-02 2419200
# 2007-04-03 2332800
In your case, the call would use .indexDate(x) instead of as.yearmon(index(x)).
tdiff <- ave(x[,1], .indexDate(x), FUN = secsRemaining)
Also note that this call to ave() only works on a 1-column xts object. It seems like a bug that it doesn't work on multi-column objects. Also note that you have to use FUN = with ave(), since the FUN argument occurs after the ... argument.
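For the intraday case in the question (milliseconds until 17:00:00 on each timestamp's own day), a base-R sketch is shown below. It is not an xts function; the second timestamp, the variable names and tz = "UTC" are assumptions made for illustration. Note that the end-of-day time has to be a quoted string: unquoted, 16:00:00 is evaluated as a numeric expression using the : operator rather than as a time.

```r
# Example timestamps with millisecond precision (second value is made up).
stamps <- as.POSIXct(c("2010-02-15 13:35:59.123",
                       "2010-02-16 16:59:59.500"), tz = "UTC")

# Build the 17:00:00 cutoff on the same calendar day as each timestamp.
day_end <- as.POSIXct(paste(format(stamps, "%Y-%m-%d"), "17:00:00"),
                      tz = "UTC")

# Difference in seconds, converted to milliseconds.
ms_left <- as.numeric(difftime(day_end, stamps, units = "secs")) * 1000
print(ms_left)
```

With an xts object x, the same idea could be applied to index(x) in place of stamps, since the cutoff is computed per timestamp and no grouping is needed.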

Data aggregation loop in R

I am facing a problem concerning aggregating my data to daily data.
I have a data frame where NAs have been removed (Link of picture of data is given below). Data has been collected 3 times a day, but sometimes due to NAs, there is just 1 or 2 entries per day; some days data is missing completely.
I am now interested in calculating the daily mean of "dist": this means summing up the data of "dist" of one day and dividing it by number of entries per day (so 3 if there is no data missing that day). I would like to do this via a loop.
How can I do this with a loop? The problem is that sometimes I have 3 entries per day and sometimes just 2 or even 1. I would like to tell R that for every day, it should sum up "dist" and divide it by the number of entries that are available for every day.
I just have no idea how to formulate a for loop for this purpose. I would really appreciate if you could give me any advice on that problem. Thanks for your efforts and kind regards,
Jan
Data frame: http://www.pic-upload.de/view-11435581/Data_loop.jpg.html
Edit: I used aggregate and tapply as suggested, however, the mean value of the data was not really calculated:
              Group.1        x
1 2006-10-06 12:00:00 636.5395
2 2006-10-06 20:00:00 859.0109
3 2006-10-07 04:00:00 301.8548
4 2006-10-07 12:00:00 649.3357
5 2006-10-07 20:00:00 944.8272
6 2006-10-08 04:00:00 136.7393
7 2006-10-08 12:00:00 360.9560
8 2006-10-08 20:00:00      NaN
The code used was:
dates<-Dis_sub$date
distance<-Dis_sub$dist
aggregate(distance,list(dates),mean,na.rm=TRUE)
tapply(distance,dates,mean,na.rm=TRUE)
Don't use a loop. Use R. Some example data :
dates <- rep(seq(as.Date("2001-01-05"),
                 as.Date("2001-01-20"),
                 by = "day"),
             each = 3)
values <- rep(1:16,each=3)
values[c(4,5,6,10,14,15,30)] <- NA
and any of :
aggregate(values,list(dates),mean,na.rm=TRUE)
tapply(values,dates,mean,na.rm=TRUE)
gives you what you want. See also ?aggregate and ?tapply.
If you want a data frame back, you can look at the package plyr:
Data <- data.frame(dates, values)
library(plyr)
ddply(Data, "dates", summarise, mean_value = mean(values, na.rm = TRUE))
Keep in mind that ddply does not fully support the date format (yet).
Look at the data.table package, especially if your data is huge. Here is some code that calculates the mean of dist by day.
library(data.table)
dt <- data.table(Data)  # Data: the data frame holding the 'date' and 'dist' columns
dt[, list(avg_dist = mean(dist, na.rm = TRUE)), by = 'date']
It looks like your main problem is that your date field has times attached. The first thing you need to do is create a column that has just the date using something like
Dis_sub$date_only <- as.Date(Dis_sub$date)
Then using Joris Meys' solution (which is the right way to do it) should work.
However if for some reason you really want to use a loop you could try something like
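days <- unique(Dis_sub$date_only)
newFrame <- data.frame()
for (i in seq_along(days)) {
  # mean of all entries recorded on this calendar day
  meanDist <- mean(Dis_sub$dist[Dis_sub$date_only == days[i]], na.rm = TRUE)
  newFrame <- rbind(newFrame, data.frame(date = days[i], meanDist = meanDist))
}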
But keep in mind that this will be slow and memory-inefficient.
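The date-only approach described above can be sketched end to end as follows. The distance values here are invented for illustration (they are not the asker's data), and tz = "UTC" is an assumption; the column names follow the question.

```r
# Small stand-in for Dis_sub: timestamps several times a day, one NA.
Dis_sub <- data.frame(
  date = as.POSIXct(c("2006-10-06 12:00:00", "2006-10-06 20:00:00",
                      "2006-10-07 04:00:00", "2006-10-07 12:00:00"),
                    tz = "UTC"),
  dist = c(600, 800, 300, NA)
)

# Strip the time of day so that all entries of one day share a key.
Dis_sub$date_only <- as.Date(Dis_sub$date, tz = "UTC")

# Daily mean; the formula interface drops rows with NA in dist by default.
daily <- aggregate(dist ~ date_only, data = Dis_sub, FUN = mean)
print(daily)
```

Because the mean is taken over whatever rows exist for each day, days with one, two or three entries are handled the same way without any loop.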

15min time aggregation in R

I have a zoo series in R. I can choose between a chron or a POSIXct index.
How can I aggregate to 15min, taking the last element every 15min?
I know how to aggregate daily, writing as.Date, but not how to aggregate every 15min.
thanks.
If I recall, this is documented in the zoo vignettes. Did you look there?
The xts package, which builds on zoo, has helper functions: see help(to.period) in particular, and the to.minutes15 function.
Here are a couple of possibilities depending on what you want. Both make use of trunc.times from the chron package. The aggregate.zoo solution takes the last value within each 15 minute interval and labels it using the time at the beginning of the 15 minute interval so the times used are: 00:00:00, 00:15:00, 00:30:00 and 00:45:00. The duplicated solution uses the same values but labels them using the last time actually found in the data. In both cases we only include intervals for which data is present.
There are more examples of aggregate.zoo in (1) ?aggregate.zoo, (2) all three of the zoo vignettes have examples and (3) searching the r-help archives for the words aggregate.zoo and trunc finds even more examples.
library(zoo)
library(chron)
z <- zoo(1:10, chron(1:10/(24*13)))
# 1. last value in each 15 minute interval
# using time at which interval begins
aggregate(z, trunc(time(z), "00:15:00"), tail, 1)
# 2. last value in each 15 minute interval
# time of last point in data within interval
z[!duplicated(trunc(time(z), "00:15:00"), fromLast = TRUE)]
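Since the question also mentions a POSIXct index, here is a sketch of the first approach without chron. It is a variant, not the answer's own code: the 15-minute bucket is computed by flooring the epoch seconds to a multiple of 900, and the sample data and tz = "UTC" are made up for illustration.

```r
library(zoo)

# Made-up series: six observations at irregular offsets (in seconds).
start <- as.POSIXct("2010-01-01 00:00:00", tz = "UTC")
z <- zoo(1:6, start + c(60, 300, 840, 1000, 1700, 1900))

# Floor each timestamp to the beginning of its 15-minute (900 s) interval.
bucket <- as.POSIXct(floor(as.numeric(time(z)) / 900) * 900,
                     origin = "1970-01-01", tz = "UTC")

# Last value in each interval, labelled by the interval start.
agg <- aggregate(z, bucket, tail, 1)
print(agg)
```

As with the chron version, only intervals that actually contain data appear in the result.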

Resources