I have a zoo series in R. I can choose between a chron or a POSIXct index.
How can I aggregate to 15min, taking the last element every 15min?
I know how to aggregate daily, writing as.Date, but not how to aggregate every 15min.
thanks.
If I recall, this is documented in the zoo vignettes. Did you look there?
The xts package, which builds on zoo, has helper functions: see help(to.period) in particular, and the to.minutes15 function.
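As a rough sketch of the xts route, the example below builds a small made-up series (12 points, one every 5 minutes, purely for illustration) and keeps the last observation of each 15-minute block. It uses endpoints() to locate the block ends. The to.period() call with OHLC = FALSE should select the same rows, whereas to.minutes15 by default returns open/high/low/close columns instead.

```r
library(xts)

# Hypothetical sample data: 12 observations, one every 5 minutes (UTC)
idx <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC") + 300 * (0:11)
x <- xts(1:12, order.by = idx)

# Row numbers of the last observation in each 15-minute block
ep <- endpoints(x, on = "minutes", k = 15)

# Last value per block, labelled with the time of that last observation
last15 <- x[ep]

# Same selection via to.period(); OHLC = FALSE keeps the raw last values
last15_tp <- to.period(x, period = "minutes", k = 15, OHLC = FALSE)
```

Note that the result is labelled with the timestamp of the last point found in each block, not with the start of the block.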
Here are a couple of possibilities depending on what you want. Both make use of trunc.times from the chron package. The aggregate.zoo solution takes the last value within each 15 minute interval and labels it using the time at the beginning of the 15 minute interval so the times used are: 00:00:00, 00:15:00, 00:30:00 and 00:45:00. The duplicated solution uses the same values but labels them using the last time actually found in the data. In both cases we only include intervals for which data is present.
There are more examples of aggregate.zoo in (1) ?aggregate.zoo, (2) all three of the zoo vignettes have examples and (3) searching the r-help archives for the words aggregate.zoo and trunc finds even more examples.
library(zoo)
library(chron)
z <- zoo(1:10, chron(1:10/(24*13)))
# 1. last value in each 15 minute interval
# using time at which interval begins
aggregate(z, trunc(time(z), "00:15:00"), tail, 1)
# 2. last value in each 15 minute interval
# time of last point in data within interval
z[!duplicated(trunc(time(z), "00:15:00"), fromLast = TRUE)]
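If the index is POSIXct rather than chron, a similar effect can be sketched without trunc.times by flooring the underlying seconds to multiples of 900 (15 minutes). The sample series below is made up for illustration, and the UTC time zone is an assumption chosen so that the 15-minute boundaries line up with the epoch.

```r
library(zoo)

# Hypothetical sample data: 10 observations, one every 5 minutes (UTC)
tt <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC") + 300 * (0:9)
z <- zoo(1:10, tt)

# Start of the 15-minute interval that each timestamp falls into
bucket <- as.POSIXct(floor(as.numeric(time(z)) / 900) * 900,
                     origin = "1970-01-01", tz = "UTC")

# 1. last value per interval, labelled by the interval start
agg <- aggregate(z, bucket, tail, n = 1)

# 2. last value per interval, labelled by the last time actually observed
lastobs <- z[!duplicated(bucket, fromLast = TRUE)]
```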
My objective is to impute NAs in a zooreg time series object. The pattern of the time series is cyclic. My code is:
#load libraries required
library("zoo")
# create sequence every 15 minutes from 1st Jan to 20th Jan, 2018
timeStamp <- seq.POSIXt(from=as.POSIXct('2018-01-01 00:00:00', tz="UTC"), to=as.POSIXct('2018-01-20 23:45:00', tz="UTC"), by = "15 min")
# data which increases from 12am to 12pm, then decreases till 12 am of next day, for 20 days
readings <- rep(c(seq(1,48,1), seq(48,1,-1)), 20)
dF <- data.frame(timeStamp=timeStamp, readings=readings)
# create a regular zooreg object; one cycle is 1 day (4 readings per hour * 24 hours = 96)
readingsZooReg <- zooreg(dF$readings, order.by = dF$timeStamp, frequency = 4*24)
plot(readingsZooReg)
# force some data to be NAs
window(readingsZooReg, start = as.POSIXct("2018-01-14 00:00:00", tz="UTC"), end = as.POSIXct("2018-01-16 23:45:00", tz="UTC")) <- NA
plot(readingsZooReg)
# plot imputed values
plot(na.approx(readingsZooReg))
The plots are: full time series, series with NAs added, and imputed time series.
I'm purposely using zoo here, since the time series I work on are irregular(eg. solar, oil wells, etc)
1) Is my usage of "zooreg" correct? Or would a "zoo" object suffice ?
2) Is my frequency variable right?
3) Why won't na.approx work? I've also tried na.StructTS, but the R script hangs.
4) Is there a solution using any other package? xts, ts, etc?
Your current example time series is a regular time series.
(An irregular time series would have varying time distances between observations.)
E.g.:
10:00:10, 10:00:20, 10:00:30, 10:00:40, 10:00:50 (regular spaced)
10:00:10, 10:00:17, 10:00:33, 10:00:37, 10:00:50 (irregular spaced)
If you really need to handle irregularly spaced time series, zoo is your go-to package. Otherwise you can also use other time series classes such as xts and ts.
About the frequency:
You usually set the frequency of a time series to the number of observations after which you expect patterns to repeat (in your example this could be 96). In real life the cycle is often 1 day, 1 week, 1 month, and so on, but it can also differ from these, like 1.5 days. (E.g. if you have daily recurring patterns and 1-minute observations, you would set the frequency to 1440.)
na.approx from zoo works perfectly. It is doing exactly what it is expected to do. A linear interpolation between the last point before the gap and the first point after it, which both sit at the bottom of the daily cycle, gives a flat line at that level. Of course that is probably not the result you expected, because it does not account for seasonality. That is why G. Grothendieck suggests na.StructTS as the method to choose (it is usually better at accounting for seasonality).
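A tiny made-up example shows the effect: when a gap starts and ends at the same point of the cycle, linear interpolation fills it with a constant. The series below is hypothetical and much shorter than the one in the question.

```r
library(zoo)

# Hypothetical cyclic data: rises 1..4, falls 4..1, repeated 3 times
vals <- rep(c(1:4, 4:1), 3)

# Knock out the whole middle cycle (positions 9 to 16)
vals[9:16] <- NA
z <- zoo(vals, 1:24)

# Linear interpolation between position 8 (value 1) and position 17 (value 1)
filled <- na.approx(z)

# The gap is filled with a flat line, not with the missing cycle
gap_values <- coredata(filled)[9:16]
```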
If you are not bound to zoo, the best choice in this specific case would be na_seadec from the imputeTS package (a package solely dedicated to time series imputation).
I have added an example for you, with nice plots, using the imputeTS package:
library(imputeTS)
yourTS <- ts(coredata(readingsZooReg), frequency = 96)
ggplot_na_distribution(yourTS)
imputedTS <- na_seadec(yourTS)
ggplot_na_imputations(yourTS, imputedTS)
Usually imputeTS also works perfectly with zoo time-series as input. I only changed it to ts again, because something with your zoo object seems odd...that is also why na.StructTS from zoo itself breaks. Maybe somebody with better knowledge can help out here.
Beware: if you really do have irregular time series, do not use packages or imputation functions other than those from zoo, because they all assume the data to be regularly spaced and will give results accordingly.
I have a dataset with solar power generation over 24 hours for many days. Now I have to find the average power generated at a given time of day. Have a glimpse of the dataset. For example, I have to find the average of the power generated at 9:00:00 AM.
Start by stripping out the time from the date-time variable.
Assuming your data is called myData
library(lubridate)
myData$Hour <- hour(strptime(myData$Time, format = "%Y-%m-%d %H:%M:%S"))
Then use ddply from the plyr package, which allows us to apply a function to a subset of data.
library(plyr)
myMeans <- ddply(myData[,c("Hour", "IT_solar_generation")], "Hour", numcolwise(mean))
The resulting frame will have one column called Hour, which gives the hour, and another with the mean at each hour.
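For comparison, the same per-hour mean can be sketched in base R with aggregate(), without plyr or lubridate. The three rows of data below are invented for illustration. Only the column names Time and IT_solar_generation are taken from the answer above, and the UTC time zone is an assumption.

```r
# Hypothetical sample data with the column names used above
myData <- data.frame(
  Time = c("2018-01-01 09:00:00", "2018-01-02 09:00:00", "2018-01-01 10:00:00"),
  IT_solar_generation = c(10, 20, 5)
)

# Extract the hour of day as an integer
myData$Hour <- as.integer(format(as.POSIXct(myData$Time, tz = "UTC"), "%H"))

# Mean generation for each hour of the day, across all days
myMeans <- aggregate(IT_solar_generation ~ Hour, data = myData, FUN = mean)

# Average at 9 AM
mean9 <- myMeans$IT_solar_generation[myMeans$Hour == 9]
```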
NOW, on another side but important note, when you ask a question you should be providing information on the attempts you've made so far to answer the question. This isn't a help desk.
I was wondering if there was a way to calculate time differences using the xts package without having to convert time values etc. if possible. I have an xts object with a time format given as 2010-02-15 13:35:59.123 (where the .123 is the milliseconds).
Now, I would like to find the number of milliseconds until the end of the day (i.e. 17:00:00). The problem however is that I basically have to do a few conversions of the data before I can do this (such as using as.POSIXct) and this becomes more complicated since I have to do it for several different days and possibly even different times. For this reason, I would prefer to not have to convert the "end of day time" and leave it as 17:00:00 such that in order to find the number of milliseconds between the present time and the end of day time I can just have a fairly simple operation such as 17:00:00.000 - 13:35:59.123 = ...
Is there a simple way to do this with minimal conversions? I'm certain xts has a function which I don't know of but I couldn't find anything in the documentation :/
EDIT: I forgot to mention, I tried the more 'straightforward' route by trying to compute the time differences by first trying to use the function as.POSIXct(16:00:00, format = "%H:%M:%S") but this gives an error, and I'm honestly not sure why...
You should be able to do this using a combination of ave(), .indexDate(), and a custom function. You didn't provide a reproducible example, so here's one using the daily data that comes with xts.
library(xts)
data(sample_matrix)
x <- as.xts(sample_matrix)
secsRemaining <- function(x) { end(x)-index(x) }
tdiff <- ave(x[,1], as.yearmon(index(x)), FUN = secsRemaining)
tdiff[86:92,]
# Open
# 2007-03-28 259200
# 2007-03-29 172800
# 2007-03-30 86400
# 2007-03-31 0
# 2007-04-01 2505600
# 2007-04-02 2419200
# 2007-04-03 2332800
In your case, the call would use .indexDate(x) instead of as.yearmon(index(x)).
tdiff <- ave(x[,1], .indexDate(x), FUN = secsRemaining)
Also note that this call to ave() only works on a 1-column xts object; it seems like a bug that it doesn't work with more columns. Also note that you have to use FUN = with ave(), since the FUN argument occurs after ....
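If the end of day is a fixed clock time such as 17:00:00 rather than the last observation, one possible sketch is to build the end-of-day timestamp from each observation's own date and subtract. This uses only base R on a plain POSIXct vector (for an xts object, index(x) would supply it). The timestamps and the UTC time zone are assumptions for illustration.

```r
# Hypothetical timestamps with millisecond precision
t <- as.POSIXct(c("2010-02-15 13:35:59.123", "2010-02-16 16:59:59.500"),
                tz = "UTC")

# 17:00:00 on the same calendar day as each timestamp
eod <- as.POSIXct(paste(format(t, "%Y-%m-%d"), "17:00:00"), tz = "UTC")

# Milliseconds remaining until the end of day
ms_left <- as.numeric(difftime(eod, t, units = "secs")) * 1000
```

Because POSIXct stores seconds as floating point, the results can differ from the exact millisecond count by a tiny fraction, so round() may be appropriate before further use.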
I have some timedelta strings which were exported from Python. I'm trying to import them for use in R, but I'm getting some weird results.
When the timedeltas are small, I get results that are off by 2 days, e.g.:
> as.difftime('26 days 04:53:36.000000000',format='%d days %H:%M:%S.000000000')
Time difference of 24.20389 days
When they are larger, it doesn't work at all:
> as.difftime('36 days 04:53:36.000000000',format='%d days %H:%M:%S.000000000')
Time difference of NA secs
I also read into 'R' some time delta objects I had processed with 'Python' and had a similar issue with the 26 days 04:53:36.000000000 format. As Gregor said, %d in strptime is the day of the month as a zero padded decimal number so won't work with numbers >31 and there doesn't seem to be an option for cumulative days (probably because strptime is for date time objects and not time delta objects).
My solution was to convert the objects to strings and extract the numerical data as Gregor suggested and I did this using the gsub function.
# convert to strings
data$tdelta <- as.character(data$tdelta)
# extract numerical data
days <- as.numeric(gsub('^([0-9]+) days.*$','\\1',data$tdelta))
hours <- as.numeric(gsub('^.*ys ([0-9]+):.*$','\\1',data$tdelta))
minutes <- as.numeric(gsub('^.*:([0-9]+):.*$','\\1',data$tdelta))
seconds <- as.numeric(gsub('^.*:([0-9]+)\\..*$','\\1',data$tdelta))
# add up numerical components to whatever units you want
time_diff_seconds <- seconds + minutes*60 + hours*60*60 + days*24*60*60
# add column to data frame
data$tdelta <- time_diff_seconds
That should allow you to do computations with the time differences. Hope that helps.
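As an alternative sketch, all four components can be captured with a single regular expression via regexec() and regmatches(). The helper name parse_timedelta is hypothetical, and the pattern assumes strings of exactly the form "N days HH:MM:SS.fffffffff" as in the question. Anything else comes back as NA.

```r
# Hypothetical helper: convert "N days HH:MM:SS.fff" strings to seconds
parse_timedelta <- function(s) {
  pattern <- "^([0-9]+) days ([0-9]+):([0-9]+):([0-9.]+)$"
  parts <- regmatches(s, regexec(pattern, s))
  # Each match holds the full string followed by the four captured groups
  vapply(parts, function(p) {
    sum(as.numeric(p[2:5]) * c(86400, 3600, 60, 1))
  }, numeric(1))
}

secs <- parse_timedelta(c("26 days 04:53:36.000000000",
                          "36 days 04:53:36.000000000"))

# Turn the plain numbers into difftime objects if needed
td <- as.difftime(secs, units = "secs")
```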
I want to create a single column with a sequence of date/time increasing every hour for one year or one month (for example). I was using a code like this to generate this sequence:
start.date<-"2012-01-15"
start.time<-"00:00:00"
interval<-60 # 60 minutes
increment.mins<-interval*60
x<-paste(start.date,start.time)
for(i in 1:365){
print(strptime(x, "%Y-%m-%d %H:%M:%S")+i*increment.mins)
}
However, I am not sure how to specify the range of the sequence of dates and hours. Also, I have been having problems dealing with the first hour "00:00:00"? Not sure what is the best way to specify the length of the date/time sequence for a month, year, etc? Any suggestion will be appreciated.
I would strongly recommend using the POSIXct data type. That way you can use seq without any problems and use the data however you want.
start <- as.POSIXct("2012-01-15")
interval <- 60
end <- start + as.difftime(1, units="days")
seq(from=start, by=interval*60, to=end)
Now you can do whatever you want with your vector of timestamps.
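To cover a whole month rather than one day, one option is to let seq() compute the end point as well, using the character forms of by ("hour", "1 month"). This is a minimal sketch; the UTC time zone is an assumption made to keep daylight saving changes out of the picture.

```r
start <- as.POSIXct("2012-01-15 00:00:00", tz = "UTC")

# One calendar month after the start
end <- seq(start, by = "1 month", length.out = 2)[2]

# Hourly timestamps from start to end, inclusive
hours <- seq(from = start, to = end, by = "hour")

# format() keeps the time part visible even at midnight
first_label <- format(hours[1], "%Y-%m-%d %H:%M:%S")
```

The midnight entries only look as if the time is missing when printed on their own; formatting with an explicit format string shows the "00:00:00" part.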
Try this. mondate is very clever about advancing by a month. For example, it will advance the last day of Jan to the last day of Feb, whereas other date/time classes tend to overshoot into Mar. chron does not use time zones, so you can't get the time zone bugs that you can get using POSIXct. Here x is from the question.
library(chron)
library(mondate)
start.time.num <- as.numeric(as.chron(x))
# +1 means one month. Use +12 if you want one year.
end.time.num <- as.numeric(as.chron(paste(mondate(x)+1, start.time)))
# 1/24 means one hour. Change as needed.
hours <- as.chron(seq(start.time.num, end.time.num, 1/24))