Averaging time series data in R with xts

I have time series data where the values are stored at the end of a 1-min sampling interval (i.e. data for 00:00 belongs to the interval 23:59 - 00:00 etc.).
I now would like to average in 5 min intervals giving the mean concentrations at 00:05, 00:10, etc.
What I get with the code below is the averages at 00:04, 00:09, etc., which seems to be related to the endpoints function, but I cannot figure out how to average correctly (i.e. in my case minutes 00:01 to 00:05 reported as mean at 00:05 etc.)
library(zoo)
library(xts)
t1 <- as.POSIXct("2012-1-1 0:1:0")
t2 <- as.POSIXct("2012-1-1 0:15:0")
d <- seq(t1, t2, by = "1 min")
x <- rnorm(length(d))
z <- zoo(x, d)
period.apply(z,endpoints(z,"mins",5),mean)
> 2012-01-01 00:04:00 2012-01-01 00:09:00 2012-01-01 00:14:00 2012-01-01 00:15:00
0.6864088 -0.9403631 -0.4269895 0.6728044

The endpoints function is working correctly. You need to change your index values. 00:05:00 is the beginning of the 5th minute, not the end.
> z <- zoo(x, d-1)
> period.apply(z,endpoints(z,"mins",5),mean)
2012-01-01 00:04:59 2012-01-01 00:09:59 2012-01-01 00:14:59
1.2324436 -0.5881076 0.5067009
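If the result should be labelled 00:05, 00:10, and so on, as the question asks, one option is to round the aggregated index up with align.time after the shift. This is a minimal sketch, assuming an xts object in UTC and using the values 1 to 15 instead of random data so the means are easy to check by hand:

```r
library(xts)

# One-minute stamps 00:01 ... 00:15, each marking the END of its interval
d <- seq(as.POSIXct("2012-01-01 00:01:00", tz = "UTC"),
         as.POSIXct("2012-01-01 00:15:00", tz = "UTC"), by = "1 min")

# Shift every stamp back by one second, as in the answer above
x <- xts(as.numeric(1:15), order.by = d - 1)

# Five-minute means, stamped 00:04:59, 00:09:59, 00:14:59
m5 <- period.apply(x, endpoints(x, "mins", 5), mean)

# Round each stamp up to the next 5-minute boundary (00:05, 00:10, 00:15)
m5 <- align.time(m5, n = 5 * 60)
m5
```

The shift and the re-alignment cancel out in the labels, so minutes 00:01 to 00:05 end up reported at 00:05.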

Related

Calculating mean and sd of bedtime (hh:mm) in R - problem are times before/after midnight

I got the following dataset:
data <- read.table(text="
wake_time sleep_time
08:38:00 23:05:00
09:30:00 00:50:00
06:45:00 22:15:00
07:27:00 23:34:00
09:00:00 23:00:00
09:05:00 00:10:00
06:40:00 23:28:00
10:00:00 23:30:00
08:10:00 00:10:00
08:07:00 00:38:00", header=T)
I used the chron-package to calculate the average wake_time:
> mean(times(data$wake_time))
[1] 08:20:12
But when I do the same for the variable sleep_time, this happens:
> mean(times(data$sleep_time))
[1] 14:04:00
I guess the result is distorted because the sleep_time contains times before and after midnight.
But how can I solve this problem?
Additionally:
How can I calculate the sd of the times. I want to use it like "mean wake-up-time 08:20 ± 44 min" for example.
The times values are stored as numbers between 0 and 1, representing a fraction of a day. If the sleep time is earlier than the wake time, you can "add a day" before taking the mean. For example:
library(chron)
wake <- times(data$wake_time)
sleep <- times(data$sleep_time)
times(mean(ifelse(sleep < wake, sleep+1, sleep)))
# [1] 23:40:00
And since the values are parts of a day, if you want the sd in minutes, you'd take the partial day values and convert to minutes
sd(ifelse(sleep < wake, sleep+1, sleep) * 24*60)
# [1] 47.60252
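The same idea can be sketched in base R without chron, working in minutes after midnight. The helper to_minutes and the label format are assumptions made for illustration, not part of the answer above:

```r
# Hypothetical helper: convert "HH:MM:SS" strings to minutes after midnight
to_minutes <- function(x) {
  vapply(strsplit(x, ":", fixed = TRUE),
         function(p) sum(as.numeric(p) * c(60, 1, 1 / 60)),
         numeric(1))
}

wake  <- to_minutes(c("08:38:00", "09:30:00", "06:45:00", "07:27:00", "09:00:00",
                      "09:05:00", "06:40:00", "10:00:00", "08:10:00", "08:07:00"))
sleep <- to_minutes(c("23:05:00", "00:50:00", "22:15:00", "23:34:00", "23:00:00",
                      "00:10:00", "23:28:00", "23:30:00", "00:10:00", "00:38:00"))

# Bedtimes earlier than the wake time fall after midnight: move them to the next day
sleep_adj <- ifelse(sleep < wake, sleep + 24 * 60, sleep)

# Mean rounded to whole minutes and wrapped back into one day; sd in minutes
mean_min <- round(mean(sleep_adj)) %% (24 * 60)
sd_min   <- sd(sleep_adj)

label <- sprintf("%02d:%02d +/- %.0f min",
                 as.integer(mean_min %/% 60), as.integer(mean_min %% 60), sd_min)
label
```

The comparison sleep < wake is only a heuristic for "after midnight"; it would misclassify someone who goes to bed, say, at 05:00 and wakes at 04:00 the next day.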

period.apply over an hour with deciding start time

I have an xts time series covering one year, with time zone "UTC". The interval between rows is 15 minutes.
x1 x2
2014-12-31 23:15:00 153.0 0.0
2014-12-31 23:30:00 167.1 5.4
2014-12-31 23:45:00 190.3 4.1
2015-01-01 00:00:00 167.1 9.7
As I want data over one hour to allow for comparison with other data sets, I tried to use period.apply:
dat <- period.apply(dat, endpoints(dat,on="hours",k=1), colSums)
The problem is that the first row in my new data set is 2014-12-31 23:45:00 and not 2015-01-01 00:00:00. I tried changing the endpoint vector but somehow it keeps saying that it is out of bounds. I also thought this was my answer: https://stats.stackexchange.com/questions/5305/how-to-re-sample-an-xts-time-series-in-r/19003#19003 but it was not. I don't want to change the names of my columns, I want to sum over a different interval.
Here a reproducible example:
library(xts)
seq<-seq(from=ISOdate(2014,12,31,23,15),length.out = 100, by="15 min", tz="UTC")
xts<-xts(rep(1,100),order.by = seq)
period.apply(xts, endpoints(xts,on="hours",k=1), colSums)
And the result looks like this:
2014-12-31 23:45:00 3
2015-01-01 00:45:00 4
2015-01-01 01:45:00 4
2015-01-01 02:45:00 4
and ends up like this:
2015-01-01 21:45:00 4
2015-01-01 22:45:00 4
2015-01-01 23:45:00 4
2015-01-02 00:00:00 1
Whereas I would like it to always sum over the same interval, meaning I would like only 4s.
(I am using RStudio 0.99.903 with R x64 3.3.2)
The problem is that you're using endpoints, but you want to align by the start of the interval, not the end. I thought you might be able to use this startpoints function, but that produced weird results.
The basic idea of the work-around below is to subtract a small amount from all index values, then use endpoints and period.apply to aggregate. Then call align.time on the result. I'm not sure if this is a general solution, but it seems to work for your example.
library(xts)
seq<-seq(from=ISOdate(2014,12,31,23,15),length.out = 100, by="15 min", tz="UTC")
xts<-xts(rep(1,100),order.by = seq)
# create a temporary object
tmp <- xts
# subtract a small amount of time from each index value
.index(tmp) <- .index(tmp)-0.001
# aggregate to hourly
agg <- period.apply(tmp, endpoints(tmp, "hours"), colSums)
# round index up to next hour
agg_aligned <- align.time(agg, 3600)
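The work-around can be wrapped in a small function and checked against the example. The function name sum_by_period_end and its arguments are hypothetical, and the sketch avoids naming objects seq or xts so that the functions of the same name are not shadowed:

```r
library(xts)

# Hypothetical helper: aggregate so that each block is labelled by its END time.
# 'on' is passed to endpoints(); 'n' is the block length in seconds for align.time().
sum_by_period_end <- function(x, on = "hours", n = 3600, FUN = colSums) {
  tmp <- x
  .index(tmp) <- .index(tmp) - 0.001        # nudge stamps into the previous block
  agg <- period.apply(tmp, endpoints(tmp, on), FUN)
  align.time(agg, n)                         # round stamps up to the block end
}

idx <- seq(from = ISOdate(2014, 12, 31, 23, 15), length.out = 100, by = "15 min")
dat <- xts(rep(1, 100), order.by = idx)

out <- sum_by_period_end(dat)
head(out)
```

With 100 quarter-hourly observations starting at 23:15, every hourly block then contains exactly four values. Like the original work-around, this has only been reasoned through for regularly spaced data.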

Calculating differences of dates in hours between rows of a dataframe

I have the following dataframe (ts1):
D1 Diff
1 20/11/2014 16:00 0.00
2 20/11/2014 17:00 0.01
3 20/11/2014 19:00 0.03
I would like to add a new column to ts1 holding the difference in hours between the dates in successive rows of D1.
The new ts1 should be:
D1 Diff N
1 20/11/2014 16:00 0.00
2 20/11/2014 17:00 0.01 1
3 20/11/2014 19:00 0.03 2
For calculating the difference in hours independently I use:
library(lubridate)
difftime(dmy_hm("29/12/2014 11:00"), dmy_hm("29/12/2014 9:00"), units="hours")
I know that for calculating the difference between each row I need to transform the ts1 into matrix.
I use the following command:
> ts1$N<-difftime(dmy_hm(as.matrix(ts1$D1)), units="hours")
And I get:
Error in as.POSIXct(time2) : argument "time2" is missing, with no default
Suppose ts1 is as shown in Note 2 at the end. Then create a POSIXct variable tt from D1, convert tt to numeric giving the number of seconds since the Epoch, divide that by 3600 to get the number of hours since the Epoch and take differences. No packages are used.
tt <- as.POSIXct(ts1$D1, format = "%d/%m/%Y %H:%M")
m <- transform(ts1, N = c(NA, diff(as.numeric(tt) / 3600)))
giving:
> m
D1 Diff N
1 20/11/2014 16:00 0.00 NA
2 20/11/2014 17:00 0.01 1
3 20/11/2014 19:00 0.03 2
Note 1: I assume you are looking for N so that you can fill in the empty hours. In that case you don't really need N. Also, it would be easier to deal with time series if you use a time series representation. First we convert ts1 to a zoo object, then we create a zero width zoo object with the datetimes that we need and finally we merge them:
library(zoo)
z <- read.zoo(ts1, tz = "", format = "%d/%m/%Y %H:%M")
z0 <- zoo(, seq(start(z), end(z), "hours"))
zz <- merge(z, z0)
giving:
> zz
2014-11-20 16:00:00 2014-11-20 17:00:00 2014-11-20 18:00:00 2014-11-20 19:00:00
0.00 0.01 NA 0.03
If you really did need a data frame back then:
DF <- fortify.zoo(zz)
Note 2: Input used in reproducible form is:
Lines <- "D1,Diff
1,20/11/2014 16:00,0.00
2,20/11/2014 17:00,0.01
3,20/11/2014 19:00,0.03"
ts1 <- read.csv(text = Lines, as.is = TRUE)
Thanks to @David Arenburg and @nicola:
Can use either:
res <- diff(as.POSIXct(ts1$D1, format = "%d/%m/%Y %H:%M")) ; units(res) <- "hours"
Or:
res <- diff(dmy_hm(ts1$D1))
and afterwards:
ts1$N <- c(NA_real_, as.numeric(res))
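Put together as one runnable piece, the base R variant might look like the sketch below. The explicit tz = "UTC" is an assumption added here so that daylight saving changes cannot affect the differences:

```r
ts1 <- data.frame(D1 = c("20/11/2014 16:00", "20/11/2014 17:00", "20/11/2014 19:00"),
                  Diff = c(0.00, 0.01, 0.03),
                  stringsAsFactors = FALSE)

# Parse the day/month/year strings, then difference consecutive rows
tt  <- as.POSIXct(ts1$D1, format = "%d/%m/%Y %H:%M", tz = "UTC")
res <- diff(tt)
units(res) <- "hours"      # force hours regardless of what diff() picked

# The first row has no predecessor, so it gets NA
ts1$N <- c(NA_real_, as.numeric(res))
ts1
```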

Calculate running difference of time using difftime on one column of timestamps

How would you calculate the time difference in minutes between two consecutive rows of timestamps and add the result to a new column?
I have tried this:
data$hours <- as.numeric(floor(difftime(timestamps(data), (timestamps(data)[1]), units="mins")))
But only get difference from time zero and onwards.
Added example data with 'mins' column that I want to be added
timestamps mins
2013-06-23 00:00:00 NA
2013-06-23 01:00:00 60
2013-06-23 02:00:00 60
2013-06-23 04:00:00 120
The code that you're using with the [1] is always referencing the first element of the timestamps vector.
To do what you want, you want to look at all but the first element minus all but the last element.
mytimes <- data.frame(timestamps=c("2013-06-23 00:00:00",
"2013-06-23 01:00:00",
"2013-06-23 02:00:00",
"2013-06-23 04:00:00"),
mins=NA)
mytimes$mins <- c(NA, difftime(mytimes$timestamps[-1],
mytimes$timestamps[-nrow(mytimes)],
units="mins"))
What this code does is:
Set up a data frame so that timestamps and mins keep the same length.
Within that data frame, put the timestamps you have and mark mins as not yet known (i.e. NA).
Select all but the first element of timestamps: mytimes$timestamps[-1]
Select all but the last element of timestamps: mytimes$timestamps[-nrow(mytimes)]
Subtract them with difftime in units of minutes (units="mins"). Since they're well-formatted, you don't first have to make them POSIXct objects.
Put an NA in front, because you have one fewer difference than you have rows: c(NA, ...)
Drop all of that back into the original data frame's mins column: mytimes$mins <-
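To make the contrast with the attempt in the question concrete, the sketch below computes both versions side by side on the example timestamps. The column name from_first is made up for illustration, and tz = "UTC" is an assumption:

```r
ts <- as.POSIXct(c("2013-06-23 00:00:00", "2013-06-23 01:00:00",
                   "2013-06-23 02:00:00", "2013-06-23 04:00:00"), tz = "UTC")

# What the question's code does: every row minus the FIRST row
from_first <- as.numeric(difftime(ts, ts[1], units = "mins"))

# What was wanted: every row minus the PREVIOUS row
consecutive <- c(NA, as.numeric(difftime(ts[-1], ts[-length(ts)], units = "mins")))

data.frame(timestamps = ts, from_first = from_first, mins = consecutive)
```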
Another option is to calculate it with this approach:
# create some data for an MWE
hrs <- c(0,1,2,4)
df <- data.frame(timestamps = as.POSIXct(paste("2015-12-17",
paste(hrs, "00", "00", sep = ":"))))
df
# timestamps
# 1 2015-12-17 00:00:00
# 2 2015-12-17 01:00:00
# 3 2015-12-17 02:00:00
# 4 2015-12-17 04:00:00
# create a function that calculates the lag for n periods
lag <- function(x, n) c(rep(NA, n), x[1:(length(x) - n)])
# create a new column named mins
df$mins <- as.numeric(df$timestamps - lag(df$timestamps, 1)) / 60
df
# timestamps mins
# 1 2015-12-17 00:00:00 NA
# 2 2015-12-17 01:00:00 60
# 3 2015-12-17 02:00:00 60
# 4 2015-12-17 04:00:00 120

R Search for a particular time from index

I use an xts object. The index of the object is as below. There is one for every hour of the day for a year.
"2011-01-02 18:59:00 EST"
"2011-01-02 19:58:00 EST"
"2011-01-02 20:59:00 EST"
In columns are values associated with each index entry. What I want to do is calculate the standard deviation of the value for all Mondays at 18:59 for the complete year. There should be 52 values for the year.
I'm able to search for the day of the week using the weekdays() function, but my problem is searching for the time, such as 18:59:00 or any other time.
You can do this by using interaction to create a factor from the combination of weekdays and .indexhour, then use split to select the relevant observations from your xts object.
set.seed(21)
x <- .xts(rnorm(1e4), seq(1, by=60*60, length.out=1e4))
groups <- interaction(weekdays(index(x)), .indexhour(x))
output <- lapply(split(x, groups), function(x) c(count=length(x), sd=sd(x)))
output <- do.call(rbind, output)
head(output)
# count sd
# Friday.0 60 1.0301030
# Monday.0 59 0.9204670
# Saturday.0 60 0.9842125
# Sunday.0 60 0.9500347
# Thursday.0 60 0.9506620
# Tuesday.0 59 0.8972697
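For the single combination asked about (Mondays at 18:59), a direct logical subset is a possible shortcut. This sketch assumes a made-up series with one observation per hour at minute 59 throughout 2011, in UTC:

```r
library(xts)

set.seed(21)
# Assumed layout: one value per hour, stamped at minute 59, for all of 2011
idx <- seq(as.POSIXct("2011-01-01 00:59:00", tz = "UTC"),
           by = "hour", length.out = 24 * 365)
x <- xts(rnorm(length(idx)), order.by = idx)

# Mondays (wday is 0 for Sunday, 1 for Monday) at 18:59
sel <- x[.indexwday(x) == 1 & .indexhour(x) == 18 & .indexmin(x) == 59]

nrow(sel)                        # number of Mondays found
sd(as.numeric(coredata(sel)))    # standard deviation over those Mondays
```

Since the index in the question also contains stamps such as 19:58, matching on the hour alone (dropping the .indexmin condition) may be more robust for real data.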
You can use the .index* family of functions (don't forget the '.' in front of 'index'!):
fxts[.indexmon(fxts)==0] # it's zero-based (!) and gives you all the January values
fxts[.indexmday(fxts)==1] # beginning of month
fxts[.indexwday(fxts)==1] # Mondays
require(quantmod)
> fxts
value
2011-01-02 19:58:00 1
2011-01-02 20:59:00 2
2011-01-03 18:59:00 3
2011-01-09 19:58:00 4
2011-01-09 20:59:00 5
2011-01-10 18:59:00 6
2011-01-16 18:59:00 7
2011-01-16 19:58:00 8
2011-01-16 20:59:00 9
fxts[.indexwday(fxts)==1] #this gives you all the Mondays
for subsetting the time you use
fxts["T19:30/T20:00"] # this will give you the time period you are looking for
and here you combine weekday and time period
fxts["T18:30/T20:00"] & fxts[.indexwday(fxts)==1] # to get a logical vector or
fxts["T18:30/T21:00"][.indexwday(fxts["T18:30/T21:00"])==1] # to get the values
> value
2011-01-03 18:59:00 3
2011-01-10 18:59:00 6
