I have an xts time series covering one year, with time zone "UTC". The interval between rows is 15 minutes.
x1 x2
2014-12-31 23:15:00 153.0 0.0
2014-12-31 23:30:00 167.1 5.4
2014-12-31 23:45:00 190.3 4.1
2015-01-01 00:00:00 167.1 9.7
Since I want hourly data to allow comparison with other data sets, I tried period.apply:
dat <- period.apply(dat, endpoints(dat,on="hours",k=1), colSums)
The problem is that the first row in my new data set is 2014-12-31 23:45:00 and not 2015-01-01 00:00:00. I tried changing the endpoint vector but somehow it keeps saying that it is out of bounds. I also thought this was my answer: https://stats.stackexchange.com/questions/5305/how-to-re-sample-an-xts-time-series-in-r/19003#19003 but it was not. I don't want to change the names of my columns, I want to sum over a different interval.
Here is a reproducible example:
library(xts)
seq<-seq(from=ISOdate(2014,12,31,23,15),length.out = 100, by="15 min", tz="UTC")
xts<-xts(rep(1,100),order.by = seq)
period.apply(xts, endpoints(xts,on="hours",k=1), colSums)
And the result looks like this:
2014-12-31 23:45:00 3
2015-01-01 00:45:00 4
2015-01-01 01:45:00 4
2015-01-01 02:45:00 4
and ends up like this:
2015-01-01 21:45:00 4
2015-01-01 22:45:00 4
2015-01-01 23:45:00 4
2015-01-02 00:00:00 1
Whereas I would like it to always sum over the same interval, meaning I would like only 4s.
(I am using RStudio 0.99.903 with R x64 3.3.2)
The problem is that you're using endpoints, but you want to align by the start of the interval, not the end. I thought you might be able to use this startpoints function, but that produced weird results.
The basic idea of the work-around below is to subtract a small amount from all index values, then use endpoints and period.apply to aggregate. Then call align.time on the result. I'm not sure if this is a general solution, but it seems to work for your example.
library(xts)
seq<-seq(from=ISOdate(2014,12,31,23,15),length.out = 100, by="15 min", tz="UTC")
xts<-xts(rep(1,100),order.by = seq)
# create a temporary object
tmp <- xts
# subtract a small amount of time from each index value
.index(tmp) <- .index(tmp)-0.001
# aggregate to hourly
agg <- period.apply(tmp, endpoints(tmp, "hours"), colSums)
# round index up to next hour
agg_aligned <- align.time(agg, 3600)
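An alternative way to express the same idea, offered as a rough sketch rather than a tested general solution: label each observation with the end of the hour it falls in (its timestamp rounded up), then sum by that label. The names idx, x, hour_end, and hourly are made up for this illustration, and the result is a zoo object because aggregate dispatches to the zoo method.

```r
library(xts)

# same sample data as in the question (variable names are illustrative)
idx <- seq(from = ISOdate(2014, 12, 31, 23, 15, tz = "UTC"),
           length.out = 100, by = "15 min")
x <- xts(rep(1, 100), order.by = idx)

# round every timestamp up to the next full hour;
# a timestamp already on the hour keeps its value
hour_end <- as.POSIXct(ceiling(as.numeric(index(x)) / 3600) * 3600,
                       origin = "1970-01-01", tz = "UTC")

# sum all observations sharing the same hour-end label (returns a zoo object)
hourly <- aggregate(x, by = hour_end, FUN = sum)
head(hourly)
```

With this data every hour-end label collects exactly four 15-minute rows, so each hourly sum is 4 and the first label is 2015-01-01 00:00:00.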
I got the following dataset:
data <- read.table(text="
wake_time sleep_time
08:38:00 23:05:00
09:30:00 00:50:00
06:45:00 22:15:00
07:27:00 23:34:00
09:00:00 23:00:00
09:05:00 00:10:00
06:40:00 23:28:00
10:00:00 23:30:00
08:10:00 00:10:00
08:07:00 00:38:00", header=T)
I used the chron-package to calculate the average wake_time:
> mean(times(data$wake_time))
[1] 08:20:12
But when I do the same for the variable sleep_time, this happens:
> mean(times(data$sleep_time))
[1] 14:04:00
I guess the result is distorted because the sleep_time contains times before and after midnight.
But how can I solve this problem?
Additionally:
How can I calculate the sd of the times? I want to report it like "mean wake-up time 08:20 ± 44 min", for example.
The times values are stored as numbers between 0 and 1 representing a fraction of a day. If the sleep time is earlier than the wake time, you can "add a day" before taking the mean. For example:
library(chron)
wake <- times(data$wake_time)
sleep <- times(data$sleep_time)
times(mean(ifelse(sleep < wake, sleep+1, sleep)))
# [1] 23:40:00
And since the values are parts of a day, if you want the sd in minutes, you'd take the partial day values and convert to minutes
sd(ifelse(sleep < wake, sleep+1, sleep) * 24*60)
# [1] 47.60252
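The same "add a day" idea can be sketched in base R without chron, working in minutes since midnight instead of fractions of a day. The helper to_minutes and the variable names are invented for this illustration; the data are the ten rows from the question.

```r
# convert "HH:MM:SS" strings to minutes since midnight (helper name is made up)
to_minutes <- function(hms) {
  parts <- do.call(rbind, strsplit(hms, ":", fixed = TRUE))
  as.numeric(parts[, 1]) * 60 + as.numeric(parts[, 2]) + as.numeric(parts[, 3]) / 60
}

wake  <- to_minutes(c("08:38:00", "09:30:00", "06:45:00", "07:27:00", "09:00:00",
                      "09:05:00", "06:40:00", "10:00:00", "08:10:00", "08:07:00"))
sleep <- to_minutes(c("23:05:00", "00:50:00", "22:15:00", "23:34:00", "23:00:00",
                      "00:10:00", "23:28:00", "23:30:00", "00:10:00", "00:38:00"))

# bedtimes after midnight belong to the "next day": shift them by 24 hours
sleep_adj <- ifelse(sleep < wake, sleep + 24 * 60, sleep)

mean_min <- mean(sleep_adj) %% (24 * 60)   # wrap the mean back into one day
sd_min   <- sd(sleep_adj)                  # spread in minutes

label <- sprintf("%02d:%02d +/- %.0f min",
                 as.integer(mean_min %/% 60),
                 as.integer(round(mean_min %% 60)),
                 sd_min)
label   # "23:40 +/- 48 min"
```

The comparison sleep < wake only works as a midnight test when nobody goes to bed before their own wake-up time on the clock, which holds for this data but is an assumption in general.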
I need to convert quarterly data to yearly data by summing the 4 quarters in each year. Searching stackoverflow.com, I found that a function that sums over periods seems to work. However, the date format of the result did not match, so I couldn't use the converted annual series together with my other series.
For example, annual data in FRED looks as follows:
2009-01-01 12126.078
2010-01-01 12739.542
2011-01-01 13352.255
2012-01-01 14061.878
2013-01-01 14444.823
However, when I changed the data using the following function:
library("quantmod")
library(zoo)
library(mFilter)
library(nleqslv)
fredsym <- c("PROPINC")
getSymbols(fredsym, src = "FRED") # download the quarterly series from FRED
quarter.proprietors_income <- PROPINC
## convert to annual
as.year <- function(x) as.integer(as.yearqtr(x)) # map each date to its calendar year
annual.proprietors_income <- aggregate(quarter.proprietors_income, as.year, sum) # sum the quarters of each year
it changes from this:
2016-01-01 1327.613
2016-04-01 1339.493
2016-07-01 1346.067
2016-10-01 1354.560
2017-01-01 1380.221
2017-04-01 1378.637
2017-07-01 1381.911
2017-10-01 1403.114
to this:
2011 4574.669
2012 4965.486
2013 5138.968
2014 5263.208
2015 5275.225
2016 5367.733
2017 5543.883
What I need is annual data that keeps the original YYYY-MM-DD format, with each year shown as YYYY-01-01. Otherwise it doesn't work with my other annual data.
Is there any way to solve this issue?
Using DF from the Note below, use cut as shown:
aggregate(DF["value"], list(year = as.Date(cut(as.Date(DF$Date), "year"))), sum)
giving:
year value
1 2016-01-01 5367.733
2 2017-01-01 5543.883
Note
Lines <- "Date value
2016-01-01 1327.613
2016-04-01 1339.493
2016-07-01 1346.067
2016-10-01 1354.560
2017-01-01 1380.221
2017-04-01 1378.637
2017-07-01 1381.911
2017-10-01 1403.114"
DF <- read.table(text = Lines, header = TRUE)
I found that the aggregate command converts the object to class zoo, so it is no longer an xts time series.
Alternatively, apply.yearly seems to work.
annual.proprietors_income <- apply.yearly(xts(quarter.proprietors_income),sum)
This is now an xts object. But the index shows the month and day of the last quarter of each year (YYYY-10-01). How can I change it to YYYY-01-01?
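One possible way to move the labels to the start of each year, sketched on the eight quarterly values shown earlier: after apply.yearly, overwrite the index with the first day of the same year. The object names quarterly and annual are placeholders, and the sketch assumes a Date index like the FRED series.

```r
library(xts)

# the eight quarterly observations from the example (names are illustrative)
quarterly <- xts(c(1327.613, 1339.493, 1346.067, 1354.560,
                   1380.221, 1378.637, 1381.911, 1403.114),
                 order.by = as.Date(c("2016-01-01", "2016-04-01",
                                      "2016-07-01", "2016-10-01",
                                      "2017-01-01", "2017-04-01",
                                      "2017-07-01", "2017-10-01")))

# yearly sums; the index is the last observation of each year (YYYY-10-01)
annual <- apply.yearly(quarterly, sum)

# relabel every row with January 1st of its own year
index(annual) <- as.Date(format(index(annual), "%Y-01-01"))
annual
```

The object stays an xts series, so it can be merged with other annual series indexed at YYYY-01-01.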
I have a very big dataframe in R, containing weather data with the following format.
valid temp
1 17/08/2014 00:20 14
2 17/08/2014 00:50 14
3 17/08/2014 01:20 13.5
4 17/08/2014 01:50 13
5 17/08/2014 02:20 12
6 17/08/2014 02:50 10
I would like to convert these sub-hourly data to hourly, like the following.
valid tmpc
1 2014-08-17 00:00:00 14
2 2014-08-17 01:00:00 13.75
3 2014-08-17 02:00:00 12.5
The class of df$valid is 'factor'. I have tried first converting them to Date through POSIXct, but it gives only NA values. I have also tried changing the system locale and still I get NAs.
We can do this in base R by converting to POSIXlt, setting the minutes to 0, converting back to POSIXct, and aggregating to get the mean of 'temp':
df1$valid <- strptime(df1$valid, "%d/%m/%Y %H:%M")
df1$valid$min <- 0
df1$valid <- as.POSIXct(df1$valid)
aggregate(temp~valid, df1, FUN = mean)
Option 1: the lubridate solution, using ceiling_date or round_date. It's not clear from your data frame and expected result whether you want to round or take the ceiling: the first row looks rounded, while the third looks like a ceiling. Anyway, here is an example:
library(lubridate)
df <- data.frame(i = 1, valid= "17/08/2014 01:28", temp = 14)
df$valid <- dmy_hm(df$valid)
df$valid_round <- ceiling_date(df$valid , unit="hours")
Results:
i valid temp valid_round
1 1 2014-08-17 01:28:00 14 2014-08-17 02:00:00
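To make the difference between the rounding choices concrete, here is a small comparison on the same timestamp. It also shows floor_date, which the answer does not mention but which belongs to the same lubridate family; the variable names are illustrative only.

```r
library(lubridate)

stamp <- dmy_hm("17/08/2014 01:28")          # parsed as UTC by default

fl <- floor_date(stamp, unit = "hour")       # 2014-08-17 01:00:00 UTC
rd <- round_date(stamp, unit = "hour")       # 2014-08-17 01:00:00 UTC
ce <- ceiling_date(stamp, unit = "hour")     # 2014-08-17 02:00:00 UTC
```

For hourly bins labelled by the start of the hour, floor_date is probably the closest match; ceiling_date labels each bin by its end.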
Option 2: using the base functions. Use:
df$valid <- as.POSIXct(strptime(df$valid, "%d/%m/%Y %H:%M", tz ="UTC"))
and then round it.
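A minimal base R sketch of the whole pipeline, assuming the goal is to truncate each timestamp to the start of its hour and then average (swap trunc for round to get the nearest hour instead). The column name hour and the object name hourly are invented here; the rows are the six sample observations from the question.

```r
df <- data.frame(valid = c("17/08/2014 00:20", "17/08/2014 00:50",
                           "17/08/2014 01:20", "17/08/2014 01:50",
                           "17/08/2014 02:20", "17/08/2014 02:50"),
                 temp = c(14, 14, 13.5, 13, 12, 10))

# parse the day/month/year strings explicitly; a wrong format gives NA
df$valid <- as.POSIXct(strptime(df$valid, "%d/%m/%Y %H:%M", tz = "UTC"))

# drop minutes and seconds so every row carries the start of its hour
df$hour <- as.POSIXct(trunc(df$valid, units = "hours"))

# mean temperature per hour
hourly <- aggregate(temp ~ hour, data = df, FUN = mean)
hourly
```

Passing the format string and time zone explicitly avoids the all-NA result that appears when the default year-month-day parsing is applied to day/month/year text.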
How would you calculate the time difference between two consecutive rows of timestamps in minutes and add the result to a new column?
I have tried this:
data$hours <- as.numeric(floor(difftime(timestamps(data), (timestamps(data)[1]), units="mins")))
But I only get the difference from time zero onwards.
Added example data with 'mins' column that I want to be added
timestamps mins
2013-06-23 00:00:00 NA
2013-06-23 01:00:00 60
2013-06-23 02:00:00 60
2013-06-23 04:00:00 120
The code that you're using with the [1] is always referencing the first element of the timestamps vector.
To do what you want, you want to look at all but the first element minus all but the last element.
mytimes <- data.frame(timestamps=c("2013-06-23 00:00:00",
                                   "2013-06-23 01:00:00",
                                   "2013-06-23 02:00:00",
                                   "2013-06-23 04:00:00"),
                      mins=NA)
mytimes$mins <- c(NA, difftime(mytimes$timestamps[-1],
                               mytimes$timestamps[-nrow(mytimes)],
                               units="mins"))
What this code does is:
Set up a data frame so that timestamps and mins keep the same length.
Within that data frame, put the timestamps you have, and NA for mins since you don't have any values yet.
Select all but the first element of timestamps: mytimes$timestamps[-1]
Select all but the last element of timestamps: mytimes$timestamps[-nrow(mytimes)]
Subtract them with difftime, in units of minutes: units="mins". (Since the strings are well formatted, you don't first have to make them POSIXct objects.)
Put an NA in front, because you have one fewer difference than you have rows: c(NA, ...)
Drop all of that back into the original data frame's mins column: mytimes$mins <-
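The steps above can also be condensed with diff, which subtracts each element from the one after it. This is a sketch on the question's four timestamps, assuming they are first converted to POSIXct; the variable names are illustrative.

```r
timestamps <- as.POSIXct(c("2013-06-23 00:00:00",
                           "2013-06-23 01:00:00",
                           "2013-06-23 02:00:00",
                           "2013-06-23 04:00:00"), tz = "UTC")

# diff() returns a difftime; ask for minutes explicitly so the unit
# does not depend on the size of the gaps
mins <- c(NA, as.numeric(diff(timestamps), units = "mins"))

data.frame(timestamps, mins)
```

Requesting the units explicitly matters because difftime otherwise picks a unit automatically, here hours.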
Another option is to calculate it with this approach:
# create some data for an MWE
hrs <- c(0,1,2,4)
df <- data.frame(timestamps = as.POSIXct(paste("2015-12-17",
                                               paste(hrs, "00", "00", sep = ":"))))
df
# timestamps
# 1 2015-12-17 00:00:00
# 2 2015-12-17 01:00:00
# 3 2015-12-17 02:00:00
# 4 2015-12-17 04:00:00
# create a function that calculates the lag for n periods
lag <- function(x, n) c(rep(NA, n), x[1:(length(x) - n)])
# create a new column named mins
df$mins <- as.numeric(df$timestamps - lag(df$timestamps, 1)) / 60
df
# timestamps mins
# 1 2015-12-17 00:00:00 NA
# 2 2015-12-17 01:00:00 60
# 3 2015-12-17 02:00:00 60
# 4 2015-12-17 04:00:00 120
I have time series data where the values are stored at the end of a 1-min sampling interval (i.e. data for 00:00 belongs to the interval 23:59 - 00:00 etc.).
I now would like to average in 5 min intervals giving the mean concentrations at 00:05, 00:10, etc.
What I get with the code below is the averages at 00:04, 00:09, etc., which seems to be related to the endpoints function, but I cannot figure out how to average correctly (i.e. in my case minutes 00:01 to 00:05 reported as mean at 00:05 etc.)
library(zoo)
library(xts)
t1 <- as.POSIXct("2012-1-1 0:1:0")
t2 <- as.POSIXct("2012-1-1 0:15:0")
d <- seq(t1, t2, by = "1 min")
x <- rnorm(length(d))
z <- zoo(x, d)
period.apply(z,endpoints(z,"mins",5),mean)
> 2012-01-01 00:04:00 2012-01-01 00:09:00 2012-01-01 00:14:00 2012-01-01 00:15:00
0.6864088 -0.9403631 -0.4269895 0.6728044
The endpoints function is working correctly. You need to change your index values. 00:05:00 is the beginning of the 5th minute, not the end.
> z <- zoo(x, d-1)
> period.apply(z,endpoints(z,"mins",5),mean)
2012-01-01 00:04:59 2012-01-01 00:09:59 2012-01-01 00:14:59
1.2324436 -0.5881076 0.5067009
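If the averages should still be labelled 00:05, 00:10, and so on, one option is to shift the index back by one second for the grouping and then shift the result forward again. The following is only a sketch under that assumption, using an xts object with known values (1 to 15) instead of random numbers so the result is easy to check; the names shifted and avg are made up.

```r
library(xts)

d <- seq(as.POSIXct("2012-01-01 00:01:00", tz = "UTC"),
         as.POSIXct("2012-01-01 00:15:00", tz = "UTC"), by = "1 min")
x <- xts(seq_along(d), order.by = d)

# move each end-of-interval stamp one second back, into the interval it describes
shifted <- x
index(shifted) <- index(x) - 1

# average in 5-minute blocks; labels are 00:04:59, 00:09:59, 00:14:59
avg <- period.apply(shifted, endpoints(shifted, "minutes", 5), mean)

# move the labels forward again to 00:05:00, 00:10:00, 00:15:00
index(avg) <- index(avg) + 1
avg
```

Here minutes 00:01 to 00:05 hold the values 1 to 5, so the first average is 3 and it is reported at 00:05:00.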