A simple question: I know how to subset time series in xts for years, months and days from the help: x['2000-05/2001'] and so on.
But how can I subset my data by hours of the day? I would like to get all data between 07:00 am and 06:00 pm, i.e., I want to extract the data during business hours, regardless of the day (I will take care of weekends later on). The help has an example of the form:
.parseISO8601('T08:30/T15:00')
But this does not work in my case. Does anybody have a clue?
If your xts object is called x then something like y <- x["T09:30/T11:00"] works for me to get a slice of the morning session, for example.
For some reason, cutting an xts object by time of day with x["T09:30/T11:00"] is pretty slow. I used the methods from "R: Efficiently subsetting dataframe based on time of day" and "data.table time subset vs xts time subset" to write a faster function with similar syntax:
cut_time_of_day <- function(x, t_str_begin, t_str_end) {
  # Convert a time string such as "09:00:00" to seconds since midnight
  tstr_to_sec <- function(t_str) {
    as.numeric(as.POSIXct(paste("1970-01-01", t_str), tz = "UTC")) %% (24 * 60 * 60)
  }
  # POSIX time ignores leap seconds
  # sec_of_day <- as.numeric(index(x)) %% (24 * 60 * 60)  # GMT only
  # Going through POSIXlt respects the time zone of the index
  lt <- as.POSIXlt(index(x))
  sec_of_day <- lt$hour * 60 * 60 + lt$min * 60 + lt$sec
  sec_begin <- tstr_to_sec(t_str_begin)
  sec_end <- tstr_to_sec(t_str_end)
  x[sec_of_day >= sec_begin & sec_of_day <= sec_end, ]
}
Test:
n = 100000
dtime <- seq(ISOdate(2001,1,1), by = 60*60, length.out = n)
attributes(dtime)$tzone <- "CET"
x = xts((1:n), order.by = dtime)
y2 <- cut_time_of_day(x,"07:00:00", "09:00:00")
y1 <- x["T07:00:00/T09:00:00"]
identical(y1,y2)
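As a minimal sketch of the same seconds-of-day idea without xts (base R only; the sample timestamps and the 07:00 to 18:00 window taken from the question are assumptions for illustration), a plain POSIXct vector can be filtered like this:

```r
# Hourly timestamps covering three full days (illustrative data, UTC)
times <- seq(as.POSIXct("2001-01-01 00:00:00", tz = "UTC"),
             by = "1 hour", length.out = 72)

# Seconds since local midnight for every timestamp
lt <- as.POSIXlt(times)
sec_of_day <- lt$hour * 3600 + lt$min * 60 + lt$sec

# Keep everything between 07:00 and 18:00 (inclusive), whatever the day
in_business <- sec_of_day >= 7 * 3600 & sec_of_day <= 18 * 3600
business <- times[in_business]

range(as.POSIXlt(business)$hour)
```

The logical vector in_business could be used in the same way to index the rows of an xts object or a data frame.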
I am trying to aggregate quarter-hourly data, but I am getting the error message "invalid type (list)". The list is a POSIXlt list. I have aggregated minutely and hourly data before, but I have never seen this error. Do I need to convert the list to a different type and, if so, would I still be able to extract the 15-minute data? Here is my code; I would really appreciate any help:
seq_start <- as.POSIXct("2015-09-10 01:00:00 BST")
Arrivals <- floor(runif(60, min = 1, max = 14))
Minute_Seq <- seq(trunc(seq_start, units='mins'), by='1 mins',length = 60)
Arrival_board = data.frame(Minute_Seq,Arrivals)
Arrival_board$QTR= as.POSIXlt(round(as.double(Arrival_board$Minute_Seq)/(5*60))*(5*60),origin=(as.POSIXlt('1970-01-01')))
arrive_stats <- aggregate(Arrival_board$Arrivals ~ Arrival_board$QTR, Arrival_board, FUN=mean)
POSIXlt is a list type, which aggregate() cannot use as a grouping variable. Use POSIXct instead:
aggregate(Arrivals ~ QTR, transform(Arrival_board, QTR=as.POSIXct(QTR)), FUN=mean)
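To illustrate why the conversion matters, here is a small self-contained sketch (the names board and stats, the fixed seed, the UTC time zone and the 15-minute bin width are assumptions for illustration). It shows that POSIXlt is stored as a list while POSIXct is a plain number, and that aggregation works once the grouping column is POSIXct:

```r
set.seed(1)
minute_seq <- seq(as.POSIXct("2015-09-10 01:00:00", tz = "UTC"),
                  by = "1 min", length.out = 60)
board <- data.frame(Minute_Seq = minute_seq,
                    Arrivals = floor(runif(60, min = 1, max = 14)))

# POSIXlt is a list of components; POSIXct is seconds since the epoch
lt_is_list <- is.list(unclass(as.POSIXlt(minute_seq)))
ct_type <- typeof(minute_seq)

# Floor each timestamp to the start of its 15-minute bin, keeping POSIXct
board$QTR <- as.POSIXct(floor(as.numeric(board$Minute_Seq) / 900) * 900,
                        origin = "1970-01-01", tz = "UTC")

stats <- aggregate(Arrivals ~ QTR, data = board, FUN = mean)
stats
```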
Here is an alternative to binning your data via your QTR expression. It uses seq() and cut() on date-times, which is more straightforward than divide, round and multiply:
seq_start <- as.POSIXct("2015-09-10 01:00:00 BST")
Arrivals <- floor(runif(60, min = 1, max = 14))
Minute_Seq <- seq(trunc(seq_start, units='mins'), by='1 mins',length = 60)
Arrival_board = data.frame(Minute_Seq,Arrivals)
QTR= seq(trunc(seq_start, units='mins'), by='5 mins',length = 13)
Arrival_board$QTR = cut(Arrival_board$Minute_Seq,QTR)
arrive_stats <- aggregate(Arrival_board$Arrivals ~ QTR, Arrival_board, FUN=mean)
Due to the differences in how the bins are defined, there will be a slight shift in the results. To correct for this with a 5-minute window, move the seq_start time back by 2 minutes:
QTR= seq(trunc(seq_start-(2*60), units='mins'), by='5 mins',length = 14)
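The shift comes from rounding versus flooring: rounding centres each bin on a 5-minute mark, whereas cut() starts each bin on one. A minimal sketch, assuming ten one-minute timestamps in UTC (the variable names are illustrative), makes the difference visible:

```r
t0 <- as.POSIXct("2015-09-10 01:00:00", tz = "UTC")
minutes <- seq(t0, by = "1 min", length.out = 10)
secs <- as.numeric(minutes)

# Rounding: minutes 01:03 to 01:07 all map to 01:05
rounded <- as.POSIXct(round(secs / 300) * 300, origin = "1970-01-01", tz = "UTC")
# Flooring (what cut() with left-closed bins does): 01:00 to 01:04 map to 01:00
floored <- as.POSIXct(floor(secs / 300) * 300, origin = "1970-01-01", tz = "UTC")

data.frame(minute  = format(minutes, "%H:%M"),
           rounded = format(rounded, "%H:%M"),
           floored = format(floored, "%H:%M"))
```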
Date,Time,Lots,Status
"10-28-15","00:04:50","13-09","1111111110000000"
"10-28-15","00:04:50","13-10","1111100000000000"
"10-28-15","00:04:50","13-11","1111111011100000"
"10-28-15","00:04:50","13-12","1111011111000000"
"10-28-15","00:04:57","13-13","1111111111000000"
"10-28-15","00:04:57","13-14","1111111111110000"
"10-28-15","00:04:57","13-15","1111111100000000"
"10-28-15","00:04:57","13-16","1111111111000000"
"10-28-15","00:05:04","13-17","1111111110000000"
"10-28-15","00:05:04","13-18","1111101100000000"
"10-28-15","00:05:04","13-19","1111111111100000"
"10-28-15","00:05:04","13-20","1111111111100000"
"10-28-15","00:05:11","13-21","1111110100000000"
"10-28-15","00:05:11","13-22","1000011111100000"
"10-28-15","00:05:11","13-23","1101011111110000"
"10-28-15","00:05:11","13-24","1111111111000000"
"10-28-15","00:05:19","13-25","1011000000000000"
"10-28-15","00:05:19","13-26","0000000000000000"
"10-28-15","00:05:19","13-27","1111011110000000"
"10-28-15","00:05:19","13-28","1010000000000000"
Say the data frame above is called "sample". I need to convert the time into 15-minute intervals, i.e. four intervals for every hour. How do I do that? E.g., 00:04:50 will go into 00:04:45 - 00:04:50. Thanks!
I have tried using:
format(as.POSIXlt(as.POSIXct('2000-1-1', "UTC") + round(as.numeric(sample$V3)/300)*300), format = "%H:%M:%S")
I hope I understood your question correctly, but I think you could do it like this:
# example data frame:
myDat <- data.frame(date = rep("2010-08-15",30),
time = sprintf("%02i:%02i:%02i", rep(12:14, each = 10), rep(c(0, 15, 16, 29:31, 44:45, 59, 1), 3), sample(1:59, 30)))
#produce a datetime variable
myDat$datetime <- strptime(x = paste(myDat$date,myDat$time),format = "%Y-%m-%d %H:%M:%S")
# extract the minutes
myDat$min <- as.integer(format(myDat$datetime, "%M"))
# find the interval to put them in
myDat$ival <- findInterval(myDat$min,c(15,30,45,60)); myDat$ival <- factor(myDat$ival)
levels(myDat$ival) <- c("00","15","30","45")
# concatenate minute interval and hour
myDat$timeIval <- sprintf("%s:%s:00", format(myDat$datetime, "%H"), myDat$ival)
myDat[order(myDat$datetime),]
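A more compact variant of the same idea, sketched here under the assumption that only the time-of-day strings are needed (the vectors times and slot are illustrative names), replaces findInterval() with integer division of the minutes:

```r
times <- c("00:04:50", "12:16:03", "12:44:59", "13:45:00")

# strptime() returns POSIXlt, whose hour and min components are integers
parsed <- strptime(times, format = "%H:%M:%S", tz = "UTC")

# Integer-divide the minutes by 15 to get the start of the quarter hour
slot <- sprintf("%02d:%02d:00", parsed$hour, (parsed$min %/% 15L) * 15L)
slot
```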
I'd first convert the data to POSIXct
time <- as.POSIXct(
strptime(paste0(df$column1, " ", df$column2), format="%m-%d-%y %H:%M:%S")
)
and then use seq.POSIXct and cut to generate a factor variable for the 15 min intervals.
interval <- cut(
time,
breaks = seq(starttime, endtime, by = as.difftime(15, units = "mins"))
)
You may want to add -Inf or Inf to the breaks to avoid generating NA values.
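A runnable sketch of this approach follows. The four sample timestamps, the UTC time zone and the choice to start the breaks at the top of the first hour are assumptions for illustration:

```r
stamps <- c("10-28-15 00:04:50", "10-28-15 00:04:57",
            "10-28-15 00:05:19", "10-28-15 00:16:00")
time <- as.POSIXct(stamps, format = "%m-%d-%y %H:%M:%S", tz = "UTC")

# Align the first break to the start of the hour, then step by 15 minutes
starttime <- as.POSIXct(trunc(min(time), units = "hours"))
breaks <- seq(starttime, by = "15 min", length.out = 5)

# Left-closed bins: [00:00, 00:15), [00:15, 00:30), ...
interval <- cut(time, breaks = breaks)
table(as.integer(interval))
```

Building the breaks explicitly keeps the bins aligned to quarter hours; observations outside the break range become NA, which is why padding the breaks can help.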
I am trying to create a ts object using R for a daily time series that starts on 24.02.2015 and ends on 13.04.2015. I have put the frequency=7 for daily data but I cannot find a way to put the exact date as start argument.
I think this is what you want, using the decimal_date() function from 'lubridate' to get the proper start time for a daily series and assuming that the vector of values you want to index as a ts is called x and is of the proper length:
library(lubridate)
df <- ts(x, start = decimal_date(as.Date("2015-02-24")), frequency = 365)
Here's what that looks like if I use rnorm() to generate an x of the proper length:
> df
Time Series:
Start = c(2015, 55)
End = c(2015, 103)
Frequency = 365
[1] 0.4284579 1.9384426 0.1242242 -2.4002789 -0.4064669 0.6945274 -0.5172909 0.4772347 0.8758635 -1.7233406 0.5929249 1.5662611 1.0692173 -0.1354226
[15] 1.1404375 0.7714662 -0.2871663 -5.2720038 -1.7353146 -0.7053329 1.0206803 1.7170262 -0.3469172 0.2594851 2.0371700 -2.1549066 -0.6639050 -0.4912258
[29] -0.3849884 -3.0448583 -1.3317834 1.6173705 0.7176759 -0.8646802 -1.7697016 1.1114061 0.6941131 -0.1942612 -0.1836107 -0.5850649 -1.7449090 -3.3646555
[43] -0.4341833 1.9721407 1.4995265 1.7168002 1.8617295 -3.4578959 1.1639413
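Note that frequency is the number of observations per unit of time. With frequency = 365 the unit is one year, so a decimal year works as the start value; frequency = 7 would instead make the unit one week.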
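The same start position can be derived in base R from the day of the year, as in this sketch (the placeholder values in x and the names dates, doy and daily are assumptions for illustration):

```r
dates <- seq(as.Date("2015-02-24"), as.Date("2015-04-13"), by = "day")
x <- seq_along(dates)  # placeholder values, one per day

# Day of the year of the first observation (24 February is day 55)
doy <- as.integer(format(dates[1], "%j"))

# start = c(year, position within the year) when frequency = 365
daily <- ts(x, start = c(2015, doy), frequency = 365)
start(daily)
length(daily)
```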
If you want a vector of dates that you can use in 'zoo' instead, this does it:
seq(from = as.Date("2015-02-24"), to = as.Date("2015-04-13"), by = 1)
So you would create a zoo object like this:
zoo(x, seq(from = as.Date("2015-02-24"), to = as.Date("2015-04-13"), by = 1))
And if you want a table with date column, you can use:
df <- data.frame(date = seq(from = as.Date("2015-02-24"), to = as.Date("2015-04-13"), by = 1))
Using the xts library:
library(xts)
data_xts <- xts(x=dataframe$x, order.by=as.Date(dataframe$date, "%m/%d/%Y"))
With this method you do not specify an end date; the index is taken entirely from the dates passed to order.by.
The output looks like this:
[,1]
2020-01-01 7168.3
2020-01-02 7174.4
2020-01-03 6942.3
2020-01-04 7334.8
I'm working with some time data and I'm having problems converting a time difference to years and months.
My data looks more or less like this,
dfn <- data.frame(
Today = Sys.time(),
DOB = seq(as.POSIXct('2007-03-27 00:00:01'), len= 26, by="3 day"),
Patient = factor(1:26, labels = LETTERS))
First I subtract the date of birth (DOB) from today's date (Today).
dfn$ageToday <- dfn$Today - dfn$DOB
This gives me the Time difference in days.
dfn$ageToday
Time differences in days
[1] 1875.866 1872.866 1869.866 1866.866 1863.866
[6] 1860.866 1857.866 1854.866 1851.866 1848.866
[11] 1845.866 1842.866 1839.866 1836.866 1833.866
[16] 1830.866 1827.866 1824.866 1821.866 1818.866
[21] 1815.866 1812.866 1809.866 1806.866 1803.866
[26] 1800.866
attr(,"tzone")
[1] ""
This is where first part of my question comes in; how do I convert this difference to years and months (rounded to months)? (i.e. 4.7, 4.11, etc.)
I read the ?difftime man page and the ?format, but I did not figure it out.
Any help would be appreciated.
Furthermore, I would like to melt my final object and if I try using melt on the data frame above using this command,
require(plyr)
require(reshape)
mdfn <- melt(dfn, id=c('Patient'))
I get this strange error I haven't seen before:
Error in as.POSIXct.default(value) :
do not know how to convert 'value' to class "POSIXct"
So, my second question is: how do I create a time difference I can melt alongside my POSIXct variables? If I melt without dfn$ageToday, everything works like a charm.
Thanks, Eric
The lubridate package makes working with dates and times, including finding time differences, really easy.
library("lubridate")
library("reshape2")
dfn <- data.frame(
  Today = Sys.time(),
  DOB = seq(as.POSIXct('2007-03-27 00:00:01'), length.out = 26, by = "3 day"),
  Patient = factor(1:26, labels = LETTERS))
# interval() is the current name of the function formerly called new_interval()
dfn$diff <- interval(dfn$DOB, dfn$Today) / duration(num = 1, units = "years")
mdfn <- melt(dfn, id = c('Patient'))
class(mdfn$value) # all values are coerced into numeric
The interval() function (called new_interval() in older lubridate releases) creates the time span between two dates. Note that there is a function today() that could substitute for your use of Sys.time(). Finally, note the duration() function, which creates a standard duration that you can divide the interval by, in this case a unit of one year.
In case you want to preserve the contents of Today and DOB, then you may want to convert everything to character first and reconvert later...
library("lubridate")
library("reshape2")
dfn <- data.frame(
Today = Sys.time(),
DOB = seq(as.POSIXct('2007-03-27 00:00:01'), len= 26, by="3 day"),
Patient = factor(1:26, labels = LETTERS))
# Create standard durations for a year and a month
one.year <- duration(num = 1, units = "years")
one.month <- duration(num = 1, units = "months")
# Calculate the difference in years as float and integer
# (interval() is the current name of the former new_interval())
dfn$diff.years <- interval(dfn$DOB, dfn$Today) / one.year
dfn$years <- floor( interval(dfn$DOB, dfn$Today) / one.year )
# Calculate the modulo for number of months
dfn$diff.months <- round( interval(dfn$DOB, dfn$Today) / one.month )
dfn$months <- dfn$diff.months %% 12
# Paste the years and months together
# I am not using the decimal point so as not to imply this is
# a numeric representation of the difference
dfn$y.m <- paste(dfn$years, dfn$months, sep = '|')
# convert Today and DOB to character so as to preserve them in melting
dfn$Today <- as.character(dfn$Today)
dfn$DOB <- as.character(dfn$DOB)
# melt using string representation of difference between the two dates
dfn2 <- dfn[,c("Today", "DOB", "Patient", "y.m")]
mdfn2 <- melt(dfn2, id=c('Patient'))
# alternative melt using numeric representation of difference in years
dfn3 <- dfn[,c("Today", "DOB", "Patient", "diff.years")]
mdfn3 <- melt(dfn3, id=c('Patient'))
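For comparison, here is a base R sketch that counts completed calendar months instead of dividing by standard durations. The helper name age_years_months is hypothetical, and it deliberately truncates to whole months rather than rounding, which is an assumption that differs from the rounding used above:

```r
# Hypothetical helper: completed years and months between two dates
age_years_months <- function(dob, today) {
  d <- as.POSIXlt(dob)
  n <- as.POSIXlt(today)
  # Calendar months elapsed, minus one if the day of month is not yet reached
  months <- (n$year - d$year) * 12L + (n$mon - d$mon) - (n$mday < d$mday)
  data.frame(years = months %/% 12L, months = months %% 12L)
}

res <- age_years_months(as.Date("2007-03-27"), as.Date("2012-05-15"))
res
```

Because the result is a data frame of plain integers, it can be bound to the original data and melted without the difftime class getting in the way.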
I am trying to convert a time series of daily data (only business days) contained in an xts object into a time series of weekly data. Specifically, I want the resulting time series to contain the end of week entries (meaning last business day of a week) of the original data. I've been trying to achieve this using the function to.weekly of the xts package.
In the discussion regarding another question (Wrong week-ending date using 'to.weekly' function in 'xts' package) the below example code achieved exactly what I need. However, when I run the code, to.weekly uses Mondays as a representative for the weekly data.
I am wondering which global setting might allow me to force to.weekly to use Friday as a week's representative.
Example code:
library(lubridate); library(xts)
test.dates <- seq(as.Date("2000-01-01"),as.Date("2011-10-01"),by='days')
test.dates <- test.dates[wday(test.dates)!=1 & wday(test.dates)!=7] #Remove weekends
test.data <- rnorm(length(test.dates),mean=1,sd=2)
test.xts <- xts(x=test.data,order.by=test.dates)
test.weekly <- to.weekly(test.xts)
test.weekly[wday(index(test.weekly), label = TRUE, abbr = TRUE) != "Fri"]
test.dates <- test.dates[wday(test.dates)==6]
tail(wday(test.dates, label = TRUE, abbr = TRUE))
#[1] Fri Fri Fri Fri Fri Fri
#Levels: Sun < Mon < Tues < Wed < Thurs < Fri < Sat
OK. With the unstated requirements added to the problem:
require(timeDate)
require(lubridate)
startDate <- as.Date("2000-01-03")
endDate <- as.Date("2011-10-01")
AllDays <- as.timeDate(seq(startDate, endDate, by="day"))
is.wrk <- isBizday(AllDays, holidays = holidayNYSE(), wday = 1:5)
is.wrkdt <- as.Date(names(is.wrk)[is.wrk])
endweeks <- tapply(is.wrkdt, paste(year(is.wrkdt),week(is.wrkdt), sep = ""), max)
head(as.Date(endweeks, origin="1970-01-01"))
# 1 2 3 4 5 6
#"2011-01-06" "2011-01-13" "2011-01-20" "2011-01-27" "2011-02-03" "2011-02-10"
So you want:
as.Date(endweeks, origin="1970-01-01")
I had the same problem and found a two-line solution.
First, retain only business days (in case your data set also contains weekend dates):
test.dates <- test.dates[ wday(test.dates) %in% c(2:6) ]
Then you have two alternatives. First, you can use to.weekly(), which keeps the last available business day of each week, so the result is not restricted to Fridays (wday(test.dates) == 6):
test.weekly <- to.weekly(test.xts)
Or you can use the function endpoints(), which works on multi-column xts objects and deals much better with NAs because it does not remove missing data (avoiding the warning "missing values removed from data"):
test.weekly <- test.xts[endpoints(test.xts,on='weeks')[-1],]
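The idea of picking the last available observation in each week can also be sketched in base R, without xts. The September 2011 date range and the Monday-based week numbering below are assumptions for illustration, not a description of how endpoints() is implemented:

```r
dates <- seq(as.Date("2011-09-01"), as.Date("2011-09-30"), by = "day")

# Keep Monday to Friday (POSIXlt wday: 0 = Sunday, 6 = Saturday)
biz <- dates[as.POSIXlt(dates)$wday %in% 1:5]

# Week number counted from Monday; 1970-01-01 (day 0) was a Thursday
week_id <- (as.integer(biz) + 3L) %/% 7L

# Last business day present in each week
last_of_week <- biz[!duplicated(week_id, fromLast = TRUE)]
format(last_of_week, "%Y-%m-%d")
```

If a Friday were missing from the data (for example a holiday), this would return the Thursday of that week instead.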