Date,Time,Lots,Status
"10-28-15","00:04:50","13-09","1111111110000000"
"10-28-15","00:04:50","13-10","1111100000000000"
"10-28-15","00:04:50","13-11","1111111011100000"
"10-28-15","00:04:50","13-12","1111011111000000"
"10-28-15","00:04:57","13-13","1111111111000000"
"10-28-15","00:04:57","13-14","1111111111110000"
"10-28-15","00:04:57","13-15","1111111100000000"
"10-28-15","00:04:57","13-16","1111111111000000"
"10-28-15","00:05:04","13-17","1111111110000000"
"10-28-15","00:05:04","13-18","1111101100000000"
"10-28-15","00:05:04","13-19","1111111111100000"
"10-28-15","00:05:04","13-20","1111111111100000"
"10-28-15","00:05:11","13-21","1111110100000000"
"10-28-15","00:05:11","13-22","1000011111100000"
"10-28-15","00:05:11","13-23","1101011111110000"
"10-28-15","00:05:11","13-24","1111111111000000"
"10-28-15","00:05:19","13-25","1011000000000000"
"10-28-15","00:05:19","13-26","0000000000000000"
"10-28-15","00:05:19","13-27","1111011110000000"
"10-28-15","00:05:19","13-28","1010000000000000"
Call the data frame above "sample". I need to convert the times into 15-minute intervals, so that there are 4 intervals for every hour. How do I do that? E.g., 00:04:50 should go into the interval 00:00:00 - 00:15:00. Thanks!
I have tried using:
format(as.POSIXlt(as.POSIXct('2000-1-1', "UTC") + round(as.numeric(sample$V3)/300)*300), format = "%H:%M:%S")
I hope I understood your question correctly, but I think you could do it like this:
# example data frame:
myDat <- data.frame(date = rep("2010-08-15",30),
time = sprintf("%02i:%02i:%02i",rep(12:14,each=10),rep(c(0,15,16,29:31,44:45,59,1),3),sample(1:59)[30]))
#produce a datetime variable
myDat$datetime <- strptime(x = paste(myDat$date,myDat$time),format = "%Y-%m-%d %H:%M:%S")
# extract the minutes
myDat$min <- as.integer(format(myDat$datetime, "%M"))
# find the interval to put them in
myDat$ival <- findInterval(myDat$min,c(15,30,45,60)); myDat$ival <- factor(myDat$ival)
levels(myDat$ival) <- c("00","15","30","45")
# concatenate minute interval and hour
myDat$timeIval <- sprintf("%s:%s:00", format(myDat$datetime, "%H"), myDat$ival)
myDat[order(myDat$datetime),]
I'd first convert the data to POSIXct
time <- as.POSIXct(
strptime(paste0(df$column1, " ", df$column2), format="%m-%d-%y %H:%M:%S")
)
and then use seq (its POSIXt method) and cut to generate a factor variable for the 15-minute intervals.
interval <- cut(
time,
breaks = seq(starttime, endtime, by = as.difftime(15, units = "mins"))
)
You may want to add -Inf or Inf to the breaks to avoid generating NA values.
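The cut approach above leaves starttime and endtime undefined. As a minimal sketch, assuming the breaks should sit on quarter-hour boundaries, they can be derived from the data by truncating to the hour. The toy rows and the names df, stamp, first_hour and last_hour are made up for illustration; only the column layout follows the question.

```r
# Toy rows in the same Date/Time layout as the question (values are made up)
df <- data.frame(
  Date = c("10-28-15", "10-28-15", "10-28-15"),
  Time = c("00:04:50", "00:16:10", "00:44:59"),
  stringsAsFactors = FALSE
)

# Combine both columns into one POSIXct timestamp (UTC is an assumption)
stamp <- as.POSIXct(paste(df$Date, df$Time),
                    format = "%m-%d-%y %H:%M:%S", tz = "UTC")

# Breaks start at the full hour before the first observation and
# end at the full hour after the last one, in 15-minute steps
first_hour <- as.POSIXct(trunc(min(stamp), units = "hours"))
last_hour  <- as.POSIXct(trunc(max(stamp), units = "hours")) + 3600
breaks <- seq(first_hour, last_hour, by = "15 min")

# Each timestamp gets the factor level of the interval it falls into;
# intervals are closed on the left: [00:00, 00:15), [00:15, 00:30), ...
df$interval <- cut(stamp, breaks = breaks, right = FALSE)
table(df$interval)
```

Because the breaks cover the whole range of the data, no observation falls outside them and no NA values are produced.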
I have 24 hours of data starting at 7:30 today (for example) and running until 7:30 the next day. Because I didn't link the date to the line plot, R sorts the hours starting from 00:00 even though the data starts at 7:30. I am a beginner in R and don't know where to begin to solve this problem. Should I try linking the date to the x axis as well, or is there a better solution?
My time function somehow didn't work either; it used to work when I was plotting data in 15-minute increments.
library(chron)
d <- read.csv(file="data.csv", header = T)
t <- times(d$Time)
plot(t,d$MCO2, type="l")
Graph created from the 24 hour data I have :
Graph created from a 15 minute data using the same code :
I wanted the x axis to run from 7:30 to 7:30 the next day, but it now shows a decimal number from 0.0 to 1.
Here is the link to the data, just in case:
https://www.dropbox.com/s/wsg437gu00e5t08/Data%20210519.csv?dl=0
The question is actually about combining a date column and a time column to create a timestamp containing both date and time. Note that I suggest processing everything as if we were in the GMT timezone. You can pick whatever timezone you want, as long as you stick to it.
# use ggplot
library(ggplot2)
# assume everything happens in GMT timezone
Sys.setenv( TZ = "GMT" )
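# replicating the data: a measurement result sampled at 1 sec interval
# (start and end are not given in the original answer; these values are assumed)
start <- as.POSIXct("2019-05-22 00:00:00", tz = "GMT")
end <- start + 24 * 60 * 60 - 1
t <- seq(start, end, by = "1 sec")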
Time24 <- trimws(strftime(t, format = "%k:%M:%OS", tz="GMT"))
Date <- strftime(t, format = "%d/%m/%Y", tz="GMT")
head(Time24)
head(Date)
d <- data.frame(Date, Time24)
# this is just a random data of temperature
d$temp <- rnorm(length(d$Date),mean=25,sd=5)
head(d)
# the resulting data is as follows
# Date Time24 temp
#1 22/05/2019 0:00:00 22.67185
#2 22/05/2019 0:00:01 19.91123
#3 22/05/2019 0:00:02 19.57393
#4 22/05/2019 0:00:03 15.37280
#5 22/05/2019 0:00:04 31.76683
#6 22/05/2019 0:00:05 26.75153
# this is the answer to the question
# which is combining the date and the time columns of the data
# note we still assume that this happens in GMT
t <- as.POSIXct(paste(d$Date,d$Time24,sep=" "), format = "%d/%m/%Y %H:%M:%OS", tz="GMT")
# print the data into a plot
png(filename = "test.png", width = 800, height = 600, units = "px", pointsize = 22 )
ggplot(d,aes(x=t,y=temp)) + geom_line() +
scale_x_datetime(date_breaks = "3 hour",
date_labels = "%H:%M\n%d-%b")
# close the png device so that the file is actually written
dev.off()
The problem is that the times function (from chron) does not store any information about the day. This is a problem since your data spans two days.
The data type you use should be able to include information about the day. POSIXct is such a data type. Also, since POSIXct is the go-to date-time object in R, it is much easier to plot.
Before plotting the data, the time column should have the correct difference in days. When you just transform the column with as.POSIXct, the times of day 2 are read as if they were from day 1. This is why we have to add 24 hours to the correct entries.
After that, it is just a matter of plotting. I added an example using the ggplot2 package since I prefer these plots.
You might notice that using as.POSIXct will add an incorrect date to your time information. Don't worry about this: the date is only a dummy. You don't use the date itself, you just use it to be able to work with the difference in days.
library(ggplot2)
# Read in your data set
d <- read.csv(file="Data 210519.csv", header = T)
# Read column into R date-time object
t <- as.POSIXct(d$Time24, format = "%H:%M:%OS")
# Add 24 hours to the times that belong to day 2.
startOfDayTwo <- as.POSIXct("00:00:00", format = "%H:%M:%OS")
endOfDayTwo <- as.POSIXct("07:35:00", format = "%H:%M:%OS")
t[t >= startOfDayTwo & t <= endOfDayTwo] <- t[t >= startOfDayTwo & t <= endOfDayTwo] + 24*60*60
plot(t,d$MCO2, type="l")
# arguably a nicer plot
ggplot(d,aes(x=t,y=MCO2)) + geom_line() +
scale_x_datetime(date_breaks = "2 hour",
date_labels = "%I:%M %p")
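The answer above hard-codes the window that belongs to day 2. As a minimal sketch, assuming the rows are in chronological order and contain no gap longer than a day, the midnight rollover can instead be detected wherever the clock time jumps backwards. The vector time_str and the other names here are made up for illustration.

```r
# Clock times without a date, crossing midnight once (values are made up)
time_str <- c("23:59:58", "23:59:59", "00:00:00", "00:00:01")

# Parsing attaches today's date as a dummy date to every entry
stamp <- as.POSIXct(time_str, format = "%H:%M:%S", tz = "GMT")

# A negative step between consecutive rows marks a midnight rollover;
# the running sum counts how many days each row is past the first day
day_offset <- cumsum(c(0, diff(as.numeric(stamp)) < 0))

# Shift each row by the number of whole days it is past the first day
stamp_fixed <- stamp + day_offset * 24 * 60 * 60
```

The dummy date is still arbitrary, but the differences between rows are now correct, which is all the plot needs.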
I am trying to aggregate quarter-hourly data but I am getting the error message invalid type (list). The column is a POSIXlt list. I have aggregated minutely and hourly data before, but I have never seen this error. Do I need to convert the list to a different type, and if so, would I still be able to extract the 15-minute data? Here is my code; I would really appreciate any help:
seq_start <- as.POSIXct("2015-09-10 01:00:00 BST")
Arrivals <- floor(runif(60, min = 1, max = 14))
Minute_Seq <- seq(trunc(seq_start, units='mins'), by='1 mins',length = 60)
Arrival_board = data.frame(Minute_Seq,Arrivals)
Arrival_board$QTR= as.POSIXlt(round(as.double(Arrival_board$Minute_Seq)/(5*60))*(5*60),origin=(as.POSIXlt('1970-01-01')))
arrive_stats <- aggregate(Arrival_board$Arrivals ~ Arrival_board$QTR, Arrival_board, FUN=mean)
POSIXlt is a list type, use POSIXct instead:
aggregate(Arrivals ~ QTR, transform(Arrival_board, QTR=as.POSIXct(QTR)), FUN=mean)
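As a minimal sketch of the same idea with true 15-minute bins, the grouping column can be kept as POSIXct from the start by flooring the epoch seconds. The data frame board and its deterministic arrivals column are made up for illustration, and flooring (rather than the rounding used in the question) is an assumption about the binning you want.

```r
# One hour of minute-level data with made-up, deterministic arrivals
minute_seq <- seq(as.POSIXct("2015-09-10 01:00:00", tz = "UTC"),
                  by = "1 min", length.out = 60)
board <- data.frame(minute = minute_seq, arrivals = rep(1:4, each = 15))

# Floor every timestamp to the start of its 15-minute bin (900 seconds);
# the result is POSIXct, an atomic type that aggregate() can group by
board$qtr <- as.POSIXct(floor(as.numeric(board$minute) / 900) * 900,
                        origin = "1970-01-01", tz = "UTC")

# Mean arrivals per quarter hour
stats <- aggregate(arrivals ~ qtr, data = board, FUN = mean)
stats
```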
Here is an alternative to binning your data via your QTR expression. It uses seq (its POSIXt method) and cut, which is more straightforward than dividing, rounding, and multiplying:
seq_start <- as.POSIXct("2015-09-10 01:00:00 BST")
Arrivals <- floor(runif(60, min = 1, max = 14))
Minute_Seq <- seq(trunc(seq_start, units='mins'), by='1 mins',length = 60)
Arrival_board = data.frame(Minute_Seq,Arrivals)
QTR= seq(trunc(seq_start, units='mins'), by='5 mins',length = 13)
Arrival_board$QTR = cut(Arrival_board$Minute_Seq,QTR)
arrive_stats <- aggregate(Arrival_board$Arrivals ~ QTR, Arrival_board, FUN=mean)
Due to the differences in how the bins are defined, there will be a slight shift in the results. To correct for this in the case of a 5-minute window, shift the seq_start time back by 2 minutes:
QTR= seq(trunc(seq_start-(2*60), units='mins'), by='5 mins',length = 14)
I have a raw dataset of observations taken at 5-minute intervals between 6am and 9pm on weekdays only. These do not come with date-time information for plotting etc., so I am attempting to create a vector of date-times to add to my data, i.e. to go from this:
X425 X432 X448
1 0.07994814 0.1513559 0.1293103
2 0.08102852 0.1436480 0.1259074
to this
X425 X432 X448
2010-05-24 06:00 0.07994814 0.1513559 0.1293103
2010-05-24 06:05 0.08102852 0.1436480 0.1259074
I have gone about this as follows:
# using lubridate and xts
library(xts)
library(lubridate)
# sequence of 5 min intervals from 06:00 to 21:00
sttime <- hms("06:00:00")
intervals <- sttime + c(0:180) * minutes(5)
# sequence of days from 2010-05-24 to 2010-11-05
dayseq <- timeBasedSeq("2010-05-24/2010-11-05/d")
# add intervals to dayseq
dayPlusTime <- function(days, times) {
dd <- NULL
for (i in 1:2) {
dd <- c(dd,(days[i] + times))}
return(dd)
}
obstime <- dayPlusTime(dayseq, intervals)
But obstime is coming out as a list. days[1] + times works, so I guess it has something to do with the way the POSIXct objects are concatenated together to make dd, but I can't figure out what I am doing wrong or where to go next.
Any help appreciated.
A base alternative:
# create some dummy dates
dates <- Sys.Date() + 0:14
# select non-weekend days
wd <- dates[as.integer(format(dates, format = "%u")) %in% 1:5]
# create times from 06:00 to 21:00 by 5 min interval
times <- format(seq(from = as.POSIXct("2015-02-18 06:00"),
to = as.POSIXct("2015-02-18 21:00"),
by = "5 min"),
format = "%H:%M")
# create all date-time combinations, paste, convert to as.POSIXct and sort
wd_times <- sort(as.POSIXct(do.call(paste, expand.grid(wd, times))))
One of the issues is that your interval vector does not change the hour when the minutes go over 60.
Here is one way you could do this:
#create the interval vector
intervals<-c()
for(p in 6:20){
for(j in seq(0,55,by=5)){
intervals<-c(intervals,paste(p,j,sep=":"))
}
}
intervals<-c(intervals,"21:0")
#get the days
dayseq <- timeBasedSeq("2010-05-24/2010-11-05/d")
#concatenate everything and format to POSIXct at the end
obstime<-strptime(unlist(lapply(dayseq,function(x){paste(x,intervals)})),format="%Y-%m-%d %H:%M", tz="GMT")
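Another way to get the same grid, shown here as a minimal base R sketch, is to add second offsets to the midnight of each weekday instead of pasting strings. The GMT timezone and the names days, weekdays_only, offsets and day_start are assumptions for illustration.

```r
# All calendar days in the observation period
days <- seq(as.Date("2010-05-24"), as.Date("2010-11-05"), by = "day")

# Keep Monday to Friday only (%u gives the weekday number, 1 = Monday)
weekdays_only <- days[as.integer(format(days, "%u")) <= 5]

# Offsets from midnight in seconds: 06:00 to 21:00 in 5-minute steps
offsets <- seq(6 * 3600, 21 * 3600, by = 5 * 60)

# Midnight of every weekday as POSIXct
day_start <- as.POSIXct(format(weekdays_only), tz = "GMT")

# Every day is repeated once per offset, and the offsets are recycled
# per day, so the result is already sorted and stays a POSIXct vector
obstime <- rep(day_start, each = length(offsets)) +
  rep(offsets, times = length(day_start))

head(format(obstime, "%Y-%m-%d %H:%M"))
```

Since everything is done with vector arithmetic on one POSIXct object, nothing is concatenated in a loop and the result cannot degrade into a list.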
A simple question: I know how to subset time series in xts for years, months and days from the help: x['2000-05/2001'] and so on.
But how can I subset my data by hours of the day? I would like to get all data between 07:00 am and 06:00 pm. I.e., I want to extract the data during business hours, irrespective of the day (I will take care of weekends later on). The help has an example of the form:
.parseISO8601('T08:30/T15:00')
But this does not work in my case. Does anybody have a clue?
If your xts object is called x then something like y <- x["T09:30/T11:00"] works for me to get a slice of the morning session, for example.
For some reason, cutting an xts object by time of day with x["T09:30/T11:00"] is pretty slow. I used the methods from "R: Efficiently subsetting dataframe based on time of day" and "data.table time subset vs xts time subset" to write a faster function with similar syntax:
cut_time_of_day <- function(x, t_str_begin, t_str_end){
tstr_to_sec <- function(t_str){
#"09:00:00" to sec of day
as.numeric(as.POSIXct(paste("1970-01-01", t_str), "UTC")) %% (24*60*60)
}
#POSIX ignores leap second
#sec_of_day = as.numeric(index(x)) %% (24*60*60) #GMT only
sec_of_day = {lt = as.POSIXlt(index(x)); lt$hour *60*60 + lt$min*60 + lt$sec} #handle tzone
sec_begin = tstr_to_sec(t_str_begin)
sec_end = tstr_to_sec(t_str_end)
return(x[ sec_of_day >= sec_begin & sec_of_day <= sec_end,])
}
Test:
library(xts)
n = 100000
dtime <- seq(ISOdate(2001,1,1), by = 60*60, length.out = n)
attributes(dtime)$tzone <- "CET"
x = xts((1:n), order.by = dtime)
y2 <- cut_time_of_day(x,"07:00:00", "09:00:00")
y1 <- x["T07:00:00/T09:00:00"]
identical(y1,y2)
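The seconds-of-day comparison used in the function above does not depend on xts itself. As a minimal base R sketch, here it is applied to a plain POSIXct vector for the 07:00 to 18:00 window from the question; the hourly test data and the names stamps, sec_of_day and business are made up for illustration.

```r
# Two days of hourly timestamps (made-up test data)
stamps <- seq(as.POSIXct("2001-01-01 00:00:00", tz = "UTC"),
              by = "1 hour", length.out = 48)

# POSIXlt exposes hour, minute and second in the timestamp's own timezone
lt <- as.POSIXlt(stamps)
sec_of_day <- lt$hour * 3600 + lt$min * 60 + lt$sec

# Keep everything from 07:00:00 to 18:00:00 inclusive, on any day
keep <- sec_of_day >= 7 * 3600 & sec_of_day <= 18 * 3600
business <- stamps[keep]
```

The logical vector keep can index an xts object in the same way, since it has one element per row.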
I'm working with some time data and I'm having problems converting a time difference to years and months.
My data looks more or less like this,
dfn <- data.frame(
Today = Sys.time(),
DOB = seq(as.POSIXct('2007-03-27 00:00:01'), len= 26, by="3 day"),
Patient = factor(1:26, labels = LETTERS))
First I subtract the date of birth (DOB) from today's date (Today).
dfn$ageToday <- dfn$Today - dfn$DOB
This gives me the Time difference in days.
dfn$ageToday
Time differences in days
[1] 1875.866 1872.866 1869.866 1866.866 1863.866
[6] 1860.866 1857.866 1854.866 1851.866 1848.866
[11] 1845.866 1842.866 1839.866 1836.866 1833.866
[16] 1830.866 1827.866 1824.866 1821.866 1818.866
[21] 1815.866 1812.866 1809.866 1806.866 1803.866
[26] 1800.866
attr(,"tzone")
[1] ""
This is where the first part of my question comes in: how do I convert this difference to years and months (rounded to months)? (i.e. 4.7, 4.11, etc.)
I read the ?difftime man page and the ?format, but I did not figure it out.
Any help would be appreciated.
Furthermore, I would like to melt my final object and if I try using melt on the data frame above using this command,
require(plyr)
require(reshape)
mdfn <- melt(dfn, id=c('Patient'))
I get this strange error I haven't seen before
Error in as.POSIXct.default(value) :
do not know how to convert 'value' to class "POSIXct"
So, my second question is: how do I create a time difference I can melt alongside my POSIXct variables? If I melt without dfn$ageToday everything works like a charm.
Thanks, Eric
The lubridate package makes working with dates and times, including finding time differences, really easy.
library("lubridate")
library("reshape2")
dfn <- data.frame(
Today = Sys.time(),
DOB = seq(as.POSIXct('2007-03-27 00:00:01'), len= 26, by="3 day"),
Patient = factor(1:26, labels = LETTERS))
dfn$diff <- interval(dfn$DOB, dfn$Today) / duration(num = 1, units = "years")
mdfn <- melt(dfn, id=c('Patient'))
class(mdfn$value) # all values are coerced into numeric
The interval() function (named new_interval() in older lubridate releases) calculates the time difference between two dates. Note that there is a function today() that could substitute for your use of Sys.time. Finally, note the duration() function, which creates a standard duration that you can use to divide the interval by a length of standard units, in this case, a unit of one year.
In case you want to preserve the contents of Today and DOB, then you may want to convert everything to character first and reconvert later...
library("lubridate")
library("reshape2")
dfn <- data.frame(
Today = Sys.time(),
DOB = seq(as.POSIXct('2007-03-27 00:00:01'), len= 26, by="3 day"),
Patient = factor(1:26, labels = LETTERS))
# Create standard durations for a year and a month
one.year <- duration(num = 1, units = "years")
one.month <- duration(num = 1, units = "months")
# Calculate the difference in years as float and integer
# (interval() was named new_interval() in older lubridate releases)
dfn$diff.years <- interval(dfn$DOB, dfn$Today) / one.year
dfn$years <- floor( interval(dfn$DOB, dfn$Today) / one.year )
# Calculate the modulo for number of months
dfn$diff.months <- round( interval(dfn$DOB, dfn$Today) / one.month )
dfn$months <- dfn$diff.months %% 12
# Paste the years and months together
# I am not using the decimal point so as not to imply this is
# a numeric representation of the difference
dfn$y.m <- paste(dfn$years, dfn$months, sep = '|')
# convert Today and DOB to character so as to preserve them in melting
dfn$Today <- as.character(dfn$Today)
dfn$DOB <- as.character(dfn$DOB)
# melt using string representation of difference between the two dates
dfn2 <- dfn[,c("Today", "DOB", "Patient", "y.m")]
mdfn2 <- melt(dfn2, id=c('Patient'))
# alternative melt using numeric representation of difference in years
dfn3 <- dfn[,c("Today", "DOB", "Patient", "diff.years")]
mdfn3 <- melt(dfn3, id=c('Patient'))
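For comparison, here is a minimal base R sketch that counts completed calendar months directly from the year, month and day fields, without lubridate. Note that this counts whole months (it rounds down), which is an assumption that differs from the rounding used above. The helper name age_ym and the example dates are made up for illustration.

```r
# Hypothetical helper: age in completed years and months between two dates
age_ym <- function(dob, today) {
  d <- as.POSIXlt(dob)
  t <- as.POSIXlt(today)
  # Months between the two calendar months, minus one if the day of the
  # month of 'today' has not yet reached the day of the month of 'dob'
  total <- (t$year - d$year) * 12 + (t$mon - d$mon) - (t$mday < d$mday)
  data.frame(years = total %/% 12, months = total %% 12)
}

# Example with fixed dates so the result does not depend on Sys.time()
res <- age_ym(as.Date("2007-03-27"), as.Date("2012-05-15"))
paste(res$years, res$months, sep = "|")
```

Both columns are plain numbers, so they can be melted alongside the other variables without the POSIXct conversion error.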