Giving index numbers to time - r

I have a column of time
a = times('00:00:00', '00:15:00', '01:45:00', '23:45:00')
And I would like to give them an index based on a 15 min interval. So for example, 00:00:00 will be 1, 00:15:00 will be 2, and 23:45:00 will be 96 as there are ninety-six 15 min intervals in a 24hr period.
So the result I want is:
1 2 8 96

Another fun idea,
cumsum(c(1, diff(strptime(a, format = "%T")) / 15))
#[1] 1 2 8 96

We can use cut with breaks of "15 mins" after converting a to time object and convert the levels to numeric.
as.integer(cut(strptime(a, format = "%T"), breaks = "15 mins"))
#[1] 1 2 8 96
Same would also work with as.POSIXct
as.integer(cut(as.POSIXct(a, format = "%T"), breaks = "15 mins"))
Using lubridate we can convert to seconds and do
library(lubridate)
period_to_seconds(hms(a))/(15*60) + 1
Or with minute
minute(as.period(hms(a), "minutes"))/15 + 1
You might need ceiling/floor based on the time duration to round values.
data
a <- c('00:00:00', '00:15:00', '01:45:00', '23:45:00')

Related

recode a time variable (format: hh:mm:ss) into a categorical variable

I have a variable named duration.video in the following format hh:mm:ss that I would like to recode into a categorical variable ('Less than 5 minutes', 'between 5 and 30 min', etc.)
Here is my line of code:
video$Duration.video<-as.factor(car::recode(
video$Duration.video,
"00:00:01:00:04:59='Less than 5 minutes';00:05:00:00:30:00='Between 5 and 30 minutes';00:30:01:01:59:59='More than 30 minutes and less than 2h';02:00:00:08:00:00='2h and more'"
))
The code does not work because all the values of the variable are put in one category ('Between 5 and 30 minutes').
I think it's because my variable is in character format but I can't convert it to numeric. And also maybe the format with ":" can be a problem for the recoding in R.
I tried to convert to data.table::ITime but the result remains the same.
This is a tidy solution. You can get this done with base R but this may be easier.
library(lubridate)
library(dplyr)
df <- data.frame(
duration_string = c("00:00:03","00:00:06","00:12:00","00:31:00","01:12:01")
)
df <- df %>%
mutate(
duration = as.duration(hms(duration_string)),
cat_duration = case_when(
duration < dseconds(5) ~ "less than 5 secs",
duration >= dseconds(5) & duration < dminutes(30) ~ "between 5 secs and 30 mins",
duration >= dminutes(30) & duration < dhours(1) ~ "between 30 mins and 1 hour",
duration > dhours(1) ~ "more than 1 hour",
) ,
cat_duration = factor(cat_duration,levels = c("less than 5 secs",
"between 5 secs and 30 mins",
"between 30 mins and 1 hour",
"more than 1 hour"
))
)
We can use factor. This only uses base R:
labs <- c('Less than 5 minutes',
'Between 5 and 30 minutes',
'More than 30 minutes and less than 2h',
'2h and more')
transform(df, factor = {
hms <- substr(duration_string, 1, 8)
factor((hms >= "00:00:05") + (hms > "00:30:00") + (hms >= "02:00:00"), 0:3, labs)
})

How do I custom the 24 hour start hour and finish hour for line plot? (for example, start at 7:30)

I have a 24 hour data starting from 7:30 today (for example), until 7:30 the next day, because I didn't link the date to the line plot, R sorts the hour starting from 00:00 despite the data starting at 7:30, I am a beginner in R, and I don't know where to begin to even solve this problem, should I try linking the date also to the X axis, or is there a better solution?
My time function somehow didn't work either, it used to work when I was plotting data for 15 minute increments.
library(chron)
d <- read.csv(file="data.csv", header = T)
t <- times(d$Time)
plot(t,d$MCO2, type="l")
Graph created from the 24 hour data I have :
Graph created from a 15 minute data using the same code :
I wanted the outcome to be from 7:30 to 7:30 the next day, but it showed now a decimal number from 0.0 to 1
Here is the link to the data, just in case:
https://www.dropbox.com/s/wsg437gu00e5t08/Data%20210519.csv?dl=0
The question is actually about combining a date column and a time column to create a timestamp containing date AND time. Note that I suggest to process everything as if we are in GMT timezone. You can pick whatever timezone you want, then stick to it.
# use ggplot
library(ggplot2)
# assume everything happens in GMT timezone
Sys.setenv( TZ = "GMT" )
# replicating the data: a measurement result sampled at 1 sec interval
t <- seq(start, end, by = "1 sec")
Time24 <- trimws(strftime(t, format = "%k:%M:%OS", tz="GMT"))
Date <- strftime(t, format = "%d/%m/%Y", tz="GMT")
head(Time24)
head(Date)
d <- data.frame(Date, Time24)
# this is just a random data of temperature
d$temp <- rnorm(length(d$Date),mean=25,sd=5)
head(d)
# the resulting data is as follows
# Date Time24 temp
#1 22/05/2019 0:00:00 22.67185
#2 22/05/2019 0:00:01 19.91123
#3 22/05/2019 0:00:02 19.57393
#4 22/05/2019 0:00:03 15.37280
#5 22/05/2019 0:00:04 31.76683
#6 22/05/2019 0:00:05 26.75153
# this is the answer to the question
# which is combining the the date and the time column of the data
# note we still assume that this happens in GMT
t <- as.POSIXct(paste(d$Date,d$Time24,sep=" "), format = "%d/%m/%Y %H:%M:%OS", tz="GMT")
# print the data into a plot
png(filename = "test.png", width = 800, height = 600, units = "px", pointsize = 22 )
ggplot(d,aes(x=t,y=temp)) + geom_line() +
scale_x_datetime(date_breaks = "3 hour",
date_labels = "%H:%M\n%d-%b")
The problem is that the function times does not include information about the day. This is a problem since your data spans two days.
The data type you use should be able to include information about the day. Posix is this data type. Also, since Posix is the go-to date-time object in R it is much easier to plot.
Before plotting the data, the time column should have the correct difference in days. When just transforming the column with as.POSIXct, the times of day 2 are read as if it is from day 1. This is why we have to add 24 hours to the correct entries.
After that, it is just a matter of plotting. I added an example of the package of ggplot2 since I prefer these plots.
You might notice that using as.POSIXct will add an incorrect date to your time information. Don't bother about this, you use this date just as a dummy date. You don't use this date itself, you just use it to be able to work with the difference in days.
library(ggplot2)
# Read in your data set
d <- read.csv(file="Data 210519.csv", header = T)
# Read column into R date-time object
t <- as.POSIXct(d$Time24, format = "%H:%M:%OS")
# Add 24 hours to time the time on day 2.
startOfDayTwo <- as.POSIXct("00:00:00", format = "%H:%M:%OS")
endOfDayTwo <- as.POSIXct("07:35:00", format = "%H:%M:%OS")
t[t >= startOfDayTwo & t <= endOfDayTwo] <- t[t >= startOfDayTwo & t <= endOfDayTwo] + 24*60*60
plot(t,d$MCO2, type="l")
# arguably a nicer plot
ggplot(d,aes(x=t,y=MCO2)) + geom_line() +
scale_x_datetime(date_breaks = "2 hour",
date_labels = "%I:%M %p")

how to calculate time interval and divide it by integer in r?

i have time string as "08:00","06:00"
and i wanna calculate difference between them
and divide it by 15 mins.
then results should be 8 in integer
i don't how to code in R
anybody can help me ?
Something like this using difftime?
difftime(
as.POSIXct("08:00", format = "%H:%M"),
as.POSIXct("06:00", format = "%H:%M"),
units = "mins") / 15
#Time difference of 8 mins
Or to convert to numeric
as.numeric(
difftime(as.POSIXct("08:00", format = "%H:%M"),
as.POSIXct("06:00", format = "%H:%M"),
units = "mins") / 15)
#[1] 8
It would be easy with lubridate, where we convert the strings in hm format and divide by 15 minutes.
library(lubridate)
(hm(a) - hm(b))/minutes(15)
#[1] 8
data
a <- "08:00"
b <- "06:00"

Calculating hourly averages from a multi-year timeseries

I have a dataset filled with the average windspeed per hour for multiple years. I would like to create an 'average year', in which for each hour the average windspeed for that hour over multiple years is calculated. How can I do this without looping endlessly through the dataset?
Ideally, I would like to just loop through the data once, extracting for each row the right month, day, and hour, and adding the windspeed from that row to the right row in a dataframe where the aggregates for each month, day, and hour are gathered. Is it possible to do this without extracting the month, day, and hour, and then looping over the complete average-year data.frame to find the right row?
Some example data:
data.multipleyears <- data.frame(
DATETIME = c("2001-01-01 01:00:00", "2001-05-03 09:00:00", "2007-01-01 01:00:00", "2008-02-29 12:00:00"),
Windspeed = c(10, 5, 8, 3)
)
Which I would like to aggregate in a dataframe like this:
average.year <- data.frame(
DATETIME = c("01-01 00:00:00", "01-01 01:00:00", ..., "12-31 23:00:00")
Aggregate.Windspeed = (100, 80, ...)
)
From there, I can go on calculating the averages, etc. I have probably overlooked some command, but what would be the right syntax for something like this (in pseudocode):
for(i in 1:nrow(data.multipleyears) {
average.year$Aggregate.Windspeed[
where average.year$DATETIME(month, day, hour) == data.multipleyears$DATETIME[i](month, day, hour)] <- average.year$Aggregate.Windspeed + data.multipleyears$Windspeed[i]
}
Or something like that. Help is appreciated!
I predict that ddply and the plyr package are going to be your best friend :). I created a 30 year dataset with hourly random windspeeds between 1 and 10 ms:
begin_date = as.POSIXlt("1990-01-01", tz = "GMT")
# 30 year dataset
dat = data.frame(dt = begin_date + (0:(24*30*365)) * (3600))
dat = within(dat, {
speed = runif(length(dt), 1, 10)
unique_day = strftime(dt, "%d-%m")
})
> head(dat)
dt unique_day speed
1 1990-01-01 00:00:00 01-01 7.054124
2 1990-01-01 01:00:00 01-01 2.202591
3 1990-01-01 02:00:00 01-01 4.111633
4 1990-01-01 03:00:00 01-01 2.687808
5 1990-01-01 04:00:00 01-01 8.643168
6 1990-01-01 05:00:00 01-01 5.499421
To calculate the daily normalen (30 year average, this term is much used in meteorology) over this 30 year period:
library(plyr)
res = ddply(dat, .(unique_day),
summarise, mean_speed = mean(speed), .progress = "text")
> head(res)
unique_day mean_speed
1 01-01 5.314061
2 01-02 5.677753
3 01-03 5.395054
4 01-04 5.236488
5 01-05 5.436896
6 01-06 5.544966
This takes just a few seconds on my humble two core AMD, so I suspect just going once through the data is not needed. Multiple of these ddply calls for different aggregations (month, season etc) can be done separately.
You can use substr to extract the part of the date you want,
and then use tapply or ddply to aggregate the data.
tapply(
data.multipleyears$Windspeed,
substr( data.multipleyears$DATETIME, 6, 19),
mean
)
# 01-01 01:00:00 02-29 12:00:00 05-03 09:00:00
# 9 3 5
library(plyr)
ddply(
data.multipleyears,
.(when=substr(DATETIME, 6, 19)),
summarize,
Windspeed=mean(Windspeed)
)
# when Windspeed
# 1 01-01 01:00:00 9
# 2 02-29 12:00:00 3
# 3 05-03 09:00:00 5
It is pretty old post, but I wanted to add. I guess timeAverage in Openair can also be used. In the manual, there are more options for timeAverage function.

How to select and plot hourly averages from data frame?

I have a CSV file that looks like this, where "time" is a UNIX timestamp:
time,count
1300162432,5
1299849832,0
1300006132,1
1300245532,4
1299932932,1
1300089232,1
1299776632,9
1299703432,14
... and so on
I am reading it into R and converting the time column into POSIXct like so:
data <- read.csv(file="data.csv",head=TRUE,sep=",")
data[,1] <- as.POSIXct(data[,1], origin="1970-01-01")
Great so far, but now I would like to build a histogram with each bin corresponding to the average hourly count. I'm stuck on selecting by hour and then counting. I've looked through ?POSIXt and ?cut.POSIXt, but if the answer is in there, I am not seeing it.
Any help would be appreciated.
Here is one way:
R> lines <- "time,count
1300162432,5
1299849832,0
1300006132,1
1300245532,4
1299932932,1
1300089232,1
1299776632,9
1299703432,14"
R> con <- textConnection(lines); df <- read.csv(con); close(con)
R> df$time <- as.POSIXct(df$time, origin="1970-01-01")
R> df$hour <- as.POSIXlt(df$time)$hour
R> df
time count hour
1 2011-03-15 05:13:52 5 5
2 2011-03-11 13:23:52 0 13
3 2011-03-13 09:48:52 1 9
4 2011-03-16 04:18:52 4 4
5 2011-03-12 12:28:52 1 12
6 2011-03-14 08:53:52 1 8
7 2011-03-10 17:03:52 9 17
8 2011-03-09 20:43:52 14 20
R> tapply(df$count, df$hour, FUN=mean)
4 5 8 9 12 13 17 20
4 5 1 1 1 0 9 14
R>
Your data doesn't actually yet have multiple entries per hour-of-the-day but this would average over the hours, properly parsed from the POSIX time stamps. You can adjust with TZ info as needed.
You can calculate the hour "bin" for each time by converting to a POSIXlt and subtracting away the minute and seconds components. Then you can add a new column to your data frame that would contain the hour bin marker, like so:
date.to.hour <- function (vec)
{
as.POSIXct(
sapply(
vec,
function (x)
{
lt = as.POSIXlt(x)
x - 60*lt$min - lt$sec
}),
tz="GMT",
origin="1970-01-01")
}
data$hour <- date.to.hour(as.POSIXct(data[,1], origin="1970-01-01"))
There's a good post on this topic on Mages' blog. To get the bucketed data:
aggregate(. ~ cut(time, 'hours'), data, mean)
If you just want a quick graph, ggplot2 is your friend:
qplot(cut(time, "hours"), count, data=data, stat='summary', fun.y='mean')
Unfortunately, because cut returns a factor, the x axis won't work properly. You may want to write your own, less awkward bucketing function for time, e.g.
timebucket = function(x, bucketsize = 1,
units = c("secs", "mins", "hours", "days", "weeks")) {
secs = as.numeric(as.difftime(bucketsize, units=units[1]), units="secs")
structure(floor(as.numeric(x) / secs) * secs, class=c('POSIXt','POSIXct'))
}
qplot(timebucket(time, units="hours"), ...)

Resources