Create new column "Weekday %H:%M" from Timestamp - r

I have some columns (timestamp, object_id, status and some others) from which I want to predict the status of an object.
My hypothesis is that the "weektime" has an important influence on the status. By "weektime" I mean something like: Monday 23:17.
I think I need to create a column in this format to test the hypothesis.
I already converted the timestamp to POSIXlt:
training_data$TimeStamp = as.POSIXlt(training_data$TimeStamp, "", "%Y-%m-%d %H:%M:%OS")
I also already created a column with only the weekday.
training_data$TimeStamp_weekday = weekdays(training_data$TimeStamp)
Can you help me create a column with the "weektime"?
I think I also need only four 15-minute "time slots" per hour to make the predictions easier, so Monday 23:17 -> Monday 23:15:
0-15 mins -> 0
15-30 mins -> 15
30-45 mins -> 30
45-60 mins -> 45
Or something similar.

There is no need to paste anything onto the weekdays() results. There is a trunc.POSIXt, but as far as I can tell it doesn't let you truncate to fractional intervals. Instead, truncate to the prior 15-minute mark by subtracting 7.5 minutes (= 0.125 hours), dividing by the 15-minute interval length, rounding, and then multiplying by the interval length again. That has the effect of "rounding down" to the prior interval mark. Then use format.POSIXt to get the desired format.
> Sys.time()
[1] "2017-12-29 12:24:49 PST"
> format( as.POSIXct(   # to convert back to datetime
    round( as.numeric( Sys.time() - 0.125*60*60 ) / (0.25*60*60) ) * 0.25*60*60,
    origin="1970-01-01"), "%A %H:%M")
[1] "Friday 12:15"
What it does is shift all the times back by half an interval, round to the nearest whole number on the 15-minute scale (which now picks the prior interval mark), and then expand back to the original scale.
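A minimal sketch of the same idea applied to a whole column, assuming the timestamps are (or can be converted to) POSIXct. It uses the remainder operator instead of shifting and rounding, which sidesteps how round() breaks ties at exact interval marks. The helper name floor_to_slot and the sample values are made up for illustration.

```r
# Hypothetical helper: round date-times down to the start of their slot.
# Assumes the time zone offset is a whole multiple of the slot length.
floor_to_slot <- function(x, minutes = 15) {
  x <- as.POSIXct(x)
  slot <- minutes * 60            # slot length in seconds
  x - (as.numeric(x) %% slot)     # drop the seconds past the slot start
}

# Made-up sample data, in UTC so the result does not depend on the session.
training_data <- data.frame(
  TimeStamp = as.POSIXct(c("2017-12-25 23:17:00", "2017-12-29 12:24:49"),
                         tz = "UTC")
)

slot_start <- floor_to_slot(training_data$TimeStamp)
training_data$slot <- format(slot_start, "%H:%M")   # "23:15" "12:15"

# Weekday names follow the session locale (LC_TIME).
training_data$weektime <- format(slot_start, "%A %H:%M")
```

Keeping the slot start as a POSIXct value until the last step means the same helper can feed either a "weektime" label or a numeric feature.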

Related

Convert time shown as hours and minutes in three to four digits within a dataframe to minutes

I have a dataframe with a column named "time" (which has an integer class). The time comes in three or four digits e.g. 514, which means 5:14 am, or 1914, which means 19:14. I need to create a new column "time_2" which will have the time of the "time" column but only showing minutes. For example, if row 1 has the value "514" for the "time" column, that needs to go to a new column named "time_2" with a first value of "314" minutes.
I tried to split off the first digit, multiply it by 60, and add the following two digits to the result. However, I could not get it to work, and the fact that some values have four digits made me stop and look for help. I'd truly appreciate some help.
(Language: R)
If your times are never outside of the 00:00-23:59 window, you can use as.difftime from base R:
x <- c(514,1914)
as.difftime(sprintf("%04d", x), format="%H%M", units="mins")
#Time differences in mins
#[1] 314 1154
Times outside this range can be dealt with using a little arithmetic:
as.numeric(
as.difftime(x %/% 100 + (x %% 100)/60, units="hours"),
units="mins"
)
#[1] 314 1154
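The same arithmetic also works without difftime. A minimal sketch, assuming the values are non-negative whole numbers in HMM or HHMM form; the helper name hhmm_to_minutes is hypothetical.

```r
# Hypothetical helper: clock times written as HMM/HHMM integers -> minutes.
hhmm_to_minutes <- function(x) {
  hours   <- x %/% 100   # leading one or two digits
  minutes <- x %% 100    # last two digits
  hours * 60 + minutes
}

x <- c(514, 1914)
time_2 <- hhmm_to_minutes(x)   # 314 and 1154
```

Because integer division and remainder are vectorised, this can be assigned straight to a new data frame column.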
Convert the numbers to timestamps and extract the hours and minutes using substr:
library("lubridate")
Input:
time <- c(521,1914)
time <- substr(as.POSIXct(sprintf("%04.0f", time), format='%H%M'), 12, 16) # Extracting hours and minutes using substr
time <- as.POSIXct(x = time, format = "%H:%M", tz = "UTC")
hour(time) * 60 + minute(time)
Output:
[1] 321 1154
In a simpler manner:
library(lubridate)
time <- c(521,1914)
60 * hour(parse_date_time(time, "HM")) + minute(parse_date_time(time, "HM"))

R function to convert time of day to total minutes

I need to take the time of day (e.g. 13:34) and convert it to the number of minutes into the day that it represents, so the expected answer would be (780 + 34) or 814 minutes.
I was thinking about extracting the hours and minutes into variables probably using lubridate and of course, multiplying the hours by 60 and adding the minutes.
But is there a method or function that I can use for this that already exists? Thought I'd check with the SO community.
Thanks
Thanks to rawr - providing a POSIX type answer that worked just fine:
difftime(as.POSIXct('13:34', format = '%H:%M'), as.POSIXct('00:00', format = '%H:%M'), units = 'min')
For everyone here's the output I got:
z <- difftime(as.POSIXct('13:34', format = '%H:%M'), as.POSIXct('00:00', format = '%H:%M'), units = 'min')
# print out z
> z
Time difference of 814 mins
# confirm that z is stored as a plain number
> typeof(z)
[1] "double"
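If a bare number is needed rather than a difftime object, as.numeric() with a units argument strips the class. The sketch below also shows a date-free alternative that splits the string directly; the helper name clock_to_minutes is hypothetical, and tz = "UTC" is an added assumption so that a daylight-saving change on the current date cannot affect the difference.

```r
# difftime route: convert the result to a plain number of minutes.
z <- difftime(as.POSIXct("13:34", format = "%H:%M", tz = "UTC"),
              as.POSIXct("00:00", format = "%H:%M", tz = "UTC"),
              units = "mins")
minutes_from_difftime <- as.numeric(z, units = "mins")   # 814

# Hypothetical helper: split "HH:MM" strings and do the arithmetic directly.
clock_to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts,
         function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
         numeric(1))
}

minutes_from_string <- clock_to_minutes(c("13:34", "00:00", "23:59"))
# 814, 0 and 1439
```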

Generating machine utilisation chart from time data over a 12 hour period

I would like to generate a line plot which depicts the utilisation of a system of machines over a 12 hour period. As I am new to R I would like some advice on the approach I could use to generate such a plot.
Here is an example of the data frame that is used -
Machine StartTime StopTime
A 10:30 11:00
B 12:00 13:00
B 7:00 9:00
A 13:00 16:00
Say, the 12 hour period is from 4:00 to 16:00
My approach (probably not the most efficient) is to create an empty matrix with 720 rows (one for each minute), then compute the utilisation of the system for each minute using the formula:
utilisation = machines busy / total machines
This would mean that I somehow need to iterate through each minute from 4:00 to 16:00. Is that possible?
Yes, but it's not something that works out of the box. I'd probably use data frames or data tables instead of a matrix. I'll use data.table in my examples.
To create a sequence of times you can try:
data.table(time=seq(from=as.POSIXlt("2016-06-09 4:00:00"),to=as.POSIXlt("2016-06-09 16:00:00"),by="min"))
However, this is probably unnecessary, since the plotting functions can recognize times (well, at least ggplot2 can). For instance:
require(data.table)
require(reshape2)
require(ggplot2)
#Make the data table.
dt<-data.table(Machine=c("A","B","B","A"),
StartTime=c("10:30","12:00","7:00","13:00"),
StopTime=c("11:00","13:00","9:00","16:00"))
#Reshape the data, so that we have one column of dates.
mdt<-melt(dt,id.vars = "Machine",value.name = "Time",variable.name = "Event")
#Make the time vector a time vector, not a character vector.
mdt[,time:=as.POSIXct(Time,format="%H:%M")]
#delete the character vector.
mdt[,Time:=NULL]
#order the data.table by time.
mdt<-mdt[order(time)]
#Define how each time affects the cumulative number of machines.
mdt[Event=="StartTime",onoff:=1]
mdt[Event=="StopTime",onoff:=-1]
#EDIT: Sum the onoff effects at each point in time -this ensures you get one measurement for each time -eliminating the vertical line.
mdt<-mdt[,list(onoff=sum(onoff)),by=time]
#Calculate the cumulative number of machines on.
mdt[,TotUsage:=cumsum(onoff)]
#Plot the number of machines on at any given time.
ggplot(mdt,aes(x=time,y=TotUsage))+geom_step()
That will get you a step plot of the number of machines running over time (EDIT: without the vertical spike).
I turned your idea into code. It checks, minute by minute, which machines are on.
[Caution] If your data is big, this code takes a long time. This method is simple but not efficient.
# make example data
d <- data.frame(Machine = c("A","B","B","A"),
StartTime = strptime(c("10:30", "12:00", "7:00", "13:00"), "%H:%M", "GMT"),
StopTime = strptime(c("11:00", "13:00", "9:00", "16:00"), "%H:%M", "GMT"))
# cut from 4:00 to 16:00 by the minute (60-second steps)
time <- seq(strptime("04:00", "%H:%M", "GMT"), strptime("16:00", "%H:%M", "GMT"), 60)
# sum(logical test) returns the number of TRUEs; sapply repeats the check for each minute.
# unique() is used rather than levels(), because Machine is a character column (not a factor) in R >= 4.0.
a <- sapply(seq_along(time), function(x) sum((d$StartTime <= time[x]) & (time[x] < d$StopTime)) / length(unique(d$Machine)))
plot(time, a, type="l", ylab="utilisation")
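For larger data, the per-minute loop can be replaced by a vectorised comparison. The following is only a sketch under the assumption that all times fall on a single day and can be expressed as minutes since midnight; the helper to_minutes and the variable names are made up for illustration.

```r
# Copy of the example data, with clock times kept as "H:MM" strings.
d <- data.frame(
  Machine   = c("A", "B", "B", "A"),
  StartTime = c("10:30", "12:00", "7:00", "13:00"),
  StopTime  = c("11:00", "13:00", "9:00", "16:00"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: "H:MM" -> minutes since midnight.
to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts,
         function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
         numeric(1))
}
start_min <- to_minutes(d$StartTime)
stop_min  <- to_minutes(d$StopTime)

# The 720 minute marks from 4:00 up to (but not including) 16:00.
minute <- seq(4 * 60, 16 * 60 - 1)

# busy[i, j] is TRUE when interval j covers minute i.
busy <- outer(minute, start_min, ">=") & outer(minute, stop_min, "<")

# Busy intervals per minute, divided by the number of distinct machines.
utilisation <- rowSums(busy) / length(unique(d$Machine))
```

A call such as plot(minute / 60, utilisation, type = "s") then draws the utilisation as a step line against the hour of day. The logical matrix has one cell per minute and interval, so memory use grows with both.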

Convert time from numeric to time format in R

I read data from an xls file. Apparently, the time is not in the right format. It is as follows (for example)
0.3840277777777778
0.3847222222222222
0.3854166666666667
Indeed, they should be
09:12
09:13
09:13
I don't know how to convert it to the right format. I searched several threads and all of them are about converting the date (with/without time) to the right format.
Can somebody give me any clues?
You can use as.POSIXct after having multiplied your number by the number of seconds in a day (60 * 60 * 24)
nTime <- c(0.3840277777777778, 0.3847222222222222, 0.3854166666666667)
format(as.POSIXct((nTime) * 86400, origin = "1970-01-01", tz = "UTC"), "%H:%M")
## [1] "09:13" "09:14" "09:15"
Another option is times from chron
library(chron)
times(nTime)
#[1] 09:13:00 09:14:00 09:15:00
To strip off the seconds,
substr(times(nTime),1,5)
#[1] "09:13" "09:14" "09:15"
data
nTime <- c(0.3840277777777778, 0.3847222222222222, 0.3854166666666667)
For people who want the opposite way: given the 09:13:00, get 0.3840278
as.numeric(chron::times("09:13:00"))
Essentially, the idea is that one whole day is 1, so noon (12 pm) is 0.5.
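Since a day fraction times 86400 is rarely an exact whole number in floating point, it can help to round to the nearest second before formatting. A minimal base-R sketch that needs no date-time classes; the helper name fraction_to_hhmm is hypothetical.

```r
# Hypothetical helper: fraction of a day (as stored by Excel) -> "HH:MM".
fraction_to_hhmm <- function(x) {
  secs <- as.integer(round(x * 86400))   # nearest whole second of the day
  sprintf("%02d:%02d", secs %/% 3600L, (secs %% 3600L) %/% 60L)
}

nTime <- c(0.3840277777777778, 0.3847222222222222, 0.3854166666666667)
fraction_to_hhmm(nTime)
# [1] "09:13" "09:14" "09:15"
```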

Strip the date and keep the time

Lots of people ask how to strip the time and keep the date, but what about the other way around? Given:
myDateTime <- "11/02/2014 14:22:45"
I would like to see:
myTime
[1] "14:22:45"
Time zone not necessary.
I've already tried (from other answers)
as.POSIXct(substr(myDateTime, 12,19),format="%H:%M:%S")
[1] "2013-04-13 14:22:45 NZST"
The purpose is to analyse events recorded over several days by time of day only.
Thanks
Edit:
It turns out there's no pure "time" object, so every time must also have a date.
In the end I used
as.POSIXct(as.numeric(as.POSIXct(myDateTime, format = "%d/%m/%Y %H:%M:%S")) %% 86400, origin = "2000-01-01")
rather than the character solution, because I need to do arithmetic on the results. This solution is similar to my original one, except that the date can be controlled consistently - "2000-01-01" in this case, whereas my attempt just used the current date at runtime.
I think you're looking for the format function.
(x <- strptime(myDateTime, format="%d/%m/%Y %H:%M:%S"))
#[1] "2014-02-11 14:22:45"
format(x, "%H:%M:%S")
#[1] "14:22:45"
That's character, not "time", but would work with something like aggregate if that's what you mean by "analyse events recorded over several days by time of day only."
If the time within a GMT day is useful for your problem, you can get this with %%, the remainder operator, taking the remainder modulo 86400 (the number of seconds in a day).
stamps <- c("2013-04-12 19:00:00", "2010-04-01 19:00:01", "2018-06-18 19:00:02")
as.numeric(as.POSIXct(stamps)) %% 86400
## [1] 0 1 2
## (the exact values depend on the time zone of the R session)
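When arithmetic on the local wall-clock time of day is wanted, the hour, minute and second fields of a POSIXlt value can be combined directly, which avoids the time zone dependence of the modulo approach. A sketch only: the helper name seconds_of_day and its default arguments are assumptions, not part of the answers above.

```r
# Hypothetical helper: seconds since midnight, read from the wall-clock fields.
seconds_of_day <- function(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC") {
  lt <- as.POSIXlt(x, format = format, tz = tz)
  lt$hour * 3600 + lt$min * 60 + lt$sec
}

myDateTime <- "11/02/2014 14:22:45"
secs <- seconds_of_day(myDateTime, format = "%d/%m/%Y %H:%M:%S")
# 14 * 3600 + 22 * 60 + 45 = 51765
```

The result is a plain number, so events from several days can be compared or binned by time of day without attaching a placeholder date.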

Resources