Generating machine utilisation chart from time data over a 12 hour period - r

I would like to generate a line plot that shows the utilisation of a system of machines over a 12-hour period. As I am new to R, I would like some advice on the approach I could use to generate such a plot.
Here is an example of the data frame:
Machine StartTime StopTime
A 10:30 11:00
B 12:00 13:00
B 7:00 9:00
A 13:00 16:00
Say the 12-hour period is from 4:00 to 16:00.
My approach (probably not the most efficient) is to create an empty matrix with 720 rows (one for each minute), then compute the utilisation of the system at each minute using the formula:
utilisation = busy machines / total machines
This would mean that I would somehow need to iterate through each minute from 4:00 to 16:00. Is that possible?

Yes, but it's not available out of the box. I'd probably use data frames or data tables instead of a matrix. I'll use data.table in my examples.
To create a sequence of times you can try:
data.table(time=seq(from=as.POSIXlt("2016-06-09 4:00:00"),to=as.POSIXlt("2016-06-09 16:00:00"),by="min"))
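A quick base-R check of the grid size, assuming an arbitrary fixed date and the UTC time zone (both only there to avoid daylight-saving shifts). The sequence includes both endpoints, so it has 12 * 60 + 1 = 721 points rather than 720; a 720-row matrix describes the 720 one-minute intervals, not the 721 instants.

```r
# Minute grid over the 12-hour window (date and UTC are illustrative assumptions)
grid <- seq(from = as.POSIXct("2016-06-09 04:00", tz = "UTC"),
            to   = as.POSIXct("2016-06-09 16:00", tz = "UTC"),
            by   = "min")
length(grid)  # both endpoints included: 12 * 60 + 1 points
```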
However, this is probably unnecessary, since plotting functions can handle time values directly (at least ggplot2 can). For instance:
library(data.table)  # melt() for data.tables is provided by data.table itself
library(ggplot2)
#Make the data table.
dt<-data.table(Machine=c("A","B","B","A"),
StartTime=c("10:30","12:00","7:00","13:00"),
StopTime=c("11:00","13:00","9:00","16:00"))
#Reshape to long format: one row per start/stop event, with all times in a single column.
mdt<-melt(dt,id.vars = "Machine",value.name = "Time",variable.name = "Event")
#Convert the character times to POSIXct (today's date is filled in).
mdt[,time:=as.POSIXct(Time,format="%H:%M")]
#delete the character vector.
mdt[,Time:=NULL]
#order the data.table by time.
mdt<-mdt[order(time)]
#Define how each time affects the cumulative number of machines.
mdt[Event=="StartTime",onoff:=1]
mdt[Event=="StopTime",onoff:=-1]
#EDIT: Sum the onoff effects at each point in time. This gives one measurement per time and eliminates the vertical spike.
mdt<-mdt[,list(onoff=sum(onoff)),by=time]
#Calculate the cumulative number of machines on.
mdt[,TotUsage:=cumsum(onoff)]
#Plot the number of machines on at any given time.
ggplot(mdt,aes(x=time,y=TotUsage))+geom_step()
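That will get you a step plot of the number of machines running at any given time, without the vertical spike that appears at 13:00 (where A starts as B stops) if the events are not summed first.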
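The data.table code plots the number of busy machines rather than the utilisation, and it only covers the times at which events happen. Below is a minimal base-R sketch of the same +1/-1 event sweep, divided by the number of distinct machines and evaluated over the 4:00 to 16:00 window. The helper to_secs, the fixed date and the UTC time zone are assumptions made for illustration.

```r
machines <- data.frame(Machine   = c("A", "B", "B", "A"),
                       StartTime = c("10:30", "12:00", "7:00", "13:00"),
                       StopTime  = c("11:00", "13:00", "9:00", "16:00"))
day <- "2016-06-09"  # hypothetical date; only the clock times matter
to_secs <- function(hm) {
  as.numeric(as.POSIXct(paste(day, hm), format = "%Y-%m-%d %H:%M", tz = "UTC"))
}

# +1 at every start, -1 at every stop (times as seconds since the epoch)
event_time <- c(to_secs(machines$StartTime), to_secs(machines$StopTime))
event_step <- rep(c(1, -1), each = nrow(machines))

# Net change at each distinct time (tapply sorts the groups), then a running total
net  <- tapply(event_step, event_time, sum)
busy <- cumsum(net)
util <- busy / length(unique(machines$Machine))

# Step function: 0 before the first event, util[i] from event time i onwards
utilisation_at <- stepfun(as.numeric(names(net)), c(0, util))

window <- seq(to_secs("4:00"), to_secs("16:00"), by = 60)
plot(as.POSIXct(window, origin = "1970-01-01", tz = "UTC"),
     utilisation_at(window), type = "s", xlab = "time", ylab = "utilisation")
```

Summing events that share a timestamp before the cumulative sum is what removes the 13:00 spike, exactly as in the data.table version.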

I turned your idea into code. It checks whether each machine is on or off at every minute.
[Caution] If your data is large, this code will be slow. The method is simple but not efficient.
# make example data
d <- data.frame(Machine = c("A","B","B","A"),
StartTime = strptime(c("10:30", "12:00", "7:00", "13:00"), "%H:%M", "GMT"),
StopTime = strptime(c("11:00", "13:00", "9:00", "16:00"), "%H:%M", "GMT"))
# cut from 4:00 to 16:00 by the minute
time <- seq(strptime("04:00", "%H:%M", "GMT"), strptime("16:00", "%H:%M", "GMT"), 60)
# sum(logical test) counts the TRUE values; sapply repeats the test for every minute (721 points).
# length(unique()) is used because, since R 4.0, Machine is a character column, so levels() would be NULL.
a <- sapply(seq_along(time), function(i) sum(d$StartTime <= time[i] & time[i] < d$StopTime) / length(unique(d$Machine)))
plot(time, a, type="l", ylab="utilisation")
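If the per-minute sapply() loop is too slow, one possible vectorised variant (a sketch, not the answerer's code) builds a logical minutes-by-intervals matrix with outer() and sums its rows. It trades memory for speed, because the matrix has one cell per minute per usage interval. Times are parsed as POSIXct in GMT, as in the answer.

```r
d <- data.frame(Machine   = c("A", "B", "B", "A"),
                StartTime = as.POSIXct(c("10:30", "12:00", "7:00", "13:00"), format = "%H:%M", tz = "GMT"),
                StopTime  = as.POSIXct(c("11:00", "13:00", "9:00", "16:00"), format = "%H:%M", tz = "GMT"))
time <- seq(as.POSIXct("04:00", format = "%H:%M", tz = "GMT"),
            as.POSIXct("16:00", format = "%H:%M", tz = "GMT"), by = 60)

# busy[i, j] is TRUE when usage interval j covers minute i (start inclusive, stop exclusive)
busy <- outer(as.numeric(time), as.numeric(d$StartTime), ">=") &
        outer(as.numeric(time), as.numeric(d$StopTime), "<")
a <- rowSums(busy) / length(unique(d$Machine))
plot(time, a, type = "l", ylab = "utilisation")
```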

Related

Date Formatting in Time Series Codes

I have a .csv file that looks like this:
Date,Time,Demand
01-Jan-05,6:30,6
01-Jan-05,6:45,3
...
23-Jan-05,21:45,0
23-Jan-05,22:00,1
The days are broken into 15-minute increments from 6:30 to 22:00.
Now I am trying to build a time series from this, but I am a little lost on the notation.
I have the following so far:
library(tidyverse)
library(forecast)
library(zoo)
tp <- read.csv(".csv")
tp.ts <- ts(tp$Demand, start = c(), end = c(), frequency = 63)
The period I am after is an entire day, which I believe makes the frequency 63.***
However, I am unsure how to specify the dates in c().
***Edit
If the frequency is meant to be the number of observations per unit of time, and I am trying to observe Demand by the 15-minute time slots (Time) within each day (Date), maybe my frequency is 1?
***Edit 2
So I think I am struggling with the time series because I have a Date column (stored as character) and a separate Time column.
Since I need Demand at the given times on each date, maybe I need to convert the dates for use in ts() and combine the Date and Time data into a new column?
If I do this, I assume it would give me the times I need (6:30 to 22:00) with the date attached?
However, the data is to be used to predict the Demand for the rest of the month. So maybe the Date is an important variable if the day of the week impacts Demand?
We assume you are starting with tp, shown reproducibly in the Note at the end. A complete cycle of 24 * 4 = 96 points should be represented by one unit of time internally. The chron class does that, so read the data in as a zoo series z with a chron time index, then convert it to ts, giving ts_ser, or possibly leave it as a zoo series, depending on what you are going to do next.
library(zoo)
library(chron)
to_chron <- function(date, time) as.chron(paste(date, time), "%d-%b-%y %H:%M")
z <- read.zoo(tp, index = 1:2, FUN = to_chron, frequency = 4 * 24)
ts_ser <- as.ts(z)
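If you instead keep only the daytime slots, as the question does, the cycle is 63 observations per day rather than 1, and the overnight gap is simply ignored. The base-R sketch below combines Date and Time into one POSIXct column, derives the weekday (relevant if the day of the week affects Demand), and checks the 63 figure. Forcing the C time locale, the UTC time zone and the day/slot start of c(1, 1) are assumptions, and the ts() call assumes no slots are missing.

```r
# Sample rows from the Note below
tp <- data.frame(Date = c("01-Jan-05", "01-Jan-05"), Time = c("6:30", "6:45"), Demand = c(6L, 3L))

# %b matches English month abbreviations only in an English/C time locale
invisible(Sys.setlocale("LC_TIME", "C"))
tp$DateTime <- as.POSIXct(paste(tp$Date, tp$Time), format = "%d-%b-%y %H:%M", tz = "UTC")
tp$Weekday  <- weekdays(tp$DateTime)   # possible predictor for Demand

# Observations per day: 6:30 to 22:00 inclusive in 15-minute steps
slots <- seq(as.POSIXct("2005-01-01 06:30", tz = "UTC"),
             as.POSIXct("2005-01-01 22:00", tz = "UTC"), by = "15 min")
per_day <- length(slots)   # (22:00 - 6:30) / 15 min + 1 = 62 + 1 = 63

# Day 1, slot 1; rows must be consecutive slots with none missing
tp.ts <- ts(tp$Demand, start = c(1, 1), frequency = per_day)
```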
Note
tp <- structure(list(Date = c("01-Jan-05", "01-Jan-05"), Time = c("6:30",
"6:45"), Demand = c(6L, 3L)), row.names = 1:2, class = "data.frame")

Calculating 12-hour differences for hourly data

I have hourly measurements covering a 10-day period; a sample is below:
Date_Time Measure
1/1/2021 05:00 430.1
1/1/2021 06:00 430.2
1/1/2021 07:00 429.8
First, I want to calculate the difference over every 12-hour period, that is, from 00:00 to 12:00, from 12:00 to 00:00, and so on.
Second, I want to find the maximum difference over the whole period.
This is all done in R. I have only found code for calculating averages, and I only know how to calculate differences one at a time, not as a new column of data.
I tried diff(Measure, lag = 11), thinking it would calculate the difference between 12-hour periods, but I kept getting this error:
Error in mutate(., diff_12 = diff(Level, lag = 11)) : x `diff_12` must be size 265 or 1, not 254.
It is not the cleanest code, but in the end I used dplyr's lag():
mutate(diff = lag(Measure, 12) - Measure)
This answered my own question.
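Two notes on the attempts above. diff(x, lag = k) returns length(x) - k values, which is why mutate() rejected it (265 rows versus 254), and a 12-hour gap in hourly data is lag 12, not 11. The lag() version computes a rolling 12-hour difference at every hour (earlier reading minus current one). If only the 00:00 to 12:00 and 12:00 to 00:00 blocks are wanted, a base-R sketch could keep just those readings. The made-up data, the assumption of complete hourly readings, and reading "maximum difference" as the largest absolute change are all illustrative choices.

```r
# Hypothetical hourly data: 3 days of made-up measurements
meas <- data.frame(
  Date_Time = seq(as.POSIXct("2021-01-01 00:00", tz = "UTC"), by = "hour", length.out = 72),
  Measure   = 430 + sin(seq_len(72) / 5)
)

# Keep only the 00:00 and 12:00 readings (already in time order)
marks <- meas[format(meas$Date_Time, "%H:%M") %in% c("00:00", "12:00"), ]

# Change over each 12-hour block: later reading minus earlier reading
marks$diff_12 <- c(NA, diff(marks$Measure))

# Block with the largest absolute change (which.max skips the leading NA)
best <- marks[which.max(abs(marks$diff_12)), ]
```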

Convert time shown as hours and minutes in three to four digits within a dataframe to minutes

I have a dataframe with a column named "time" (of integer class). The times have three or four digits, e.g. 514, meaning 5:14 am, or 1914, meaning 19:14. I need to create a new column "time_2" that holds the same time expressed in minutes only. For example, if row 1 has the value 514 in the "time" column, the new "time_2" column should have 314 (minutes) in row 1.
I tried to take the first digit, multiply it by 60, and add the following two digits to the result. However, I could not get it to work, and the fact that some values have four digits made me stop and look for help. I just don't know how to do it, and I'd truly appreciate some help.
(Language: R)
If your times are never outside of the 00:00-23:59 window, you can use as.difftime from base R:
x <- c(514,1914)
as.difftime(sprintf("%04d", x), format="%H%M", units="mins")
#Time differences in mins
#[1] 314 1154
Times outside this range can be dealt with using a little arithmetic:
as.numeric(
as.difftime(x %/% 100 + (x %% 100)/60, units="hours"),
units="mins"
)
#[1] 314 1154
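The arithmetic the question describes also works directly on the integers, with no date-time classes at all: integer division by 100 gives the hours and the remainder gives the minutes, so three- and four-digit values are handled the same way. A minimal sketch, assuming a data frame df (hypothetical name) with an integer time column:

```r
df <- data.frame(time = c(514L, 1914L))
# Hours are time %/% 100, minutes are time %% 100
df$time_2 <- (df$time %/% 100L) * 60L + df$time %% 100L
df$time_2
```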
Convert the numbers to timestamps and extract the hours and minutes using substr:
library("lubridate")
# Input
time <- c(521,1914)
time <- substr(as.POSIXct(sprintf("%04.0f", time), format='%H%M'), 12, 16) # Extracting hours and minutes using substr
time <- as.POSIXct(x = time, format = "%H:%M", tz = "UTC")
hour(time) * 60 + minute(time)
Output:
[1] 321 1154
Or, more simply:
library(lubridate)
time <- c(521,1914)
pt <- parse_date_time(time, "HM")  # parse once, reuse for hours and minutes
60 * hour(pt) + minute(pt)

Sort Datetime data by day, but from 4PM to 4PM

I have tweets about companies from various times of day, and I want to group them by day, which I have already done. However, I want each day to run not from 00:00 to 23:59 but from 16:00 to 15:59 (because of NYSE trading hours).
Tweets (Negative, Neutral and Positive are the sentiment counts):
Company,Datetime_UTC,Negative,Neutral,Positive,Volume
AXP,2013-06-01 16:00:00+00:00,0,2,0,2
AXP,2013-06-01 17:00:00+00:00,0,2,0,2
AXP,2013-06-02 05:00:00+00:00,0,1,0,1
AXP,2013-06-02 16:00:00+00:00,0,2,0,2
My code:
Tweets$Datetime_UTC <- as.Date(Tweets$Datetime_UTC)
Sent <- aggregate(list(Tweets$Negative, Tweets$Neutral, Tweets$Positive), by=list(Tweets$Company, Tweets$Datetime_UTC), sum)
colnames(Sent) <- c("Company", "Date", "Negative", "Neutral", "Positive")
Sent <- Sent[order(Sent$Company),]
Output of that code:
Company,Date,Negative,Neutral,Positive
AXP,2013-06-01,0,4,0
AXP,2013-06-02,0,3,0
How I'd want it to be (considering that a day should start at 16:00):
Company,Date,Negative,Neutral,Positive
AXP,2013-06-02,0,5,0
AXP,2013-06-03,0,2,0
As you can see, my code almost works; I just want to group by a different time window.
How can I do this? One idea would be to add 8 hours to every Datetime_UTC, which would turn 16:00 into 00:00 of the next day. After that, I could use my code as is. Would that be possible?
Thanks in advance!! :-)
Effectively, what you're doing is redefining a day to start at 16:00 instead of 00:00. One option is to work in epoch time (seconds since 1970-01-01 00:00:00+00:00) and simply slide your data forward by eight hours.
POSIXct stores times as seconds since the epoch, so you can parse the timestamps, add 8 hours' worth of seconds (28800), and convert to Date class, all in one line. Then just aggregate as you had been.
Tweets$Datetime_UTC <- as.Date(as.POSIXct(Tweets$Datetime_UTC, format = "%Y-%m-%d %H:%M:%S", tz = "UTC") + 28800)
Replace your first line of code with that and it should do the trick.
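A self-contained run of the shifted-day idea on the four sample rows, using base R's read.csv() and the formula interface of aggregate(). This combined script is an illustration, not the poster's exact code; with the 8-hour shift it should reproduce the desired table (Neutral 5 on 2013-06-02 and 2 on 2013-06-03).

```r
Tweets <- read.csv(text = "Company,Datetime_UTC,Negative,Neutral,Positive,Volume
AXP,2013-06-01 16:00:00+00:00,0,2,0,2
AXP,2013-06-01 17:00:00+00:00,0,2,0,2
AXP,2013-06-02 05:00:00+00:00,0,1,0,1
AXP,2013-06-02 16:00:00+00:00,0,2,0,2")

# Shift by 8 hours so that 16:00 UTC becomes 00:00 of the next trading day;
# the trailing "+00:00" is ignored by the format
shifted <- as.POSIXct(Tweets$Datetime_UTC, format = "%Y-%m-%d %H:%M:%S", tz = "UTC") + 8 * 3600
Tweets$Date <- as.Date(shifted, tz = "UTC")

Sent <- aggregate(cbind(Negative, Neutral, Positive) ~ Company + Date, data = Tweets, FUN = sum)
Sent
```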

Creating a specific sequence of date/times in R

I want to create a single column containing a sequence of date-times increasing by one hour, covering, for example, one month or one year. I was using code like this to generate the sequence:
start.date<-"2012-01-15"
start.time<-"00:00:00"
interval<-60 # 60 minutes
increment.mins<-interval*60
x<-paste(start.date,start.time)
for(i in 1:365){
print(strptime(x, "%Y-%m-%d %H:%M:%S")+i*increment.mins)
}
However, I am not sure how to specify the range of the sequence of dates and hours. I have also had problems with the first hour, "00:00:00". What is the best way to specify the length of the date-time sequence for a month, a year, etc.? Any suggestions will be appreciated.
I would strongly recommend using the POSIXct class. That way you can use seq without any problems and use the result however you want.
start <- as.POSIXct("2012-01-15")
interval <- 60
end <- start + as.difftime(1, units="days")
seq(from=start, by=interval*60, to=end)
Now you can do whatever you want with your vector of timestamps.
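A base-R sketch that lets seq() work out the end point: it adds one calendar month to the start (use by = "year" for a year) and then steps hourly. UTC is assumed so that every day has 24 hours; in a local time zone with daylight saving, one day in the range may have 23 or 25 hours. A single midnight POSIXct value prints as just the date, so format() is used to make 00:00:00 visible.

```r
start <- as.POSIXct("2012-01-15 00:00:00", tz = "UTC")
# One calendar month after start
end <- seq(start, by = "month", length.out = 2)[2]
hours <- seq(from = start, to = end, by = "hour")

length(hours)                               # 31 days * 24 + 1 end point
format(hours[1], "%Y-%m-%d %H:%M:%S")       # the first hour, 00:00:00, is included
```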
Try this. mondate is very clever about advancing by a month. For example, it advances the last day of January to the last day of February, whereas other date/time classes tend to overshoot into March. chron does not use time zones, so you cannot get the time-zone bugs that you can get with POSIXct. Here x is from the question.
library(chron)
library(mondate)
start.time.num <- as.numeric(as.chron(x))
# +1 means one month. Use +12 if you want one year.
end.time.num <- as.numeric(as.chron(paste(mondate(x)+1, start.time)))
# 1/24 means one hour. Change as needed.
hours <- as.chron(seq(start.time.num, end.time.num, 1/24))
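To see the overshoot described above, here is what base R's seq() does when asked to add one month to 31 January 2012, followed by one possible base-R workaround (stepping through first-of-month dates and subtracting a day). The workaround is an illustration, not part of the mondate approach.

```r
# seq() bumps the month field and normalises: 31 February 2012 becomes 2 March
jan_plus_1 <- seq(as.Date("2012-01-31"), by = "month", length.out = 2)

# Month ends via the first of the following month minus one day
month_ends <- seq(as.Date("2012-02-01"), by = "month", length.out = 2) - 1
```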
