Calculate count of zeros R for specific case - r

I have data like the example below.
I am trying to find when Unit 1 dropped to 0 and at what time it became greater than 0 again. Suppose Unit 1 first drops to zero at 01/04/2019 02:00 and stays at zero until 01/04/2019 03:00; that should be counted as 1. The second time, it drops to zero at 01/04/2019 04:30 and stays at zero until 01/04/2019 05:00, which should be counted as 2. The same calculation applies to the other units.
Additionally, I am looking to capture the duration of each outage, for example that Unit 1 was at 0 for 2 hours the first time and for 1 hour the second time, or something like that.
I am wondering whether this can be done with an if statement that counts until the value is greater than zero, inside a loop that updates the count.
I am struggling with how to incorporate time into that.
The final result should be:
Unit   | Went offline     | Came online
Unit 1 | 01/04/2019 02:00 | 01/04/2019 03:00
Unit 1 | 01/04/2019 04:30 | 01/04/2019 05:00

I would prefer some pseudocode to start with, but here is an example solution to begin with.
# load dplyr for lead()
library(dplyr)
# create data frame
date = format(seq(as.POSIXct("2019-04-01 00:00:00", tz="GMT"),
length.out=15, by='30 min'), '%Y-%m-%d %H:%M:%S')
unit1 = c(513, 612, 653, 0, 0, 0, 530, 630, 0, 0, 650, 512, 530 , 650, 420)
data = data.frame(date, unit1)
# keep only the rows where the unit is running (non-zero)
data1 = data[data$unit1 != 0,]
# next non-zero timestamp for each row (NA for the last row)
data1$dateTo = lead(data1$date, 1)
# gap in minutes to the next non-zero reading
data1$timediff = as.numeric(difftime(data1$dateTo,data1$date,units = "mins"))
# a gap of more than 30 minutes means the unit was at zero in between
data2 = subset(data1, timediff > 30)
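As an alternative to filtering and lagging, the zero stretches can be found directly with run-length encoding. This is only a sketch in base R using the same example vector; the column names (went_offline, came_online, duration_min) are made up for illustration, and "went offline" is taken to mean the first zero reading, which may or may not be the definition you want.

```r
# Example data from the snippet above (unit1 sampled every 30 minutes)
date  <- seq(as.POSIXct("2019-04-01 00:00:00", tz = "GMT"),
             length.out = 15, by = "30 min")
unit1 <- c(513, 612, 653, 0, 0, 0, 530, 630, 0, 0, 650, 512, 530, 650, 420)

# Run-length encode the "is zero" flag: each run is a stretch of equal values
runs   <- rle(unit1 == 0)
ends   <- cumsum(runs$lengths)       # index of the last row of each run
starts <- ends - runs$lengths + 1    # index of the first row of each run

# Keep only the runs of zeros
zero_runs    <- which(runs$values)
went_offline <- date[starts[zero_runs]]
# First non-zero reading after the run; NA if the series ends while at zero
came_online  <- date[ends[zero_runs] + 1]

outages <- data.frame(
  unit         = "unit1",
  went_offline = went_offline,
  came_online  = came_online,
  duration_min = as.numeric(difftime(came_online, went_offline, units = "mins"))
)
outages
```

The number of rows in outages is the count of times the unit went to zero. For several units, the same steps could be wrapped in a function and applied per column.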

Related

Find mid time between two times in R, sometimes overnight

I would like to find the mid time between time 1 and time 2 in R. I already have the duration in hours. Sometimes it's overnight (row 1) and sometimes not (row 2).
Here are the first two rows in the data frame
dat <- data.frame(id=1:2, tm1=c("23:00","01:00"), tm2=c("07:00","06:00"), dur=c("8.0","5.0"))
So in row 1 the mid time should be 03:00 and in row 2 it should be 03:30.
You can just add half of the duration (dur is stored as character here, so it has to be converted to numeric first):
format(strptime(dat$tm1, format = '%H:%M') + as.numeric(dat$dur)*3600/2, format = '%H:%M')
Here is one way -
library(dplyr)
dat %>%
mutate(across(starts_with('tm'), as.POSIXct, tz = 'UTC', format = '%H:%M'),
tm2 = if_else(tm1 > tm2, tm2 + 86400, tm2),
midtime = tm1 + difftime(tm2, tm1, units = 'secs')/2,
across(c(starts_with('tm'), midtime), format, '%H:%M'))
# id tm1 tm2 dur midtime
#1 1 23:00 07:00 8.0 03:00
#2 2 01:00 06:00 5.0 03:30
The logic is -
Convert tm1 and tm2 to the POSIXct class. This adds today's date to both columns.
Now if tm1 > tm2 (as in row 1), add 1 day to tm2.
Subtract the two times, divide the difference by 2 and add it to tm1 to get midtime.
Finally, convert the times in all columns to '%H:%M' format.
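The same steps can be sketched in base R without dplyr. This assumes the times are zero-padded "HH:MM" strings and that any interval with tm2 earlier than tm1 crosses midnight exactly once; the intermediate names (t1, t2, overnight, elapsed_secs) are just for illustration.

```r
dat <- data.frame(id = 1:2, tm1 = c("23:00", "01:00"), tm2 = c("07:00", "06:00"))

# Parse both columns as times on the same (arbitrary) day
t1 <- as.POSIXct(dat$tm1, format = "%H:%M", tz = "UTC")
t2 <- as.POSIXct(dat$tm2, format = "%H:%M", tz = "UTC")

# If the end time is earlier than the start, it falls on the next day
overnight <- t2 < t1
t2[overnight] <- t2[overnight] + 86400

# Midpoint = start + half of the elapsed seconds
elapsed_secs <- as.numeric(difftime(t2, t1, units = "secs"))
dat$midtime  <- format(t1 + elapsed_secs / 2, "%H:%M")
dat
```

Using UTC here avoids daylight-saving jumps, which would otherwise shift the midpoint on the days the clocks change.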

R create time interval using cut.Date

My data looks like:
set.seed(1)
MyDates <- ISOdatetime(2012, 1, 1, 0, 0, 0, tz = "GMT") + sample(1:27000, 500)
It is fairly easy to count the number of observations per 5 minutes by using cut:
df <- data.frame(table(cut(MyDates, breaks = "5 mins")))
It gives me intervals as
00:00:00 -- 00:05:00,
00:05:00 -- 00:10:00
But what if I want 'customized' intervals such as
00:00:00 -- 00:05:00,
00:01:00 -- 00:06:00,
00:02:00 -- 00:07:00
Any help would be appreciated!
You just need to pass a vector of break points to cut (here, the earliest time plus offsets in seconds).
An example:
data.frame(table(cut(MyDates,
                     c(min(MyDates),          ## "Leftmost" cut
                       min(MyDates) + 5000,   ## Custom cut 1
                       min(MyDates) + 17000,  ## Custom cut 2
                       min(MyDates) + 65000), ## "Rightmost" cut
                     right = TRUE)))          ## tells cut() which end of each interval is closed
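Note that cut() assigns each value to exactly one bin, so custom breaks still give non-overlapping intervals. For overlapping windows like 00:00 to 00:05, 00:01 to 00:06 and so on, one option is to count per window start instead. The following is a rough sketch, assuming windows that start every minute, are 5 minutes wide and are half-open on the right; the names starts, width and sliding are made up for the example.

```r
set.seed(1)
MyDates <- ISOdatetime(2012, 1, 1, 0, 0, 0, tz = "GMT") + sample(1:27000, 500)

# Window starts every minute; each window is 5 minutes wide, so windows overlap
starts <- seq(ISOdatetime(2012, 1, 1, 0, 0, 0, tz = "GMT"), max(MyDates), by = "1 min")
width  <- 5 * 60  # window width in seconds

# Count observations in each half-open window [start, start + 5 min)
counts <- vapply(seq_along(starts), function(i) {
  sum(MyDates >= starts[i] & MyDates < starts[i] + width)
}, integer(1))

sliding <- data.frame(from = starts, to = starts + width, n = counts)
head(sliding)
```

This loops over the window starts, which is fine for a few hundred windows; for very long series a cumulative-count approach would probably be faster.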

how to subset data between fixed time on successive days, for several months of data

I have data of the following form:
DateTime | Var1
11/01/2016 06:01 | 0
11/01/2016 06:02 | 0.70
...
...
11/01/2016 23:59 | 35.08
11/02/2016 00:01 | 33.29
...
11/02/2016 06:00 | 24.62
...
11/30/2016 23:59 | 42.08
12/01/2016 00:01 | 39.79
....
I have ~5 months of data. I have to subset the data from 6:00am of one day to just before 6:00am of the next day. I can use the following code to subset the data once I have the dates in hand, but how can I automatically obtain all the successive dates from the input data?
Date1 <- as.integer(as.POSIXct(Date1))
Date2 <- as.integer(as.POSIXct(Date2))
subset <- subset(data, as.integer(as.POSIXct(data$txtime)) >= Date1 & as.integer(as.POSIXct(data$txtime)) < Date2)
Right now, I can use the following code to obtain successive dates within a month, but this won't work for the last day of the month, where part of the data to be subsetted falls on the first day of the next month. So I can't do it automatically for the period from 6:00am on 30th November to 5:59am on 1st December. Also, the code is not fully automated, as the number of days (used in the loop) varies across months.
for (dateofmonth in c(1:29)) {
Date1 <- paste("2016-11-", dateofmonth, ' 06:00:00', sep = '')
Date2 <- paste("2016-11-", (dateofmonth+1), ' 06:00:00', sep = '')
}
There is possibly an easier way to do this, but I can't figure it out. Please suggest.
Try this:
datelist <- split(data, as.Date(as.POSIXct(data$txtime)-21600))
This shifts your times 6 hours backwards and then splits your data by date, so that each sub-data frame contains the times from 6:00 am on that date to 5:59 am on the next day.
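Here is a small sketch of that idea on made-up hourly data that crosses a month boundary. The sample data frame is hypothetical (only the txtime and Var1 names come from the question), and the time zone is passed explicitly to as.Date() so that the date is taken in the same zone as the timestamps, which is an assumption about how your data is stored.

```r
# Hypothetical sample: one reading per hour over three 6am-to-6am days, in UTC
data <- data.frame(
  txtime = seq(as.POSIXct("2016-11-29 06:00:00", tz = "UTC"),
               as.POSIXct("2016-12-02 05:00:00", tz = "UTC"), by = "1 hour")
)
data$Var1 <- seq_len(nrow(data))

# Shift by 6 hours so that 06:00 maps to midnight, then group by calendar date
shifted  <- data$txtime - 6 * 3600
datelist <- split(data, as.Date(shifted, tz = "UTC"))

names(datelist)                               # one element per 6am-to-6am day
range(datelist[["2016-11-30"]]$txtime)        # spans 30 Nov 06:00 to 1 Dec 05:00
```

Each element of datelist is named after the date on which its 6:00 am start falls, so the month boundary needs no special handling.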

How to mark the observations with given information

Consider data collected at a 5-minute interval, with a numeric variable a and a discrete variable acc that indicates whether an incident happened (0 for no incident, 1 for an incident):
a<-c(1:(288*4))
t<-seq(as.POSIXct("2016-01-01 00:05:00"), as.POSIXct("2016-01-05 00:00:00"), by = '5 min')
acc<-rep(0,288*4)
df<-data.frame(t,a,acc)
Now I have another data set with the times (accurate to 1 second) at which the incidents happened during the collection period:
T<-sample(seq(as.POSIXct("2016-01-01 00:05:00"), as.POSIXct("2016-01-05 00:00:00"), by = '1 sec'),size = 5)
I want to mark acc as 1 for the 2 nearest prior observations, according to the times in T. For example, if an incident happened at 2016-01-02 07:13:23, the observations with t of 2016-01-02 07:05:00 and 2016-01-02 07:10:00 are marked as 1.
How can I do this?
ind <- findInterval(T, df$t)
df$acc[c(ind, ind + 1)] <- 1
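findInterval() returns, for each incident, the index of the last observation at or before it, so ind and ind + 1 pick the observations just before and just after the incident. If the two prior observations are wanted, as in the 07:13:23 example from the question, the indices would be ind - 1 and ind instead. A minimal sketch with that single incident (the UTC time zone and the names incident and prior are assumptions for the example):

```r
t  <- seq(as.POSIXct("2016-01-01 00:05:00", tz = "UTC"),
          as.POSIXct("2016-01-05 00:00:00", tz = "UTC"), by = "5 min")
df <- data.frame(t = t, acc = 0)

# One incident, taken from the example in the question
incident <- as.POSIXct("2016-01-02 07:13:23", tz = "UTC")

# Index of the last observation at or before the incident (07:10:00 here)
ind <- findInterval(incident, df$t)

# Two prior observations; drop any index below 1 in case an incident
# falls before the second observation
prior <- c(ind - 1, ind)
prior <- prior[prior >= 1]
df$acc[prior] <- 1

format(df$t[df$acc == 1], "%H:%M:%S")
```

The same lines work unchanged when incident is a vector of several times, since findInterval() is vectorised.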
One way could be:
library(lubridate)
df$acc=apply(sapply(T,function(x) x %within% interval((df$t - minutes(4)-seconds(59)),(df$t + minutes(4)+seconds(59)))),1,sum)
lubridate allows easy manipulation of dates; minutes(x) and seconds(x) create periods of x minutes or x seconds.
interval() is used to create a time interval bounded by the time in df$t ± 4 min 59 s.
sapply() is used to check whether each time in T is within those intervals.
apply() is used to collapse the results of sapply() (which outputs 1 column for each element in T) by summing across each row.
If T contains a value that is exactly equal to one in df$t, such as 2016-01-04 12:05:00 CET, this will only put 1 for that one observation.

R: How to handle times without dates?

I have data which includes Date as well as Time enter and Time exit. The latter two contain data like this: 08:02, 12:02, 23:45 etc.
I would like to manipulate the Time eXXX data, for example subtract Time enter from Time exit to work out the duration, or plot the distributions of Time enter and Time exit, e.g. to see if most entries are before 10:00 or if most exits are after 17:00.
All the packages I've looked at require a date to precede the time, e.g. 01/02/2012 12:33.
Is this possible, or should I simply append an identical date to every time for the sake of the calculations? This seems a bit messy!
Use the "times" class found in the chron package:
library(chron)
Enter <- c("09:12", "17:01")
Enter <- times(paste0(Enter, ":00"))
Exit <- c("10:15", "18:11")
Exit <- times(paste0(Exit, ":00"))
Exit - Enter # durations
sum(Enter < "10:00:00") # number entering before 10am
mean(Enter < "10:00:00") # fraction entering before 10am
sum(Exit > "17:00:00") # number exiting after 5pm
mean(Exit > "17:00:00") # fraction exiting after 5pm
table(cut(hours(Enter), breaks = c(0, 10, 17, 24))) # Counts for indicated hours
## (0,10] (10,17] (17,24]
## 1 1 0
table(hours(Enter)) # Counts of entries each hour
## 9 17
## 1 1
stem(hours(Enter), scale = 2)
## The decimal point is at the |
## 9 | 0
## 10 |
## 11 |
## 12 |
## 13 |
## 14 |
## 15 |
## 16 |
## 17 | 0
Graphics:
tab <- c(table(Enter), -table(Exit)) # Freq at each time. Enter is pos; Exit is neg.
plot(times(names(tab)), tab, type = "h", xlab = "Time", ylab = "Freq")
abline(v = c(10, 17)/24, col = "red", lty = 2) # vertical red lines
abline(h = 0) # X axis
Thanks for the feedback, and sorry for the confusion. I have edited it a bit to clarify.
New Edit:
First, the chron package and strptime with a fixed format both work well, as demonstrated in other answers. I just want to introduce lubridate a little, since it is easier to use and flexible with time formats.
Example data
df <- data.frame(TimeEnterChar = c(rep("07:58", 10), "08:02", "08:03", "08:05", "08:10", "09:00"),
TimeExitChar = c("16:30", "16:50", "17:00", rep("17:02", 10), "17:30", "18:59"),
stringsAsFactors = FALSE)
If all you want is to count how many entry times were later than 8:00, you can compare the character strings directly (this works because the times are zero-padded). The line below shows that 5 entry times were later.
sum(df$TimeEnterChar > "08:00")
If you want more, I personally like the lubridate package for dealing with time data, especially timestamps with dates, although that is not the focus of this post.
library(lubridate)
# Convert character to a "Period" class by lubridate, shows in form of H M S
df$TimeEnterTime <- hm(df$TimeEnterChar)
df$TimeExitTime <- hm(df$TimeExitChar)
head(df)
sum(df$TimeEnterTime > hm("08:00"))
You can still compare the times.
A little more about using them as numeric values:
I assume only minute-level resolution is wanted, so I divide the number of seconds by 60 to get the number of minutes.
df$DurationMinute <- as.numeric( df$TimeExitTime - df$TimeEnterTime )/60
hist(df$DurationMinute, breaks = seq(500, 600, 5))
head(df)
TimeEnterChar TimeExitChar TimeEnterTime TimeExitTime DurationMinute
1 07:58 16:30 7H 58M 0S 16H 30M 0S 512
2 07:58 16:50 7H 58M 0S 16H 50M 0S 532
3 07:58 17:00 7H 58M 0S 17H 0M 0S 542
4 07:58 17:02 7H 58M 0S 17H 2M 0S 544
5 07:58 17:02 7H 58M 0S 17H 2M 0S 544
6 07:58 17:02 7H 58M 0S 17H 2M 0S 544
You can simply plot a histogram to see the distribution of time duration between entry and exit.
You can also look at the distribution of entry/exit time. But some effort is needed to convert the axis.
df$TimeEnterNumMin <- as.numeric(df$TimeEnterTime) / 60
df$TimeExitNumMin <- as.numeric(df$TimeExitTime) / 60
hist(df$TimeEnterNumMin, breaks = seq(0, 1440, 60), xaxt = 'n', main = "Whole by 1hr")
axis(side = 1, at = seq(0, 1440, 60), labels = paste0(seq(0, 24, 1), ":00"))
hist(df$TimeEnterNumMin, breaks = seq(420, 600, 15), xaxt = 'n', main = "Morning by 15min")
axis(side = 1, at = seq(420, 600, 60), labels = paste0(seq(7, 10, 1), ":00"))
I did not polish the plots or make the axes flexible; please adjust them to your needs. Hopefully it helps.
Below is the old, useless post (no need to read it; kept so that the comments don't look weird):
Came across a similar issue and was inspired by this post. @G. Grothendieck and @David Arenburg provided great answers for transforming the time.
For comparison, I feel that forcing the time into numeric helps. Instead of comparing "11:22:33" with "9:00:00", comparing as.numeric(hms("11:22:33")) (which is 40953 seconds) and as.numeric(hms("9:00:00")) (32400) is much easier.
as.numeric(hms("11:22:33")) > as.numeric(hms("9:00:00")) & as.numeric(hms("11:22:33")) < as.numeric(hms("17:00:00"))
[1] TRUE
The above example shows 11:22:33 is between 9AM and 5PM.
To extract just the time from a date or POSIXct object, substr("2013-10-01 11:22:33 UTC", 12, 19) should work, although it looks stupid to change a time object to a character string and back to a time again.
Converting the time to numeric should work for plotting, as @G. Grothendieck described. You can convert the numbers back to times as needed for the x axis labels.
Would something like that work?
SubstracTimes <- function(TimeEnter, TimeExit){
(as.numeric(format(strptime(TimeExit, format ="%H:%M"), "%H")) +
as.numeric(format(strptime(TimeExit, format ="%H:%M"), "%M"))/60) -
(as.numeric(format(strptime(TimeEnter, format ="%H:%M"), "%H")) +
as.numeric(format(strptime(TimeEnter, format ="%H:%M"), "%M"))/60)
}
Testing:
TimeEnter <- "08:02"
TimeExit <- "12:02"
SubstracTimes(TimeEnter, TimeExit)
> SubstracTimes(TimeEnter, TimeExit)
[1] 4
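A vectorised variant of the same idea, without strptime, could split the strings and work in minutes since midnight. This is only a sketch: to_minutes is a made-up helper name, it assumes well-formed "HH:MM" input, and it does not handle exits after midnight.

```r
# Hypothetical helper: convert "HH:MM" strings to minutes since midnight
to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]), numeric(1))
}

enter <- c("08:02", "09:12", "23:45")
exit  <- c("12:02", "10:15", "23:59")

# Durations in minutes for each enter/exit pair
durations <- to_minutes(exit) - to_minutes(enter)
durations

# Number of entries before 10:00
sum(to_minutes(enter) < to_minutes("10:00"))
```

Working in plain numbers also makes it straightforward to pass the values to hist() or cut() for the distribution plots mentioned in the question.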

Resources