I want to get the difference between consecutive rows of Time. The code below works fine on this sample.
But when I apply it to my original dataset, 1 minute appears as 0.0006944444 (i.e., as a fraction of a day).
How can I make it appear as 1 minute instead?
Date <- c("03/06/2019", "03/06/2019", "03/06/2019", "03/06/2019")
Time <- c("17:15:00","17:16:00", "17:18:00", "17:21:00")
df1 <- data.frame(Date, Time)
library(chron)
library(dplyr) # provides %>%, mutate() and lag() used below
df1$Time <- chron(times = df1$Time)
sapply(df1, class)
df1 <- df1 %>% mutate(diff = Time - lag(Time))
Too long for a comment for the time being. When I run your code on a modified data set with a one second difference between two time points, your answer is exactly as expected. The diff column contains a one second difference.
Can you show us some code that produces the error?
Date <- c("03/06/2019", "03/06/2019", "03/06/2019", "03/06/2019")
Time <- c("17:15:01","17:15:02", "17:18:00", "17:21:00")
df1 <- data.frame(Date, Time)
library(chron)
library(dplyr) # provides %>%, mutate() and lag() used below
df1$Time <- chron(times = df1$Time)
sapply(df1, class)
df1 <- df1 %>% mutate(diff = Time - lag(Time))
# Date Time diff
# 1 03/06/2019 17:15:01 <NA>
# 2 03/06/2019 17:15:02 00:00:01
# 3 03/06/2019 17:18:00 00:02:58
# 4 03/06/2019 17:21:00 00:03:00
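If the goal is simply to have the difference as a number of minutes, one option is to skip chron for that step and build a POSIXct timestamp in base R, then ask for the difference in minutes explicitly. This is only a sketch: it assumes the dates are day/month/year, and the names `stamp` and `diff_min` are made up for the example. Since chron stores times as fractions of a day (which is why one minute shows up as 0.0006944444), multiplying the numeric difference by 24 * 60 should also give minutes.

```r
Date <- c("03/06/2019", "03/06/2019", "03/06/2019", "03/06/2019")
Time <- c("17:15:00", "17:16:00", "17:18:00", "17:21:00")

# Assumption: dates are day/month/year; UTC avoids daylight-saving jumps
stamp <- as.POSIXct(paste(Date, Time), format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# diff() returns a difftime; as.numeric(..., units = "mins") fixes the unit.
# The leading NA plays the role of lag() for the first row.
diff_min <- c(NA, as.numeric(diff(stamp), units = "mins"))

df1 <- data.frame(Date, Time, diff_min)
df1
```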
I have a datafile containing ~60,000 observations from 70 individuals. The datafile looks like this: [image: datafile example]
I wish to select the last 5 minutes of data for each individual. Each individual has a different number of observations. Is there a way to identify the last observation for each individual and select the preceding 5 minutes of data? I used the code below to identify the first 5 minutes but I am unsure how to do the same for the last 5 minutes.
#Set date and time format
df$DateTime=paste(df$Date, df$Time)
df$DateTime <- as.POSIXct(df$DateTime, format="%d/%m/%Y %H:%M:%S")
df$ID <- as.numeric(as.character(df$ID))
df$Value <- as.numeric(as.character(df$Value))
extract=df %>%
group_by(ID, DateTime = cut(DateTime, breaks="5 min")) %>%
summarize(Value=median(Value))
Thanks in advance!
This should filter to the last 5 minutes of observations per individual.
df %>%
group_by(ID) %>%
mutate(last_time = max(DateTime)) %>%
ungroup() %>%
filter(DateTime >= last_time - 5*60)
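For anyone who wants to check the idea on a small case, here is a base R sketch of the same filter using ave() to get each individual's last timestamp. The toy data and the names `cutoff` and `last5` are invented for illustration; only the ID, DateTime and Value columns come from the question.

```r
# Toy data: two individuals with irregular observation times
df <- data.frame(
  ID = c(1, 1, 1, 2, 2),
  DateTime = as.POSIXct(c("2019-06-03 10:00:00", "2019-06-03 10:03:00",
                          "2019-06-03 10:07:00", "2019-06-03 11:00:00",
                          "2019-06-03 11:10:00"), tz = "UTC"),
  Value = 1:5
)

# Per-ID last timestamp (in seconds since the epoch), minus 5 minutes
secs <- as.numeric(df$DateTime)
cutoff <- ave(secs, df$ID, FUN = max) - 5 * 60

# Keep the rows that fall within the last 5 minutes for their ID
last5 <- df[secs >= cutoff, ]
last5
```

With this toy data, ID 1 keeps the 10:03 and 10:07 rows and ID 2 keeps only the 11:10 row.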
I have a data frame where each row is a different timestamp. The older data in the data frame was collected at 30-minute intervals, while the more recent data was collected at 15-minute intervals. I would like to run a for loop (or maybe an ifelse statement) that calculates the time difference between consecutive rows. If the difference is equal to 30 minutes (the example below uses 1800 seconds), the loop continues; if it encounters a 15-minute difference (the example below uses 900 seconds), it stops and tells me which row this first occurred on.
x <- as.POSIXct("2000-01-01 01:00", tz = "", "%Y-%m-%d %H:%M")
y <- as.POSIXct("2000-01-10 12:30", tz = "", "%Y-%m-%d %H:%M")
xx <- as.POSIXct("2000-01-10 12:45", tz = "", "%Y-%m-%d %H:%M")
yy <- as.POSIXct("2000-01-20 23:45", tz = "", "%Y-%m-%d %H:%M")
a.30 <- as.data.frame(seq(from = x, to = y, by = 1800))
names(a.30)[1] <- "TimeStamp"
a.15 <- as.data.frame(seq(from = xx, to = yy, by = 900))
names(a.15)[1] <- "TimeStamp"
dat <- rbind(a.30,a.15)
In the example dat data frame, the time difference switches from 30-minute to 15-minute intervals at row 457. I would like to automate the process of identifying the row where this change in time difference first occurs.
We can use difftime to calculate the time difference in minutes and create a logical vector based on that difference. which.max then returns the position of the first TRUE.
library(dplyr)
dat %>%
summarise(ind = which.max(abs(as.numeric(difftime(TimeStamp,
lag(TimeStamp, default = TimeStamp[2]), units = 'mins'))) < 30))
# ind
#1 457
Here's another way that uses slightly different logic. Calculate the difference, and create a column with the row number. Then filter to where the difference is 15, and take the first row.
library(tidyverse)
dat %>% mutate(Diff = TimeStamp - lag(TimeStamp), rownum = row_number()) %>%
filter(Diff == 15) %>%
slice(1)
TimeStamp Diff rownum
1 2000-01-10 12:45:00 15 mins 457
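One thing to keep in mind with a comparison like Diff == 15 is that it relies on R picking minutes as the unit of the difftime automatically. A sketch that sets the unit explicitly, in base R, could look like the following. The names `stamps`, `gap_min` and `first_row` are made up here, and tz = "UTC" is an assumption to keep daylight-saving changes out of the example.

```r
x  <- as.POSIXct("2000-01-01 01:00", tz = "UTC", format = "%Y-%m-%d %H:%M")
y  <- as.POSIXct("2000-01-10 12:30", tz = "UTC", format = "%Y-%m-%d %H:%M")
xx <- as.POSIXct("2000-01-10 12:45", tz = "UTC", format = "%Y-%m-%d %H:%M")
yy <- as.POSIXct("2000-01-20 23:45", tz = "UTC", format = "%Y-%m-%d %H:%M")

# 30-minute series followed by 15-minute series
stamps <- c(seq(x, y, by = 1800), seq(xx, yy, by = 900))

# Gap between consecutive timestamps, always expressed in minutes
gap_min <- as.numeric(diff(stamps), units = "mins")

# gap_min[i] is the gap between rows i and i + 1, hence the + 1
first_row <- which(abs(gap_min - 15) < 1e-8)[1] + 1
first_row
```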
I am an aspiring data scientist, and this will be my first ever question on StackOF.
I have this line of code to help wrangle my data. My date filter is static. I would prefer not to have to go in and change this hardcoded value every year. What is the best alternative for my date filter to make it more dynamic? The date column is also difficult to work with because it is not a "date", it is a "dbl".
library(dplyr)
library(lubridate)
# create a sample dataframe
df <- data.frame(
DATE = c(20191230, 20191231, 20200122)
)
Tried so far:
df %>%
filter(DATE >= 20191231)
# load packages (lubridate for dates)
library(dplyr)
library(lubridate)
# create a sample dataframe
df <- data.frame(
DATE = c(20191230, 20191231, 20200122)
)
This looks like this:
DATE
1 20191230
2 20191231
3 20200122
# and now...
df %>% # take the dataframe
mutate(DATE = ymd(DATE)) %>% # turn the DATE column actually into a date
filter(DATE >= floor_date(Sys.Date(), "year") - days(1))
...and keep the rows where DATE is on or after the day before the first day of the current year (floor_date(Sys.Date(), "year") - days(1)). Run in 2020, this gives:
DATE
1 2019-12-31
2 2020-01-22
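The same cutoff can also be computed without lubridate. The sketch below is one possible base R version; `last_day_of_previous_year()` is a hypothetical helper written for this example (not part of any package), and its `today` argument exists only so the result can be checked for a fixed date.

```r
df <- data.frame(DATE = c(20191230, 20191231, 20200122))

# Hypothetical helper: 31 December of the year before `today`
last_day_of_previous_year <- function(today = Sys.Date()) {
  as.Date(format(today, "%Y-01-01")) - 1
}

# Turn the numeric yyyymmdd values into real dates
df$DATE <- as.Date(as.character(df$DATE), format = "%Y%m%d")

# Fixed date used here so the example is reproducible;
# in real use, call last_day_of_previous_year() with no argument
cutoff <- last_day_of_previous_year(as.Date("2020-06-15"))
kept <- df[df$DATE >= cutoff, , drop = FALSE]
kept
```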
I have values such as:
df[,1:2]
Results in
I want to create a new column that holds the difference between the In and Out values.
These are time values.
The expected output is:
1201
0718 (ignoring negative values)
... and so on.
library(stringr)
# generate few rows of data
In <- c('143','1239')
Out <- c('1344','521')
df <- data.frame(cbind(In, Out), stringsAsFactors=FALSE)
# pad with zero if needed (e.g. 143 -> 0143)
df$In[str_length(df$In) == 3] <- paste(0,df$In[str_length(df$In) == 3], sep='')
df$Out[str_length(df$Out) == 3] <- paste(0,df$Out[str_length(df$Out) == 3], sep='')
df$In <- strptime(df$In, format='%H%M')
df$Out <- strptime(df$Out, format='%H%M')
df$diff <- df$In - df$Out
This gives:
> df$diff
Time differences in hours
[1] -12.01667 7.30000
Is this what you are looking for?
If I understand correctly, the OP wants to compute the absolute time difference where the time of the day (neglecting the date) is given as character strings in the form HMM or HHMM.
There are classes which support time of the day (without date) directly, e.g., the hms package or the ITime class of the data.table package.
As an additional challenge, the timestamps are not given in a standard time format HH:MM, e.g., 09:43.
Here is an approach which uses as.ITime() after the strings have been padded.
# create sample data frame
df <- data.frame(In = c("143", "1239"),
Out = c("1344", "521"))
library(magrittr) # piping is used for readability
# pad strings and coerce to ITime class
df$In %<>%
stringr::str_pad(4L, pad = "0") %>%
data.table::as.ITime("%H%M")
df$Out %<>%
stringr::str_pad(4L, pad = "0") %>%
data.table::as.ITime("%H%M")
# compute absolute difference
df$absdiff <- abs(df$In - df$Out)
df
In Out absdiff
1 01:43:00 13:44:00 12:01:00
2 12:39:00 05:21:00 07:18:00
Now, the OP seems to expect the result in the same non-standard format HHMM (without the : field separator) as the input values. This can be achieved by
df$absdiff %>%
as.POSIXct() %>%
format("%H%M")
[1] "1201" "0718"
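If pulling in extra packages is not desired, the same result can be reached with plain integer arithmetic, since an HMM or HHMM string is just hours times 100 plus minutes. This is only a sketch: `to_minutes()` is a helper invented for the example, and it assumes the inputs are character strings (not factors) with valid times.

```r
In  <- c("143", "1239")
Out <- c("1344", "521")

# Hypothetical helper: "1344" -> 13 * 60 + 44 minutes since midnight
to_minutes <- function(x) {
  x <- as.integer(x)
  (x %/% 100L) * 60L + x %% 100L
}

# Absolute difference in minutes, then back to zero-padded HHMM
absdiff <- abs(to_minutes(In) - to_minutes(Out))
result <- sprintf("%02d%02d", absdiff %/% 60L, absdiff %% 60L)
result
```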
I have a data.frame that contains a bunch of POSIXct dates:
df <- data.frame(dte=as.POSIXct(c("2001-02-03 14:30:00",
"2001-02-04 9:30:00", "2001-02-05 10:30:00")), a=1:3)
I would like to extract the part of the df whose time portion is later than 9:15 AM and earlier than 5:25 PM. I could extract the hour and minute components separately and write a comparison, but I thought there might be a more elegant way of doing it. Can anyone make a suggestion?
My current method would be:
df <- subset(df,
(as.numeric(format(dte, "%H")) > 9 & as.numeric(format(dte, "%M")) > 15) |
(as.numeric(format(dte, "%H")) < 17 & as.numeric(format(dte, "%M")) < 25))
My suggestion would be to use xts instead of a data.frame.
df <- data.frame(dte=as.POSIXct(c("2001-02-03 14:30:00",
"2001-02-04 9:30:00", "2001-02-05 10:30:00")), a=1:3)
library(xts)
x <- xts(df$a, df$dte)
x["T09:15/T17:25"] # returns everything (in your example)
x["T10:15/T14:25"] # returns the correct subset
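For readers who would rather stay with a data.frame, a possible base R sketch is to compare the time of day as a zero-padded "HH:MM" string. Comparing hour and minute separately is easy to get wrong (10:05 would fail a "minute > 15" check, for instance), whereas zero-padded strings sort in clock order. The fourth row and the names `tod` and `kept` are additions made for this example, and tz = "UTC" is an assumption.

```r
df <- data.frame(
  dte = as.POSIXct(c("2001-02-03 14:30:00", "2001-02-04 09:30:00",
                     "2001-02-05 10:30:00", "2001-02-06 18:00:00"),
                   tz = "UTC"),
  a = 1:4
)

# Time of day as a zero-padded string, e.g. "09:30"
tod <- format(df$dte, "%H:%M")

# Strictly after 09:15 and strictly before 17:25
kept <- df[tod > "09:15" & tod < "17:25", ]
kept
```

Here the 18:00 row is dropped and the other three rows are kept.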