I have values such as:
df[,1:2]
Results in
I want to create a new column that has the difference between the Ins and Outs.
These are time-of-day values.
Expected output is:
1201
0718 (ignoring the negative sign)
... and so on.
library(stringr)
# generate few rows of data
In <- c('143','1239')
Out <- c('1344','521')
df <- data.frame(cbind(In, Out), stringsAsFactors=FALSE)
# pad with zero if needed (e.g. 143 -> 0143)
df$In[str_length(df$In) == 3] <- paste(0,df$In[str_length(df$In) == 3], sep='')
df$Out[str_length(df$Out) == 3] <- paste(0,df$Out[str_length(df$Out) == 3], sep='')
df$In <- strptime(df$In, format='%H%M')
df$Out <- strptime(df$Out, format='%H%M')
df$diff <- df$In - df$Out
This gives:
> df$diff
Time differences in hours
[1] -12.01667 7.30000
Is this what you are looking for?
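If the goal is the zero-padded HHMM string from the question rather than a difference in hours, one possible base R sketch is to convert each value to minutes since midnight, take the absolute difference, and format it back. The helper name to_minutes is made up for this illustration, and the sketch assumes every value is a valid H(H)MM time on the same day.

```r
# Sample values from the question
In  <- c("143", "1239")
Out <- c("1344", "521")

# Hypothetical helper: convert H(H)MM strings to minutes since midnight
to_minutes <- function(x) {
  n <- as.integer(x)
  (n %/% 100L) * 60L + (n %% 100L)
}

# Absolute difference in minutes, formatted back as zero-padded HHMM
d <- abs(to_minutes(In) - to_minutes(Out))
absdiff <- sprintf("%02d%02d", d %/% 60L, d %% 60L)
absdiff
# [1] "1201" "0718"
```

Working in integer minutes avoids attaching a date to the values, which strptime() does implicitly.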
If I understand correctly, the OP wants to compute the absolute time difference where the time of the day (neglecting the date) is given as character strings in the form HMM or HHMM.
There are classes which support time of the day (without date) directly, e.g., the hms package or the ITime class of the data.table package.
As an additional challenge, the timestamps are not given in a standard time format HH:MM, e.g., 09:43.
Here is an approach which uses as.ITime() after the strings have been padded.
# create sample data frame
df <- data.frame(In = c("143", "1239"),
Out = c("1344", "521"),
stringsAsFactors = FALSE) # keep strings as character (the default only since R 4.0)
library(magrittr) # piping is used for readability
# pad strings and coerce to ITime class
df$In %<>%
stringr::str_pad(4L, pad = "0") %>%
data.table::as.ITime("%H%M")
df$Out %<>%
stringr::str_pad(4L, pad = "0") %>%
data.table::as.ITime("%H%M")
# compute absolute difference
df$absdiff <- abs(df$In - df$Out)
df
In Out absdiff
1 01:43:00 13:44:00 12:01:00
2 12:39:00 05:21:00 07:18:00
Now, the OP seems to expect the result in the same non-standard format HHMM (without the : field separator) as the input values. This can be achieved by
df$absdiff %>%
as.POSIXct() %>%
format("%H%M")
[1] "1201" "0718"
I want to get the difference between consecutive rows of Time. When I use the code below, it works fine.
But when I apply it to my original dataset, 1 minute appears as 0.0006944444 (a fraction of a day).
How can I make it appear as 1 minute instead?
Date <- c("03/06/2019", "03/06/2019", "03/06/2019", "03/06/2019")
Time <- c("17:15:00","17:16:00", "17:18:00", "17:21:00")
df1 <- data.frame(Date, Time)
library(chron)
library(dplyr) # provides %>%, mutate() and lag()
df1$Time <- chron(times = df1$Time)
sapply(df1, class)
df1 <- df1 %>% mutate(diff = Time - lag(Time))
Too long for a comment for the time being. When I run your code on a modified data set with a one second difference between two time points, your answer is exactly as expected. The diff column contains a one second difference.
Can you show us some code that produces the error?
Date <- c("03/06/2019", "03/06/2019", "03/06/2019", "03/06/2019")
Time <- c("17:15:01","17:15:02", "17:18:00", "17:21:00")
df1 <- data.frame(Date, Time)
library(chron)
library(dplyr) # provides %>%, mutate() and lag()
df1$Time <- chron(times = df1$Time)
sapply(df1, class)
df1 <- df1 %>% mutate(diff = Time - lag(Time))
# Date Time diff
# 1 03/06/2019 17:15:01 <NA>
# 2 03/06/2019 17:15:02 00:00:01
# 3 03/06/2019 17:18:00 00:02:58
# 4 03/06/2019 17:21:00 00:03:00
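In case it helps with the original question: the value 0.0006944444 is 1/1440, i.e. one minute expressed as a fraction of a day, so multiplying by 24 * 60 gives minutes. Below is a minimal base R sketch that produces the differences directly in minutes. It swaps chron for POSIXct, and the date "2019-06-03" and the UTC time zone are placeholders chosen for the illustration.

```r
Time <- c("17:15:00", "17:16:00", "17:18:00", "17:21:00")

# Attach a placeholder date so the times can be parsed as POSIXct
t <- as.POSIXct(paste("2019-06-03", Time), tz = "UTC")

# diff() returns a difftime; ask explicitly for minutes
diff_min <- c(NA, as.numeric(diff(t), units = "mins"))
diff_min
# [1] NA  1  2  3

# A fraction of a day can be rescaled the same way
0.0006944444 * 24 * 60   # approximately 1
```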
Is there an efficient way to select Date and numeric columns in R?
df <- data.frame(
Date=c("10/11/2012","10/12/2012"),
AE=c(1211,100),
Percent=c(0.03,0.43),
Name = c("A", "B")
)
I can use the is.numeric function to check whether a column is numeric and then use one of several ways to subset, but is there a function to check whether a column is a date, and how do I combine multiple conditions when subsetting?
I found that there is a function is.Date in the lubridate package, but it did not work:
#does not work
df <- df %>%
select_if(is.numeric|is.Date)
dplyr verbs for selection allow various methods of providing the conditionals:
raw functions, as in is.numeric, which will be called with the column data (a vector) as its one argument;
anonymous functions (R style), as in function(x) is.numeric(x) | inherits(x, "Date");
what is called a "purrr-style lambda" using R formulas (~), which is essentially a more compact form of the base R anonymous function, with some differences: namely, you use . or .x as a placeholder for the column data, as in the code below.
df %>%
select_if(~ is.numeric(.) | inherits(., "Date"))
# AE Percent
# 1 1211 0.03
# 2 100 0.43
Since your first column is not actually a date, let's fix that
# df$Date <- as.Date(df$Date, format="%m/%d/%Y")
df %>%
mutate(Date = as.Date(Date, format="%m/%d/%Y")) %>%
select_if(~ is.numeric(.x) | inherits(.x, "Date"))
# Date AE Percent
# 1 2012-10-11 1211 0.03
# 2 2012-10-12 100 0.43
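For comparison, the same selection can be sketched in base R without dplyr. The variable names keep and num_or_date are made up for the illustration, and the Date column is assumed to have been converted with as.Date() already.

```r
df <- data.frame(
  Date = as.Date(c("10/11/2012", "10/12/2012"), format = "%m/%d/%Y"),
  AE = c(1211, 100),
  Percent = c(0.03, 0.43),
  Name = c("A", "B")
)

# One TRUE/FALSE per column: numeric or Date
keep <- vapply(df, function(x) is.numeric(x) || inherits(x, "Date"), logical(1))

# drop = FALSE keeps a data frame even if only one column matches
num_or_date <- df[, keep, drop = FALSE]
names(num_or_date)
# [1] "Date"    "AE"      "Percent"
```

The key point is the same as in the dplyr version: the two predicates have to be combined inside a single function, because is.numeric | is.Date tries to apply | to the function objects themselves.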
I have a dataframe with dates stored as strings. The conversion with strptime works fine when I test it in the terminal, but when I want to assign the date in the original cell, I get an error:
provided 11 variables to replace 1 variables
This must be because the object returned by strptime() is a POSIXlt, which is a list.
How can I assign that object into the cell? I later want to order the dataframe by the date column.
I'm sorry that I can't share the code, due to privacy restrictions.
Edit: This snippet should produce the same error
#creating dataframe
x <- c( "20.11.2019 10:12:15", "21.10.2019 10:12:16", "20.10.2019 10:12:20")
y <- c( "1234", "1238", "1250")
df <- data.frame( "date" = x, "id" = y)
df[order(df$date),] #ordering by date
df #showing that dates get ordered 'incorrectly'
df[,1] = strptime(df[,1], "%d.%m.%Y %H:%M:%S") #trying to replace character with dates while converting
#afterwards I want to order them again 'correctly'
Personally, I would use dplyr to mutate the values of the original column. In combination with lubridate it works for me (at least I think this is what you wanted):
df <- df %>% mutate(date = ymd_hms(strptime(date, "%d.%m.%Y %H:%M:%S"))) %>% arrange(date)
date id
1 2019-10-20 10:12:20 1250
2 2019-10-21 10:12:16 1238
3 2019-11-20 10:12:15 1234
This simple adjustment also works. Change df[,1] to df$date.
df$date = strptime(df[,1], "%d.%m.%Y %H:%M:%S")
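Another option, sketched here in base R, is to convert the POSIXlt result to POSIXct before assigning it. POSIXct is a plain numeric vector with a class attribute, so it fits into a single data frame column even with the df[, 1] form. The tz = "UTC" argument is an assumption made for the example; use whatever time zone the data is actually in.

```r
x <- c("20.11.2019 10:12:15", "21.10.2019 10:12:16", "20.10.2019 10:12:20")
y <- c("1234", "1238", "1250")
df <- data.frame(date = x, id = y, stringsAsFactors = FALSE)

# POSIXct is an atomic vector, so assigning into one column works
df[, 1] <- as.POSIXct(strptime(df[, 1], "%d.%m.%Y %H:%M:%S", tz = "UTC"))

# Order chronologically and keep the result
df <- df[order(df$date), ]
df$id
# [1] "1250" "1238" "1234"
```

Note that df[order(df$date), ] on its own only prints the ordered rows; the result has to be assigned back for the ordering to persist.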
I am trying to calculate the number of weekdays between two dates in a data frame.
I am using the solution given here. The solution works when dates are available in all columns, but if any dates are missing, then there are no results.
Here is the code being used:
library(dplyr)
# The macro to calculate working days
Nweekdays <- Vectorize(function(a, b)
sum(!weekdays(seq(a, b, "days")) %in% c("Saturday", "Sunday")))
# Sample data frame
id = c("ID1", "ID2", "ID3")
startDate = c("2019-08-01", "2019-08-06", "2019-08-10")
endDate = c("2019-08-05", "2019-08-15", "2019-08-20")
df = data.frame(id, startDate, endDate)
# Using dplyr to coerce to Date and run macro
df <- df %>%
mutate(startDate = as.Date(startDate)) %>%
mutate(endDate = as.Date(endDate)) %>%
mutate(workingdays = Nweekdays(startDate, endDate))
The code works correctly and gives me a new column with working days. But if one of the dates is missing or NA, e.g.
startDate = c("2019-08-01", "", "2019-08-10")
then I get
Evaluation error: 'to' must be a finite number.
and there is no new column generated. I want an empty result for the missing value, but the correct result for all others. I am sure I am missing something basic so apologies for that!!
You just need to update your function to deal with non-date values so it only tries to compute if both a and b are dates:
Nweekdays <- Vectorize(function(a, b) {
if (!is.na(a) && !is.na(b)) {
sum(!weekdays(seq(a, b, "days")) %in% c("Saturday", "Sunday"))
} else {
return(NA)
}
})
You could use a stricter form of validation than !is.na(), such as lubridate::is.Date(), but this is a base R solution, and any non-date value is converted to NA when you call as.Date() in the mutate() step anyway.
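A self-contained sketch of the NA-safe version, run on the sample dates with the middle start date missing. As a variation that is not in the original answer, it identifies weekends with format(x, "%u") (ISO weekday number, 6 and 7 for Saturday and Sunday) instead of weekdays(), so the result does not depend on the session locale.

```r
# NA-safe weekday counter; returns NA when either date is missing
Nweekdays <- Vectorize(function(a, b) {
  if (is.na(a) || is.na(b)) return(NA_integer_)
  days <- seq(a, b, by = "day")
  sum(!format(days, "%u") %in% c("6", "7"))
})

startDate <- as.Date(c("2019-08-01", NA, "2019-08-10"))
endDate   <- as.Date(c("2019-08-05", "2019-08-15", "2019-08-20"))

workingdays <- Nweekdays(startDate, endDate)
workingdays
# [1]  3 NA  7
```

The sketch still assumes each start date is not later than its end date; seq() raises an error otherwise.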
I have a data.frame that contains a bunch of POSIXct dates:
df <- data.frame(dte=as.POSIXct(c("2001-02-03 14:30:00",
"2001-02-04 9:30:00", "2001-02-05 10:30:00")), a=1:3)
I would like to extract the part of the df whose time portion is later than 9:15 AM and earlier than 5:25 PM. I could extract the hour and minute components separately and write a comparison, but I thought there might be a more elegant way of doing it. Can anyone make a suggestion?
My current method would be:
df <- subset(df,
(as.numeric(format(dte, "%H")) > 9 & as.numeric(format(dte, "%M")) > 15) |
(as.numeric(format(dte, "%H")) < 17 & as.numeric(format(dte, "%M")) < 25))
My suggestion would be to use xts instead of a data.frame.
df <- data.frame(dte=as.POSIXct(c("2001-02-03 14:30:00",
"2001-02-04 9:30:00", "2001-02-05 10:30:00")), a=1:3)
library(xts)
x <- xts(df$a, df$dte)
x["T09:15/T17:25"] # returns everything (in your example)
x["T10:15/T14:25"] # returns the correct subset
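If staying with a plain data.frame is preferred, a base R sketch is to reduce each timestamp to minutes since midnight and compare that single number. Comparing hours and minutes separately does not compose correctly (for example, 10:05 would fail a "minute > 15" test even though it is after 9:15). The names mins, wide and narrow, and tz = "UTC", are assumptions made for the illustration.

```r
df <- data.frame(
  dte = as.POSIXct(c("2001-02-03 14:30:00", "2001-02-04 09:30:00",
                     "2001-02-05 10:30:00"), tz = "UTC"),
  a = 1:3
)

# Minutes since midnight for each timestamp
mins <- as.integer(format(df$dte, "%H")) * 60L + as.integer(format(df$dte, "%M"))

# Strictly between 09:15 and 17:25: keeps all three rows here
wide <- df[mins > 9 * 60 + 15 & mins < 17 * 60 + 25, ]

# Strictly between 10:15 and 14:25: keeps only the 10:30 row
narrow <- df[mins > 10 * 60 + 15 & mins < 14 * 60 + 25, ]
narrow$a
# [1] 3
```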