I have a data.frame that contains a bunch of POSIXct dates:
df <- data.frame(dte=as.POSIXct(c("2001-02-03 14:30:00",
"2001-02-04 9:30:00", "2001-02-05 10:30:00")), a=1:3)
I would like to extract the part of the df whose time portion is later than 9:15 AM and earlier than 5:25 PM. I could extract the hour and minute components separately and write a comparison, but I thought there might be a more elegant way of doing it. Can anyone make a suggestion?
My current method would be:
df <- subset(df,
(as.numeric(format(dte, "%H")) > 9 & as.numeric(format(dte, "%M")) > 15) |
(as.numeric(format(dte, "%H")) < 17 & as.numeric(format(dte, "%M")) < 25))
My suggestion would be to use xts instead of a data.frame.
df <- data.frame(dte=as.POSIXct(c("2001-02-03 14:30:00",
"2001-02-04 9:30:00", "2001-02-05 10:30:00")), a=1:3)
library(xts)
x <- xts(df$a, df$dte)
x["T09:15/T17:25"] # returns everything (in your example)
x["T10:15/T14:25"] # returns the correct subset
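If staying with a plain data.frame is preferred, note that the comparison in the question tests hour and minute independently, which does not describe a time-of-day window. A minimal base R sketch, assuming it is acceptable to convert each timestamp to minutes since midnight (the helper `minutes_of_day` and the `tz = "UTC"` setting are illustrative assumptions, not part of the original post):

```r
# Sample data from the question; tz = "UTC" is an assumption made for reproducibility
df <- data.frame(dte = as.POSIXct(c("2001-02-03 14:30:00",
                                    "2001-02-04 09:30:00",
                                    "2001-02-05 10:30:00"), tz = "UTC"),
                 a = 1:3)

# Hypothetical helper: minutes elapsed since midnight
minutes_of_day <- function(x) {
  as.integer(format(x, "%H")) * 60L + as.integer(format(x, "%M"))
}

m <- minutes_of_day(df$dte)

# Window 09:15 to 17:25 (exclusive bounds, as in the question)
wide <- df[m > 9 * 60 + 15 & m < 17 * 60 + 25, ]

# Narrower window 10:15 to 14:25, matching the second xts call
narrow <- df[m > 10 * 60 + 15 & m < 14 * 60 + 25, ]
```

Working with a single number per row avoids combining separate hour and minute conditions.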
I want to get the difference between consecutive rows of Time. The code below works fine on this sample.
But when I apply it to my original dataset, 1 minute appears as 0.0006944444 (a fraction of a day).
How can I make it appear as 1 minute instead?
Date <- c("03/06/2019", "03/06/2019", "03/06/2019", "03/06/2019")
Time <- c("17:15:00","17:16:00", "17:18:00", "17:21:00")
df1 <- data.frame(Date, Time)
library(chron)
library(dplyr) # provides %>%, mutate() and lag()
df1$Time <- chron(times = df1$Time)
sapply(df1, class)
df1 <- df1 %>% mutate(diff = Time - lag(Time))
Too long for a comment for the time being. When I run your code on a modified data set with a one second difference between two time points, your answer is exactly as expected. The diff column contains a one second difference.
Can you show us some code that produces the error?
Date <- c("03/06/2019", "03/06/2019", "03/06/2019", "03/06/2019")
Time <- c("17:15:01","17:15:02", "17:18:00", "17:21:00")
df1 <- data.frame(Date, Time)
library(chron)
library(dplyr) # provides %>%, mutate() and lag()
df1$Time <- chron(times = df1$Time)
sapply(df1, class)
df1 <- df1 %>% mutate(diff = Time - lag(Time))
# Date Time diff
# 1 03/06/2019 17:15:01 <NA>
# 2 03/06/2019 17:15:02 00:00:01
# 3 03/06/2019 17:18:00 00:02:58
# 4 03/06/2019 17:21:00 00:03:00
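On the original question of showing the gap in minutes: 0.0006944444 is 1/1440, i.e. one minute expressed as a fraction of a day, so multiplying by 1440 recovers minutes. Another option, sketched here in base R only, is to build full timestamps and ask for the difference in minutes explicitly. The date format `%d/%m/%Y` and `tz = "UTC"` are assumptions, since the post does not say whether 03/06/2019 is day-first or month-first.

```r
Date <- c("03/06/2019", "03/06/2019", "03/06/2019", "03/06/2019")
Time <- c("17:15:00", "17:16:00", "17:18:00", "17:21:00")
df1 <- data.frame(Date, Time)

# Combine date and time into one POSIXct column (day-first format is assumed)
stamp <- as.POSIXct(paste(df1$Date, df1$Time),
                    format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Difference to the previous row, forced to minutes; the first row has no predecessor
df1$diff_min <- c(NA, as.numeric(diff(stamp), units = "mins"))

# A fraction of a day converts to minutes by multiplying with 24 * 60
one_minute <- 0.0006944444 * 24 * 60
```

Requesting the unit explicitly keeps the result stable even when the gaps in the real data span seconds, hours, or days.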
I have a large dataframe and I want to select rows which satisfy condition on date columns. The dataframe is similar to this:
library(tidyverse)
library(lubridate)
curdate <- seq(as.Date("2000/1/1"), by = "month", length.out = 24)
expdate <- rep(seq(as.Date("2000/3/1"), by = "quarter", length.out = 12),2)
afactor <- rep(c("C","P"),12)
anumber <- runif(24)
df<-data.frame(curdate, expdate, afactor, anumber)
df$expdate[12]<-as.Date("2001-02-01")
I would like to get the rows in which the month of the expiration date (expdate) is two months later than the month of the current date (curdate). In this example, I should select these five rows (rows 1, 7, 12, 13 and 19):
curdate expdate afactor anumber
2000-01-01 2000-03-01 C 0.6832251
2000-07-01 2001-09-01 C 0.2671076
2001-01-01 2000-03-01 C 0.2097065
2001-07-01 2001-09-01 C 0.9258450
2000-12-01 2001-02-01 P 0.4903951
First I used the following line for that:
df_select1 <- df %>% group_by(curdate, afactor) %>%
filter(month(expdate) == month(curdate)+2)
But it misses the cases when the month is November or December. For instance here, it misses the case when curdate is 2000-12-01. So I want to add a condition, to deal with these cases. I wrote:
df_select2 <- df %>% group_by(curdate, afactor) %>%
if_else(month(curdate)<11,
filter(month(expdate) == month(curdate)+2),
filter(month(expdate) == month(curdate)-10))
but I get the following error: condition must be a logical vector, not a grouped_df/tbl_df/tbl/data.frame object.
I found the following solution, but there are certainly much shorter ways to do it:
df_select1 <- df %>% group_by(curdate, afactor) %>%
filter(month(curdate)<11) %>%
filter(month(expdate) == month(curdate)+2)
df_select2 <- df %>% group_by(curdate, afactor) %>%
filter(month(curdate)>10) %>%
filter(month(expdate) == month(curdate)-10)
df_select <- full_join(df_select1, df_select2)
If you're loading lubridate, you should probably also make use of its functions for calculating with months. Month arithmetic is tricky because months are not of equal length, which is why the base function difftime does not offer a monthly unit, for example.
This would be a solution for your problem, without the if_else function:
df_select1 <- df %>% group_by(curdate, afactor) %>%
filter(expdate == curdate + months(2))
By the way, you won't run into problems as long as your dates always fall on the first day of the month. You do have to decide what should happen in cases like the following, though:
ymd("2019-08-31")+months(1)
ymd("2019-01-29")+months(1)
Both return NA, because the resulting dates (September 31 and February 29, 2019) do not exist. If this happens, lubridate::add_with_rollback() could offer a solution, depending on your needs.
An edit after clarifying the question. If you're looking for those dates whose expdate is two months "later" compared to the curdate, in the specific sense that you're comparing only their months regardless of the year, a little modulo operation might help:
# shift to 0-based months before the modulus so that October + 2 gives 12 rather than 0
df %>%
filter(lubridate::month(expdate) == (lubridate::month(curdate) + 1) %% 12 + 1)
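The modulo idea can be checked without any packages. The following is a sketch under the assumption that only the month numbers matter; it uses the 0-based month stored in `as.POSIXlt(...)$mon`, where wrapping with `%% 12` is safe because December is 11 and January is 0. The random `anumber` column from the question is left out because it does not affect the selection.

```r
# Rebuild the date columns from the question
curdate <- seq(as.Date("2000/1/1"), by = "month", length.out = 24)
expdate <- rep(seq(as.Date("2000/3/1"), by = "quarter", length.out = 12), 2)
afactor <- rep(c("C", "P"), 12)
df <- data.frame(curdate, expdate, afactor)
df$expdate[12] <- as.Date("2001-02-01")

# 0-based month numbers (January = 0, ..., December = 11)
cur_mon <- as.POSIXlt(df$curdate)$mon
exp_mon <- as.POSIXlt(df$expdate)$mon

# Keep rows whose expiry month is two calendar months after the current month,
# ignoring the year
keep <- exp_mon == (cur_mon + 2) %% 12
df_select <- df[keep, ]
```

With 1-based month numbers, as returned by lubridate::month(), the wrap has to be written as `(m + 1) %% 12 + 1`, since a plain `(m + 2) %% 12` maps October to 0 instead of 12.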
You can add 2 months to curdate using the %m+% operator from lubridate:
df %>%
filter(months(expdate) == months(curdate %m+% months(2)))
This will take into account the variation in days by calendar month.
Edit
I've added the months function from base-R after the question was updated. The month function from lubridate could also be used.
I have a dataframe with dates stored as strings. The conversion with strptime works fine when I test it in the terminal, but when I want to assign the date in the original cell, I get an error:
provided 11 variables to replace 1 variables
This must be because strptime() returns a POSIXlt object, which is internally a list.
How can I assign that object into the cell? I later want to order the dataframe by the date column.
I'm sorry that I can't share the code, due to privacy restrictions.
Edit: This snippet should produce the same error
#creating dataframe
x <- c( "20.11.2019 10:12:15", "21.10.2019 10:12:16", "20.10.2019 10:12:20")
y <- c( "1234", "1238", "1250")
df <- data.frame( "date" = x, "id" = y)
df[order(df$date),] #ordering by date
df #showing that dates get ordered 'incorrectly'
df[,1] = strptime(df[,1], "%d.%m.%Y %H:%M:%S") #trying to replace character with dates while converting
#afterwards I want to order them again 'correctly'
Personally I would use dplyr to mutate the values of the original column. In combination with lubridate it works for me (at least I think this is what you wanted):
library(dplyr); library(lubridate) # both packages are needed for the line below
df <- df %>% mutate(date = ymd_hms(strptime(date, "%d.%m.%Y %H:%M:%S"))) %>% arrange(date)
date id
1 2019-10-20 10:12:20 1250
2 2019-10-21 10:12:16 1238
3 2019-11-20 10:12:15 1234
This simple adjustment also works. Change df[,1] to df$date.
df$date = strptime(df[,1], "%d.%m.%Y %H:%M:%S")
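The underlying issue is that POSIXlt stores each date as a list of components, while POSIXct stores one number per date and therefore fits into a single data.frame column however it is assigned. A minimal base R sketch that converts directly to POSIXct and then sorts; `tz = "UTC"` is an assumption, since the post does not state a time zone:

```r
# Sample data from the question
x <- c("20.11.2019 10:12:15", "21.10.2019 10:12:16", "20.10.2019 10:12:20")
y <- c("1234", "1238", "1250")
df <- data.frame(date = x, id = y, stringsAsFactors = FALSE)

# as.POSIXct() accepts a format string and returns an atomic vector,
# so indexed assignment such as df[, 1] <- ... works as well
df[, 1] <- as.POSIXct(df[, 1], format = "%d.%m.%Y %H:%M:%S", tz = "UTC")

# Order chronologically rather than alphabetically
df <- df[order(df$date), ]
```

After the conversion, order() sorts by actual time, so the October rows come before the November row.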
I have a dataframe organized by year.
For example:
date <- seq(as.Date("2001-07-20"),as.Date("2010-12-31"),by = 1)
Now I want to select a subset by using two time periods:
June 23 to July 13 AND July 20 to Aug 9 for 2004-2008.
Could you provide some clue? Thanks!
Yes, it can be solved by:
test[date %between% c("2004-07-20", "2004-08-09")]...
but there are many years in my data, the code can be very repetitive.
I wonder if it can be solved like:
df$md <- format(as.Date(df$date), "%m-%d")
df <- df[df$md %in% c(as.Date(06-23):Date(07-13), Date(07-20):Date(08-09)) & year %in% (2004:2008),]
It doesn't work: Error in as.Date.numeric(6 - 23) : 'origin' must be supplied
You can construct the ranges of interest and subset:
library(lubridate)
date <- seq(as.Date("2001-07-20",origin="1970-01-01"),as.Date("2010-12-31",origin="1970-01-01"),by = 1)
range1 <- as.Date(unlist(lapply(c(0:4),function(y) seq(as.Date("2004-06-23",origin="1970-01-01"),as.Date("2004-07-13",origin="1970-01-01"),by="1 day") + years(y))),origin="1970-01-01")
range2 <- as.Date(unlist(lapply(c(0:4),function(y) seq(as.Date("2004-07-20",origin="1970-01-01"),as.Date("2004-08-09",origin="1970-01-01"),by="1 day") + years(y))),origin="1970-01-01")
date[date %in% range1 | date %in% range2]
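Alternative answer using %between%, as suggested in the question. Note that %between% comes from data.table, so that package has to be loaded as well:
library(data.table) # provides %between%
library(lubridate) # provides years()
dates <- seq(as.Date("2001-07-20"),as.Date("2010-12-31"),by = 1)
r1 <- c(as.Date("2004-06-23"),as.Date("2004-07-13"))
r2 <- c(as.Date("2004-07-20"),as.Date("2004-08-09"))
ranges <- lapply(c(0:4),function(y) list(r1=r1 + years(y),r2=r2+years(y)))
# unlist() drops the Date class, so an origin is supplied when converting back
as.Date(unlist(lapply(ranges,function(r) { dates[dates %between% r$r1 | dates %between% r$r2] })),origin="1970-01-01")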
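The month-day idea from the question can also be made to work without building every range by hand. A sketch in base R, assuming the windows never cross a year boundary: encode month and day as one integer (for example 623 for June 23) and compare numerically, then filter on the year separately.

```r
date <- seq(as.Date("2001-07-20"), as.Date("2010-12-31"), by = 1)

# Month and day as a single integer, e.g. "0623" becomes 623
md <- as.integer(format(date, "%m%d"))
yr <- as.integer(format(date, "%Y"))

# June 23 to July 13, or July 20 to August 9 (inclusive bounds)
in_window <- (md >= 623 & md <= 713) | (md >= 720 & md <= 809)

sel <- date[in_window & yr %in% 2004:2008]
```

Each window covers 21 days, so the selection holds 42 dates per year across the five years.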
I have values such as:
df[,1:2]
Results in
I want to create a new column that holds the difference between the In and Out columns.
These are time values.
The expected output is:
1201
0718 (ignoring the negative sign)
... and so on.
library(stringr)
# generate few rows of data
In <- c('143','1239')
Out <- c('1344','521')
df <- data.frame(cbind(In, Out), stringsAsFactors=FALSE)
# pad with zero if needed (e.g. 143 -> 0143)
df$In[str_length(df$In) == 3] <- paste(0,df$In[str_length(df$In) == 3], sep='')
df$Out[str_length(df$Out) == 3] <- paste(0,df$Out[str_length(df$Out) == 3], sep='')
df$In <- strptime(df$In, format='%H%M')
df$Out <- strptime(df$Out, format='%H%M')
df$diff <- df$In - df$Out
This gives:
> df$diff
Time differences in hours
[1] -12.01667 7.30000
Is this what you are looking for?
If I understand correctly, the OP wants to compute the absolute time difference where the time of the day (neglecting the date) is given as character strings in the form HMM or HHMM.
There are classes which support time of the day (without date) directly, e.g., the hms package or the ITime class of the data.table package.
As an additional challenge, the timestamps are not given in a standard time format HH:MM, e.g., 09:43.
Here is an approach which uses as.ITime() after the strings have been padded.
# create sample data frame
df <- data.frame(In = c("143", "1239"),
Out = c("1344", "521"))
library(magrittr) # piping is used for readability
# pad strings and coerce to ITime class
df$In %<>%
stringr::str_pad(4L, pad = "0") %>%
data.table::as.ITime("%H%M")
df$Out %<>%
stringr::str_pad(4L, pad = "0") %>%
data.table::as.ITime("%H%M")
# compute absolute difference
df$absdiff <- abs(df$In - df$Out)
df
In Out absdiff
1 01:43:00 13:44:00 12:01:00
2 12:39:00 05:21:00 07:18:00
Now, the OP seems to expect the result in the same non-standard format HHMM (without the : field separator) as the input values. This can be achieved by
df$absdiff %>%
as.POSIXct() %>%
format("%H%M")
[1] "1201" "0718"
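For comparison, the same result can be reached with integer arithmetic alone, which avoids attaching a date to the times. This is a sketch under the assumption that every value is a valid HMM or HHMM time of day; `to_minutes` is a hypothetical helper name:

```r
In  <- c("143", "1239")
Out <- c("1344", "521")

# Hypothetical helper: turn "HMM"/"HHMM" strings into minutes since midnight
to_minutes <- function(x) {
  n <- as.integer(x)
  (n %/% 100L) * 60L + n %% 100L
}

# Absolute difference in minutes
d <- abs(to_minutes(In) - to_minutes(Out))

# Format back to zero-padded HHMM
absdiff <- sprintf("%02d%02d", d %/% 60L, d %% 60L)
absdiff
```

Because the values are parsed as integers, no zero padding of the input strings is needed.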