mutate_impl error in dplyr/lubridate add date time - r

Using the lubridate package I want to add seconds (for the purpose of the example) to a "POSIXct", "POSIXt" field in a tibble.
b <- structure(list(`"a"` = c("a", "a", "a", "a", "a"), Date_time = structure(c(1506694322,
1506694270, 1506693970, 1506693897, 1506693849), class = c("POSIXct",
"POSIXt"), tzone = "")), .Names = c("\"a\"", "Date_time"), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -5L))
b %>%
mutate(tol_lower = Date_time - second(2),
tol_lower = Date_time + second(30))
I get the error:
Error in mutate_impl(.data, dots) : 'origin' must be supplied
Why is this? I appreciate I can calculate hours, but I'd like to know what I'm doing wrong.
Additional points:
- I've tried as.Date, which gives the same error.
- I can add seconds directly without issue: tol_lower = Date_time - 2

Why not use this?
b %>% mutate(tol_lower = Date_time - 2,
tol_upper = Date_time + 30)
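If you want to add hours to a date-time, simply use Date_time + 2*60*60 (i.e. 2 hours added to Date_time).
Also, ?second clearly says that x in second(x) is a "date-time object", but in your case you are passing an integer. second() extracts the seconds component of a date-time, so it tries to convert the number 2 into a date-time first, which is most likely where the 'origin' must be supplied error comes from.
Hope it helps!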
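If the goal is to stay with lubridate, the plural constructor seconds() is probably what was intended: it builds a Period that can be added to or subtracted from a POSIXct column, whereas second() only reads the seconds component. A minimal sketch, assuming a shortened two-row stand-in for b in UTC and assuming the second column was meant to be named tol_upper:

```r
library(dplyr)
library(lubridate)

# Shortened stand-in for the question's tibble (assumption: UTC time zone)
b <- tibble(
  Date_time = as.POSIXct(c(1506694322, 1506694270),
                         origin = "1970-01-01", tz = "UTC")
)

out <- b %>%
  mutate(tol_lower = Date_time - seconds(2),   # subtract a 2-second Period
         tol_upper = Date_time + seconds(30))  # add a 30-second Period
out
```

The same pattern should work with minutes(), hours() and days() when larger offsets are needed.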


How to filter a part of dates and then change that part?

I have a data frame called test_2 containing a group of dates that I'm trying to change. For example, 2020-12-15 is in the started_at column and 2020-12-25 is in the ended_at column. I want to change the day part of the ended_at column.
I could write day(test_2$ended_at) <- 15 #[thanks Ben for guiding me with this chunk]
But the problem is that there are other days as well, such as 2020-12-08.
How can I filter for the required dates and change only those?
I sincerely appreciate your kind help.
Here is the dput of the data structure.
> dput(test_2)
structure(list(started_at = structure(c(1608033433, 1608033092,
1608033242, 1608034138, 1608034548, 1608033904, 1608033525, 1608032413,
1608032432, 1607385918, 1608032241, 1608034867, 1609079592, 1608033139,
1608032406, 1608034912, 1608033844, 1608032114, 1608034239, 1608032677,
1608032219, 1608033975, 1609459101, 1608032929, 1608034558, 1608034138,
1608033654, 1608033875, 1606810523, 1608034878, 1608034232), tzone = "UTC", class = c("POSIXct",
"POSIXt")), ended_at = structure(c(1608914839, 1608908027, 1608909124,
1608924913, 1608905112, 1608920814, 1608915081, 1608891612, 1608896054,
1607385667, 1608891462, 1608922015, 1606985651, 1608907113, 1608896350,
1608923619, 1608923486, 1608887393, 1608934063, 1608899164, 1608886816,
1608924042, 1606781193, 1608907025, 1608914882, 1608923510, 1608921699,
1608922845, 1606810492, 1608913874, 1608943331), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), row.names = c(NA, -31L), class = c("tbl_df", "tbl",
"data.frame"))
Based on the description, we may create a logical index for subsetting and updating the 'ended_at' column:
library(lubridate)
i1 <- with(test_2, as.Date(started_at) == "2020-12-15" &
as.Date(ended_at ) == "2020-12-25")
day(test_2$ended_at[i1]) <- 15
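To see the effect of the logical index in isolation, here is a small sketch on a two-row stand-in for test_2 (the timestamps are made up for illustration and are not taken from the question's dput). Only the row matching both dates should change:

```r
library(lubridate)

# Hypothetical two-row stand-in for test_2 (values invented for illustration)
test_2 <- data.frame(
  started_at = as.POSIXct(c("2020-12-15 11:57:13", "2020-12-08 00:05:18"), tz = "UTC"),
  ended_at   = as.POSIXct(c("2020-12-25 16:47:19", "2020-12-08 00:01:07"), tz = "UTC")
)

# TRUE only where both columns fall on the target dates
i1 <- as.Date(test_2$started_at) == as.Date("2020-12-15") &
  as.Date(test_2$ended_at) == as.Date("2020-12-25")

# Replace the day of month for the selected rows only
day(test_2$ended_at[i1]) <- 15
test_2
```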

Imputing the date and time in R

I have a data set like the one below and I am trying to impute the missing values.
ID  In                   Out
4   2019-09-20 21:57:22  NA
4   NA                   2019-09-21 5:07:03
When there are NAs in the lead and lag for an ID, I am trying to impute the time so that the previous day is cut off and a new time starts on the next day. I tried the following, but I am getting an error:
df1%>%
group_by(ID) %>%
mutate(In= ifelse(is.na(In) & is.na(lag(Out)),
as.POSIXct(as.character(paste(as.Date(In),"05:00:01"))),
In)) %>%
mutate(Out= ifelse(is.na(Out) & lead(In) == "05:00:01",
as.POSIXct(as.character(paste(as.Date(Out),"05:00:00"))),
Out))
The desired output will be
ID  In                   Out
4   2019-09-20 21:57:22  2019-09-21 05:00:00
4   2019-09-21 5:00:01   2019-09-21 5:07:03
Dput for the data
structure(list(concat = c("176 - 2019-09-20", "176 - 2019-09-20",
"176 - 2019-09-20", "176 - 2019-09-20", "176 - 2019-09-21"),
ENTRY = structure(c(1568989081, 1569008386, 1569016635, 1569016646,
NA), class = c("POSIXct", "POSIXt"), tzone = "UTC"), EXIT = structure(c(1569005439,
1569014914, 1569016645, NA, 1569042433), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -5L), class = c("data.table",
"data.frame"), .internal.selfref = <pointer: 0x0000000007e21ef0>)
Finally, I got the desired output by separating the date and time and pasting them back together. This is definitely not an efficient way to achieve it. Maybe someone can suggest a more efficient approach, which would at least be a learning opportunity.
df%>%
mutate(ENTRY_date = as.Date(ENTRY)) %>%
mutate(EXIT_date = as.Date(EXIT))%>%
mutate(ENTRY_time = format(ENTRY,"%H:%M:%S"))%>%
mutate(EXIT_time = format(EXIT,"%H:%M:%S"))%>%
mutate(Entry_date1 = if_else(is.na(ENTRY_date)&is.na(lag(EXIT_date)),EXIT_date,ENTRY_date))%>%
mutate(Exit_date1 = if_else(is.na(EXIT_date)& is.na(lead(ENTRY_date)),ENTRY_date,EXIT_date))%>%
mutate(Entry_time1 = if_else(is.na(ENTRY_time)&is.na(lag(EXIT_time)),"05:00:01",ENTRY_time))%>%
mutate(Exit_time1 = if_else(is.na(EXIT_time)& is.na(lead(ENTRY_time)),"04:59:59",EXIT_time))%>%
mutate(ENTRY1 = as.POSIXct(paste(Entry_date1, Entry_time1), format = "%Y-%m-%d %H:%M:%S"))%>%
mutate(EXIT1 = as.POSIXct(paste(Exit_date1, Exit_time1), format = "%Y-%m-%d %H:%M:%S"))
First, using your dput() data did not work for me. Anyway, if I understand your question correctly you can do it like this:
# load package
library(lubridate)
# replace missing In values with the corresponding Out values,
# setting 5:00:01 as time.
df$In[is.na(df$In)] <- ymd_hms(paste0(as.Date(df$Out[is.na(df$In)]), " 5:00:01"))
# same idea but first we save it as a vector...
Out <- ymd_hms(paste0(as.Date(df$In[is.na(df$Out)]), " 5:00:00"))
# ... then we add one day
day(Out) <- day(Out) + 1
df$Out[is.na(df$Out)] <- Out
This works for the data that you provided, but if the Out time is 2019-09-21 04:07:03, for example, then the corresponding In time is later, namely 2019-09-21 05:00:01. I do not know if this is intended. If not, please clarify your question.
I used this data
structure(list(In = structure(c(1569016642, NA), tzone = "UTC", class = c("POSIXct",
"POSIXt")), Out = structure(c(NA, 1569042423), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), .Names = c("In", "Out"), row.names = c(NA, -2L), class = "data.frame")

Converting a count down timer to a usable format

I am trying to convert a countdown timer from a character column into a usable format, ideally a time format, but numeric may work.
I have tried converting it using as.POSIXct and also the chron package.
Here is a dput of the DF
structure(list(Time = c("(-01:30)", "(-01:15)", "(-01:00)", "(-00:45)",
"(-00:30)", "(-00:15)", "0", "+00:13", "+00:15", "+00:30", "+00:45"
)), row.names = c(NA, -11L), class = c("tbl_df", "tbl", "data.frame"
))
I have already removed the brackets from the time column using
sd$Time = (gsub("[(),//]", "", sd$Time))
Then I tried to convert it using the following
sd$Time <- as.POSIXct(sd$Time, format="%M:%S")
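An option would be strptime (note that the pattern below strips the sign, so negative and positive offsets parse to the same value):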
strptime(sub("^0$", "00:00", gsub("[-+()]", "", sd$Time)), format = "%M:%S")
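Since the question says numeric may also work, a signed number of seconds keeps the direction of the countdown that a clock time cannot represent. A base R sketch, assuming the strings are always either "0" or mm:ss with an optional sign and optional brackets (the variable names are illustrative):

```r
# A few of the values from the question's dput
time_chr <- c("(-01:30)", "(-01:15)", "0", "+00:13", "+00:45")

# Drop brackets and normalise the bare "0" to the same mm:ss shape
clean <- gsub("[()]", "", time_chr)
clean[clean == "0"] <- "+00:00"

# Remember the sign, then split the unsigned part into minutes and seconds
sgn <- ifelse(startsWith(clean, "-"), -1, 1)
parts <- strsplit(sub("^[-+]", "", clean), ":", fixed = TRUE)

# Signed offset in seconds relative to zero
secs <- sgn * vapply(parts,
                     function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
                     numeric(1))
secs
```

The result can be passed to as.difftime(secs, units = "secs") if a time-like class is preferred over plain numbers.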

Error in dplyr::summarise when working with datetimes and lubridate::dseconds

I have a tibble representing log messages. It has (among others) two columns:
- FileCreationDateTime identifies the logfile from which the message originates and is thus intended as the grouping variable. (Think of it as "filename".)
- EventDateTime is the time at which some event happened.
What I now want to do is to find the start time, the end time and the duration of each logfile (identified by FileCreationDateTime). I think (or thought) this can be done with the following code:
file_durations <-
logMessages%>%
group_by(FileCreationDateTime) %>%
summarise(start = min(EventDateTime),
end = max(EventDateTime),
duration = dseconds(end - start))
The code itself seems to run without error. However, I can neither print the result nor access it (at least not the column "duration"), as it returns the error
Error in sprintf("%ds (~%s %ss)", x, x2, unit, "s)") :
invalid format '%d'; use format %f, %e, %g or %a for numeric objects
Investigating, I found that the error seems to depend on the exact values of the datetimes. I have put together a MWE with two tibbles. The two tibbles differ only in one value. One works, while the other doesn't. I have no idea what could cause the error. Can someone enlighten me?
The human readable tibbles:
> working
# A tibble: 2 × 2
EventDateTime FileCreationDateTime
<dttm> <dttm>
1 2016-11-24 16:16:44.986 2016-11-24 16:16:46
2 2016-11-24 16:17:43.282 2016-11-24 16:16:46
> broken
# A tibble: 2 × 2
EventDateTime FileCreationDateTime
<dttm> <dttm>
1 2016-11-24 16:16:44.986 2016-11-24 16:16:46
2 2016-11-24 16:18:31.971 2016-11-24 16:16:46
The complete MWE:
library(tidyverse)
library(lubridate)
options(digits.secs = 6, digits = 6)
working <- structure(list(EventDateTime = structure(c(1480004204.987, 1480004263.283),
class = c("POSIXct", "POSIXt"),
tzone = "UTC"),
FileCreationDateTime = structure(c(1480000606, 1480000606),
class = c("POSIXct", "POSIXt"),
tzone = "Europe/Vienna")),
.Names = c("EventDateTime", "FileCreationDateTime"),
row.names = c(NA, -2L),
class = c("tbl_df", "tbl", "data.frame"))
working %>%
group_by(FileCreationDateTime) %>%
summarise(start = min(EventDateTime),
end = max(EventDateTime),
duration = dseconds(end - start))
broken <- structure(list(EventDateTime = structure(c(1480004204.987, 1480004311.972),
class = c("POSIXct", "POSIXt"),
tzone = "UTC"),
FileCreationDateTime = structure(c(1480000606, 1480000606),
class = c("POSIXct", "POSIXt"),
tzone = "Europe/Vienna")),
.Names = c("EventDateTime", "FileCreationDateTime"),
row.names = c(NA, -2L),
class = c("tbl_df", "tbl", "data.frame"))
broken %>%
group_by(FileCreationDateTime) %>%
summarise(start = min(EventDateTime),
end = max(EventDateTime),
duration = dseconds(end - start))
I am using R 3.4.0 64bit, lubridate_1.6.0 and dplyr_0.5.0 on Windows 10.
Thanks for any help!
I finally found the problem. It has nothing to do with dplyr but with lubridate::dseconds. As already reported (e.g. this issue), it fails on non-integer inputs > 60. This was apparently also my problem.
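One way to sidestep the formatting problem is to keep the duration as a plain number of seconds computed with base R's difftime(), rather than wrapping it in dseconds(). This is a workaround sketch and not the fix from the linked report; the column name duration_secs is an assumption:

```r
library(dplyr)

# Rebuild the "broken" tibble from the question
broken <- tibble(
  EventDateTime = as.POSIXct(c(1480004204.987, 1480004311.972),
                             origin = "1970-01-01", tz = "UTC"),
  FileCreationDateTime = as.POSIXct(c(1480000606, 1480000606),
                                    origin = "1970-01-01", tz = "Europe/Vienna")
)

file_durations <- broken %>%
  group_by(FileCreationDateTime) %>%
  summarise(start = min(EventDateTime),
            end = max(EventDateTime),
            # plain numeric seconds, fractional part preserved
            duration_secs = as.numeric(difftime(end, start, units = "secs")))
file_durations
```

Fixing units = "secs" also avoids difftime silently switching to minutes or hours for longer logfiles.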

R Error: index is not in increasing order

NOTE: PROBLEM RESOLVED IN THE COMMENTS BELOW
I'm getting the following error when trying to turn a data.frame into xts following the answer found here.
Error in .xts(DA[, 3:6], index = as.POSIXct(DAINDEX, format = "%m/%d/%Y %H:%M:%S", :
index is not in increasing order
I've not been able to find much on this error or how to resolve it, so any help towards that would be greatly appreciated.
The data is daily S&P 500 in a comma delimited format with the following columns: "Date" "Time" "Open" "High" "Low" "Close".
Below is the code:
DA <- read.csv("SNP.csv", header = TRUE, stringsAsFactors = FALSE)
DAINDEX <- paste(DA$Date, DA$Time, sep = " ")
Data.hist <- .xts(DA[,3:6], index = as.POSIXct(DAINDEX, format = "%m/%d/%Y %H:%M:%S", tzone = "GMT"))
As requested, some lines of the data
structure(list(Date = c("5/20/2016", "5/19/2016", "5/18/2016",
"5/17/2016", "5/16/2016", "5/13/2016"), Time = c("0:00:00", "0:00:00",
"0:00:00", "0:00:00", "0:00:00", "0:00:00"), Open = c(2041.880005,
2044.209961, 2044.380005, 2065.040039, 2046.530029, 2062.5),
High = c(2058.350098, 2044.209961, 2060.610107, 2065.689941,
2071.879883, 2066.790039), Low = c(2041.880005, 2025.910034,
2034.48999, 2040.819946, 2046.530029, 2043.130005), Close = c(2052.320068,
2040.040039, 2047.630005, 2047.209961, 2066.659912, 2046.609985
)), .Names = c("Date", "Time", "Open", "High", "Low", "Close"
), row.names = c(NA, 6L), class = "data.frame")
The above is the output of dput(head(DA))
The easiest thing to do is use the regular xts constructor instead of .xts. It will check if the index is sorted correctly, and sort the index and data, if necessary. Naming the format and tz arguments avoids them being matched to the wrong parameters of as.POSIXct.
Data.hist <- xts(DA[,3:6], as.POSIXct(DAINDEX, format = "%m/%d/%Y %H:%M:%S", tz = "GMT"))
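The cause of the error is visible in the sample rows: the file lists the newest date first, so the parsed index is decreasing. A base R sketch of the check and the reordering, using three of the rows from the question and no xts calls (idx, ord and the _sorted names are illustrative):

```r
# Three rows from the question's dput, newest first as in the file
DA <- data.frame(
  Date  = c("5/20/2016", "5/19/2016", "5/18/2016"),
  Time  = "0:00:00",
  Close = c(2052.320068, 2040.040039, 2047.630005),
  stringsAsFactors = FALSE
)

# Parse the combined date and time with named arguments
idx <- as.POSIXct(paste(DA$Date, DA$Time),
                  format = "%m/%d/%Y %H:%M:%S", tz = "GMT")

# TRUE here means the index is not in increasing order
is.unsorted(as.numeric(idx))

# Reorder both the data and the index from oldest to newest
ord <- order(idx)
DA_sorted  <- DA[ord, ]
idx_sorted <- idx[ord]
```

After this step the index is increasing, which is the condition the error message complains about.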
