Cannot calculate difference between two dates - r

I'm trying to calculate the difference in days between two date columns (date1 and date2). I imported a data set from csv file and the date format for date1 and date2 look as following:
8/22/18 19:05
First I tried to merely calculate the difference
data_test$dif <- data_test$date2 - data_test$date1
This led to an error "non-numeric argument to binary operator"
Then I thought to transform the variables into the as.Date format
data_test$date <- as.Date(data_test$date1)
But I get the error: "character string is not in a standard unambiguous format"
After spending various hours reading other posts I also tried unsucessfully
library(lubridate)
data_test$date <- ymd_hm(data_test$date1)
Receiving the error "All formats failed to parse. No formats found. "
Can you please help me finding the solution to this problem?
Thank you!

Related

Error with the “standard unambiguous date” for string-to-date conversion in R

So I am trying this code, which I have used in the past with other data wrangling tasks with no errors:
## Create an age_at_enrollment variable, based on the start_date per individual (i.e. I want to know an individual's age, when they began their healthcare job).
complete_dataset_1 = complete_dataset %>% mutate(age_at_enrollment = (as.Date(start_date)-as.Date(birth_date))/365.25)
However, I keep receiving this error message:
"Error in charToDate(x) : character string is not in a standard unambiguous format"
I believe this error is happening because in the administrative dataset that I am using, the start_date and birth_date variables are formatted in an odd way:
start_date birth_date
2/5/07 0:00 2/28/1992 0:00
I could not find an answer as to why the data is formatted that, so any thoughts on how to fix this issue without altering the original administrative dataset?
The ambiguity in your call to as.Date is whether the day or month comes first. To resolve this, you may use the format parameter of as.Date:
complete_dataset_1 = complete_dataset
%>% mutate(age_at_enrollment = (
as.Date(start_date, format="%m/%d/%Y") -
as.Date(birth_date, format="%m/%d/%Y")) / 365.25)
A more precise way to calculate the diff in years, handling the leap year edge case, would be to use the lubridate package:
library(lubridate)
complete_dataset_1 = complete_dataset
%>% mutate(age_at_enrollment = time_length(difftime(
as.Date(start_date, format="%m/%d/%Y"),
as.Date(birth_date, format="%m/%d/%Y")), "years")

subset data based on y/m/d/h and get error ouputs [duplicate]

I have a dataset called EPL2011_12. I would like to make new a dataset by subsetting the original by date. The dates are in the column named Date The dates are in DD-MM-YY format.
I have tried
EPL2011_12FirstHalf <- subset(EPL2011_12, Date > 13-01-12)
and
EPL2011_12FirstHalf <- subset(EPL2011_12, Date > "13-01-12")
but get this error message each time.
Warning message:
In Ops.factor(Date, 13- 1 - 12) : > not meaningful for factors
I guess that means R is treating like text instead of a number and that why it won't work?
Well, it's clearly not a number since it has dashes in it. The error message and the two comments tell you that it is a factor but the commentators are apparently waiting and letting the message sink in. Dirk is suggesting that you do this:
EPL2011_12$Date2 <- as.Date( as.character(EPL2011_12$Date), "%d-%m-%y")
After that you can do this:
EPL2011_12FirstHalf <- subset(EPL2011_12, Date2 > as.Date("2012-01-13") )
R date functions assume the format is either "YYYY-MM-DD" or "YYYY/MM/DD". You do need to compare like classes: date to date, or character to character. And if you were comparing character-to-character, then it's only going to be successful if the dates are in the YYYYMMDD format (with identical delimiters if any delimiters are used).
The first thing you should do with date variables is confirm that R reads it as a Date. To do this, for the variable (i.e. vector/column) called Date, in the data frame called EPL2011_12, input
class(EPL2011_12$Date)
The output should read [1] "Date". If it doesn't, you should format it as a date by inputting
EPL2011_12$Date <- as.Date(EPL2011_12$Date, "%d-%m-%y")
Note that the hyphens in the date format ("%d-%m-%y") above can also be slashes ("%d/%m/%y"). Confirm that R sees it as a Date. If it doesn't, try a different formatting command
EPL2011_12$Date <- format(EPL2011_12$Date, format="%d/%m/%y")
Once you have it in Date format, you can use the subset command, or you can use brackets
WhateverYouWant <- EPL2011_12[EPL2011_12$Date > as.Date("2014-12-15"),]

R datetime format issues

I am currently trying to determine the time and date on the observations in my dataset.
The date/timestamp is as follows:
1458024601.18659
1458024660.818
The observation are recorded ever minute.
I am trying to convert the above date/time stamp into something for understandable/ interpretable.
Could you please help me with this issue.
Many thanks.
Looks like seconds, but seconds starting from when? Typically, 1970-01-01:
> x = 1458024601.18659
> as.POSIXct(x, origin="1970-01-01")
[1] "2016-03-15 06:50:01 GMT"
So if you are expecting that timestamp to be that time, we've got the origin right.
If you are expecting a date in 1946, then origin="1900-01-01" is probably what you want.
Since, according to your most recent post, the data is stored as a factor class, some further manipulations are required.
To convert the factor column into the required numeric class, this modification of #Spacedman's answer should work:
as.POSIXct(as.numeric(as.character(all_prices$timestamp)), origin="1970-01-01")
Your solution is perfect, except that i have another issue :(
I tried to run this code on the data.frame that i have. Unfortunately, i keep getting this following error after running the code.
dates <- as.POSIXct(all_prices$timestamp, origin="2016-03-15")
Error in as.POSIXlt.character(as.character(x), ...) :
character string is not in a standard unambiguous format
Data "all_prices" is a data.frame.
class(all_prices)
[1] "data.frame"
data "all_prices$timestamp" is a factor.
class(all_prices$timestamp)
[1] "factor"

Find dates that fail to parse in R Lubridate

As a R novice I'm pulling my hair out trying to debug cryptic R errors. I have csv that containing 150k lines that I load into a data frame named 'date'. I then use lubridate to convert this character column to datetimes in hopes of finding min/max date.
dates <- csv[c('datetime')]
dates$datetime <- ymd_hms(dates$datetime)
Running this code I receive the following error message:
Warning message:
3 failed to parse.
I accept this as the CSV could have some janky dates in there and next run:
min(dates$datetime)
max(dates$datetime)
Both of these return NA, which I assume is from the few broken dates still stored in the data frame. I've searched around for a quick fix, and have even tried to build a foreach loop to identify the problem dates, but no luck. What would be a simple way to identify the 3 broken dates?
example date format: 2015-06-17 17:10:16 +0000
Credit to LawyeR and Stibu from above comments:
I first sorted the raw csv column and did a head() & tail() to find
which 3 dates were causing trouble
Alternatively which(is.na(dates$datetime)) was a simple one liner to also find the answer.
Lubridate will throw that error when attempting to parse dates that do not exist because of daylight savings time.
For example:
library(lubridate)
mydate <- strptime('2020-03-08 02:30:00', format = "%Y-%m-%d %H:%M:%S")
ymd_hms(mydate, tz = "America/Denver")
[1] NA
Warning message:
1 failed to parse.
My data comes from an unintelligent sensor which does not know about DST, so impossible (but correctly formatted) dates appear in my timeseries.
If the indices of where lubridate fails are useful to know, you can use a for loop with stopifnot() and print each successful parse.
Make some dates, throw an error in there at a random location.
library(lubridate)
set.seed(1)
my_dates<-as.character(sample(seq(as.Date('1900/01/01'),
as.Date('2000/01/01'), by="day"), 1000))
my_dates[sample(1:length(my_dates), 1)]<-"purpleElephant"
Now use a for loop and print each successful parse with stopifnot().
for(i in 1:length(my_dates)){
print(i)
stopifnot(!is.na(ymd(my_dates[i])))
}
To provide a more generic answer, first filter out the NAs, then try and parse, then filter only the NAs. This will show you the failures. Something like:
dates2 <- dates[!is.na(dates2$datetime)]
dates2$datetime <- ymd_hms(dates2$datetime)
Warning message:
3 failed to parse.
dates2[is.na(dates2$datetime)]
Here is a simple function that solves the generic problem:
parse_ymd = function(x){
d=lubridate::ymd(x, quiet=TRUE)
errors = x[!is.na(x) & is.na(d)]
if(length(errors)>0){
cli::cli_warn("Failed to parse some dates: {.val {errors}}")
}
d
}
x = c("2014/20/21", "2014/01/01", NA, "2014/01/02", "foobar")
my_date = lubridate::ymd(x)
#> Warning: 2 failed to parse.
my_date = parse_ymd(x)
#> Warning: Failed to parse some dates: "2014/20/21" and "foobar"
Created on 2022-09-29 with reprex v2.0.2
Of course, replace ymd() with whatever you want.
Use the truncate argument. The most common type of irregularity in date-time data is the truncation due to rounding or unavailability of the time stamp.
Therefore, try truncated = 1, then potentially go up to truncated = 3:
dates <- csv[c('datetime')]
dates$datetime <- ymd_hms(dates$datetime, truncated = 1)

Importing date vector from Excel into R

I'm reading an Excel spreadsheet into R, one of the columns contains dates in format
03-Jan-11
07-Feb-11
07-Mar-11
07-Mar-11
04-Apr-11
e.t.c
However when imported into R using read.xlsx2 they become numeric:
40546
40581
40609
40609
e.t.c.
How can I convert the numeric dates into their old format?
I have tried:
as.Date( dates, origin= "2011-01-03")
However this gave error:
Error in charToDate(x) :
character string is not in a standard unambiguous format
and
as.Date(as.numeric(Directories$Insertion.Date),origin="2011/01/03")
Did not preserve the dates
Does anyone know how I can do this in R
Many thanks.
In the end solved this with
as.Date( as.numeric (as.character(Directories$Insertion.Date) ),origin="1899-12-30")
Thanks very much to the commenters for clearing up my confusion!

Resources