mdy {lubridate} unable to identify "January"

I am working with a list of birth dates in the format "January131973". To get the dates from these strings, I am using the mdy() function from the lubridate package. Strangely, the code returns NA only for dates in January, but works fine for other months, as below:
> mdy("January131973")
[1] NA
Warning message:
All formats failed to parse. No formats found.
> mdy("April241973")
[1] "1973-04-24"
The data spans all months and days, with years ranging from 1971 to 1990, but the error occurs only for dates in January. As a workaround, I rearranged the input string to "13January1973" and used dmy() instead, which resolved the immediate issue (ymd() also works fine).
However, I would like to verify that I am not overlooking some underlying conflict; that would help next time and could also reveal unseen issues elsewhere.
Here is the test code I used to check different combinations:
library(tidyr)
library(lubridate)
x <- data.frame(mmm=month.name, dd=c(15:26), yyyy=c(1973:1984))
x_mdy <- unite(x, test, mmm,dd,yyyy, sep = "",remove = FALSE)
lapply(x_mdy$test, mdy)
x_dmy <- unite(x, test, dd,mmm,yyyy, sep = "", remove = FALSE)
lapply(x_dmy$test, dmy)
x_ymd <- unite(x, test, yyyy,mmm,dd, sep = "", remove = FALSE)
lapply(x_ymd$test, ymd)
After running the above code, I hit the issue only when using mdy() with "January". The abbreviated month name gives the same error (mmm = month.abb when creating the data frame above).
Any clarification of this behavior would be appreciated.
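One way to sidestep the problem, sketched in base R rather than lubridate, is to insert separators before parsing. It assumes every string is a month name followed by a one- or two-digit day and a four-digit year, and that month names are English in the current LC_TIME locale. Anchoring the year as the last four digits also keeps single-digit days such as "Jan11980" unambiguous. The helper name parse_compact_mdy is hypothetical.

```r
# Hypothetical helper: parse strings like "January131973" (month name, day, year).
# Assumes English month names in the current LC_TIME locale and a 4-digit year at the end.
parse_compact_mdy <- function(x) {
  # Insert spaces: "January131973" -> "January 13 1973", "Jan11980" -> "Jan 1 1980"
  spaced <- sub("^([A-Za-z]+)([0-9]{1,2})([0-9]{4})$", "\\1 \\2 \\3", x)
  # On input, %B accepts both full and abbreviated month names
  as.Date(spaced, format = "%B %d %Y")
}

parse_compact_mdy(c("January131973", "April241973", "Jan11980"))
# [1] "1973-01-13" "1973-04-24" "1980-01-01"
```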

Related

Lubridate or ANYTIME to convert from 24hr to 12hr time

As the title suggests, I am trying to use either lubridate or anytime (or similar) to convert a time from 24-hour to 12-hour format. To make life easier, I don't need the whole time converted.
What I mean is I have a column of dates in this format:
2021-02-15 16:30:33
I can use inbound$Hour <- hour(inbound$Timestamp) to grab just the hour from the Timestamp, which is great, except that it is still in 24-hour time (this creates an integer column for the hour number).
I have tried several mutates such as inbound <- inbound %>% mutate(Hour = ifelse(Hour > 12, sum(Hour - 12), Hour))
This technically works, but I get some really wonky values (for example, -294 in several rows).
Is there an easier way to get the 12-hour time?
Per the recommendation below, I tried base R's format() as follows:
inbound$Time <- format(inbound$Timestamp, "%H:%M:%S")
inbound$Time <- format(inbound$Time, "%I:%M:%S")
and on the second format() call I get an error:
Error in format.default(inbound$Time, "%I:%M:%S") :
invalid 'trim' argument
I did notice that the first format() call produces a character column; I am not sure whether that is what breaks the second format() call.
I then also tried:
`inbound$time <- format(strptime(inbound$Timestamp, "%H:%M:%S"), "%I:%M %p")`
It runs without error, but it creates a column full of NAs.
Final edit: I had misread and misapplied the solution, which caused the errors. Both inbound$Time <- format(inbound$Timestamp, "%I:%M:%S") and as.numeric(format(inbound$Timestamp, "%I")) from the comments worked and solved the issue I was having.
To be clear... From 2021-02-15 16:30:33 you want just 04:30:33 as a result?
There is no need for lubridate or anytime. Assuming the column is a POSIXct:
a <- as.POSIXct("2021-02-15 16:30:33", tz = "UTC")
a
# [1] "2021-02-15 16:30:33 UTC"
b <- format(a, "%H:%M:%S")
b
#[1] "16:30:33"
c <- format(a, "%I:%M:%S")
c
#[1] "04:30:33"
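For an integer hour column like the one in the question, here is a vectorized sketch: either take %I from format() directly, or reduce the 24-hour value modulo 12 and map 0 to 12. The wonky values such as -294 in the question are consistent with sum(Hour - 12) collapsing the whole column into a single number inside ifelse(). The timestamps below are made-up examples.

```r
# Illustrative timestamps (not from the question's data)
timestamps <- as.POSIXct(c("2021-02-15 16:30:33", "2021-02-15 00:05:00",
                           "2021-02-15 12:45:10"), tz = "UTC")

# Option 1: let format() produce the 12-hour clock hour (01-12)
hour12 <- as.integer(format(timestamps, "%I"))

# Option 2: start from the 24-hour hour and convert element-wise (no sum())
hour24 <- as.integer(format(timestamps, "%H"))
hour12_alt <- ifelse(hour24 %% 12L == 0L, 12L, hour24 %% 12L)

hour12       # 4 12 12
hour12_alt   # 4 12 12
format(timestamps, "%I:%M:%S %p")  # AM/PM markers follow the current locale
```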

How do you change numerical values in R into dates?

Hi, this question has been bugging me for some time.
I am trying to convert the so-called dates in my R project into actual dates. Right now the dates are sorted as text rather than chronologically, i.e. after 2/28/2020 comes 2/3/2020 instead of 3/1/2020.
I've tried the
as.Date(3/14/2020, origin = "14-03-2020")
and also
df <- data.frame(Date = c("10/9/2009 0:00:00", "10/15/2009 0:00:00"))
as.Date(df$Date, "%m/%d/%Y %H:%M:%S")
and
strDates <- c("01/28/2020", "05/03/2020")%>%
dates <- as.Date(strDates, "%m/%d/%Y")
I plugged in just two dates to test whether it works, since there are around 40 dates in total. However, I get the following errors:
Error in as.Date.default(., 3/14/2020, origin = "14-03-2020") : do not know how to convert '.' to class “Date”
for the first one, then for the second one:
data frame not found
and for the third one:
Error in as.Date(strDates, "%m/%d/%Y") : object 'strDates' not found
Issues with your code:
as.Date(3/14/2020, origin = "14-03-2020")
First, R evaluates 3/14/2020 as arithmetic and replaces it with 0.000106082 (3 divided by 14 divided by 2020). You need to mark it as a string using single or double quotes, as in: as.Date("3/14/2020", origin = "14-03-2020").
But that is still broken. When converting a character (string) input to Date, you usually need to provide format=, so that R knows which numbers in the string are the year, month and day; without it, as.Date() only tries "%Y-%m-%d" and "%Y/%m/%d", and origin= is not used for strings. If you provide a numeric (or integer) input, you need origin= so that it knows what "day 0" is (recent R versions default it to "1970-01-01"). For days counted from the Unix epoch, use origin="1970-01-01". For dates from Excel, use origin="1899-12-30" (see https://stackoverflow.com/a/43230524).
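A short illustration of the cases described above. The two numbers are simply the day counts for 14 March 2020 from each origin (18335 days after 1970-01-01, and 25569 more than that when counting from 1899-12-30).

```r
# Character input: tell as.Date() where month, day and year are
as.Date("3/14/2020", format = "%m/%d/%Y")   # "2020-03-14"

# Numeric input: a day count needs an origin
as.Date(18335, origin = "1970-01-01")      # Unix epoch days -> "2020-03-14"
as.Date(43904, origin = "1899-12-30")      # Excel serial    -> "2020-03-14"

# Without quotes, 3/14/2020 is arithmetic, not a date
3/14/2020                                   # about 0.000106
```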
Your next error comes from mixing a magrittr pipe with base R assignment.
strDates <- c("01/28/2020", "05/03/2020")%>%
dates <- as.Date(strDates, "%m/%d/%Y")
The issue here has nothing to do with dates. The %>% at the end of line 1 takes the output of line 1 (in R, assignment invisibly returns the assigned value, which is why chained assignment such as a <- b <- 2 works) and injects it as the first argument of the next function call. As a result, your code was effectively interpreted as
strDates <- c("01/28/2020", "05/03/2020")%>%
{ dates <- as.Date(., strDates, "%m/%d/%Y") }
which is not what you intended or need. I suspect this is an artifact of being midway through converting from a %>% pipe to something else and forgetting to remove the trailing %>%. With the pipe, this could be written as
dates <- c("01/28/2020", "05/03/2020") %>%
as.Date("%m/%d/%Y")
dates
# [1] "2020-01-28" "2020-05-03"
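The same conversion without magrittr, either as two plain assignments (which seems to be what the original code was aiming for) or with the native |> pipe. The pipe version assumes R 4.1 or later.

```r
# Plain base R: assign the strings, then convert them
strDates <- c("01/28/2020", "05/03/2020")
dates <- as.Date(strDates, format = "%m/%d/%Y")

# Native pipe (R >= 4.1): the left-hand side becomes as.Date()'s first argument
dates_piped <- c("01/28/2020", "05/03/2020") |> as.Date(format = "%m/%d/%Y")

dates   # "2020-01-28" "2020-05-03"
```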
Your data.frame code seems to work fine, though you do not assign the converted Date values back to the frame. Try this slight adaptation:
df <- data.frame(Date = c("10/9/2009 0:00:00", "10/15/2009 0:00:00"))
df$Date <- as.Date(df$Date, "%m/%d/%Y %H:%M:%S")
df
# Date
# 1 2009-10-09
# 2 2009-10-15
str(df)
# 'data.frame': 2 obs. of 1 variable:
# $ Date: Date, format: "2009-10-09" "2009-10-15"
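Since the underlying complaint was the ordering (2/3/2020 appearing after 2/28/2020, as character strings do), here is a sketch showing that once the column has class Date, order() sorts chronologically. The three dates are illustrative.

```r
# Illustrative values in the question's m/d/Y style
df <- data.frame(Date = c("3/1/2020", "2/3/2020", "2/28/2020"))

# Convert first; the Date class sorts by calendar order, not character by character
df$Date <- as.Date(df$Date, format = "%m/%d/%Y")
df_sorted <- df[order(df$Date), , drop = FALSE]
df_sorted$Date   # "2020-02-03" "2020-02-28" "2020-03-01"
```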

Error with the “standard unambiguous date” for string-to-date conversion in R

So I am trying this code, which I have used in the past with other data wrangling tasks with no errors:
## Create an age_at_enrollment variable, based on the start_date per individual (i.e. I want to know an individual's age, when they began their healthcare job).
complete_dataset_1 = complete_dataset %>% mutate(age_at_enrollment = (as.Date(start_date)-as.Date(birth_date))/365.25)
However, I keep receiving this error message:
"Error in charToDate(x) : character string is not in a standard unambiguous format"
I believe this error is happening because in the administrative dataset that I am using, the start_date and birth_date variables are formatted in an odd way:
start_date     birth_date
2/5/07 0:00    2/28/1992 0:00
I could not find out why the data is formatted that way. Any thoughts on how to fix this issue without altering the original administrative dataset?
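Without a format, as.Date() only tries "%Y-%m-%d" and "%Y/%m/%d", so values like 2/5/07 cannot be parsed, and it cannot tell whether the day or the month comes first. To resolve this, use the format parameter of as.Date. Note that start_date has a two-digit year (%y) while birth_date has a four-digit year (%Y):
complete_dataset_1 = complete_dataset %>%
  mutate(age_at_enrollment = (
    as.Date(start_date, format="%m/%d/%y") -
    as.Date(birth_date, format="%m/%d/%Y")) / 365.25)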
A more precise way to calculate the difference in years, accounting for leap years, is lubridate's time_length() applied to an interval():
library(lubridate)
complete_dataset_1 = complete_dataset %>%
  mutate(age_at_enrollment = time_length(interval(
    as.Date(birth_date, format="%m/%d/%Y"),
    as.Date(start_date, format="%m/%d/%y")), "years"))
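A base-R sketch using only the sample row from the question, showing why the year width matters: with %Y the two-digit year "07" is read literally as the year 7, while %y maps it to 2007. The one-row data frame is a stand-in for complete_dataset.

```r
# Stand-in for complete_dataset, using the sample row from the question
example <- data.frame(start_date = "2/5/07 0:00", birth_date = "2/28/1992 0:00")

# The trailing " 0:00" is ignored: as.Date() only parses what the format asks for
start <- as.Date(example$start_date, format = "%m/%d/%y")  # %y: 2-digit year -> 2007
birth <- as.Date(example$birth_date, format = "%m/%d/%Y")  # %Y: 4-digit year

# Wrong width: %Y reads "07" as the year 7, far in the past
as.Date(example$start_date, format = "%m/%d/%Y")

age_at_enrollment <- as.numeric(start - birth) / 365.25
age_at_enrollment   # roughly 14.94 (5456 days)
```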

Failed to convert 'x$date' to class “Date” in R

I have been trying to use Facebook's prophet package, but for some reason I keep seeing this error.
library(tidyquant)
library(quantmod)
library(prophet)
library(dplyr)
SPY <-tq_get(get = "stock.prices", "SPY", from = "2016-01-01")
df<-select(SPY,c(date,close))
df$date <- as.Date(as.character(df$date),format="%Y-%m-%d")
colnames(df)<-c("ds","y")
m<-prophet(df)
future<-make_future_dataframe(m,periods=52, freq = "d")
forecast <- predict(m,future)
plot(m,forecast)
When I run the plot function, I would see this error message:
Error in as.Date.default(x$date, format = "%d/%m/%Y") : do not know how to convert 'x$date' to class “Date”
I tried using the as.Date, strptime and format functions, but to no avail:
forecast$ds<-as.Date(paste(forecast$ds),"%Y-%m-%d")
forecast$ds<- format(forecast$ds, "%d/%m/%Y")
forecast$date<-forecast$ds
m$date<-forecast$ds
This didn't work
df$newdate<- strptime(as.character(df$ds),"%Y-%m-%d")
df$newdate<- format(df$newdate, "%d/%m/%Y")
df$newdate<-as.Date(df$newdate)
dp<-data.frame(df$newdate,y)
and this didn't work either. These attempts came from answers to similar posts, but I do not really see what is causing the issue. Any help would be appreciated.
The error message is caused by a quirk of as.Date(). One workaround is to save the dataset as a CSV file with write.csv(), read it back in with read.csv(), and then use as.Date(); this eliminates the error message.
Another workaround is to use as.data.frame() first for your entire dataset before using as.Date().
library(lubridate)
df$date <- ymd(df$date) # ymd stands for year, month, day
or
library(anytime)
df$date <- anydate(df$date)
Plotting works afterwards for me.
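A sketch of the as.data.frame() workaround folded into one hypothetical helper, as_prophet_input(), that returns a plain data.frame with a Date ds column and a numeric y column (the layout the question passes to prophet()). The default column names date and close match the columns selected from tq_get() in the question; the helper itself has not been checked against prophet.

```r
# Hypothetical helper: coerce a data frame-like object (e.g. a tibble) into the
# plain ds / y layout that the question passes to prophet()
as_prophet_input <- function(data, date_col = "date", value_col = "close") {
  data <- as.data.frame(data)              # drop tibble or other extra classes
  data.frame(
    ds = as.Date(data[[date_col]]),        # character or Date -> Date
    y  = as.numeric(data[[value_col]])
  )
}

# In the question's script this would replace the select()/colnames() steps:
# df <- as_prophet_input(SPY)
```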

Find dates that fail to parse in R Lubridate

As an R novice, I'm pulling my hair out trying to debug cryptic R errors. I have a CSV containing 150k lines that I load into a data frame (csv); I then take its datetime column into dates and use lubridate to convert this character column to date-times, in hopes of finding the min/max date.
dates <- csv[c('datetime')]
dates$datetime <- ymd_hms(dates$datetime)
Running this code, I receive the following warning:
Warning message:
3 failed to parse.
I accept this, as the CSV could contain some malformed dates, and next run:
min(dates$datetime)
max(dates$datetime)
Both of these return NA, which I assume is from the few broken dates still stored in the data frame. I've searched around for a quick fix, and have even tried to build a foreach loop to identify the problem dates, but no luck. What would be a simple way to identify the 3 broken dates?
example date format: 2015-06-17 17:10:16 +0000
Credit to LawyeR and Stibu from the comments above: I first sorted the raw CSV column and used head() and tail() to find which 3 dates were causing trouble.
Alternatively, which(is.na(dates$datetime)) is a simple one-liner that also finds them.
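A variant of that one-liner which separates parse failures from values that were already missing, and which also answers the min/max part: min() and max() return NA when any element is NA unless na.rm = TRUE. This sketch swaps in base R's as.POSIXct() so it has no package dependency; the same which() check works on ymd_hms() output. With this format string the trailing "+0000" offset is simply ignored, which is only correct if every offset is +0000 (an assumption). The input values are illustrative.

```r
# Illustrative input: good values, one malformed value, one value already missing
raw <- c("2015-06-17 17:10:16 +0000", "not a date", NA, "2015-06-18 08:00:00 +0000")
parsed <- as.POSIXct(raw, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Present before parsing but NA afterwards = a parse failure
failed <- which(!is.na(raw) & is.na(parsed))
raw[failed]                      # "not a date"

# Ignore the NAs when looking for the earliest and latest time
min(parsed, na.rm = TRUE)
max(parsed, na.rm = TRUE)
```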
Lubridate will also give that warning when attempting to parse date-times that do not exist because of daylight saving time.
For example:
library(lubridate)
mydate <- strptime('2020-03-08 02:30:00', format = "%Y-%m-%d %H:%M:%S")
ymd_hms(mydate, tz = "America/Denver")
[1] NA
Warning message:
1 failed to parse.
My data comes from an unintelligent sensor which does not know about DST, so impossible (but correctly formatted) dates appear in my timeseries.
If you need to know the index where lubridate fails, you can use a for loop that prints each index and calls stopifnot() on the parse result; the loop stops at the first failure.
Make some dates, then put an invalid one at a random location.
library(lubridate)
set.seed(1)
my_dates<-as.character(sample(seq(as.Date('1900/01/01'),
as.Date('2000/01/01'), by="day"), 1000))
my_dates[sample(1:length(my_dates), 1)]<-"purpleElephant"
Now use a for loop and print each successful parse with stopifnot().
for (i in seq_along(my_dates)) {
  print(i)
  stopifnot(!is.na(ymd(my_dates[i])))
}
To provide a more generic answer: first drop the rows that are already NA, then parse (keeping the raw strings in their own column), then keep only the rows whose parse result is NA. This shows you the failures. Something like:
dates2 <- dates[!is.na(dates$datetime), , drop = FALSE]
dates2$parsed <- ymd_hms(dates2$datetime)
Warning message:
3 failed to parse.
dates2[is.na(dates2$parsed), , drop = FALSE]
Here is a simple function that solves the generic problem:
parse_ymd = function(x){
d=lubridate::ymd(x, quiet=TRUE)
errors = x[!is.na(x) & is.na(d)]
if(length(errors)>0){
cli::cli_warn("Failed to parse some dates: {.val {errors}}")
}
d
}
x = c("2014/20/21", "2014/01/01", NA, "2014/01/02", "foobar")
my_date = lubridate::ymd(x)
#> Warning: 2 failed to parse.
my_date = parse_ymd(x)
#> Warning: Failed to parse some dates: "2014/20/21" and "foobar"
Created on 2022-09-29 with reprex v2.0.2
Of course, replace ymd() with whatever you want.
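The same idea without lubridate or cli, taking the parser as an argument so that ymd(), as.Date() with a format, or anything else can be plugged in. The name parse_or_warn is hypothetical, and the example reuses the input vector from the reprex above.

```r
# Hypothetical generic wrapper: run any parser, then warn with the inputs that failed
parse_or_warn <- function(x, parser) {
  parsed <- suppressWarnings(parser(x))        # silence the parser's own summary warning
  failed <- x[!is.na(x) & is.na(parsed)]       # present before, missing after
  if (length(failed) > 0) {
    warning("Failed to parse some dates: ",
            paste0('"', failed, '"', collapse = ", "), call. = FALSE)
  }
  parsed
}

x <- c("2014/20/21", "2014/01/01", NA, "2014/01/02", "foobar")
my_date <- parse_or_warn(x, function(v) as.Date(v, format = "%Y/%m/%d"))
# warns about "2014/20/21" and "foobar"; the original NA is not reported
```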
Use the truncated argument. The most common type of irregularity in date-time data is truncation due to rounding or unavailability of the time stamp.
Therefore, try truncated = 1, then potentially go up to truncated = 3:
dates <- csv[c('datetime')]
dates$datetime <- ymd_hms(dates$datetime, truncated = 1)

Resources