Failed to convert 'x$date' to class “Date” in R - r

Okay so I have been trying to use this package from Facebook, but for some reason I keep seeing this error.
library(tidyquant)
library(quantmod)
library(prophet)
library(dplyr)
SPY <-tq_get(get = "stock.prices", "SPY", from = "2016-01-01")
df<-select(SPY,c(date,close))
df$date <- as.Date(as.character(df$date),format="%Y-%m-%d")
colnames(df)<-c("ds","y")
m<-prophet(df)
future<-make_future_dataframe(m,periods=52, freq = "d")
forecast <- predict(m,future)
plot(m,forecast)
When I run the plot function, I see this error message:
Error in as.Date.default(x$date, format = "%d/%m/%Y") : do not know how to convert 'x$date' to class “Date”
I tried the as.Date, strptime, and format functions, but to no avail.
forecast$ds<-as.Date(paste(forecast$ds),"%Y-%m-%d")
forecast$ds<- format(forecast$ds, "%d/%m/%Y")
forecast$date<-forecast$ds
m$date<-forecast$ds
This didn't work
df$newdate<- strptime(as.character(df$ds),"%Y-%m-%d")
df$newdate<- format(df$newdate, "%d/%m/%Y")
df$newdate<-as.Date(df$newdate)
dp<-data.frame(df$newdate,y)
and this didn't work either. These were answers provided in other similar postings, but I do not really see what is causing the issue. Any help would be appreciated.

The error message is caused by some quirks of as.Date(). One workaround is to save the dataset as a CSV file with write.csv(), read it back in with read.csv(), and then call as.Date() on the date column. This eliminates the error message.
Another workaround is to call as.data.frame() on the entire dataset before using as.Date().
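Both workarounds can be sketched in base R as follows. This is only an illustration: the small df below is a hypothetical stand-in for the table returned by tq_get(), with toy values, so that the snippet runs without a network connection.

```r
# Hypothetical stand-in for the table returned by tq_get(); toy values only.
df <- data.frame(
  date  = c("2016-01-04", "2016-01-05", "2016-01-06"),
  close = c(100, 101, 102),
  stringsAsFactors = FALSE
)

# Workaround 1: force a plain data.frame, then convert the column.
df <- as.data.frame(df)
df$date <- as.Date(df$date, format = "%Y-%m-%d")

# Workaround 2: round-trip through a CSV file, then convert the column.
tmp <- tempfile(fileext = ".csv")
write.csv(df, tmp, row.names = FALSE)
df2 <- read.csv(tmp, stringsAsFactors = FALSE)
df2$date <- as.Date(df2$date, format = "%Y-%m-%d")
unlink(tmp)

# prophet expects the columns to be named ds and y.
colnames(df2) <- c("ds", "y")
str(df2)
```

Either way, the object handed to prophet() ends up as a plain data.frame with a Date column named ds.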

library(lubridate)
df$date <- ymd(df$date) # ymd stands for year, month, day
or
library(anytime) # anydate() is provided by the anytime package
df$date <- anydate(df$date)
Plotting works afterwards for me.

Related

how can I convert number to date?

I have a problem with the as.Date function.
I have a list of dates that display normally in Excel, but when I import them into R they become numbers, like 33584. I understand that the number counts days since a specific day. I want to display my dates in the form "dd-mm-yy".
The original data is:
how the "date" variable looks like in r
I've tried:
as.date <- function(x, origin = getOption(date.origin)){
origin <- ifelse(is.null(origin), "1900-01-01", origin)
as.Date(date, origin)
}
and also simply
as.Date(43324, origin = "1900-01-01")
but neither of them works. It shows the error: do not know how to convert '.' to class “Date”
Thank you guys!
The janitor package has a pair of functions designed to deal with reading Excel dates in R. See the following links for usage examples:
https://www.rdocumentation.org/packages/janitor/versions/2.0.1/topics/excel_numeric_to_date
https://www.rdocumentation.org/packages/janitor/versions/2.0.1/topics/convert_to_date
janitor::excel_numeric_to_date(43324)
[1] "2018-08-12"
I've come across excel sheets read in with readxl::read_xls() that read date columns in as strings like "43488" (especially when there is a cell somewhere else that has a non-date value). I use
xldate <- function(x) {
  xn <- as.numeric(x)
  as.Date(xn, origin = "1899-12-30")
}
d <- data.frame(date=c("43488"))
d$actual_date <- xldate(d$date)
print(d$actual_date)
# [1] "2019-01-23"
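The choice of origin matters here. Excel's default 1900 date system numbers 1900-01-01 as day 1 and also counts a nonexistent 1900-02-29, so for any date after February 1900 the matching R origin is "1899-12-30" rather than "1900-01-01". A minimal base R sketch of the difference, where excel_to_date is a hypothetical helper name and not part of any package:

```r
# Hypothetical helper: convert Excel serial numbers (numeric or character)
# to Date, assuming the workbook uses Excel's default 1900 date system.
excel_to_date <- function(x) {
  as.Date(as.numeric(x), origin = "1899-12-30")
}

serial <- 43324

excel_to_date(serial)                    # "2018-08-12", matches janitor's result
as.Date(serial, origin = "1900-01-01")   # "2018-08-14", two days too late

# Display in dd-mm-yy form. Note that format() returns character, not Date.
format(excel_to_date(c("33584", "43324")), "%d-%m-%y")
# "12-12-91" "12-08-18"
```

Workbooks saved with the 1904 date system use a different origin, so the helper above would not apply to them as written.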
Dates are notoriously annoying. I would highly recommend the lubridate package for dealing with them. https://lubridate.tidyverse.org/
Use as_date() from lubridate to read numeric dates if you need to.
You can use format() to put it in dd-mm-yy.
library(lubridate)
date_vector <- as_date(c(33584, 33585), origin = "1899-12-30") # Excel serials count from 1899-12-30; lubridate::origin is 1970-01-01
formatted_date_vector <- format(date_vector, "%d-%m-%y")

How to Convert Date in "01MAR1978:00:00:00" string format to Date Format in SparkR?

I have dates in the following formats:
08MAR1978:00:00:00
10FEB1973:00:00:00
15AUG1982:00:00:00
I would like to convert them to:
1978-03-08
1973-02-10
1982-08-15
I have tried the following in SparkR:
period_uts <- unix_timestamp(all.new$DATE_OF_BIRTH, '%d%b%Y:%H:%M:%S')
period_ts <- cast(period_uts, 'timestamp')
period_dt <- cast(period_ts, 'date')
df <- withColumn(all.new, 'p_dt', period_dt)
But when I do this, all the dates get changed into "NA".
Can anyone please provide some insights on how I can convert dates in %d%B%Y:%H:%M:%S format to dates in SparkR?
Thanks!
I don't think you need SparkR to solve this question.
What you have:
DoB <- c("08MAR1978:00:00:00", "10FEB1973:00:00:00", "15AUG1982:00:00:00")
If you want to get 1978-03-08 etc. you could just use as.Date in combination with the date format you already found yourself:
as.Date(DoB, format="%d%B%Y:%H:%M:%S")
# [1] "1978-03-08" "1973-02-10" "1982-08-15"
as.Date will ensure that R knows how to interpret your string as a date.
Note, however, that in general the way dates are displayed to you (i.e. 1978-03-08) doesn't really matter. The reason is that 'under the hood', R now understands your value as a date, so all date-related operations will be performed appropriately.
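One caveat with the as.Date approach is that month names are matched according to the current locale, so "MAR" or "AUG" may fail to parse on a non-English system. A locale-independent sketch in base R, assuming every string has the fixed layout ddMONyyyy:HH:MM:SS with English three-letter month abbreviations (parse_sas_date is a hypothetical helper name):

```r
# Hypothetical helper: parse strings such as "08MAR1978:00:00:00" by position,
# so the result does not depend on the locale's month names.
parse_sas_date <- function(x) {
  day   <- as.integer(substr(x, 1, 2))
  month <- match(toupper(substr(x, 3, 5)), toupper(month.abb))  # month.abb is always English
  year  <- as.integer(substr(x, 6, 9))
  as.Date(ISOdate(year, month, day))  # ISOdate() yields NA for impossible dates
}

DoB <- c("08MAR1978:00:00:00", "10FEB1973:00:00:00", "15AUG1982:00:00:00")
dob_dates <- parse_sas_date(DoB)
dob_dates
# "1978-03-08" "1973-02-10" "1982-08-15"

# Because these are real Date values, date arithmetic works regardless of display format.
diff(range(dob_dates))          # time difference in days
format(dob_dates, "%d/%m/%Y")   # only the display changes
```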
I figured out how to do it:
all.new = all.new %>% withColumn("Date_of_Birth_Fixed", to_date(.$DATE_OF_BIRTH, "ddMMMyyyy"))
This works in Spark 2.2.x

R date conversion issue

I'm trying to convert my date column from character to date format, which I thought should be dead easy using:
datetest <- as.Date(CAT$Date, "%Y-%m-%d")
but it returns:
Error in as.Date.default(CAT$Date, "%Y-%m-%d") : do not know how to convert 'CAT$Date' to class “Date”
I have also tried: datetest <- as.Date(CAT[["Date"]], "%Y-%m-%d")
but get the same error message.
Really not sure why it doesn't like it, any help to a complete newbie would be appreciated! Thanks.
First check the class of CAT$Date, then do the following:
datetest <- as.Date(as.character(unlist(CAT$Date)), "%Y-%m-%d")
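To illustrate why checking the class helps: as.Date() has methods for character, numeric, and factor input, but a column of some other type, such as a list, falls through to the default method and raises this error. The sketch below assumes the column is a list column, which is only one possible cause; the CAT data frame here is a toy stand-in.

```r
# Toy stand-in for CAT, assuming Date ended up as a list column.
CAT <- data.frame(id = 1:2)
CAT$Date <- list("2020-01-15", "2020-02-20")

class(CAT$Date)   # "list", not "character"

# Capture the failure instead of stopping the script.
attempt <- tryCatch(
  as.Date(CAT$Date, "%Y-%m-%d"),
  error = function(e) conditionMessage(e)
)
print(attempt)

# Flatten to a character vector first, then convert.
datetest <- as.Date(as.character(unlist(CAT$Date)), "%Y-%m-%d")
datetest
```

If class() reports something else, for example a data frame nested inside a column, the same idea applies: reduce it to a plain character vector before calling as.Date().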

mdy {lubridate} unable to identify "January"

I am working with a list of birth dates in the format "January131973". To get the dates from these strings, I am using the mdy function from the lubridate package. Strangely, the code returns NA only for dates in January but works fine for other months, as shown below:
> mdy("January131973")
[1] NA
Warning message:
All formats failed to parse. No formats found.
> mdy("April241973")
[1] "1973-04-24"
The data spans across all months and dates, and years ranging from 1971 to 1990. But the error occurs only for dates in January. I have worked around the input string to get "13January1973" and proceeded with dmy function instead, which has resolved the issue at hand. (ymd also works perfectly fine.)
However, it would be helpful to confirm that I am not overlooking an underlying conflict, both for future reference and to help identify unseen issues elsewhere.
Here is a test code I have tried out to check different combinations
library(tidyr)
library(lubridate)
x <- data.frame(mmm=month.name, dd=c(15:26), yyyy=c(1973:1984))
x_mdy <- unite(x, test, mmm,dd,yyyy, sep = "",remove = FALSE)
lapply(x_mdy$test, mdy)
x_dmy <- unite(x, test, dd,mmm,yyyy, sep = "", remove = FALSE)
lapply(x_dmy$test, dmy)
x_ymd <- unite(x, test, yyyy,mmm,dd, sep = "", remove = FALSE)
lapply(x_ymd$test, ymd)
After running the above code, I have faced the issue only while using mdy with "January". Also note that abbreviated form of the month name also gives the same error (mmm=month.abb in the above df creation.)
Any clarification of this behavior appreciated.

Find dates that fail to parse in R Lubridate

As an R novice I'm pulling my hair out trying to debug cryptic R errors. I have a CSV containing 150k lines that I load into a data frame named 'date'. I then use lubridate to convert this character column to datetimes in hopes of finding the min/max date.
dates <- csv[c('datetime')]
dates$datetime <- ymd_hms(dates$datetime)
Running this code I receive the following error message:
Warning message:
3 failed to parse.
I accept this as the CSV could have some janky dates in there and next run:
min(dates$datetime)
max(dates$datetime)
Both of these return NA, which I assume is from the few broken dates still stored in the data frame. I've searched around for a quick fix, and have even tried to build a foreach loop to identify the problem dates, but no luck. What would be a simple way to identify the 3 broken dates?
example date format: 2015-06-17 17:10:16 +0000
Credit to LawyeR and Stibu from above comments:
I first sorted the raw csv column and did a head() & tail() to find which 3 dates were causing trouble.
Alternatively which(is.na(dates$datetime)) was a simple one liner to also find the answer.
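A minimal sketch of that one-liner follows. It uses base R's as.POSIXct() as a stand-in for ymd_hms() so that it runs without extra packages, and the raw vector is made-up example data. The key point is to keep the raw strings around so the failing values can be shown, not just their positions.

```r
# Made-up example data in the format from the question, with two bad entries.
raw <- c(
  "2015-06-17 17:10:16 +0000",
  "not a date",
  "2015-06-18 09:00:00 +0000",
  "2015-13-40 00:00:00 +0000"
)

# Stand-in for ymd_hms(): unparseable strings become NA.
parsed <- as.POSIXct(raw, format = "%Y-%m-%d %H:%M:%S %z", tz = "UTC")

bad <- which(is.na(parsed) & !is.na(raw))
bad        # positions of the failures
raw[bad]   # the offending strings themselves

# min() and max() return NA if any element is NA, unless na.rm = TRUE is set.
min(parsed, na.rm = TRUE)
max(parsed, na.rm = TRUE)
```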
Lubridate will also raise that warning when attempting to parse times that do not exist because of daylight saving time.
For example:
library(lubridate)
mydate <- strptime('2020-03-08 02:30:00', format = "%Y-%m-%d %H:%M:%S")
ymd_hms(mydate, tz = "America/Denver")
[1] NA
Warning message:
1 failed to parse.
My data comes from an unintelligent sensor which does not know about DST, so impossible (but correctly formatted) dates appear in my timeseries.
If the indices of where lubridate fails are useful to know, you can use a for loop with stopifnot() and print each successful parse.
Make some dates, throw an error in there at a random location.
library(lubridate)
set.seed(1)
my_dates<-as.character(sample(seq(as.Date('1900/01/01'),
as.Date('2000/01/01'), by="day"), 1000))
my_dates[sample(1:length(my_dates), 1)]<-"purpleElephant"
Now use a for loop and print each successful parse with stopifnot().
for(i in 1:length(my_dates)){
print(i)
stopifnot(!is.na(ymd(my_dates[i])))
}
To provide a more generic answer, first filter out the NAs, then try and parse, then filter only the NAs. This will show you the failures. Something like:
dates2 <- dates[!is.na(dates$datetime), , drop = FALSE]
dates2$parsed <- ymd_hms(dates2$datetime)
Warning message:
3 failed to parse.
dates2[is.na(dates2$parsed), , drop = FALSE]
Here is a simple function that solves the generic problem:
parse_ymd = function(x){
d=lubridate::ymd(x, quiet=TRUE)
errors = x[!is.na(x) & is.na(d)]
if(length(errors)>0){
cli::cli_warn("Failed to parse some dates: {.val {errors}}")
}
d
}
x = c("2014/20/21", "2014/01/01", NA, "2014/01/02", "foobar")
my_date = lubridate::ymd(x)
#> Warning: 2 failed to parse.
my_date = parse_ymd(x)
#> Warning: Failed to parse some dates: "2014/20/21" and "foobar"
Created on 2022-09-29 with reprex v2.0.2
Of course, replace ymd() with whatever you want.
Use the truncated argument. The most common type of irregularity in date-time data is truncation due to rounding or unavailability of the time stamp.
Therefore, try truncated = 1, then potentially go up to truncated = 3:
dates <- csv[c('datetime')]
dates$datetime <- ymd_hms(dates$datetime, truncated = 1)
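To make the idea behind truncated concrete, here is a rough base R sketch that retries progressively shorter formats for whatever has not parsed yet. It is not how lubridate implements the argument; parse_with_fallback is a hypothetical helper, and the list of formats is an assumption about what the truncated timestamps look like.

```r
# Hypothetical helper: try the full format first, then shorter ones,
# filling in only the elements that are still unparsed.
parse_with_fallback <- function(x,
                                formats = c("%Y-%m-%d %H:%M:%S",
                                            "%Y-%m-%d %H:%M",
                                            "%Y-%m-%d %H",
                                            "%Y-%m-%d"),
                                tz = "UTC") {
  out <- .POSIXct(rep(NA_real_, length(x)), tz = tz)
  for (fmt in formats) {
    todo <- is.na(out) & !is.na(x)
    if (!any(todo)) break
    out[todo] <- as.POSIXct(x[todo], format = fmt, tz = tz)
  }
  out
}

x <- c("2015-06-17 17:10:16", "2015-06-17 17:10", "2015-06-17", "foobar", NA)
res <- parse_with_fallback(x)
format(res, "%Y-%m-%d %H:%M:%S")

# Anything still NA (and not NA in the input) is a genuinely bad value.
x[is.na(res) & !is.na(x)]
```

Missing components default to zero, so a date without a time comes back as midnight.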

Resources