Smartbind date format and error in R - r

I'm getting an error using smartbind to append two datasets. First, I'm pretty sure the error I'm getting:
> Error in as.vector(x, mode) : invalid 'mode' argument
is coming from the date variable in both datasets. The date variable in its raw format is month/day/year. I transformed the variable after importing the data using as.Date and format:
> rs.month$xdeeddt <- as.Date(rs.month$xdeeddt, "%m/%d/%Y")
> rs.month$deed.year <- as.numeric(format(rs.month$xdeeddt, format = "%Y"))
> rs.month$deed.day <- as.numeric(format(rs.month$xdeeddt, format = "%d"))
> rs.month$deed.month <- as.numeric(format(rs.month$xdeeddt, format = "%m"))
The resulting date variable is as such:
> [1] "2014-03-01" "2014-03-13" "2014-01-09" "2013-10-09"
The transformation for the date was applied to both datasets (the format of the raw data was identical for both datasets). When I try to use smartbind, from the gtools package, to append the two datasets it returns with the error above. I removed the date, month, day, and year variables from both datasets and was able to append the datasets successfully with smartbind.
Any suggestions on how I can append the datasets with the date variables?

I came here after googling for the same error message during a smartbind of two data frames. The discussion above, while not so conclusive about a solution, definitely helped me move through this error.
Both my data frames contain POSIXct date objects. Those are just a numeric vector of UNIXy seconds-since-epoch, along with a couple of attributes that provide the structure needed to interpret the vector as a date object. The solution is simply to strip the attributes from that variable, perform the smartbind, and then restore the attributes:
these.atts <- attributes(df1$date)
attributes(df1$date) <- NULL
attributes(df2$date) <- NULL
df1 <- smartbind(df1,df2)
attributes(df1$date) <- these.atts
I hope this helps someone, sometime.
-Andy
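For anyone who wants to see the mechanics without installing anything, here is a minimal sketch of the strip-and-restore idea. It assumes two made-up data frames, and it uses base rbind() as a stand-in for gtools::smartbind() purely so the example has no dependencies; the attribute handling is the part that carries over.

```r
# Two small stand-in data frames (hypothetical data, not from the question)
df1 <- data.frame(id = 1:2, date = as.Date(c("2014-03-01", "2014-03-13")))
df2 <- data.frame(id = 3:4, date = as.Date(c("2014-01-09", "2013-10-09")))

# Save the attributes; for a Date this is just class = "Date"
date_atts <- attributes(df1$date)

# Strip the attributes: the columns are now plain numeric day counts
attributes(df1$date) <- NULL
attributes(df2$date) <- NULL

# rbind() stands in for gtools::smartbind() here (assumption for this sketch)
combined <- rbind(df1, df2)

# Restore the attributes so the column is a Date again
attributes(combined$date) <- date_atts
print(combined$date)
```

The same pattern should apply to POSIXct columns, where the saved attributes also carry the time zone, provided both columns use the same time zone to begin with.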

Related

how can I convert number to date?

I have a problem with the as.date function.
I have a list of ordinary dates in Excel, but when I import it into R, the dates become numbers, like 33584. I understand that these count the days since a specific date. I want my dates in the form "dd-mm-yy".
The original data is:
[image: how the "date" variable looks in R]
I've tried:
as.date <- function(x, origin = getOption(date.origin)){
origin <- ifelse(is.null(origin), "1900-01-01", origin)
as.Date(date, origin)
}
and also simply
as.Date(43324, origin = "1900-01-01")
but neither of them works. It shows the error: do not know how to convert '.' to class “Date”
Thank you guys!
The janitor package has a pair of functions designed to deal with reading Excel dates in R. See the following links for usage examples:
https://www.rdocumentation.org/packages/janitor/versions/2.0.1/topics/excel_numeric_to_date
https://www.rdocumentation.org/packages/janitor/versions/2.0.1/topics/convert_to_date
janitor::excel_numeric_to_date(43324)
[1] "2018-08-12"
I've come across excel sheets read in with readxl::read_xls() that read date columns in as strings like "43488" (especially when there is a cell somewhere else that has a non-date value). I use
xldate <- function(x) {
  xn <- as.numeric(x)                  # strings like "43488" become numbers
  as.Date(xn, origin = "1899-12-30")   # Excel serial numbers count days from 1899-12-30
}
d <- data.frame(date=c("43488"))
d$actual_date <- xldate(d$date)
print(d$actual_date)
# [1] "2019-01-23"
Dates are notoriously annoying. I would highly recommend the lubridate package for dealing with them. https://lubridate.tidyverse.org/
Use as_date() from lubridate to read numeric dates if you need to.
You can use format() to put it in dd-mm-yy.
library(lubridate)
date_vector <- as_date(c(33584, 33585), origin = "1899-12-30") # Excel serial numbers count days from 1899-12-30
formatted_date_vector <- format(date_vector, "%d-%m-%y")
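Putting the answers above together, here is a small base R sketch with no extra packages. It assumes the serial numbers come from Excel on Windows (the 1900 date system), where the origin for as.Date() is 1899-12-30; the sample values are the ones mentioned in this thread.

```r
# Excel serial numbers as they might arrive after import (numeric or character)
serials <- c("33584", "43324", "43488")

# Convert to numeric first, then count days from the Excel origin
dates <- as.Date(as.numeric(serials), origin = "1899-12-30")

# dd-mm-yy strings for display; note the result is character, not Date
formatted <- format(dates, "%d-%m-%y")
print(formatted)
# [1] "12-12-91" "12-08-18" "23-01-19"
```

Keep the Date column around for sorting and arithmetic, and only call format() at the point where you need the dd-mm-yy text.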

Problems with parse_date_time converting a character vector

I have an imported CSV in R which contains a column of dates and times - this is imported into R as character. The format is "30/03/2020 08:59". I want to convert these strings into a format that allows me to work on them. For simplicity I have made a dataframe which has a single column of these dates (854) in this format.
I'm trying to use the parse_date_time function from lubridate.
It works fine when I reference a single value, e.g.
b=parse_date_time(consults_dates[3,1],orders="dmy HM")
gives b=2020-03-30 09:08:00
However, when I try to perform this on the entire(consults_dates), I get an error, e.g.
c= parse_date_time(consults_dates,orders="dmy HM") gives error:
Warning message:
All formats failed to parse. No formats found.
Apologies - if this is blatantly a simple question, day 1 of R after years of Matlab.
You need to pass the column to the parse_date_time function, not the entire data frame.
library(lubridate)
consults_dates$column_name <- parse_date_time(consults_dates$column_name, "dmy HM")
However, if the column has only one format, you can use dmy_hm:
consults_dates$column_name <- dmy_hm(consults_dates$column_name)
In base R, we can use:
consults_dates$column_name <- as.POSIXct(consults_dates$column_name,
                                         format = "%d/%m/%Y %H:%M", tz = "UTC")
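Coming from Matlab, the data frame versus column distinction is easy to trip over, so here is a minimal base R sketch of it. The data frame and its column name (consult_time) are made up for illustration.

```r
# Made-up stand-in for the imported CSV; the column name is hypothetical
consults_dates <- data.frame(
  consult_time = c("30/03/2020 08:59", "30/03/2020 09:08"),
  stringsAsFactors = FALSE
)

class(consults_dates)        # "data.frame": the whole table
class(consults_dates[1])     # "data.frame": single brackets keep the table wrapper
class(consults_dates[[1]])   # "character": the column itself, safe to parse

# Parse the column (a character vector), not the data frame
parsed <- as.POSIXct(consults_dates[[1]], format = "%d/%m/%Y %H:%M", tz = "UTC")
print(parsed)
```

consults_dates[3, 1] worked in the question because row-and-column indexing returns a single character value, whereas the bare data frame is a list of columns.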

quantmod difficulty loading data in correct format

I am very new to R. I watched a YouTube video on various time series analyses, but it downloaded its data from Yahoo, and my data is in Excel. I wanted to follow the same analysis with data from an Excel .csv file. I spent two days finding out that the date must be in USA style. Now I am stuck again on a basic step: loading the data so it can be analysed, which seems to be the biggest hurdle with R. Can someone please explain why the command shown below does not compute the returns for the complete column set? I tried the zoo format, but it didn't work; then I tried xts and it worked partially. I suspect the original import from Excel is the major problem.
> AllPrices <- as.zoo(AllPrices)
> head(AllPrices)
Index1 Index2 Index3 Index4 Index5 Index6 Index7 Index8 Index9 Index10
> AllRets <- dailyReturn(AllPrices)
Error in NextMethod("[<-") : incorrect number of subscripts on matrix
> AllPrices<- as.xts(AllPrices)
> AllRets <- dailyReturn(AllPrices)
> head(AllRets)
daily.returns
2012-11-06 0.000000e+00
2012-11-07 -2.220249e-02
2012-11-08 1.379504e-05
2012-11-09 2.781961e-04
2012-11-12 -2.411128e-03
2012-11-13 7.932869e-03
Try to load your data using the readr package.
library(readr)
Then, look at the documentation by running ?read_csv in the console.
I recommend reading in your data this way. Specify the column types. For instance, if your first column is the date, read it in as a character "c" and if your other columns are numeric use "n".
data <- read_csv('YOUR_DATA.csv', col_types = "cnnnnn") # date in left column, 5 numeric columns
data$Dates <- as.Date(data$Dates, format = "%Y-%m-%d") # convert the date column to Date class; change "Dates" to your date column's name and adjust the format if needed
data <- as.data.frame(data) # turn the result into a data frame
library(xts)
data <- xts(data[,-1], order.by = data[,1]) # then make an xts: the data is everything but the date column, order.by is the date column
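On the question of why only one column of returns comes back: the output in the question shows dailyReturn() producing a single daily.returns series, so one option is to compute the returns column by column. The sketch below does that in base R on made-up prices, assuming simple (arithmetic) returns are what you want; the helper name simple_return is hypothetical.

```r
# Made-up prices for two indices on three consecutive days
prices <- data.frame(Index1 = c(100, 102, 101), Index2 = c(50, 50.5, 51.51))

# Simple return: (today - yesterday) / yesterday; the first day has no prior price
simple_return <- function(p) c(NA, diff(p) / head(p, -1))

# Apply the same calculation to every column
returns <- as.data.frame(lapply(prices, simple_return))
print(returns)
```

The same lapply() pattern should work with dailyReturn() on each column of an xts object, but I have not tested that against your data.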

subset data based on y/m/d/h and get error outputs [duplicate]

I have a dataset called EPL2011_12. I would like to make a new dataset by subsetting the original by date. The dates are in the column named Date, in DD-MM-YY format.
I have tried
EPL2011_12FirstHalf <- subset(EPL2011_12, Date > 13-01-12)
and
EPL2011_12FirstHalf <- subset(EPL2011_12, Date > "13-01-12")
but get this error message each time.
Warning message:
In Ops.factor(Date, 13- 1 - 12) : > not meaningful for factors
I guess that means R is treating it like text instead of a number, and that's why it won't work?
Well, it's clearly not a number since it has dashes in it. The error message and the two comments tell you that it is a factor but the commentators are apparently waiting and letting the message sink in. Dirk is suggesting that you do this:
EPL2011_12$Date2 <- as.Date( as.character(EPL2011_12$Date), "%d-%m-%y")
After that you can do this:
EPL2011_12FirstHalf <- subset(EPL2011_12, Date2 > as.Date("2012-01-13") )
By default, as.Date assumes character input is in either "YYYY-MM-DD" or "YYYY/MM/DD" format. You do need to compare like classes: date to date, or character to character. And if you were comparing character-to-character, then it's only going to be successful if the dates are written in year-month-day order with zero padding (with identical delimiters if any delimiters are used), because strings are compared character by character from the left.
The first thing you should do with date variables is confirm that R reads it as a Date. To do this, for the variable (i.e. vector/column) called Date, in the data frame called EPL2011_12, input
class(EPL2011_12$Date)
The output should read [1] "Date". If it doesn't, you should format it as a date by inputting
EPL2011_12$Date <- as.Date(EPL2011_12$Date, "%d-%m-%y")
Note that the delimiters in the format string ("%d-%m-%y") must match the delimiters in your data; if the dates use slashes, use "%d/%m/%y" instead. Confirm that R sees it as a Date. If it doesn't, try the conversion with the other format string
EPL2011_12$Date <- as.Date(as.character(EPL2011_12$Date), format="%d/%m/%y")
Once you have it in Date format, you can use the subset command, or you can use brackets
WhateverYouWant <- EPL2011_12[EPL2011_12$Date > as.Date("2014-12-15"),]
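Here is a small self-contained sketch of the whole sequence: factor column in, Date column out, then the subset. The data frame EPL and its values are made up, and the Date column is deliberately created as a factor to mimic what the question's import produced.

```r
# Made-up stand-in for EPL2011_12, with Date stored as a factor
EPL <- data.frame(
  Date  = factor(c("10-01-12", "14-01-12", "20-02-12")),
  Goals = c(1, 2, 3)
)

# Convert factor -> character -> Date, telling R the input is day-month-year
EPL$Date2 <- as.Date(as.character(EPL$Date), format = "%d-%m-%y")
class(EPL$Date2)   # "Date"

# Compare Date to Date; both rows after 13 January 2012 are kept
second_half <- EPL[EPL$Date2 > as.Date("2012-01-13"), ]
print(second_half)
```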

Find dates that fail to parse in R Lubridate

As an R novice I'm pulling my hair out trying to debug cryptic R errors. I have a CSV containing 150k lines that I load into a data frame named 'date'. I then use lubridate to convert this character column to datetimes in hopes of finding the min/max date.
dates <- csv[c('datetime')]
dates$datetime <- ymd_hms(dates$datetime)
Running this code I receive the following error message:
Warning message:
3 failed to parse.
I accept this as the CSV could have some janky dates in there and next run:
min(dates$datetime)
max(dates$datetime)
Both of these return NA, which I assume is from the few broken dates still stored in the data frame. I've searched around for a quick fix, and have even tried to build a foreach loop to identify the problem dates, but no luck. What would be a simple way to identify the 3 broken dates?
example date format: 2015-06-17 17:10:16 +0000
Credit to LawyeR and Stibu from above comments:
I first sorted the raw csv column and did a head() & tail() to find which 3 dates were causing trouble.
Alternatively which(is.na(dates$datetime)) was a simple one liner to also find the answer.
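To make the one-liner concrete, here is a base R sketch on made-up strings in the question's format. It uses as.POSIXct() rather than lubridate only so it runs without packages; the is.na() logic is the same either way. Keeping the raw strings in a separate vector is what lets you print the offenders afterwards.

```r
# Made-up raw values: one good, one junk, one missing, one impossible date
raw <- c("2015-06-17 17:10:16 +0000", "not a date", NA,
         "2015-13-40 00:00:00 +0000")

# Unparseable entries come back as NA
parsed <- as.POSIXct(raw, format = "%Y-%m-%d %H:%M:%S %z", tz = "UTC")

# Failed to parse = NA after parsing but not NA before
failed <- which(is.na(parsed) & !is.na(raw))
print(raw[failed])

# min()/max() return NA if any element is NA, unless told to skip them
earliest <- min(parsed, na.rm = TRUE)
print(earliest)
```

Once the bad rows are known, min() and max() with na.rm = TRUE give the range of the dates that did parse.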
Lubridate will give that warning when attempting to parse date-times that do not exist because of daylight saving time.
For example:
library(lubridate)
mydate <- strptime('2020-03-08 02:30:00', format = "%Y-%m-%d %H:%M:%S")
ymd_hms(mydate, tz = "America/Denver")
[1] NA
Warning message:
1 failed to parse.
My data comes from an unintelligent sensor which does not know about DST, so impossible (but correctly formatted) dates appear in my timeseries.
If the indices of where lubridate fails are useful to know, you can use a for loop with stopifnot() and print each index as you go.
Make some dates, throw an error in there at a random location.
library(lubridate)
set.seed(1)
my_dates<-as.character(sample(seq(as.Date('1900/01/01'),
as.Date('2000/01/01'), by="day"), 1000))
my_dates[sample(1:length(my_dates), 1)]<-"purpleElephant"
Now loop over the dates, printing each index before it is parsed. The loop stops at the first failure, so the last index printed is the offending element.
for (i in seq_along(my_dates)) {
  print(i)
  stopifnot(!is.na(ymd(my_dates[i])))
}
To provide a more generic answer, first filter out the NAs, then try and parse, then filter only the NAs. This will show you the failures. Something like:
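dates2 <- dates[!is.na(dates$datetime), , drop = FALSE]  # drop rows that were already NA
dates2$parsed <- ymd_hms(dates2$datetime)                # keep the raw strings alongside
Warning message:
3 failed to parse.
dates2[is.na(dates2$parsed), ]                           # rows whose raw value failed to parse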
Here is a simple function that solves the generic problem:
parse_ymd = function(x){
  d = lubridate::ymd(x, quiet=TRUE)
  errors = x[!is.na(x) & is.na(d)]
  if(length(errors)>0){
    cli::cli_warn("Failed to parse some dates: {.val {errors}}")
  }
  d
}
x = c("2014/20/21", "2014/01/01", NA, "2014/01/02", "foobar")
my_date = lubridate::ymd(x)
#> Warning: 2 failed to parse.
my_date = parse_ymd(x)
#> Warning: Failed to parse some dates: "2014/20/21" and "foobar"
Created on 2022-09-29 with reprex v2.0.2
Of course, replace ymd() with whatever you want.
Use the truncated argument. The most common type of irregularity in date-time data is truncation due to rounding or unavailability of the time stamp.
Therefore, try truncated = 1, then potentially go up to truncated = 3:
dates <- csv[c('datetime')]
dates$datetime <- ymd_hms(dates$datetime, truncated = 1)

Resources