"Error in CharToDate(x)" - r

I have a CSV data set with a column that contains dates. After importing the data set into R, I need to subset it based on a certain date range.
app1110 <- read.csv("file_11102015.csv")
app1110$appcom_date2 <- app1110$APPLICATION..COMPLETED..DATE
Then we tried 1)
app1110$appcom_date2 <- format(as.POSIXct(app1110$appcom_date2, format= "%m/%d/%Y"), format="%m/%d/%Y")
subset(app1110, as.Date(appcom_date2 < "12/30/2013"))
The error message:
Error in as.Date.default(appcom_date2 < "12/30/2013") : do not know
how to convert 'appcom_date2 < "12/30/2013"' to class “Date”
So how can I subset data based on the date range?

Without seeing your data, I suspect you need to change this:
as.Date(appcom_date2 < "12/30/2013")
to this:
appcom_date2 < as.Date("12/30/2013", "%m/%d/%Y")
Or better still:
appcom_date2 < as.Date("2013-12-30")
The key point being that you need to coerce the string ("12/30/2013") to a Date object and then make the comparison.

Thanks, the problem was comparing character to date types. This fixed it:
app1110$appcom_date2 <- as.Date(app1110$appcom_date2, format="%m/%d/%Y")
subset(app1110, appcom_date2 < as.Date("2013-12-31")
& appcom_date2 > as.Date("2013-06-01"))
Got another question: when subsetting, I am using the appcom_date2 variable as the criterion to set the period. How do I also exclude all NA values of that variable?
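One possible way to handle the NA follow-up, as a minimal sketch: the toy data frame `app` and its `id` column are made up for illustration and stand in for app1110. It relies on two base R behaviours: `subset()` treats an NA condition as FALSE and drops those rows, while plain `[` indexing keeps NA rows unless the condition is guarded with `!is.na()`.

```r
# Hypothetical stand-in for app1110; only the date column name is taken from the question.
app <- data.frame(
  id = 1:4,
  appcom_date2 = c("05/15/2013", "07/04/2013", NA, "01/10/2014"),
  stringsAsFactors = FALSE
)
app$appcom_date2 <- as.Date(app$appcom_date2, format = "%m/%d/%Y")

start_date <- as.Date("2013-06-01")
end_date   <- as.Date("2013-12-31")

# Explicit version: guard against NA before comparing.
in_range <- !is.na(app$appcom_date2) &
  app$appcom_date2 > start_date &
  app$appcom_date2 < end_date
result <- app[in_range, ]

# subset() drops rows where the condition evaluates to NA on its own.
result_subset <- subset(app, appcom_date2 > start_date & appcom_date2 < end_date)
```

Both `result` and `result_subset` keep only the row dated 2013-07-04 here; the explicit `!is.na()` mainly matters when indexing with `[`.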

Related

R Convert char to time

I have a column of military time values, df1$appt_time, in the format "13:30". All of them are 5 characters long, like "00:00". I have tried POSIXct, but it added today's date to the values. I have also tried lubridate and couldn't get that to work. Most recently I have been trying chron, so far without success.
The goal is to group the times into factor levels once this is done. I cannot perform any conditional operations on them currently, unless I am wrong about that as well ;)
> df1$Time <- chron(times = df1$appt_time)
Error in convert.times(times., fmt) : format h:m:s may be incorrect
In addition: Warning message:
In unpaste(times, sep = fmt$sep, fnames = fmt$periods, nfields = 3) :
106057 entries set to NA due to wrong number of fields
also df1$Time <- chron(times(df1$appt_time)) same error as above
as well as different tries at being explicit with the format:
> df1$appt_time <- chron(df1$appt_time, format = "h:m")
Error in widths[, fmt$periods, drop = FALSE] : subscript out of bounds
I would be very grateful if someone could point out my error or suggest a better way to accomplish this task.
You can use as.POSIXct :
df1$date_time <- as.POSIXct(df1$appt_time, format = '%H:%M', tz = 'UTC')
Since you don't have dates, this will assign today's date, with the time of day taken from appt_time.
For example -
as.POSIXct('13:30', format = '%H:%M', tz = 'UTC')
#[1] "2021-02-01 13:30:00 UTC"
One way to overcome this problem if you need to perform arithmetic on the times prior to grouping them is to treat the minutes as a fraction of the hour:
# If you need to do some extra arithmetic prior to coercing to factor:
as.numeric(substr(test1, 1, 2)) + (as.numeric(substr(test1, 4, 5))/60)
# Otherwise:
as.factor(test1)
Where df1$appt_times == test1
test1 <- c('13:30','13:45', '14:00', '14:15', '14:30', '14:45', '15:00')
Not being able to find a solution to work with the time in the way I thought I came up with this DIIIIIRRRRRRRRRRRTY solution.
#converted appt_time to POSIXct format, which added today's date
df9$appt_time <- as.POSIXct(df9$appt_time, format = '%H:%M')
#Since I am only interested in creating a value based on whether the time falls within a specific range, I decided I could output this new value, 'unclassed', to a column and then manually eyeball the values that corresponded to my ranges
df9$convert <- unclass(df9$appt_time)
#Using the manually obtained unclassed values, I was able to create the factor levels I wanted
group_appt_time <- function(convert){
ifelse (convert >= 1612624500 & convert <= 1612637100, 'Morning',
ifelse (convert >= 1612638000 & convert <= 1612647900, 'Mid-Day',
ifelse (convert >= 1612648800 & convert <= 1612658700, 'Afternoon',
'Invalid Time')))
}
df9$appt_time_grouped <- as.factor(group_appt_time(df9$convert))
This is a research project, not something I need to recreate in an ongoing manner, so it works.
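A possible alternative that avoids the hard-coded epoch seconds, as a minimal sketch: convert each "HH:MM" string to minutes since midnight and bin with cut(). The sample times and the break points (08:00, 12:00, 15:00, 18:00) are assumptions for illustration, not the ranges used above.

```r
# Hypothetical sample of "HH:MM" strings standing in for df9$appt_time.
appt_time <- c("08:15", "11:45", "12:00", "14:30", "17:05")

# Minutes since midnight, computed from the hour and minute fields.
parts <- strsplit(appt_time, ":", fixed = TRUE)
minutes <- vapply(
  parts,
  function(p) as.integer(p[1]) * 60L + as.integer(p[2]),
  integer(1)
)

# Assumed cut points in hours; intervals are closed on the left, open on the right.
# Times outside 08:00-18:00 become NA rather than 'Invalid Time'.
grouped <- cut(
  minutes,
  breaks = c(8, 12, 15, 18) * 60,
  labels = c("Morning", "Mid-Day", "Afternoon"),
  right = FALSE
)
```

Because the bins depend only on the time of day, the result does not change with the date on which the script is run.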

window() function exclude the date sent as end argument, any work around?

I want to use the window function to subset a time series. However, the function excludes the date I pass as the end argument.
window(ts1, end = "2018-09-24")
I couldn't find any argument to change this behavior. Any thought?
The problem arose because of comparing two different types of data, Date and POSIXct.
I solved the issue by finding the indexes of the rows that are after that date and then excluded them from the dataset:
evaluation_date <- "2018-09-24"
indexes_removed <- which(as.numeric(as.Date(index(ts1))) > as.numeric(as.Date(evaluation_date)))
ts1 <- ts1[-indexes_removed]
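One caveat with negative indexing is worth a quick sketch. Plain base R vectors stand in for the time series here (the `dates` and `values` objects are made up for illustration). If no observation falls after the cutoff, which() returns an empty vector, and a negative empty index selects nothing at all, so a logical "keep" mask may be the safer form.

```r
# Hypothetical stand-ins for the index and values of ts1.
dates  <- as.Date("2018-09-22") + 0:4
values <- c(10, 11, 12, 13, 14)

evaluation_date <- as.Date("2018-09-24")

# Logical mask: keep everything up to and including the evaluation date.
keep <- dates <= evaluation_date
kept_values <- values[keep]

# Pitfall: when nothing is later than the cutoff, which() gives integer(0),
# and values[-integer(0)] returns an empty vector instead of the full one.
late_cutoff <- as.Date("2018-12-31")
none_removed <- which(dates > late_cutoff)
dropped_everything <- values[-none_removed]
kept_everything <- values[dates <= late_cutoff]
```

The same mask can be used to index a zoo or xts object, assuming its index has first been converted to Date.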

R : Can't select xts values between two dates

library(PerformanceAnalytics)
to get the edhec data set
edhec['2000-12-31::2001-12-31',1]
is what I'm trying to obtain.
So far I have tried :
date_begin_test <- as.Date("2000-12-31")
date_end_test <- as.Date("2001-12-31")
I have tried as.POSIXct as well as plain strings
edhec[date_begin_test::date_end_test,1]
edhec[date_begin_test/date_end_test,1]
edhec[paste("'",date_begin_test,'::',date_end_test,"'",sep=''),1]
edhec[noquote(paste("'",date_begin_test,'::',date_end_test,"'",sep='')),1]
The last one is the most puzzling. It gives me every value from the beginning and stops at date_end_test.
You were close, this works:
edhec[paste(date_begin_test, '::', date_end_test, sep = ""), 1]
Personally, I would use:
edhec[paste(date_begin_test, date_end_test, sep="::"), 1]
Or use this:
x.subset=seq.Date(date_begin_test+1,date_end_test+1,by="month")-1
edhec[as.character(x.subset),1]
A slightly different approach with lubridate
require(lubridate)
edhec[index(edhec) %within% (ymd("2000-12-31") %--% ymd("2001-12-31")), 1]
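To see why the quoted attempts in the question probably misbehave, it may help to print the strings being built. This base R sketch only constructs the strings and does not touch edhec. My reading is that the extra "'" characters become part of the string, so xts no longer sees a clean ISO 8601 range.

```r
date_begin_test <- as.Date("2000-12-31")
date_end_test   <- as.Date("2001-12-31")

# What the working answers pass to `[`: a plain range string.
good_range <- paste(date_begin_test, date_end_test, sep = "::")

# What the question's attempt builds: the same range wrapped in literal quote characters.
bad_range <- paste("'", date_begin_test, "::", date_end_test, "'", sep = "")

print(good_range)
print(bad_range)
```

The quotes in edhec['2000-12-31::2001-12-31', 1] are R syntax for a string literal, not part of the string itself, so they should not be pasted in.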

Error in Simple User Define function related to data conversion in R

The purpose of this very simple function is just to transform a date column to a date variable and a numeric time (hourly) column to a factor variable, which will be used with plyr later in the code.
I can get this code to run successfully in the command line, but when I attempt to run it in the function I get an error.
# setting up some fake data
set.seed(31)
foo <- function(myHour, myDate){
rlnorm(1, meanlog=0,sdlog=1)*(myHour) + (150*myDate)
}
Hour <- 1:24
Day <-1:1080
dates <-seq(as.Date("2010-01-01"), by = "day", length.out= 1080)
myData <- expand.grid( Day, Hour)
names(myData) <- c("Date","Hour")
myData$Adspend <- apply(myData, 1, function(x) foo(x[2], x[1]))
myData$Date <-dates
myData$Demand <-(rnorm(1,mean = 0, sd=1)+.75*myData$Adspend)
#############################################################
myData
# Function Creation
AddCal <-function(DF,Date,Time) {
DF$Date<-as.Date(DF$Date, format="%m/%d/%Y")#Change Date variable into a date type
DF$Time<-factor(DF$Time,levels=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24))
}
#Test Function
Bob<-AddCal(myData,Date,Hour)
#Error I receive
Error in `$<-.data.frame`(`*tmp*`, "Time", value = integer(0)) :
replacement has 0 rows, data has 25920
I spent about 2 hours searching for answers and trying different things. Because I can run the individual lines of code at the command line and get the desired result, I am assuming this is an advanced coding problem beyond my novice capabilities.
In your function, replace all instances of DF$Time with DF[[Time]], and do the same for DF$Date.
Also see the two comments below from @Dwin & @mrip:
Make sure to return a value
Make sure to pass string arguments where strings are expected
What's going on:
When you use DF$Time, R looks for a column literally named Time in DF. It does not treat Time as a variable holding a column name, which is what you expect.
DF[[Time]], on the other hand, does treat Time as a variable.
The error refers only to Time and not to Date because Date is both the name of your argument and the name of a column in DF. (If your function call had used something like AddCal(.. Date=Demand) or any other column name, you would not get the results you expect.)
Side Note:
c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24)
is equivalent to
seq(24) and to 1:24
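Putting those points together, a minimal sketch of the function might look like this. The small `myData` frame below is a made-up stand-in with character dates, not the simulated data from the question.

```r
# Column names are passed as strings and looked up with [[ ]].
AddCal <- function(DF, Date, Time) {
  DF[[Date]] <- as.Date(DF[[Date]], format = "%m/%d/%Y")  # character -> Date
  DF[[Time]] <- factor(DF[[Time]], levels = 1:24)         # hour -> factor
  DF                                                      # return the modified data frame
}

# Hypothetical two-row example.
myData <- data.frame(
  Date = c("01/01/2010", "01/02/2010"),
  Hour = c(1L, 24L),
  stringsAsFactors = FALSE
)

Bob <- AddCal(myData, "Date", "Hour")
str(Bob)
```

Because the modified data frame is the last expression in the function, it is what gets returned; the original version returned only the value of its last assignment.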

How to validate date in R

I have a date in the format dd-mm-yyyy HH:mm:ss
What is the best and easiest way to validate this date?
I tried
d <- format.Date(date, format="%d-%m-%Y %H:%M:%S")
But how can I catch the error when an illegal date is passed?
Simple way:
d <- try(as.Date(date, format="%d-%m-%Y %H:%M:%S"))
if("try-error" %in% class(d) || is.na(d)) {
print("That wasn't correct!")
}
Explanation: format.Date uses as.Date internally to convert date into an object of the Date class. However, it does not use a format option, so as.Date uses the default format, which is %Y-%m-%d and then %Y/%m/%d.
The format option from format.Date is used only for the output, not for the parsing. Quoting from the as.Date man page:
The ‘as.Date’ methods accept character strings, factors, logical
‘NA’ and objects of classes ‘"POSIXlt"’ and ‘"POSIXct"’. (The
last is converted to days by ignoring the time after midnight in
the representation of the time in specified timezone, default
UTC.) Also objects of class ‘"date"’ (from package ‘date’) and
‘"dates"’ (from package ‘chron’). Character strings are processed
as far as necessary for the format specified: any trailing
characters are ignored.
However, when you directly call as.Date with a format specification, nothing else will be allowed than what fits your format.
See also: ?as.Date
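As a small illustration of parsing with an explicit format, here is a sketch that returns TRUE or FALSE per element instead of relying on an error. The helper name `is_valid_datetime` is hypothetical. It uses as.POSIXct rather than as.Date so that the time part is checked too, and it relies on the parser returning NA when a string does not fit the supplied format.

```r
# Hypothetical helper: TRUE where x parses under the given format, FALSE otherwise.
is_valid_datetime <- function(x, format = "%d-%m-%Y %H:%M:%S") {
  parsed <- as.POSIXct(x, format = format, tz = "UTC")
  !is.na(parsed)
}

candidates <- c(
  "31-12-2020 23:59:59",  # fits the format
  "32-01-2020 10:00:00",  # day of month out of range
  "not a date"            # does not fit at all
)
valid <- is_valid_datetime(candidates)
```

Note that this still accepts strings with trailing characters after a valid date-time, since trailing characters are ignored during parsing.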
You may want to look at the gsubfn package. This has functions (gsubfn specifically) that work like other regular expression functions to match pieces of a string, but then call a user-supplied function and pass the matching pieces to it. So you would write your own function that looks at the year, month, and day and makes sure that they are in the correct ranges (and the range for day can depend on the passed month and year).
This might be helpful if flexibility is desired in a date-time entry.
I have a function where I want to allow either a date-only entry or a date-time entry, then set a flag for use inside the function only. I'm calling this flag date_type. The flag will be used later in the larger function to select units for getting a difference in two dates with difftime. (In most cases, the function will be perfectly fine with date only, but in some cases a user might need a shorter time frame. I don't want to inconvenience users with the shorter time frame if they don't need it.)
I am posting this for two reasons: 1) to help anyone trying to allow flexibility in date arguments and 2) to welcome sanity checks in case there's a problem with the method, since this is going into a function in an R package.
dat_time_check_fn <- function(dat_time) {
if (!anyNA(as.Date(dat_time, format= "%Y-%m-%d %H:%M:%S"))) date_type <- 1
else if (!anyNA(as.Date(dat_time, format= "%Y-%m-%d"))) date_type <- 2
else stop("Error: dates must either be in format '1999-12-31' or '1999-12-31 23:59:59' ")
date_type
}
Date-time case
date5 <- "1999-12-31 23:59:59"
date_type <- dat_time_check_fn(date5)
date_type
[1] 1
Date only case:
date6 <- "1999-12-31"
date_type <- dat_time_check_fn(date6)
date_type
[1] 2
Note that if the order above in the function is reversed, the longer date-time can be inadvertently coerced to the shorter version and both types result in date_type = 1.
My larger function has more than one date, but I need them to be compatible. Below, I'm checking the two dates checked above, where one was type 1 and one was type 2. Combining types gives the result with date only (type 2):
date_type <- dat_time_check_fn(c(date5, date6))
date_type
[1] 2
Here's a non-compliant version:
date7 <- "1/31/2011"
date_type <- dat_time_check_fn(date7)
Error in dat_time_check_fn(date7) :
Error: dates must either be in format '1999-12-31' or '1999-12-31 23:59:59'
Many solutions here are prone to SQL injection. They return TRUE for date = "2020-08-11; DROP * FROM my_table". Here is a vectorized base R function that works with NA:
is_date = function(x, format = NULL) {
formatted = try(as.Date(x, format), silent = TRUE)
is_date = as.character(formatted) == x & !is.na(formatted) # valid and identical to input
is_date[is.na(x)] = NA # Insert NA for NA in x
return(is_date)
}
Let's try:
> is_date(c("2020-08-11", "2020-13-32", "2020-08-11; DROP * FROM table", NA), format = "%Y-%m-%d")
## TRUE FALSE FALSE NA
I believe that what you are looking for is the tryCatch function.
The following is an excerpt from a script I wrote which accepts any .csv file with two series that have a common x axis. The first column in 'data' is the common x axis variable, and columns 2 & 3 are the y axis variables. I needed the tryCatch statement to make sure the script would create a plot regardless of whether the x axis data is a time series or some other type of variable.
### READ DATA FROM A CSV FILE
data = read.csv("STLDvsNEM2.csv", header = TRUE)
#CONVERT FIRST COLUMN OF DATA (IN MY CASE, THE COLUMN INTENDED TO BE THE X AXIS)
#TO AN ACCEPTABLE DATE FORMAT
#IF FIRST COLUMN OF DATA IS NOT IN AN ACCEPTABLE DATE FORMAT
#USE THE VALUE WITHOUT ANY TRANSFORMATION
x <- tryCatch({
as.Date(data[,1])},
warning = function(w) {},
error = function(e) {
x <- data[,1]
})
y1 <- data[,2]
y2 <- data[,3]
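The same fallback idea can be wrapped in a small function, shown here as a sketch. The name `to_date_or_raw` is hypothetical. It relies on as.Date() raising an error when a character vector is not in one of its default formats.

```r
# Hypothetical helper: return x as Date if it parses, otherwise return x unchanged.
to_date_or_raw <- function(x) {
  tryCatch(
    as.Date(x),
    error = function(e) x  # fall back to the untransformed values
  )
}

dates_ok  <- to_date_or_raw(c("2020-01-01", "2020-01-02"))
not_dates <- to_date_or_raw(c("alpha", "beta"))
```

Returning the value from the handler, rather than assigning inside it, makes the fallback result explicit.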

Resources