I have a column of 24-hour ("military") time values, df1$appt_time, in the format "13:30". All of them are 5 characters long, like "00:00". I have tried POSIXct, but it added today's date to the values. I have also tried lubridate and couldn't get that to work. Most recently I have been trying chron, so far without success.
The goal is to group the times into factor levels once this is done. I cannot perform any conditional operations on them currently, unless I am wrong about that as well ;)
> df1$Time <- chron(times = df1$appt_time)
Error in convert.times(times., fmt) : format h:m:s may be incorrect
In addition: Warning message:
In unpaste(times, sep = fmt$sep, fnames = fmt$periods, nfields = 3) :
106057 entries set to NA due to wrong number of fields
I also tried df1$Time <- chron(times(df1$appt_time)), which gives the same error as above,
as well as different attempts at being explicit with the format:
> df1$appt_time <- chron(df1$appt_time, format = "h:m")
Error in widths[, fmt$periods, drop = FALSE] : subscript out of bounds
I would be very grateful if someone could point out my error or suggest a better way to accomplish this task.
You can use as.POSIXct :
df1$date_time <- as.POSIXct(df1$appt_time, format = '%H:%M', tz = 'UTC')
Since you don't have dates, this assigns today's date, with the time of day taken from appt_time.
For example -
as.POSIXct('13:30', format = '%H:%M', tz = 'UTC')
#[1] "2021-02-01 13:30:00 UTC"
One way to overcome this problem if you need to perform arithmetic on the times prior to grouping them is to treat the minutes as a fraction of the hour:
# If you need to do some extra arithmetic prior to coercing to factor:
as.numeric(substr(test1, 1, 2)) + (as.numeric(substr(test1, 4, 5))/60)
# Otherwise:
as.factor(test1)
Here test1 stands in for df1$appt_time:
test1 <- c('13:30','13:45', '14:00', '14:15', '14:30', '14:45', '15:00')
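For the grouping step itself, one option is to feed the decimal hours into cut(). This is only a sketch: the break points (8, 12, 14, 18) and the labels are made-up assumptions, not values taken from the question.

```r
# Sample times standing in for df1$appt_time
test1 <- c('13:30', '13:45', '14:00', '14:15', '14:30', '14:45', '15:00')

# Hours plus minutes as a fraction of an hour, e.g. '13:30' becomes 13.5
decimal_hours <- as.numeric(substr(test1, 1, 2)) +
  as.numeric(substr(test1, 4, 5)) / 60

# Hypothetical break points; right = FALSE makes each bin [start, end)
grouped <- cut(decimal_hours,
               breaks = c(8, 12, 14, 18),
               labels = c('Morning', 'Mid-Day', 'Afternoon'),
               right = FALSE)

table(grouped)
```

Times outside the outermost breaks come back as NA, which makes unexpected values easy to spot.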
Not being able to find a way to work with the time the way I had in mind, I came up with this DIIIIIRRRRRRRRRRRTY solution.
#converted appt_time to POSIXct format, which added today's date
df9$appt_time <- as.POSIXct(df9$appt_time, format = '%H:%M')
#Since I am only interested in creating a value based on whether the time falls within a specific range, I decided I could output this new value, 'unclassed', to a column and then manually eyeball the values that corresponded to my ranges
df9$convert <- unclass(df9$appt_time)
#Using the manually obtained, unclassed values, I was able to create the factor levels I wanted
group_appt_time <- function(convert){
  ifelse (convert >= 1612624500 & convert <= 1612637100, 'Morning',
  ifelse (convert >= 1612638000 & convert <= 1612647900, 'Mid-Day',
  ifelse (convert >= 1612648800 & convert <= 1612658700, 'Afternoon',
          'Invalid Time')))
}
df9$appt_time_grouped <- as.factor(group_appt_time(df9$convert))
This is a research project, not something I need to recreate on an ongoing basis, so it works.
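A less fragile variant of the same idea, as a sketch only: keep just the seconds since midnight, so the thresholds no longer depend on the date that as.POSIXct fills in. The sample times, the helper hm(), and the range boundaries below are all assumptions for illustration; the real boundaries are not stated above.

```r
# Hypothetical sample input standing in for df9$appt_time
appt_time <- c('08:15', '11:45', '12:00', '14:45', '15:00', '17:45', '23:00')

# Parse in UTC, then drop the date part by taking seconds since midnight
parsed <- as.POSIXct(appt_time, format = '%H:%M', tz = 'UTC')
secs <- as.numeric(parsed) %% 86400

# Hypothetical helper: clock time (hours, minutes) to seconds since midnight
hm <- function(h, m) h * 3600 + m * 60

# Assumed ranges, for illustration only
group_appt_time <- function(secs) {
  ifelse(secs >= hm(8, 15) & secs <= hm(11, 45), 'Morning',
  ifelse(secs >= hm(12, 0) & secs <= hm(14, 45), 'Mid-Day',
  ifelse(secs >= hm(15, 0) & secs <= hm(17, 45), 'Afternoon',
         'Invalid Time')))
}

appt_time_grouped <- as.factor(group_appt_time(secs))
print(appt_time_grouped)
```

Because the thresholds are written as clock times, the code gives the same grouping whatever day it is run.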
I have something like this within a function:
x <- as.POSIXct((substr((dataframe[z, ])$variable, 1, 8)), tz = "GMT",
format = "%H:%M:%S")
print(x)
if ( (x >= as.POSIXct("06:00:00", tz = "GMT", format = "%H:%M:%S")) &
(x < as.POSIXct("12:00:00", tz = "GMT", format = "%H:%M:%S")) ){
position <- "first"
}
but I get this output:
character(0)
Error in if ((as.numeric(departure) - as.numeric(arrival)) < 0) { : argument is of length zero
how can I fix this so my comparison works and it prints the correct thing?
some examples of the dataframe$variable column:
16:33:00
15:34:00
14:51:00
07:26:00
05:48:00
11:10:00
17:48:00
06:17:00
08:22:00
11:31:00
Welcome to Stack Overflow!
First, the reason you've gotten some down votes is most likely because you haven't given much in your question to go on. For one thing, you haven't shown us what
(dataframe[z, ])$variable
is, which makes it hard for us to formulate a complete answer. You seem to be trying to extract a single value from a dataframe, is that right? If so, I've never seen it done that way; try replacing the above with:
dataframe$variable[z]
My guess is what you're trying to achieve is a comparison of an entire column of the dataframe called "variable", since that's generally more useful...
Having said that, I often come up against issues with time data, and from what I've heard, my experiences are not uncommon. When I'm dealing with just times, as it appears you are here, I prefer the chron::times format over POSIXct. POSIXct is a date-time format, so a date is always included; it also tries to correct for time zone and daylight saving changes, which tends to get in my way more than help. If your data is already in the format you specified in your first as.POSIXct call, you won't even need to specify a format when calling the times function.
x <- chron::times( dataframe$variable )
print(x)
position <- ifelse ( x >= chron::times( "06:00:00" ) &
x < chron::times( "12:00:00" ),
"first", "not first"
)
This will output a vector "position", with a result for all values taken from dataframe$variable. Does that achieve what you're hoping for?
From here, if you did want to extract the comparison result for the particular row "z" in dataframe, you can still do that with
position[z]
EDIT to add:
It might be worth checking for missing values in "variable". This should return TRUE:
sum( is.na( dataframe$variable ) ) == 0
Also check for any that aren't correctly formatted. Again, this should return TRUE:
sum( is.na( chron::times( dataframe$variable ) ) ) == 0
EDIT to add:
As per the comments, it looks like some values in your "variable" column aren't converting properly. You should be able to find them with
subset( dataframe, is.na( chron::times( variable ) ) )
That should let you see what's wrong. It may be a single cell, or it may be a number of them. You'll need to tidy up that data, which you can do in a few ways. You could go through and fix the values manually, or you could add a function to your script that repairs them before the conversion. The latter might be a good idea if the bad values share a common issue, or if you expect the same issue to recur as new data comes in (if indeed you need to allow for that).
The other option is simply to exclude those rows from your analysis. If you go this route, make sure it's appropriate to the analysis you're running. If it is appropriate in your case, you can add a step to clean up the dataframe before running the steps in your question:
dataframe <- subset( dataframe, !is.na( chron::times( variable ) ) )
NOTE: there's a good chance this will come up with a warning. If you run the same line a second time (after the offending rows have been removed) and the warning still appears, you may need to look further into it.
That should drop the offending values, leaving only values that are properly converting to the times format, which should help with the steps you're trying to run. Check how your dataframe dimensions change before and after that step; that'll tell you how many rows you're dropping.
You could do the same thing with POSIXct if that's what you're comfortable with, I'm just personally more comfortable with times for what you're doing.
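For reference, a rough POSIXct equivalent might look like the sketch below. The helper to_time() and the dummy date 1970-01-01 are assumptions added here so that every value shares the same date and only the time of day is compared; the sample values come from the question.

```r
# Sample values in the HH:MM:SS form shown in the question
variable <- c("16:33:00", "05:48:00", "11:10:00", "06:17:00")

# Hypothetical helper: attach one fixed dummy date before parsing
to_time <- function(x) {
  as.POSIXct(paste("1970-01-01", x), format = "%Y-%m-%d %H:%M:%S", tz = "GMT")
}

x <- to_time(variable)

# Vectorised comparison over the whole column
position <- ifelse(x >= to_time("06:00:00") & x < to_time("12:00:00"),
                   "first", "not first")
print(position)
```

Values that fail to parse come back as NA in x, and therefore as NA in position.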
I have a csv data set with a column that contains dates. After importing the data set into R, we need to subset it based on a certain date range.
app1110 <- read.csv("file_11102015.csv")
app1110$appcom_date2 <- app1110$APPLICATION..COMPLETED..DATE
Then we tried:
app1110$appcom_date2 <- format(as.POSIXct(app1110$appcom_date2, format= "%m/%d/%Y"), format="%m/%d/%Y")
subset(app1110, as.Date(appcom_date2 < "12/30/2013"))
The error message:
Error in as.Date.default(appcom_date2 < "12/30/2013") : do not know
how to convert 'appcom_date2 < "12/30/2013"' to class “Date”
So how can I subset data based on the date range?
Without seeing your data, I suspect you need to change this:
as.Date(appcom_date2 < "12/30/2013")
to this:
appcom_date2 < as.Date("12/30/2013", "%m/%d/%Y")
Or better still:
appcom_date2 < as.Date("2013-12-30")
The key point being that you need to coerce the string ("12/30/2013") to a Date object and then make the comparison.
Thanks, the problem was comparing character to date types. This fixed it:
app1110$appcom_date2 <- as.Date(app1110$appcom_date2,
                                format="%m/%d/%Y")
subset(app1110, appcom_date2 < as.Date("2013-12-31")
       & appcom_date2 > as.Date("2013-06-01"))
Got another question: when subsetting, I am using appcom_date2 variable as a criteria to set the period. How do I also specify to exclude all NA values from that variable?
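On the NA follow-up, a small sketch with made-up data (the data frame app and its values are assumptions, not the real file): subset() already drops rows where the condition evaluates to NA, while plain bracket indexing needs an explicit !is.na() check.

```r
# Made-up data standing in for app1110
app <- data.frame(id = 1:4,
                  appcom_date2 = c("07/15/2013", NA, "01/02/2014", "12/01/2013"))
app$appcom_date2 <- as.Date(app$appcom_date2, format = "%m/%d/%Y")

lower <- as.Date("2013-06-01")
upper <- as.Date("2013-12-31")

# subset() treats an NA condition as FALSE, so rows with NA dates are dropped
in_range <- subset(app, appcom_date2 > lower & appcom_date2 < upper)

# With [ ] indexing an NA condition would yield an all-NA row,
# so the NA dates are excluded explicitly
keep <- !is.na(app$appcom_date2) &
  app$appcom_date2 > lower & app$appcom_date2 < upper
in_range2 <- app[keep, ]

print(in_range)
```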
library(PerformanceAnalytics)
to get the edhec data set
edhec['2000-12-31::2001-12-31',1]
is what I'm trying to obtain.
So far I have tried :
date_begin_test <- as.Date("2000-12-31")
date_end_test <- as.Date("2001-12-31")
I have tried as.POSIXct as well as plain strings
edhec[date_begin_test::date_end_test,1]
edhec[date_begin_test/date_end_test,1]
edhec[paste("'",date_begin_test,'::',date_end_test,"'",sep=''),1]
edhec[noquote(paste("'",date_begin_test,'::',date_end_test,"'",sep='')),1]
The last one is the most puzzling. It gives me every value from the beginning and stops at date_end_test.
You were close, this works:
edhec[paste(date_begin_test, '::', date_end_test, sep = ""), 1]
Personally, I would use:
edhec[paste(date_begin_test, date_end_test, sep="::"), 1]
Or use this:
x.subset=seq.Date(date_begin_test+1,date_end_test+1,by="month")-1
edhec[as.character(x.subset),1]
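To see how the attempts in the question differ from the paste() calls above, it can help to print the strings themselves. This sketch uses only base R and does not touch edhec. The quotes in edhec['2000-12-31::2001-12-31',1] are R syntax that delimits the string; the version with "'" pasted in puts literal quote characters inside the string, which is presumably what confuses the range parsing.

```r
date_begin_test <- as.Date("2000-12-31")
date_end_test <- as.Date("2001-12-31")

# Attempt from the question: literal quote characters end up inside the string
with_quotes <- paste("'", date_begin_test, '::', date_end_test, "'", sep = '')

# Working version: only the two dates and the '::' separator
range_string <- paste(date_begin_test, date_end_test, sep = "::")

print(with_quotes)
print(range_string)
```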
A slightly different approach with lubridate
require(lubridate)
edhec[index(edhec) %within% (ymd("2000-12-31") %--% ymd("2001-12-31")), 1]
The purpose of this very simple function is just to transform a date column to a date variable and a numeric time (hourly) column to a factor variable, which will be used with plyr later in the code.
I can get this code to run successfully in the command line, but when I attempt to run it in the function I get an error.
# setting up some fake data
set.seed(31)
foo <- function(myHour, myDate){
rlnorm(1, meanlog=0,sdlog=1)*(myHour) + (150*myDate)
}
Hour <- 1:24
Day <-1:1080
dates <-seq(as.Date("2010-01-01"), by = "day", length.out= 1080)
myData <- expand.grid( Day, Hour)
names(myData) <- c("Date","Hour")
myData$Adspend <- apply(myData, 1, function(x) foo(x[2], x[1]))
myData$Date <-dates
myData$Demand <-(rnorm(1,mean = 0, sd=1)+.75*myData$Adspend)
#############################################################
myData
# Function Creation
AddCal <-function(DF,Date,Time) {
DF$Date<-as.Date(DF$Date, format="%m/%d/%Y")#Change Date variable into a date type
DF$Time<-factor(DF$Time,levels=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24))
}
#Test Function
Bob<-AddCal(myData,Date,Hour)
#Error I receive
Error in `$<-.data.frame`(`*tmp*`, "Time", value = integer(0)) :
replacement has 0 rows, data has 25920
I spent about 2 hours searching for answers and trying different things. Because I can run the individual lines of code at the command line and get the desired result, I am assuming this is an advanced coding problem beyond my novice capabilities.
In your function, replace all instances of DF$Time with DF[[Time]], and do the same for DF$Date.
Also see the two comments below from @Dwin & @mrip:
Make sure to return a value
Make sure to pass string arguments where strings are expected
What's going on:
When you use DF$Time, R is looking for a column named Time in DF. It is not treating Time as the string variable that you expect.
DF[[Time]] on the other hand does treat Time as a variable.
The reason the error only refers to Time and not Date is that Date is both the name of your argument and the name of a column in DF. (If your function call had used something like AddCal(.. Date=Demand), or any other column name, you would not get back the results you expect.)
Side Note:
c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24)
is equivalent to
seq(24) and to 1:24
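Putting those points together, a corrected version might look like the sketch below. The three-row data frame is made up so the example is self-contained, and its Date column is stored as text so that the format argument has something to parse.

```r
# Small made-up data frame in place of myData
myData <- data.frame(Date = c("01/01/2010", "01/02/2010", "01/03/2010"),
                     Hour = c(1, 12, 24),
                     stringsAsFactors = FALSE)

# Date and Time are column names, passed as strings
AddCal <- function(DF, Date, Time) {
  DF[[Date]] <- as.Date(DF[[Date]], format = "%m/%d/%Y")  # text to Date
  DF[[Time]] <- factor(DF[[Time]], levels = 1:24)         # hour to factor
  DF  # return the modified data frame
}

Bob <- AddCal(myData, "Date", "Hour")
str(Bob)
```

The call passes "Date" and "Hour" as strings, and the function returns DF, so Bob holds the converted data frame rather than only the result of the last assignment.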
I have a date in the format dd-mm-yyyy HH:mm:ss
What is the best and easiest way to validate this date?
I tried
d <- format.Date(date, format="%d-%m-%Y %H:%M:%S")
But how can I catch the error when an illegal date is passed?
Simple way:
d <- try(as.Date(date, format="%d-%m-%Y %H:%M:%S"))
if("try-error" %in% class(d) || is.na(d)) {
print("That wasn't correct!")
}
Explanation: format.Date uses as.Date internally to convert date into an object of the Date class. However, it does not pass a format option, so as.Date falls back on its default formats, which are %Y-%m-%d and then %Y/%m/%d.
The format option from format.Date is used only for the output, not for the parsing. Quoting from the as.Date man page:
The ‘as.Date’ methods accept character strings, factors, logical
‘NA’ and objects of classes ‘"POSIXlt"’ and ‘"POSIXct"’. (The
last is converted to days by ignoring the time after midnight in
the representation of the time in specified timezone, default
UTC.) Also objects of class ‘"date"’ (from package ‘date’) and
‘"dates"’ (from package ‘chron’). Character strings are processed
as far as necessary for the format specified: any trailing
characters are ignored.
However, when you directly call as.Date with a format specification, nothing else will be allowed than what fits your format.
See also: ?as.Date
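A vectorised variant of the same idea, offered as a sketch: parse with strptime, which returns NA instead of raising an error for strings it cannot parse, and then compare the re-formatted result with the input so that trailing characters are rejected as well. The function name is_valid_datetime is made up for this example.

```r
# Hypothetical helper: TRUE only for strings that parse and round-trip exactly
is_valid_datetime <- function(x, fmt = "%d-%m-%Y %H:%M:%S") {
  parsed <- strptime(x, format = fmt, tz = "UTC")
  ok <- !is.na(parsed)
  # Round trip: rejects trailing characters and non-zero-padded fields
  ok[ok] <- format(parsed[ok], format = fmt) == x[ok]
  ok
}

dates <- c("31-12-2019 23:59:59",     # valid
           "31-02-2020 10:00:00",     # no such day
           "01-01-2020 10:00:00xyz",  # trailing characters
           NA)
print(is_valid_datetime(dates))
```

The round-trip check is strict on purpose: an input such as "1-1-2020 1:00:00" parses but is reported as invalid because it is not zero-padded.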
You may want to look at the gsubfn package. It has functions (gsubfn specifically) that work like other regular expression functions to match pieces of a string, but then call a user-supplied function and pass the matching pieces to it. So you would write your own function that looks at the year, month, and day and makes sure that they are in the correct ranges (and the range for day can depend on the passed month and year).
This might be helpful if flexibility is desired in a date-time entry.
I have a function where I want to allow either a date-only entry or a date-time entry, then set a flag - for use inside the function only. I'm calling this flag data_type. The flag will be used later in the larger function to select units for getting a difference in two dates with difftime. (In most cases, the function will be perfectly fine with date only, but in some cases a user might need a shorter time frame. I don't want to inconvenience users with the shorter time frame if they don't need it.)
I am posting this for two reasons: 1) to help anyone trying to allow flexibility in date arguments and 2) to welcome sanity checks in case there's a problem with the method, since this is going into a function in an R package.
dat_time_check_fn <- function(dat_time) {
if (!anyNA(as.Date(dat_time, format= "%Y-%m-%d %H:%M:%S"))) date_type <- 1
else if (!anyNA(as.Date(dat_time, format= "%Y-%m-%d"))) date_type <- 2
else stop("Error: dates must either be in format '1999-12-31' or '1999-12-31 23:59:59' ")
date_type
}
Date-time case
date5 <- "1999-12-31 23:59:59"
date_type <- dat_time_check_fn(date5)
date_type
[1] 1
Date only case:
date6 <- "1999-12-31"
date_type <- dat_time_check_fn(date6)
date_type
[1] 2
Note that if the order above in the function is reversed, the longer date-time can be inadvertently coerced to the shorter version and both types result in date_type = 1.
My larger function has more than one date, but I need them to be compatible. Below, I'm checking the two dates checked above, where one was type 1 and one was type 2. Combining types gives the result with date only (type 2):
date_type <- dat_time_check_fn(c(date5, date6))
date_type
[1] 2
Here's a non-compliant version:
date7 <- "1/31/2011"
date_type <- dat_time_check_fn(date7)
Error in dat_time_check_fn(date7) :
Error: dates must either be in format '1999-12-31' or '1999-12-31 23:59:59'
Many solutions here are prone to SQL injection. They return TRUE for date = "2020-08-11; DROP * FROM my_table". Here is a vectorized base R function that works with NA:
is_date = function(x, format = NULL) {
formatted = try(as.Date(x, format), silent = TRUE)
is_date = as.character(formatted) == x & !is.na(formatted) # valid and identical to input
is_date[is.na(x)] = NA # Insert NA for NA in x
return(is_date)
}
Let's try:
> is_date(c("2020-08-11", "2020-13-32", "2020-08-11; DROP * FROM table", NA), format = "%Y-%m-%d")
## TRUE FALSE FALSE NA
I believe that what you are looking for is the tryCatch function.
The following is an excerpt from a script I wrote which accepts any .csv file with two series that have a common x axis. The first column in 'data' is the common x axis variable, and columns 2 & 3 are the y axis variables. I needed the tryCatch statement to make sure the script would create a plot regardless of whether the x axis data is a time series or some other type of variable.
### READ DATA FROM A CSV FILE
data = read.csv("STLDvsNEM2.csv", header = TRUE)
#CONVERT FIRST COLUMN OF DATA (IN MY CASE, THE COLUMN INTENDED TO BE THE X AXIS)
#TO AN ACCEPTABLE DATE FORMAT
#IF FIRST COLUMN OF DATA IS NOT IN AN ACCEPTABLE DATE FORMAT
#USE THE VALUE WITHOUT ANY TRANSFORMATION
#EACH HANDLER RETURNS THE FALLBACK VALUE, WHICH tryCatch THEN ASSIGNS TO x
x <- tryCatch({
as.Date(data[,1])},
warning = function(w) data[,1],
error = function(e) data[,1])
y1 <- data[,2]
y2 <- data[,3]