I'm trying to make a new date field based on two other columns. If 'R' is present in the Indicator column, I want the date to be the ReportDate; otherwise, I want it to be the IncidentDate. A reproducible example:
IncidentDate <- as.Date(c('2010-11-1','2008-3-25','2007-3-14'))
ReportDate <- as.Date(c('2010-11-1','2008-5-25','2007-5-14'))
Indicator <- c('','R','')
incident_data <- data.frame(IncidentDate, ReportDate, Indicator)
typeof(IncidentDate) #double
incident_data$calculatedDate <- ifelse(incident_data$ReportDate=='R',as.Date(incident_data$ReportDate), as.Date(incident_data$IncidentDate))
This gives me an error:
Error in charToDate(x) :
character string is not in a standard unambiguous format
I've also tried:
incident_data$calculatedDate <- ifelse(incident_data$ReportDate=='R',as.Date(as.character(incident_data$ReportDate)), as.Date(as.character(incident_data$IncidentDate)))
Which gives me the same error. Why might this be happening?
In base R, it is safer to assign through a logical index than to use ifelse() with Date columns, because ifelse() drops the Date class and returns the underlying numbers.
# TRUE for rows whose Indicator is 'R'
i1 <- incident_data$Indicator=='R'
# start from IncidentDate, then overwrite only the flagged rows with ReportDate
incident_data$calculatedDate <- incident_data$IncidentDate
incident_data$calculatedDate[i1] <- incident_data$ReportDate[i1]
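A self-contained sketch of the base R approach, rebuilding the question's sample data. It also shows ifelse() returning plain numbers for the same inputs. Passing stringsAsFactors = FALSE is an assumption added so that Indicator is a character column in any R version.

```r
# Sample data from the question
incident_data <- data.frame(
  IncidentDate = as.Date(c("2010-11-1", "2008-3-25", "2007-3-14")),
  ReportDate   = as.Date(c("2010-11-1", "2008-5-25", "2007-5-14")),
  Indicator    = c("", "R", ""),
  stringsAsFactors = FALSE
)
i1 <- incident_data$Indicator == "R"

# ifelse() builds its result from the logical test, so the Date class is lost
via_ifelse <- ifelse(i1, incident_data$ReportDate, incident_data$IncidentDate)
class(via_ifelse)   # "numeric"

# Copying the whole Date column, then overwriting selected rows, keeps the class
incident_data$calculatedDate <- incident_data$IncidentDate
incident_data$calculatedDate[i1] <- incident_data$ReportDate[i1]
incident_data$calculatedDate
# [1] "2010-11-01" "2008-05-25" "2007-03-14"
```

Subsetting the replacement with [i1] matters: without it, R would try to put all three ReportDate values into a single selected row.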
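The condition should be based on the Indicator column, not ReportDate: comparing the Date column with 'R' makes R try to convert 'R' to a Date, and that conversion is what raises the charToDate error. In addition, ifelse() coerces the Dates to their numeric (double) storage mode. So it may be better to use dplyr's if_else() or case_when(), which check that the true and false values have the same type and keep the Date class.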
library(dplyr)
if_else(incident_data$Indicator=='R',as.Date(incident_data$ReportDate),
as.Date(incident_data$IncidentDate))
#[1] "2010-11-01" "2008-05-25" "2007-03-14"
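To see where the charToDate error in the question comes from, this small sketch compares a Date vector with the string "R". R tries to parse "R" as a date before comparing and fails. The vectors report and indicator are illustrative stand-ins for the question's columns.

```r
report <- as.Date(c("2010-11-01", "2008-05-25"))

# Comparing a Date with a string makes R convert the string with as.Date(),
# and "R" cannot be parsed as a date, so the comparison fails
msg <- tryCatch(report == "R", error = function(e) conditionMessage(e))
msg

# Testing the character Indicator column instead works as intended
indicator <- c("", "R")
indicator == "R"
# [1] FALSE  TRUE
```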
I have columns that are named "X1.1.21", "X12.31.20" etc.
I can get rid of all the "X"s by using the substring function:
names(df) <- substring(names(df), 2, 8)
I've been trying many different methods to change "1.1.21" into a date format in R, but I'm having no luck so far. How can I go about this?
Syntactically valid R names cannot start with a digit, so read.csv() and data.frame() prepend an X to such column names (via make.names()). You can still keep names that start with a number by passing check.names = FALSE when reading the data.
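A minimal sketch of the check.names = FALSE route, assuming the data comes from a CSV file. Here the file is simulated with a hypothetical inline string passed to read.csv(text = ...):

```r
# Hypothetical CSV text whose header starts with digits
csv <- "1.1.21,12.31.20\n5,6\n"

names(read.csv(text = csv))   # gives "X1.1.21" and "X12.31.20"
raw_names <- names(read.csv(text = csv, check.names = FALSE))
raw_names                     # gives "1.1.21" and "12.31.20"

# Without the X prefix, the format string needs no leading "X"
as.Date(raw_names, format = "%m.%d.%y")
# [1] "2021-01-01" "2020-12-31"
```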
To turn the names into date-formatted strings, parse them with as.Date() using a format that includes the leading X:
df <- data.frame(X1.1.21 = rnorm(5), X12.31.20 = rnorm(5))
names(df) <- as.Date(names(df), 'X%m.%d.%y')
names(df)
#[1] "2021-01-01" "2020-12-31"
Note, however, that they only look like dates; they are still of class 'character':
class(names(df))
#[1] "character"
So if you want to use the column names in date calculations, convert them to Date first:
as.Date(names(df))
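An alternative sketch that leaves the original X names in place and keeps a separate Date vector for calculations. The day difference and the column reordering are only illustrative uses:

```r
df <- data.frame(X1.1.21 = c(1, 2), X12.31.20 = c(3, 4))

# Parse the "X<month>.<day>.<2-digit year>" names into real Date objects
col_dates <- as.Date(names(df), format = "X%m.%d.%y")
class(col_dates)   # "Date"

# Date arithmetic now works, e.g. days between the two columns' dates
as.numeric(col_dates[1] - col_dates[2])   # 1

# ...or putting the columns in chronological order
df <- df[order(col_dates)]
names(df)          # "X12.31.20" "X1.1.21"
```

Keeping the names as they are avoids the round trip through character strings that the note above describes.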
I have a vector of dates, and I want to find out which of those dates fall within a certain interval of dates. For example:
library(lubridate)
df$dates <- c(ymd("1999-1-1", "2000-1-1", "2001-1-1"))
comparison_interval <- interval(ymd("2000-12-25"), ymd("2005-1-1"))
I want to compare the vector to the stated interval to see which of the dates in that vector are within the interval. Ultimately I want a boolean vector that I can put in the data frame. So in the example above, I would want a vector of FALSE, FALSE, TRUE. I have tried using %within%
df$dates %within% comparison_interval
But it returns an error saying "Argument 1 is not a recognized date-time." What is the best way to do this?
You can do this with a plain logical comparison against the interval's endpoints:
dates <- c(ymd("1999-1-1", "2000-1-1", "2001-1-1"))
results <- dates >= ymd("2000-12-25") & dates <= ymd("2005-1-1")
This gives the logical vector FALSE FALSE TRUE, which you can store as a new column of the data frame.
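The same comparison in base R only (no lubridate), wrapped in a hypothetical helper in_window() and stored as a data frame column, as the question wanted. Both endpoints are inclusive, and an NA date would give NA rather than FALSE.

```r
# Hypothetical helper: inclusive check of whether each date lies in [start, end]
in_window <- function(x, start, end) {
  x >= start & x <= end
}

df <- data.frame(dates = as.Date(c("1999-01-01", "2000-01-01", "2001-01-01")))
df$in_interval <- in_window(df$dates, as.Date("2000-12-25"), as.Date("2005-01-01"))
df$in_interval
# [1] FALSE FALSE  TRUE
```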
I can initialize a data.frame via
df <- data.frame(a=numeric(), b=character())
But how do I define a column of type POSIXct?
df <- data.frame(a=numeric(), b=character(), c=POSIXct())
won't work.
You can create a zero-length POSIXct vector with as.POSIXct(character()):
df <- data.frame(a=numeric(), b=character(), c=as.POSIXct(character()))
Similarly, to add a POSIXct column of NAs to a data frame that already has rows, assign as.POSIXct(NA) to a new column.
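A quick sketch that checks both initializations. The data frame df2 is a hypothetical three-row example:

```r
# Zero-row data frame whose column c is POSIXct
df <- data.frame(a = numeric(), b = character(), c = as.POSIXct(character()))
nrow(df)      # 0
class(df$c)   # "POSIXct" "POSIXt"

# For a data frame that already has rows, a single NA is recycled to every row
df2 <- data.frame(a = 1:3)
df2$when <- as.POSIXct(NA)
class(df2$when)   # "POSIXct" "POSIXt"
```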
A further tip on the initialization above: if you then use rbind() to add rows to this empty data frame, you may get an error like the following with this pattern:
oneDF <- rbind(oneDF,twoDF,stringsAsFactors=FALSE)
Error in as.POSIXct.default(value) :
do not know how to convert 'value' to class "POSIXct"
I found that removing stringsAsFactors=FALSE let the POSIXct value (both the numeric time and the time zone) carry over to the target data frame:
oneDF <- rbind(oneDF,twoDF)
Examining the result:
unclass(oneDF$mytime)
[1] 1282089600
attr(,"tzone")
[1] "GMT"
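A self-contained sketch of the working pattern. The contents of oneDF and twoDF are hypothetical and chosen so that the stored time matches the output above.

```r
# Empty template with a GMT POSIXct column (names taken from the example above)
oneDF <- data.frame(id = numeric(), mytime = as.POSIXct(character(), tz = "GMT"))
twoDF <- data.frame(id = 1, mytime = as.POSIXct("2010-08-18", tz = "GMT"))

# No stringsAsFactors argument is passed to rbind()
oneDF <- rbind(oneDF, twoDF)
class(oneDF$mytime)        # "POSIXct" "POSIXt"
as.numeric(oneDF$mytime)   # 1282089600
```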
I am working with a data set which has all kinds of column classes, including class "Date". I try to assign NA to all empty values in this data set the following way:
data[data==""] <- NA
The date column apparently causes a problem here, because I get the following error:
Error in charToDate(x) :
character string is not in a standard unambiguous format
I don't understand why this error occurs: there are no empty values in the date column, so nothing should happen there. The dates in that column are in the standard format "%Y-%m-%d".
What is the problem here and how can I solve it?
You can create a logical index that selects the columns whose class is not 'Date', and use it to replace '' with NA only in those columns:
indx <- sapply(data, class)!='Date'
data[indx][data[indx]==''] <- NA
The 'Date' class is what creates the problem: data=="" compares every column with "", and for the Date column R tries to convert "" to a Date, which fails. Another option is to convert the data to a matrix, so that all columns become character:
data[as.matrix(data)==''] <- NA
Or, as suggested by @Frank, using replace():
data[indx] <- lapply(data[indx], function(x) replace(x, which(x==''), NA))
Example data:
set.seed(49)
data <- data.frame(Col1= sample(c('',LETTERS[1:3]), 10, replace=TRUE),
Col2=sample(c('',LETTERS[1:2]), 10, replace=TRUE),
Date=seq(as.Date('2010-01-01'),length.out=10, by='day'),
stringsAsFactors=FALSE)
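A runnable variant of the replace() approach on the example data. It swaps the class comparison for inherits(), which is an assumption about what you may want: sapply(data, class) returns a list when some column has more than one class (POSIXct columns, for example), whereas inherits() always gives a single TRUE or FALSE.

```r
set.seed(49)
data <- data.frame(Col1 = sample(c("", LETTERS[1:3]), 10, replace = TRUE),
                   Col2 = sample(c("", LETTERS[1:2]), 10, replace = TRUE),
                   Date = seq(as.Date("2010-01-01"), length.out = 10, by = "day"),
                   stringsAsFactors = FALSE)

# inherits() returns one TRUE/FALSE per column even for multi-class columns,
# so vapply() always yields a plain named logical vector
indx <- !vapply(data, inherits, logical(1), what = "Date")
data[indx] <- lapply(data[indx], function(x) replace(x, which(x == ""), NA))

sum(is.na(data$Col1))   # how many "" values in Col1 became NA
class(data$Date)        # still "Date"
```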
I have a dataframe a that has columns id and date and a second dataframe b that has id as its first column. For each row in b, I'm trying to find all rows in a with the same id, and then find the minimum of the dates. I'm using the code below, but when I run this, I'm getting a numeric as opposed to dates. I'm wondering if someone can help me with this.
class(a$date)
# "Date"
funP <- function(x){
b <- subset(a, id==x[1])
min(b$date)
}
f <- apply(b, 1, funP)
class(f)
# "numeric"
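apply() is what converts the Date values: the results returned by FUN are combined and coerced to a basic vector type, which drops the Date class and leaves the number of days since 1970-01-01. The manual (?apply) says: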
Value:
[...]
In all cases the result is coerced by ‘as.vector’ to one of the basic
vector types [...]
You could convert it back to the Date class:
f <- as.Date(f, origin="1970-01-01")
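Alternatively, the class can be kept from the start by avoiding apply(). This sketch uses hypothetical data frames a and b. lapply() collects one Date per id, and do.call(c, ...) combines them with the Date method of c(), so no conversion back is needed.

```r
# Hypothetical example data
a <- data.frame(id = c(1, 1, 2, 2),
                date = as.Date(c("2020-03-01", "2020-01-15", "2021-06-30", "2021-02-01")))
b <- data.frame(id = c(2, 1))

# Minimum date per id in b; each element of the list is still a Date
f <- do.call(c, lapply(b$id, function(x) min(a$date[a$id == x])))
class(f)   # "Date"
f
# [1] "2021-02-01" "2020-01-15"
```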