R: Best way around as.POSIXct() in apply function - r

I'm trying to set up a new variable that incorporates the difference (in number of days) between a known date and the end of a given year. Dummy data below:
> Date.event <- as.POSIXct(c("12/2/2000","8/2/2001"), format = "%d/%m/%Y", tz = "Europe/London")
> Year = c(2000,2001)
> Dates.test <- data.frame(Date.event,Year)
> Dates.test
Date.event Year
1 2000-02-12 2000
2 2001-02-08 2001
I've tried applying a function to achieve this, but it returns an error
> Time.dif.fun <- function(x) {
+ as.numeric(as.POSIXct(sprintf('31/12/%s', s= x['Year']),format = "%d/%m/%Y", tz = "Europe/London") - x['Date.event'])
+ }
> Dates.test$Time.dif <- apply(
+ Dates.test, 1, Time.dif.fun
+ )
Error in unclass(e1) - e2 : non-numeric argument to binary operator
It seems that apply() does not like as.POSIXct(): when I test a version of the function that only derives the end-of-year date, the result comes back as a number such as 978220800 (for the end of 2000). Is there any way around this? For the real data the function is a bit more complex, including conditional branches that use different variables and sometimes refer to previous rows, which would be very hard to do without apply.

Here are some alternatives:
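1) Your code works with these changes. The original failed because apply() first converts the data frame to a matrix; with a POSIXct column present that is a character matrix, so x['Date.event'] reaches the function as a string and the subtraction fails. We factored out s, not because it is necessary, but only because the following line gets very hard to read without that due to its length. Note that if x is a data frame then so is x["Year"] but x[["Year"]] is a vector as is x$Year. Since the operations are all vectorized we do not need apply.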
Although we have not made this change, it would be a bit easier to define s as s <- paste0(x$Year, "-12-31") in which case we could omit the format argument in the following line owing to the use of the default format.
Time.dif.fun <- function(x) {
s <- sprintf('31/12/%s', x[['Year']])
as.numeric(as.POSIXct(s, format = "%d/%m/%Y", tz = "Europe/London") -x[['Date.event']])
}
Time.dif.fun(Dates.test)
## [1] 323 326
2) Convert to POSIXlt, set the year, month and day to the end of the year and subtract. Note that the year component uses years since 1900 and the mon component uses Jan = 0, Feb = 1, ..., Dec = 11. See ?as.POSIXlt for details on these and other components:
lt <- as.POSIXlt(Dates.test$Date.event)
lt$year <- Dates.test$Year - 1900
lt$mon <- 11
lt$mday <- 31
as.numeric(lt - Dates.test$Date.event)
## [1] 323 326
3) Another possibility is:
with(Dates.test, as.numeric(as.Date(paste0(Year, "-12-31")) - as.Date(Date.event)))
## [1] 323 326
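To see why the original apply() call failed, and for cases where row-wise logic really is needed, here is a minimal sketch. It assumes the Dates.test data from the question; row_fun and the use of vapply() over row indices are illustrative choices, not part of the answers above.

```r
Dates.test <- data.frame(
  Date.event = as.POSIXct(c("12/2/2000", "8/2/2001"),
                          format = "%d/%m/%Y", tz = "Europe/London"),
  Year = c(2000, 2001)
)

# apply() starts by calling as.matrix() on the data frame; a POSIXct column
# forces a character matrix, so every row reaches the function as strings
m <- as.matrix(Dates.test)
is.character(m)

# hypothetical row-wise helper: index into the data frame so classes survive
row_fun <- function(i, df) {
  end_of_year <- as.POSIXct(paste0(df$Year[i], "-12-31"), tz = "Europe/London")
  as.numeric(difftime(end_of_year, df$Date.event[i], units = "days"))
}

Dates.test$Time.dif <- vapply(seq_len(nrow(Dates.test)), row_fun,
                              numeric(1), df = Dates.test)
Dates.test$Time.dif
```

Because row_fun receives the row index rather than the row itself, it can also look at df[i - 1, ] when a calculation depends on the previous row.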

You could use the difftime function:
Dates.test$diff_days <- difftime(as.POSIXct(paste0(Dates.test[,2],"-12-31"),format = "%Y-%m-%d", tz = "Europe/London"),Dates.test[,1],units="days")

You can use ISOdate() to build the end-of-year date, and difftime(..., units = 'days') to get the days until the end of the year.
From ?difftime:
Limited arithmetic is available on "difftime" objects: they can be
added or subtracted, and multiplied or divided by a numeric vector.
If you want to do more than the limited arithmetic, just coerce with as.numeric(), but you will have to stick with whatever units you specified.
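By convention, you may wish to use the beginning of the next year (midnight at the end of New Year's Eve) as your endpoint for that year. Note that ISOdate() defaults to 12:00 GMT rather than midnight, which is why the differences in the example end in .5; use ISOdatetime() with hour, min and sec set to 0 if you want midnight exactly. For example: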
Dates.test <- data.frame(
Date.event = as.POSIXct(c("12/2/2000","8/2/2001"),
format = "%d/%m/%Y", tz = "Europe/London")
)
# data.table::year() or lubridate::year() would also work; a base R equivalent:
year <- function(x) as.POSIXlt(x)$year + 1900L
Dates.test$Date.end <- ISOdate(year(Dates.test$Date.event)+1,1,1)
# if you don't want class 'difftime', wrap it in as.numeric(), as in:
Dates.test$Date.diff <- as.numeric(
difftime(Dates.test$Date.end,
Dates.test$Date.event,
units='days')
)
Dates.test
# Date.event Date.end Date.diff
# 1 2000-02-12 2001-01-01 12:00:00 324.5
# 2 2001-02-08 2002-01-01 12:00:00 327.5
The apply() family is basically a clean way of writing for loops, and you should strive for more efficient, vectorized solutions where they exist.
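The "limited arithmetic" on difftime objects quoted from ?difftime can be sketched as follows. The dates are the first event and end of year from the question; the variable names are only illustrative.

```r
d <- difftime(as.Date("2000-12-31"), as.Date("2000-02-12"), units = "days")

# dividing (or multiplying) by a number keeps the difftime class and its units
half <- d / 2
class(half)

# the same interval as a plain number, expressed in other units
as.numeric(d, units = "weeks")
as.numeric(d, units = "hours")

# units() can also be reassigned in place; the stored value is rescaled
units(d) <- "hours"
d
```

Passing units explicitly to as.numeric() avoids depending on whatever unit difftime picked automatically.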

Related

Change date format from YYYYQQ or YYYY to mm/dd/yyyy

I have a column of data with two different formats: yyyyqq and yyyy. I want to reformat the column to mm/dd/yyyy.
Whenever I use the following command as.Date(as.character(x), format = "%y") the output is yyyy-12-03. I cannot get any other combination of as.Date to work.
I'm sure this is a simple fix, but how do I do this?
Using the following assumptions:
2021 <- 2021-01-01
2021Q1 <- 2021-01-01
2021Q2 <- 2021-04-01
2021Q3 <- 2021-07-01
2021Q4 <- 2021-10-01
You can use the following:
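as.Date(paste(substr(x, 1, 4), 3 * as.numeric(ifelse(nchar(x) > 4, substr(x, 6, 6), "1")) - 2, "1", sep = "-"))
Here ifelse() is used rather than max() so that the expression works element by element on a whole column.
Edit: You can wrap this in format(..., "%m%d%Y"), but as already said in the comments I would not recommend it.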
Here is a function which translates to the first (if frac=0) or last (if frac=1) date of the period. First append a 01 (first of the period) or 04 (last of the period) to the end of the input. That puts them all in yyyyqq format possibly with junk at the end. Then yearqtr will convert to a yearqtr object ignoring any junk. Then convert that to a Date object. as.Date.yearqtr uses the same meaning for frac. Finally format it as a character string in mm/dd/yyyy format.
(One alternative is to replace the format(...) line with chron::as.chron() in which case it will render in the same manner, since the format specified is the default for chron, but be a chron dates object which can be manipulated more conveniently, e.g. it sorts chronologically, than a character string.)
library(zoo)
to_date <- function(x, frac = 1) x |>
paste0(if (frac == 1) "04" else "01") |>
as.yearqtr("%Y%q") |>
as.Date(frac = frac) |>
format("%m/%d/%Y")
# test data
dd <- data.frame(x = c(2001, 2003, 200202, 200503))
transform(dd, first = to_date(x, frac = 0), last = to_date(x, frac = 1))
giving:
x first last
1 2001 01/01/2001 12/31/2001
2 2003 01/01/2003 12/31/2003
3 200202 04/01/2002 06/30/2002
4 200503 07/01/2005 09/30/2005
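If adding a dependency on zoo is not an option, the same first and last dates can be sketched in base R. This is an illustrative alternative, not the answer's method: period_bounds is a hypothetical helper, and it assumes inputs are either yyyy or yyyyqq with qq between 01 and 04.

```r
period_bounds <- function(x) {
  x <- as.character(x)
  yr <- as.integer(substr(x, 1, 4))
  has_q <- nchar(x) == 6                       # yyyyqq rather than yyyy
  q <- ifelse(has_q, as.integer(substr(x, 5, 6)), NA_integer_)
  first_month <- ifelse(has_q, 3L * q - 2L, 1L)
  n_months <- ifelse(has_q, 3L, 12L)
  first <- as.Date(sprintf("%d-%02d-01", yr, first_month))
  # zero-based month index of the first month after the period
  next_idx <- first_month + n_months - 1L
  next_first <- as.Date(sprintf("%d-%02d-01",
                                yr + next_idx %/% 12L,
                                next_idx %% 12L + 1L))
  data.frame(first = format(first, "%m/%d/%Y"),
             last = format(next_first - 1, "%m/%d/%Y"),  # day before next period
             stringsAsFactors = FALSE)
}

res <- period_bounds(c(2001, 2003, 200202, 200503))
res
```

The last day is found as the day before the next period starts, which avoids hard-coding month lengths or leap years.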

add_months function in Spark R

I have a variable of the form "2020-09-01". I need to increase and decrease it by 3 months and by 5 months and store the results in other variables. I need the syntax for SparkR, but any other method will also work. Thanks.
In R, the following code works fine:
y <- as.Date(load_date,"%Y-%m-%d") %m+% months(i)
The code below didn't work. Error says
unable to find an inherited method for function ‘add_months’ for signature ‘"Date", "numeric"
loaddate = 202009
year <- substr(loaddate,1,4)
month <- substr(loaddate,5,6)
load_date <- paste(year,month,"01",sep = "-")
y <- as.Date(load_date,"%Y%m%d")
y1 <- add_months(y,-3)
Expected Result - 2020-06-01
The lubridate package makes dealing with dates much easier. Here I have shuffled as.Date up a step, then simply subtract 3 months.
library(lubridate)
loaddate = 202009
year <- substr(loaddate,1,4)
month <- substr(loaddate,5,6)
load_date <- as.Date(paste(year,month,"01",sep = "-"))
new_date <- load_date - months(3)
new_date
Output:
Date[1:1], format: "2020-06-01"
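For a plain R Date, the shift can also be done without lubridate by using seq() with a month step. The sketch below is base R only and does not touch Spark; shift_months is a hypothetical helper name. It is safe for first-of-month dates like the one in the question, while days 29 to 31 may roll over into the following month.

```r
# hypothetical helper: move a single Date forward or backward by n months
shift_months <- function(date, n) {
  seq(date, by = paste(n, "months"), length.out = 2)[2]
}

load_date <- as.Date("2020-09-01")
minus3 <- shift_months(load_date, -3)
plus5 <- shift_months(load_date, 5)
minus3
plus5
```

The error in the question arises because no add_months method exists for a base R Date, as the message itself states, so a date arithmetic approach like this one has to be applied on the R side.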

Data frame of departure and return dates, how do I get a list of all dates away?

I'm stuck on a problem calculating travel dates. I have a data frame of departure dates and return dates.
Departure Return
1 7/6/13 8/3/13
2 7/6/13 8/3/13
3 6/28/13 8/7/13
I want to create and pass a function that will take these dates and form a list of all the days away. I can do this individually by turning each column into dates.
## Turn the departure and return dates into a readable format
dept_dates <- as.Date(travelDates$Departure, format = "%m/%d/%y")
retn_dates <- as.Date(travelDates$Return, format = "%m/%d/%y")
travel_dates <- na.omit(data.frame(dept_dates,retn_dates))
seq(from = travel_dates[1,1], to = travel_dates[1,2], by = 1)
This gives me [1] "2013-07-06" "2013-07-07"... and so on. I want to scale to cover the whole data frame, but my attempts have failed.
Here's one that I thought might work.
days_abroad <- data.frame()
get_days <- function(x,y){
all_days <- seq(from = x, to = y, by =1)
c(days_abroad, all_days)
return(days_abroad)
}
get_days(travel_dates$dept_dates, travel_dates$retn_dates)
I get this error:
Error in seq.Date(from = x, to = y, by = 1) : 'from' must be of length 1
There's probably a lot wrong with this, but what I would really like help on is how to run multiple dates through seq().
Sorry, if this is simple (I'm still learning to think in r) and sorry too for any breaches in etiquette. Thank you.
EDIT: updated as per OP comment.
How about this:
travel_dates[] <- lapply(travel_dates, as.Date, format="%m/%d/%y")
dts <- with(travel_dates, mapply(seq, Departure, Return, by="1 day"))
This produces a list with as many items as you had rows in your initial table. You can then summarize (this will be data.frame with the number of times a date showed up):
data.frame(count=sort(table(Reduce(append, dts)), decreasing=TRUE))
# count
# 2013-07-06 3
# 2013-07-07 3
# 2013-07-08 3
# 2013-07-09 3
# ...
OLD CODE:
The following gets the #days of each trip, rather than a list with the dates.
transform(travel_dates, days_away=Return - Departure + 1)
Which produces:
# Departure Return days_away
# 1 2013-07-06 2013-08-03 29 days
# 2 2013-07-06 2013-08-03 29 days
# 3 2013-06-28 2013-08-07 41 days
If you want to put days_away in a separate list, that is trivial, though it seems more useful to have it as an additional column to your data frame.
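A self-contained variant of the mapply() approach, using the three rows from the question, might look like the sketch below. Map() is used here because it never simplifies its result, so the output stays a list of Date vectors even when every trip has the same length; days_list and all_days are illustrative names.

```r
travel_dates <- data.frame(
  Departure = c("7/6/13", "7/6/13", "6/28/13"),
  Return = c("8/3/13", "8/3/13", "8/7/13")
)
travel_dates[] <- lapply(travel_dates, as.Date, format = "%m/%d/%y")

# one Date sequence per row, kept as a list
days_list <- Map(function(from, to) seq(from, to, by = "day"),
                 travel_dates$Departure, travel_dates$Return)
lengths(days_list)

# every distinct day away, in chronological order
all_days <- sort(unique(do.call(c, days_list)))
range(all_days)
```

Combining with do.call(c, ...) rather than unlist() keeps the Date class on the combined vector.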

Converting Vector into Dates in R

I have a vector of dates of the form BW01.68, BW02.68, ... , BW26.10. BW stands for "bi-week", so for example, "BW01.68" represents the first bi-week of the year 1968, and "BW26.10" represents the 26th (and final) bi-week of the year 2010. Using R, how could I convert this vector into actual dates, say, of the form 01-01-1968, 01-15-1968, ... , 12-16-2010? Is there a way for R to know exactly which dates correspond to each bi-week? Thanks for any help!
An alternative solution.
biwks <- c("BW01.68", "BW02.68", "BW26.10")
bw <- substr(biwks,3,4)
yr <- substr(biwks,6,7)
yr <- paste0(ifelse(as.numeric(yr) > 15,"19","20"),yr)
# %j in the date format is the day of the year (001-366)
as.Date(paste(((as.numeric(bw)-1) * 14) + 1,yr,sep="-"),format="%j-%Y")
#[1] "1968-01-01" "1968-01-15" "2010-12-17"
Though I will note that a 'bi-week' seems a strange measure and I can't be sure that just using 14 day blocks is what is intended in your work.
You can make this code a lot shorter. I have spaced out each step to help understanding but you could finish it off in one (long) line of code.
bw <- c('BW01.68', 'BW02.68','BW26.10','BW22.13')
# the gsub will ensure that bw01.1 the same as bw01.01, bw1.01, or bw1.1
#isolating year no
yearno <- as.numeric(
gsub(
x = bw,
pattern = "BW.*\\.",
replacement = ""
)
)
#isolating and converting bw to no of days
dayno <- 14 * as.numeric(
gsub(
x = bw,
pattern = "BW|\\.[[:digit:]]{1,2}",
replacement = ""
)
)
#cutoff year chosen as 15
yearno <- yearno + 1900
yearno[yearno < 1915] <- yearno[yearno < 1915] + 100
# identifying dates
dates <- as.Date(paste0('01/01/',yearno),"%d/%m/%Y") + dayno
# specifically identifying the Monday of that week
mondaydates <- dates - as.numeric(strftime(dates,'%w')) + 1
Output -
> bw
[1] "BW01.68" "BW02.68" "BW26.10" "BW22.13"
> dates
[1] "1968-01-15" "1968-01-29" "2010-12-31" "2013-11-05"
> mondaydates
[1] "1968-01-15" "1968-01-29" "2010-12-27" "2013-11-04"
PS: Just be careful that you're aligned with how bw is measured in your data and whether you're translating it correctly. You should be able to manipulate this to get it to work, for instance you might encounter a bw 27.
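The first answer's convention (each bi-week starts 14 days after the previous one, counting from 1 January) can be wrapped in a small function. This is a sketch under that assumption; biweek_start and its pivot argument are hypothetical names, with two-digit years above the pivot read as 19xx and the rest as 20xx.

```r
biweek_start <- function(bw, pivot = 15L) {
  pattern <- "^BW([0-9]+)\\.([0-9]+)$"
  n <- as.integer(sub(pattern, "\\1", bw))    # bi-week number
  yy <- as.integer(sub(pattern, "\\2", bw))   # two-digit year
  year <- ifelse(yy > pivot, 1900L + yy, 2000L + yy)
  # bi-week 1 starts on 1 January; each later one starts 14 days after the last
  as.Date(paste0(year, "-01-01")) + (n - 1L) * 14L
}

starts <- biweek_start(c("BW01.68", "BW02.68", "BW26.10"))
starts
```

If the data define bi-weeks differently, for example aligned to Mondays, only the final line of the function needs to change.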

Calculate ages in R

I have two data frames in R. One frame has a persons year of birth:
YEAR
/1931
/1924
and then another column shows a more recent time.
RECENT
09/08/2005
11/08/2005
What I want to do is subtract the years so that I can calculate their age in number of years, however I am not sure how to approach this. Any help please?
The following function takes vectors of Date objects and calculates the ages, correctly accounting for leap years. It seems to be a simpler solution than any of the other answers.
age = function(from, to) {
from_lt = as.POSIXlt(from)
to_lt = as.POSIXlt(to)
age = to_lt$year - from_lt$year
ifelse(to_lt$mon < from_lt$mon |
(to_lt$mon == from_lt$mon & to_lt$mday < from_lt$mday),
age - 1, age)
}
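A quick check of this function on a few edge cases is sketched below. The function is restated so the snippet runs on its own (with the intermediate variable renamed to before_birthday for readability), and the test dates are made up for illustration.

```r
age <- function(from, to) {
  from_lt <- as.POSIXlt(from)
  to_lt <- as.POSIXlt(to)
  years <- to_lt$year - from_lt$year
  # TRUE where the birthday has not yet been reached in the reference year
  before_birthday <- to_lt$mon < from_lt$mon |
    (to_lt$mon == from_lt$mon & to_lt$mday < from_lt$mday)
  ifelse(before_birthday, years - 1, years)
}

born <- as.Date(c("2000-02-29", "2000-02-29", "1931-10-01"))
ref <- as.Date(c("2001-02-28", "2001-03-01", "2005-09-08"))
ages <- age(born, ref)
ages
```

Under this rule, someone born on 29 February is treated as turning a year older on 1 March in non-leap years.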
You can solve this with the lubridate package.
> library(lubridate)
I don't think /1931 is a common date class. So I'll assume all the entries are character strings.
> RECENT <- data.frame(recent = c("09/08/2005", "11/08/2005"))
> YEAR <- data.frame(year = c("/1931", "/1924"))
First, let's notify R that the recent dates are dates. I'll assume the dates are in month/day/year order, so I use mdy(). If they're in day/month/year order just use dmy().
> RECENT$recent <- mdy(RECENT$recent)
recent
1 2005-09-08
2 2005-11-08
Now, let's turn the years into numbers so we can do some math with them.
> YEAR$year <- as.numeric(substr(YEAR$year, 2, 5))
Now just do the math. year() extracts the year value of the RECENT dates.
> year(RECENT$recent) - YEAR
year
1 74
2 81
p.s. if your year entries are actually full dates, you can get the difference in years with
> YEAR1 <- data.frame(year = mdy("01/08/1931","01/08/1924"))
> as.period(RECENT$recent - YEAR1$year, units = "year")
[1] 74 years and 8 months 81 years and 10 months
I use a custom function, see code below, convenient to use in mutate and quite flexible (you'll need the lubridate package).
Examples (outputs that depend on the current date reflect when this was written, apparently late 2017)
get_age("2000-01-01")
# [1] 17
get_age(lubridate::as_date("2000-01-01"))
# [1] 17
get_age("2000-01-01","2015-06-15")
# [1] 15
get_age("2000-01-01",dec = TRUE)
# [1] 17.92175
get_age(c("2000-01-01","2003-04-12"))
# [1] 17 14
get_age(c("2000-01-01","2003-04-12"),dec = TRUE)
# [1] 17.92176 14.64231
Function
#' Get age
#'
#' Returns age, decimal or not, from single value or vector of strings
#' or dates, compared to a reference date defaulting to now. Note that
#' default is NOT the rounded value of decimal age.
#' @param from_date vector or single value of dates or characters
#' @param to_date date when age is to be computed
#' @param dec return decimal age or not
#' @examples
#' get_age("2000-01-01")
#' get_age(lubridate::as_date("2000-01-01"))
#' get_age("2000-01-01","2015-06-15")
#' get_age("2000-01-01",dec = TRUE)
#' get_age(c("2000-01-01","2003-04-12"))
#' get_age(c("2000-01-01","2003-04-12"),dec = TRUE)
get_age <- function(from_date,to_date = lubridate::now(),dec = FALSE){
if(is.character(from_date)) from_date <- lubridate::as_date(from_date)
if(is.character(to_date)) to_date <- lubridate::as_date(to_date)
if (dec) {
age <- lubridate::interval(start = from_date, end = to_date) / (lubridate::days(365) + lubridate::hours(6))
} else {
age <- lubridate::year(lubridate::as.period(lubridate::interval(start = from_date, end = to_date)))
}
age
}
You can do some formatting:
as.numeric(format(as.Date("01/01/2010", format="%m/%d/%Y"), format="%Y")) - 1930
With your data:
> yr <- c(1931, 1924)
> recent <- c("09/08/2005", "11/08/2005")
> as.numeric(format(as.Date(recent, format="%m/%d/%Y"), format="%Y")) - yr
[1] 74 81
Since you have your data in a data.frame (I'll assume that it's called df), it will be more like this:
as.numeric(format(as.Date(df$recent, format="%m/%d/%Y"), format="%Y")) - df$year
Given the data in your example:
> m <- data.frame(YEAR=c("/1931", "/1924"),RECENT=c("09/08/2005","11/08/2005"))
> m
YEAR RECENT
1 /1931 09/08/2005
2 /1924 11/08/2005
Extract year with the strptime function:
> strptime(m[,2], format = "%m/%d/%Y")$year - strptime(m[,1], format = "/%Y")$year
[1] 74 81
Based on the previous answer, convert your columns to date objects and subtract. Some conversion of types between character and numeric is necessary:
> foo=data.frame(RECENT=c("09/08/2005","11/08/2005"),YEAR=c("/1931","/1924"))
> foo
RECENT YEAR
1 09/08/2005 /1931
2 11/08/2005 /1924
> foo$RECENTd = as.Date(foo$RECENT, format="%m/%d/%Y")
> foo$YEARn = as.numeric(substr(foo$YEAR,2,999))
> foo$AGE = as.numeric(format(foo$RECENTd,"%Y")) - foo$YEARn
> foo
RECENT YEAR RECENTd YEARn AGE
1 09/08/2005 /1931 2005-09-08 1931 74
2 11/08/2005 /1924 2005-11-08 1924 81
Note I've assumed you have that slash in your year column.
Also, a tip for when asking questions about dates: include a day that is past the twelfth so we know whether you are a month/day/year person or a day/month/year person.
I think this might be a bit more intuitive and requires no formatting or stripping:
as.numeric(as.Date("2002-02-02") - as.Date("1924-08-03")) / 365
gives output:
77.55342
Then you can use floor(), round(), or ceiling() to round to a whole number.
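One caveat with dividing by 365 is that leap days accumulate, so the result can tip over to the next whole year shortly before the actual birthday. The sketch below uses the same birth date with a reference date chosen, as an assumption for illustration, to fall one day before the 78th birthday.

```r
birth <- as.Date("1924-08-03")
ref <- as.Date("2002-08-02")   # one day before the 78th birthday

# naive approach: total days divided by 365
naive <- floor(as.numeric(ref - birth) / 365)

# calendar approach: subtract 1 if the birthday has not yet occurred
b <- as.POSIXlt(birth)
r <- as.POSIXlt(ref)
not_yet <- (r$mon < b$mon) | (r$mon == b$mon & r$mday < b$mday)
exact <- r$year - b$year - not_yet

c(naive = naive, exact = exact)
```

Dividing by 365.25 narrows the gap but remains an approximation, so a calendar comparison is the safer choice when whole years matter.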
Really solid way that also supports vectors using the lubridate package:
age <- function(date.birth, date.ref = Sys.Date()) {
if (length(date.birth) > 1 & length(date.ref) == 1) {
date.ref <- rep(date.ref, length(date.birth))
}
# encode month and day as the integer MMDD; paste0(month, day) would turn both Jan 12 and Nov 2 into "112"
date.birth.monthdays <- month(date.birth) * 100 + day(date.birth)
date.ref.monthdays <- month(date.ref) * 100 + day(date.ref)
age.calc <- 0
for (i in 1:length(date.birth)) {
if (date.birth.monthdays[i] <= date.ref.monthdays[i]) {
# birthday has already occurred in the reference year
age.calc[i] <- year(date.ref[i]) - year(date.birth[i])
} else {
age.calc[i] <- year(date.ref[i]) - year(date.birth[i]) - 1
}
}
age.calc
}
This also accounts for leap years. I just check if someone has had a birthday already.
