Parsing dates in R

I have this input: 2020-03-11, and I want to return 03-11. How can I do that in R? For example, if I only want the month, I can get it this way: Month = lubridate::month(data)

Here are a few ways. All are vectorized, i.e. x can be a vector of such strings. No packages are used.
x <- "2020-03-11" # input
substring(x, 6)
sub("\\d+-", "", x)
trimws(trimws(x, "left", "\\d"), "left", "-")
format(as.Date(x), "%m-%d")
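Assuming the input is always a well-formed YYYY-MM-DD string, the four one-liners above should agree on a vector input. Here is a quick check; the second date is just an arbitrary example value. Only the as.Date() version actually parses the date, while the string methods simply cut text.

```r
# Four base-R ways to drop the year from "YYYY-MM-DD" strings
x <- c("2020-03-11", "1999-12-31")

r1 <- substring(x, 6)                                 # keep characters 6 onwards
r2 <- sub("\\d+-", "", x)                             # drop the leading "YYYY-"
r3 <- trimws(trimws(x, "left", "\\d"), "left", "-")   # strip digits, then the dash (R >= 3.6.0)
r4 <- format(as.Date(x), "%m-%d")                     # parse, then reformat

print(r1)
stopifnot(identical(r1, r2), identical(r1, r3), identical(r1, r4))
```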
A different approach is to create an S3 subclass of Date that displays just the month and day while storing the full date, so that the date is recoverable. as.md constructs an object of this new class, as.Date.md converts it back to Date class, and format.md formats it. If we print such an object, R looks for print.md; since we have not defined one, it uses print.Date, the print method of md's superclass, and that method calls format, which invokes format.md.
as.md <- function(x, ...) structure(as.Date(x), class = c("md", "Date"))
as.Date.md <- function(x, ...) structure(x, class = "Date")
format.md <- function(x, format = "%m-%d", ...) format(as.Date(x), format = format, ...)
y <- as.md(x)
y
## [1] "03-11"
as.Date(y) # recover the full date
## [1] "2020-03-11"
data.frame(x, y)
##            x     y
## 1 2020-03-11 03-11
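Because md inherits from Date, the same three methods work unchanged on a vector, and format.md still accepts a different format string. Below is a small self-contained check that repeats the definitions above; the dates are arbitrary examples.

```r
# The S3 "md" class from the answer above, repeated so this runs on its own
as.md <- function(x, ...) structure(as.Date(x), class = c("md", "Date"))
as.Date.md <- function(x, ...) structure(x, class = "Date")
format.md <- function(x, format = "%m-%d", ...) format(as.Date(x), format = format, ...)

y <- as.md(c("2020-03-11", "2021-12-25"))
print(format(y))           # month-day display
print(format(y, "%d/%m"))  # the default format can still be overridden
print(as.Date(y))          # the full dates are still stored
```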

One easy way to do this is:
library(lubridate)
x <- "2020-03-11"
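month <- month(x)
day <- day(x)
paste(month, "-", day)
And here is the result:
"3 - 11"
So, basically, I used lubridate to extract the month and day and then used the paste function to put the two together. Note that paste() separates its arguments with a space by default and month() returns an unpadded number, hence "3 - 11" rather than "03-11"; sprintf("%02d-%02d", month, day) gives "03-11".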
Another alternative is to use the code below (no lubridate):
format(as.Date(x), "%m-%d")
Here is the result:
"03-11"
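If you prefer the "extract the parts, then glue them" approach without lubridate, base R's POSIXlt exposes the date components directly, and sprintf() adds the zero padding that paste() lacks. A minimal sketch; the variable names are only illustrative:

```r
# Base-R analogue of extracting month and day and then joining them
x <- c("2020-03-11", "2020-11-05")
lt <- as.POSIXlt(as.Date(x))             # POSIXlt stores the components separately
month <- lt$mon + 1L                     # $mon is 0-based (January = 0)
day <- lt$mday                           # day of the month, 1-31
md <- sprintf("%02d-%02d", month, day)   # zero-pad both parts and join with "-"
print(md)
```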

Related

How to get the start date of the astronomical season from a date

I tried using the lubridate::floor_date function to get the first date of the season that contains my input date, e.g.:
x <- ymd_hms("2008-08-03 12:01:59.23")
This date is in the summer of 2008, which starts on 21-06-2008 and ends on 20-09-2008. So I expected that running
lubridate::floor_date(x, "season")
would give me
21-06-2008
but instead I get
"2008-06-01 UTC"
which is not the beginning of the summer of 2008.
Am I using the wrong function, or is there another way to achieve this?
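As you are using lubridate, you can build a function on top of floor_date(). floor_date(x, "season") floors to the first day of the meteorological season (1 March, 1 June, 1 September or 1 December), so the function below moves that date to the 21st and steps back one season when the result would be later than the input. The 21st is an approximation: the actual equinox and solstice dates vary by a day or two.
library(lubridate)
astronomical_floor <- function(x) {
  stopifnot(inherits(x, c("Date", "POSIXct", "POSIXt")))
  # Compare Dates with Dates: comparing a Date with a POSIXct
  # mixes days with seconds and gives wrong results
  x <- as.Date(x)
  # First day of the meteorological season, moved to the 21st
  season_21 <- function(d) {
    d |>
      floor_date("season") |>
      format("%Y-%m-21") |>
      ymd()
  }
  astronomical_floor <- season_21(x)
  # Make sure floor not greater than date
  # e.g. 2022-06-05 should return 2022-03-21
  # not 2022-06-21
  # (works element-wise, so x can be a vector)
  too_late <- which(astronomical_floor > x)
  if (length(too_late) > 0) {
    astronomical_floor[too_late] <- season_21(x[too_late] %m+% months(-1))
  }
  return(astronomical_floor)
}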
x <- ymd_hms("2008-08-03 12:01:59.23")
astronomical_floor(x) # "2008-06-21"
astronomical_floor(as.Date("2020-01-01")) # "2019-12-21"
astronomical_floor(x = ymd("2022-06-05")) # "2022-03-21"
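For comparison, here is a base-R sketch of the same idea without lubridate, using the same fixed-21st approximation. It counts months since year 0 so that everything stays vectorised; the name astronomical_floor_base is hypothetical, not an existing function. For the four example dates it should give "2008-06-21", "2019-12-21", "2022-03-21" and "2022-06-21".

```r
# Hypothetical base-R helper: latest 21 Mar/Jun/Sep/Dec on or before each date.
# Uses the same fixed "21st" approximation as the lubridate answer above.
astronomical_floor_base <- function(x) {
  lt <- as.POSIXlt(as.Date(x))
  # Months since year 0 (January = 0), as integers, so everything is vectorised
  k <- (lt$year + 1900L) * 12L + lt$mon
  # Before the 21st, this month's candidate start has not happened yet
  k <- k - (lt$mday < 21L)
  # Step back to the nearest March, June, September or December (k %% 3 == 2)
  k <- k - (k - 2L) %% 3L
  as.Date(sprintf("%04d-%02d-21", k %/% 12L, k %% 12L + 1L))
}

d <- as.Date(c("2008-08-03", "2020-01-01", "2022-06-05", "2022-06-21"))
print(astronomical_floor_base(d))
```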

using sapply on a vector of dates

I have a function that adds a random integer to a date:
rand_to_date = function(date){
newdate = as.Date(date) + sample(1:30, 1)
return(as.Date(newdate))
}
which works fine. However, if I attempt to use sapply to apply this function to a vector of dates, e.g.
test_dates = c('2001-01-01', '2002-02-02', '2003-03-03', '2004-04-04')
sapply will not return a vector of output in date format:
sapply(test_dates,rand_to_date)
2001-01-01 2002-02-02 2003-03-03 2004-04-04
11329 11748 12115 12513
In contrast, lapply will return a list of dates. However, applying unlist to this output once again gives me a vector of numbers rather than dates. Nor does
sapply(sapply(test_dates, rand_to_date), as.Date)
work. What's the simplest way for me to give a vector of these randomized dates as output?
1) Owing to the existence of the c.Date method, use lapply and then c. We have also simplified rand_to_date and added set.seed to make it reproducible:
rand_to_date <- function(date) as.Date(date) + sample(30, 1)
set.seed(123)
test_dates <- c('2001-01-01', '2002-02-02', '2003-03-03', '2004-04-04')
do.call("c", lapply(test_dates, rand_to_date))
## [1] "2001-01-10" "2002-02-26" "2003-03-16" "2004-05-01"
2) Alternatively, we could make rand_to_date vectorized from the start, like this:
rand_to_date <- function(date) as.Date(date) + sample(30, length(date), TRUE)
set.seed(123)
test_dates <- c('2001-01-01', '2002-02-02', '2003-03-03', '2004-04-04')
rand_to_date(test_dates)
## [1] "2001-01-10" "2002-02-26" "2003-03-16" "2004-05-01"
Unfortunately sapply discards attributes, including the S3 class. This is unrelated to your function: sapply(test_dates, as.Date) loses the class in the same way.
You need to add the class again:
structure(sapply(test_dates, rand_to_date), class = "Date")
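To see the attribute loss directly: sapply() simplifies its list of results with unlist(), which drops the class and leaves only the day counts. A small sketch, using just two of the dates, that restores the class and compares the result with the lapply() plus c() route from the first answer:

```r
# sapply() simplifies via unlist(), which drops the "Date" class attribute
test_dates <- c("2001-01-01", "2002-02-02")
s <- sapply(test_dates, as.Date)
print(class(s))   # only the day counts survive

# Restoring the class keeps the underlying numbers unchanged
fixed <- structure(unname(s), class = "Date")
print(fixed)

# Building the result without simplification keeps the class throughout
kept <- do.call(c, lapply(test_dates, as.Date))
stopifnot(identical(fixed, kept))
```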
You can just convert the number back to a date.
as.Date(sapply(test_dates, rand_to_date), origin = "1970-01-01")
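You can use as.character() within sapply:
sapply(test_dates, function(v) as.character(rand_to_date(v)), USE.NAMES = FALSE)
where as.character(rand_to_date(v)) returns each date as a character string, so there is no class for sapply to drop; the result is a character vector of dates rather than a numeric vector.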

Converting date in For Loop in R - origin must be supplied

I have a sequence of dates in R, and for each date I need to get the year, month, and day. I tried to use the strftime function to print out the year, but R behaves very strangely. This code fails:
# sequence of dates
dates <- seq(as.Date("1987-03-29"), as.Date("1991-12-31"), by=1)
# this fails with "'origin' must be supplied" error:
for (d in dates) {
year <- strftime(d, "%Y")
print(year)
}
The exact error message is: Error in as.POSIXlt.numeric(x, tz = tz) : 'origin' must be supplied
On the other hand, this code works without any error:
# sequence of dates
dates <- seq(as.Date("1987-03-29"), as.Date("1991-12-31"), by=1)
# this works
for (i in 1: length(dates)) {
year <- strftime(dates[i], "%Y")
print(year)
}
Why does the first example fail while the second one works? I suspect that in the first example R is trying to convert my date to some kind of POSIXct object and in the second example it doesn't? I'm confused why there's any difference and I'd appreciate an explanation of what's going on. I'm using R version 3.2.2.
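The for loop iterates over the underlying numbers and drops the "Date" class, so d is a plain number (days since 1970-01-01) that strftime cannot convert without an origin; indexing with dates[i], as in the second example, keeps the class. (In R 4.3.0 and later an origin is no longer required for numeric input, so the first loop runs without an error but prints a wrong year, because the number is then treated as seconds since 1970-01-01.) Here are two approaches.
Below the comments were removed and only the code lines marked ## have been changed.
1) list. Use a list like this: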
dates <- seq(as.Date("1987-03-29"), as.Date("1991-12-31"), by=1)
for (d in as.list(dates)) { ##
year <- strftime(d, "%Y")
print(year)
}
2) as.Date. Or convert d back to "Date" class:
dates <- seq(as.Date("1987-03-29"), as.Date("1991-12-31"), by=1)
for (d in dates) {
year <- strftime(as.Date(d, origin = "1970-01-01"), "%Y") ##
print(year)
}
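Since the goal is the year, month and day of each date, a loop may not be needed at all: format() is vectorised over Date vectors. A minimal sketch; the data frame name parts and its column names are illustrative:

```r
dates <- seq(as.Date("1987-03-29"), as.Date("1991-12-31"), by = 1)

# for () iterates over the bare numbers, so the class is gone inside the loop
for (d in dates[1]) print(class(d))

# No loop needed: format() works on the whole Date vector at once
parts <- data.frame(
  date  = dates,
  year  = as.integer(format(dates, "%Y")),
  month = as.integer(format(dates, "%m")),
  day   = as.integer(format(dates, "%d"))
)
print(head(parts, 3))
```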

create lag variable of xts object using $ vs. [] notation

I am trying to create a lagged vector within an xts object using the lag function. It works when defining the new vector within the xts object using $ notation (e.g. x.ts$r1_lag), but it does not when defining the new variable using square brackets, i.e. x.ts[,"r1_lag"]. See code below:
library(xts)
x <- data.frame(date=seq(as.Date('2015-01-01'), by='days', length=100),
runif(1e2), runif(1e2), runif(1e2))
colnames(x) <- c("date", "r1", "r2", "r3")
#the following command works
x.ts <- xts(x, order.by=x$date)
x.ts$r1_lag <- lag(x.ts$r1)
# but the following does not (says subscript is out of bounds)
x.ts <- xts(x, order.by=x$date)
x.ts[,"r1_lag"] <- lag(x.ts[,"r1"])
I need to use [] notation rather than $ notation to reference the vectors because I want to run the lag transformation on vectors in more than one xts object (vectors within a list of multiple xts objects), and I can't define the new vectors with $ notation inside a loop like the stylized one below:
for (i in letters) {
for (j in variables) {
macro.set.ts$i$paste(j,"_L1",sep="") <- lag(macro.set.ts[[i]][,j])
macro.set.ts$i$paste(j,"_L2",sep="") <- lag(macro.set.ts[[i]][,j], 2)
macro.set.ts$i$paste(j,"_L4",sep="") <- lag(macro.set.ts[[i]][,j], 4)
}
}
Thanks!
You don't need to use [<-.xts. You can use merge instead:
for (i in letters) {
for (j in variables) {
# create all lags
mst_ij <- macro.set.ts[[i]][,j]
jL <- merge(lag(mst_ij), lag(mst_ij, 2), lag(mst_ij, 4))
colnames(jL) <- paste(j, c("L1","L2","L4"), sep="_")
# merge back with original data
macro.set.ts[[i]] <- merge(macro.set.ts[[i]], jL)
}
}
The error is not related to the lag function. You get an error because the target column does not exist yet, and `[<-` on an xts object, like on a plain matrix, cannot add a new column by name. This example reproduces the error:
x.date= seq(as.Date('2015-01-01'),
by = 'days' , length = 5)
x1 <- xts(data.frame(c1=runif(5)), order.by=x.date)
x2 <- xts(data.frame(c2=runif(5)), order.by=x.date)
x1[,'r2'] <- x2
## Error in `[<-.default`(`*tmp*`, , "r2",
## subscript out of bounds
This is consistent with xts logic: xts objects are indexed, so it is better here to merge (join) objects and preserve the index of your time series.
merge(x1,x2)
This will cbind the two time series and align them on their index. In fact, for xts objects cbind is just merge:
identical(cbind(x1, x2), merge(x1, x2))
That said, it is arguably a bug, or at least an inconsistency, that adding a column works with the $<- operator and not with the [<- operator.
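The out-of-bounds error can be reproduced with a plain matrix, assuming xts falls back to matrix-style assignment here (the `[<-.default` in the error message points that way): `[<-` cannot create a new named column, whereas cbind() can. A base-R sketch:

```r
m <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("a", "b")))

# Assigning to a column name that does not exist yet fails for a plain matrix too
res <- tryCatch({ m[, "c"] <- 5:6; "ok" },
                error = function(e) conditionMessage(e))
print(res)   # an error message mentioning "subscript out of bounds"

# Adding the column explicitly works
m2 <- cbind(m, c = 5:6)
print(colnames(m2))
```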
I got the same output with:
x.ts <- cbind(x.ts,lag(x.ts[,"r1"]))
And
x.ts <- transform(x.ts, r1_lag = lag(x.ts[,'r1']))
But be careful with the output: it may print the same but have an altered structure, so check the result with class() or str().
This should work:
x.ts <- merge(x.ts,lag(x.ts[,"r1"]))
You will then probably want to rename the last column that was added:
dimnames(x.ts)[[2]][5] <- "r1_lag"
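This is the result (the values are quoted because x still contains the date column, so xts() built a character matrix):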
> head(x.ts)
date r1 r2 r3 r1_lag
2015-01-01 "2015-01-01" "0.23171030" "0.44174424" "0.3396816640" NA
2015-01-02 "2015-01-02" "0.97292220" "0.74909452" "0.2793033421" "0.23171030"
2015-01-03 "2015-01-03" "0.52320743" "0.49288463" "0.0193637393" "0.97292220"
2015-01-04 "2015-01-04" "0.36574297" "0.69571803" "0.6411834760" "0.52320743"
2015-01-05 "2015-01-05" "0.37563137" "0.13841216" "0.3087215754" "0.36574297"
2015-01-06 "2015-01-06" "0.48089356" "0.32702759" "0.3967609401" "0.37563137"
> class(x.ts)
[1] "xts" "zoo"
Hope this helps.

Find first Tuesday of Month

I am trying to write a function which takes a vector of dates as an input and returns a vector of dates -- where the output is the date of the first Tuesday of the month which matches the input date.
So 2012-11-19 --> 2012-11-06, etc.
I have had some success with a single date, but have not been able to generalise to the vector case. Could someone please help?
This is what I have so far:
firstTuesday <- function(tt){
ct <- as.POSIXct(tt)
lt <- as.POSIXlt(tt)
firstOf <- as.POSIXlt(ct - 60*60*24* (lt$mday - 1))
if (firstOf$wday > 2)
{
adjDays <- (9 - firstOf$wday)
firstTues <- as.POSIXlt(as.POSIXct(firstOf) + 60*60*24*adjDays)
}
else {
adjDays <- (2 - firstOf$wday)
firstTues <- as.POSIXlt(as.POSIXct(firstOf) + 60*60*24*adjDays)
}
return(firstTues)
}
This works for a single date, e.g. firstTuesday(Sys.Date()), but yields junk for vectors of dates (I think because if is not a vectorised control operator; in R 4.2.0 and later an if condition of length greater than one is an error).
I got around this by using logical indexing instead. The following code seems to do the trick.
firstTuesday <- function(tt){
ct <- as.POSIXct(tt)
lt <- as.POSIXlt(tt)
firstOf <- as.POSIXlt(ct - 60*60*24* (lt$mday - 1))
firstTue <- as.POSIXct(firstOf)
idx <- firstOf$wday > 2
firstTue[idx] <- as.POSIXct(firstOf[idx]) + 60*60*24*(9 - firstOf$wday[idx])
firstTue[!idx] <- as.POSIXct(firstOf[!idx]) + 60*60*24*(2 - firstOf$wday[!idx])
return(firstTue)
}
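The if can be avoided entirely with modular arithmetic: the number of days from the first of the month to the wanted weekday is (wanted - actual) %% 7. A vectorised base-R sketch; first_weekday_of_month is a hypothetical name, and unlike weekdays() it does not depend on locale-specific day names. For the example dates it should return "2012-11-06", "2012-11-06" and "2011-01-04".

```r
# Hypothetical helper: first given weekday of each date's month, no if/else.
# wday follows POSIXlt$wday: 0 = Sunday, 1 = Monday, 2 = Tuesday, ...
first_weekday_of_month <- function(dates, wday = 2L) {
  first <- as.Date(format(as.Date(dates), "%Y-%m-01"))  # first day of each month
  offset <- (wday - as.POSIXlt(first)$wday) %% 7L       # days until the wanted weekday
  first + offset
}

d <- as.Date(c("2012-11-19", "2012-11-19", "2011-01-15"))
print(first_weekday_of_month(d))
```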
This uses lubridate and makes the logic a little simpler. Given a vector of dates, the second function returns a character vector in the same format as your input. You can adapt it to suit your needs.
library(lubridate)
getTuesday = function(x) {
date = ymd(x)
first = floor_date(date,"month")
dow = sapply(seq(0,6),function(x) wday(first+days(x)))
firstTuesday = first + days(which(dow==3)-1)
return(firstTuesday)
}
getMultipleTuesdays = function(y) {
tmp = lapply(y, getTuesday)
tmp = lapply(tmp, as.character)
return(unlist(tmp))
}
Edit
Sample input/output
getMultipleTuesdays(c("2012-11-19","2012-11-19","2011-01-15"))
[1] "2012-11-06" "2012-11-06" "2011-01-04"
Here's a simple solution using base functions:
firstDayOfMonth <- function(dates, day="Mon", abbreviate=TRUE) {
# first 7 days of month
s <- lapply(as.Date(format(dates,"%Y-%m-01")), seq, by="day", length.out=7)
# first day of month
d <- lapply(s, function(d) d[weekdays(d,abbreviate)==day])
# unlist converts to atomic, so use do.call(c,...) instead
do.call(c, d)
}
Well, maybe the do.call at the end isn't so simple... but it's a handy piece of knowledge. :)
R> d <- as.Date(c("2012-11-19","2012-11-19","2011-01-15"))
R> firstDayOfMonth(d, "Tuesday", FALSE)
[1] "2012-11-06" "2012-11-06" "2011-01-04"
