Create vector of character strings in R using for loop

I'm trying to create a vector of dates (formatted as character strings not as dates) using a for loop. I've reviewed a few other SO questions such as (How to create a vector of character strings using a loop?), but they weren't helpful. I've created the following for loop:
start_dates <- c("1993-12-01")
j <- 1
start_dates <- for(i in 1994:as.numeric(format(Sys.Date(), "%Y"))){
date <- sprintf("%s-01-01", i)
j <- j + 1
start_dates[j] <- date
}
However, it returns a NULL (empty) vector start_dates. When I increment the i index manually it works. For example:
> years <- 1994:as.numeric(format(Sys.Date(), "%Y"))
> start_dates <- c("1993-12-01")
> j <- 1
> i <- years[1]
> date <- sprintf("%s-01-01", i)
> j <- j + 1
> start_dates[j] <- date
> start_dates
[1] "1993-12-01" "1994-01-01"
> i <- years[2]
> date <- sprintf("%s-01-01", i)
> j <- j + 1
> start_dates[j] <- date
> start_dates
[1] "1993-12-01" "1994-01-01" "1995-01-01"
It must have something to do with the construction of my for() statement, but I can't figure it out. I'm sure it's super simple. Thanks in advance.

What is wrong with:
sprintf("%s-01-01", 1994:2015)
> sprintf("%s-01-01", 1994:2015)
[1] "1994-01-01" "1995-01-01" "1996-01-01" "1997-01-01" "1998-01-01"
[6] "1999-01-01" "2000-01-01" "2001-01-01" "2002-01-01" "2003-01-01"
[11] "2004-01-01" "2005-01-01" "2006-01-01" "2007-01-01" "2008-01-01"
[16] "2009-01-01" "2010-01-01" "2011-01-01" "2012-01-01" "2013-01-01"
[21] "2014-01-01" "2015-01-01"
sprintf() is fully vectorised; take advantage of this.
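Applied to the vector from the question, the vectorised call might look like the sketch below. It assumes the first element should stay "1993-12-01" and that the range should run up to the current year, as in the original loop; current_year is a name introduced here for illustration.

```r
# Current year as an integer, taken from the system date
current_year <- as.integer(format(Sys.Date(), "%Y"))

# One vectorised sprintf() call replaces the whole loop
start_dates <- c("1993-12-01", sprintf("%d-01-01", 1994:current_year))

head(start_dates, 3)
# [1] "1993-12-01" "1994-01-01" "1995-01-01"
```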
Problems with your loop
The main problem is that you are assigning the value of the for() function to start_dates when the for() finishes, hence overwriting all the hard work your loop did. This is effectively what is happening:
j <- 1
foo <- for (i in 1:10) {
j <- j + 1
}
foo
> foo
NULL
And reading ?'for' we see that this behaviour is by design:
Value:
....
‘for’, ‘while’ and ‘repeat’ return ‘NULL’ invisibly.
Solution: Don't assign the returned value of for(). Hence the template might be:
for(i in foo) {
# ... do stuff
start_dates[j] <- bar
}
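Fix that and there is still something to watch: j will be 2 by the time you assign the first date to the output, because you start with j <- 1 and increment it before assigning in the loop. That works here only because start_dates already holds "1993-12-01" in position 1; with an empty output vector the first element would be left as NA.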
This would be easier if you made i take values from a sequence 1, 2, ..., n rather than the actual years you want. You can use i to index the years vector and as an index for the elements of start_dates too.
Not that you should do the loop this way, but, if you wanted to...
years <- seq.int(1994, 2015)
start_dates <- character(length = length(years))  # preallocate as character, since the loop stores strings
for (i in seq_along(years)) {
start_dates[i] <- sprintf("%s-01-01", years[i])
}
which would give:
> start_dates
[1] "1994-01-01" "1995-01-01" "1996-01-01" "1997-01-01" "1998-01-01"
[6] "1999-01-01" "2000-01-01" "2001-01-01" "2002-01-01" "2003-01-01"
[11] "2004-01-01" "2005-01-01" "2006-01-01" "2007-01-01" "2008-01-01"
[16] "2009-01-01" "2010-01-01" "2011-01-01" "2012-01-01" "2013-01-01"
[21] "2014-01-01" "2015-01-01"
Sometimes it is helpful to loop over the actual values in a vector (as you did) rather than its indices (as I just did), but only in specific cases. For general operations like you have here, it is just an additional complication you need to work around. That said, think about doing vectorised operations in R before resorting to a loop.

You shouldn't assign the loop to a variable. Do:
start_dates <- c("1993-12-01")
j <- 1
for(i in 1994:as.numeric(format(Sys.Date(), "%Y"))){ #use the for-loop on its own. Don't assign it to a variable
date <- sprintf("%s-01-01", i )
j <- j + 1
start_dates[j] <- date
}
and you are fine:
> start_dates
[1] "1993-12-01" "1994-01-01" "1995-01-01" "1996-01-01" "1997-01-01" "1998-01-01" "1999-01-01" "2000-01-01" "2001-01-01"
[10] "2002-01-01" "2003-01-01" "2004-01-01" "2005-01-01" "2006-01-01" "2007-01-01" "2008-01-01" "2009-01-01" "2010-01-01"
[19] "2011-01-01" "2012-01-01" "2013-01-01" "2014-01-01" "2015-01-01"

Related

Generate sequences of anniversary dates between 2 dates

I tried to generate a sequence of dates between two dates. Searching the old posts, I found a very nice solution using seq.Date.
For example:
> seq.Date(as.Date("2016/1/15"), as.Date("2016/5/1"), by = "month")
[1] "2016-01-15" "2016-02-15" "2016-03-15" "2016-04-15"
The above function yields a very nice solution. However, it doesn't work when the date is the 30th or 31st of January.
> seq.Date(as.Date("2016/1/30"), as.Date("2016/5/1"), by = "month")
[1] "2016-01-30" "2016-03-01" "2016-03-30" "2016-04-30"
The second anniversary jumps to March instead of being capped at 29 February. I couldn't find a workaround for this.
Here's an approach that also works in other cases:
library(lubridate)
fun <- function(from, to, by) {
mySeq <- seq.Date(as.Date(from), as.Date(to), by = by)
as.Date(sapply(mySeq, function(d) d + 1 - which.max(day(d - 0:3))), origin = "1970-01-01")
}
fun("2016/1/30", "2016/5/1", "month")
# [1] "2016-01-30" "2016-02-29" "2016-03-30" "2016-04-30"
fun("2017/1/31", "2017/5/1", "month")
# [1] "2017-01-31" "2017-02-28" "2017-03-31" "2017-04-30"
fun("2017/1/29", "2017/5/1", "month")
# [1] "2017-01-29" "2017-02-28" "2017-03-29" "2017-04-29"
What fun does is that it subtracts 0:3 from each date and chooses the one that has the largest day.
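To make that step concrete, here is a sketch that traces the correction for a single date. It uses base format() in place of lubridate::day() so that it runs without extra packages; the variable names are illustrative only.

```r
# seq.Date() overflows "2016-02-30" to this date
d <- as.Date("2016-03-01")

# The date itself and the three days before it
candidates <- d - 0:3
days <- as.integer(format(candidates, "%d"))   # 1 29 28 27

# Position of the largest day-of-month; 2 means "step back one day"
k <- which.max(days)
corrected <- d + 1 - k
format(corrected)
# [1] "2016-02-29"

# A date that did not overflow is left unchanged (which.max() returns 1)
ok <- as.Date("2016-03-30")
unchanged <- ok + 1 - which.max(as.integer(format(ok - 0:3, "%d")))
format(unchanged)
# [1] "2016-03-30"
```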
With lubridate package
library('lubridate')
pmin(
ymd('2018-01-30') + months(0:11), # NA where month goes over
ymd('2018-01-01') + months(1:12) - days(1), # last day of month
na.rm = TRUE
)
[1] "2018-01-30" "2018-02-28" "2018-03-30"
[4] "2018-04-30" "2018-05-30" "2018-06-30"
[7] "2018-07-30" "2018-08-30" "2018-09-30"
[10] "2018-10-30" "2018-11-30" "2018-12-30"
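For comparison, the same capping idea can be sketched in base R only. The helper anniversaries() below is hypothetical (it is not part of any package) and takes a start date and a count rather than a from/to pair; it builds the first day of each target month, finds the last day of that month, and keeps whichever is earlier: the requested day or the month end.

```r
# Hypothetical helper: n monthly anniversaries of `start`, capped at month end
anniversaries <- function(start, n) {
  lt <- as.POSIXlt(as.Date(start))
  months_total <- lt$mon + 0:(n - 1L)               # months counted from January of the start year
  year  <- lt$year + 1900L + months_total %/% 12L
  month <- months_total %% 12L + 1L

  # First day of each target month
  first <- as.Date(sprintf("%d-%02d-01", year, month))

  # Last day of each target month = first day of the following month minus one day
  next_year  <- year + (month == 12L)
  next_month <- month %% 12L + 1L
  last <- as.Date(sprintf("%d-%02d-01", next_year, next_month)) - 1

  # Requested day-of-month, capped at the month end
  pmin(first + (lt$mday - 1), last)
}

anniversaries("2016-01-30", 4)
# [1] "2016-01-30" "2016-02-29" "2016-03-30" "2016-04-30"
```

Because the cap is computed per month, the sketch also handles sequences that cross a year boundary, for example anniversaries("2016-11-30", 4).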

Format year half contained in text as dates

I have date values contained in text, each containing a half of the year:
date_by_half <- c("2016 H1", "2017 H2", "2018 H1")
I'd like to extract the date from text and store as the first day of each half or "semester". So, something like:
ysemester(date_by_half)
#[1] "2016-01-01" "2017-07-01" "2018-01-01"
I'm familiar with lubridate::yq() function, but I found that this only works for quarters.
lubridate::yq(date_by_half)
#[1] "2016-01-01" "2017-04-01" "2018-01-01"
Right now my work around is to replace H2 with Q3:
lubridate::yq(stringr::str_replace(date_by_half,"H2", "Q3"))
#[1] "2016-01-01" "2017-07-01" "2018-01-01"
However, I'm wondering if there is a more elegant solution using lubridate (or some other quick and reusable method).
One liners
These one-liners use only base R:
1) read.table/ISOdate
with(read.table(text = date_by_half), as.Date(ISOdate(V1, ifelse(V2=="H1",1,7), 1)))
## [1] "2016-01-01" "2017-07-01" "2018-01-01"
2) sub
Even shorter is:
as.Date(sub(" H2", "-7-1", sub(" H1", "-1-1", date_by_half)))
## [1] "2016-01-01" "2017-07-01" "2018-01-01"
S3
Another approach would be to create an S3 class, "half", for half year dates. We will only implement the methods we need.
as.half <- function(x, ...) UseMethod("as.half")
as.half.character <- function(x, ...) {
year <- as.numeric(sub("\\D.*", "", x))
half <- as.numeric(sub(".*\\D", "", x))
structure(year + (half - 1)/2, class = "half")
}
as.Date.half <- function(x, ...) {
as.Date(ISOdate(as.integer(x), 12 * (x - as.integer(x)) + 1, 1))
}
# test
as.Date(as.half(date_by_half))
## [1] "2016-01-01" "2017-07-01" "2018-01-01"
You can make your own function to do the trick.
# Your data
date_by_half <- c("2016 H1", "2017 H2", "2018 H1")
# Function to do the work
year_dater <- function(dates) {
year <- substr(dates, 1, 4)
quarter <- substr(dates, 6, 7)
month <- ifelse(quarter=="H1", 1, 7)
dates <- paste0(year, "-", month, "-", rep(1, length(month)))
return(dates)
}
# Running the function
dates <- year_dater(date_by_half)
# As date format
as.POSIXct(dates)
"2016-01-01 CET" "2017-07-01 CEST" "2018-01-01 CET"
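We can combine lubridate's yq() with ceiling_date(), using unit = "halfyear" and change_on_boundary = FALSE. yq() reads "H2" as the second quarter, and ceiling_date() then rounds that date up to the start of the next half-year; with change_on_boundary = FALSE, dates already on a boundary (2018-01-01, 2017-07-01, etc.) are never rounded up.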
library(lubridate)
ceiling_date(yq(date_by_half), unit = "halfyear", change_on_boundary = FALSE)
#[1] "2016-01-01" "2017-07-01" "2018-01-01"
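If a reusable function is wanted, the base R ideas above could be wrapped roughly as follows. This is only a sketch: ysemester() is the hypothetical name used in the question, and it assumes every input has exactly the form "YYYY H1" or "YYYY H2"; anything else is returned as NA.

```r
# Hypothetical helper: first day of the half-year named in strings like "2017 H2"
ysemester <- function(x) {
  ok <- grepl("^[0-9]{4} H[12]$", x)                 # accept only the assumed format
  year  <- substr(x, 1, 4)
  month <- ifelse(substr(x, 7, 7) == "1", "01", "07")
  out <- rep(as.Date(NA), length(x))                 # NA for anything that does not match
  out[ok] <- as.Date(paste(year[ok], month[ok], "01", sep = "-"))
  out
}

ysemester(c("2016 H1", "2017 H2", "2018 H1"))
# [1] "2016-01-01" "2017-07-01" "2018-01-01"
```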

Convert quarter/year format to a date

I created a function that coerces a vector in quarter/year format to a vector of dates.
.quarter_to_date(c("Q1/13","Q2/14"))
[1] "2013-03-01" "2014-06-01"
This is the code of my function.
.quarter_to_date <-
function(x){
ll <- strsplit(gsub('Q([0-9])[/]([0-9]+)','\\1,\\2',x),',')
res <- lapply(ll,function(x){
m <- as.numeric(x[1])*3
m <- ifelse(nchar(m)==1,paste0('0',m),as.character(m))
as.Date(paste(x[2],m,'01',sep='-'),format='%y-%m-%d')
})
do.call(c,res)
}
My function works fine, but it looks long and a little complicated. I think this should already be available in another package (lubridate, for example), but I can't find it. Can someone help me simplify this code, please?
1) The zoo package has a "yearqtr" class. Convert to that and then to "Date" class:
library(zoo)
x <- c("Q1/13","Q2/14")
as.Date(as.yearqtr(x, format = "Q%q/%y"))
## [1] "2013-01-01" "2014-04-01"
2) Alternatively, use this to get the last day of the quarter instead of the first:
as.Date(as.yearqtr(x, format = "Q%q/%y"), frac = 1)
## [1] "2013-03-31" "2014-06-30"
3) Also consider not converting to "Date" class at all and just using "yearqtr" class directly:
as.yearqtr(x, format = "Q%q/%y")
## [1] "2013 Q1" "2014 Q2"
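If adding zoo is not an option, the original function can also be shortened in base R, since sub(), sprintf() and as.Date() are all vectorised. The sketch below uses the hypothetical name quarter_to_date() and assumes every input matches "Q<1-4>/<two-digit year>". Note that it returns the first month of the quarter, like the zoo answer, whereas the function in the question returns the last month.

```r
# Hypothetical helper: first day of the quarter for strings like "Q2/14"
quarter_to_date <- function(x) {
  pattern <- "^Q([1-4])/([0-9]{2})$"
  q  <- as.integer(sub(pattern, "\\1", x))   # quarter number, 1 to 4
  yy <- sub(pattern, "\\2", x)               # two-digit year, kept as text
  # First month of quarter q is 3 * q - 2; "%y" interprets the two-digit year
  as.Date(sprintf("%s-%02d-01", yy, 3L * q - 2L), format = "%y-%m-%d")
}

quarter_to_date(c("Q1/13", "Q2/14"))
# [1] "2013-01-01" "2014-04-01"
```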

R Subset XTS weekdays

How do I subset an xts object to only include weekdays (Mon-Fri, with Saturday and Sunday excluded)?
Here's what I'd do:
library(xts)
data(sample_matrix)
sample.xts <- as.xts(sample_matrix, descr='my new xts object')
x <- sample.xts['2007']
x[!weekdays(index(x)) %in% c("Saturday", "Sunday")]
EDIT:
Joshua Ulrich in comments points out a better solution using .indexwday(), one of a family of built-in accessor functions for extracting pieces of the index of xts class objects. Also, like Dirk Eddelbuettel's solution, the following should be locale-independent:
x[.indexwday(x) %in% 1:5]
Compute the day of the week from the date, then subset. In the example I use a Date type, but the cast to POSIXlt works the same way for POSIXct intra-day timestamps.
> mydates <- Sys.Date() + 0:6
> mydates
[1] "2012-01-31" "2012-02-01" "2012-02-02" "2012-02-03" "2012-02-04"
[6] "2012-02-05" "2012-02-06"
> we <- sapply(mydates, function(d) { as.POSIXlt(d)$wday}) %in% c(0, 6)
> we
[1] FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE
> mydates[ ! we ]
[1] "2012-01-31" "2012-02-01" "2012-02-02" "2012-02-03" "2012-02-06"
> 
This really is not an xts question but basic date handling.
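Since as.POSIXlt() is itself vectorised, the same weekday filter can be sketched without sapply(). The example below fixes the start date to the one shown in the transcript above so the result is reproducible; wday and weekdays_only are illustrative names.

```r
# Seven consecutive days starting on Tuesday 2012-01-31
dates <- as.Date("2012-01-31") + 0:6

# $wday is 0 for Sunday through 6 for Saturday, independent of locale
wday <- as.POSIXlt(dates)$wday

# Keep Monday (1) to Friday (5)
weekdays_only <- dates[wday %in% 1:5]
format(weekdays_only)
# [1] "2012-01-31" "2012-02-01" "2012-02-02" "2012-02-03" "2012-02-06"
```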

Date sequence with negative by

How do I get a sequence of monthly dates that ends on a given month and has a given length? seq(as.Date(*), length, by="month") assumes the start date is given, not the end date, and AFAIK it's impossible to specify a negative value for by in this case.
ETA: that is, I want a sequence that spans a given period, but one whose end point is specified rather than the start point. So, something like seq(to="2000-03-01", len=3, by="month") --> 2000-01-01, 2000-02-01, 2000-03-01.
Try this:
rev(seq(as.Date("2000-03-01"), length = 3, by = "-1 month"))
## [1] "2000-01-01" "2000-02-01" "2000-03-01"
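Wrapped as a small function, this might look like the sketch below. months_ending() is a hypothetical name, and the sketch assumes the end date falls on a day that exists in every month (such as the 1st), since seq() by month overflows for days like the 31st.

```r
# Hypothetical helper: n monthly dates ending on `end`, in ascending order
months_ending <- function(end, n) {
  # Step backwards one month at a time, then reverse
  rev(seq(as.Date(end), length.out = n, by = "-1 month"))
}

months_ending("2000-03-01", 3)
# [1] "2000-01-01" "2000-02-01" "2000-03-01"
```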
library(lubridate)
ymd('2011-03-03') - months(0:5)
Maybe you could just compute it forward, using by=month as the +1 increment, and then reverse:
R> rev(seq(as.Date("2011-01-01"), length=6, by="month"))
[1] "2011-06-01" "2011-05-01" "2011-04-01" "2011-03-01" "2011-02-01" "2011-01-01"
Here you go. Base functions only:
last.days.of.month <- function(dt) {ldt<- as.POSIXlt(dt)
ldt$mon <- ldt$mon+1
ldt$mday <- 1
return(format( ldt -1, "%Y-%m-%d"))}
last.days.of.month(as.Date(c("2010-01-06","2010-03-06", "2010-02-06")) )
# [1] "2010-01-31" "2010-03-31" "2010-02-28"
seq.ldom <- function(dt, nmonths) {ldt<- rep(as.POSIXlt(dt)[1], nmonths)
ldt$mon <- ldt$mon+seq(1:nmonths)
ldt$mday <- 1
return(format( ldt -1, "%Y-%m-%d"))}
seq.ldom(as.Date("2010-01-06"), 5)
#[1] "2010-01-31" "2010-02-28" "2010-03-31" "2010-04-30"
#[5] "2010-05-31"
Oh, for some reason I thought you wanted the last days of the month. Sorry about the useless code. Getting the first days of the month is not hard.
seq.fdom <- function(dt, nmonths) {
ldt <- rep(as.POSIXlt(dt)[1], nmonths)
ldt$mon <- ldt$mon + 0:(nmonths - 1)  # seq(0:(nmonths-1)) would give 1:nmonths and skip the starting month
ldt$mday <- 1
return(format(ldt, "%Y-%m-%d"))
}
seq.fdom(as.Date("2010-01-06"), 5)
#[1] "2010-01-01" "2010-02-01" "2010-03-01" "2010-04-01"
#[5] "2010-05-01"
And getting the prior months is not hard either:
seq.prior.fdom <- function(dt, nmonths) {ldt<- rep(as.POSIXlt(dt)[1], nmonths)
ldt$mon <- ldt$mon-rev(0:(nmonths-1))
ldt$mday <- 1
return(format( ldt , "%Y-%m-%d"))}
seq.prior.fdom(as.Date("2010-01-06"), 5)
#[1] "2009-09-01" "2009-10-01" "2009-11-01" "2009-12-01"
#[5] "2010-01-01"
I think the basic principle is clear (if not beaten to death with a canoe paddle.)
