using sapply on a vector of dates

I have a function that adds a random integer to a date:
rand_to_date = function(date){
newdate = as.Date(date) + sample(1:30, 1)
return(as.Date(newdate))
}
which works fine. However, if I attempt to use sapply to apply this function to a vector of dates, e.g.
test_dates = c('2001-01-01', '2002-02-02', '2003-03-03', '2004-04-04')
sapply will not return a vector of output in date format:
sapply(test_dates,rand_to_date)
2001-01-01 2002-02-02 2003-03-03 2004-04-04
11329 11748 12115 12513
In contrast, lapply will return a list of dates. However, applying unlist to this output once again gives me a vector of numbers rather than dates. Nor does
sapply(sapply(test_dates,rand_to_date), as.Date)
work. What's the simplest way for me to give a vector of these randomized dates as output?

1) Owing to the existence of the c.Date method, use lapply and then c. We have also simplified rand_to_date and added set.seed to make it reproducible:
rand_to_date <- function(date) as.Date(date) + sample(30, 1)
set.seed(123)
test_dates <- c('2001-01-01', '2002-02-02', '2003-03-03', '2004-04-04')
do.call("c", lapply(test_dates, rand_to_date))
## [1] "2001-01-10" "2002-02-26" "2003-03-16" "2004-05-01"
2) Alternately, we could make rand_to_date vectorized right off like this:
rand_to_date <- function(date) as.Date(date) + sample(30, length(date), TRUE)
set.seed(123)
test_dates <- c('2001-01-01', '2002-02-02', '2003-03-03', '2004-04-04')
rand_to_date(test_dates)
## [1] "2001-01-10" "2002-02-26" "2003-03-16" "2004-05-01"

Unfortunately sapply discards attributes, including the S3 class — this is unrelated to your function; sapply(test_dates, as.Date) fails in the same way.
You need to add them again:
structure(sapply(test_dates,rand_to_date), class = 'Date')
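To make the point about dropped attributes concrete, here is a minimal sketch. It uses a shortened `test_dates` vector, and the names `raw` and `restored` are introduced only for illustration.

```r
test_dates <- c("2001-01-01", "2002-02-02")

# sapply() simplifies to a bare numeric vector, so the Date class is lost
raw <- sapply(test_dates, as.Date, USE.NAMES = FALSE)
class(raw)
# [1] "numeric"

# Restoring the class attribute turns the day counts back into dates
restored <- structure(raw, class = "Date")
restored
# [1] "2001-01-01" "2002-02-02"
```

Note that the attribute has to be passed by name (`class = "Date"`), since structure() expects named attributes.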

You can just convert the number back to a date.
as.Date(sapply(test_dates, rand_to_date), origin = "1970-01-01")

You can use as.character() within sapply :
sapply(test_dates, function(v) as.character(rand_to_date(v)), USE.NAMES = FALSE)
where as.character(rand_to_date(v)) returns the date as a character string instead of a Date object.

Related

How to make a function(x) accept x as a column name?

I have the following setup:
mydata:
today_date
r1 11.11.21
r2 11.11.21
r3 11.11.21
I want to convert column like 'today_date' to a date using
as.Date(today_date,tryFormats = c("%d.%m.%Y")).
So I'm using the following function, which is supposed to change the corresponding column to proper dates:
myfun <- function(x){
x<- as.Date(x, tryFormats = c("%d.%m.%Y"))
}
In this function x is representing a variable corresponding to: mydata$today_date
Sadly, x is not properly representing the object that's to be replaced, so instead of:
myfun(mydata$today_date)
I still have to use:
mydata$today_date<- myfun(mydata$today_date)
How can I manipulate the function so the as.Date()-functionality is directly applied? I'm pretty certain that the variable in myfun(x) is not properly able to represent the subsection of my dataframe that I want to change. Any help is very welcome!

Try doing this.
df <- data.frame(today_date = c("11.11.21","11.11.21","11.11.21"))
myfun <- function(df, var = 'today_date'){
# %y (lower case) parses the two-digit year; %Y would read "21" as the year 0021
df[[var]] <- as.Date(df[[var]], format = "%d.%m.%y")
return(df)
}
The output is
> myfun(df, "today_date")
today_date
1 2021-11-11
2 2021-11-11
3 2021-11-11
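Because the sample data carry a two-digit year, the choice between %Y and %y decides which century the result lands in. A small sketch of the difference, where the second input value and the names `wrong` and `right` are made up for illustration:

```r
x <- c("11.11.21", "24.12.21")

# %Y expects a full year, so "21" is read literally as the year 21
wrong <- as.Date(x, format = "%d.%m.%Y")

# %y reads a two-digit year; values 00 to 68 are mapped to 20xx
right <- as.Date(x, format = "%d.%m.%y")
format(right)
# [1] "2021-11-11" "2021-12-24"
```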

I like the magrittr assignment pipe syntax for this.
library(magrittr)
mydata$today_date %<>% myfun()
Instead of mydata$today_date<- myfun(mydata$today_date)

create lag variable of xts object using $ vs. [] notation

I am trying to create a lagged vector within an xts object using the lag function. It works when defining the new vector within the xts object using $ notation (e.g. x.ts$r1_lag), but it does not when defining the new variable using square brackets, i.e. x.ts[,"r1_lag"]. See code below:
library(xts)
x <- data.frame(date=seq(as.Date('2015-01-01'), by='days', length=100),
runif(1e2), runif(1e2), runif(1e2))
colnames(x) <- c("date", "r1", "r2", "r3")
#the following command works
x.ts <- xts(x, order.by=x$date)
x.ts$r1_lag <- lag(x.ts$r1)
# but the following does not (says subscript is out of bounds)
x.ts <- xts(x, order.by=x$date)
x.ts[,"r1_lag"] <- lag(x.ts[,"r1"])
I need to use [] notation rather than $ notation to reference the vectors because I want to run the lag transformation on vectors in more than one xts object (vectors within a list of multiple xts objects). In that case I can't define the new vectors using $ notation, i.e. I can't use the notation in the stylized loop below:
for (i in letters) {
for (j in variables) {
macro.set.ts$i$paste(j,"_L1",sep="") <- lag(macro.set.ts[[i]][,j])
macro.set.ts$i$paste(j,"_L2",sep="") <- lag(macro.set.ts[[i]][,j], 2)
macro.set.ts$i$paste(j,"_L4",sep="") <- lag(macro.set.ts[[i]][,j], 4)
}
}
Thanks!

You don't need to use [<-.xts. You can use merge instead:
for (i in letters) {
for (j in variables) {
# create all lags
mst_ij <- macro.set.ts[[i]][,j]
jL <- merge(lag(mst_ij), lag(mst_ij, 2), lag(mst_ij, 4))
colnames(jL) <- paste(j, c("L1","L2","L4"), sep="_")
# merge back with original data
macro.set.ts[[i]] <- merge(macro.set.ts[[i]], jL)
}
}
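A minimal, self-contained sketch of the same merge approach on a toy series may help. It assumes the xts package is installed; the object `x`, the column `r1`, and the values are made up for illustration.

```r
library(xts)

# Toy series; names and values are illustrative only
idx <- seq(as.Date("2015-01-01"), by = "day", length.out = 5)
x <- xts(cbind(r1 = c(1, 2, 3, 4, 5)), order.by = idx)

# Build the lagged columns, name them, then merge them back by index
r1 <- x[, "r1"]
lags <- merge(lag(r1, 1), lag(r1, 2))
colnames(lags) <- paste("r1", c("L1", "L2"), sep = "_")
x <- merge(x, lags)

colnames(x)
# [1] "r1"    "r1_L1" "r1_L2"
```

Since merge aligns on the time index, the new columns stay matched to the right dates without any assignment into a column that does not exist yet.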

The error is not related to the lag function. You get an error because you try to assign one xts object into another xts object. This example reproduces the error:
x.date= seq(as.Date('2015-01-01'),
by = 'days' , length = 5)
x1 <- xts(data.frame(c1=runif(5)), order.by=x.date)
x2 <- xts(data.frame(c2=runif(5)), order.by=x.date)
x1[,'r2'] <- x2
## Error in `[<-.default`(`*tmp*`, , "r2",
## subscript out of bounds
I find this coherent with xts logic, because xts objects are indexed. So it is better here to merge or join the objects and preserve the indexed nature of your time series.
merge(x1,x2)
This will cbind the two time series and fix any index problem. In fact, cbind is just a merge:
identical(cbind(x1,x2),merge(x1,x2))
That said, I think it is a kind of bug that this works with the $<- operator and not with the [<- operator.

I got the same output with:
x.ts <- cbind(x.ts,lag(x.ts[,"r1"]))
And
x.ts <- transform(x.ts, r1_lag = lag(x.ts[,'r1']))
But, be careful with the output. It may look the same but with an altered structure.

This should work:
x.ts <- merge(x.ts,lag(x.ts[,"r1"]))
You will then probably want to rename the last column that was added:
dimnames(x.ts)[[2]][5] <- "r1_lag"
This is the result:
> head(x.ts)
date r1 r2 r3 r1_lag
2015-01-01 "2015-01-01" "0.23171030" "0.44174424" "0.3396816640" NA
2015-01-02 "2015-01-02" "0.97292220" "0.74909452" "0.2793033421" "0.23171030"
2015-01-03 "2015-01-03" "0.52320743" "0.49288463" "0.0193637393" "0.97292220"
2015-01-04 "2015-01-04" "0.36574297" "0.69571803" "0.6411834760" "0.52320743"
2015-01-05 "2015-01-05" "0.37563137" "0.13841216" "0.3087215754" "0.36574297"
2015-01-06 "2015-01-06" "0.48089356" "0.32702759" "0.3967609401" "0.37563137"
> class(x.ts)
[1] "xts" "zoo"
Hope this helps.
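The quoted values in the printed result suggest that the core data became a character matrix, most likely because the date column was passed into xts() together with the numeric columns. A sketch, assuming the same data frame layout as in the question (with a fixed seed and fewer rows for brevity), that keeps the dates only in the index:

```r
library(xts)

set.seed(1)
x <- data.frame(date = seq(as.Date("2015-01-01"), by = "day", length.out = 10),
                r1 = runif(10), r2 = runif(10), r3 = runif(10))

# Use the date column only as the index, so the core data stays numeric
x.ts <- xts(x[, c("r1", "r2", "r3")], order.by = x$date)
x.ts <- merge(x.ts, lag(x.ts[, "r1"]))
colnames(x.ts)[4] <- "r1_lag"

is.numeric(coredata(x.ts))
# [1] TRUE
```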

Convert binary vector to decimal

I have a vector of a binary string:
a<-c(0,0,0,1,0,1)
I would like to convert this vector into decimal.
I tried using the compositions package and the unbinary() function, however, this solution and also most others that I have found on this site require g-adic string as input argument.
My question is how can I convert a vector rather than a string to decimal?
to illustrate the problem:
library(compositions)
unbinary("000101")
[1] 5
This gives the correct solution, but:
unbinary(a)
unbinary("a")
unbinary(toString(a))
all produce NA.

You could try this function
bitsToInt<-function(x) {
packBits(rev(c(rep(FALSE, 32-length(x)%%32), as.logical(x))), "integer")
}
a <- c(0,0,0,1,0,1)
bitsToInt(a)
# [1] 5
Here we skip the character conversion. This only uses base functions.
It is likely that
unbinary(paste(a, collapse=""))
would have worked should you still want to use that function.

There is a one-liner solution:
Reduce(function(x,y) x*2+y, a)
Explanation:
Expanding the application of Reduce results in something like:
Reduce(function(x,y) x*2+y, c(0,1,0,1,0)) = (((0*2 + 1)*2 + 0)*2 + 1)*2 + 0 = 10
With each new bit coming next, we double the so far accumulated value and add afterwards the next bit to it.
Please also see the description of Reduce() function.
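The doubling step can be watched directly by asking Reduce for its intermediate values. This is a small sketch using the vector from the question; the argument names `acc` and `bit` and the variable `steps` are chosen here for readability.

```r
a <- c(0, 0, 0, 1, 0, 1)

# accumulate = TRUE keeps every partial result instead of only the last one
steps <- Reduce(function(acc, bit) acc * 2 + bit, a, accumulate = TRUE)
steps
# [1] 0 0 0 1 2 5
```

The last element is the decimal value, and each earlier element is the value of the bits read so far.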

If you'd like to stick to using compositions, just convert your vector to a string:
library(compositions)
a <- c(0,0,0,1,0,1)
achar <- paste(a,collapse="")
unbinary(achar)
[1] 5

This function will do the trick.
bintodec <- function(y) {
# find the decimal number corresponding to binary sequence 'y'
if (! (all(y %in% c(0,1)))) stop("not a binary sequence")
res <- sum(y*2^((length(y):1) - 1))
return(res)
}
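The sum inside bintodec is a weighted sum over bit positions. A short worked sketch with the vector from the question, where the intermediate name `weights` is introduced only for illustration:

```r
y <- c(0, 0, 0, 1, 0, 1)

# Weight of each position, most significant bit first
weights <- 2^((length(y):1) - 1)
weights
# [1] 32 16  8  4  2  1

# Only the positions holding a 1 contribute: 4 + 1
sum(y * weights)
# [1] 5
```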

Find first Tuesday of Month

I am trying to write a function which takes a vector of dates as an input and returns a vector of dates -- where the output is the date of the first Tuesday of the month which matches the input date.
So 2012-11-19 --> 2012-11-06, etc.
I have had some success with a single date, but have not been able to generalise to the vector case. Could someone please help?
This is what I have so far:
firstTuesday <- function(tt){
ct <- as.POSIXct(tt)
lt <- as.POSIXlt(tt)
firstOf <- as.POSIXlt(ct - 60*60*24* (lt$mday - 1))
if (firstOf$wday > 2)
{
adjDays <- (9 - firstOf$wday)
firstTues <- as.POSIXlt(as.POSIXct(firstOf) + 60*60*24*adjDays)
}
else {
adjDays <- (2 - firstOf$wday)
firstTues <- as.POSIXlt(as.POSIXct(firstOf) + 60*60*24*adjDays)
}
return(firstTues)
}
This works for a single date, firstTuesday(Sys.Date()), but yielded junk for vectors of dates (because if is not a vectorised control operator, I think).
I got around my limited understanding by using indexing. The following code seems to do the trick.
firstTuesday <- function(tt){
ct <- as.POSIXct(tt)
lt <- as.POSIXlt(tt)
firstOf <- as.POSIXlt(ct - 60*60*24* (lt$mday - 1))
firstTue <- as.POSIXct(firstOf)
idx <- firstOf$wday > 2
firstTue[idx] <- as.POSIXct(firstOf[idx]) + 60*60*24*(9 - firstOf$wday[idx])
firstTue[!idx] <- as.POSIXct(firstOf[!idx]) + 60*60*24*(2 - firstOf$wday[!idx])
return(firstTue)
}
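The two branches (9 - wday when wday > 2, otherwise 2 - wday) can be folded into one modulo expression. The following is a sketch of that idea rather than the original code: the name `first_tuesday` is hypothetical, and it returns Date values instead of POSIXct.

```r
first_tuesday <- function(dates) {
  dates <- as.Date(dates)
  # First day of each date's month
  first <- as.Date(format(dates, "%Y-%m-01"))
  # Day of week of that first day: 0 = Sunday, ..., 2 = Tuesday
  wday <- as.POSIXlt(first)$wday
  # Days to add to reach the next Tuesday (0 if already a Tuesday)
  first + (2 - wday) %% 7
}

first_tuesday(c("2012-11-19", "2011-01-15"))
# [1] "2012-11-06" "2011-01-04"
```

Working with whole days on Date objects also avoids the 60*60*24 second arithmetic, which can shift results around daylight saving changes.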

This uses lubridate and makes the logic a little simpler. Given a vector of dates the second function will return a vector of characters, similar to your input. You can change things around to suit your needs.
library(lubridate)
getTuesday = function(x) {
date = ymd(x)
first = floor_date(date,"month")
dow = sapply(seq(0,6),function(x) wday(first+days(x)))
firstTuesday = first + days(which(dow==3)-1)
return(firstTuesday)
}
getMultipleTuesdays = function(y) {
tmp = lapply(y, getTuesday)
tmp = lapply(tmp, as.character)
return(unlist(tmp))
}
Edit
Sample input/output
getMultipleTuesdays(c("2012-11-19","2012-11-19","2011-01-15"))
[1] "2012-11-06" "2012-11-06" "2011-01-04"

Here's a simple solution using base functions:
firstDayOfMonth <- function(dates, day="Mon", abbreviate=TRUE) {
# first 7 days of month
s <- lapply(as.Date(format(dates,"%Y-%m-01")), seq, by="day", length.out=7)
# first day of month
d <- lapply(s, function(d) d[weekdays(d,abbreviate)==day])
# unlist converts to atomic, so use do.call(c,...) instead
do.call(c, d)
}
Well, maybe the do.call at the end isn't so simple... but it's a handy piece of knowledge. :)
R> d <- as.Date(c("2012-11-19","2012-11-19","2011-01-15"))
R> firstDayOfMonth(d, "Tuesday", FALSE)
[1] "2012-11-06" "2012-11-06" "2011-01-04"
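The comment about unlist can be checked with a two-element list. A minimal sketch, where the list `dl` is made up for illustration:

```r
dl <- list(as.Date("2012-11-06"), as.Date("2011-01-04"))

# unlist() flattens to the underlying day counts and drops the class
class(unlist(dl))
# [1] "numeric"

# c() dispatches to the Date method, so the class survives
combined <- do.call(c, dl)
class(combined)
# [1] "Date"
```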

Why do I have to set as.Date origin again after using ifelse? Is there a better way?

The following function does work, but the last as.Date part was more or less a result of trial and error that I do not fully understand.
### This function creates a real date column out of year / period that is saved in
### in separate columns, plus it handles a 13th period in case of overlapping period
### terminology. Turns quarters into months.
realDate <- function (table,year="year_col",period="period_col"){
if (is.character(table) == TRUE)
{
dframe <- get(table)
}
else{
dframe <- table
}
x <- expression({resDate <- with(dframe,
as.Date(paste(get(year),"-",
ifelse(get(period) > 9, get(period),
paste("0", get(period), sep = "")),
"-01", sep = "")))
})
y <- expression({resDate <- with(dframe,as.Date(paste(get(year) + 1,"-","01","-01",sep="")))})
#### I do not get this? Why do I have to do this?
a <- ifelse(get(period) == 13,eval(y),eval(x))
a <-as.Date(a, origin="1970-01-01")
return(a)
}
Instead I tried to do it like this (because it was more intuitive to me):
{ ....
ifelse(get(period) == 13,eval(y),eval(x))
return(resDate)
}
This returned the correct values whenever the condition was FALSE (no) but returned NA when the condition was TRUE (yes). Why is that? And if I use the function above, why do I have to define the origin again? Why do I even have to call as.Date again?
EDIT:
a <- rep(2002:2010,2)
b <- rep(1:13,2)
d<-cbind(a,b[1:length(a)])
names(d) <- c("year_col","period_col")
P.S.:
I found this thread on vectorized ifelse.

Your construct is "interesting", to say the least. To start with, neither x nor y gives output. I wonder why you use an assignment inside the expressions you pass to eval(). It gives you a resDate vector that holds whatever the last evaluated expression assigned. That does not depend on the condition; it is simply the last one written (eval(x) in your case). Both expressions get executed before the ifelse clause is executed.
Plus, the output you get is the numeric representation of your dates, not the Date object. That is in resDate. I guess that ifelse cannot determine the class of the output vector as you use the eval() inside. I'm surprised you get output at all; in fact you're effectively using something that could be called a "bug" in R (Microsoft would call it a feature :-) ).
Your mistake is in your ifelse: get(period) doesn't exist. It should be get(period, dframe). Then it works. The only reason it works on your computer is presumably that you have a period object in your workspace. Classic problem when debugging.
In any case, I'd make it:
realDate <- function (table,year="year_col",period="period_col"){
if (is.character(table)){ # is.character(table) returns a boolean already.
dframe <- get(table)
} else {
dframe <- table
}
year <- get(year,dframe)
period <- get(period,dframe)
year[period==13] <- year[period==13]+1
period[period==13] <- 1
as.Date(paste(year,"-",period,"-01",sep=""))
}
This is quite a bit faster than your own, has fewer pitfalls and conversions, and is more the R way of doing it. You could replace the year[...] and period[...] assignments with ifelse constructs, but using indices is generally faster.
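As for why as.Date had to be called again in the question: ifelse builds its result from the test vector's attributes, so the Date class of the yes/no arguments is not carried over. A small sketch with made-up values (the names `this_period` and `next_jan` are illustrative) that contrasts it with index assignment:

```r
period <- c(12, 13)
this_period <- as.Date(c("2006-12-01", "2006-12-01"))
next_jan    <- as.Date(c("2007-01-01", "2007-01-01"))

# ifelse() returns plain day counts, which is why as.Date(..., origin) was needed
res <- ifelse(period == 13, next_jan, this_period)
class(res)
# [1] "numeric"

# Index assignment keeps the Date class
out <- this_period
out[period == 13] <- next_jan[period == 13]
out
# [1] "2006-12-01" "2007-01-01"
```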
EDIT :
This is easier for the data generation:
dframe <- data.frame(
year_col= rep(2006:2007,each=13),
period_col = rep(1:13,2)
)
realDate(dframe)
[1] "2006-01-01" "2006-02-01" "2006-03-01" "2006-04-01" "2006-05-01" "2006-06-01" "2006-07-01" "2006-08-01" "2006-09-01"
[10] "2006-10-01" "2006-11-01" "2006-12-01" "2007-01-01" "2007-01-01" "2007-02-01" "2007-03-01" "2007-04-01" "2007-05-01"
[19] "2007-06-01" "2007-07-01" "2007-08-01" "2007-09-01" "2007-10-01" "2007-11-01" "2007-12-01" "2008-01-01"
