Using lapply with Dates and the minus function - r

I have a vector of dates and a vector of number-of-days.
dates <- seq(Sys.Date(), Sys.Date()+10, by='day')
number.of.days <- 1:4
I need to get a list with an entry for each number-of-days, where each entry in the list is the dates vector minus the corresponding number-of-days, i.e.,
list(dates-1, dates-2, dates-3, dates-4)
The definition of - (function (e1, e2) .Primitive("-")) indicates that its first and second arguments are e1 and e2, respectively. So the following should work.
lapply(number.of.days, `-`, e1=dates)
But it raises an error.
Error in -.Date(X[[i]], ...) : can only subtract from "Date" objects
Furthermore, the following does work:
lapply(number.of.days, function(e1, e2) e1 - e2, e1=dates)
Is this a feature or a bug?

You can use:
lapply(number.of.days, `-.Date`, e1=dates)
Part of the problem is that - is a primitive, which doesn't do argument matching by name. Notice how these are the same:
> `-`(e1=5, e2=3)
[1] 2
> `-`(e2=5, e1=3)
[1] 2
From R Language Definition:
This subsection applies to closures but not to primitive functions. The latter typically ignore tags and do positional matching, but their help pages should be consulted for exceptions, which include log, round, signif, rep and seq.int.
So in your case, you end up using dates as the second argument to - even though you attempt to specify it as the first. By using the "Date" method for -, which is not a primitive, we can get it to work.
So technically, the behavior you are seeing is a feature, or perhaps a "documented inconsistency". The part that could possibly be considered a bug is that R will do a multiple dispatch to a "Date" method for - despite that method not supporting non-date arguments as the first argument:
> 1:4 - dates # dispatches to `-.Date` despite first argument not being date
Error in `-.Date`(1:4, dates) : can only subtract from "Date" objects
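One way to sidestep argument matching on the primitive altogether is to wrap the subtraction in an anonymous function, so that dates is always the left operand. A minimal sketch, using a fixed start date (an assumption, so the result is reproducible) in place of Sys.Date():

```r
# Fixed start date instead of Sys.Date(), purely for reproducibility.
dates <- seq(as.Date("2020-01-01"), by = "day", length.out = 11)
number.of.days <- 1:4

# The closure puts `dates` on the left of `-`, so no name matching is involved.
shifted <- lapply(number.of.days, function(n) dates - n)

# The primitive ignores argument names and matches by position only,
# so this is 5 - 3 even though the names suggest 3 - 5.
by_name <- `-`(e2 = 5, e1 = 3)
```

This is the same idea as the function(e1, e2) e1 - e2 version in the question, just without relying on named arguments at all.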

You might be better off using POSIXt dates. They're a bit more flexible, for example if you wanted to add a week or year. The equivalent of @BrodieG's answer, using lubridate functionality to work with POSIXt:
dates <- ymd(seq(Sys.Date(), Sys.Date()+10, by='day'))
number.of.days <- 1:4
list(dates-1, dates-2, dates-3, dates-4)
lapply(number.of.days, `-.POSIXt`, e1=dates)
Also, how's the fishing in Philly? :)

Related

Given a vector of any type, how can I get an NA value of the same class?

Suppose I want to do something like:
mask_values <- function(x, mask) ifelse(mask, x, NA)
The purpose of this function is to take a vector and replace some of its values with NA based on the value of mask. However, this function doesn't guarantee that the return type is always the same as the input x. For example:
date_vec <- rep(lubridate::today(), 10)
my_mask <- rep(c(TRUE, FALSE), length.out = 10)
class(mask_values(date_vec, my_mask))
which yields "numeric" rather than the desired "Date". So I try switching to dplyr::if_else, which is supposed to preserve types:
mask_values <- function(x, mask) dplyr::if_else(mask, x, NA)
class(mask_values(date_vec, my_mask))
However, if_else also requires the input types to be the same as each other, and NA has type "logical", which means I get this error:
Error: `false` must be a `Date` object, not a logical vector.
So it seems that if I want to use if_else in order to preserve the input type, I need to be able to obtain an NA value with the same class as the input. Is there a reliable way to do this for any class? One possibility seems to be x[NA], but I'm not sure if that is a universal solution or if it just happens to work with the examples that I've tested. You can assume that the only classes that matter are "vector-like" classes for which NA values exist, such as Date and POSIXct, as well as all the basic R data types (logical, character, numeric, etc.).
Alternatively, is there another way to implement my mask_values function such that the return value always has the same type as x?
I recommend avoiding ifelse whenever possible. It is quite inefficient and as you have seen also quirky regarding what it returns (although that is well documented). I rarely use it and, if I do use it, only for interactive use and not programmatically.
The canonical and safe way of setting values to NA in base R is is.na<-. (Note that it supports logical and positional indexing. mask could also be a numeric vector.)
mask_values <- function(x, mask) {
is.na(x) <- mask
x
}
#or simply this:
#mask_values <- `is.na<-`
#i.e., `is.na<-` is already what you want.
class(mask_values(date_vec, my_mask))
#[1] "Date"
Alternatively, you can also use simple subset-assignment. A bare NA is a logical value (it can be coerced to other types, and of course you can request a specific type with NA_real_ etc.). If you assign a logical vector into a vector of another type, it will be coerced to that other vector's type (because "logical" is the most primitive of the usual atomic types).
mask_values <- function(x, mask) {
x[mask] <- NA
x
}
class(mask_values(date_vec, my_mask))
#[1] "Date"
Btw., this subset-assignment is how the is.na<-.default method is defined.
I prefer doing subset-assignment explicitly in my code but occasionally the convenience function replace can be useful.
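For completeness, here is a small sketch of the replace variant, together with the x[NA]-style idea from the question. It assumes a fixed date instead of lubridate::today() so it runs in base R. Note that the question's ifelse(mask, x, NA) keeps values where mask is TRUE, so this sketch negates the mask; drop the ! if you want to blank out the TRUE positions instead.

```r
date_vec <- rep(as.Date("2020-01-01"), 10)   # fixed date, for reproducibility
my_mask <- rep(c(TRUE, FALSE), length.out = 10)

# replace() is a thin wrapper around subset-assignment, so the class survives.
# `!mask` reproduces ifelse(mask, x, NA): keep where TRUE, NA where FALSE.
mask_values <- function(x, mask) replace(x, !mask, NA)
masked <- mask_values(date_vec, my_mask)

# A length-one NA of the same class as x, obtained by indexing with an
# integer NA (a logical NA would be recycled to the full length of x).
na_like <- date_vec[NA_integer_]
```

Whether indexing with NA works for every vector-like class depends on that class having a sensible `[` method; it does for Date and the basic atomic types.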

mpfr'izing a data.frame in R

I'm trying to convert a data.frame in R to mpfr format by multiplying by an mpfr unit constant. As the code below shows, this works when applied to a single column (result variable 'mpfr_col'), but neither of the two approaches shown for a whole data.frame works. The relevant error for each attempt is given in a comment.
library(Rmpfr)
prec <- 256
m1 <- mpfr(1,prec)
col_build <- 1:10
test_df <- data.frame(col_build, col_build, col_build)
mpfr_col <- m1*(col_build)
mpfr_df <- m1*test_df # (list) object cannot be coerced to type 'double'
for(colnum in 1:length(colnames(test_df))){
test_df[,colnum] <- m1*test_df[,colnum] # attempt to replicate an object of type 'S4'
}
Answer:
Use [[colnum]] to access the columns instead of [,colnum]:
for(colnum in seq_along(test_df)){
test_df[[colnum]] <- m1*test_df[[colnum]]
}
(Note: the print method of data.frame will fail, but the 'mpfr-izing' works. You can print it either by printing the columns individually or by using as_tibble(test_df).)
Explanation
The original fails because the [,colnum] assignment doesn't coerce the argument, I think. Using [[ returns an element (aka a column) of the list (aka the data.frame).
See this bit of Hadley Wickham's Advanced R book:
[ selects sub-lists. It always returns a list; if you use it with a
single positive integer, it returns a list of length one. [[ selects
an element within a list. $ is a convenient shorthand: x$y is
equivalent to x[["y"]].
And the help from Extract.data.frame {base}:
When [ and [[ are used to add or replace a whole column, no coercion
takes place but value will be replicated (by calling the generic
function rep) to the right length if an exact number of repeats can be
used.
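The difference between [ and [[ described in these quotes can be checked in base R, without Rmpfr. A minimal sketch on a plain data.frame (the column names a and b are made up for illustration):

```r
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))

sub_list <- df[1]     # single bracket: still a data.frame (a list) with one column
column   <- df[[1]]   # double bracket: the column vector itself

# Replacing a column through [[ stores the new vector as that list element.
df[["a"]] <- df[["a"]] * 2
```

Here sub_list is a one-column data.frame, while column is a bare integer vector.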

Idioms for enabling type dispatching

There are a few questions here; I would be satisfied if any one of them were answered sufficiently well.
Background - what is the end goal?
I am interested in representing a date-range in R. Bare-minimum requirement is that we represent a start and end date, which can easily be done using a length-two date vector. Additionally, it would be nice to extend this object into a Class which further
supplies a name to each range (i.e. a character string)
enables the (easy) use of dplyr::between operator
Shortcomings of my previous approach
I've previously represented each range as a length-two date vector. The upside here is that I don't rely on any external dependencies and my data structure is so lightweight that it's not a hassle to program with. The downside is that I'm tired of having to access the beg and end of the date range via the [ operator and arguments 1 and 2 respectively (arguably less interpretable than if we had a class implementation).
Also, we ultimately deal with a sequence of date-ranges (i.e. a vector), and so abstracting away the DateRange is helpful before we start nesting data structures. I do not want to use a list of length-two date vectors nor do I wish to use a data.frame with two rows, each column being interpreted as a date-range.
Where have I looked?
I've looked at the lubridate package and have considered inheriting from its Interval class. The downside to starting with this inheritance is that I don't think S4 is necessary for my use case. I just need a few simple data attributes and a nice API for calling dplyr::between.
An ideal solution might just extend the lubridate::Interval class to hold a name, an end date (could be a method as this info is already stored in Interval via @start + @.Data), and extend dplyr::between to play nicely with said class.
What have I tried?
Here's a rough implementation of what I'm looking for:
# 3 key attributes: beg, end, and name.
MyInterval <- function(beg, end, name = NULL) {
if (is.character(beg)) beg <- as.Date(beg)
if (is.character(end)) end <- as.Date(end)
if (is.null(name)) name <- as.character(beg)
structure(.Data = list('beg' = beg, 'end' = end, 'name' = name), class = "MyInterval")
}
Now, I would like to be able to overload the between operator such that I may call it as follows: between(x, MyInterval), where we notice that dplyr::between(x, lo, hi) expects three arguments. To try and accomplish this, I've tried to set up type dispatching as follows:
between <- function(...) UseMethod('between')
between.MyInterval <- function(interval, x) {
if (is.character(x)) x <- as.Date(x)
dplyr::between(x, interval$beg, interval$end)
}
between.default <- function(x, lo, hi) dplyr::between(x, lo, hi)
The reason I chose to use ... in the prototype for between is that the order of arguments currently differ between between.MyInterval and between.default. Is there a better way to code this up? I believe the behavior is as desired (to within a first glance)
i <- MyInterval("2012-01-01", "2012-12-31")
between(i, "2012-02-01") # Dispatches to between.MyInterval. Returns TRUE as expected.
between(150, 100, 200) # Dispatches to dplyr::between. Good, we didn't break anything?
Thank you
Any criticisms are welcomed. I know that between is a function that doesn't do type-dispatching out of the box, and so implementing this myself raises a code smell.
A possibility is to use data.table's inrange-function.
First, let's make an interval:
my.interval <- function(beg, end) data.table(beg = as.Date(beg), end = as.Date(end))
mi <- my.interval("2012-01-01", "2012-12-31")
Now you can do:
> as.Date("2012-02-01") %inrange% mi
[1] TRUE
Or define your own inrange-function:
my.inrange <- function(x, intv) data.table::inrange(as.Date(x), intv$beg, intv$end)
With that you can do:
> my.inrange("2012-02-01", mi)
[1] TRUE
As @Frank commented, you can make an infix variant of my.inrange too:
`%my.inrange%` <- my.inrange
Now you can use it in the following notation as well:
"2012-02-01" %my.inrange% mi
Which is similar to the infix notation of data.table's between and inrange functions.
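If you would rather stay with S3 and avoid both the between(...) generic and a dependency, one possible idiom is to let the generic dispatch on the range argument instead of on x. The sketch below is only an illustration: date_range and in_range are hypothetical names, and the comparison is done with plain >= and <= rather than dplyr::between.

```r
# Hypothetical constructor: a named date range stored as an S3 list.
date_range <- function(beg, end, name = as.character(beg)) {
  structure(list(beg = as.Date(beg), end = as.Date(end), name = name),
            class = "date_range")
}

# Generic that dispatches on `range` (the second argument), not on `x`,
# so the argument order stays the same for every method.
in_range <- function(x, range, ...) UseMethod("in_range", range)

in_range.date_range <- function(x, range, ...) {
  x <- as.Date(x)
  x >= range$beg & x <= range$end
}

# Fallback: treat `range` as a length-two vector c(lo, hi).
in_range.default <- function(x, range, ...) {
  x >= range[1] & x <= range[2]
}

r <- date_range("2012-01-01", "2012-12-31")
inside_dates <- in_range(c("2012-02-01", "2013-02-01"), r)
inside_num <- in_range(150, c(100, 200))
```

Because UseMethod accepts the object to dispatch on as its second argument, all methods can share one signature, which removes the need for ... in the generic's prototype.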

expression vs call

What is the difference between an expression and a call?
For instance:
func <- expression(2*x*y + x^2)
funcDx <- D(func, 'x')
Then:
> class(func)
[1] "expression"
> class(funcDx)
[1] "call"
Calling eval with an envir list works on both of them. But I'm curious what the difference between the two classes is, and under what circumstances I should use expression or call.
You should use expression when you want its capacity to hold more than one expression or call. It really returns an "expression list". The usual situation for the casual user of R is in forming arguments to plotting functions, where the task is forming symbolic expressions for labels. R expression-lists are lists with potentially many items, while calls never are such. It's interesting that @hadley's Advanced R Programming suggests "you'll never need to use [the expression function]": http://adv-r.had.co.nz/Expressions.html. Parenthetically, the bquote function is highly useful, but has the limitation that it does not act on more than one expression at a time. I recently hacked a response to such a problem about parsing expressions and got the check, but I thought @mnel's answer was better: R selectively style plot axis labels
The strategy of passing an expression to the evaluator with eval( expr, envir= < a named environment or list>) is essentially another route to what function is doing. A big difference between expression and call (the functions) is that the latter expects a character object and will evaluate it by looking for a named function in the symbol table.
When you say that processing both with the eval "works", you are not saying it produces the same results, right? The D function (call) has additional arguments that get substituted and restrict and modify the result. On the other hand evaluation of the expression-object substitutes the values into the symbols.
There seem to be "levels of evaluation":
expression(mean(1:10))
# expression(mean(1:10))
call("mean" , (1:10))
# mean(1:10)
eval(expression(mean(1:10)))
# [1] 5.5
eval(call("mean" , (1:10)))
# [1] 5.5
One might have expected eval(expression(mean(1:10))) to return just the next level of returning a call object but it continues to parse the expression tree and evaluate the results. In order to get just the unevaluated function call to mean, I needed to insert a quote:
eval(expression(quote(mean(1:10))))
# mean(1:10)
From the documentation (?expression):
...an R expression vector is a list of calls, symbols etc, for example as returned by parse.
Notice:
R> class(func[[1]])
[1] "call"
When given an expression, D acts on the first call. If func were simply a call, D would work the same.
R> func2 <- substitute(2 * x * y + x^2)
R> class(func2)
[1] "call"
R> D(func2, 'x')
2 * y + 2 * x
Sometimes for the sake of consistency, you might need to treat both as expressions.
In this case as.expression comes in handy:
func <- expression(2*x*y + x^2)
funcDx <- as.expression(D(func, 'x'))
> class(func)
[1] "expression"
> class(funcDx)
[1] "expression"
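To make the distinction concrete, here is a small base-R sketch (the variable names are made up for illustration). It shows that an expression vector can hold several calls, that eval works through all of them and returns the last value, and that D is equally happy with a bare call:

```r
ex <- expression(1 + 2, 3 * 4)       # expression vector holding two calls
n_items <- length(ex)                # number of elements in the vector
first_is_call <- is.call(ex[[1]])    # each element here is a call
last_value <- eval(ex)               # elements are evaluated in order; the last value is returned

cl <- quote(2 * x * y + x^2)         # a single call
dx <- D(cl, "x")                     # D() accepts a call and returns a call
dx_value <- eval(dx, list(x = 1, y = 3))
```

With x = 1 and y = 3, the derivative 2 * y + 2 * x evaluates to 8.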

Convert character vector to numeric vector in R for value assignment?

I have:
z = data.frame(x1=a, x2=b, x3=c, etc)
I am trying to do:
for (i in 1:10)
{
paste(c('N'),i,sep="") -> paste(c('z$x'),i,sep="")
}
Problems:
paste(c('z$x'),i,sep="") yields "z$x1", "z$x2" instead of calling the actual values. I need the expression to be evaluated. I tried as.numeric, eval. Neither seemed to work.
paste(c('N'),i,sep="") yields "N1", "N2". I need the expression to be merely used as name. If I try to assign it a value such as paste(c('N'),5,sep="") -> 5, ie "N5" -> 5 instead of N5 -> 5, I get target of assignment expands to non-language object.
This task is pretty trivial since I can simply do:
N1 = x1...
N2 = x2...
etc, but I want to learn something new
I'd suggest using something like for( i in 1:10 ) z[,i] <- N[,i]...
BUT, since you said you want to learn something new, you can play around with parse and substitute.
NOTE: these little tools are funny, but experienced users (not me) avoid them.
This is called "computing on the language". It's very interesting, and it helps understanding the way R works. Let me try to give an intro:
The basic language construct is a constant, like a numeric or character vector. It is trivial because it is not different from its "unevaluated" version, but it is one of the building blocks for more complicated expressions.
The (officially) basic language object is the symbol, also known as a name. It's nothing but a pointer to another object, i.e., a token that identifies another object which may or may not exist. For instance, if you run x <- 10, then x is a symbol that refers to the value 10. In other words, evaluating the symbol x yields the numeric vector 10. Evaluating a nonexistent symbol yields an error.
A symbol looks like a character string, but it is not. You can turn a string into a symbol with as.symbol("x").
The next language object is the call. This is a recursive object, implemented as a list whose elements are either constants, symbols, or other calls. The first element must not be a constant, because it must evaluate to the real function that will be called. The other elements are the arguments to this function.
If the first element does not evaluate to an existing function, R will throw either Error: attempt to apply non-function or Error: could not find function "x" (if the first element is a symbol that is undefined or points to something other than a function).
Example: the code line f(x, y+z, 2) will be parsed as a list of 4 elements, the first being f (as a symbol), the second being x (another symbol), the third another call, and the fourth a numeric constant. The third element y+z, is just a function with two arguments, so it parses as a list of three names: '+', y and z.
Finally, there is also the expression object, that is a list of calls/symbols/constants, that are meant to be evaluated one by one.
You'll find lots of information here:
https://github.com/hadley/devtools/wiki/Computing-on-the-language
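The f(x, y+z, 2) example above can be inspected directly with quote, which returns the parsed call without evaluating it (so f does not need to exist). A minimal sketch:

```r
cl <- quote(f(x, y + z, 2))       # the unevaluated call

n_elements <- length(cl)          # function name plus three arguments
head_class <- class(cl[[1]])      # the symbol f
arg3_class <- class(cl[[3]])      # y + z is itself a call
arg4_class <- class(cl[[4]])      # a numeric constant

inner <- as.list(cl[[3]])         # the three pieces of y + z: `+`, y, z
same_symbol <- identical(cl[[2]], as.symbol("x"))
```

The last line confirms that as.symbol("x") builds the same object the parser creates for a bare x.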
OK, now let's get back to your question :-)
What you have tried does not work because the output of paste is a character string, and the assignment function expects as its first argument something that evaluates to a symbol, to be either created or modified. Alternatively, the first argument can also evaluate to a call associated with a replacement function. These are a little trickier, but they are handled by the assignment function itself, not by the parser.
The error message you see, target of assignment expands to non-language object, is triggered by the assignment function, precisely because your target evaluates to a string.
We can fix that building up a call that has the symbols you want in the right places. The most "brute force" method is to put everything inside a string and use parse:
parse(text=paste('N',i," -> ",'z$x',i,sep=""))
Another way to get there is to use substitute:
substitute(x -> y, list(x=as.symbol(paste("N",i,sep="")), y=substitute(z$w, list(w=paste("x",i,sep="")))))
The inner substitute creates the calls z$x1, z$x2 etc. The outer substitute puts this call as the target of the assignment, and the symbols N1, N2 etc. as the values.
parse results in an expression, and substitute in a call. Both can be passed to eval to get the same result.
Just one final note: I repeat that all this is intended as a didactic example, to help understanding the inner workings of the language, but it is far from good programming practice to use parse and substitute, except when there is really no alternative.
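As a further didactic sketch, the same kind of assignment can be assembled piece by piece with call and as.symbol instead of parse or substitute. This assumes the goal is N1 <- z$x1, N2 <- z$x2 (as in the N1 = x1 lines of the question), uses a made-up two-column z, and evaluates in a scratch environment env so that nothing is written to the global environment:

```r
env <- new.env()
env$z <- data.frame(x1 = 1:3, x2 = 4:6)   # toy data, for illustration only

for (i in 1:2) {
  target <- as.symbol(paste0("N", i))                       # the symbol N1, N2, ...
  value  <- call("$", quote(z), as.symbol(paste0("x", i)))  # the call z$x1, z$x2, ...
  assignment <- call("<-", target, value)                   # the call N1 <- z$x1
  eval(assignment, env)                                     # run it inside env
}

n1 <- env$N1
n2 <- env$N2
```

Printing assignment inside the loop shows the constructed call before it is evaluated, which is a convenient way to check what was built.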
A data.frame is a named list. It is usually good practice, and idiomatically R-ish, not to have lots of objects in the global environment, but to have related (or similar) objects in lists and to use lapply etc.
You could use list2env to multiassign the named elements of your list (the columns in your data.frame) to the global environment
DD <- data.frame(x = 1:3, y = letters[1:3], z = 3:1)
list2env(DD, envir = parent.frame())
## <environment: R_GlobalEnv>
## ta da, x, y and z now exist within the global environment
x
## [1] 1 2 3
y
## [1] a b c
## Levels: a b c
z
## [1] 3 2 1
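If the N1, N2, ... names from the question are wanted, the columns can be renamed before calling list2env. A small sketch, with a made-up two-column z and a fresh environment env as the target (both are assumptions for illustration) rather than the global environment:

```r
z <- data.frame(x1 = 1:3, x2 = 4:6)   # toy data, for illustration only

# Turn the columns into a plain list and rename x1, x2, ... to N1, N2, ...
vals <- setNames(as.list(z), sub("^x", "N", names(z)))

# Assign every list element as a variable in a new environment.
env <- list2env(vals, envir = new.env())

created <- sort(ls(env))
n1 <- get("N1", envir = env)
```

Keeping the values in vals (a named list) is often enough on its own; list2env is only needed if separate variables are really required.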
I am not exactly sure what you are trying to accomplish. But here is a guess:
### Create a data.frame using the alphabet
data <- data.frame(x = 'a', y = 'b', z = 'c')
### Create a numerical index corresponding to the letter position in the alphabet
index <- which(tolower(letters[1:26]) == data[1, ])
### Use an 'lapply' to apply a function to every element in 'index'; creates a list
val <- lapply(index, function(x) {
paste('N', x, sep = '')
})
### Assign names to our list
names(val) <- names(data)
### Observe the result
val$x
