How to coerce an expression to a class known only at runtime? - r

R has many means of coercion: as.logical, as.integer, as.data.frame, etc.
Using these functions, however, requires knowing in advance ("at compile time") the target class (logical, integer, data.frame, etc.).
But in the problem I'm working on, I need to coerce expr1 to class(expr2), where expr1 and expr2 are arbitrary R expressions whose classes are not known until runtime.
I don't expect that R has any built-in support for this exact use-case, but I hope that it does for the one of coercing an arbitrary expression to an arbitrary class. IOW, I hope that R already has something like coerce(expr, a.class)1.
I know that I can roll my own coerce, like this:
coerce <- function (expr, a.class) {
## coerce expr to a.class
switch(a.class,
character = as.character(expr),
logical = as.logical(expr),
integer = as.integer(expr),
numeric = as.numeric(expr),
# yadda-yadda-yadda
stop(sprintf('unsupported class: %s', a.class)))
}
> coerce(5, "foo")
[1] "5"
Also, I know that one can also resort to this:
coerce2 <- function (expr, a.class) {
## coerce expr to a.class; take 2
rcode <- sprintf("as.%s(expr)", a.class)
tryCatch(eval(parse(text=rcode)),
error = function (condtion) {
stop(sprintf('unsupported class: %s', a.class))
})
}
> coerce2(5, data.frame())
x
1 5
...but I hope that R already supports something more robust and powerful than this.
1 My assumption is that I could then address the use-case I'm actually interested in with an expression of the form coerce(expr1, class(expr2)). I think this is a reasonable assumption, but in case it turns out not to hold, I thought I'd spell all this out.

Related

Idioms for enabling type dispatching

There's a few questions here, I would be satisfied if any one of them was answered sufficiently well.
Background - what is the end goal?
I am interested in representing a date-range in R. Bare-minimum requirement is that we represent a start and end date, which can easily be done using a length-two date vector. Additionally, it would be nice to extend this object into a Class which further
supplies a name to each range (i.e. a character string)
enables the (easy) use of dplyr::between operator
Shortcomings of my previous approach
I've previously represented each range as a length-two date vector. The upside here is that I don't rely on any external dependencies and my data structure is so lightweight that it's not a hassle to program with. The downside is that I'm tired of having to access the beg and end of the date range via the [ operator and arguments 1 and 2 respectively (arguably less interpretable than if we had a class implementation).
Also, we ultimately deal with a sequence of date-ranges (i.e. a vector), and so abstracting away the DateRange is helpful before we start nesting data structures. I do not want to use a list of length-two date vectors nor do I wish to use a data.frame with two rows, each column being interpreted as a date-range.
Where have I looked?
I've looked at lubridate package and have considered inheriting from a Interval class. The downside to starting with this inheritance is that I don't think S4 is necessary for my use case. I just need a few simple data attributes and a nice API for calling dplyr::between.
An ideal solution might just extend the lubridate::Interval class to hold a name, an end date (could be a method as this info already stored in Interval via #start + #.Data), and extend dplyr::between to play nicely with said class.
What have I tried?
Here's a rough implementation of what I'm looking for:
# 3 key attributes: beg, end, and name.
MyInterval <- function(beg, end, name = NULL) {
if (class(beg) == "character") beg <- as.Date(beg)
if (class(end) == "character") end <- as.Date(end)
if (is.null(name)) name <- as.character(beg)
structure(.Data = list('beg' = beg, 'end' = end, 'name' = name), class = "MyInterval")
}
Now, I would like to be able to overload the between operator such that I may call it as follows: between(x, MyInterval), where we notice that dplyr::between(x, lo, hi) expects three arguments. To try and accomplish this, I've tried to set up type dispatching as follows:
between <- function(...) UseMethod('between')
between.MyInterval <- function(interval, x) {
if (class(x) == "character") x <- as.Date(x)
dplyr::between(x, interval$beg, interval$end)
}
between.default <- function(x, lo, hi) dplyr::between(x, lo, hi)
The reason I chose to use ... in the prototype for between is that the order of arguments currently differ between between.MyInterval and between.default. Is there a better way to code this up? I believe the behavior is as desired (to within a first glance)
i <- MyInterval("2012-01-01", "2012-12-31")
between(i, "2012-02-01") # Dispatches to between.MyInterval. Returns True as expected.
between(150, 100, 200) # Dispatches to dplyr::between. Good, we didn't break anything?
Thank you
Any criticisms are welcomed. I know that between is a function that doesn't do type-dispatching out of the box, and so implementing this myself raises a code smell.
A possibility is to use data.table's inrange-function.
First, let's make an interval:
my.interval <- function(beg, end) data.table(beg = as.Date(beg), end = as.Date(end))
mi <- my.interval("2012-01-01", "2012-12-31")
Now you can do:
> as.Date("2012-02-01") %inrange% mi
[1] TRUE
Or define you own inrange-function:
my.inrange <- function(x, intv) data.table::inrange(as.Date(x), intv$beg, intv$end)
With that you can do:
> my.inrange("2012-02-01", mi)
[1] TRUE
As #Frank commented, you can make an infix variant of my.inrange too:
`%my.inrange%` <- my.inrange
now you can use it in the following notation as well:
"2012-02-01" %my.inrange% mi
Which is similar to the infix notation of data.table's between and inrange functions.

'=' vs. '<-' as a function argument in R

I am a beginner so I'd appreciate any thoughts, and I understand that this question might be too basic for some of you.
Also, this question is not about the difference between <- and =, but about the way they get evaluated when they are part of the function argument. I read this thread, Assignment operators in R: '=' and '<-' and several others, but I couldn't understand the difference.
Here's the first line of code:
My objective is to get rid of variables in the environment. From reading the above thread, I would believe that <- would exist in the user workspace, so there shouldn't be any issue with deleting all variables.
Here is my code and two questions:
Question 1
First off, this code doesn't work.
rm(ls()) #throws an error
I believe this happens because ls() returns a character vector, and rm() expects an object name. Am I correct? If so, I would appreciate if someone could guide me how to get object names from character array.
Question 2
I googled this topic and found that this code below deletes all variables.
rm(list = ls())
While this does help me, I am unsure why = is used instead of <-. If I run the following code, I get an error Error in rm(list <- ls()) : ... must contain names or character strings
rm(list <- ls())
Why is this? Can someone please guide me? I'd appreciate any help/guidance.
I read this thread, Assignment operators in R: '=' and '<-' and several others, but I couldn't understand the difference.
No wonder, since the answers there are actually quite confusing, and some are outright wrong. Since that’s the case, let’s first establish the difference between them before diving into your actual question (which, it turns out, is mostly unrelated):
<- is an assignment operator
In R, <- is an operator that performs assignment from right to left, in the current scope. That’s it.
= is either an assignment operator or a distinct syntactic token
=, by contrast, has several meanings: its semantics change depending on the syntactic context it is used in:
If = is used inside a parameter list, immediately to the right of a parameter name, then its meaning is: “associate the value on the right with the parameter name on the left”.
Otherwise (i.e. in all other situations), = is also an operator, and by default has the same meaning as <-: i.e. it performs assignment in the current scope.
As a consequence of this, the operators <- and = can be used interchangeably1. However, = has an additional syntactic role in an argument list of a function definition or a function call. In this context it’s not an operator and cannot be replaced by <-.
So all these statements are equivalent:
x <- 1
x = 1
x[5] <- 1
x[5] = 1
(x <- 1)
(x = 1)
f((x <- 5))
f((x = 5))
Note the extra parentheses in the last example: if we omitted these, then f(x = 5) would be interpreted as a parameter association rather than an assignment.
With that out of the way, let’s turn to your first question:
When calling rm(ls()), you are passing ls() to rm as the ... parameter. Ronak’s answer explains this in more detail.
Your second question should be answered by my explanation above: <- and = behave differently in this context because the syntactic usage dictates that rm(list = ls()) associates ls() with the named parameter list, whereas <- is (as always) an assignment operator. The result of that assignment is then once again passed as the ... parameter.
1 Unless somebody changed their meaning: operators, like all other functions in R, can be overwritten with new definitions.
To expand on my comment slightly, consider this example:
> foo <- function(a,b) b+1
> foo(1,b <- 2) # Works
[1] 3
> ls()
[1] "b" "foo"
> foo(b <- 3) # Doesn't work
Error in foo(b <- 3) : argument "b" is missing, with no default
The ... argument has some special stuff going on that restricts things a little further in the OP's case, but this illustrates the issue with how R is parsing the function arguments.
Specifically, when R looks for named arguments, it looks specifically for arg = val, with an equals sign. Otherwise, it is parsing the arguments positionally. So when you omit the first argument, a, and just do b <- 1, it thinks the expression b <- 1 is what you are passing for the argument a.
If you check ?rm
rm(..., list = character(),pos = -1,envir = as.environment(pos), inherits = FALSE)
where ,
... - the objects to be removed, as names (unquoted) or character strings (quoted).
and
list - a character vector naming objects to be removed.
So, if you do
a <- 5
and then
rm(a)
it will remove the a from the global environment.
Further , if there are multiple objects you want to remove,
a <- 5
b <- 10
rm(a, b)
This can also be written as
rm(... = a, b)
where we are specifying that the ... part in syntax takes the arguments a and b
Similarly, when we want to specify the list part of the syntax, it has to be given by
rm(list = ls())
doing list <- ls() will store all the variables from ls() in the variable named list
list <- ls()
list
#[1] "a" "b" "list"
I hope this is helpful.

What's the real meaning about 'Everything that exists is an object' in R?

I saw:
“To understand computations in R, two slogans are helpful:
• Everything that exists is an object.
• Everything that happens is a function call."
— John Chambers
But I just found:
a <- 2
is.object(a)
# FALSE
Actually, if a variable is a pure base type, it's result is.object() would be FALSE. So it should not be an object.
So what's the real meaning about 'Everything that exists is an object' in R?
The function is.object seems only to look if the object has a "class" attribute. So it has not the same meaning as in the slogan.
For instance:
x <- 1
attributes(x) # it does not have a class attribute
NULL
is.object(x)
[1] FALSE
class(x) <- "my_class"
attributes(x) # now it has a class attribute
$class
[1] "my_class"
is.object(x)
[1] TRUE
Now, trying to answer your real question, about the slogan, this is how I would put it. Everything that exists in R is an object in the sense that it is a kind of data structure that can be manipulated. I think this is better understood with functions and expressions, which are not usually thought as data.
Taking a quote from Chambers (2008):
The central computation in R is a function call, defined by the
function object itself and the objects that are supplied as the
arguments. In the functional programming model, the result is defined
by another object, the value of the call. Hence the traditional motto
of the S language: everything is an object—the arguments, the value,
and in fact the function and the call itself: All of these are defined
as objects. Think of objects as collections of data of all kinds. The data contained and the way the data is organized depend on the class from which the object was generated.
Take this expression for example mean(rnorm(100), trim = 0.9). Until it is is evaluated, it is an object very much like any other. So you can change its elements just like you would do it with a list. For instance:
call <- substitute(mean(rnorm(100), trim = 0.9))
call[[2]] <- substitute(rt(100,2 ))
call
mean(rt(100, 2), trim = 0.9)
Or take a function, like rnorm:
rnorm
function (n, mean = 0, sd = 1)
.Call(C_rnorm, n, mean, sd)
<environment: namespace:stats>
You can change its default arguments just like a simple object, like a list, too:
formals(rnorm)[2] <- 100
rnorm
function (n, mean = 100, sd = 1)
.Call(C_rnorm, n, mean, sd)
<environment: namespace:stats>
Taking one more time from Chambers (2008):
The key concept is that expressions for evaluation are themselves
objects; in the traditional motto of the S language, everything is an
object. Evaluation consists of taking the object representing an
expression and returning the object that is the value of that
expression.
So going back to our call example, the call is an object which represents another object. When evaluated, it becomes that other object, which in this case is the numeric vector with one number: -0.008138572.
set.seed(1)
eval(call)
[1] -0.008138572
And that would take us to the second slogan, which you did not mention, but usually comes together with the first one: "Everything that happens is a function call".
Taking again from Chambers (2008), he actually qualifies this statement a little bit:
Nearly everything that happens in R results from a function call.
Therefore, basic programming centers on creating and refining
functions.
So what that means is that almost every transformation of data that happens in R is a function call. Even a simple thing, like a parenthesis, is a function in R.
So taking the parenthesis like an example, you can actually redefine it to do things like this:
`(` <- function(x) x + 1
(1)
[1] 2
Which is not a good idea but illustrates the point. So I guess this is how I would sum it up: Everything that exists in R is an object because they are data which can be manipulated. And (almost) everything that happens is a function call, which is an evaluation of this object which gives you another object.
I love that quote.
In another (as of now unpublished) write-up, the author continues with
R has a uniform internal structure for representing all objects. The evaluation process keys off that structure, in a simple form that is essentially
composed of function calls, with objects as arguments and an object as the
value. Understanding the central role of objects and functions in R makes
use of the software more effective for any challenging application, even those where extending R is not the goal.
but then spends several hundred pages expanding on it. It will be a great read once finished.
Objects For x to be an object means that it has a class thus class(x) returns a class for every object. Even functions have a class as do environments and other objects one might not expect:
class(sin)
## [1] "function"
class(.GlobalEnv)
## [1] "environment"
I would not pay too much attention to is.object. is.object(x) has a slightly different meaning than what we are using here -- it returns TRUE if x has a class name internally stored along with its value. If the class is stored then class(x) returns the stored value and if not then class(x) will compute it from the type. From a conceptual perspective it matters not how the class is stored internally (stored or computed) -- what matters is that in both cases x is still an object and still has a class.
Functions That all computation occurs through functions refers to the fact that even things that you might not expect to be functions are actually functions. For example when we write:
{ 1; 2 }
## [1] 2
if (pi > 0) 2 else 3
## [1] 2
1+2
## [1] 3
we are actually making invocations of the {, if and + functions:
`{`(1, 2)
## [1] 2
`if`(pi > 0, 2, 3)
## [1] 2
`+`(1, 2)
## [1] 3

Inline expansion of variables in R

I'm confused with when a value is treated as a variable, and when as a string in R. In Ruby and Python, I'm used to a string always having to be quoted, and an unquoted string is always treated as a variable. Ie.
a["hello"] => a["hello"]
b = "hi"
a[b] => a["hi"]
But in R, this is not the case, for example
a$b < c(1,2,3)
b here is the value/name of the column, not the variable b.
c <- "b"
a$c => column not found (it's looking for column c, not b, which is the value of the variable c)
(I know that in this specific case I can use a[c], but there are many other cases. Such as ggplot(a, aes(x=c)) - I want to plot the column that is the value of c, not with the name c)...
In other StackOverflow questions, I've seen things like quote, substitute etc mentioned.
My question is: Is there a general way of "expanding" a variable and making sure the value of the variable is used, instead of the name of the variable? Or is that just not how things are done in R?
In your example, a$b is syntatic sugar for a[["b"]]. That's a special feature of the $ symbol when used with lists. The second form does what you expect - a[[b]] will return the element of a whose name == the value of the variable b, rather than the element whose name is "b".
Data frames are similar. For a data frame a, the $ operator refers to the column names. So a$b is the same as a[ , "b"]. In this case, to refer to the column of a indicated by the value of b, use a[, b].
The reason that what you posted with respect to the $ operator doesn't work is quite subtle and is in general quite different to most other situations in R where you can just use a function like get which was designed for that purpose. However, calling a$b is equivalent to calling
`$`(a , b)
This reminds us, that in R, everything is an object. $ is a function and it takes two arguments. If we check the source code we can see that calling a$c and expecting R to evaluate c to "b" will never work, because in the source code it states:
/* The $ subset operator.
We need to be sure to only evaluate the first argument.
The second will be a symbol that needs to be matched, not evaluated.
*/
It achieves this using the following:
if(isSymbol(nlist) )
SET_STRING_ELT(input, 0, PRINTNAME(nlist));
else if(isString(nlist) )
SET_STRING_ELT(input, 0, STRING_ELT(nlist, 0));
else {
errorcall(call,_("invalid subscript type '%s'"),
type2char(TYPEOF(nlist)));
}
nlist is the argument you passed do_subset_3 (the name of the C function $ maps to), in this case c. It found that c was a symbol, so it replaces it with a string but does not evaluate it. If it was a string then it is passed as a string.
Here are some links to help you understand the 'why's and 'when's of evaluation in R. They may be enlightening, they may even help, if nothing else they will let you know that you are not alone:
http://developer.r-project.org/nonstandard-eval.pdf
http://journal.r-project.org/2009-1/RJournal_2009-1_Chambers.pdf
http://www.burns-stat.com/documents/presentations/inferno-ish-r/
In that last one, the most important piece is bullet point 2, then read through the whole set of slides. I would probably start with the 3rd one, then the 1st 2.
These are less in the spirit of how to make a specific case work (as the other answers have done) and more in the spirit of what has lead to this state of affairs and why in some cases it makes sense to have standard nonstandard ways of accessing variables. Hopefully understanding the why and when will help with the overall what to do.
If you want to get the variable named "b", use the get function in every case. This will substitute the value of b for get(b) wherever it is found.
If you want to play around with expressions, you need to use quote(), substitute(), bquote(), and friends like you mentioned.
For example:
x <- quote(list(a = 1))
names(x) # [1] "" "a"
names(x) <- c("", a)
x # list(foo = 1)
And:
c <- "foo"
bquote(ggplot(a, aes(x=.(c)))) # ggplot(a, aes(x = "foo"))
substitute(ggplot(a, aes(x=c)), list(c = "foo"))

Convert character vector to numeric vector in R for value assignment?

I have:
z = data.frame(x1=a, x2=b, x3=c, etc)
I am trying to do:
for (i in 1:10)
{
paste(c('N'),i,sep="") -> paste(c('z$x'),i,sep="")
}
Problems:
paste(c('z$x'),i,sep="") yields "z$x1", "z$x1" instead of calling the actual values. I need the expression to be evaluated. I tried as.numeric, eval. Neither seemed to work.
paste(c('N'),i,sep="") yields "N1", "N2". I need the expression to be merely used as name. If I try to assign it a value such as paste(c('N'),5,sep="") -> 5, ie "N5" -> 5 instead of N5 -> 5, I get target of assignment expands to non-language object.
This task is pretty trivial since I can simply do:
N1 = x1...
N2 = x2...
etc, but I want to learn something new
I'd suggest using something like for( i in 1:10 ) z[,i] <- N[,i]...
BUT, since you said you want to learn something new, you can play around with parse and substitute.
NOTE: these little tools are funny, but experienced users (not me) avoid them.
This is called "computing on the language". It's very interesting, and it helps understanding the way R works. Let me try to give an intro:
The basic language construct is a constant, like a numeric or character vector. It is trivial because it is not different from its "unevaluated" version, but it is one of the building blocks for more complicated expressions.
The (officially) basic language object is the symbol, also known as a name. It's nothing but a pointer to another object, i.e., a token that identifies another object which may or may not exist. For instance, if you run x <- 10, then x is a symbol that refers to the value 10. In other words, evaluating the symbol x yields the numeric vector 10. Evaluating a non-existant symbol yields an error.
A symbol looks like a character string, but it is not. You can turn a string into a symbol with as.symbol("x").
The next language object is the call. This is a recursive object, implemented as a list whose elements are either constants, symbols, or another calls. The first element must not be a constant, because it must evaluate to the real function that will be called. The other elements are the arguments to this function.
If the first argument does not evaluate to an existing function, R will throw either Error: attempt to apply non-function or Error: could not find function "x" (if the first argument is a symbol that is undefined or points to something other than a function).
Example: the code line f(x, y+z, 2) will be parsed as a list of 4 elements, the first being f (as a symbol), the second being x (another symbol), the third another call, and the fourth a numeric constant. The third element y+z, is just a function with two arguments, so it parses as a list of three names: '+', y and z.
Finally, there is also the expression object, that is a list of calls/symbols/constants, that are meant to be evaluated one by one.
You'll find lots of information here:
https://github.com/hadley/devtools/wiki/Computing-on-the-language
OK, now let's get back to your question :-)
What you have tried does not work because the output of paste is a character string, and the assignment function expects as its first argument something that evaluates to a symbol, to be either created or modified. Alternativelly, the first argument can also evaluate to a call associated with a replacement function. These are a little trickier, but they are handled by the assignment function itself, not by the parser.
The error message you see, target of assignment expands to non-language object, is triggered by the assignment function, precisely because your target evaluates to a string.
We can fix that building up a call that has the symbols you want in the right places. The most "brute force" method is to put everything inside a string and use parse:
parse(text=paste('N',i," -> ",'z$x',i,sep=""))
Another way to get there is to use substitute:
substitute(x -> y, list(x=as.symbol(paste("N",i,sep="")), y=substitute(z$w, list(w=paste("x",i,sep="")))))
the inner substitute creates the calls z$x1, z$x2 etc. The outer substitute puts this call as the taget of the assignment, and the symbols N1, N2 etc as the values.
parse results in an expression, and substitute in a call. Both can be passed to eval to get the same result.
Just one final note: I repeat that all this is intended as a didactic example, to help understanding the inner workings of the language, but it is far from good programming practice to use parse and substitute, except when there is really no alternative.
A data.frame is a named list. It usually good practice, and idiomatically R-ish not to have lots of objects in the global environment, but to have related (or similar) objects in lists and to use lapply etc.
You could use list2env to multiassign the named elements of your list (the columns in your data.frame) to the global environment
DD <- data.frame(x = 1:3, y = letters[1:3], z = 3:1)
list2env(DD, envir = parent.frame())
## <environment: R_GlobalEnv>
## ta da, x, y and z now exist within the global environment
x
## [1] 1 2 3
y
## [1] a b c
## Levels: a b c
z
## [1] 3 2 1
I am not exactly sure what you are trying to accomplish. But here is a guess:
### Create a data.frame using the alphabet
data <- data.frame(x = 'a', y = 'b', z = 'c')
### Create a numerical index corresponding to the letter position in the alphabet
index <- which(tolower(letters[1:26]) == data[1, ])
### Use an 'lapply' to apply a function to every element in 'index'; creates a list
val <- lapply(index, function(x) {
paste('N', x, sep = '')
})
### Assign names to our list
names(val) <- names(data)
### Observe the result
val$x

Resources