Difference between as.data.frame(x) and as(x, "data.frame")

I'll keep it simple. Why does this work:
> as.data.frame(c('a', 'b'))
c("a", "b")
1 a
2 b
But this doesn't:
> as(c('a', 'b'), "data.frame")
Error in as(c("a", "b"), "data.frame") :
no method or default for coercing “character” to “data.frame”
I assumed that the latter would simply somehow convert into the former, but I suppose not.

Maybe the R authors thought that replicating the first method would encourage bad coding practice. The first result is not particularly worth emulating because the column name is awkward to use. The data.frame function delivers a much better-behaved result for character values, since the column is created with a syntactically valid name:
> as.data.frame(c('a','b'))
c("a", "b")
1 a
2 b
> data.frame(c('a','b'))
c..a....b..
1 a
2 b
See what happens when you try to extract values by the name of that column. Since everyone knows that data frames are really list objects (right?), it would be more natural to expect coders to use a list argument:
> data.frame(list(b=c('a', 'b')))
b
1 a
2 b
# same as
> as.data.frame(list(f=c('a','b')))
f
1 a
2 b
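The naming behaviour above can be reproduced directly. This is a small sketch, and the column name v is only an illustrative choice: data.frame() passes column names through make.names() by default (check.names = TRUE), which is where c..a....b.. comes from, while naming the column up front avoids the issue.

```r
# data.frame() sanitises column names via make.names() when check.names = TRUE
mangled <- make.names('c("a", "b")')
mangled
# [1] "c..a....b.."

# Supplying a name up front sidesteps the problem entirely
df <- data.frame(v = c("a", "b"))
names(df)
# [1] "v"
```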
Alex's answer directs you to the source of the as function, which elaborates on and confirms joran's comment above. That function doesn't use S3 dispatch; instead it looks up registered coercion methods that have been created by packages or defined with setAs, a mechanism more commonly used when building S4 methods.
> setAs("character", "data.frame", function(from){ to=as.data.frame.character(from)})
> new=as(c('a', 'b'), "data.frame")
> new
from
1 a
2 b
The setAs function also allows you to use custom coercion at the time of input with the read.*-functions: How can I completely remove scientific notation for the entire R session
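As a sketch of that read.* use case: the class name num_with_commas below is hypothetical and serves only as a label that colClasses can refer to. The read.* functions hand any column whose colClasses entry is not one of the built-in types to as(), so a coercion registered with setAs is applied while the file is being read.

```r
library(methods)

# Hypothetical class name, used only as a label for colClasses
setClass("num_with_commas")

# Register a coercion from character to that class
setAs("character", "num_with_commas",
      function(from) as.numeric(gsub(",", "", from, fixed = TRUE)))

csv_text <- 'id,amount\n1,"1,234"\n2,"5,678"'
df <- read.csv(text = csv_text,
               colClasses = c("integer", "num_with_commas"))
df$amount
# [1] 1234 5678
```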

I believe that it has to do with the fact that as is not a generic function, unlike mean, which is:
R> mean
function (x, ...)
UseMethod("mean")
<bytecode: 0x000000000a617ed0>
<environment: namespace:base>
Since it's not a generic, there is no call to method dispatch (i.e. UseMethod).
On the other hand, as.data.frame is a generic function; see methods(class = "data.frame") or the source for as.data.frame.
If there were S3 method dispatch on as, your assumption "that the latter would convert to the former" would be correct. Since as is not a generic function in that sense, your assumption is wrong.
If you look at the source code of as, you see that it's essentially a series of if-else cases instead of a call to method dispatch. On line 52, you see the check that raises your error:
if (is.null(asMethod))
stop(gettextf("no method or default for coercing %s to %s",
dQuote(thisClass), dQuote(Class)), domain = NA)
which produces the message that you see.
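The contrast can be checked from the console. This is a minimal sketch that assumes a session in which no coercion from character to data.frame has been registered with setAs: the S3 method for as.data.frame exists, yet as() does not fall back to it.

```r
library(methods)

# as.data.frame() is an S3 generic, and a method for character vectors exists
s3_method <- getS3method("as.data.frame", "character", optional = TRUE)
is.function(s3_method)
# [1] TRUE

# as() looks for a registered coercion instead; with none registered,
# the call fails rather than falling back to the S3 method
result <- tryCatch(as(c("a", "b"), "data.frame"),
                   error = function(e) conditionMessage(e))
result
```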

Related

Use `callNextMethod()` with dotsMethods

I would like to define some S4 generics dispatching on the ... argument such that the more specialized methods call the inherited method through callNextMethod(). However, as illustrated by the MWE, this fails with the following error.
# sample function which returns the number of its arguments
f <- function(...) length(list(...))
setGeneric("f")
## [1] "f"
setMethod("f", "character", function(...){ print("character"); callNextMethod() })
## [1] "f"
f(1, 2, 3)
## [1] 3
f("a", "b", "c")
## [1] "character"
## Error in callNextMethod(): a call to callNextMethod() appears in a call to '.Method', but the call does not seem to come from either a generic function or another 'callNextMethod'
This behavior doesn't seem right to me, but maybe I'm missing something here. I would expect the failing callNextMethod() to dispatch to the inherited default method function(...) length(list(...)) effectively returning:
## [1] "character"
## [1] 3
Any thoughts on this?
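For contrast, here is a sketch of the behaviour I expected, using a hypothetical generic g that dispatches on an ordinary formal argument x rather than on "...". In this form callNextMethod() reaches the inherited method without complaint.

```r
library(methods)

# Hypothetical generic 'g', dispatching on the ordinary formal argument x
setGeneric("g", function(x, ...) standardGeneric("g"))

setMethod("g", "ANY", function(x, ...) length(list(x, ...)))
setMethod("g", "character", function(x, ...) {
  print("character")
  callNextMethod()   # falls through to the "ANY" method
})

g(1, 2, 3)
# [1] 3
g("a", "b", "c")
# [1] "character"
# [1] 3
```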
Update
Additionally, I've found the following difference in behavior between S4 methods dispatching on formal arguments and ones dispatching on .... Consider the following example where switching the signature from x to ... changes the way objects are resolved.
f = function(x, ..., a = b) {
b = "missing 'a'"
cat(a)
}
f()
## missing 'a'
f(a = 1)
## 1
setGeneric("f", signature = "x")
f()
## missing 'a'
setGeneric("f", signature = "...")
f()
## Error in cat(a) : object 'b' not found
According to ?dotsMethods the dispatch on ... is implemented differently, but as suggested in the last sentence, this shouldn't cause any differences in behavior compared to regular generics. However, the above findings seem to prove the opposite.
Methods dispatching on “...” were introduced in version 2.8.0 of R. The initial implementation of the corresponding selection and dispatch is in an R function, for flexibility while the new mechanism is being studied. In this implementation, a local version of setGeneric is inserted in the generic function's environment. The local version selects a method according to the criteria above and calls that method, from the environment of the generic function. This is slightly different from the action taken by the C implementation when “...” is not involved. Aside from the extra computing time required, the method is evaluated in a true function call, as opposed to the special context constructed by the C version (which cannot be exactly replicated in R code.) However, situations in which different computational results would be obtained have not been encountered so far, and seem very unlikely.

'=' vs. '<-' as a function argument in R

I am a beginner so I'd appreciate any thoughts, and I understand that this question might be too basic for some of you.
Also, this question is not about the difference between <- and =, but about the way they get evaluated when they are part of the function argument. I read this thread, Assignment operators in R: '=' and '<-' and several others, but I couldn't understand the difference.
Here's the first line of code:
My objective is to get rid of variables in the environment. From reading the above thread, I would believe that <- would exist in the user workspace, so there shouldn't be any issue with deleting all variables.
Here is my code and two questions:
Question 1
First off, this code doesn't work.
rm(ls()) #throws an error
I believe this happens because ls() returns a character vector, and rm() expects an object name. Am I correct? If so, I would appreciate it if someone could show me how to get object names from a character vector.
Question 2
I googled this topic and found that this code below deletes all variables.
rm(list = ls())
While this does help me, I am unsure why = is used instead of <-. If I run the following code, I get the error: Error in rm(list <- ls()) : ... must contain names or character strings
rm(list <- ls())
Why is this? Can someone please guide me? I'd appreciate any help/guidance.
"I read this thread, Assignment operators in R: '=' and '<-' and several others, but I couldn't understand the difference."
No wonder, since the answers there are actually quite confusing, and some are outright wrong. Since that’s the case, let’s first establish the difference between them before diving into your actual question (which, it turns out, is mostly unrelated):
<- is an assignment operator
In R, <- is an operator that performs assignment from right to left, in the current scope. That’s it.
= is either an assignment operator or a distinct syntactic token
=, by contrast, has several meanings: its semantics change depending on the syntactic context it is used in:
If = is used inside a parameter list, immediately to the right of a parameter name, then its meaning is: “associate the value on the right with the parameter name on the left”.
Otherwise (i.e. in all other situations), = is also an operator, and by default has the same meaning as <-: i.e. it performs assignment in the current scope.
As a consequence of this, the operators <- and = can be used interchangeably¹. However, = has an additional syntactic role in an argument list of a function definition or a function call. In this context it’s not an operator and cannot be replaced by <-.
So within each of the following pairs, the two statements are equivalent:
x <- 1
x = 1
x[5] <- 1
x[5] = 1
(x <- 1)
(x = 1)
f((x <- 5))
f((x = 5))
Note the extra parentheses in the last example: if we omitted these, then f(x = 5) would be interpreted as a parameter association rather than an assignment.
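A small sketch of that difference; double_it and demo are hypothetical helpers introduced only for illustration. Inside a call, = merely names the argument, whereas <- performs an assignment in the caller's scope and then passes the assigned value.

```r
double_it <- function(x) x * 2   # hypothetical helper

demo <- function() {
  here <- environment()
  r1 <- double_it(x = 5)                                       # '=' only names the argument
  x_after_eq <- exists("x", envir = here, inherits = FALSE)    # no local x was created
  r2 <- double_it(x <- 7)                                      # '<-' assigns x, then passes its value
  x_after_arrow <- exists("x", envir = here, inherits = FALSE) # now a local x exists
  list(r1 = r1, x_after_eq = x_after_eq,
       r2 = r2, x_after_arrow = x_after_arrow, x = x)
}

out <- demo()
out$x_after_eq
# [1] FALSE
out$x_after_arrow
# [1] TRUE
```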
With that out of the way, let’s turn to your first question:
When calling rm(ls()), you are passing ls() to rm as the ... parameter. Ronak’s answer explains this in more detail.
Your second question should be answered by my explanation above: <- and = behave differently in this context because the syntactic usage dictates that rm(list = ls()) associates ls() with the named parameter list, whereas <- is (as always) an assignment operator. The result of that assignment is then once again passed as the ... parameter.
¹ Unless somebody changed their meaning: operators, like all other functions in R, can be overwritten with new definitions.
To expand on my comment slightly, consider this example:
> foo <- function(a,b) b+1
> foo(1,b <- 2) # Works
[1] 3
> ls()
[1] "b" "foo"
> foo(b <- 3) # Doesn't work
Error in foo(b <- 3) : argument "b" is missing, with no default
The ... argument has some special stuff going on that restricts things a little further in the OP's case, but this illustrates the issue with how R is parsing the function arguments.
Specifically, when R looks for named arguments, it looks for arg = val, with an equals sign. Otherwise, it parses the arguments positionally. So when you omit the first argument, a, and just write b <- 3, R treats the expression b <- 3 as the value you are passing for the argument a.
If you check ?rm, the usage is
rm(..., list = character(), pos = -1, envir = as.environment(pos), inherits = FALSE)
where
... - the objects to be removed, as names (unquoted) or character strings (quoted).
and
list - a character vector naming objects to be removed.
So, if you do
a <- 5
and then
rm(a)
it will remove a from the global environment.
Further, if there are multiple objects you want to remove,
a <- 5
b <- 10
rm(a, b)
This can also be written as
rm(... = a, b)
where we are specifying that the ... part in syntax takes the arguments a and b
Similarly, when we want to specify the list part of the syntax, it has to be given by
rm(list = ls())
Writing list <- ls() instead stores the output of ls() in a variable named list:
list <- ls()
list
#[1] "a" "b" "list"
I hope this is helpful.
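Both questions can be tried safely in a throwaway environment instead of the workspace. This is a sketch; the environment name scratch is an assumption made for illustration.

```r
scratch <- new.env()           # hypothetical scratch environment
assign("a", 5, envir = scratch)
assign("b", 10, envir = scratch)

# Question 1: names held as character strings go through the 'list' argument
rm(list = "a", envir = scratch)
ls(scratch)
# [1] "b"

# Question 2: with '<-' the whole expression lands in '...', which rm() rejects
outcome <- tryCatch(rm(list <- ls(scratch), envir = scratch),
                    error = function(e) "error")
outcome
# [1] "error"

# The working form: '=' binds the character vector to the 'list' parameter
rm(list = ls(scratch), envir = scratch)
length(ls(scratch))
# [1] 0
```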

What's the real meaning about 'Everything that exists is an object' in R?

I saw:
“To understand computations in R, two slogans are helpful:
• Everything that exists is an object.
• Everything that happens is a function call."
— John Chambers
But I just found:
a <- 2
is.object(a)
# FALSE
Actually, if a variable has a pure base type, the result of is.object() is FALSE. So it should not be an object.
So what's the real meaning of 'Everything that exists is an object' in R?
The function is.object seems to check only whether the object has a "class" attribute. So it does not have the same meaning as in the slogan.
For instance:
x <- 1
attributes(x) # it does not have a class attribute
NULL
is.object(x)
[1] FALSE
class(x) <- "my_class"
attributes(x) # now it has a class attribute
$class
[1] "my_class"
is.object(x)
[1] TRUE
Now, trying to answer your real question about the slogan, this is how I would put it. Everything that exists in R is an object in the sense that it is a kind of data structure that can be manipulated. I think this is better understood with functions and expressions, which are not usually thought of as data.
Taking a quote from Chambers (2008):
The central computation in R is a function call, defined by the
function object itself and the objects that are supplied as the
arguments. In the functional programming model, the result is defined
by another object, the value of the call. Hence the traditional motto
of the S language: everything is an object—the arguments, the value,
and in fact the function and the call itself: All of these are defined
as objects. Think of objects as collections of data of all kinds. The data contained and the way the data is organized depend on the class from which the object was generated.
Take this expression for example: mean(rnorm(100), trim = 0.9). Until it is evaluated, it is an object very much like any other. So you can change its elements just as you would with a list. For instance:
call <- substitute(mean(rnorm(100), trim = 0.9))
call[[2]] <- substitute(rt(100,2 ))
call
mean(rt(100, 2), trim = 0.9)
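The same idea in a deterministic form, as a small sketch that uses quote() rather than substitute() so the results do not depend on random numbers: a call can be inspected, edited element by element, and only then evaluated.

```r
cl <- quote(sum(1, 2, 4))   # an unevaluated call
class(cl)
# [1] "call"
eval(cl)
# [1] 7

cl[[1]] <- as.name("prod")  # swap the function being called
cl[[4]] <- 10               # replace the last argument
cl
# prod(1, 2, 10)
eval(cl)
# [1] 20
```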
Or take a function, like rnorm:
rnorm
function (n, mean = 0, sd = 1)
.Call(C_rnorm, n, mean, sd)
<environment: namespace:stats>
You can change its default arguments just like a simple object, like a list, too:
formals(rnorm)[2] <- 100
rnorm
function (n, mean = 100, sd = 1)
.Call(C_rnorm, n, mean, sd)
<environment: namespace:stats>
Taking one more time from Chambers (2008):
The key concept is that expressions for evaluation are themselves
objects; in the traditional motto of the S language, everything is an
object. Evaluation consists of taking the object representing an
expression and returning the object that is the value of that
expression.
So going back to our call example, the call is an object which represents another object. When evaluated, it becomes that other object, which in this case is the numeric vector with one number: -0.008138572.
set.seed(1)
eval(call)
[1] -0.008138572
And that would take us to the second slogan, which you did not mention, but usually comes together with the first one: "Everything that happens is a function call".
Taking again from Chambers (2008), he actually qualifies this statement a little bit:
Nearly everything that happens in R results from a function call.
Therefore, basic programming centers on creating and refining
functions.
So what that means is that almost every transformation of data that happens in R is a function call. Even a simple thing, like a parenthesis, is a function in R.
Taking the parenthesis as an example, you can actually redefine it to do things like this:
`(` <- function(x) x + 1
(1)
[1] 2
This is not a good idea, but it illustrates the point. So this is how I would sum it up: everything that exists in R is an object because it is data that can be manipulated. And (almost) everything that happens is a function call, which is an evaluation of such an object that gives you another object.
I love that quote.
In another (as of now unpublished) write-up, the author continues with
R has a uniform internal structure for representing all objects. The evaluation process keys off that structure, in a simple form that is essentially
composed of function calls, with objects as arguments and an object as the
value. Understanding the central role of objects and functions in R makes
use of the software more effective for any challenging application, even those where extending R is not the goal.
but then spends several hundred pages expanding on it. It will be a great read once finished.
Objects. For x to be an object means that it has a class; thus class(x) returns a class for every object. Even functions have a class, as do environments and other objects one might not expect:
class(sin)
## [1] "function"
class(.GlobalEnv)
## [1] "environment"
I would not pay too much attention to is.object. is.object(x) has a slightly different meaning than what we are using here -- it returns TRUE if x has a class name internally stored along with its value. If the class is stored then class(x) returns the stored value and if not then class(x) will compute it from the type. From a conceptual perspective it matters not how the class is stored internally (stored or computed) -- what matters is that in both cases x is still an object and still has a class.
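A short sketch of the stored-versus-computed distinction: oldClass() reports only a stored class attribute, while class() always returns something.

```r
x <- 1L
is.object(x)   # no class attribute is stored
# [1] FALSE
oldClass(x)    # the stored class attribute, if any
# NULL
class(x)       # computed from the type when nothing is stored
# [1] "integer"

f <- factor("a")
is.object(f)   # factors carry a stored class attribute
# [1] TRUE
oldClass(f)
# [1] "factor"
```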
Functions. That all computation occurs through functions refers to the fact that even things you might not expect to be functions actually are. For example, when we write:
{ 1; 2 }
## [1] 2
if (pi > 0) 2 else 3
## [1] 2
1+2
## [1] 3
we are actually making invocations of the {, if and + functions:
`{`(1, 2)
## [1] 2
`if`(pi > 0, 2, 3)
## [1] 2
`+`(1, 2)
## [1] 3
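One practical consequence, sketched below: since operators such as [ and + are ordinary functions, they can be handed to other functions like any other value.

```r
x <- c(10, 20, 30)
`[`(x, 2)                        # same as x[2]
# [1] 20

# Because operators are functions, they can be passed to other functions
picked <- sapply(list(1:3, 4:6), `[`, 2)
picked
# [1] 2 5
total <- Reduce(`+`, 1:4)        # ((1 + 2) + 3) + 4
total
# [1] 10
```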

What are the advantages of using with() vs. calling vectors?

I am curious if there are any advantages of using with() rather than calling the vector name (aside from using fewer key strokes)?
For example, is with(d,x1) always equivalent to d$x1?
where d is
structure(list(x1 = c(-1.96300839219158, -1.7799470435444, -0.247433477421076,
-0.333402872895705, -1.37145403620246, -0.23484024054114, -0.808080155419075,
-0.359895157796401, 0.54316873679816, -0.687429214935226), x2 = c(-0.619089899920824,
-0.0716448494478719, -0.136643798928645, 2.58777656543295, 0.758900617148999,
0.687980864291582, 0.442931351818574, -0.734342463692198, 2.55862689249189,
1.30677108261702)), .Names = c("x1", "x2"), row.names = c(NA,
-10L), class = "data.frame")
If you're just referencing an item in a list, e.g. a column in a data frame, then d$x1 and with(d, x1) will both return x1 from d. However, using the latter on its own is rather unusual, and it's not really the intended purpose of with(); extracting a value from a list is what $ is for.
The advantage of using with() is that it evaluates expressions in the context of a single environment, without worrying about global variables or attached data frames making references to variables ambiguous.
The $ syntax does not support expressions, so to perform a calculation involving multiple variables in a data frame, you would need to use d$x1, d$x2, etc., which is inconvenient. But for simply extracting an item from a list, $ is preferred.
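A minimal sketch of that convenience, using a small illustrative data frame rather than the one from the question:

```r
d <- data.frame(x1 = c(1, 2, 3), x2 = c(4, 5, 6))   # illustrative data

# One expression, with column names resolved inside d
res_with <- with(d, x1 + 2 * x2)
res_with
# [1]  9 12 15

# The equivalent with $ repeats the data frame name for every column
res_dollar <- d$x1 + 2 * d$x2
```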
A notable case in which the two methods are not equivalent is as follows. Suppose d is defined as
d <- data.frame(x1=c(1, 2, 3))
Now define y <- "x1". What happens when we try to reference x1 using y?
> d$y
NULL
> with(d, y)
[1] "x1"
> d[, y]
[1] 1 2 3
d$y returns NULL since there is no column y in d, so there's nothing to extract.
And since there's no column y in d, with(d, y) looks for y in the enclosing environment, i.e. the frame from which with() was called, which in this case is the global environment. So this evaluates y in the global environment and thus returns "x1". Even though there's nothing to extract, there is something to evaluate because y does exist, just not in d.
Now d[, y] gets us what we want. This first evaluates y, which turns this into d[, "x1"], which is the correct syntax for extracting x1 from d using another variable.
Some finer detail courtesy of David Arenburg:
Note that with() is actually a generic function that performs method dispatch, whereas $ is a primitive. An inspection of base:::with.default is illuminating:
function(data, expr, ...)
eval(substitute(expr), data, enclos = parent.frame())
This serves to confirm that with() is for evaluation.
Since $ is a primitive, it calls .Primitive("$"), which means that it calls an entry point in compiled internal code. Doing a bit of hunting shows that $ goes to an entry point called do_subset3 in subset.c. The comment immediately preceding that piece of C code is equally illuminating:
/* The $ subset operator.
We need to be sure to only evaluate the first argument.
The second will be a symbol that needs to be matched, not evaluated.
*/
This serves to confirm that $ is for extraction, not evaluation.
So in short, as David put it so well in a comment, with() and $ have different purposes which in certain circumstances can overlap.
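To make the evaluation-versus-extraction point concrete, here is a sketch with a hypothetical stand-in, my_with, written the same way as the with.default body shown above:

```r
# Hypothetical stand-in for with(), built the same way as with.default
my_with <- function(data, expr) {
  eval(substitute(expr), data, enclos = parent.frame())
}

d <- data.frame(x1 = c(1, 2, 3))
y <- "x1"

my_with(d, x1 * 2)   # x1 is found inside d
# [1] 2 4 6
my_with(d, y)        # y is not in d, so the caller's y is used
# [1] "x1"
d[[y]]               # y is evaluated first, then used to extract
# [1] 1 2 3
```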

Convert character vector to numeric vector in R for value assignment?

I have:
z = data.frame(x1=a, x2=b, x3=c, etc)
I am trying to do:
for (i in 1:10)
{
paste(c('N'),i,sep="") -> paste(c('z$x'),i,sep="")
}
Problems:
paste(c('z$x'),i,sep="") yields the strings "z$x1", "z$x2" instead of the actual values. I need the expression to be evaluated. I tried as.numeric and eval; neither seemed to work.
paste(c('N'),i,sep="") yields "N1", "N2". I need the expression to be used merely as a name. If I try to assign it a value, such as paste(c('N'),5,sep="") -> 5, i.e. "N5" -> 5 instead of N5 -> 5, I get: target of assignment expands to non-language object.
This task is pretty trivial since I can simply do:
N1 = x1...
N2 = x2...
etc, but I want to learn something new
I'd suggest using something like for( i in 1:10 ) z[,i] <- N[,i]...
BUT, since you said you want to learn something new, you can play around with parse and substitute.
NOTE: these little tools are funny, but experienced users (not me) avoid them.
This is called "computing on the language". It's very interesting, and it helps understanding the way R works. Let me try to give an intro:
The basic language construct is a constant, like a numeric or character vector. It is trivial because it is not different from its "unevaluated" version, but it is one of the building blocks for more complicated expressions.
The (officially) basic language object is the symbol, also known as a name. It's nothing but a pointer to another object, i.e., a token that identifies another object which may or may not exist. For instance, if you run x <- 10, then x is a symbol that refers to the value 10. In other words, evaluating the symbol x yields the numeric vector 10. Evaluating a non-existent symbol yields an error.
A symbol looks like a character string, but it is not. You can turn a string into a symbol with as.symbol("x").
The next language object is the call. This is a recursive object, implemented as a list whose elements are constants, symbols, or other calls. The first element must not be a constant, because it must evaluate to the real function that will be called. The other elements are the arguments to this function.
If the first element does not evaluate to an existing function, R will throw either Error: attempt to apply non-function or Error: could not find function "x" (if the first element is a symbol that is undefined or points to something other than a function).
Example: the code line f(x, y+z, 2) will be parsed as a list of 4 elements, the first being f (as a symbol), the second being x (another symbol), the third another call, and the fourth a numeric constant. The third element y+z, is just a function with two arguments, so it parses as a list of three names: '+', y and z.
Finally, there is also the expression object, that is a list of calls/symbols/constants, that are meant to be evaluated one by one.
You'll find lots of information here:
https://github.com/hadley/devtools/wiki/Computing-on-the-language
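The structure described for f(x, y+z, 2) can be inspected directly. In this sketch nothing is evaluated, so f, x, y and z do not need to exist.

```r
e <- quote(f(x, y + z, 2))   # captured, not evaluated

length(e)
# [1] 4
parts <- sapply(as.list(e), class)
parts
# [1] "name"    "name"    "call"    "numeric"
as.list(e[[3]])              # the nested call y + z: `+`, y and z
identical(as.symbol("x"), e[[2]])   # a string turned into a symbol
# [1] TRUE
```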
OK, now let's get back to your question :-)
What you have tried does not work because the output of paste is a character string, and the assignment function expects as its first argument something that evaluates to a symbol, to be either created or modified. Alternatively, the first argument can also evaluate to a call associated with a replacement function. These are a little trickier, but they are handled by the assignment function itself, not by the parser.
The error message you see, target of assignment expands to non-language object, is triggered by the assignment function, precisely because your target evaluates to a string.
We can fix that by building up a call that has the symbols you want in the right places. The most "brute force" method is to put everything inside a string and use parse:
parse(text=paste('N',i," -> ",'z$x',i,sep=""))
Another way to get there is to use substitute:
substitute(x -> y, list(x=as.symbol(paste("N",i,sep="")), y=substitute(z$w, list(w=paste("x",i,sep="")))))
The inner substitute creates the calls z$x1, z$x2, etc. The outer substitute puts this call in as the target of the assignment, and the symbols N1, N2, etc. as the values.
parse results in an expression, and substitute in a call. Both can be passed to eval to get the same result.
Just one final note: I repeat that all this is intended as a didactic example, to help understanding the inner workings of the language, but it is far from good programming practice to use parse and substitute, except when there is really no alternative.
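To tie this together, here is a runnable sketch under assumed inputs: N1, N2 and the two-column z are made up for illustration. It first evaluates parsed text, then gets the same result with plain string indexing, which is the kind of alternative that usually makes parse unnecessary.

```r
N1 <- c(1, 2); N2 <- c(3, 4)                 # illustrative inputs
z <- data.frame(x1 = c(0, 0), x2 = c(0, 0))

# "Computing on the language": build the code as text, then evaluate it
for (i in 1:2) {
  eval(parse(text = paste0("z$x", i, " <- N", i)))
}
z$x1
# [1] 1 2

# The same effect without parse(): look names up as strings
z2 <- data.frame(x1 = c(0, 0), x2 = c(0, 0))
for (i in 1:2) {
  z2[[paste0("x", i)]] <- get(paste0("N", i))
}
z2$x2
# [1] 3 4
```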
A data.frame is a named list. It is usually good practice, and idiomatic R, not to have lots of objects in the global environment, but to keep related (or similar) objects in lists and to use lapply etc.
You could use list2env to assign the named elements of your list (the columns in your data.frame) to the global environment in one step:
DD <- data.frame(x = 1:3, y = letters[1:3], z = 3:1)
list2env(DD, envir = parent.frame())
## <environment: R_GlobalEnv>
## ta da, x, y and z now exist within the global environment
x
## [1] 1 2 3
y
## [1] a b c
## Levels: a b c
z
## [1] 3 2 1
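If cluttering the global environment is a concern, the same call can target a fresh environment instead. This is a sketch; the name cols is hypothetical.

```r
DD <- data.frame(x = 1:3, z = 3:1)
cols <- list2env(DD, envir = new.env())   # 'cols' is a hypothetical name

ls(cols)
# [1] "x" "z"
get("z", envir = cols)
# [1] 3 2 1
```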
I am not exactly sure what you are trying to accomplish. But here is a guess:
### Create a data.frame using the alphabet
data <- data.frame(x = 'a', y = 'b', z = 'c')
### Create a numerical index corresponding to the letter position in the alphabet
index <- match(unlist(data[1, ]), letters)
### Use an 'lapply' to apply a function to every element in 'index'; creates a list
val <- lapply(index, function(x) {
paste('N', x, sep = '')
})
### Assign names to our list
names(val) <- names(data)
### Observe the result
val$x
