lapply-ing with the "$" function - r

I was going through some examples in hadley's guide to functionals, and came across an unexpected problem.
Suppose I have a list of model objects,
x=1:3;y=3:1; bah <- list(lm(x~y),lm(y~x))
and want to extract something from each (as suggested in hadley's question about a list called "trials"). I was expecting one of these to work:
lapply(bah,`$`,i='call') # or...
lapply(bah,`$`,call)
However, these return nulls. It seems like I'm not misusing the $ function, as these things work:
`$`(bah[[1]],i='call')
`$`(bah[[1]],call)
Anyway, I'm just doing this as an exercise and am curious where my mistake is. I know I could use an anonymous function, but think there must be a way to use syntax similar to my initial non-solution. I've looked through the places $ is mentioned in ?Extract, but didn't see any obvious explanation.
I just realized that this works:
lapply(bah,`[[`,i='call')
and this
lapply(bah,function(x)`$`(x,call))
Maybe this just comes down to some lapply voodoo that demands anonymous functions where none should be needed? I feel like I've heard that somewhere on SO before.

This is documented in ?lapply, in the "Note" section (emphasis mine):
For historical reasons, the calls created by lapply are unevaluated,
and code has been written (e.g. bquote) that relies on this. This
means that the recorded call is always of the form FUN(X[[0L]],
...), with 0L replaced by the current integer index. This is not
normally a problem, but it can be if FUN uses sys.call or
match.call or if it is a primitive function that makes use of the
call. This means that it is often safer to call primitive functions
with a wrapper, so that e.g. lapply(ll, function(x) is.numeric(x))
is required in R 2.7.1 to ensure that method dispatch for is.numeric
occurs correctly.

Related

R statistics programming : using magrittr piping to pass 2 parameters to function

I am using magrittr, and was able to pass one variable to an R function via pipes from magrittr, and also pick which parameter to place where in the situation of multivariable function : F(x,y,z,...)
But i want to pass 2 parameters at the same time.
For example, i will using Select function from dplyr and pass in tableName and ColumnName:
I thought i could do it like this:
tableName %>% ColumnName %>% select(.,.)
But this did not work.
Hope someone can help me on this.
EDIT :
Some below are saying that this is a duplicate of a link provided by others.
But based on the algebra structure of the magrittr definition of Pipe for multivariable functions, it should be "doable" just based on the algebra definition of the pipe function.
The link provided by others, goes beyond the base definition and employs other external functions and or libraries to try to achieve passing multiple parameter to the function.
I am looking for a solution, IF POSSIBLE, just using the magrittr library and other base operations.
So this is the restriction that is placed on this problem.
In most of my university courses in math and computer science we were restricted to use only those things taught in the course. So when I said I am using dplyr and magrittr, that should imply that those are the only things one is permitted to use, so its under this constraint.
Hope this clarifies the scope of possible solutions here.
And if it's not possible to do this via just these libraries I want someone to tell me that it cannot be done.
I think you need a little more detail about exactly what you want, but as I understand the problem, I think one solution might be:
list(x = tableName, y = "ColumnName") %>% {select(eval(.$x),.$y) }
This is just a modification of the code linked in the chat. The issue with other implementations is that the first and second inputs to select() must be of specific (and different) types. So just plugging in two strings or two objects won't work.
In the same spirit, you can also use either:
list(x = "tableName", y = "ColumnName") %>% { select(get(.$x),.$y) }
or
list(tableName, "ColumnName") %>% do.call("select", .).
Note, however, that all of these functions (i.e., get(), eval(), and do.call()) have an environment specification in them and could result in errors if improperly specified. They work just fine in these examples because everything is happening in the global environment, but that might change if they were, e.g., called in a function.

Is attributes() a function in R?

Help files call attributes() a function. Its syntax looks like a function call. Even class(attributes) calls it a function.
But I see I can assign something to attributes(myobject), which seems unusual. For example, I cannot assign anything to log(myobject).
So what is the proper name for "functions" like attributes()? Are there any other examples of it? How do you tell them apart from regular functions? (Other than trying supposedfunction(x)<-0, that is.)
Finally, I guess attributes() implementation overrides the assignment operator, in order to become a destination for assignments. Am I right? Is there any usable guide on how to do it?
Very good observation Indeed. It's an example of replacement function, if you see closely and type apropos('attributes') in your R console, It will return
"attributes"
"attributes<-"
along with other outputs.
So, basically the place where you are able to assign on the left sign of assignment operator, you are not calling attributes, you are actually calling attributes<- , There are many functions in R like that for example: names(), colnames(), length() etc. In your example log doesn't have any replacement counterpart hence it doesn't work the way you anticipated.
Definiton(from advanced R book link given below):
Replacement functions act like they modify their arguments in place,
and have the special name xxx<-. They typically have two arguments (x
and value), although they can have more, and they must return the
modified object
If you want to see the list of these functions you can do :
apropos('<-$') and you can check out similar functions, which has similar kind of properties.
You can read about it here and here
I am hopeful that this solves your problem.

Why is using assign bad?

This post (Lazy evaluation in R – is assign affected?) covers some common ground but I am not sure it answers my question.
I stopped using assign when I discovered the apply family quite a while back, albeit, purely for reasons of elegance in situations such as this:
names.foo <- letters
values.foo <- LETTERS
for (i in 1:length(names.foo))
assign(names.foo[i], paste("This is: ", values.foo[i]))
which can be replaced by:
foo <- lapply(X=values.foo, FUN=function (k) paste("This is :", k))
names(foo) <- names.foo
This is also the reason this (http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-can-I-turn-a-string-into-a-variable_003f) R-faq says this should be avoided.
Now, I know that assign is generally frowned upon. But are there other reasons I don't know? I suspect it may mess with the scoping or lazy evaluation but I am not sure? Example code that demonstrates such problems will be great.
Actually those two operations are quite different. The first gives you 26 different objects while the second gives you only one. The second object will be a lot easier to use in analyses. So I guess I would say you have already demonstrated the major downside of assign, namely the necessity of then needing always to use get for corralling or gathering up all the similarly named individual objects that are now "loose" in the global environment. Try imagining how you would serially do anything with those 26 separate objects. A simple lapply(foo, func) will suffice for the second strategy.
That FAQ citation really only says that using assignment and then assigning names is easier, but did not imply it was "bad". I happen to read it as "less functional" since you are not actually returning a value that gets assigned. The effect looks to be a side-effect (and in this case the assign strategy results in 26 separate side-effects). The use of assign seems to be adopted by people that are coming from languages that have global variables as a way of avoiding picking up the "True R Way", i.e. functional programming with data-objects. They really should be learning to use lists rather than littering their workspace with individually-named items.
There is another assignment paradigm that can be used:
foo <- setNames( paste0(letters,1:26), LETTERS)
That creates a named atomic vector rather than a named list, but the access to values in the vector is still done with names given to [.
As the source of fortune(236) I thought I would add a couple examples (also see fortune(174)).
First, a quiz. Consider the following code:
x <- 1
y <- some.function.that.uses.assign(rnorm(100))
After running the above 2 lines of code, what is the value of x?
The assign function is used to commit "Action at a distance" (see http://en.wikipedia.org/wiki/Action_at_a_distance_(computer_programming) or google for it). This is often the source of hard to find bugs.
I think the biggest problem with assign is that it tends to lead people down paths of thinking that take them away from better options. A simple example is the 2 sets of code in the question. The lapply solution is more elegant and should be promoted, but the mere fact that people learn about the assign function leads people to the loop option. Then they decide that they need to do the same operation on each object created in the loop (which would be just another simple lapply or sapply if the elegant solution were used) and resort to an even more complicated loop involving both get and apply along with ugly calls to paste. Then those enamored with assign try to do something like:
curname <- paste('myvector[', i, ']')
assign(curname, i)
And that does not do quite what they expected which leads to either complaining about R (which is as fair as complaining that my next door neighbor's house is too far away because I chose to walk the long way around the block) or even worse, delve into using eval and parse to get their constructed string to "work" (which then leads to fortune(106) and fortune(181)).
I'd like to point out that assign is meant to be used with environments.
From that point of view, the "bad" thing in the example above is using a not quite appropriate data structure (the base environment instead of a list or data.frame, vector, ...).
Side note: also for environments, the $ and $<- operators work, so in many cases the explicit assign and get isn't necessary there, neither.

The Art of R Programming : Where else could I find the information?

I came across the editorial review of the book The Art of R Programming, and found this
The Art of R Programming takes you on a guided tour of software development with R, from basic types and data structures to advanced topics like closures, recursion, and anonymous functions
I immediately became fascinated by the idea of anonymous functions, something I had come across in Python in the form of lambda functions but could not make the connection in the R language.
I searched in the R manual and found this
Generally functions are assigned to symbols but they don't need to be. The value returned by the call to function is a function. If this is not given a name it is referred to as an anonymous function. Anonymous functions are most frequently used as arguments other functions such as the apply family or outer.
These things for a not-very-long-time programmer like me are "quirky" in a very interesting sort of way.
Where can I find more of these for R (without having to buy a book) ?
Thank you for sharing your suggestions
Functions don't have names in R. Whether you happen to put a function into a variable or not is not a property of the function itself so there does not exist two sorts of functions: anonymous and named. The best we can do is to agree to call a function which has never been assigned to a variable anonymous.
A function f can be regarded as a triple consisting of its formal arguments, its body and its environment accessible individually via formals(f), body(f) and environment(f). The name is not any part of that triple. See the function objects part of the language definition manual.
Note that if we want a function to call itself then we can use Recall to avoid knowing whether or not the function was assigned to a variable. The alternative is that the function body must know that the function has been assigned to a particular variable and what the name of that variable is. That is, if the function is assigned to variable f, say, then the body can refer to f in order to call itself. Recall is limited to self-calling functions. If we have two functions which mutually call each other then a counterpart to Recall does not exist -- each function must name the other which means that each function must have been assigned to a variable and each function body must know the variable name that the other function was assigned to.
There's not a lot to say about anonymous functions in R. Unlike Python, where lambda functions require special syntax, in R an anonymous function is simply a function without a name.
For example:
function(x,y) { x+y }
whereas a normal, named, function would be
add <- function(x,y) { x+y }
Functions are first-class objects, so you can pass them (regardless of whether they're anonymous) as arguments to other functions. Examples of functions that take other functions as arguments include apply, lapply and sapply.
Get Patrick Burns' "The R Inferno" at his site
There are several good web sites with basic introductions to R usage.
I also like Zoonekynd's manual
Great answers about style so far. Here's an answer about a typical use of anonymous functions in R:
# Make some data up
my.list <- list()
for( i in seq(100) ) {
my.list[[i]] <- lm( runif(10) ~ runif(10) )
}
# Do something with the data
sapply( my.list, function(x) x$qr$rank )
We could have named the function, but for simple data extractions and so forth it's really handy not to have to.

Scoping and functions in R 2.11.1 : What's going wrong?

This question comes from a range of other questions that all deal with essentially the same problem. For some strange reason, using a function within another function sometimes fails in the sense that variables defined within the local environment of the first function are not found back in the second function.
The classical pattern in pseudo-code :
ff <- function(x){
y <- some_value
some_function(y)
}
ff(x)
Error in eval(expr, envir, enclos) :
object 'y' not found
First I thought it had something to do with S4 methods and the scoping in there, but it also happens with other functions. I've had some interaction with the R development team, but all they did was direct me to the bug report site (which is not the most inviting one, I have to say). I never got any feedback.
As the problem keeps arising, I wonder if there is a logic explanation for it. Is it a common mistake made in all these cases, and if so, which one? Or is it really a bug?
Some of those questions :
Using functions and environments
R (statistical) scoping error using transformBy(), part of the doBy package.
How to use acast (reshape2) within a function in R?
Why can't I pass a dataset to a function?
Values not being copied to the next local environment
PS : I know the R-devel list, in case you wondered...
R has both lexical and dynamic scope. Lexical scope works automatically, but dynamic scope must be implemented manually, and requires careful book-keeping. Only functions used interactively for data analysis need dynamic scope, so most authors (like me!) don't learn how to do it correctly.
See also: the standard non-standard evaluation rules.
There are undoubtedly bugs in R, but a lot of the issues that people have been having are quite often errors in the implementation of some_function, not R itself. R has scoping rules ( see http://cran.r-project.org/doc/manuals/R-intro.html#Scope) which when combined with lazy evaluation of function arguments and the ability to eval arguments in other scopes are extremely powerful but which also often lead to subtle errors.
As Dirk mentioned in his answer, there isn't actually a problem with the code that you posted. In the links you posted in the question, there seems to be a common theme: some_function contains code that messes about with environments in some way. This messing is either explicit, using new.env and with or implicitly, using a data argument, that probably has a line like
y <- eval(substitute(y), data)
The moral of the story is twofold. Firstly, try to avoid explicitly manipulating environments, unless you are really sure that you know what you are doing. And secondly, if a function has a data argument then put all the variables that you need the function to use inside that data frame.
Well there is no problem in what you posted:
/tmp$ cat joris.r
#!/usr/bin/r -t
some_function <- function(y) y^2
ff <- function(x){
y <- 4
some_function(y) # so we expect 16
}
print(ff(3)) # 3 is ignored
$ ./joris.r
[1] 16
/tmp$
Could you restate and postan actual bug or misfeature?

Resources