Passing a variable name to a function in R - r

I've noticed that quite a few packages allow you to pass symbol names that may not even be valid in the context where the function is called. I'm wondering how this works and how I can use it in my own code?
Here is an example with ggplot2:
a <- data.frame(x=1:10,y=1:10)
library(ggplot2)
qplot(data=a,x=x,y=y)
x and y don't exist in my namespace, but ggplot understands that they are part of the data frame and postpones their evaluation to a context in which they are valid. I've tried doing the same thing:
b <- function(data,name) { within(data,print(name)) }
b(a,x)
However, this fails miserably:
Error in print(name) : object 'x' not found
What am I doing wrong? How does this work?
Note: this is not a duplicate of Pass variable name to a function in r

I've recently discovered what I think is a better approach to passing variable names.
a <- data.frame(x = 1:10, y = 1:10)
b <- function(df, name){
eval(substitute(name), df)
}
b(a, x)
[1] 1 2 3 4 5 6 7 8 9 10
Update: This approach uses non-standard evaluation. I began explaining it but quickly realized that Hadley Wickham does it much better than I could. Read this: http://adv-r.had.co.nz/Computing-on-the-language.html
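To make the two steps in that answer visible, here is a minimal sketch that separates capturing the argument from evaluating it. The helper name show_steps is made up for illustration; substitute() and eval() are base R.

```r
a <- data.frame(x = 1:10, y = 1:10)

# Hypothetical helper, for illustration only
show_steps <- function(df, name) {
  expr <- substitute(name)          # capture the unevaluated argument
  list(class = class(expr),         # "name" for a bare symbol, "call" for x + y
       value = eval(expr, df))      # look the symbol up among the columns of df
}

res <- show_steps(a, x)
res$class
res$value

# Because the argument is captured as an expression, this also works:
show_steps(a, x + y)$value
```

The second argument of eval() can be a data frame, in which case its columns are searched before the calling environment.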

You can do this using match.call for example:
b <- function(data,name) {
## match.call returns a call containing the function name
## and the arguments as they were specified.
## Drop the first element ([-1]), which is the function name,
## and convert the rest to a list.
pars <- as.list(match.call()[-1])
data[,as.character(pars$name)]
}
b(mtcars,cyl)
[1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
Explanation:
match.call returns a call in which all of the specified arguments are
specified by their full names.
So here, once the function name is dropped, the result contains 2 symbols:
b <- function(data,name) {
str(as.list(match.call()[-1])) ## I am using str to get the type and name
}
b(mtcars,cyl)
List of 2
$ data: symbol mtcars
$ name: symbol cyl
I then use the first symbol, mtcars, as is and convert the second to a string, which gives:
mtcars[,"cyl"]
or, equivalently:
eval(pars$data)[,as.character(pars$name)]
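As a hedged variation on the same idea, the element captured by match.call can be evaluated inside the data frame rather than converted to a string, which also allows expressions such as cyl * 2. The function name b2 is hypothetical and not part of the answer above.

```r
# Hypothetical variant: evaluate the captured argument within `data`
b2 <- function(data, name) {
  pars <- as.list(match.call()[-1])        # drop the function name
  eval(pars$name, data, parent.frame())    # columns first, then the caller's variables
}

b2(mtcars, cyl)
b2(mtcars, cyl * 2)
```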

This is a very old thread, but you can also use get if you pass the name as a string. It seems to work better for me.
a <- data.frame(x = 1:10, y = 11:20)
b <- function(df, name){
get(name, df)
}
b(a, "x")
[1] 1 2 3 4 5 6 7 8 9 10
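When the column name is already a string, indexing with [[ is another common option. The sketch below is an assumption about what a defensive version could look like; col_by_name is a made-up name.

```r
a <- data.frame(x = 1:10, y = 11:20)

# Hypothetical helper: fetch one column by its name given as a string
col_by_name <- function(df, name) {
  stopifnot(is.character(name), length(name) == 1, name %in% names(df))
  df[[name]]
}

col_by_name(a, "y")
```

The explicit check makes a misspelled column name fail with an error instead of silently returning NULL.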

If you put the variable name between quotes when you call the function, it works:
> b <- function(data,name) { within(data,print(name)) }
> b(a, "x")
[1] "x"
x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10

Related

Using Strings to Identify Sequence of Column Names in R

I am currently trying to use pre-defined strings in order to identify multiple column names in R.
To be more explicit, I am using the ave function to create identification variables for subgroups of a dataframe. The twist is that I want the identification variables to be flexible, in such a manner that I would just pass it as a generic string.
A sample code would be:
ids = with(df,ave(rep(1,nrow(df)),subcolumn1,subcolumn2,subcolumn3,FUN=seq_along))
I would like to run this code in the following fashion (code below does not work as expected):
subColumnsString = c("subcolumn1","subcolumn2","subcolumn3")
ids = with(df,ave(rep(1,nrow(df)),subColumnsString ,FUN=seq_along))
I tried something with eval, but still did not work:
subColumnsString = c("subcolumn1","subcolumn2","subcolumn3")
ids = with(df,ave(rep(1,nrow(df)),eval(parse(text=subColumnsString)),FUN=seq_along))
Any ideas?
Thanks.
EDIT: Working code example of what I want:
df = mtcars
id_names = c("vs","am")
idDF_correct = transform(df,idItem = as.numeric(interaction(vs,am)))
idDF_wrong = cbind(df,ave(rep(1,nrow(df)),df[id_names],FUN=seq_along))
Note how in idDF_correct, the unique combinations are correctly mapped into unique values of idItem. In idDF_wrong this is not the case.
I think this achieves what you requested. Here I use the mtcars dataset that ships with R:
subColumnsString <- c("cyl","gear")
ids = with(mtcars, ave(rep(1,nrow(mtcars)), mtcars[subColumnsString], FUN=seq_along))
Just index your data.frame using the sub-column names. This returns a list, which works naturally with ave.
EDIT
ids = ave(rep(1,nrow(mtcars)), mtcars[subColumnsString], FUN=seq_along)
You can omit the with and just call plain ol' ave, as G. Grothendieck stated. You should also use their answer, as it is much more general.
This defines a function whose arguments are:
data, the input data frame
by, a character vector of column names in data
fun, a function to use in ave
Code--
Ave <- function(data, by, fun = seq_along) {
do.call(function(...) ave(rep(1, nrow(data)), ..., FUN = fun), data[by])
}
# test
Ave(CO2, c("Plant", "Treatment"), seq_along)
giving:
[1] 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3
[39] 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6
[77] 7 1 2 3 4 5 6 7
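The EDIT in the question asks for one identifier per unique combination of columns rather than a running count within each group. A minimal sketch of that, assuming the column names arrive as a character vector, relies on interaction() accepting a list of grouping variables:

```r
df <- mtcars
id_names <- c("vs", "am")

# A data frame subset is a list of columns, which interaction() accepts directly
idItem <- as.numeric(interaction(df[id_names]))
idDF <- cbind(df, idItem = idItem)

# One id per observed combination of vs and am
table(idDF$vs, idDF$am, idDF$idItem)
```

This is intended to reproduce the question's idDF_correct without typing the column names as bare symbols.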

Convert a full length column to one variable in a row in R

I was wondering if it is possible to convert 1 column into 1 variable with the values next to each other
i.e.:
d <- data.frame(y = 1:10)
> d
y
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
Convert this column into:
> d
1 2 3 4 5 6 7 8 9 10
We don't know how you are going to use the numbers, but I think it is unnecessary to make any transformation. You can use d$y to get the numbers and apply them to any color map. For example:
d <- data.frame(y = 1:7)
library(RColorBrewer)
mypalette<-brewer.pal(4,"Greens")
mycol <-palette()#rainbow(7)
heatmap(matrix(1:28,ncol=4),col=mypalette[d$y[1:4]],xlab="Greens (sequential)",
ylab="",xaxt="n",yaxt="n",bty="n",RowSideColors=mycol[d$y])
I'm not sure what the purpose is of:
1 variable next to each other
But there are few ways to get the desired result (again, depends on the objective). You can do either:
d$y
unname(unlist(d)) #suggested by agstudy
or, better yet, to convert your dataframe's column into a vector, do this:
v <- as.vector(d[,1])
as a single string:
args <- paste(d$y, collapse=" ")
args<-noquote(args)
now you'll have
[1] 1 2 3 4 5 6 7 8 9 10

Define vector length globally in R

I'm trying to define globally the index range of a data vector, which is later used inside function() in numerous places, so that I don't have to retype it in several places.
x[1:10]
How can I set 1:10 globally so that I can reuse it in the above example as follows:
global <- 1:10
x[global]
I have tried with paste() but cannot get it into a simple numeric 1:10. Please note that I don't want x <- 1:10 to look like [1] 1 2 3 4 5 ....
If objections to this question come up (bad coding manners, question type, etc.), I will erase this post as soon as possible.
EDIT: I thought about it as: cat(paste("1:10",sep=""),collapse="")
I don't understand what you are trying to do. Maybe this?
global <- substitute(x <- 1:10)
global
#x <- 1:10
eval(global)
x
#[1] 1 2 3 4 5 6 7 8 9 10
Or this?
global1 <- substitute(1:10)
global1
#1:10
eval(global1)
#[1] 1 2 3 4 5 6 7 8 9 10
You said you need to pass a global variable to a function, so perhaps something like this?
> x <- quote(1:10)
> x
## 1:10
> f <- function() eval(get('x'))
> f()
## [1] 1 2 3 4 5 6 7 8 9 10
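If the goal is only to avoid retyping 1:10, a plain integer vector defined once may already be enough, without quote or eval. The sketch below assumes that reading of the question; the names idx and head_sum are made up.

```r
idx <- 1:10                        # defined once, reused everywhere
x <- seq(100, 2000, by = 100)      # example data with 20 elements

first_part <- x[idx]               # same as x[1:10]

# Hypothetical function that uses the shared index as a default argument
head_sum <- function(v, i = idx) sum(v[i])
head_sum(x)
```

Changing idx in one place then changes every use of it.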

Conditionally creating a new column

I am fairly certain this is a really obvious question, but I can't figure it out.
Let's say I have the following dataset:
test <- data.frame(A = c(1:10),
B = c(1:10), C = c(1:10),
P = c(1:10))
I want to test whether there is a column called "P" and, if so, create a new column called "Z" and fill it with content calculated from P.
I wrote the following code (just to try and get it to conditionally create the column, I've not tried to get it to do anything with that yet!):
Clean <- function(data) {
if("P" %in% colnames(data)) {
data$Z <- NA
}
else {
cat("doobedooo")
}
}
Clean(test)
But it doesn't seem to do anything, and I don't understand why, when simply running test$Z <- NA on the dataset does work.
I put the "doobedooo" in there, to see if it is returning a false at the first condition. It doesn't seem to be doing so.
Have I simply misunderstood how if statements work?
You have to return a value from your function, and then assign that value to an object. Unlike many other languages, R doesn't modify objects in-place, at least not without a lot of work.
Clean <- function(data) {
if("P" %in% colnames(data)) {
data$Z <- NA
} else {
cat("doobedooo")
}
return(data)
}
test <- Clean(test)
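A small sketch of why the reassignment matters: the function works on its own copy of the data frame, so the caller's object only changes when the returned value is assigned back. The name add_z is hypothetical.

```r
test <- data.frame(P = 1:3)

# Hypothetical function that adds a column and returns the modified copy
add_z <- function(data) {
  data$Z <- NA
  data
}

modified <- add_z(test)   # the result is stored under another name
before <- names(test)     # the original still has only column P

test <- add_z(test)       # assigning back updates the original
after <- names(test)      # now P and Z
```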
@HongOi's answer is the direct answer to your question. Mine is the R way to deal with your problem. Since you want to create another column as a combination of others, you can use transform (or within), for example:
if('P' %in% colnames(test))
test <- transform(test,Z={## you can put any statement here
x=P+1
x^2
round(x/12,2)
}
)
head(test)
A B C P Z
1 1 1 1 1 0.17
2 2 2 2 2 0.25
3 3 3 3 3 0.33
4 4 4 4 4 0.42
5 5 5 5 5 0.50
6 6 6 6 6 0.58
The previous answer already gives everything you need. However, there is another way to deal with these problems. In R you can use an environment to set and add data by reference instead of return()ing the whole table (even if you change only a piece of it).
env <- new.env()
env$test <- test
system.time({
Clean <- function(data) {
if("P" %in% names(data$test)) {
data$test$Z <- NA
}
else {
cat("doobedooo")
}
}
Clean(env)
})
> env$test
A B C P Z
1 1 1 1 1 NA
2 2 2 2 2 NA
3 3 3 3 3 NA
4 4 4 4 4 NA
5 5 5 5 5 NA
6 6 6 6 6 NA
7 7 7 7 7 NA
8 8 8 8 8 NA
9 9 9 9 9 NA
10 10 10 10 10 NA

ave function in R: First argument a vector

I'm trying to use the following code in R:
ID=seq(1,11)
g=c(1,2,3,1,1,2,3,4,4,1,3)
x <- sample(11)
d <- data.frame(ID,g, x)
Ranking_Categoria<-function(d,var,category)
{
d$rank<-ave(d$var,d$category,FUN=rank)
return(d)
}
and I get the following error message:
Error in split.default(x, g) : first argument must be a vector.
The variables var and category (character) are columns of the data frame d that the user needs to specify in order to get the desired result. I need to refer to these names when I use the function ave(), as you can see.
You need to use [[ to get the var and category columns by name:
Ranking_Categoria<-function(d,var,category)
{
d$rank<-ave(d[[var]],d[[category]],FUN=rank)
return(d)
}
... because d$var tries to get the column called "var", and there is none.
UPDATE
> Ranking_Categoria(d, "x", "g")
ID g x rank
1 1 1 10 3
2 2 2 9 2
3 3 3 4 1
4 4 1 11 4
5 5 1 1 1
6 6 2 8 1
7 7 3 6 2
8 8 4 2 1
9 9 4 5 2
10 10 1 3 2
11 11 3 7 3
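The difference between $ and [[ described in this answer can be seen in a short sketch with a made-up data frame:

```r
d <- data.frame(g = c(1, 2, 1), x = c(10, 20, 30))
var <- "x"

by_dollar <- d$var      # `$` looks for a column literally named "var"
by_bracket <- d[[var]]  # `[[` uses the value stored in var, i.e. "x"

is.null(by_dollar)
by_bracket
```

Passing the NULL from d$var on to ave() is what leads to the "first argument must be a vector" error in the question.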
The best solution would be not to use names at all:
Ranking_Categoria<-function(d,var,category)
{
d$rank<-ave(var,category,FUN=rank)
return(d)
}
Then call it as
Ranking_Categoria(d,d$x,d$g)
The reason why the function in your question didn't work as you thought it would is partially because R's syntax and DWIM-ness for string manipulation sucks. Here's a hacky, fragile solution using eval and parse:
Ranking_Categoria<-function(d,var,category)
{
string=paste('d$rank<-ave(d$',var,',d$',category,',FUN=rank)',sep="")
eval(parse(text=string))
return(d)
}
However, you still have to call it as
Ranking_Categoria(d,"x","g")
And if you already have objects with the names of x and g, then may the gods help you if you try to do Ranking_Categoria(d,x,g)... Crap like this is why I've gone from using Perl and R equally to sticking with Perl (my first and native programming language) and using R only when necessary.
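For completeness, a call with bare column names such as Ranking_Categoria(d, x, g) can be supported without pasting strings, by capturing the arguments with substitute() and evaluating them inside the data frame. This is only a sketch under that assumption; Ranking_NSE is a made-up name and the data are invented.

```r
# Hypothetical non-standard-evaluation variant
Ranking_NSE <- function(d, var, category) {
  v <- eval(substitute(var), d, parent.frame())       # columns of d are searched first
  g <- eval(substitute(category), d, parent.frame())
  d$rank <- ave(v, g, FUN = rank)
  d
}

d <- data.frame(ID = 1:6,
                g = c(1, 2, 1, 2, 1, 2),
                x = c(30, 5, 10, 15, 20, 25))
out <- Ranking_NSE(d, x, g)
out
```

Because the columns of d are looked up before the calling environment, global objects named x or g do not shadow the columns.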
