I would like to expand a data.frame to include a new column & give this column a dynamically assigned name passed within a function. Here is a simplified example:
passMyName <-function(df, newColTitle) {
df2 <-data.frame(df, newColTitle = rep(NA, nrow(df)))
return(df2)
}
randomDF <-data.frame(a=1:3, b=4:6, c=7:9)
passMyName(randomDF, myCustomColTitle)
This will add a new column to the data.frame with the variable name defined by the function, "newColTitle". Instead, I would like to evaluate newColTitle as "myCustomColTitle" and pass this as the named tag for this new column, such that the output data.fram contains a new column called "myCustomColTitle".
Is this possible? Perhaps via substitute(), deparse(), quote(), eval(), paste() etc. You can assume that I've read the following primer on this type of topic (several times actually):
http://adv-r.had.co.nz/Computing-on-the-language.html
But some details have not quite stuck with me.
Thanks
I think this would make much more sense if you passed the new name in as a character value. For example
passMyName <-function(df, newColTitle) {
df2 <- cbind(df, setNames(list(rep(NA, nrow(df))), newColTitle))
return(df2)
}
randomDF <-data.frame(a=1:3, b=4:6, c=7:9)
passMyName(randomDF , "myCustomColTitle")
but if you really wanted to use an unevaluated expression, you can use
passMyName <-function(df, newColTitle) {
newColTitle <- deparse(substitute(newColTitle))
df2 <- cbind(df, setNames(list(rep(NA, nrow(df))), newColTitle))
return(df2)
}
randomDF <-data.frame(a=1:3, b=4:6, c=7:9)
passMyName(randomDF, myCustomColTitle)
Related
I need to create a loop function in which I need to address to subsequent objects which names end with numbers i.e. object1, object 2, object3. So the code should look like this:
object1 <- c(1,2,3,4,5)
object2 <- c(2,3,4,5,6)
object3 <- c(3,4,5,6,7)
for (i in 1:3) {
assign (paste0("new_object",i), mean(object???))
}
So I need a equivalent to just typing
new_object1 <- mean(object1)
new_object2 <- mean(object2)
new_object3 <- mean(object3)
Many thanks in advance!
It would be get to return the values of that object by pasteing the 'i' with the 'object' string
for (i in 1:3) {
assign(paste0("new_object",i), mean(get(paste0('object', i)))
}
But, it is not a recommended way as it is creating new objects in the global env.
Instead, if the intention is to get the mean of all the 'object's,
sapply(mget(paste0("object", 1:3)), mean)
Or if there are more than three, use ls with pattern
sapply(mget(ls(pattern = '^object\\d+$')), mean)
Here, mget returns the value of more than one objects in a list, loop through the list with sapply and apply the mean function on the list element.
Creating objects can also be done from the list with list2env
out <- lapply( mget(ls(pattern = '^object\\d+$')), mean)
names(out) <- paste0('new_', names(out))
list2env(out, .GlobalEnv) # not recommended based on the same reason above
I have a variable named SAL_mean created like this (I want to make a loop once I figure this out):
watersheds <- c('ANE', 'SAL', 'CER')
assign(paste0(watersheds[1], '_mean'), read.csv(paste0(watersheds[1], '_mean.csv')))
now the next step should be something like this (which works):
cols_dont_want <- c('B1', 'B2', 'B3')
assign(paste0(watersheds[1], '_mean'), SAL_mean[, !names(SAL_mean) %in% cols_dont_want])
but I wanted to ask how to replace "SAL_mean" by using watersheds[1], because this line of code doesn't work:
assign(paste0(watersheds[1], '_mean'), paste0(watersheds[1], '_mean')[, !names(paste0(watersheds[1], '_mean')) %in% cols_dont_want])
I think it treats the "paste0(watersheds[2], '_mean')" as string and not as a name of variable but I haven't been able to find a solution (I tried for example "as.name" function but it gave me an error "object of type 'symbol' is not subsettable")
Keep dataframes in a list using ?lapply, then it gets easier to carry out same transformations on multiple dataframes in a list, something like:
# set vars
watersheds <- c('ANE', 'SAL', 'CER')
cols_dont_want <- c('B1', 'B2', 'B3')
# result, all dataframes in one list
myList <- lapply(watersheds, function(i){
# read the file
x <- read.csv(paste0(i, "_mean.csv"))
# exclude columns and return
x[, !colnames(x) %in% cols_dont_want]
} )
replace
paste0(watersheds[2], '_mean')
with
eval(parse(text = paste0(watersheds[2], '_mean')))
and it should work. Your guess is correct, paste0 just gives you a string but you need to call the variable which is done using eval()
Or you can do it in a for loop (some find the syntax more understandable). It's equivalent to zx8754's solution, except it assigns names to each dataframe as per the OP. It's trivial to modify zx8754's solution do do the same.
watersheds <- c('ANE', 'SAL', 'CER')
cols_dont_want <- c('B1', 'B2', 'B3')
ws.list <- list()
for (i in 1:length(watersheds)) {
ws.list[[i]] <- read.csv(paste0(watersheds[i], '_mean.csv'))
names(ws.list)[i] <- paste0(watersheds[i], '_mean')
ws.list[[i]] <- ws.list[[i]][!names(ws.list[[i]]) %in% cols_dont_want]
}
names(ws.list)
# "ANE_mean" "SAL_mean" "CER_mean"
# If you absolutely want to call the data.frames by their
# individual names, you can do so after you attach() the list.
attach(ws.list)
ANE_mean
I'm trying to write a function with dynamic arguments (i.e. the function argument names are not determined beforehand). Inside the function, I can generate a list of possible argument names as strings and try to extract the function argument with the corresponding name (if given). I tried using match.arg, but that does not work.
As a (massively stripped-down) example, consider the following attempt:
# Override column in the dataframe. Dots arguments can be any
# of the column names of the data.frame.
dataframe.override = function(frame, ...) {
for (n in names(frame)) {
# Check whether this col name was given as an argument to the function
if (!missing(n)) {
vl = match.arg(n);
# DO something with that value and assign it as a column:
newval = vl
frame[,n] = newval
}
}
frame
}
AA = data.frame(a = 1:5, b = 6:10, c = 11:15)
dataframe.override(AA, b = c(5,6,6,6,6)) # Should override column b
Unfortunately, the match.arg apparently does not work:
Error in match.arg(n) : 'arg' should be one of
So, my question is: Inside a function, how can I check whether the function was called with a given argument and extract its value, given the argument name as a string?
Thanks,
Reinhold
PS: In reality, the "Do something..." part is quite complicated, so simply assigning the vector to the dataframe column directly without such a function is not an option.
You probably want to review the chapter on Non Standard Evaluation in Advanced-R. I also think Hadley's answer to a related question might be useful.
So: let's start from that other answer. The most idiomatic way to get the arguments to a function is like this:
get_arguments <- function(...){
match.call(expand.dots = FALSE)$`...`
}
That provides a list of the arguments with names:
> get_arguments(one, test=2, three=3)
[[1]]
one
$test
[1] 2
$three
[1] 3
You could simply call names() on the result to get the names.
Note that if you want the values as strings you'll need to use deparse, e.g.
deparse(get_arguments(one, test=2, three=3)[[2]])
[1] "2"
P.S. Instead of looping through all columns, you might want to use intersect or setdiff, e.g.
dataframe.override = function(frame, ...) {
columns = names(match.call(expand.dots = FALSE)$`...`)[-1]
matching.cols <- intersect(names(frame), names(columns))
for (i in seq_along(matching.cols) {
n = matching.cols[[i]]
# Check whether this col name was given as an argument to the function
if (!missing(n)) {
vl = match.arg(n);
# DO something with that value and assign it as a column:
newval = vl
frame[,n] = newval
}
}
frame
}
P.P.S: I'm assuming there's a reason you're not using dplyr::mutate for this.
I'm trying to understand how SE works in dplyr so I can use variables as inputs to these functions. I'm having some trouble with understanding how this works across the different functions and when I should be doing what. It would be really good to understand the logic behind this.
Here are some examples:
library(dplyr)
library(lazyeval)
a <- c("x", "y", "z")
b <- c(1,2,3)
c <- c(7,8,9)
df <- data.frame(a, b, c)
The following is exactly why i'd use SE and the *_ variant of a function. I want to change the name of what's being mutated based on another variable.
#Normal mutate - copies b into a column called new
mutate(df, new = b)
#Mutate using a variable column names. Use mutate_ and the unqouted variable name. Doesn't use the name "new", but use the string "col.new"
col.name <- "new"
mutate_(df, col.name = "b")
#Do I need to use interp? Doesn't work
expr <- interp(~(val = b), val = col.name)
mutate_(df, expr)
Now I want to filter in the same way. Not sure why my first attempt didn't work.
#Apply the same logic to filter_. the following doesn't return a result
val.to.filter <- "z"
filter_(df, "a" == val.to.filter)
#Do I need to use interp? Works. What's the difference compared to the above?
expr <- interp(~(a == val), val = val.to.filter)
filter_(df, expr)
Now I try to select_. Works as expected
#Apply the same logic to select_, an unqouted variable name works fine
col.to.select <- "b"
select_(df, col.to.select)
Now I move on to rename_. Knowing what worked for mutate and knowing that I had to use interp for filter, I try the following
#Now let's try to rename. Qouted constant, unqouted variable. Doesn't work
new.name <- "NEW"
rename_(df, "a" = new.name)
#Do I need an eval here? It worked for the filter so it's worth a try. Doesn't work 'Error: All arguments to rename must be named.'
expr <- interp(~(a == val), val = new.name)
rename_(df, expr)
Any tips on best practice when it comes to using variable names across the dplyr functions and when interp is required would be great.
The differences here are not related to which dplyr verb you are using. They are related to where you are trying to use the variable. You are mixing whether the variable is used as a function argument or not, and whether it should be interpreted as a name or as a character string.
Scenario 1:
You want to use your variable as an argument name. Such as in your mutate example.
mutate(df, new = b)
Here new is the name of a function argument, it is left of a =. The only way to do this is to use the .dots argument. Like
col.name <- 'new'
mutate_(df, .dots = setNames(list(~b), col.name))
Running just setNames(list(~b), col.name) shows you how we have an expression (~b), which is going right of the =, and the name is going left of the =.
Scenario 2:
You want to give only a variable as a function argument. This is the simplest case. Let's again use mutate(df, new = b), but in this case we want b to be variable. We could use:
v <- 'b'
mutate_(df, .dots = setNames(list(v), 'new'))
Or simply:
mutate_(df, new = b)
Scenario 3
You want to do some combinations of variable and fixed things. That is, your expression should only be partly variable. For this we use interp. For example, what if we would like to do something like:
mutate(df, new = b + 1)
But being able to change b?
v <- 'b'
mutate_(df, new = interp(~var + 1, var = as.name(v)))
Note that we as.name to make sure that we insert b into the expression, not 'b'.
I'm using data.table package and trying to write a function (shown below):
require(data.table)
# Function definition
f = function(path, key) {
table = data.table(read.delim(path, header=TRUE))
e = substitute(key)
setkey(table, e) # <- Error in setkeyv(x, cols, verbose = verbose) : some columns are not in the data.table: e
return(table)
}
# Usage
f("table.csv", ID)
Here I try to pass an expression to the function. Why this code doesn't work?
I've already tried different combinations of substitute(), quote() and eval(). So, it'd be great if you could also explain how to get this to work.
First, let's look at how the setkey function does things from the data.table package:
# setkey function
function (x, ..., verbose = getOption("datatable.verbose"))
{
if (is.character(x))
stop("x may no longer be the character name of the data.table. The possibility was undocumented and has been removed.")
cols = getdots()
if (!length(cols))
cols = colnames(x)
else if (identical(cols, "NULL"))
cols = NULL
setkeyv(x, cols, verbose = verbose)
}
So, when you do:
require(data.table)
dt <- data.table(ID=c(1,1,2,2,3), y = 1:5)
setkey(dt, ID)
It calls the function getdots which is internal to data.table (that is, it's not exported). Let's have a look at that function:
# data.table:::getdots
function ()
{
as.character(match.call(sys.function(-1), call = sys.call(-1),
expand.dots = FALSE)$...)
}
So, what does this do? It takes the parameter you entered in setkey and it uses match.call to extract the arguments separately. That is, the match.call argument for this example case would be:
setkey(x = dt, ... = list(ID))
and since it's a list, you can access the ... parameter with $... to get a list of 1 element with its value ID and converting to this list to a character with as.character results in "ID" (a character vector). And then setkey passes this to setkeyv internally to set the keys.
Now why doesn't this work when you write setkey(table, key) inside your function?
This is precisely because of the way setkey/getdots is. The setkey function is designed to take any argument after the first argument (which is a data.table) and then return the ... argument as a character.
That is, if you give setkey(dt, key) then it'll return cols <- "key". If you give setkey(dt, e), it'll give back cols <- "e". It doesn't look for if "key" is an existing variable and then if so substitute the value of the variable. All it does is convert the value you provide (whether it be a symbol or character) back to a character.
Of course this won't work in your case because you want the value in key = ID to be provided in setkey. At least I can't think of a way to do this.
How to get around this?
As #agstudy already mentions, the best/easiest way is to pass "ID" and use setkeyv. But, if you really insist on using f("table.csv", ID) then, this is what you could do:
f <- function(path, key) {
table = data.table(read.delim(path, header=TRUE))
e = as.character(match.call(f)$key)
setkeyv(table, e)
return(table)
}
Here, you first use match.call to get the value corresponding to argument key and then convert it to a character and then pass that to setkeyv.
In short, setkey internally uses setkeyv. And imho, setkey is a convenient function to be used when you already know the column name of the data.table for which you need to set the key. Hope this helps.
I can't tell from your code what you're trying to achieve, so I'll answer the question the title asks instead; "How to pass an expression through a function?"
If you want to do this (this should be avoided where possible), you can do the following:
f <- function(expression) {
return(eval(parse(text=expression)))
}
For example:
f("a <- c(1,2,3); sum(a)")
# [1] 6