I am trying to write a function in R that:
1) Receives a data frame and column name as parameters.
2) Performs an operation on the column in the data frame.
func <- function(col, df)
{
col = deparse(substitute(col))
print(paste("Levels: ", levels(df[[col]])))
}
func(Col1, DF)
func(Col2, DF)
mapply(func, colnames(DF)[1:2], DF)
Output
> func(Col1, DF)
[1] "Levels: GREEN" "Levels: YELLOW"
> func(Col2, DF)
[1] "Levels: 0.1" "Levels: 1"
> mapply(func, colnames(DF)[1:2], DF)
Error in `[[.default`(df, col) : subscript out of bounds
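Two things:
1) In your function func, you apply deparse(substitute(col)) to the argument col, which assumes col is passed as a bare (unquoted) name. That is why func(Col1, DF) works. In your mapply() call, the column names arrive as character strings, so the deparse step no longer yields the column name and the lookup fails. A direct call such as func('Col1', DF) goes wrong for the same reason.
2) In a mapply() call, every argument is iterated over element by element, so each one needs to be a vector or a list. A data frame is a list of columns, so passing DF directly hands func one column at a time instead of the whole data frame. You need to pass list(DF, DF) or, if you don't want to replicate it, remove the df argument from func.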
This is one alternative that should work:
func <- function(col, df)
{
print(paste("Levels: ", levels(df[,col])))
}
mapply(FUN = func, colnames(DF)[1:2], list(DF, DF))
Please have a look at the last comment from @demarsylvain. It may be a copy-paste error on your side. You should have run:
func <- function(col,df) {
print(paste("Levels: ", levels(df[,col])))
}
mapply(FUN = func, c('Species', 'Species'), list(iris, iris))
you did:
func <- function(col) {
print(paste("Levels: ", levels(df[,col])))
}
mapply(FUN = func, c('Species', 'Species'), list(iris, iris))
Please upvote and accept @demarsylvain's solution; it works.
EDIT to address your comment:
To get a generic version for an arbitrary list of column names, you can use this code, sorry for the loop :)
func <- function(col,df) {
print(paste("Levels: ", levels(df[,col])))
}
cnames = colnames(iris)
i <- 1
l = list()
while(i <= length(cnames)) {
l[[i]] <- iris
i <- i + 1
}
mapply(FUN = func, cnames, l)
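The loop above only builds a list of identical data frames. A shorter route, sketched here under the assumption that the built-in iris data is used as in the example, is the MoreArgs argument of mapply(), which passes the same value unchanged to every call. The helper col_levels is a hypothetical variant of func that returns the levels instead of printing them, so the result can be inspected.

```r
# Hypothetical variant of func: returns the levels rather than printing them
col_levels <- function(col, df) {
  levels(df[[col]])
}

cnames <- colnames(iris)

# MoreArgs hands the same data frame to every call, so no list of copies is needed
res <- mapply(FUN = col_levels, cnames,
              MoreArgs = list(df = iris), SIMPLIFY = FALSE)

res$Species
```

If a list of copies is really wanted, rep(list(iris), length(cnames)) builds it without an explicit loop.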
I have a function which takes a dataframe and its columns and processes it in various ways (left out for simplicity). We can put in column names as arguments or transform columns directly inside function arguments (like here). I need to find out what object(s) are passed in the function.
Reproducible example:
df <- data.frame(x= 1:10, y=1:10)
myfun <- function(data, col){
col_new <- eval(substitute(col), data)
# magic part
object_name <- ...
# magic part
plot(col_new, main= object_name)
}
For instance, the expected output for myfun(data= df, x*x) is the plot plot(df$x*df$x, main= "x"). So the title is x, not x*x. What I have got so far is this:
myfun <- function(data, col){
colname <- tryCatch({eval(substitute(col))}, error= function(e) {geterrmessage()})
colname <- gsub("' not found", "", gsub("object '", "", colname))
plot(eval(substitute(col), data), main= colname)
}
This function gives the expected output, but there must be a more elegant way to find out which object the input refers to. The answer must use base R only.
Use substitute to get the expression passed as col and then use eval and all.vars to get the values and name.
myfun <- function(data, col){
s <- substitute(col)
plot(eval(s, data), main = all.vars(s), type = "o", ylab = "")
}
myfun(df, x * x)
Another possibility is to pass a one-sided formula.
myfun2 <- function(formula, data){
plot(eval(formula[[2]], data), main = all.vars(formula), type = "o", ylab = "")
}
myfun2(~ x * x, df)
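To see what the two building blocks return on their own, here is a small sketch in which quote() stands in for the expression that substitute() would capture inside the function. The object names expr and vals are chosen for illustration only.

```r
df <- data.frame(x = 1:10, y = 1:10)

# What substitute(col) would yield for a call like myfun(df, x * x)
expr <- quote(x * x)

# Names of the variables used in the expression (duplicates are dropped by default)
vars <- all.vars(expr)

# Evaluate the expression with the columns of df in scope
vals <- eval(expr, df)
```

Note that an expression using several columns, such as x + y, would make all.vars() return more than one name, so the plot title may need extra handling in that case.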
The rlang package can be very powerful once you get the hang of it. Does something like this do what you want?
library(rlang)
myfun <- function (data, col){
.col <- enexpr(col)
unname(sapply(call_args(.col), as_string))
}
This gives you back the column name "wt".
myfun(mtcars, as.factor(wt))
# [1] "wt"
I am not sure of your use case, but this would work for multiple inputs.
myfun(mtcars, sum(x, y))
# [1] "x" "y"
And finally, it is possible you might not even need to do this, but rather store the expression and operate directly on the data. The tidyeval framework can help with that as well.
Using rlang, I'd like to have a function that works both when called directly and when called from the default value of another function's argument, e.g.:
refdf = data.frame(x=1:100, y=runif(100,-1,1))
test.helper <- function(z, df) {
qz <- enquo(z)
range(eval_tidy(qz, df))
}
test.helper(y, refdf) # works
test.main <- function(z, df, def = test.helper(z, df)) {
print(def)
}
test.main(y, refdf)
# doesn't work: Error in eval_tidy(qz, df) : object 'y' not found
If instead, I do
refdf = data.frame(x=1:100, y=runif(100,-1,1))
test.helper <- function(z, df) {
qz <- as_quosure(z)
range(eval_tidy(qz, df))
}
test.helper(y, refdf)
# doesn't work: Error in is_quosure(x) : object 'y' not found
test.main <- function(z, df, def = test.helper(enquo(z), df)) {
print(def)
}
test.main(y, refdf)
# now works
I feel like I'm missing something about what gets quoted when; is there an alternative syntax I can use to make both work? I know I could define a separate test.helper_quo or some such, but I'd really like to use the test.helper in the signature (as an extra hint to users about what functions are available).
This should work
library(rlang)
test.helper <- function(z, df) {
qz <- enquo(z)
range(eval_tidy(qz, df))
}
test.helper(y, refdf) # works
test.main <- function(z, df, def = test.helper(!!enquo(z), df)) {
print(def)
}
test.main(y, refdf) # works
# or with rlang >= 0.4.0
test.main <- function(z, df, def = test.helper({{z}}, df)) {
print(def)
}
test.main(y, refdf) # works
Note that in def, we need to capture the quosure passed as z and then unquote it into the call to test.helper, so that its own enquo will be able to see the original symbol.
This is closely related to the question: How do I pass ``...`` to a new environment in R?
And the thread here.
My ultimate goal is to be able to have a function that:
operates on a function and returns a function
The return function creates a new environment with .GlobalEnv as its parent
it evaluates the argument function inside the new environment.
And it solves the save-size problem described below.
The save-size problem is that lm objects (and others, for example ggplot objects) store the environment they were created in, which can often contain irrelevant information. The goal is to have a convenient wrapper that fixes this problem. To demonstrate:
saveSize <- function (object) {
tf <- tempfile(fileext = ".RData")
on.exit(unlink(tf))
save(object, file = tf)
file.size(tf)
}
tmp_fun <- function(){
iris_big <- lapply(1:10000, function(x) iris)
lm(Sepal.Length ~ Sepal.Width, data = iris)
}
out <- tmp_fun()
object.size(out)
# 48008
saveSize(out)
# 1002448 - Far too large as it contains iris_big.
Bill Dunlap proposes this solution which works:
tmp_fun_Bill <- function(){
iris_big <- lapply(1:10000, function(x) iris)
env <- new.env(parent = globalenv())
with(env, lm(Sepal.Length ~ Sepal.Width, data = iris))
}
out <- tmp_fun_Bill()
object.size(out)
# 48008
saveSize(out)
# 4478 - this works!
I would like to generalize Bill's approach into one of these functions that return functions (like purrr::safely).
My best attempt, with help from @MrFlick:
in_new_env <- function(.f){
function(...) {
params <- list(...)
env <- new.env(parent = globalenv())
# Change the environment of any formula objects
params <- lapply(params, function(x) {if (inherits("x","formula")) {environment(x)<-env}; x})
assign(".params.", params, envir = env)
env$.f <- .f
evalq(do.call(".f", .params.), envir=env)
}
}
tmp_fun_me <- function(){
iris_big <- lapply(1:10000, function(x) iris)
in_new_env(lm)(Sepal.Length ~ Sepal.Width, data = iris)
}
out <- tmp_fun_me()
object.size(out)
# 48008
saveSize(out)
# 1002448 - too big again
Can someone point out what's going wrong here?
The problem is really that the formula grabs the environment in which it is created. Here's a function that will set the environment for the formula to a fresh environment whose parent is the global environment:
dropenv <- function(x) {
env <- new.env(parent = globalenv())
if (inherits(x,"formula")) {
environment(x)<-env
}
x
}
tmp_fun_drop <- function(){
iris_big <- lapply(1:10000, function(x) iris)
lm(dropenv(Sepal.Length ~ Sepal.Width), data = iris)
}
But this requires evaluating and wrapping the formula argument to the lm() function. Here's a possible workaround for your desired calling style:
in_new_env <- function(.f){
function(formula, ...) {
formula <- dropenv(formula)
.f(formula, ...)
}
}
tmp_fun_drop <- function(){
iris_big <- lapply(1:10000, function(x) iris)
in_new_env(lm)(Sepal.Length ~ Sepal.Width, data = iris)
}
Now the function returned by in_new_env assumes that the first parameter will be a formula and will clear out the environment for that formula.
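A small base R sketch of why the formula is the culprit: a formula created inside a function keeps a reference to that function's frame, so every local object stays reachable (and gets saved) through it. The names make_formula and big are hypothetical and only stand in for tmp_fun and iris_big.

```r
make_formula <- function() {
  big <- rnorm(1e5)   # stands in for iris_big
  y ~ x               # the formula records the current frame as its environment
}

f <- make_formula()

# The local object is still reachable through the formula's environment
before <- ls(environment(f))

# Re-pointing the formula at a fresh environment drops that reference
environment(f) <- new.env(parent = globalenv())
after <- ls(environment(f))
```

Here before lists the local object and after is empty, which is the effect dropenv() relies on.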
I want to write a function that runs the same analysis on different data.frames. Here is a simple version of my code:
set1 <- data.frame(x=c(1,2,4,6,2), y=c(4,6,3,56,4))
set2 <- data.frame(x=c(3,2,3,8,2), y=c(2,6,3,6,3))
mydata <- c("set1", "set2")
for (dataCount in 1:length(data)) {
lm(x~y, data=mydata)
}
How do I call a data.frame by name inside the function? Right now "data" obviously only returns the names in "mydata" as character strings.
There are a number of ways of doing this. Your "native" way would be
mydata <- ls(pattern = "set")
for (dataCount in mydata) {
print(summary(lm(x~y, data=get(dataCount))))
}
or you could collate your data.frames into a list and work on that.
mylist <- list(set1, set2)
lapply(mylist, FUN = function(yourdata) {
print(summary(lm(x ~ y, data = yourdata)))
})
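If the data frames start out as a character vector of names, as in the question, mget() is one way to turn those names into a named list before applying the model. A minimal sketch, assuming set1 and set2 exist in the environment where the code runs; the name fits is chosen for illustration.

```r
set1 <- data.frame(x = c(1, 2, 4, 6, 2), y = c(4, 6, 3, 56, 4))
set2 <- data.frame(x = c(3, 2, 3, 8, 2), y = c(2, 6, 3, 6, 3))
mydata <- c("set1", "set2")

# mget() looks up each name and returns a list named after the objects
fits <- lapply(mget(mydata), function(d) lm(x ~ y, data = d))

names(fits)
coef(fits$set1)
```

Keeping the names on the list makes it easier to tell afterwards which fit belongs to which data set.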
The following works:
plot(Sepal.Length ~ Petal.Width, data = iris)
abline(lm(Sepal.Length ~ Petal.Width, data = iris))
But the following code does not work:
str = "Sepal.Length ~ Petal.Width, data = iris"
plot(str)
abline(lm(str))
I tried deparse(substitute), as.formula and eval but they do not work.
Using str from the question try this:
# fun and args should each be a character string
run <- function(fun, args) eval(parse(text = sprintf("%s(%s)", fun, args)))
run("plot", str)
abline(run("lm", str))
Or try this:
`%(%` <- function(fun, args) run(deparse(substitute(fun)), args)
plot %(% str
abline(lm %(% str)
Note that this approach can handle the situation where there are commas in an argument (as opposed to argument separators) and does not make use of any external packages.
Try parsing the arguments and building them yourself:
fun_str <- function(fun, str_arg){
## split args separated by comma
m <- as.list(strsplit(str_arg, ',')[[1]])
args <- lapply(m, function(x){
## remove any extra space (trimws() is base R, so no extra package is needed)
arg <- trimws(strsplit(x, '=')[[1]])
## "data = name" becomes the data frame itself; anything with '~' becomes a formula
if (arg[1] == "data") get(arg[2], parent.frame())
else if (grepl('~', x)) as.formula(x)
})
do.call(fun, args)
}
Then call it :
fun_str("plot",str)
fun_str("lm",str)
Here's another alternative. You could use a call object to represent the data argument and then evaluate it in the argument list.
f <- formula("Sepal.Length ~ Petal.Width")
cl <- call("=", "data", iris)
plot(f, eval(cl))
abline(lm(f, eval(cl)))
It looks like this alternate solution will also work with the original str vector.
str <- "Sepal.Length ~ Petal.Width, data = iris"
s <- strsplit(str, ", data = ")[[1]]
with(setNames(as.list(s), c("formula", "data")), {
getd <- get(data, parent.frame())
plot(f <- formula(formula), data = getd)
abline(lm(f, data = getd))
})