I'm not very familiar with how R functions handle variable names passed as arguments.
Here's the problem:
I want to build a function whose ... arguments are column names of a data frame, to be passed on to table().
f <- function (data, ...){
T <- with(data, table(...)) # ... variables input
return(T)
}
How can I make this code work?
Thanks a lot for answering!
The order of evaluation doesn't quite work right with with(), apparently. Here's an alternative that should work (using sample data from @DavidArenburg):
set.seed(1)
data1 <- data.frame(a = sample(5,5), b = sample(5,5))
f <- function (data, ...) {
xx <- lapply(substitute(...()), eval, data, parent.frame())
T <- do.call(table, xx)
return(T)
}
f(data = data1, a,b)
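The `substitute(...())` idiom above is terse, so here is a minimal sketch of the same idea written with `eval(substitute(alist(...)))`, which some readers find easier to follow. The name `f_alt` is hypothetical and not part of the original answer; the sketch assumes the arguments in ... are bare column names or expressions built from them.

```r
set.seed(1)
data1 <- data.frame(a = sample(5, 5), b = sample(5, 5))

# f_alt is a hypothetical name used only for this sketch
f_alt <- function(data, ...) {
  # Remember the caller's environment before doing anything else
  caller <- parent.frame()
  # Capture the arguments in ... as a list of unevaluated expressions
  exprs <- eval(substitute(alist(...)))
  # Evaluate each expression with the data frame's columns in scope,
  # falling back to the caller's environment for anything else
  cols <- lapply(exprs, eval, envir = data, enclos = caller)
  do.call(table, cols)
}

res <- f_alt(data1, a, b)
res
```

The result should match `table(data1$a, data1$b)`, since each captured expression is evaluated against the columns of `data1`.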
It is often far easier to avoid non-standard evaluation and use character strings to reference the columns within a data.frame.
set.seed(1)
data1 <- data.frame(a = sample(5,5), b = sample(5,5))
f <- function (data, ...) {
do.call(table,data[unlist(list(...))])
}
# the following calls to `f` return the same results
f(data = data1, 'a','b')
f(data = data1, c('a','b'))
a <- c('a','b')
f(data = data1, a)
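One benefit of the character-string approach is that the requested columns can be checked before tabulating. The following is a rough sketch under that assumption; `f_chr` and its error message are hypothetical additions, not part of the answer above.

```r
set.seed(1)
data1 <- data.frame(a = sample(5, 5), b = sample(5, 5))

# f_chr is a hypothetical variant that validates the column names first
f_chr <- function(data, ...) {
  cols <- unlist(list(...))
  missing_cols <- setdiff(cols, names(data))
  if (length(missing_cols) > 0) {
    stop("Unknown column(s): ", paste(missing_cols, collapse = ", "))
  }
  # table() accepts a data frame and cross-tabulates its columns,
  # keeping the column names as dimension names
  table(data[cols])
}

res <- f_chr(data1, "a", "b")
res
```

Passing the subsetted data frame straight to `table()` keeps `a` and `b` as labels on the dimensions, which the `do.call` form may not do.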
I am trying to write a function with an unspecified number of arguments using ... but I am running into issues where those arguments are column names. As a simple example, if I want a function that takes a data frame and uses within() to make a new column that is several other columns pasted together, I would intuitively write it as
example.fun <- function(input,...){
res <- within(input,pasted <- paste(...))
res}
where input is a data frame and ... specifies column names. This gives an error saying that the column names cannot be found (they are treated as objects). e.g.
df <- data.frame(x = c(1,2),y=c("a","b"))
example.fun(df,x,y)
This returns "Error in paste(...) : object 'x' not found".
I can use attach() and detach() within the function as a workaround:
example.fun2 <- function(input,...){
attach(input)
res <- within(input,pasted <- paste(...))
detach(input)
res}
This works, but it's clunky and runs into issues if there happens to be an object in the global environment that is called the same thing as a column name, so it's not my preference.
What is the correct way to do this?
Thanks
1) Wrap the code in eval(substitute(...code...)) like this:
example.fun <- function(data, ...) {
eval(substitute(within(data, pasted <- paste(...))))
}
# test
df <- data.frame(x = c(1, 2), y = c("a", "b"))
example.fun(df, x, y)
## x y pasted
## 1 1 a 1 a
## 2 2 b 2 b
1a) A variation of that would be:
example.fun.2 <- function(data, ...) {
data.frame(data, pasted = eval(substitute(paste(...)), data))
}
example.fun.2(df, x, y)
2) Another possibility is to convert each argument to a character string and then use indexing.
example.fun.3 <- function(data, ...) {
vnames <- sapply(substitute(list(...))[-1], deparse)
data.frame(data, pasted = do.call("paste", data[vnames]))
}
example.fun.3(df, x, y)
3) Other possibilities are to change the design of the function and pass the variable names as a formula or character vector.
example.fun.4 <- function(data, formula) {
data.frame(data, pasted = do.call("paste", get_all_vars(formula, data)))
}
example.fun.4(df, ~ x + y)
example.fun.5 <- function(data, vnames) {
data.frame(data, pasted = do.call("paste", data[vnames]))
}
example.fun.5(df, c("x", "y"))
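To see why option 1 works, it can help to look at what substitute() produces before eval() runs it. This is only an illustrative sketch; `show_call` is a hypothetical helper that returns the substituted call instead of evaluating it.

```r
df <- data.frame(x = c(1, 2), y = c("a", "b"))

# show_call is a hypothetical helper: it returns the call that
# eval(substitute(...)) would evaluate in option 1
show_call <- function(data, ...) {
  substitute(within(data, pasted <- paste(...)))
}

the_call <- show_call(df, x, y)
print(the_call)

# Evaluating the call gives the same result as example.fun in option 1
result <- eval(the_call)
result
```

The arguments are spliced into the call as written by the caller, so `x` and `y` are only looked up once within() has put the columns of `df` in scope.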
I have this data frame in R:
x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
I also have this function:
some_function <- function(x,y) { return(x+y) }
Basically, I want to create a new column in the data frame based on "some_function". I thought I could do this with the "lapply" function in R:
data_frame$new_column <-lapply(c(data_frame$x, data_frame$y),some_function)
This does not work:
Error in `$<-.data.frame`(`*tmp*`, f, value = list()) :
replacement has 0 rows, data has 8281
I know how to do this in a more "clunky and traditional" way:
data_frame$new_column = data_frame$Var1 + data_frame$Var2
But I would like to know how to do this using "lapply" - in the future, I will have much more complicated and longer functions that will be a pain to write out like I did above. Can someone show me how to do this using "lapply"?
Thank you!
When working within a data.frame you could use apply instead of lapply:
x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
head(data_frame)
some_function <- function(x,y) { return(x+y) }
data_frame$new_column <- apply(data_frame, 1, \(x) some_function(x["Var1"], x["Var2"]))
head(data_frame)
To apply a function to rows, set MARGIN = 1; to apply a function to columns, set MARGIN = 2.
lapply, as the name suggests, is a list-apply. Since a data.frame is a list of columns, you can use lapply to compute over columns, but for row-wise work on rectangular data, apply is often the easiest.
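As a small sketch of the distinction, the following uses a smaller grid than the question (an assumption made to keep the output readable). It shows lapply visiting each column, and mapply, a relative of lapply, walking two columns in parallel to get a row-wise result.

```r
# A small grid; expand.grid names the columns Var1 and Var2
df_small <- expand.grid(1:3, 1:2)
some_function <- function(x, y) x + y

# lapply sees the data frame as a list of columns
col_ranges <- lapply(df_small, range)

# mapply calls some_function once per pair of elements, i.e. once per row
df_small$new_column <- mapply(some_function, df_small$Var1, df_small$Var2)
df_small
```

Unlike apply, mapply does not convert the data frame to a matrix first, so columns of different types keep their types.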
If some_function is written for that specific purpose, it can be written to accept a single row of the data.frame as in
x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
head(data_frame)
some_function <- function(row) { return(row[1]+row[2]) }
data_frame$yet_another <- apply(data_frame, 1, some_function)
head(data_frame)
Final comment: Functions written for a single pair of values often turn out to be perfectly vectorized already. Probably the best way to call some_function is without any function of the apply family, as in
some_function <- function(x,y) { return(x + y) }
data_frame$last_one <- some_function(data_frame$Var1, data_frame$Var2)
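Not every function is vectorized out of the box, for example one that branches with if(). A possible fallback, sketched here with a hypothetical function `pick_larger`, is base R's Vectorize(), which wraps the function in mapply.

```r
# pick_larger is a hypothetical function; if() only handles a single
# condition, so it is not vectorized as written
pick_larger <- function(x, y) if (x > y) x else y

# Vectorize() returns a wrapper that applies the function element by element
pick_larger_v <- Vectorize(pick_larger)

df_small <- expand.grid(1:3, 1:2)
df_small$larger <- pick_larger_v(df_small$Var1, df_small$Var2)
df_small
```

For this particular case pmax() would be the more direct choice; the wrapper is mainly useful when the logic cannot easily be rewritten in vectorized form.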
I have written a function to change any one variable (i.e., column) in a data.frame to its unique levels and return the changed data.frame.
I wonder how to change multiple variables at once using my function and get one final data.frame with all the changes?
I have tried the following, but this gives multiple data.frames while only the last data.frame is the desired output:
data <- data.frame(sid = c(33,33, 41), pid = c('Bob', 'Bob', 'Jim'))
#== My function for ONE variable:
f <- function(data, what){
data[[what]] <- as.numeric(factor(data[[what]], levels = unique(data[[what]])))
return(data)
}
# Looping over `what`:
what <- c('sid', 'pid')
lapply(seq_along(what), function(i) f(data, what[i]))
We could change the function to return only the modified column, data[[what]], and assign the results back to the data frame:
f <- function(data, what){
data[[what]] <- as.numeric(factor(data[[what]], levels = unique(data[[what]])))
data[[what]]
}
data[what] <- lapply(seq_along(what), function(i) f(data, what[i]))
Or do
data[what] <- lapply(what, function(x) f(data, x))
Or simply
data[what] <- lapply(what, f, data = data)
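If changing the function's signature is acceptable, another option is to write the helper so that it takes a column rather than the whole data frame. This is a sketch only; `to_codes` is a hypothetical name.

```r
data <- data.frame(sid = c(33, 33, 41), pid = c("Bob", "Bob", "Jim"))
what <- c("sid", "pid")

# to_codes is a hypothetical column-level helper: it maps each value
# to the position of its first appearance among the unique values
to_codes <- function(col) as.numeric(factor(col, levels = unique(col)))

# data[what] is itself a list of columns, so lapply can run over it directly
data[what] <- lapply(data[what], to_codes)
data
```

Because the helper never touches the data frame, it can be reused on a plain vector as well.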
My toy dataframe:
d <- data.frame(
value = sample(1:10),
class = sample(c("a","b"), 20, replace = TRUE)
)
I split my data frame up by values of 'class' and put them in a list where each list element is named after its class:
l <- dlply(d, .(class), function(x)return(x))
Then I want to lapply over each class and make a histogram. Note that I do NOT want a facet. I want as many individual files saved as classes. So I define a function doPlots that makes histograms, then ggsaves them (as a_hist.png and b_hist.png, in this example):
doPlots <- function(d, name){
g <- ggplot(data = d, aes(x=value)) +
geom_histogram(binwidth=1)
ggsave(filename=paste(name,"hist.png",sep="_"))
}
However, when I lapply:
lapply(l, FUN=doPlots, name=names(l))
I get Error: device must be NULL, a string or a function.
Thanks in advance.
There are two problems with your code. First, you are passing the entire vector of names to every call of the function. Second, you are not passing the plot to ggsave. You can use mapply to iterate over two or more lists in parallel.
doPlots <- function(d, name){
g <- ggplot(data = d, aes(x=value)) +
geom_histogram(binwidth=1)
ggsave(filename=paste(name, "hist.png", sep="_"), g)
}
mapply(doPlots, l, names(l))
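The pairing that mapply does can be seen without any plotting. The sketch below uses base R only, with split() standing in for dlply and a hypothetical function `describe` standing in for doPlots; it returns the file name each call would use rather than saving anything.

```r
set.seed(42)
d <- data.frame(
  value = sample(1:10, 20, replace = TRUE),
  class = rep(c("a", "b"), each = 10)
)

# split() gives a list of data frames named after the classes
l <- split(d, d$class)

# describe is a hypothetical stand-in for doPlots: it receives one piece
# and one name per call and returns the file name it would write to
describe <- function(piece, name) paste(name, "hist.png", sep = "_")

files <- mapply(describe, l, names(l))
files
```

Each call receives exactly one element of `l` and the matching element of `names(l)`, which is what the lapply version in the question was missing.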
Consider base R's by(), which splits a data frame by factor levels and applies a function to each piece. You can pass your user-defined function into it, all in one call:
dlist <- by(d, d$class, FUN=function(i) {
name <- max(as.character(i$class))
doPlots(i, name)
})
Using dplyr: is there a way to loop over variables in a data frame and pass both the data and the variable name to a custom function?
I have a solution for this using mapply in base R. In the interest of learning I am wondering if there is a neat dplyr-way to achieve the same result.
Here is a small example, where each column in a data frame is transformed by adding a constant. The constant I wish to add is different for each variable, as listed in myconstants.
library(tidyverse)
mydata <- tibble(
a = 1:5,
b = 1:5,
c = 1:5
)
myconstants <- tibble(
a = 10,
b = 20
)
custom_function <- function (x, y, k) {
constant <- if (is.null(k[[y]])) 0 else k[[y]]
x + constant
}
# solution in base R
foo <- mapply(
custom_function,
mydata,
names(mydata),
MoreArgs = list(k = myconstants)
) %>%
as_tibble()