Quick question I was thinking about: I understand that if a function has one argument that takes a varying number of values, you can use ... and move on. But if you have several arguments that each take a varying number of values, how would that work?
Example:
d <- data.frame(alpha=1:3, beta=4:6, gamma=7:9, type = c("okay", "no", "yes"))
rownames(d) <- d[,4]
test <- function(...){
x <- c(...)
y <- d[x,]
z <- y$alpha
print(z)
}
test("okay","no")
But, in the situation where I might have a second dataframe that I want to include in my function:
d2 <- data.frame(one=2:4, two=10:12, three=20:22, label = c("blue","yellow","red"))
rownames(d2) <- d2[,4]
test2 <- function(..., ...){
x <- c(...)
y <- d[x,]
z <- y$alpha
x2 <- c(...) ## ?
}
How would I tell R that the first ... argument is for d, and the second ... argument is for d2?
EDIT -
The function call for test2 would ideally be: test2("okay","no","blue","red"), where the first two arguments are for d and the other two are for d2. I'm just not sure how to tell R how to differentiate between these arguments.
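One possible approach, sketched here under assumptions: an R function can only have a single ... in its signature, so instead of two ... arguments each group of row names can be passed as its own vector. The argument names rows_d and rows_d2 and the returned list are hypothetical choices for illustration, not something from the original post.

```r
d <- data.frame(alpha = 1:3, beta = 4:6, gamma = 7:9,
                type = c("okay", "no", "yes"))
rownames(d) <- d$type
d2 <- data.frame(one = 2:4, two = 10:12, three = 20:22,
                 label = c("blue", "yellow", "red"))
rownames(d2) <- d2$label

# Hypothetical test2: one character vector per data frame instead of two `...`
test2 <- function(rows_d, rows_d2) {
  list(alpha = d[rows_d, "alpha"],   # rows selected from d
       one   = d2[rows_d2, "one"])   # rows selected from d2
}

res <- test2(c("okay", "no"), c("blue", "red"))
print(res)
```

The call changes slightly from test2("okay","no","blue","red") to test2(c("okay","no"), c("blue","red")), which makes the grouping explicit so R does not have to guess where one group ends.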
I am trying to write a function with an unspecified number of arguments using ... but I am running into issues where those arguments are column names. As a simple example, if I want a function that takes a data frame and uses within() to make a new column that is several other columns pasted together, I would intuitively write it as
example.fun <- function(input,...){
res <- within(input,pasted <- paste(...))
res}
where input is a data frame and ... specifies column names. This gives an error saying that the column names cannot be found (they are treated as objects). e.g.
df <- data.frame(x = c(1,2),y=c("a","b"))
example.fun(df,x,y)
This returns "Error in paste(...) : object 'x' not found "
I can use attach() and detach() within the function as a workaround,
example.fun2 <- function(input,...){
attach(input)
res <- within(input,pasted <- paste(...))
detach(input)
res}
This works, but it's clunky and runs into issues if there happens to be an object in the global environment that is called the same thing as a column name, so it's not my preference.
What is the correct way to do this?
Thanks
1) Wrap the code in eval(substitute(...code...)) like this:
example.fun <- function(data, ...) {
eval(substitute(within(data, pasted <- paste(...))))
}
# test
df <- data.frame(x = c(1, 2), y = c("a", "b"))
example.fun(df, x, y)
## x y pasted
## 1 1 a 1 a
## 2 2 b 2 b
1a) A variation of that would be:
example.fun.2 <- function(data, ...) {
data.frame(data, pasted = eval(substitute(paste(...)), data))
}
example.fun.2(df, x, y)
2) Another possibility is to convert each argument to a character string and then use indexing.
example.fun.3 <- function(data, ...) {
vnames <- sapply(substitute(list(...))[-1], deparse)
data.frame(data, pasted = do.call("paste", data[vnames]))
}
example.fun.3(df, x, y)
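To see what approach 2 actually captures, here is a small sketch. show_args is a hypothetical helper introduced only for illustration; it isolates the substitute()/deparse() step so the unevaluated column names can be inspected as strings before they are used for indexing.

```r
# Hypothetical helper: return the unevaluated arguments as character strings
show_args <- function(...) {
  sapply(substitute(list(...))[-1], deparse)
}

# x and y are never evaluated here, so they do not need to exist as objects
vnames <- show_args(x, y)
print(vnames)

# The strings can then index the data frame, as in example.fun.3
df <- data.frame(x = c(1, 2), y = c("a", "b"))
pasted <- do.call("paste", df[vnames])
print(pasted)
```

Because the names are turned into strings first, nothing is looked up in the global environment, which avoids the name clash problem mentioned for attach().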
3) Other possibilities are to change the design of the function and pass the variable names as a formula or character vector.
example.fun.4 <- function(data, formula) {
data.frame(data, pasted = do.call("paste", get_all_vars(formula, data)))
}
example.fun.4(df, ~ x + y)
example.fun.5 <- function(data, vnames) {
data.frame(data, pasted = do.call("paste", data[vnames]))
}
example.fun.5(df, c("x", "y"))
I have this data frame in R:
x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
I also have this function:
some_function <- function(x,y) { return(x+y) }
Basically, I want to create a new column in the data frame based on "some_function". I thought I could do this with the "lapply" function in R:
data_frame$new_column <-lapply(c(data_frame$x, data_frame$y),some_function)
This does not work:
Error in `$<-.data.frame`(`*tmp*`, f, value = list()) :
replacement has 0 rows, data has 8281
I know how to do this in a more "clunky and traditional" way:
data_frame$new_column = x + y
But I would like to know how to do this using "lapply" - in the future, I will have much more complicated and longer functions that will be a pain to write out like I did above. Can someone show me how to do this using "lapply"?
Thank you!
When working within a data.frame you could use apply instead of lapply:
x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
head(data_frame)
some_function <- function(x,y) { return(x+y) }
data_frame$new_column <- apply(data_frame, 1, \(x) some_function(x["Var1"], x["Var2"]))
head(data_frame)
To apply a function to rows set MARGIN = 1 (the second argument of apply); to apply a function to columns set MARGIN = 2.
lapply, as the name suggests, is a list-apply. As a data.frame is a list of columns you can use it to compute over columns but within rectangular data, apply is often the easiest.
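As a rough illustration of the difference, the sketch below runs the same summary over rows and over columns of a tiny data frame. The data frame small and its values are made up for this example.

```r
# Made-up data for illustration
small <- data.frame(Var1 = c(1, 2, 3), Var2 = c(10, 20, 30))

row_sums <- apply(small, 1, sum)  # MARGIN = 1: one result per row
col_sums <- apply(small, 2, sum)  # MARGIN = 2: one result per column
col_list <- lapply(small, sum)    # lapply treats the data frame as a list of columns

print(row_sums)
print(col_sums)
print(col_list)
```

Note that apply converts the data frame to a matrix first, so with mixed column types every value would be coerced to a common type such as character.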
If some_function is written for that specific purpose, it can be written to accept a single row of the data.frame as in
x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
head(data_frame)
some_function <- function(row) { return(row[1]+row[2]) }
data_frame$yet_another <- apply(data_frame, 1, some_function)
head(data_frame)
Final comment: Functions written for a single pair of values often turn out to be perfectly vectorized already. Probably the best way to call some_function is without any function of the apply family, as in
some_function <- function(x,y) { return(x + y) }
data_frame$last_one <- some_function(data_frame$Var1, data_frame$Var2)
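If a longer function turns out not to be vectorized, mapply is one option for calling it once per row with several arguments. The sketch below uses a deliberately small grid (an assumption, to keep the output readable) and compares the vectorized call with the element-wise one.

```r
# Small made-up grid so the result is easy to check by eye
grid <- expand.grid(1:3, 1:2)   # columns are named Var1 and Var2
some_function <- function(x, y) { return(x + y) }

# Vectorized call: the whole columns are passed at once
grid$vectorized <- some_function(grid$Var1, grid$Var2)

# Element-wise call: some_function is invoked once per row
grid$elementwise <- mapply(some_function, grid$Var1, grid$Var2)

print(grid)
```

Both columns should agree here; the mapply form only matters when the function body cannot handle whole vectors.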
I have a list with 29 data frames.
I am trying to do a simple transformation with ifelse(), that looks something like this:
with(df, ifelse(col1 > x, col1 <- col1-y, col1<-col1+y))
The one thing I can't seem to get is how to change that x and y value so that a different value is used for each data frame in the list.
Here's a quick reproducible example of what I've got so far .. but I want to call different values for x and y from a data frame (e.g. info)
df.1 <- data.frame("df"=rep(c(1), times=4),"length"=c(10:7))
df.2 <- data.frame("df"=rep(c(2),times=4),"length"=c(8:11))
df.3 <- data.frame("df"=rep(c(3),times=4),"length"=c(9:12))
list <- list(df.1,df.2,df.3)
info <- data.frame(x=rep(c(8.5,9.5,10.5)), y=rep(c(1,1.5,2)))
# using static number for x & y but wanting these to be grabbed from the above df and change
# for each list
x <- 8
y <- 1
lapply(list, function(df) {
df <- with(df, ifelse(length > x,
length <- length-y,
length <- length+y)) })
Any and all help/insight is appreciated!
Edited to add clarification:
I would like the rows to match up with lists.
E.g. Row 1 in Info (x=8.5, y=1) is used in the function and applied just to the first data frame in the list (df.1).
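lapply iterates over a single list. When each element needs its own extra values (here a different x and y per data frame), use mapply instead, which walks over several inputs in parallel.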
mapply(
function(df, x, y) {
#print("df")
#print(df)
#print("x")
#print(x)
#print("y")
#print(y)
# Return the values directly: assigning inside ifelse() would modify length before the "no" branch is evaluated
with(df, ifelse(length > x, length - y, length + y))
},
list,
info$x,
info$y
)
I've left some debugging code in, commented out, which can be enabled in case you want to see how it works.
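With its default SIMPLIFY = TRUE, mapply collapses the results into a matrix here. If a list of updated data frames is preferred, a variant with Map might look like the sketch below. The names dfs and adjusted are assumptions chosen for this example (dfs also avoids masking the base function list).

```r
df.1 <- data.frame(df = rep(1, 4), length = 10:7)
df.2 <- data.frame(df = rep(2, 4), length = 8:11)
df.3 <- data.frame(df = rep(3, 4), length = 9:12)
dfs  <- list(df.1, df.2, df.3)
info <- data.frame(x = c(8.5, 9.5, 10.5), y = c(1, 1.5, 2))

# Row i of info is paired with the i-th data frame in dfs
adjusted <- Map(function(df, x, y) {
  df$length <- ifelse(df$length > x, df$length - y, df$length + y)
  df  # return the whole data frame, not just the new column
}, dfs, info$x, info$y)

print(adjusted[[1]])
```

Map is a thin wrapper around mapply that never simplifies, so the result stays a list with one data frame per input.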
The case I have is I want to "tack on" a bunch of columns to an existing data.frame, where each column is a function that does math on other columns. My goals are:
1) I want to specify the functions once
2) I don't want to worry about having to pass arguments in the right order and/or match them by name
3) I want to specify the order in which to apply the functions once
4) I want the new column names to be the function names
Ideally I want something like:
df <- data.frame(a = rnorm(10), b = rnorm(10))
y <- function (x) a + b
z <- function (x) b * y
df2 <- lapply (list (y, z), df)
where df2 is a data.frame with 4 columns: a, b, y and z. I think this achieves the goals.
The closest I've gotten to this is the following:
df <- data.frame(a = rnorm(10), b = rnorm(10))
y <- function (x) x$a + x$b
z <- function (x) x$b * x$y
funs <- list (
y = y,
z = z
)
df2 <- df
df2$y <- funs$y(df2)
df2$z <- funs$z(df2)
This achieves goals 1 and 2, but not 3 and 4.
Thanks in advance for the help.
This may be what you want. Once the function dfapply is defined, it can be used much like your original intention, without needing things like x$a, except that you pass expressions instead of functions.
dfapply <- function(exprs, df){
for (expr in exprs) {
df <- within(df, eval(expr))
}
df
}
df <- data.frame(a = rnorm(10), b = rnorm(10))
expr1 <- expression(y <- a + b)
expr2 <- expression(z <- b * y)
df2 <- dfapply(c(expr1, expr2), df)
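If keeping real functions is preferred, another option is to loop over a named list so that the order and the column names are both given once. This is only a sketch: add_columns is a hypothetical helper, the functions still use the x$a style from the question, and fixed numbers replace rnorm so the result can be checked.

```r
# Hypothetical helper: apply each function in order and
# store the result in a column named after the list element
add_columns <- function(df, funs) {
  for (nm in names(funs)) {
    df[[nm]] <- funs[[nm]](df)
  }
  df
}

funs <- list(
  y = function(x) x$a + x$b,
  z = function(x) x$b * x$y   # relies on y having been added first
)

df  <- data.frame(a = 1:3, b = 4:6)  # fixed values instead of rnorm, for checking
df2 <- add_columns(df, funs)
print(df2)
```

The order of the list controls the order of evaluation, so z can refer to the y column created one step earlier.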
I am looking to make a function that takes a vector as input, does some simple arithmetic with the vector and call the new vector something which consists of a set string (say, "log.") plus the original vector name.
d = c(1, 2, 3)
my.function <- function(x) {
x2 <- log(x)
...
I would like the function to return a vector called log.d (that is, not log.x or something set, but something dependent on the name of the vector input as x).
You can try the following:
d = c(1, 2, 3)
my.function <- function(x){
x2 <- log(x)
arg_name <- deparse(substitute(x)) # Get argument name
var_name <- paste("log", arg_name, sep=".") # Construct the name, e.g. "log.d"
assign(var_name, x2, envir=.GlobalEnv) # Assign values to variable
# variable will be created in .GlobalEnv
}
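If writing into the global environment from inside a function feels too invasive, a variant that returns a named list instead could look like this. It is a sketch only; log_named is a hypothetical name, and it uses the same deparse(substitute(x)) trick to recover the argument name.

```r
# Hypothetical variant: return the result under the derived name
# instead of creating a global variable
log_named <- function(x) {
  out_name <- paste0("log.", deparse(substitute(x)))  # e.g. "log.d"
  setNames(list(log(x)), out_name)
}

d <- c(1, 2, 3)
res <- log_named(d)
print(names(res))
print(res[["log.d"]])
```

The caller can then decide where the result goes, for example with list2env(res, envir = globalenv()) if a real log.d variable is wanted after all.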
One way to do this would be to store the names of all your input vectors separately and then pass them to the assign function. Just as assign takes a text string as the name of the output object, get looks up an object from a string.
I will assume your vectors all follow a common pattern and start with "d", to make it all as dynamic as possible.
d1 <- c(1,2,3)
d2 <- c(2,3,4)
vec_names <- ls(pattern = "^d")
log_vec <- function(x){
log(x)
}
sapply(vec_names, function(x) assign(paste0("log.", x), log_vec(get(x)), envir = globalenv()))
This should create two new objects "log.d1" and "log.d2".
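An alternative that avoids get and assign altogether is to keep the vectors in a named list and rename the results. This is a sketch under that assumption; the names vecs and logs are made up for the example.

```r
d1 <- c(1, 2, 3)
d2 <- c(2, 3, 4)

# Collect the inputs in a named list instead of looking them up by name
vecs <- list(d1 = d1, d2 = d2)

# Transform every element, then derive the new names from the old ones
logs <- lapply(vecs, log)
names(logs) <- paste0("log.", names(vecs))

print(names(logs))
print(logs[["log.d1"]])
```

Keeping the results together in one list tends to be easier to pass around than a set of separately named global objects.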