Apply method in R missing argument (missing argument error) - r

I am trying to use the apply function in R, but I keep getting the error: Error in FUN(newX[, i], ...) : missing argument in "b".
The code which produces the error:
my_data <- data.frame(x1 = 1:5, x2 = 2:6, x3 = 3)
myFunction <- function(a, b, c){
return(a + b + c)
}
results = apply(my_data, 1, myFunction) #this line is producing the error message
If I change "myFunction" to "sum", for example, then there is no error. How can I get rid of this error?

Either the function should be changed to
myFunction <- function(x) sum(x)
apply(my_data, 1, myFunction)
#[1] 6 8 10 12 14
Also, the sum operation by row is more efficient with rowSums
rowSums(my_data)
or pass the arguments separately by wrapping the OP's original function in a lambda/anonymous function
apply(my_data, 1, function(x) myFunction(x[1], x[2], x[3]))
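
To see why the error occurs, it may help to check what apply() actually hands to the function. The sketch below reuses my_data and myFunction from the question; n_args_seen and results are illustrative names only. It counts the arguments received per row, then spreads each row across a, b and c with do.call(). The unname() call is there because each row arrives as a vector named x1, x2, x3, and those names would otherwise be matched against the formals a, b, c.

```r
my_data <- data.frame(x1 = 1:5, x2 = 2:6, x3 = 3)
myFunction <- function(a, b, c) {
  return(a + b + c)
}

# apply() passes each row as ONE vector, so FUN receives a single argument;
# with myFunction that vector becomes `a`, while `b` and `c` stay missing
n_args_seen <- apply(my_data, 1, function(...) length(list(...)))

# Spread the row over a, b, c: drop the column names, turn the row into a
# list, and let do.call() supply the elements positionally
results <- apply(my_data, 1, function(x) do.call(myFunction, as.list(unname(x))))
results
```

This is only one way to bridge the two calling conventions; for a plain row sum, rowSums remains the simpler choice.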

You can use mapply:
with(my_data, mapply(myFunction, x1, x2, x3))

Also note that your data.frame is a list of 3 vectors. You can add vectors element-wise, just like scalars.
with(my_data, x1 + x2 + x3)
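
If the number of columns is not known in advance, the same two ideas can be written without naming x1, x2, x3. This is a rough sketch, assuming the function takes exactly as many arguments as there are columns; via_mapply and via_reduce are illustrative names.

```r
my_data <- data.frame(x1 = 1:5, x2 = 2:6, x3 = 3)
myFunction <- function(a, b, c) a + b + c

# mapply walks the columns in parallel; unname() keeps the column names
# x1, x2, x3 from being matched against the formals a, b, c
via_mapply <- do.call(mapply, c(list(FUN = myFunction), unname(as.list(my_data))))

# A data frame is a list of columns, so Reduce can add them element-wise
via_reduce <- Reduce(`+`, my_data)
```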

Related

How to write a function with an unspecified number of arguments where the arguments are column names

I am trying to write a function with an unspecified number of arguments using ... but I am running into issues where those arguments are column names. As a simple example, if I want a function that takes a data frame and uses within() to make a new column that is several other columns pasted together, I would intuitively write it as
example.fun <- function(input,...){
res <- within(input,pasted <- paste(...))
res}
where input is a data frame and ... specifies column names. This gives an error saying that the column names cannot be found (they are treated as objects). e.g.
df <- data.frame(x = c(1,2),y=c("a","b"))
example.fun(df,x,y)
This returns "Error in paste(...) : object 'x' not found "
I can use attach() and detach() within the function as a workaround,
example.fun2 <- function(input,...){
attach(input)
res <- within(input,pasted <- paste(...))
detach(input)
res}
This works, but it's clunky and runs into issues if there happens to be an object in the global environment that is called the same thing as a column name, so it's not my preference.
What is the correct way to do this?
Thanks
1) Wrap the code in eval(substitute(...code...)) like this:
example.fun <- function(data, ...) {
eval(substitute(within(data, pasted <- paste(...))))
}
# test
df <- data.frame(x = c(1, 2), y = c("a", "b"))
example.fun(df, x, y)
## x y pasted
## 1 1 a 1 a
## 2 2 b 2 b
1a) A variation of that would be:
example.fun.2 <- function(data, ...) {
data.frame(data, pasted = eval(substitute(paste(...)), data))
}
example.fun.2(df, x, y)
2) Another possibility is to convert each argument to a character string and then use indexing.
example.fun.3 <- function(data, ...) {
vnames <- sapply(substitute(list(...))[-1], deparse)
data.frame(data, pasted = do.call("paste", data[vnames]))
}
example.fun.3(df, x, y)
3) Other possibilities are to change the design of the function and pass the variable names as a formula or character vector.
example.fun.4 <- function(data, formula) {
data.frame(data, pasted = do.call("paste", get_all_vars(formula, data)))
}
example.fun.4(df, ~ x + y)
example.fun.5 <- function(data, vnames) {
data.frame(data, pasted = do.call("paste", data[vnames]))
}
example.fun.5(df, c("x", "y"))
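
To make option 2) a little more concrete, here is a small sketch of the name-capturing step on its own. capture_names is a hypothetical helper introduced only for illustration; it is not part of the answer above.

```r
# capture_names() is a hypothetical helper used only for this illustration
capture_names <- function(...) {
  # substitute(list(...)) yields the unevaluated call list(x, y);
  # dropping the first element removes the `list` symbol itself,
  # and deparse turns each remaining symbol into a string
  vapply(as.list(substitute(list(...)))[-1], deparse, character(1))
}

df <- data.frame(x = c(1, 2), y = c("a", "b"))

# x and y are never evaluated here, so they need not exist as objects
vnames <- capture_names(x, y)

# Once the names are strings, ordinary indexing does the rest
pasted <- do.call("paste", df[vnames])
```

The point of the design is that non-standard evaluation is confined to one line; everything after it is standard indexing by character vector.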

R cbind with get paste

The cbind() function works as x <- cbind(a, b),
where the column name 'b' can also be given explicitly for a computed value, b = get(paste0('var',i)),
that is, x <- cbind(a, b = get(paste0('var',i)))
I am trying to do the following:
x <- cbind(a, get(paste0('var',i))) = j), where "j" can be a vector or a function.
however, got the following error: Error: unexpected '=' in "x <- cbind(a, get(paste0('var',i))) = j)"
If I just specify "x <- cbind(a, get(paste0('var',i)))", then the 2nd column name is "get(paste0('var',i))", which is not convenient.
How can I define column names with a function get(paste()) within cbind() or rbind() or bind_cols()? Or what would be the alternative solution?
An example would have been helpful to understand the problem but maybe this?
x <- cbind(a, j)
colnames(x)[2] <- get(paste0('var',i))
Or, if you want to do it in a single line:
x <- cbind(a, setNames(j, get(paste0('var',i))))
We can use
x <- data.frame(a, j)
colnames(x)[2] <- get(paste('var', i, sep=""))
Or use tibble
tibble(a, !! b := j)
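
Since the question has no reproducible example, the following is only a sketch under assumptions: a and j are plain vectors, and var1 holds the desired column name as a string (the value "score" is made up). It shows the rename-afterwards approach and a one-step variant that builds a named argument list for cbind().

```r
# Assumed inputs: none of these values come from the question
a <- 1:3
j <- c(10, 20, 30)
i <- 1
var1 <- "score"

# Look up the column name stored in the variable var1
col_name <- get(paste0("var", i))

# Two steps: bind first, rename afterwards
x <- cbind(a, j)
colnames(x)[2] <- col_name

# One step: cbind() takes column names from argument names, so build a
# named list and hand it over with do.call()
x2 <- do.call(cbind, setNames(list(a, j), c("a", col_name)))
```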

Using paste and substitute in combination with quotation marks in R

Please note that I already had a look at this and that but still cannot solve my problem.
Suppose a minimal working example:
a <- c(1,2,3)
b <- c(2,3,4)
c <- c(4,5,6)
dftest <- data.frame(a,b,c)
foo <- function(x, y, data = data) {
data[, c("x","y")]
}
foo(a, b, data = dftest)
Here, the last line obviously returns an Error: undefined columns selected. This error is returned because the columns to be selected are x and y, which are not part of the data frame dftest.
Question: How do I need to formulate the definition of the function to obtain the desired output, which is
> dftest[, c("a","b")]
# a b
# 1 1 2
# 2 2 3
# 3 3 4
which I want to obtain by calling the function foo.
Please be aware that in order for the solution to be useful for my purposes, the format of the function call of foo is to be regarded fixed, that is, the only changes are to be made to the function itself, not the call. I.e. foo(a, b, data = dftest) is the only input to be allowed.
Approach: I tried to use paste and substitute in combination with eval to first replace the x and y with the arguments of the function call and then evaluate the call. However, escaping the quotation marks seems to be a problem here:
foo <- function(x, y, data = data) {
substitute(data[, paste("c(\"",x,"\",\"",y,"\")", sep = "")])
}
foo(a, b, data = dftest)
eval(foo(a, b, data = dftest))
Here, foo(a, b, data = dftest) returns:
dftest[, paste("c(\"", a, "\",\"", b, "\")", sep = "")]
However, when evaluating with eval() (focusing only on the paste part),
paste("c(\"", a, "\",\"", b, "\")", sep = "")
returns:
# "c(\"1\",\"2\")" "c(\"2\",\"3\")" "c(\"3\",\"4\")"
and not, as I would hope c("a","b"), thus again resulting in the same error as above.
Try this:
foo <- function(x, y, data = data) {
x <- deparse(substitute(x))
y <- deparse(substitute(y))
data[, c(x, y)]
}
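
A short check of this approach against the data from the question might look like the sketch below. One deliberate difference, which is my assumption rather than part of the answer: the self-referential default data = data is dropped, because it would fail if data were ever omitted. The names x_name, y_name and out are illustrative.

```r
a <- c(1, 2, 3)
b <- c(2, 3, 4)
c <- c(4, 5, 6)
dftest <- data.frame(a, b, c)

foo <- function(x, y, data) {
  # substitute() returns the expression the caller typed for x (here the
  # symbol a) without evaluating it; deparse() turns it into the string "a"
  x_name <- deparse(substitute(x))
  y_name <- deparse(substitute(y))
  data[, c(x_name, y_name)]
}

# The call keeps the fixed format required in the question
out <- foo(a, b, data = dftest)
out
```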

Why the parameter FUN in tapply is invalid combined with colwise

I usually use the combination of colwise and tapply to calculate grouped values in a data frame. However, I found unexpectedly that naming the FUN parameter of tapply does not work with colwise from plyr. The example is as follows:
Data:
df <- data.frame(a = 1:10, b = rep(1:2, each = 5), c = 2:11)
Normal:
library(plyr)
colwise(tapply)(subset(df, select = c(a, c)), df$b, function(x){sum(x[x > 2])})
The code above works as expected. But if I name the argument FUN, it fails:
colwise(tapply)(subset(df, select = c(a, c)), df$b, FUN = function(x){sum(x[x > 2])})
Error is:
Error in FUN(X[[1L]], ...) :
unused arguments (function (X, INDEX, FUN = NULL, ..., simplify = TRUE)
{
FUN <- if (!is.null(FUN)) match.fun(FUN)
if (!is.list(INDEX)) INDEX <- list(INDEX)
nI <- length(INDEX)
if (!nI) stop("'INDEX' is of length zero")
namelist <- vector("list", nI)
names(namelist) <- names(INDEX)
extent <- integer(nI)
nx <- length(X)
one <- 1
group <- rep.int(one, nx)
ngroup <- one
for (i in seq_along(INDEX)) {
index <- as.factor(INDEX[[i]])
if (length(index) != nx) stop("arguments must have same length")
namelist[[i]] <- levels(index)
extent[i] <- nlevels(index)
group <- group + ngroup * (as.integer(index) - one)
ngroup <- ngroup * nlevels(index)
}
if (is.null(FUN)) return(group)
ans <- lapply(X = split(X, group), FUN = FUN, ...)
index <- as.integer(names(ans))
if (simplify && all(unlist(lapply(ans, length)) == 1)) {
ansmat <- array(dim = extent, dimnames = namelist)
Could anyone explain the reason? Thank you in advance.
Well, the issue is that both lapply and tapply have an optional FUN argument. Note that colwise(tapply) is a function with the following line:
out <- do.call("lapply", c(list(filtered, .fun, ...), dots))
Let's go to this line with our debugger by writing
ct <- colwise(tapply); trace(ct, quote(browser()), at = 6)
and then running
ct(subset(df, select = c(a, c)), df$b, FUN = function(x){sum(x[x > 2])})
Now let's print c(list(filtered, .fun, ...), dots). Notice that the first three (unnamed) arguments are now the data frame, tapply, and df$b, with the FUN argument above coming in last. However, this argument is named. Since this is a do.call on lapply, instead of being passed through to tapply as its FUN, the named argument is matched to lapply's own FUN parameter! So what is happening is that you are turning this into:
lapply(subset(df, select = c(a, c)), function(x){sum(x[x > 2])}, tapply, df$b)
This, of course, makes no sense, and if you execute the above (still in your debugger) manually, you will get the exact same error you are getting. For a simple workaround, try:
tapply2 <- function(.FUN, ...) tapply(FUN = .FUN, ...)
colwise(tapply2)(subset(df, select = c(a, c)), df$b, .FUN = function(x){sum(x[x > 2])})
The plyr package should be checking for ... arguments named FUN (or anything that can interfere with lapply's job), but it doesn't seem the author included this. You can submit a pull request to the plyr package that implements any of the following workarounds:
1) Defines a local
.lapply <- function(`*X*`, `*FUN*`, ...) lapply(X = `*X*`, `*FUN*`, ...)
(minimizing interference further).
2) Checks names(list(...)) within the colwise(tapply) function for X and FUN (this can introduce problems if the author intended to prevent evaluation of promises until the child call).
3) Calls do.call("lapply", ...) explicitly with named X and FUN, so that you get the intended error:
formal argument "FUN" matched by multiple actual arguments
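
The argument-matching effect can be reproduced without plyr. The sketch below is a simplified stand-in for what colwise does internally, not its actual code: it calls lapply through do.call with the same argument shapes. The names cols, grp, f, ok, bad and fixed are illustrative.

```r
cols <- list(a = 1:10, c = 2:11)
grp <- rep(1:2, each = 5)
f <- function(x) sum(x[x > 2])

# Positional: tapply is matched to lapply's FUN; grp and f travel on to tapply
ok <- do.call("lapply", list(cols, tapply, grp, f))

# Named FUN: R matches it to lapply's own FUN, so f is called as
# f(column, tapply, grp) and fails with "unused arguments"
bad <- try(do.call("lapply", list(cols, tapply, grp, FUN = f)), silent = TRUE)

# The wrapper from the answer: .FUN does not collide with lapply's FUN
tapply2 <- function(.FUN, ...) tapply(FUN = .FUN, ...)
fixed <- do.call("lapply", list(cols, tapply2, grp, .FUN = f))
```

Named arguments are matched before positional ones, which is why the position of FUN in the list does not protect it.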

Error using the "prob" package in an R function

I'm attempting to write a function that uses the prob package to compute conditional probabilities. When using the function I continue to encounter the same error, which states an object within the function cannot be found.
Below is a reproducible example in which I compute a conditional probability without the function and then attempt to use the function to produce the same result. I'm not sure if the error is due to limitations with the prob package or an error on my part.
# Load prob package
library(prob)
# Set seed for reproducibility
set.seed(30)
# Sample data frame
sampledata <- data.frame(
X <- sample(1:10),
Y <- sample(c(-1, 0, 1), 10, replace=TRUE))
# Set probability space
S <- probspace(sampledata)
# Subset Y between -1 and 0
A <- subset(S, Y>=-1 & Y<=0)
# Subset X greater than 6
B <- subset(S, X>6)
# Compute conditional probability
P <- prob(A, given=B)
The above code produces the following probability:
> P
[1] 0.25
Attempting to write a function to calculate the same probability:
# Create function with data frame, variables, and conditional inputs
prob.function <- function(df, variable1, variable2, state1, state2, cond1){
s <- probspace(df)
a <- subset(s, variable1>=state1 & variable1<=state2)
b <- subset(s, variable2>cond1)
p <- prob(a, given=b)
return(p)
}
# Demonstrate the function
test <- prob.function(sampledata, Y, X, -1, 0, 6)
This function gives the following error:
Error in eval(expr, envir, enclos) : object 'b' not found
Any help you can provide would be great.
Thanks!
This looks like a bug in prob.
When I run this in vanilla R, I get the same error. But when I create an object b in my workspace, the error disappears:
> print(b)
Error in print(b) : object 'b' not found
> test <- prob.function(sampledata, Y, X, -1, 0, 6)
Error in eval(expr, envir, enclos) : object 'b' not found
>
> b <- "dummy variable"
> print(b)
[1] "dummy variable"
> test <- prob.function(sampledata, Y, X, -1, 0, 6)
> test
[1] 0.25
>
As a temporary workaround, just create a dummy b in your current environment.
As for the bug, if you look at the source for prob.default (which in the example above is what prob(a, given=b) is eventually calling), you'll see the following section:
if (missing(given)) {
< cropped >
}
else {
f <- substitute(given)
g <- eval(f, x) <~~~~
if (!is.logical(g)) { <~~~~
if (!is.data.frame(given)) <~~~~
stop("'given' must be data.frame or evaluate to logical")
B <- given
}
...
< cropped >
}
It is jumping from g to given, perhaps inadvertently? I would reach out to the package maintainer, as this may be an oversight.
I don't think this is a bug in package prob.
First, you should create your sampledata as
sampledata <- data.frame(
X = sample(1:10),
Y = sample(c(-1, 0, 1), 10, replace=TRUE))
Your original code creates not only this dataframe but also variables X and Y in the global environment which are actually being used later when you call your function.
Second, you shouldn't call subset() inside a function. Use bracket subsetting instead:
prob.function <- function(df, variable1, variable2, state1, state2, cond1){
s <- probspace(df)
a <- s[s[[variable1]]>=state1 & s[[variable1]]<=state2, ]
b <- s[s[[variable2]]>cond1, ]
p <- prob(a, given=b)
return(p)
}
And pass variable1 and variable2 as strings:
test <- prob.function(sampledata, "Y", "X", -1, 0, 6)
Now you have test==0.25, and no error.
References for what is going on:
http://adv-r.had.co.nz/Computing-on-the-language.html#non-standard-evaluation-in-subset
Assignment operators in R: '=' and '<-'
Why is `[` better than `subset`?
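
Both points can be checked in base R without the prob package. The sketch below uses small made-up data frames (bad, good) and a hypothetical helper filter_rows; it only illustrates the side effect of <- inside data.frame() and the bracket-subsetting pattern with column names passed as strings.

```r
# `<-` inside data.frame() assigns X and Y in the calling environment
# and leaves the columns with mangled names derived from the expressions
bad <- data.frame(X <- 1:3, Y <- c(-1, 0, 1))
leaked <- exists("X") && exists("Y")

# `=` names the columns and creates no extra objects
good <- data.frame(X = 1:3, Y = c(-1, 0, 1))

# filter_rows() is a hypothetical helper: the column is looked up by its
# string name with [[ ]], so nothing depends on objects outside the function
filter_rows <- function(df, variable, lower, upper) {
  df[df[[variable]] >= lower & df[[variable]] <= upper, ]
}

res <- filter_rows(good, "Y", -1, 0)
```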

Resources