How can I create a function using variables in a dataframe - r

I'm sure the question is a bit dumb (sorry)... I'm trying to create a function using different variables I have stored in a data frame. The function looks like this:
mlr_turb <- function(Cond_in, Flow_in, pH_in, pH_out, Turb_in, nm250_i, nm400_i, nm250_o, nm400_o){
Coag = (+0.032690 + 0.090289*Cond_in + 0.003229*Flow_in - 0.021980*pH_in - 0.037486*pH_out
+0.016031*Turb_in -0.026006*nm250_i +0.093138*nm400_i - 0.397858*nm250_o - 0.109392*nm400_o)/0.167304
return(Coag)
}
m4_turb <- mlr_turb(dataset)
The problem comes when I try to run my function on a data frame whose columns have the same names as the arguments. It doesn't detect my variables and shows this message:
Error in mlr_turb(dataset) :
argument "Flow_in" is missing, with no default
But the column is actually there, along with all the other variables.
I think I have misplaced or left out something in the function that would let it take the variables from the dataset. I have searched a lot but have not found an answer...

No dumb questions!
I think you're looking for do.call. This function allows you to unpack values into a function as arguments. Here's a really simple example.
# a simple function that takes x, y and z as arguments
myFun <- function(x, y, z){
result <- (x + y)/z
return(result)
}
# a simple data frame with columns x, y and z
myData <- data.frame(x=1:5,
y=(1:5)*pi,
z=(11:15))
# unpack the values into the function using do.call
do.call('myFun', myData)
Output:
[1] 0.3765084 0.6902654 0.9557522 1.1833122 1.3805309
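To connect this with the error in the question, here is a minimal sketch with a made-up two-argument function f and data frame df (both hypothetical, not taken from the question). Passing the data frame directly binds it to the first argument only, whereas do.call hands each column to the argument of the same name.

```r
# f and df are illustrative stand-ins, not the asker's model
f <- function(x, y) x + y
df <- data.frame(x = 1:3, y = 4:6)

# The whole data frame is bound to x, so y is never supplied and R errors
err <- tryCatch(f(df), error = function(e) conditionMessage(e))

# do.call passes each column as the argument with the matching name
ok <- do.call(f, df)
```

Here err holds the "missing, with no default" message and ok holds the element-wise sums.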

You have run into a standard problem in R: standard evaluation (SE) versus non-standard evaluation (NSE). If you want more detail, you can have a look at this blog post I wrote.
I think the most convenient way to write a function that uses variables is to pass the variable names as arguments of the function.
Let's take @Muon's example again.
# a simple function that takes x, y and z as arguments
myFun <- function(x, y, z){
result <- (x + y)/z
return(result)
}
The question is where R should find the values behind the names x, y and z. In a function, R first looks in the function's own environment (here x, y and z are defined as parameters), then in the environment where the function was defined (the global environment for a function defined at top level), and then in the attached packages.
In myFun, R expects vectors. If you give it a column name, you will get an error. What if you want to pass a column name? You must tell R that the name you gave should be looked up within a data frame. For instance, you can do something like this:
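A small sketch of that lookup order, using made-up names (z, f, g) that are not part of the answers above: a name that is not a parameter or local variable is looked up outside the function, while a local definition takes precedence.

```r
z <- 100                 # defined at top level

# z is neither a parameter nor a local variable, so R finds the outer z
f <- function(x) x + z

# here a local z shadows the outer one
g <- function(x) {
  z <- 1
  x + z
}

res_f <- f(1)   # uses the outer z
res_g <- g(1)   # uses the local z
```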
myFun <- function(df, col1 = "x", col2 = "y", col3 = "z"){
result <- (df[,col1] + df[,col2])/df[,col3]
return(result)
}
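A possible usage sketch for this column-name version, assuming the myData frame from Muon's example plus a second, hypothetical data frame called other whose columns have different names. It uses [[ ]] instead of [, ], which should behave the same for a plain data.frame.

```r
myFun <- function(df, col1 = "x", col2 = "y", col3 = "z") {
  (df[[col1]] + df[[col2]]) / df[[col3]]
}

myData <- data.frame(x = 1:5, y = (1:5) * pi, z = 11:15)
other  <- data.frame(a = c(1, 2), b = c(3, 4), d = c(2, 3))  # hypothetical

# default column names
res_default <- myFun(myData)

# same function, different column names
res_named <- myFun(other, col1 = "a", col2 = "b", col3 = "d")
```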
You can go much further in this direction with the data.table package. If you start writing functions that need to use variables from a data frame, I recommend having a look at that package.

I like Muon's answer, but I couldn't get it to work if there are columns in the data.frame not in the function. Using the with() function is a simple way to make this work as well...
#Code from Muon:
# a simple function that takes x, y and z as arguments
myFun <- function(x, y, z){
result <- (x + y)/z
return(result)
}
# a simple data frame with columns x, y and z
myData <- data.frame(x=1:5,
y=(1:5)*pi,
z=(11:15),
a=6:10) #adding a var not used in myFun
# unpack the values into the function using do.call
do.call('myFun', myData)
#generates an error for the unused "a" column
#using with() function:
with(myData, myFun(x, y, z))
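Another option, sketched here as an assumption rather than as part of either answer, is to keep do.call but pass only the columns whose names match the function's arguments. names(formals()) is base R; the variable name needed is made up for the example.

```r
myFun <- function(x, y, z) (x + y) / z

myData <- data.frame(x = 1:5, y = (1:5) * pi, z = 11:15,
                     a = 6:10)  # extra column not used by myFun

# argument names of myFun: "x" "y" "z"
needed <- names(formals(myFun))

# drop the unused column before unpacking
res_docall <- do.call(myFun, myData[needed])

# the with() approach gives the same values
res_with <- with(myData, myFun(x, y, z))
```

This only works when every argument of the function has a column of the same name in the data frame.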

Related

R: Reference list item within the same list

In R, we can reference items created within that same list, i.e.:
list(a = a <- 1, b = a)
I am curious if there is a way to write a function which takes the place of a = a <- 1. That is, if something like
`%=%` <- function(x,y) {
envir <- environment()
char_x <- deparse(substitute(x))
assign(char_x, y, parent.env(envir))
unlist(lapply(setNames(seq_along(x),char_x), function(T) y))
}
# does not work
list(a%=%1, b=a)
is possible in R (i.e. returns the list given above)?
edit: I think this boils down to asking, 'can we call list with a language object that preserves all aspects of manually coding list?' (specifically, assigns the list's names attribute the left-hand side of the language element).
It seems to me that below shows that such a solution is hopeless.
my_call <- do.call(substitute, list(expr(expr = {x = y}), list(x=quote(a), y=1)))
equals <- languageEl(my_call, which = 1)
str(equals)
do.call(list, list(equals))
Welp, the clever folk behind tibble have figured this out in their lst() function (also in package dplyr)
library(dplyr)
lst(a=1, b=a, c=c(3,4), d=c)
What a useful feature!
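For readers who want the same effect without loading a package, here is a rough base R sketch. seq_list is a hypothetical helper, not how tibble implements lst(); it assumes every argument is named and simply evaluates the arguments one at a time in a scratch environment so that later ones can see earlier ones.

```r
# seq_list is a made-up name for illustration
seq_list <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]   # unevaluated arguments
  env <- new.env(parent = parent.frame())       # scratch environment
  out <- list()
  for (i in seq_along(exprs)) {
    nm <- names(exprs)[i]
    val <- eval(exprs[[i]], env)   # earlier items are visible here
    assign(nm, val, envir = env)
    out[nm] <- list(val)
  }
  out
}

res <- seq_list(a = 1, b = a + 1)
```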

Multiple function inputs from single object in R?

I would like to use a single object to pass multiple inputs to a function in R, is this possible? MWE:
df <- data.frame(yes = c(10,20), no = c(50,60),maybe = c(100,200))
fxn <- function(x,y,z){
a = x + y
b = x + z
c = y + z
return(list(a=a,b=b,c=c))
}
foo <- c("rincon","malibu","steamer")
bar <- c("no","maybe")
df[foo] <- fxn(df$yes,df[bar])
In the actual problem, my function has more inputs, which default to NULL. I am working in a dynamic shiny context, so the value and length of bar change. Any help for this newbie would be greatly appreciated.
With base R you can build the call using do.call and create a list() of parameters you want to pass to the function
do.call("fxn", c(list(df$yes), unname(df[bar])))
This would be the same as
fxn(df$yes, df[bar][[1]], df[bar][[2]])
We need to use the unname() because otherwise your parameters would be named "no" and "maybe" while your function is expecting "y" and "z".
With the rlang package, you could do
library(rlang)
eval_tidy(quo(fxn(df$yes, !!!unname(df[bar]))))
That uses the !!! splicing operator like some other languages have. Base R does not have such a syntax.
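The base R route can be checked end to end with the data from the question. This sketch uses unname(as.list(df[bar])) rather than unname(df[bar]), which is my own variation, and wraps the named call in tryCatch to show why the names have to go.

```r
df <- data.frame(yes = c(10, 20), no = c(50, 60), maybe = c(100, 200))
fxn <- function(x, y, z) list(a = x + y, b = x + z, c = y + z)
bar <- c("no", "maybe")

# With names kept, do.call tries to match arguments called "no" and "maybe"
named_try <- tryCatch(
  do.call(fxn, c(list(df$yes), as.list(df[bar]))),
  error = function(e) "error"
)

# With names removed, the columns are matched by position to y and z
res <- do.call(fxn, c(list(df$yes), unname(as.list(df[bar]))))
```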

error with interpSpline function

I have a list of data frames, xyz, and in every data frame there are 2 numeric vectors (x and y). I want to apply the interpSpline function from package splines to x and y, but when I do:
lapply(xyz, function (x){
x%>%
interpSpline(x,y)
})
I get the following error:
Error in data.frame(x = as.numeric(obj1), y = as.numeric(obj2)) :
(list) object cannot be coerced to type 'double'
It doesn't work because interpSpline doesn't take a data frame as its first argument.
xyz <- list(data.frame(x=rnorm(10),y=rnorm(10)),
data.frame(x=rnorm(10),y=rnorm(10)))
library(splines)
sf <- function(d) with(d,interpSpline(x,y))
s <- lapply(xyz,sf)
You could also use interpSpline(d$x,d$y). It might be possible to do enough contortions to get interpSpline to work with pipes, but it hardly seems worth the trouble ...
Per your comment on Ben's answer, interpSpline() requires the input x values to be unique. So, to avoid this error you could use the spline() function instead of interpSpline(). Each element of s will then be a list of interpolated x and y values (by default at 3*length(x) equally spaced points, with tied x values averaged). However, you will not get all the other output that interpSpline() provides.
set.seed(1)
# fake up some data that has duplicate 'x' values
xyz <- list(data.frame(x=round(rnorm(100),1),y=round(rnorm(100),1)),
data.frame(x=round(rnorm(100),1),y=round(rnorm(100),1)))
library(splines)
sf <- function(d) with(d,spline(x,y))
s <- lapply(xyz,sf)
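To see what spline() hands back when x contains ties, here is a tiny sketch with made-up data. The n and ties arguments are spelled out only to keep the example small and explicit; ties = mean is spline()'s documented default.

```r
x <- c(1, 2, 2, 3, 4)    # made-up data with a repeated x value
y <- c(1, 4, 6, 9, 16)

# tied x values are collapsed by averaging their y values,
# then the spline is evaluated at n equally spaced points
res <- spline(x, y, n = 7, ties = mean)

names(res)       # components of the result
length(res$x)    # number of interpolation points
```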

Print.myclass function in R

I am trying to develop my first package in R and I am facing some issues with "myclass" generic functions that I will try to describe.
Assume a data.frame X with n <- nrow(X) rows and K <- ncol(X) columns.
My main package function (too big to put in this post), let's say
fun1 <- function(X){
# do stuff...
out <- list(index= character vector, A= A, B= B,... etc)
class(out) <- "myclass"
return(out)
}
returns a list as output. I then have to use that output in the generic print method, in a print.myclass function. However, in my print function I want to use the data frame X from my main function without asking the user to supply it as an argument (i.e., print(out, X)) and without having it in my output list out (at least not visibly to the user). Is there any way to do that? Thanks in advance!

Refer to a vector anonymously in R

Instead of writing one vector subscript operation per line, such as:
x.and.y <- intersect(x, y)
idx.x <- match(x, x.and.y)
idx.x <- idx.x[!is.na(idx.x)]
I could chain them in one line:
x.and.y <- intersect(x, y)
idx.x <- subset(tmp <- match(x, x.and.y), !is.na(tmp))
In order to do that, I must give intermediate vector a name to be used in subscript operations. To make code even more concise, is there a way to refer to a vector anonymously? Like this:
x.and.y <- intersect(x, y)
idx.x <- match(x, x.and.y)[!is.na] ## illegal R
Considering intersect calls match, what you're doing is redundant. intersect is defined as:
function (x, y)
{
y <- as.vector(y)
unique(y[match(as.vector(x), y, 0L)])
}
And you can get the elements of x that also occur in y directly by using %in%: x[x %in% y].
I realize this may not be representative of your actual problem, but "referring to a vector anonymously" doesn't really fit the R paradigm. Function arguments are pass-by-value. You're essentially saying, "I want a function to manipulate an object, but I don't want to provide the object to the function."
You could use R's scoping rules to do this (which is what mplourde did using Filter with an anonymous function), but you're going to create quite a bit of convoluted code that way.
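As a rough illustration of that idea (my own sketch with made-up vectors, not necessarily the code from the answer mentioned above), the intermediate vector can be named only inside an anonymous function, or filtered with Filter and a predicate:

```r
x <- c(5, 3, 8, 1)   # made-up vectors
y <- c(3, 1, 9)
x.and.y <- intersect(x, y)

# three-step version from the question
idx.a <- match(x, x.and.y)
idx.a <- idx.a[!is.na(idx.a)]

# the intermediate result is only named inside the anonymous function
idx.b <- (function(v) v[!is.na(v)])(match(x, x.and.y))

# Filter keeps the elements for which the predicate is TRUE
idx.c <- Filter(Negate(is.na), match(x, x.and.y))
```

All three give the same indices; whether the one-liners are easier to read than the original is debatable.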
