Refer to a vector anonymously in R

Instead of writing one vector subscript operation per line, such as:
x.and.y <- intersect(x, y)
idx.x <- match(x, x.and.y)
idx.x <- idx.x[!is.na(idx.x)]
I could chain them in one line:
x.and.y <- intersect(x, y)
idx.x <- subset(tmp <- match(x, x.and.y), !is.na(tmp))
To do that, I must give the intermediate vector a name so it can be used in the subscript operation. To make the code even more concise, is there a way to refer to a vector anonymously? Like this:
x.and.y <- intersect(x, y)
idx.x <- match(x, x.and.y)[!is.na] ## illegal R

Considering intersect calls match, what you're doing is redundant. intersect is defined as:
function (x, y)
{
y <- as.vector(y)
unique(y[match(as.vector(x), y, 0L)])
}
And if what you want is the elements of x that also occur in y, %in% gives them directly: x[x %in% y] (and which(x %in% y) gives their positions in x).
I realize this may not be representative of your actual problem, but "referring to a vector anonymously" doesn't really fit the R paradigm. Function arguments are pass-by-value. You're essentially saying, "I want a function to manipulate an object, but I don't want to provide the object to the function."
You could use R's scoping rules to do this (which is what mplourde did using Filter with an anonymous function), but you're going to create quite a bit of convoluted code that way.
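For completeness, here is a minimal sketch of what chaining without a named intermediate might look like. The sample vectors x and y are made up for illustration, and the Filter line is only a guess at the kind of solution referred to above, not a copy of it.

```r
# Hypothetical sample data, not from the question
x <- c("a", "b", "c", "d")
y <- c("d", "b", "e")
x.and.y <- intersect(x, y)

# Named-intermediate version, as in the question
tmp <- match(x, x.and.y)
idx.named <- tmp[!is.na(tmp)]

# Anonymous function, defined and called in one expression;
# v names the intermediate vector only inside the function
idx.anon <- (function(v) v[!is.na(v)])(match(x, x.and.y))

# Filter keeps the elements for which the predicate is TRUE
idx.filter <- Filter(Negate(is.na), match(x, x.and.y))
```

All three give the same indices; whether the one-liners are actually easier to read than the two-line version is debatable.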

Related

What's the meaning of the (x) in the following passage?

While studying R in my free time, I read this article on the usage of return(). In it, I came across a function containing one element whose meaning escapes me; please see below:
bench_nor2 <- function(x,repeats) { system.time(rep(
# without explicit return
(function(x)vector(length=x,mode="numeric"))(x),repeats)) }
I've played with the code from the article, but the logic behind this tiny (x) (specifically, its second occurrence) in the third line is unclear to me.
It's an anonymous function. If we unwrap the code
bench_nor2 <- function(x,repeats) { system.time(rep(
# without explicit return
(function(x)
vector(length=x,mode="numeric")
)(x),
repeats)) }
we can see that within the rep( ... ) call, the first argument is
(function(x)vector(length=x,mode="numeric"))(x)
Now, this is a curious way of putting it. function(x) vector(...) defines a one-line function (which calls vector to create a numeric vector of length x). Wrapping it in parentheses, (function(x) ...), yields the function object itself, and appending (x), as in (function(x) ...)(x), calls that anonymous function with the argument x.
You would get the same result from:
my_vector <- function(y) vector(length=y, mode="numeric")
bench_nor2 <- function(x, repeats) {system.time(rep(my_vector(x), repeats))}
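To see the define-and-call pattern in isolation, here is a small sketch outside of any benchmark. The names n, make_vec, anon_result and named_result are just for illustration.

```r
# Define an anonymous function and call it immediately with the argument 3
anon_result <- (function(n) vector(length = n, mode = "numeric"))(3)

# The same thing with a named function
make_vec <- function(n) vector(length = n, mode = "numeric")
named_result <- make_vec(3)

# Both are numeric vectors of three zeros
print(anon_result)
print(identical(anon_result, named_result))
```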

How can I create a function using variables in a dataframe

I'm sure the question is a bit dumb (sorry)... I'm trying to create a function that uses several variables I have stored in a data frame. The function looks like this:
mlr_turb <- function(Cond_in, Flow_in, pH_in, pH_out, Turb_in, nm250_i, nm400_i, nm250_o, nm400_o){
Coag = (+0.032690 + 0.090289*Cond_in + 0.003229*Flow_in - 0.021980*pH_in - 0.037486*pH_out
+0.016031*Turb_in -0.026006*nm250_i +0.093138*nm400_i - 0.397858*nm250_o - 0.109392*nm400_o)/0.167304
return(Coag)
}
m4_turb <- mlr_turb(dataset)
The problem appears when I try to run my function on a data frame (whose columns have the same names as the arguments). It doesn't detect my variables and shows this message:
Error in mlr_turb(dataset) :
argument "Flow_in" is missing, with no default
But the column is actually there, as are all the other variables.
I think I have misplaced or left out something in the function that would let it take the variables from the dataset. I have searched a lot but have not found an answer...
No dumb questions!
I think you're looking for do.call. This function allows you to unpack values into a function as arguments. Here's a really simple example.
# a simple function that takes x, y and z as arguments
myFun <- function(x, y, z){
result <- (x + y)/z
return(result)
}
# a simple data frame with columns x, y and z
myData <- data.frame(x=1:5,
y=(1:5)*pi,
z=(11:15))
# unpack the values into the function using do.call
do.call('myFun', myData)
Output:
[1] 0.3765084 0.6902654 0.9557522 1.1833122 1.3805309
You have run into a standard problem in R, related to the question of standard evaluation (SE) vs non-standard evaluation (NSE). If you need more detail, you can have a look at this blog post I wrote.
I think the most convenient way to write a function that uses variables is to pass the variable names as arguments of the function.
Let's take @Muon's example again.
# a simple function that takes x, y and z as arguments
myFun <- function(x, y, z){
result <- (x + y)/z
return(result)
}
The question is where R should find the values behind the names x, y and z. Inside a function, R first looks in the function's own environment (here x, y and z are defined as parameters), then in the environment where the function was defined (the global environment for a function defined at top level), and finally in the attached packages.
myFun expects vectors. If you pass a column name instead, you will get an error. What if you want to pass a column name? You must tell R that the name should be looked up in the scope of a data frame. For instance, you can do something like this:
myFun <- function(df, col1 = "x", col2 = "y", col3 = "z"){
result <- (df[,col1] + df[,col2])/df[,col3]
return(result)
}
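As a rough usage sketch of the column-name approach: the function name col_fun and the renamed data frame below are made up for illustration, and [[ ]] is used in place of [, ] only because it always returns a plain vector, whatever kind of data frame is passed in.

```r
# Same sample data as in the earlier example
myData <- data.frame(x = 1:5, y = (1:5) * pi, z = 11:15)

# Column names are passed as strings and looked up inside the data frame
col_fun <- function(df, col1 = "x", col2 = "y", col3 = "z") {
  (df[[col1]] + df[[col2]]) / df[[col3]]
}

# With the default names
res_default <- col_fun(myData)

# With different column names (hypothetical: a, b, c)
renamed <- setNames(myData, c("a", "b", "c"))
res_named <- col_fun(renamed, col1 = "a", col2 = "b", col3 = "c")
```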
You can go much further in this direction with the data.table package. If you are starting to write functions that need to use variables from a data frame, I recommend having a look at it.
I like Muon's answer, but I couldn't get it to work when the data.frame has columns that are not arguments of the function. Using the with() function is a simple way to make this work as well...
#Code from Muon:
# a simple function that takes x, y and z as arguments
myFun <- function(x, y, z){
result <- (x + y)/z
return(result)
}
# a simple data frame with columns x, y and z
myData <- data.frame(x=1:5,
y=(1:5)*pi,
z=(11:15),
a=6:10) #adding a var not used in myFun
# unpack the values into the function using do.call
do.call('myFun', myData)
#generates an error for the unused "a" column
#using with() function:
with(myData, myFun(x, y, z))
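Another option, if you would rather keep do.call, might be to select only the columns the function actually takes. This is a sketch that assumes every argument name of myFun exists as a column in the data frame; the variable name wanted is just for illustration.

```r
myFun <- function(x, y, z) {
  (x + y) / z
}

myData <- data.frame(x = 1:5, y = (1:5) * pi, z = 11:15, a = 6:10)

# Argument names of myFun: "x" "y" "z"
wanted <- names(formals(myFun))

# Pass only the matching columns, so the extra column "a" is never seen
res_docall <- do.call(myFun, myData[wanted])

# Same result as the with() approach
res_with <- with(myData, myFun(x, y, z))
```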

r function in function arguments + apply

I'm having trouble using several functions within the same one and calling the arguments generated. I'm using a more complicated function that can be simplified as follows:
func.essai <- function(x) {
g <- sample(seq(1,30), x)
i <- sample(x,1)
func.essai.2 <- function(y,i) {
z <- y+i
}
h <- sapply(g,func.essai.2(y,i))
}
sq <- seq(1,4)
lapply(sq, func.essai)
I'm using arguments that are generated at the beginning of func.essai (and that depend on x) as a fixed input for func.essai.2, here for i, and as a vector to go through on the sapply function, here for g. This code doesn't work as such -- it doesn't recognize y and/or i. How can I rewrite the code to do so?
I think the error you get comes from how you call sapply: func.essai.2(y,i) is evaluated immediately (and y does not exist at that point) instead of the function itself being passed. This should work in place of your line containing sapply:
h <- sapply(g,func.essai.2, i)
See ?sapply, which tells you that additional arguments should be supplied after the function you are applying.
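Put together, the whole thing might look like the sketch below. The explicit h on the last line and the set.seed call are my additions, only there to make the return value visible and the run reproducible.

```r
func.essai <- function(x) {
  g <- sample(seq(1, 30), x)  # x random values between 1 and 30
  i <- sample(x, 1)           # one fixed offset, drawn once per call
  func.essai.2 <- function(y, i) {
    y + i
  }
  # Pass the function itself; i is forwarded as an extra argument
  h <- sapply(g, func.essai.2, i)
  h
}

set.seed(1)
sq <- seq(1, 4)
res <- lapply(sq, func.essai)
```

Each element of res is then a vector whose length equals the corresponding value in sq.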

when the iterable is NOT the first argument of the function

The problem is quite simple yet I can't find the answer.
I have myfun <- function(x, y). How can I sapply this function over a list of y?
To apply over x I would do this
iterables <- 1:10
sapply(iterables, myfun, y)
But I want the iterables to be y instead.
You have several options - e.g. one mentioned by sgibb which relies on how R interprets function arguments, i.e. that myfun(y, x = x) is the same as myfun(x, y).
I prefer creating anonymous functions since it's easier to understand what's happening:
sapply(iterables, function(iter) myfun(x, iter))
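A quick sketch showing that both options give the same result. myfun here is a stand-in (the question doesn't show its body), chosen so that the argument order matters.

```r
# Hypothetical function where swapping x and y changes the result
myfun <- function(x, y) x - y

x <- 100
iterables <- 1:3

# Option 1: name the fixed argument, so each element fills the remaining slot (y)
by_name <- sapply(iterables, myfun, x = x)

# Option 2: anonymous function that places each element explicitly
by_anon <- sapply(iterables, function(iter) myfun(x, iter))

print(by_name)
print(identical(by_name, by_anon))
```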

R curve() on expression involving vector

I'd like to plot a function of x, where x is combined with a vector. It's easiest to give a trivial example:
var <- c(1,2,3)
curve(mean(var)+x)
curve(mean(var+x))
While the first one works, the second one gives errors:
'expr' did not evaluate to an object of length 'n' and
In var + x : longer object length is not a multiple of shorter object length
Basically I want to find the minimum of such a function: e.g.
optimize(function(x) mean(var+x), interval=c(0,1))
And then be able to visualise the result. While the optimize function works, I can't figure out how to get curve() to work as well. Thanks!
The function needs to be vectorized. That means that if it is given a vector, it has to return a vector of the same length. If you pass any vector to mean, the result is always a vector of length 1. Thus, mean is not vectorized. You can use Vectorize:
f <- Vectorize(function(x) mean(var+x))
curve(f,from=0, to=10)
This can be done in the general case using sapply:
curve(sapply(x, function(e) mean(var + e)))
In the specific example you give, mean(var) + x is of course arithmetically equivalent to what you're looking for. Similar shortcuts might exist for whatever more complicated function you're working with.
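As a small check of what vectorizing buys you, the sketch below evaluates the wrapped function at a few points without plotting anything. The names xs, vals and opt are just for illustration, and the curve call is left as a comment so the snippet runs without a graphics device.

```r
var <- c(1, 2, 3)

# Vectorize wraps the function so it is applied to each element of x in turn
f <- Vectorize(function(x) mean(var + x))

xs <- c(0, 0.5, 1)
vals <- f(xs)  # one value per element of xs, which is what curve() needs

# The minimum on [0, 1] sits at the left edge, since the function is increasing
opt <- optimize(function(x) mean(var + x), interval = c(0, 1))

# In an interactive session:
# curve(f, from = 0, to = 1)
```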
