Tilde Operator in map function - r

I have a question regarding the map function in R and the tilde operator (~).
Why does this code only work that way:
iris_unique <- map(iris, ~length(unique(.)))
and not, for example, like this:
iris_unique <- map(iris, length(unique(iris$Sepal.Length)))
Thanks in advance

Assuming that you are talking about map from the package purrr, this function is designed to map a function over a vector.
length(unique(iris$Sepal.Length)) is a specific value (35 for the standard iris dataset), so
iris_unique <- map(iris, length(unique(iris$Sepal.Length)))
is equivalent to
iris_unique <- map(iris, 35)
Since 35 is not a function, this is probably not what you mean. However, map() tries to make sense of it. The documentation says that if you pass a "character vector, numeric vector, or list" as the function argument, "it is converted to an extractor function". This means that 35 is converted to the equivalent of function(x){x[[35]]}, so the net result is a list holding the 35th element of each column, i.e. the 35th observation of iris.
On the other hand, the documentation also describes how it translates formulas into functions. According to that, the formula ~length(unique(.)) is translated to the function function(x){length(unique(x))}. Since this is a function, it makes perfect sense to map it over a list or vector.
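A minimal sketch of the two conversions described above, assuming the purrr package is installed. The variable names (n_unique_formula, n_unique_function, obs_35) are illustrative only and do not come from the question.

```r
library(purrr)

# Formula shorthand: purrr turns ~length(unique(.)) into a one-argument function.
n_unique_formula <- map(iris, ~ length(unique(.)))

# The same computation written as an explicit anonymous function.
n_unique_function <- map(iris, function(x) length(unique(x)))

identical(n_unique_formula, n_unique_function)
#> [1] TRUE

# A bare number is treated as an extractor: position 35 is pulled out of each column.
obs_35 <- map(iris, 35)
obs_35$Sepal.Length == iris$Sepal.Length[35]
#> [1] TRUE
```

Both map calls return a list with one element per column of iris, which is why the formula version and the explicit function are interchangeable here.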

Related

Functions that takes and returns vector in R

I want to create a function to help me convert a vector containing different values of Ghanaian Cedi to Hungarian Forint (1 cedi = 57.06 forint). My function name is Currency, such that if I give the function a vector [1,2,3,4], where 1,2,3,4 represent cedi, the function will return Currency(1), Currency(2), Currency(3), Currency(4), which are forints.
I was thinking of using a loop to create my function. Before that, I would like to know if there's any easier way to separate the vector?
Vectors are a native data type in R. A numeric value is actually a numeric vector with one element as far as R is concerned.
In other words, R works directly with vectors, so loops or apply functions are rarely needed for trivial operations. Multiplying by a constant is a standard vector operation.
The function below handles a vector just fine:
convert_currency <- function(cedi) {
  cedi * 57.06
}
convert_currency(1:4)
#> [1] 57.06 114.12 171.18 228.24
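As a quick check of the point about vectorization, the sketch below compares the vectorized function with an explicit element-by-element version. It reuses convert_currency from the answer; converted_loop is a name introduced here for illustration.

```r
convert_currency <- function(cedi) {
  cedi * 57.06
}

# A single number is already a vector of length one.
length(5)
#> [1] 1

# Element-by-element conversion with sapply, shown for comparison only.
converted_loop <- sapply(1:4, function(one_cedi) convert_currency(one_cedi))

# The vectorized call gives the same result without any loop.
all.equal(converted_loop, convert_currency(1:4))
#> [1] TRUE
```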

Calculating multiple ROC curves in R using a for loop and pROC package. What variable to use in the predictor field?

I am using the pROC package and I want to calculate multiple ROC curve plots using a for loop.
My variables are specific column names that are included as string in a vector and I want pROC to read sequentially that vector and use the strings in the field "predictor" that seems to accept text/characters.
However, I cannot parse correctly the variable, as I am getting the error:
'predictor' argument should be the name of the column, optionally quoted.
Here is some example code with the aSAH dataset:
ROCvector<- c("s100b","ndka")
for (i in seq_along(ROCvector)){
  a<-ROCvector[i]
  pROC_obj <- roc(data=aSAH, outcome, as.character(a))
  #code for output/print#
}
I have tried to call just "a" and using the functions print() or get() without any results.
Writing manually the variable (with or without quoting) works, of course.
Is there something I am missing about the type of variable I should use in the predictor field?
By passing data=aSAH as the first argument, you are triggering the non-standard evaluation (NSE) of arguments, dplyr-style. Therefore you cannot simply pass the column name in a variable. Note the inconsistency with outcome, which you pass unquoted and which looks like a variable (but isn't). Fortunately, functions with NSE in dplyr come with an equivalent function with standard evaluation, whose name ends with _. The pROC package follows this convention. You should usually use those if you are programming with column names.
Long story short, you should use the roc_ function instead, which accepts characters as column names (don't forget to quote "outcome"):
pROC_obj <- roc_(data=aSAH, "outcome", as.character(a))
A slightly more idiomatic version of your code would be:
for (predictor in ROCvector) {
  pROC_obj <- roc_(data=aSAH, "outcome", predictor)
}
roc also accepts a formula, so we can use paste0 and as.formula to create one, i.e.
library(pROC)
ROCvector<- c("s100b","ndka")
for (i in seq_along(ROCvector)){
  a<-ROCvector[i]
  pROC_obj <- roc(as.formula(paste0("outcome~",a)), data=aSAH)
  print(pROC_obj)
  #code for output/print#
}
To get the original call, i.e. without paste0, which you can use later for downstream calculations, use eval and bquote:
pROC_obj <- eval(bquote(roc(.(as.formula(paste0("outcome~",a))), data=aSAH)))
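The loops above overwrite pROC_obj on every iteration. The following is a sketch only of one way to keep every ROC object, assuming the pROC package and its bundled aSAH data. roc_list and auc_values are names introduced here, and reformulate() from base R is swapped in for paste0/as.formula to build the formula.

```r
library(pROC)
data(aSAH)

ROCvector <- c("s100b", "ndka")

# Build outcome ~ <predictor> for each column name and store the result by name.
roc_list <- list()
for (predictor in ROCvector) {
  roc_list[[predictor]] <- roc(reformulate(predictor, response = "outcome"), data = aSAH)
}

# Each element is a regular roc object, so the usual pROC functions apply.
auc_values <- vapply(roc_list, function(r) as.numeric(auc(r)), numeric(1))
auc_values
```

Keeping the objects in a named list means later steps (plotting, comparing curves) can look them up by column name instead of re-running roc.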

Evaluating a function that is an argument in another function using quo() in R

I have made a function that takes another function as an argument. The argument function in turn takes as its argument some object (in the example, a vector) which is supplied by the original function. It has been challenging to make the function call in the right way. Below are three approaches I have used after having read Programming with dplyr.
Only option three works.
I would like to know if this is in fact the best way to evaluate a function within a function.
library(dplyr);library(rlang)
#Function that will be passed as an argument
EvaluateThis1 <- quo(mean(vector))
EvaluateThis2 <- ~mean(vector)
EvaluateThis3 <- quo(mean)
#First function that will receive a function as an argument
MyFunc <- function(vector, TheFunction){
  print(TheFunction)
  eval_tidy(TheFunction)
}
#Second function that will receive a function as an argument
MyFunc2 <- function(vector, TheFunction){
  print(TheFunction)
  quo(UQ(TheFunction)(vector)) %>%
    eval_tidy
}
#Option 1
#This is evaluating vector in the global environment where
#EvaluateThis1 was captured
MyFunc(1:4, EvaluateThis1)
#Option 2
#I don't know what is going on here
MyFunc(1:4, EvaluateThis2)
MyFunc2(1:4, EvaluateThis2)
#Option 3
#I think this Unquotes the function splices in the argument then
#requotes before evaluating.
MyFunc2(1:4, EvaluateThis3)
My questions are:
1. Is option 3 the best/most simple way to perform this evaluation?
2. An explanation of what is happening.
Edit
After reading @Rui Barradas' very clear and concise answer, I realised that I am actually trying to do something similar to the code below, which I didn't manage to make work using Rui's method but solved by setting the environment:
OtherStuff <-c(10, NA)
EvaluateThis4 <-quo(mean(c(vector,OtherStuff), na.rm = TRUE))
MyFunc3 <- function(vector, TheFunction){
  #uses the capture environment which doesn't contain the object vector
  print(get_env(TheFunction))
  #Reset the environment of TheFunction to the current environment where vector exists
  TheFunction<- set_env(TheFunction, get_env())
  print(get_env(TheFunction))
  print(TheFunction)
  TheFunction %>%
    eval_tidy
}
MyFunc3(1:4, EvaluateThis4)
The function is evaluated within the current environment not the capture environment. Because there is no object "OtherStuff" within that environment, the parent environments are searched finding "OtherStuff" in the Global environment.
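An alternative to resetting the environment, offered here as a sketch rather than as the approach used above: rlang's eval_tidy() takes a data argument, so the function can supply vector as a data mask while every other name is still looked up in the quosure's capture environment. MyFuncMask is a hypothetical name introduced for this example.

```r
library(rlang)

OtherStuff <- c(10, NA)
EvaluateThis4 <- quo(mean(c(vector, OtherStuff), na.rm = TRUE))

MyFuncMask <- function(vector, TheFunction) {
  # `vector` comes from the data mask; `OtherStuff` is found where the quosure was created.
  eval_tidy(TheFunction, data = list(vector = vector))
}

MyFuncMask(1:4, EvaluateThis4)
#> [1] 4
```

With this variant the quosure keeps its original environment, so nothing about where it was captured has to be changed inside the function.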
I will try to answer question 1.
I believe that the best and simplest way to perform this kind of evaluation is to do without any sort of fancy evaluation techniques. Calling the function directly usually works. Using your example, try the following.
EvaluateThis4 <- mean # simple
MyFunc4 <- function(vector, TheFunction){
  print(TheFunction)
  TheFunction(vector) # just call it with the appropriate argument(s)
}
MyFunc4(1:4, EvaluateThis4)
function (x, ...)
UseMethod("mean")
<bytecode: 0x000000000489efb0>
<environment: namespace:base>
[1] 2.5
There are examples of this in base R. For instance approxfun and ecdf both return functions that you can use directly in your code to perform subsequent calculations. That's why I've defined EvaluateThis4 like that.
As for functions that use functions as arguments, there are the optimization ones, and, of course, *apply, by and ave.
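The direct-call approach can also cover the edited example from the question, since an anonymous function carries OtherStuff along as a closure. A small sketch in base R, where MyFunc5 and Fn are hypothetical names and the ecdf call illustrates the pattern of functions that return functions:

```r
OtherStuff <- c(10, NA)

MyFunc5 <- function(vector, TheFunction) {
  TheFunction(vector)  # call the supplied function on the supplied data
}

# The anonymous function finds OtherStuff in the environment where it was defined.
MyFunc5(1:4, function(v) mean(c(v, OtherStuff), na.rm = TRUE))
#> [1] 4

# ecdf() returns a function that can be called like any other.
Fn <- ecdf(c(1, 2, 3, 4))
Fn(2)
#> [1] 0.5
```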
As for question 2, I must admit to my complete ignorance.

R: How to use named vector to initialize parameter values in mle2

I am a beginner in R and need to use it to compute a maximum likelihood estimate (MLE). I am using the mle2 function. However, since I have many independent variables, I need to pass many parameter values when calling mle2. For example:
library(bbmle)
x <- mle2(probit, start=list(b0=1,b1=1,b2=1,c0=1,c1=1,d0=1,d1=1),method="BFGS")
Instead, I would like to create a vector theta of length 7 and pass that when I call mle2. Something like
x <- mle2(probit, start=theta,method="BFGS")
This exact call does not work. Do I need to change how I define the function probit accordingly to use a vector as argument?
I went through the existing threads related to passing vectors as arguments and they suggest using do.call, but can that be used to call mle2? If so, how?
many thanks!
If I understand correctly, your goal is to reduce boilerplate by finding a way to pass a single vector of parameter values. You could achieve this by defining a utility function which converts a single vector of un-named entries into a named list suitable for your model and for mle2:
## vector of un-named entries
theta <- c(1,1,1,1,1,1,1)
## utility function
toList <- function(th) structure(as.list(th), names=c("b0","b1","b2","c0","c1","d0","d1"))
## check
identical(toList(theta), list(b0=1,b1=1,b2=1,c0=1,c1=1,d0=1,d1=1))
## now this should work
x <- mle2(probit, start=toList(theta), method="BFGS")
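The question also asks about do.call. The sketch below shows, in base R only, how a named list built from a plain vector maps onto a function's named arguments. toy_nll is a made-up stand-in for the asker's probit function (which is not shown in the question), and param_names and start_list are likewise illustrative names.

```r
param_names <- c("b0", "b1", "b2", "c0", "c1", "d0", "d1")
theta <- rep(1, 7)

# Attach the names, then convert to a list: one named element per parameter.
start_list <- as.list(setNames(theta, param_names))

# Hypothetical objective with one scalar argument per parameter.
toy_nll <- function(b0, b1, b2, c0, c1, d0, d1) {
  sum((c(b0, b1, b2, c0, c1, d0, d1) - 1:7)^2)
}

# do.call spreads the list over the function's named arguments.
do.call(toy_nll, start_list)
#> [1] 91
```

The resulting start_list has the same shape as the list produced by toList(theta) above, so it is the kind of object the start argument in the answer's mle2 call receives.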

R curve() on expression involving vector

I'd like to plot a function of x, where x is applied to a vector. Anyway, easiest to give a trivial example:
var <- c(1,2,3)
curve(mean(var)+x)
curve(mean(var+x))
While the first one works, the second one gives errors:
'expr' did not evaluate to an object of length 'n' and
In var + x : longer object length is not a multiple of shorter object length
Basically I want to find the minimum of such a function: e.g.
optimize(function(x) mean(var+x), interval=c(0,1))
And then be able to visualise the result. While the optimize function works, I can't figure out how to get curve() to work as well. Thanks!
The function needs to be vectorized. That means that if it is given a vector, it has to return a vector of the same length. If you pass any vector to mean, the result is always a vector of length 1. Thus, mean is not vectorized. You can use Vectorize:
f <- Vectorize(function(x) mean(var+x))
curve(f,from=0, to=10)
This can be done in the general case using sapply:
curve(sapply(x, function(e) mean(var + e)))
In the specific example you give, mean(var) + x is of course arithmetically equivalent to what you're looking for. Similar shortcuts might exist for whatever more complicated function you're working with.
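To tie the pieces together, here is a short sketch in base R that checks the vectorized wrapper against the arithmetic shortcut and then marks the optimize result on the curve. The names f, f_vec, xs and opt are introduced here for illustration.

```r
var <- c(1, 2, 3)
f <- function(x) mean(var + x)

# Vectorize wraps f so that it returns one value per element of x.
f_vec <- Vectorize(f)
xs <- seq(0, 1, length.out = 5)
all.equal(f_vec(xs), mean(var) + xs)
#> [1] TRUE

# Find the minimum on [0, 1], then mark it on the plotted curve.
opt <- optimize(f, interval = c(0, 1))
curve(f_vec, from = 0, to = 1)
points(opt$minimum, opt$objective, pch = 19)
```

optimize only ever calls f with a single number, which is why it works with the unvectorized function while curve needs the wrapped version.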

Resources