What does the small x mean in lapply? - r

I have the variables:
trims<- c(0,0.1,0.2,0.5)
x<-rcauchy(100)
and the following operation:
lapply(trims, mean, x=x)
What does the small x refer to in this case? The documentation for lapply does not explain it well either. I know that lapply takes a function and applies it to each element of a list, which I believe is trims in this case. How does x come in, then?

If we use an anonymous function, it becomes clear.
res <- lapply(trims, function(y) mean(x, trim=y))
res1 <- lapply(trims, mean, x=x)
identical(res, res1)
#[1] TRUE
lapply loops through each element of 'trims'. mean has x as its first argument and trim as its second. Because the first argument is already supplied by name with x=x (i.e. the object created with rcauchy), each element of 'trims' is matched to the next free argument, which is trim.
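To see the argument matching in isolation, here is a minimal sketch. show_args is a made-up helper (not part of the question) with the same first two argument names as mean.default; it only reports what each argument received.

```r
# Hypothetical helper: reports which value landed in which argument.
show_args <- function(x, trim = 0) {
  paste0("x has length ", length(x), ", trim = ", trim)
}

set.seed(1)
x <- rcauchy(100)
trims <- c(0, 0.1, 0.2, 0.5)

# x is matched by name, so each element of trims falls into the next
# unmatched formal argument, which is trim.
out <- lapply(trims, show_args, x = x)
out[[2]]
```

The second element should report a length of 100 for x and a trim of 0.1, which is the same matching that happens in lapply(trims, mean, x=x).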

Related

R why does do.call not match a direct calculation?

In the process of doing something more complicated, I've found the following:
If I use do.call converting a numeric vector to a list, I am getting a different value from the applied function and I'm not sure why.
x <- rnorm(30)
median(x) # -0.01192347
# does not match:
do.call("median",as.list(x)) # -1.912244
Why?
Note: I'm trying to run various functions using a vector of function names. This works with do.call, but only if I get the correct output from do.call.
Thanks for any suggestions.
So do.call expects the args argument to be a list of arguments, so technically we'd want to pass list(x = x):
> set.seed(123)
> x <- rnorm(10)
> median(x)
[1] -0.07983455
> do.call(median,list(x = x))
[1] -0.07983455
> do.call(median,as.list(x))
[1] -0.5604756
Calling as.list on the vector x turns it into a list of length 10, as though you were going to call median and pass it 10 separate arguments. But really, we want to pass just one, x. So in the end only the first element of the vector is matched to the argument x, and the median of a single number is that number.
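For the use case mentioned in the question (running several functions from a vector of function names), a sketch along these lines may help. The choice of "median", "mean" and "sd" is only an assumption for illustration.

```r
set.seed(123)
x <- rnorm(10)

# Function names as strings, as in the question's use case (assumed examples).
fun_names <- c("median", "mean", "sd")

# Wrap the whole vector in a list so it arrives as ONE argument.
results <- vapply(fun_names, function(f) do.call(f, list(x)), numeric(1))
results
```

Each entry of results should agree with calling the corresponding function directly on x.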

Confused by a vapply function using grepl internally (Part of datacamp course)

hits <- vapply(titles,
FUN = grepl,
FUN.VALUE = logical(length(pass_names)),
pass_names)
titles is a vector with titles such as "mr", pass_names is a list of names.
2 questions.
I don't understand the resulting matrix hits
I don't understand why the last line is pass_names, nor how I am supposed to know about these 4 arguments. ?vapply specifies X, FUN and FUN.VALUE, but I cannot figure out how I am supposed to know that pass_names needs to be listed there.
I have looked online and could not find an answer, so I hope this will help others too. Thank you in advance for your answers, yes I am a beginner.
Extra info: This question uses the titanic package in R, pass_names is just titanic$Name, titles is just paste(",", c("Mr\\.", "Master", "Don", "Rev", "Dr\\.", "Major", "Sir", "Col", "Capt", "Jonkheer"))
You're right to be a bit confused.
The vapply code chunk in your question is equivalent to:
hits <- vapply(titles,
FUN = function(x) grepl(x, pass_names),
FUN.VALUE = logical(length(pass_names)))
vapply takes a ... argument, which collects any additional arguments that are provided. If those arguments are not named (see @Roland's comment), the n-th argument in the ... position is passed as the (n+1)-th argument of FUN (the first argument to FUN is an element of X, i.e. of titles in this case).
The resulting matrix has the same number of rows as the number of rows in titanic and has 10 columns, the length of titles. The [i, j]-th entry is TRUE if the i-th pass_names matches the j-th regular expression in titles, FALSE if it doesn't.
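A small sketch with made-up data shows the shape of the result. The three names below are stand-ins chosen for illustration, not taken from the titanic package.

```r
# Made-up stand-ins for the Titanic names and a shortened titles vector.
pass_names <- c("Braund, Mr. Owen", "Heikkinen, Miss. Laina", "Palsson, Master. Gosta")
titles <- paste(",", c("Mr\\.", "Master"))

# pass_names travels through ... and becomes the second argument of grepl.
hits <- vapply(titles,
               FUN = grepl,
               FUN.VALUE = logical(length(pass_names)),
               pass_names)

dim(hits)  # one row per name, one column per title
hits
```

Each column answers "which names match this title?", so the first column should be TRUE only for the "Mr." row and the second only for the "Master" row.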
Essentially you are passing two vectors in your vapply which is equivalent to two nested for loops. Each pairing is then passed into the required arguments of grepl: grepl(pattern, x).
Specifically, on first loop of vapply the first item in titles is compared with every item of pass_names. Then on second loop, the second item in titles is compared again to all items of pass_names and so on until first vector, titles, is exhausted.
To illustrate, you can equivalently build a hits2 matrix using nested for loops, rendering exactly as your vapply output, hits:
hits2 <- matrix(NA, nrow=length(pass_names), ncol=length(titles))
colnames(hits2) <- titles
# outer loop over names (rows), inner loop over titles (columns)
for (i in seq_along(pass_names)) {
  for (j in seq_along(titles)) {
    hits2[i, j] <- grepl(pattern=titles[j], x=pass_names[i])
  }
}
all.equal(hits, hits2)
# [1] TRUE
Alternatively, you can run exactly the same call with sapply, without the required FUN.VALUE argument, since both sapply and vapply iterate over X the same way lapply does. However, vapply is generally preferred because you assert the type and length of each result up front, whereas sapply's return shape depends on what the function returns. For instance, in vapply you could produce an integer matrix with: FUN.VALUE = integer(length(pass_names)).
hits3 <- sapply(titles, FUN = grepl, pass_names)
all.equal(hits, hits3)
# [1] TRUE
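The difference between the two can be sketched with a toy example. The words vector and the error message string are assumptions made up for this illustration.

```r
words <- c("a", "bb", "ccc")

# sapply chooses the output shape on its own: a vector here, but it would
# silently become a list if the results had different lengths.
lens_s <- sapply(words, nchar)

# vapply checks every result against FUN.VALUE ...
lens_v <- vapply(words, nchar, FUN.VALUE = integer(1))

# ... and stops with an error when the declared type does not fit.
bad <- tryCatch(
  vapply(words, nchar, FUN.VALUE = character(1)),
  error = function(e) "vapply refused the mismatched type"
)
bad
```

Here nchar returns integers, so declaring character(1) makes vapply fail early instead of returning something unexpected.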
All in all, the apply family offers more concise ways to run iterations, returning a data structure directly instead of requiring you to initialize and fill a vector or matrix with for or while loops.
For further reading, consider this interesting SO post: Is the “*apply” family really not vectorized?

getting lost in Using which() and regex in R

OK, I have a little problem which I believe I can solve with which and grepl (alternatives are welcome), but I am getting lost:
my_query<- c('g1', 'g2', 'g3')
my_data<- c('string2','string4','string5','string6')
I would like to return the indices of the elements of my_query that match something in my_data. In the example above, only 'g2' is found in my_data, so the result would be 2.
It seems to me that there is no easy way to do this without a loop. For each element in my_query, we can use either of the below functions to get TRUE or FALSE:
f1 <- function (pattern, x) length(grep(pattern, x)) > 0L
f2 <- function (pattern, x) any(grepl(pattern, x))
For example,
f1(my_query[1], my_data)
# [1] FALSE
f2(my_query[1], my_data)
# [1] FALSE
Then we use an *apply loop to apply, say, f2 to all elements of my_query:
which(unlist(lapply(my_query, f2, x = my_data)))
# [1] 2
Thanks, that seems to work. To be honest, I preferred your original one-line version. I am not sure why you edited it to create another function that is then called with *apply. Is there any advantage compared to which(lengths(lapply(my_query, grep, my_data)) > 0L)?
Well, I am not entirely sure. When I read ?lengths:
One advantage of ‘lengths(x)’ is its use as a more efficient
version of ‘sapply(x, length)’ and similar ‘*apply’ calls to
‘length’.
I don't know how much more efficient lengths is compared with sapply. Anyway, if it is still a loop, then my original suggestion which(lengths(lapply(my_query, grep, my_data)) > 0L) performs 2 loops. My edit essentially combines the two loops into one, hopefully for a small speed-up (if not too tiny).
You can still arrange my new edit into a single line:
which(unlist(lapply(my_query, function (pattern, x) any(grepl(pattern, x)), x = my_data)))
or
which(unlist(lapply(my_query, function (pattern) any(grepl(pattern, my_data)))))
Expanding on a comment posted initially by @Gregor, you could try:
which(colSums(sapply(my_query, grepl, my_data)) > 0)
#g2
# 2
The function colSums is vectorized and represents no problem in terms of performance. The sapply() loop seems inevitable here, since we need to check each element within the query vector. The result of the loop is a logical matrix, with each column representing an element of my_query and each row an element of my_data. By wrapping this matrix into which(colSums(..) > 0) we obtain the index numbers of all columns that contain at least one TRUE, i.e., a match with an entry of my_data.
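To make the intermediate step visible, the logical matrix can be inspected before it is collapsed. This is only a sketch reusing the vectors from the question; m and idx are names introduced here.

```r
my_query <- c("g1", "g2", "g3")
my_data  <- c("string2", "string4", "string5", "string6")

# One column per query, one row per element of my_data.
m <- sapply(my_query, grepl, my_data)
dim(m)

# A column with at least one TRUE means that query matched somewhere.
idx <- which(colSums(m) > 0)
idx
```

Only the "g2" column contains a TRUE (from "string2"), so idx should be the named index 2.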

how to get variable names from list

I have a list of functions, which also contains one user-defined function:
> fun <- function(x) {x}
> funs <- c(median, mean, fun)
Is it possible to get the function names as strings from this list? My only workaround so far was to create a vector that contains the function names as strings:
> fun.names <- c("median", "mean", "fun")
When I want to get a variable name I usually use this trick (please correct me if it is wrong), but as you can see it only works for a single variable, not for a list:
> as.character(substitute(mean))
[1] "mean"
> as.character(substitute(funs))
[1] "funs"
Is there something that will also work for a list? Does it make any difference whether the list contains functions or data types?
EDIT:
I need to pass this list of functions (plus other data) to another function. The functions from the list will then be applied to a dataset. The function names are needed because, if several functions are passed in the list, I want to be able to determine which function was applied. So far I've been using this:
window.size <- c(1,2,3)
combinations <- expand.grid(window.size, c(median, mean))
combinations <- cbind(combinations, rep(c("median","mean"), each = length(window.size)))
Generally speaking, this is not possible. Consider this definition of funs:
funs <- c(median,mean,function(x) x);
In this case, there's no name associated with the user-defined function at all. There's no rule in R that says all functions must be bound to a name at any point in time.
If you want to start making some assumptions about whether and where all such lambdas are defined, then possibilities open up.
One idea is to search the closure environment of each function for an entry that matches (identically) to the function itself, and then use that name. This will incur a performance penalty due to the comparison work, but may be tolerable if you don't have to run it repetitively:
getFunNameFromClosure <- function(fun) names(which(do.call(c,eapply(environment(fun),identical,fun)))[1L]);
Demo:
fun <- function(x) x;
funs <- c(median,mean,fun);
sapply(funs,getFunNameFromClosure);
## [1] "median" "mean" "fun"
Caveats:
1: As explained earlier, this will not work on functions that were never bound to a name. Furthermore, it will not work on functions whose closure environment does not contain a binding to the function. This could happen if the function was bound to a name in a different environment than its closure (via a return value, superassignment, or assign() call) or if its closure environment was explicitly changed.
2: It is possible to bind a function to multiple names. Thus, the name you get as a result of the eapply() search may not be the one you expect. Here's a good demonstration of this:
getFunNameFromClosure(ls); ## gets wrong name
## [1] "objects"
identical(ls,objects); ## this is why
## [1] TRUE
Here is a hacky approach:
funs <- list(median, mean)
fun_names <- sapply(funs, function(x) {
  s <- as.character(deparse(eval(x)))[[2]]
  gsub('UseMethod\\(|[[:punct:]]', '', s)
})
names(funs) <- fun_names
funs
$median
function (x, na.rm = FALSE)
UseMethod("median")
<bytecode: 0x103252878>
<environment: namespace:stats>
$mean
function (x, ...)
UseMethod("mean")
<bytecode: 0x103ea11b8>
<environment: namespace:base>
combinations <- expand.grid(window.size, fun_names, c(median, mean))
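If the list can be built through a helper, another option is to record the names at the moment the list is created. The sketch below assumes a hypothetical helper named named_funs; it is not from either answer above, and it only works when each function is supplied as a plain name rather than an inline function(x) ... definition.

```r
fun <- function(x) x

# Hypothetical helper: builds a list of functions and names each element
# after the expression that was used to supply it.
named_funs <- function(...) {
  fs <- list(...)
  exprs <- as.list(substitute(list(...)))[-1]  # unevaluated arguments
  names(fs) <- vapply(exprs, deparse, character(1))
  fs
}

funs <- named_funs(median, mean, fun)
names(funs)
```

Because the names travel with the list, they stay available after the list has been passed to another function.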

How do I access function names in R?

I am writing a function that receives two parameters: a data frame, and a function, and, after processing the data frame, summarizes it using the function parameter (e.g. mean, sd,...). My question is, how can I get the name of the function received as a parameter?
How about:
f <- function(x) deparse(substitute(x))
f(mean)
# [1] "mean"
f(sd)
# [1] "sd"
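One caveat worth knowing: deparse(substitute(x)) reports the expression written at the immediate call site. In the sketch below, g is a hypothetical wrapper added only to show what happens when the function is forwarded through another layer.

```r
f <- function(x) deparse(substitute(x))

# Hypothetical wrapper that forwards its argument to f.
g <- function(fn) f(fn)

direct <- f(mean)  # the caller wrote `mean`
nested <- g(mean)  # f only sees the expression `fn`
c(direct, nested)
```

So the name has to be captured in the outermost function the user calls, and passed along as a string from there.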
do.call may be what you want here. You can get a function name as character value, and then pass that and a list of arguments to do.call for evaluation. For example:
X<-"mean"
do.call(X,args=list(c(1:5)) )
[1] 3
Perhaps I'm misunderstanding the question, but it seems like you could simply have the function name as a parameter, and evaluate the function like normal within your function. This approach works fine for me. The ellipsis is for added parameters to your function of interest.
myFunc=function(data,func,...){return(func(data,...))}
myFunc(runif(100), sd)
And if you'd want to apply it to every column or row of a data.frame, you could simply use an apply statement in myFunc.
Here's my try. Perhaps you want to return both the result and the function name:
y <- 1:10
myFunction <- function(x, param) {
  return(paste(param(x), substitute(param)))
}
myFunction(y, mean)
# [1] "5.5 mean"
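Putting the pieces together for the data frame case in the question, a sketch might look like the following. summarise_with and the example data frame are assumptions made up for illustration, and the function is assumed to return a single number per column.

```r
# Hypothetical function: applies func to every column of df and keeps the
# name under which func was passed.
summarise_with <- function(df, func) {
  list(
    name   = deparse(substitute(func)),
    values = vapply(df, func, numeric(1))
  )
}

df <- data.frame(a = 1:4, b = c(2, 4, 6, 8))
res <- summarise_with(df, mean)
res$name
res$values
```

Returning a list keeps the numeric results numeric, whereas paste() would turn everything into a single string.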

Resources