Bound variable and sapply - r

I am used to using apply-family functions to avoid for loops in R. In this context, I was wondering if there is a way to avoid typing a bound variable. For example, say I want to perform an operation, do.call(myfun, args), 100 times. With a for loop I'd write:
res = seq(100)
for(i in seq(100)){res[i] = do.call(myfun, args)}
With sapply I type:
res = sapply(seq(100), function(i) do.call(myfun, args))
I understand that sapply applies the function to one argument at a time, namely an element of seq(100). But is there a way to avoid this, given that this variable (here i) has no meaning and serves no purpose?
Thanks for the insight.
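One base R option not mentioned in the question is replicate(), which re-evaluates an expression a given number of times without exposing any loop variable. Another is an anonymous function whose only formal argument is ..., so each element of seq(100) is accepted but never named. This is a minimal sketch; myfun and args below are hypothetical stand-ins for the question's objects.

```r
# Hypothetical stand-ins for the question's myfun and args
myfun <- function(n, mu) mean(rnorm(n, mean = mu))
args <- list(n = 10, mu = 5)

set.seed(1)
# replicate() evaluates the expression 100 times; no bound variable is needed
res_rep <- replicate(100, do.call(myfun, args))

# Alternative: the anonymous function takes only `...`, so the element
# of seq(100) is passed in but never given a name
res_dots <- sapply(seq(100), function(...) do.call(myfun, args))
```

replicate() simplifies its result like sapply() does, so both calls return a numeric vector of length 100 here.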

Related

What does declaring function at the start of a line do?

I encountered this code:
res <- lapply(strsplit(s, "\n")[[1]],
(function (str) paste(rev(strsplit(str, "")[[1]]), collapse = "")))
The second line reverses each of the strings produced by the split on the first line.
How does it do that? Namely, what does calling 'function' at the start do?
lapply applies a function to each element of a list. It takes the form lapply(list_data, some_function). So, for instance, if I have a list of integer vectors and want to find out how many integers are in each list element, I would run:
list_data <- list(list1 = 1:5,
list2 = 6:10,
list3 = 11:30)
lapply(list_data, length)
The function here is length, which is built into R. If I want to apply my own formula to each value in the list, I can define my own function. Writing function(...) directly inside the call defines a new, unnamed (anonymous) function that does not already exist in R or an R package. Like so:
lapply(list_data, function(x) x^2+4-x^3)
Here the anonymous function computes x^2+4-x^3, which is not a function built into R.
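To make the idea concrete, the sketch below shows that an anonymous function behaves exactly like a named function passed by name. The name poly_fun is hypothetical and used only for illustration.

```r
list_data <- list(list1 = 1:5, list2 = 6:10, list3 = 11:30)

# Anonymous function: defined inline and never given a name
res_anon <- lapply(list_data, function(x) x^2 + 4 - x^3)

# Equivalent: define a named function first, then pass it by name
poly_fun <- function(x) x^2 + 4 - x^3   # hypothetical name
res_named <- lapply(list_data, poly_fun)

identical(res_anon, res_named)   # TRUE
res_anon$list1                   # c(4, 0, -14, -44, -96)
```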
So in your example, your data is strsplit(s, "\n")[[1]], and lapply applies the anonymous function function(str) paste(rev(strsplit(str, "")[[1]]), collapse = "") to each element of that data. The extra parentheses around (function (str) ...) only group the expression; the code works the same without them.
Note that in my example I wrote function(x), while your example uses function(str). The argument name in the parentheses is arbitrary and chosen by the user. For example, lapply(list_data, function(str) str^2+4-str^3) returns the same thing as lapply(list_data, function(x) x^2+4-x^3).
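The string-reversal code can be taken apart step by step. This sketch assumes a hypothetical input s, since the original value of s is not shown.

```r
s <- "hello\nworld"   # hypothetical input

parts <- strsplit(s, "\n")[[1]]       # c("hello", "world")
chars <- strsplit(parts[1], "")[[1]]  # c("h", "e", "l", "l", "o")
paste(rev(chars), collapse = "")      # "olleh"

# The same per-element step, wrapped in an anonymous function
res <- lapply(parts, function(str) paste(rev(strsplit(str, "")[[1]]), collapse = ""))

# The original form with grouping parentheses gives the identical result
res_orig <- lapply(strsplit(s, "\n")[[1]],
                   (function (str) paste(rev(strsplit(str, "")[[1]]), collapse = "")))
res   # list("olleh", "dlrow")
```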
Please note that broad "learning" style questions like this are not exactly what this site is for, and this question will likely get removed and/or receive some negative feedback. Since you are new to this site and to R, I'm providing this answer but I would not be surprised if the question is removed. Just trying to help both you and the SO community!

inverting an index using clusters

This code is about inverting an index using clusters.
Unfortunately I do not understand the line with recognize<-...
I know that the function Vectorize applies the inner function element-wise, but I do not understand the inner function here.
The parameters (uniq, text) are not defined, so how can which be applied? Also, why is there a "uniq" string right after the function?
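library(parallel)  # provides makeCluster() and parLapply()

slots <- as.integer(Sys.getenv("NSLOTS"))  # number of slots from the NSLOTS environment variable
cl <- makeCluster(slots, type = "PSOCK")

inverted_index4 <- function(x) {
  y <- unique(x)
  # vectorize over `uniq` only; `text` is passed through unchanged
  recognize <- Vectorize(function(uniq, text) which(text %in% uniq), "uniq", SIMPLIFY = FALSE)
  y2 <- parLapply(cl, y, recognize, x)  # each worker calls recognize(y[[i]], x)
  unlist(y2, recursive = FALSE)         # flatten one level of nesting
}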
The Vectorize() function simply creates a new, element-wise (vectorized) version of the custom function function(uniq,text) which(text %in% uniq).
The "uniq" string names the argument of that function that you want to iterate over (Vectorize's vectorize.args). As a result, you can pass a vector of length greater than one for uniq and get back a list with one element for each element of uniq, holding the function's output for that element.
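A small sketch of what recognize returns, using a hypothetical input vector x. With SIMPLIFY = FALSE the result is a list, and because uniq is an unnamed character vector, its values become the list names.

```r
recognize <- Vectorize(function(uniq, text) which(text %in% uniq),
                       vectorize.args = "uniq", SIMPLIFY = FALSE)

x <- c("a", "b", "a", "c")   # hypothetical input
recognize(c("a", "b"), x)
# list(a = c(1L, 3L), b = 2L): for each value of uniq,
# the positions in x where that value occurs
```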
I would suggest the author make the code a little clearer and better commented. Also, the Vectorize() call doesn't need to be inside inverted_index4.
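Note that parLapply() comes from the parallel package and is the cluster counterpart of lapply(). Any extra arguments given after the function are passed through to it, so x is supplied as the second argument, text, while each element of y is supplied as uniq. text therefore does not need to be defined in the global environment.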
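To see the argument passing without a cluster, the sketch below swaps parLapply(cl, ...) for plain lapply(), which forwards extra arguments the same way. The function name inverted_index_seq is hypothetical; this is an illustration of the mechanism, not the original author's code.

```r
# Sequential stand-in for inverted_index4: lapply() replaces parLapply(cl, ...)
inverted_index_seq <- function(x) {   # hypothetical name
  y <- unique(x)
  recognize <- Vectorize(function(uniq, text) which(text %in% uniq),
                         "uniq", SIMPLIFY = FALSE)
  y2 <- lapply(y, recognize, x)   # each call is recognize(y[[i]], x)
  unlist(y2, recursive = FALSE)   # drop the extra length-1 list layer
}

inverted_index_seq(c("b", "a", "b"))
# list(b = c(1L, 3L), a = 2L)
```

Each call to recognize returns a one-element named list, which is why the final unlist(..., recursive = FALSE) is needed to get a flat list of index vectors.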

If my function doesn't work on every object, how do I skip those objects?

I am trying to write a function and apply it to a list. Inside my function is a function written by someone else. If my list is very simple, everything works fine. But with all my real data, some objects are bad, the outside function fails on them, and my whole function won't go through.
What do I type to say "If the outside function doesn't work, skip that object and move on to the next one in the list"? Whether the failed object becomes NA or is dropped doesn't matter.
I cannot figure out how to write a reproducible example that would result in a list of dataframes, which is what happens inside this function. I'm willing to take any help to improve this question.
My function is something like this:
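library(dplyr)

do_this <- function(x) {
  outside_function(x) %>%   # this returns a data frame for each object
    filter() %>%            # placeholder: the real filter conditions go here
    select() %>%            # placeholder: the real column selection goes here
    summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))  # by the end the df is down to one row
}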
This is how I apply the function to the list to come up with my final dataframe.
df<-bind_rows(lapply(my_list, do_this))
An example:
myfun <- function(x) {if (x == 1) {stop("bad")} else x}
It throws an error on an input of 1:
lapply(1:4, myfun) # stops from error
Just wrap it in try (as long as you don't need more complex error handling):
L <- lapply(1:4, function(x) try(myfun(x)))
And then you can use Filter to get rid of the "bad" cases:
Filter(function(x) !inherits(x, "try-error"), L)
Alternatively, you may want to make your wrapper function more robust, or have it return NULL (or some other appropriate value) under the condition that makes the inner function fail.
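A minimal sketch of the "return NULL" approach with tryCatch(), applied to a list whose results are data frames as in the question. outside_function here is a hypothetical stand-in that fails on some inputs, and do.call(rbind, ...) is used as a base R substitute for bind_rows().

```r
# Hypothetical stand-in for the third-party function: fails on input 1
outside_function <- function(x) {
  if (x == 1) stop("bad")
  data.frame(id = x, value = x * 10)
}

# tryCatch() turns the error into NULL instead of stopping lapply()
safe_do_this <- function(x) {
  tryCatch(outside_function(x), error = function(e) NULL)
}

results <- lapply(1:4, safe_do_this)       # results[[1]] is NULL
kept <- Filter(Negate(is.null), results)   # drop the failed objects
df <- do.call(rbind, kept)                 # one row each for inputs 2, 3 and 4
```

Unlike try(), tryCatch() lets you choose the fallback value directly, so no "try-error" objects need to be filtered out afterwards.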

How can I microbenchmark::microbenchmark the same function with different arguments programmatically?

I have an unbounded named list of arguments for a function that I plan to use positionally, e.g.
list(
method1 = "method1",
method2 = "method2",
...,
methodn = "methodn"
)
with
function(method) {
if (identical(method, "method1")) {Sys.sleep(1); return(NULL)}
if (identical(method, "method2")) {Sys.sleep(2); return(NULL)}
Sys.sleep(nchar(method))
return(NULL)
}
How can I use package:microbenchmark to benchmark my given function using the provided arguments? Bonus points if the benchmark itself is named as the positional argument is named in my source list.
The main pattern for package:microbenchmark use I've seen is passing the tasks to be benchmarked through the dots. The list argument also accepts unevaluated expressions, and that seems like the right route for programmatic use. However, because expression() treats the contents of its parentheses literally, I haven't found a way to inject my argument inside expression(). I went down a dark road with parse() and got it working, but it seems like there has to be a better way.
One solution is to use cat and sprintf with a for loop, although it might become problematic if you have many combinations of parameters.
cat("res <- microbenchmark(\n")
for (i in 1:4){
for (j in 1:4) {
cat(sprintf("f_%i_%i = f(%i, %i),\n", i, j, i, j))
}
}
cat(")\n")
Then copy-paste and run the code (remove the comma from the penultimate line).
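A way to avoid both parse() and copy-pasting is to build the unevaluated calls with bquote() and pass them through microbenchmark's list argument. This sketch assumes that argument accepts a named list of calls and uses the names as labels; f is a hypothetical fast stand-in for the question's function, and the microbenchmark call only runs if the package is installed.

```r
# Hypothetical fast stand-in for the question's function (no Sys.sleep)
f <- function(method) nchar(method)

method_args <- list(method1 = "method1", method2 = "method2", methodn = "methodn")

# bquote() splices each value into an unevaluated call such as f("method1");
# lapply() keeps the list names, which serve as the benchmark labels
exprs <- lapply(method_args, function(m) bquote(f(.(m))))

if (requireNamespace("microbenchmark", quietly = TRUE)) {
  res <- microbenchmark::microbenchmark(list = exprs, times = 10L)
  print(res)
}
```

Because the calls are built as language objects rather than strings, this also scales to many parameter combinations without generating source text.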

How to build out more complex vectorized operations?

I have a list of principal component rotation vectors computed by prcomp, with one item per class, where each item is an Nx2 array (i.e., two column vectors).
Using those vectors, I'd like to project some similarly structured data: a list of classes, each class item being an array with dimension NxMxT, where T is the number of trials.
My problem is that I can write simple vectorized functions with apply and its variants, but I'm having trouble generalizing this to apply that over each list element.
Example data:
somedata <- list(array(rnorm(100),dim=c(5,4,5)),array(rnorm(100),dim=c(5,4,5)))
somevectors <- list(array(rnorm(10),dim=c(5,2)),array(rnorm(10),dim=c(5,2)))
Here is a simple example of the operation over each list element:
o.proj.1 <- apply(somedata[[1]],3,function(x){
t(somevectors[[1]]) %*% x
}) # returns an array where each projected trial is a column
I tried fitting this inside a call to lapply(), but didn't find much success:
lapply(somedata, y = somevectors, function(x,y){
apply(x,3,function(z){
t(y) %*% z
})
})
Error in t(y) %*% z : requires numeric/complex matrix/vector arguments
Basically my algorithm is to put the appropriate apply type (here lapply) around the more local function and remove the index that will be vectorized (here [[]]). What am I missing?
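Your lapply attempt fails because y = somevectors is passed through lapply's ... unchanged, so every call receives the whole list rather than the matching matrix, and t(y) %*% z is then applied to a list. Of the *apply family of functions, mapply is the one to use when you want to loop simultaneously over two or more objects. Try: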
o.proj <- mapply(function(x,y){
apply(x,3,function(z){
t(y) %*% z
})
}, somedata, somevectors, SIMPLIFY = FALSE)
I suppose you will want SIMPLIFY = FALSE to return a list; otherwise mapply will attempt to simplify your output into an array, much as sapply does.
Also note that you can use Map as a shortcut for mapply(..., SIMPLIFY = FALSE).
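The Map version below is a sketch using the question's example data (with a seed added for reproducibility). It checks that the first element matches the single-class computation o.proj.1.

```r
set.seed(42)
somedata <- list(array(rnorm(100), dim = c(5, 4, 5)), array(rnorm(100), dim = c(5, 4, 5)))
somevectors <- list(array(rnorm(10), dim = c(5, 2)), array(rnorm(10), dim = c(5, 2)))

# Map() walks both lists in parallel: x = somedata[[k]], y = somevectors[[k]]
o.proj <- Map(function(x, y) {
  apply(x, 3, function(z) t(y) %*% z)   # one column per trial
}, somedata, somevectors)

# Single-class version from the question, for comparison
o.proj.1 <- apply(somedata[[1]], 3, function(x) t(somevectors[[1]]) %*% x)

all.equal(o.proj[[1]], o.proj.1)   # TRUE
dim(o.proj[[1]])                   # 8 5: the 2x4 projection flattened, by 5 trials
```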
