I have two containers, conty and contx. Their values are tied to each other: conty[1] relates to contx[1], and so on. While using apply on contx, I want access to the index inside the apply call so that I can put the corresponding element of conty into contz, depending on the index of x.
lapply(contx, function(x) {
if (x==1) append(contz,conty[xindex])
})
I could easily do this in a for loop but everybody insists that using the apply is better. I tried to look for examples, but the only thing I could find was mostly stuff for generating maps, where it wasn't entirely clear how I could adapt it to my problem.
There are a few issues here.
"everybody insists that using the apply is better". Sorry, but they're wrong; it's not necessarily better. See the old-school Burns Inferno ("If you are using R and you think you’re in hell, this is a map for you"), chapter 4 ("Overvectorization"):
A common reflex is to use a function in the apply family. This is not vectorization, it is loop-hiding. The apply function has a for loop in its definition. The lapply function buries the loop, but execution times tend to be roughly equal to an explicit for loop ... Base your decision of using an apply function on Uwe’s Maxim (page 20). The issue is of human time rather than silicon chip time. Human time can be wasted by taking longer to write the code, and (often much more importantly) by taking more time to understand subsequently what it does.
However, what you are doing that's bad is growing an object (also covered in the Inferno). Assuming that in your example contz started as an empty list, this should work (is my example reflective of your use case?)
x <- c(1,2,3,1)
conty <- list("a","b","c","d")
contz <- conty[which(x==1)]
Alternatively, if you want to use both the value and the index in your function, you can write a two-variable function f(val,index) and then use Map(f,my_list,seq_along(my_list))
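A minimal sketch of the Map approach, reusing x and conty from the example above. The helper name pick_if_one is made up for illustration; any two-argument function would do.

```r
x <- c(1, 2, 3, 1)
conty <- list("a", "b", "c", "d")

# Hypothetical two-argument function: receives the value and its index.
# Returns the matching element of conty when the value is 1, otherwise NULL.
pick_if_one <- function(val, index) {
  if (val == 1) conty[[index]] else NULL
}

# Map walks over the values and their positions in parallel.
picked <- Map(pick_if_one, x, seq_along(x))

# Drop the NULL placeholders to keep only the selected elements.
contz <- Filter(Negate(is.null), picked)
```

For this particular task the plain subsetting shown earlier is shorter; the Map form is mainly useful when the function needs to do more with the index than select.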
I am writing a package with a suite of functions that take objects fit to a model (e.g., output from the "lmt", "lavaan", or "mirt" packages) and compute relevant indices based on those models.
The first thing EVERY function in this suite does is convert the input into a standardized form, so all of my functions look like this:
fooIndex <- function(x) {
x <- standardizerFunction(x)
# Now, compute the fooIndex
}
Here, standardizerFunction is an S3 generic function that has methods for all the supported input classes.
Is there a better way to accomplish this functionality than calling standardizerFunction inside of each of the functions computing indices?
EDIT: I just wanted to specify that my "problem" is that copying and pasting the same line of code into ~20 different functions seems like a poor programming style, and I am hoping for a better solution.
Based on what iod and Gregor wrote, the two ways to handle this are:
(1) Require the user to apply the standardizerFunction before running any of the main functions. The functions will then throw an error if the input is of the wrong class.
(2) Since our functions will be checking the input to make sure it is of the right class anyway, just fold standardizerFunction into the input checking part using something like:
if(!inherits(x, what="YourClass")) x <- standardizerFunction(x)
In my particular setting, since most of my users are uncomfortable with R, asking them to pre-apply the standardizerFunction is not the best choice, so I am going with option 2.
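A rough sketch of option 2, assuming the standardized form is an S3 class called "YourClass". The lm method and the body of fooIndex are invented for illustration and are not the actual package code.

```r
# Hypothetical S3 generic that converts supported fits to a common form.
standardizerFunction <- function(x, ...) UseMethod("standardizerFunction")

# Illustrative method for lm fits: keep only the coefficients.
standardizerFunction.lm <- function(x, ...) {
  structure(list(coefficients = coef(x)), class = "YourClass")
}

# Illustrative index function: standardize only if not already standardized.
fooIndex <- function(x) {
  if (!inherits(x, what = "YourClass")) x <- standardizerFunction(x)
  length(x$coefficients)  # placeholder computation
}

fit <- lm(mpg ~ wt, data = mtcars)

# Works on the raw fit and on pre-standardized input alike.
n_raw <- fooIndex(fit)
n_std <- fooIndex(standardizerFunction(fit))
```

Another possibility, under the same assumptions, is to give the generic a pass-through method for "YourClass" so that calling standardizerFunction on already standardized input is harmless.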
I know that while and if in R are not vectorised. They let us work selectively on some rows based on some condition. I also know that the apply function in R is applied over columns, and hence it operates on all rows of the columns it is applied to. Can I use apply() along with user-defined functions and/or while/if to use it conditionally over some rows, rather than over all rows as apply usually does?
Note: the core issue here is to work around the lack of vectorization of while/if in R.
You can supply a user-defined function to apply through the FUN argument, e.g. FUN = function(x) user_defined_function(x). apply is "vectorized" only in the sense that it accepts vectors rather than scalars as its argument; its implementation relies heavily on for and if (type apply without parentheses in your console to see the source). So for and apply have about the same performance.
You could also break out of a user-defined function by throwing an exception with stop and wrapping the call in tryCatch, but this is not a recommended technique: it interferes with environments, call stacks, scopes, etc., makes debugging difficult, and leads to errors that are hard to identify.
It is better to use for and if; very often that is the easiest and most effective way. The alternatives, writing a recursive function (bearing in mind that (tail) recursion is not really optimized in R) or fully refactoring your algorithm, are quite difficult and time-consuming.
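When the condition can be computed for all rows at once, logical indexing is one way to operate on only some rows without apply. This is a different technique from the for/if approach recommended above, shown next to it for comparison; the data frame and the doubling operation are made up.

```r
# Made-up data: double b only on rows where a is positive.
df <- data.frame(a = c(1, -2, 3, -4), b = c(10, 20, 30, 40))

# Explicit for/if version, working on a copy of the column.
b_loop <- df$b
for (i in seq_len(nrow(df))) {
  if (df$a[i] > 0) b_loop[i] <- b_loop[i] * 2
}

# Vectorized version: one TRUE/FALSE per row selects the rows to change.
keep <- df$a > 0
df$b[keep] <- df$b[keep] * 2
```

Both versions give the same column; which one reads better depends on how complicated the per-row condition is.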
This post (Lazy evaluation in R – is assign affected?) covers some common ground but I am not sure it answers my question.
I stopped using assign when I discovered the apply family quite a while back, albeit, purely for reasons of elegance in situations such as this:
names.foo <- letters
values.foo <- LETTERS
for (i in 1:length(names.foo))
assign(names.foo[i], paste("This is: ", values.foo[i]))
which can be replaced by:
foo <- lapply(X=values.foo, FUN=function (k) paste("This is :", k))
names(foo) <- names.foo
This is also the reason this (http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-can-I-turn-a-string-into-a-variable_003f) R-faq says this should be avoided.
Now, I know that assign is generally frowned upon. But are there other reasons I don't know? I suspect it may mess with scoping or lazy evaluation, but I am not sure. Example code that demonstrates such problems would be great.
Actually those two operations are quite different. The first gives you 26 different objects while the second gives you only one. The second object will be a lot easier to use in analyses. So I guess I would say you have already demonstrated the major downside of assign, namely the necessity of then needing always to use get for corralling or gathering up all the similarly named individual objects that are now "loose" in the global environment. Try imagining how you would serially do anything with those 26 separate objects. A simple lapply(foo, func) will suffice for the second strategy.
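To make the contrast concrete, here is a small sketch that applies the same operation (nchar, chosen arbitrarily) under both strategies. The separate objects are created in a scratch environment rather than the global one, purely so the example cleans up after itself.

```r
# Strategy 1: 26 separate objects created with assign.
scratch <- new.env()
for (i in seq_along(letters)) {
  assign(letters[i], paste("This is:", LETTERS[i]), envir = scratch)
}
# To work on them, they first have to be gathered back up by name.
lens_assign <- sapply(mget(letters, envir = scratch), nchar)

# Strategy 2: a single named list.
foo <- lapply(LETTERS, function(k) paste("This is:", k))
names(foo) <- letters
# The same operation is one call on the list.
lens_list <- sapply(foo, nchar)
```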
That FAQ citation really only says that using assignment and then assigning names is easier, but did not imply it was "bad". I happen to read it as "less functional" since you are not actually returning a value that gets assigned. The effect looks to be a side-effect (and in this case the assign strategy results in 26 separate side-effects). The use of assign seems to be adopted by people that are coming from languages that have global variables as a way of avoiding picking up the "True R Way", i.e. functional programming with data-objects. They really should be learning to use lists rather than littering their workspace with individually-named items.
There is another assignment paradigm that can be used:
foo <- setNames( paste0(letters,1:26), LETTERS)
That creates a named atomic vector rather than a named list, but the access to values in the vector is still done with names given to [.
As the source of fortune(236) I thought I would add a couple examples (also see fortune(174)).
First, a quiz. Consider the following code:
x <- 1
y <- some.function.that.uses.assign(rnorm(100))
After running the above 2 lines of code, what is the value of x?
The assign function is used to commit "Action at a distance" (see http://en.wikipedia.org/wiki/Action_at_a_distance_(computer_programming) or google for it). This is often the source of hard to find bugs.
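A small sketch of how the quiz can go wrong. The function sneaky_mean is hypothetical and stands in for some.function.that.uses.assign; it looks like a plain computation but also writes into its caller's environment.

```r
# Hypothetical function: returns a mean, but as a side effect overwrites
# a variable called "x" in the environment it was called from.
sneaky_mean <- function(v) {
  assign("x", "surprise", envir = parent.frame())
  mean(v)
}

x <- 1
y <- sneaky_mean(c(2, 4, 6))

# y holds the mean as expected, but x is no longer 1.
```

Nothing in the call site hints that x was touched, which is what makes this kind of bug hard to track down.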
I think the biggest problem with assign is that it tends to lead people down paths of thinking that take them away from better options. A simple example is the 2 sets of code in the question. The lapply solution is more elegant and should be promoted, but the mere fact that people learn about the assign function leads people to the loop option. Then they decide that they need to do the same operation on each object created in the loop (which would be just another simple lapply or sapply if the elegant solution were used) and resort to an even more complicated loop involving both get and apply along with ugly calls to paste. Then those enamored with assign try to do something like:
curname <- paste('myvector[', i, ']')
assign(curname, i)
And that does not do quite what they expected, which leads either to complaining about R (which is as fair as complaining that my next door neighbor's house is too far away because I chose to walk the long way around the block) or, even worse, to delving into eval and parse to get their constructed string to "work" (which then leads to fortune(106) and fortune(181)).
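What the assign call above actually does can be checked directly. This sketch assumes myvector already exists as a numeric vector and that i is 2.

```r
myvector <- numeric(3)
i <- 2

curname <- paste('myvector[', i, ']')
assign(curname, i)

# myvector itself is untouched; instead a new variable whose literal
# name is the whole string (spaces and brackets included) was created.
untouched <- all(myvector == 0)
odd_name_exists <- exists("myvector[ 2 ]")

# The ordinary indexed assignment is what was wanted.
myvector[i] <- i
```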
I'd like to point out that assign is meant to be used with environments.
From that point of view, the "bad" thing in the example above is using a not quite appropriate data structure (the global environment instead of a list, data.frame, vector, ...).
Side note: the $ and $<- operators also work for environments, so in many cases the explicit assign and get aren't necessary there either.
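A short sketch of both styles on an explicit environment; the environment e and the names a and b are arbitrary.

```r
e <- new.env()

# Explicit assign/get with a target environment.
assign("a", 1, envir = e)
a_via_get <- get("a", envir = e)

# The same kind of access with $<- and $.
e$b <- 2
a_via_dollar <- e$a

# Both objects live in e, not in the calling environment.
contents <- sort(ls(e))
```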
While working with lists I've noticed an issue that I didn't expect.
result5 <- vector("list",length(queryResults[[1]]))
for(i in 1:length(queryResults[[1]])){
id <- queryResults[[1]][i]
result5[[id]] <-getPrices(id)
}
The problem is that after this code runs, instead of the result staying the same size (whatever length queryResults[[1]] is), it grows up to the last index, creating a bunch of NULL entries in the middle.
result5 currently stores a number of int/double lists, so it looks like:
result5[[index(int)]][[row]][col]
While on its own it's not too problematic, I would rather avoid that, simply for easier size calculations later on.
For clarification, id is an integer. And in the given case for loop offers same performance, but greater convenience than the apply functions.
After some testing, it seems the easiest way of doing it is to use the hash package to build a hash:
result6 <- hash(queryResults[[1]],lapply(queryResults[[1]],getPrices))
And if it needs to be accessed, call:
result6[[toString(id)]]
The difference in performance is marginal, although it is still fairly annoying to have to include toString in your code.
It's not clear exactly what your question is, but judging by the structure of the loop, you probably want
result5[[i]] <- getPrices(id)
rather than result5[[id]] <- getPrices(id).
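A small sketch of why indexing by id grows the list. getPrices is replaced here by a made-up stand-in and the ids are invented; naming the elements by id at the end is one possible way to keep lookup by id without the gaps.

```r
ids <- c(3L, 7L)                    # invented ids
getPrices <- function(id) id * 10   # hypothetical stand-in for the real function

# Indexing by id: assigning to position 7 stretches the list to length 7
# and leaves NULL entries in the unused positions.
by_id <- vector("list", length(ids))
for (i in seq_along(ids)) by_id[[ids[i]]] <- getPrices(ids[i])

# Indexing by position keeps the list the same length as ids.
by_pos <- vector("list", length(ids))
for (i in seq_along(ids)) by_pos[[i]] <- getPrices(ids[i])

# Optional: name the elements by id so they can still be looked up by id.
names(by_pos) <- ids
price_7 <- by_pos[["7"]]
```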
I wrote a function in R called "filtre": it takes a data frame, and for each line it says whether it should go in, say, bin 1 or bin 2. At the end, we have two data frames that together make up the original input, corresponding respectively to all lines thrown in either bin 1 or bin 2. These two sets are referred to as filtre1 and filtre2. For convenience, the values of filtre1 and filtre2 are calculated but not returned, because this is an intermediate step in a bigger process (plus they are quite big data frames). I have the following issue:
(i) When I later want to use filtre1 (or filtre2), they simply don't show up, as if their values were stuck within the function and not recognised elsewhere. That would oblige me to copy the whole function body every time I want to use them, which is quite painful and heavy.
I suspect this is a rather simple thing, but I did search on the web and did not find the answer really (I was not sure of best key words). Sorry for any inconvenience.
Thxs / g.
It's pretty hard to know the best way to achieve what you want, as you do not provide a proper example, but I'll give it a try. If your variables filtre1 and filtre2 are defined inside your function and you do not return them, of course they do not show up in your environment. But you could just return the classification and make filtre1 and filtre2 afterwards:
#example data
df<-data.frame(id=1:20,x=sample(1:20,20,replace=TRUE))
filtre<-function(df){
  #example function, this could of course be done by bins<-df$x<10
  bins<-numeric(nrow(df))
  for(i in seq_len(nrow(df)))
    if(df$x[i]<10)  #test row i only; if() needs a single TRUE/FALSE
      bins[i]<-1
  return(bins)
}
bins<-filtre(df)
filtre1<-df[bins==1,]
filtre2<-df[bins==0,]
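If you would rather have the function hand back both data frames, one possibility is to return them together in a list. This is only a sketch: the threshold of 10 is the same made-up rule as in the example above, and the deterministic x values are chosen just to make the result predictable.

```r
# Example data with a predictable split.
df <- data.frame(id = 1:20, x = rep(c(5, 15), 10))

# Variant of filtre that returns both bins in a named list.
filtre <- function(df) {
  in_bin1 <- df$x < 10
  list(filtre1 = df[in_bin1, ], filtre2 = df[!in_bin1, ])
}

parts <- filtre(df)

# Both pieces are now available outside the function.
n1 <- nrow(parts$filtre1)
n2 <- nrow(parts$filtre2)
```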