How does one pass multiple data types in llply? - r

I have a function that requires both a S4 object and a data frame as arguments.
But functions like lapply and llply will only allow one list and one function.
example: new_list=llply(list, function)
I could make a single list with alternating S4 object and data but llply will push one list item at a time which means that it will either be the S4 object or the data (the function cannot perform with just one or the other).
In some sense what I am looking for is akin to a 2D list (where each row has the S4 obj and data and a row gets pushed through at a time).
So how would I make this work?
Here's a more general version of my problem. If I have a function like so:
foobar <- function(dat, threshold=0.5, max=.99)
{
...
}
and I wanted to push a list through this function, I could do:
new_list=llply(list, foobar)
but if I also wanted to pass a non-default value for threshold or max, how would I do so in this context?

Functions like lapply typically have a ... parameter of arguments which get passed to the function. Eg:
lapply(list, foobar, somearg='nondefaultvalue')
If you have multiple varying parameters (eg a different somearg value for each element in list), then you would either pack them as pairs in a list, or turn to a function like mapply:
mapply(foobar, list, somearg=c('vectorof', 'nondefault', 'values'))
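To make the two cases concrete, here is a minimal sketch. The Model class, the body of foobar, and the sample data are all made up for illustration; only lapply and mapply themselves come from the answer above.

```r
library(methods)

# Hypothetical S4 class and function, for illustration only
setClass("Model", representation(scale = "numeric"))

foobar <- function(obj, dat, threshold = 0.5) {
  # count the rows whose scaled value exceeds the threshold
  sum(dat$x * obj@scale > threshold)
}

objs <- list(new("Model", scale = 1), new("Model", scale = 2))
dats <- list(data.frame(x = c(0.2, 0.6)), data.frame(x = c(0.2, 0.3)))

# Constant extra arguments: passed through `...` unchanged for every element
fixed <- lapply(dats, foobar, obj = objs[[1]], threshold = 0.1)

# Varying arguments: mapply() walks both lists in parallel,
# MoreArgs holds the arguments that stay the same for every call
paired <- mapply(foobar, objs, dats,
                 MoreArgs = list(threshold = 0.5), SIMPLIFY = FALSE)
```

With SIMPLIFY = FALSE the result is a list, as it would be with llply.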

Maybe you can try this:
Make each list item itself a list that contains an S4 object and a data frame.
Just a suggestion, I'm not quite sure if this works.

Related

Object modification only happens in list

I have put objects that I would like to edit in a list.
Say, the names of the objects are kind of like this:
name1_X
name1_Y
name2_X
name2_Y
And there are different sets of these objects, that are stored in different lists, so for each different set, they would have a slightly different name, like:
name1_P_X
name1_F1_X
name2_F2_Y
and so on..
So for every "name" there are six objects. There are two each ending with X or Y for P, F1, F2. We have three lists (listbF_P, listbF_F1, listbF_F2), each containing objects that end with X and Y.
I edited the objects in the list like this (example for only one list):
for (i in 1:NROW(listbF_P)){
listbF_P[[i]]@first.year <- 1986
listbF_P[[i]]@last.year <- 2005
listbF_P[[i]]@year.aggregate.method <- "mean"
listbF_P[[i]]@id <- makeFieldID(listbF_P[[i]])
}
When I check whether the changes were successfully applied, it works but only when referring to the objects inside the list but not the same objects "unlisted".
So if I call
listbF_P[[1]]@last.year
it returns
"2005"
But if I call
name1_X@last.year
it returns
"Inf"
The problem with this is that I want the edited objects in a different list later.
So I need either a way that the latter call example returns "2005" or a way that I can search for a certain object name pattern in multiple lists to put the ones that fit the pattern into another list.
This is because the example above was made with multiple lists (listbF_P, listbF_F1, listbF_F2) and these lists contain a pattern matching "X" and another matching "Y".
So basically I want to have two lists with edited objects, one matching pattern "X" and the other matching pattern "Y".
I would call the list matching the desired patterns like this:
listbF_ALL_X <- mget(ls(pattern=".*_X$"))
listbF_ALL_Y <- mget(ls(pattern=".*_Y$"))
The first list would hence contain all objects ending with "X", e.g.:
name1_P_X
name1_F1_X
name1_F2_X
name2_P_X
[...]
and I would like to have the ones that I edited in the loop earlier
..but when calling the objects out of that list
listbF_ALL_X[[1]]@last.year
again just returns
"Inf"
since it takes the objects out of the environment and not the list. But I want it to return the desired number that has been changed (e.g. "2005").
I hope my problem and the two possible ways of solving them are clear..
If something isn't, ask :)
Thanks for any input
Regards
In R, unlike in many other modern languages, (almost) all objects have value semantics: assigning an object to a new name, or storing it in a list, logically creates a copy. You can’t have multiple names that are references to the same object (see below for caveats). That is why modifying listbF_P[[1]] leaves name1_X unchanged.
But even if this were supported, your design looks confusing. Rather than have lots of related objects with different names, put your objects into nested lists and classes that logically relate them. That is, rather than have objects with names name{1..10}_{P,F1,F2}_{X,Y}, you should have one list, name, in which you store nested lists or classes with named members P, F1, F2 which, in turn, are objects that have names X and Y. Then you could access an object by, say, name[[1L]]$P$X (or name[[1L]]@P@X, if you’re using S4 objects with slots).
Or you use a more data-oriented approach and flatten all these nested objects into a table with corresponding columns P, F1, F2, X and Y. Which solution is more appropriate depends on your exact use-case.
Now for the caveat: you can use reference semantics in R by using environments instead of regular objects. When copying an environment, a reference to the same environment object is created. However, this semantic is usually confusing because it’s contrary to the expectation in R, so it should be used with care. The ‘R6’ package creates an object system with reference semantics based on environments. For many purposes where reference semantics are indispensable, ‘R6’ is the right answer.
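The difference between the two semantics can be seen in a small sketch. The Field class and its last.year slot are stand-ins I made up to mirror the question; they are not the asker's real class.

```r
library(methods)

# Hypothetical S4 class standing in for the objects in the question
setClass("Field", representation(last.year = "numeric"))

# Value semantics: the list holds a copy of name1_X
name1_X <- new("Field", last.year = Inf)
lst <- list(name1_X = name1_X)
lst[[1]]@last.year <- 2005
name1_X@last.year        # still Inf, only the copy in the list changed

# Reference semantics: the list holds a reference to the same environment
env_a <- new.env()
env_a$last.year <- Inf
env_list <- list(env_a)
env_list[[1]]$last.year <- 2005
env_a$last.year          # now 2005, both names see the same environment
```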
I found another solution:
I went on by modifying this part:
listbF_ALL_X <- mget(ls(pattern=".*_X$"))
listbF_ALL_Y <- mget(ls(pattern=".*_Y$"))
To not call objects from the environment but by calling objects from each list:
listbF_ALL_X <- c(c(listbF_P, listbF_F1, listbF_F2)[grepl(".*_X$", names(c(listbF_P, listbF_F1, listbF_F2)))])
listbF_ALL_Y <- c(c(listbF_P, listbF_F1, listbF_F2)[grepl(".*_Y$", names(c(listbF_P, listbF_F1, listbF_F2)))])
It's not the prettiest way of doing it but it works and in my case it was the solution that required the least amount of change in my script.
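The same idea reads a little more easily if the combined list is built once and then filtered. This is only a sketch: the list contents below are placeholder values, and it assumes the lists are named with the object names.

```r
# Placeholder lists; in the real script these hold the edited S4 objects
listbF_P  <- list(name1_P_X = 1, name1_P_Y = 2)
listbF_F1 <- list(name1_F1_X = 3, name1_F1_Y = 4)
listbF_F2 <- list(name1_F2_X = 5, name1_F2_Y = 6)

# Combine once, then pick elements by name pattern
combined <- c(listbF_P, listbF_F1, listbF_F2)
listbF_ALL_X <- combined[grepl("_X$", names(combined))]
listbF_ALL_Y <- combined[grepl("_Y$", names(combined))]
```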

setGeneric for a list of objects

I have the following function:
myFunction = function(objects,params) {
for (i in 1:length(objects)) {
object = objects[[i]]
object = myOtherFunction(object, params)
objects[[i]] = object
}
return (objects)
}
#' @rdname myFunction
#' @aliases myFunction
setMethod("myFunction", signature(object ="list"), myFunction)
How can I properly set the setMethod() and setGeneric() methods to accept a list of objects of a given type, let's say a list of objects of type SingleCellExperiment ?
If you want to write different methods to handle lists of class foo and lists of class bar then S4 will need some help, since both objects are of class list and hence the same method will be called in both cases.
There are a few options:
firstly, do you need to use lists at all? Don't forget all the base types in R are vectors, so for simple classes like
setClass("cuboid",slots=list(
height="numeric",
width="numeric",
depth="numeric"
)) -> cuboid
if you want to represent a set of multiple cuboids you don't need to use a list at all, just feed vectors of values to cuboid. This doesn't work as well for more exotic classes, though.
alternatively, you can write a list method with some extra logic to determine which lower-order method to dispatch. You should also think about what to do if the list contains objects of multiple different classes.
in some situations you might be able to use either lapply or a function that takes arbitrary numbers of arguments via .... In the latter case you may be able to make use of dotsMethods (check the help page on that topic for more info).
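As a rough sketch of the first option, a single cuboid object can hold several cuboids when its slots are vectors. The volume generic is a name I invented for the example.

```r
library(methods)

cuboid <- setClass("cuboid", slots = list(
  height = "numeric",
  width  = "numeric",
  depth  = "numeric"
))

# One object representing two cuboids, one per vector position
boxes <- cuboid(height = c(1, 2), width = c(3, 4), depth = c(5, 6))

# Hypothetical generic: works element-wise because the slots are vectors
setGeneric("volume", function(x) standardGeneric("volume"))
setMethod("volume", "cuboid", function(x) x@height * x@width * x@depth)

volumes <- volume(boxes)   # 15 48
```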
If you want to write a method that will only be called on lists of objects of class foo and there may exist another method that wants to operate on lists, then you can either:
write a method for class foo directly, and then use sapply or lapply to apply it to each element rather than calling your function on the list
write a method for class list that checks whether the list has foos in it and, if it doesn't, calls callNextMethod().
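A minimal sketch of the second approach might look like this. The class foo, the generic process, and what the methods return are all invented for the example; for SingleCellExperiment you would test against that class name instead.

```r
library(methods)

# Hypothetical class and generic, for illustration only
setClass("foo", representation(value = "numeric"))
setGeneric("process", function(objects, ...) standardGeneric("process"))

# Fallback for anything without a more specific method
setMethod("process", "ANY", function(objects, ...) "default")

# Method for a single foo
setMethod("process", "foo", function(objects, ...) objects@value * 2)

# Method for lists: handle lists of foo, otherwise defer to the next method
setMethod("process", "list", function(objects, ...) {
  if (length(objects) > 0 && all(vapply(objects, is, logical(1), "foo"))) {
    lapply(objects, process)
  } else {
    callNextMethod()
  }
})

foos  <- process(list(new("foo", value = 1), new("foo", value = 2)))
other <- process(list(1, "a"))
```

Here foos is a list holding 2 and 4, while the mixed list falls through to the ANY method.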

How do I apply a function with multiple arguments over a list?

For example, I want to apply the intersect() function to every element (dataset) of a list. I want each element compared to another dataset, data1.
I know I can use a for loop, but I was thinking that I could use lapply. However, I need to hold one of the arguments constant. How can I do so?
This doesn't work:
> lapply(list(winnepennninckx, brunner), intersect(,selectG))

How do I remove an object from within a function environment in R?

How do I remove an object from the current function environment?
I'm trying to achieve this:
foo <- function(bar){
x <- bar
rm(bar, envir = environment())
print(c(x, is.null(bar)))
}
Because I want the function to be able to handle multiple inputs.
Specifically I'm trying to pass either a dataframe or a vector to the function, and if I'm passing a dataframe I want to set the vector to NULL for later error handling.
If you want, you can look at my DepthPlotter script, where I want the second function to check whether depth is a dataframe and, if so, assign it to df instead and remove depth from the environment.
Here is a very brief sketch of how to set this up using S3 method dispatch.
First, you define your generic:
DepthPlotter <- function(depth,...){
UseMethod("DepthPlotter", depth)
}
Then you define methods for specific classes of the argument depth. As a very basic example in your case, you might create only two, a data.frame method and a default method to handle the vector case:
DepthPlotter.default <- function(depth, variable, ...){
#Here you write a function assuming that depth is
# anything but a data frame
}
DepthPlotter.data.frame <- function(depth,...){
#Here you'd write a function that assumes
# that depth is a data frame
}
And then you can call DepthPlotter() using either type of argument and the correct function will be run based upon the result of class(depth).
The example I've sketched out here is a little crude, since I've used a default method to handle the vector case. You could write .numeric and .integer methods to handle numeric or integer vectors more specifically. In my example, the .default method will be called for any case other than data.frame, so if you go this route you'd want to write some code in there that checks for strange cases like depth being a complicated list, or other odd object, if you think there's a chance something like that might be passed to the function.
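Filled in with placeholder bodies, the dispatch looks roughly like this. The return strings are made up, and here the default method rejects unexpected input instead of handling the vector case.

```r
DepthPlotter <- function(depth, ...) {
  UseMethod("DepthPlotter", depth)
}

# Called when depth is a data frame
DepthPlotter.data.frame <- function(depth, ...) {
  paste("data frame with", nrow(depth), "rows")
}

# Called for numeric vectors (integer vectors also dispatch here)
DepthPlotter.numeric <- function(depth, ...) {
  paste("numeric vector of length", length(depth))
}

# Anything else is rejected explicitly
DepthPlotter.default <- function(depth, ...) {
  stop("unsupported input of class ", class(depth)[1])
}

from_df  <- DepthPlotter(data.frame(a = 1:3))
from_vec <- DepthPlotter(c(1.5, 2))
```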

using clusterApply with unknown number of arguments

I want to be able to generalise the behavior of clusterApply() so that I can parallelise functions with different number of arguments.
Normally, I use clusterApply() like this:
clusterApply(cl=cl,seq_len(nsim),FUN=runsim,arg1,arg2,arg3)
But what if I don't know how many arguments function runsim has? I was thinking of using do.call("runsim",listofArguments), but I don't know if I can use it inside of clusterApply.
Any suggestions?
The main issue seems to be the fact that do.call wants the function (or name thereof) as first argument while clusterApply, like all functions from the apply family, passes the iterated over object as the first argument to the function it calls. Consequently one solution could be:
clusterApply(cl=cl,seq_len(nsim),FUN=function(x) do.call(runsim, args = list(...)))
... can now be filled with whatever different arguments there are including the possibility to hand over x (i.e., the slice of the iterated over object, in this case the simulation number).
I do not see the need to also wrap clusterApply into do.call as you know which function to call (clusterApply).
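Here is one way the wrapper could look when the arguments are kept in a list. The body of runsim, the argument names, and the helper call_with_args are assumptions for the example. lapply stands in for clusterApply so the sketch runs without a cluster.

```r
# Hypothetical simulation function and argument list, for illustration only
runsim <- function(i, mean, sd) i + mean + sd
listofArguments <- list(mean = 10, sd = 2)

# Wrapper: prepend the iterated value to the stored arguments, then call f
call_with_args <- function(i, f, args) do.call(f, c(list(i), args))

nsim <- 3

# lapply() is a stand-in here; with a cluster `cl` the equivalent call would be
# clusterApply(cl, seq_len(nsim), call_with_args, f = runsim, args = listofArguments)
results <- lapply(seq_len(nsim), call_with_args,
                  f = runsim, args = listofArguments)
```

Passing runsim and the argument list as extra arguments, rather than relying on them being visible on the workers, means they are sent along with each call.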
