Subclass of `list` using S3 classes in R?

I'd like to make a subclass of list. As in the answer to this question, I'll need to define a custom method for [.
What other methods do I need to make so that my list subclass does not lose its subclass when users manipulate it? Filter, I think.
This gets me a few more functions that operate on lists:
> methods(class = "list")
[1] all.equal as.data.frame coerce escape Ops relist select within
But I don't think any of those return new lists that should have the same type as the old one.
Which am I missing?
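A minimal sketch of the `[` part, assuming a hypothetical class name mylist and constructor new_mylist (neither appears in the question). The method lets the default list subsetting run and then restores the class attribute, which the default would otherwise drop:

```r
# Hypothetical constructor: tags a plain list with an extra S3 class.
new_mylist <- function(x = list()) {
  stopifnot(is.list(x))
  structure(x, class = c("mylist", "list"))
}

# Subsetting method: delegate to the default `[`, then restore the class.
`[.mylist` <- function(x, i, ...) {
  out <- NextMethod()
  class(out) <- class(x)
  out
}

ml <- new_mylist(list(a = 1, b = "two", c = 3))

inherits(ml[1:2], "mylist")                 # TRUE
inherits(Filter(is.numeric, ml), "mylist")  # TRUE, Filter subsets with `[`
```

Because Filter subsets its input with `[`, it picks up the method without needing one of its own. Functions that build a fresh list instead (lapply, Map, c) would still return plain lists under this sketch.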

Related

Recursively Apply a Function with multiple arguments to a List in R

I have a nested list and want to apply the same functions to multiple levels of this list. The functions are quite complex and require multiple arguments.
Here I have found how rapply is used and thought it would be useful to me. Unfortunately rapply only takes functions of a single argument.
Here is a little example in R:
nested list:
x=list(1,list(2,3),4,list(5,list(6,7)))
functions with multiple arguments:
doAddition <- function(listIterator,
list,
add){
item <- list[[listIterator]]
out <- item + add
return(out)
}
I would use this function on the first level of my list like this:
result <- lapply(seq_along(x), FUN = doAddition,
list = x,
add = factor)
How can I use the same functions for all my list levels?
Is there an alternative to rapply that accepts multiple arguments?
You generally wouldn’t iterate over indices, you’d iterate over elements. Once you do that, your code is directly translatable to rapply:
rapply(x, `+`, ... = factor)
This unlists the results. If you want to preserve the nested list structure, pass how = 'list' to rapply.
Note that this isn’t using your custom doAddition function since, once you iterate over elements, that function isn’t necessary. If your actual function is more complex the same applies: pass an element, not an index.
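As a rough sketch of the "pass an element, not an index" advice with a function that takes several arguments: scale_and_shift and its add/times parameters are made up for illustration, and the extra named arguments are forwarded through rapply's `...`.

```r
x <- list(1, list(2, 3), 4, list(5, list(6, 7)))

# Hypothetical element-wise function with two extra arguments.
scale_and_shift <- function(item, add, times = 1) {
  (item + add) * times
}

# Extra named arguments are forwarded to the function for every leaf.
flat   <- rapply(x, scale_and_shift, add = 10, times = 2)

# how = "list" keeps the nesting of the input.
nested <- rapply(x, scale_and_shift, how = "list", add = 10, times = 2)

flat                 # 22 24 26 28 30 32 34
nested[[4]][[2]]     # a list holding 32 and 34
```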

setGeneric for a list of objects

I have the following function:
myFunction = function(objects,params) {
for (i in 1:length(objects)) {
object = objects[[i]]
object = myOtherFunction(object, params)
objects[[i]] = object
}
return (objects)
}
#' @rdname myFunction
#' @aliases myFunction
setMethod("myFunction", signature(objects = "list"), myFunction)
How can I properly set the setMethod() and setGeneric() methods to accept a list of objects of a given type, let's say a list of objects of type SingleCellExperiment ?
If you want to write different methods to handle lists of class foo and lists of class bar then S4 will need some help, since both objects are of class list and hence the same method will be called in both cases.
There are a few options:
Firstly, do you need to use lists at all? Don't forget all the base types in R are vectors, so for simple classes like
setClass("cuboid",slots=list(
height="numeric",
width="numeric",
depth="numeric"
)) -> cuboid
if you want to represent a set of multiple cuboids you don't need to use a list at all, just feed vectors of values to cuboid. This doesn't work as well for more exotic classes, though.
Alternatively, you can write a list method with some extra logic to determine which lower-order method to dispatch. You should also think about what to do if the list contains objects of multiple different classes.
In some situations you might be able to use either lapply or a function that takes arbitrary numbers of arguments via .... In the latter case you may be able to make use of dotsMethods (check the help page on that topic for more info).
If you want to write a method that will only be called on lists of objects of class foo and there may exist another method that wants to operate on lists, then you can either:
write a method for class foo directly, and then use sapply or lapply rather than calling your function on the list, or
write a method for class list that checks whether the list has foos in it and, if it doesn't, calls callNextMethod (the S4 counterpart of S3's NextMethod).
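A sketch of how the two options might fit together, under some assumptions: foo is a toy class standing in for something like SingleCellExperiment, and rescale is a hypothetical generic. Since no more general method exists in this sketch, the list method raises an error for unexpected contents rather than calling callNextMethod.

```r
library(methods)

# Hypothetical element class, standing in for a real one.
setClass("foo", slots = c(value = "numeric"))

setGeneric("rescale", function(objects, params) standardGeneric("rescale"))

# Method for a single foo object.
setMethod("rescale", signature(objects = "foo"), function(objects, params) {
  objects@value <- objects@value * params
  objects
})

# Method for lists: check the element classes, then delegate per element.
setMethod("rescale", signature(objects = "list"), function(objects, params) {
  is_foo <- vapply(objects, is, logical(1), class2 = "foo")
  if (!all(is_foo)) {
    stop("expected a list containing only 'foo' objects")
  }
  lapply(objects, rescale, params = params)
})

foos <- list(new("foo", value = 1), new("foo", value = 2))
out <- rescale(foos, 10)
out[[2]]@value   # 20
```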

R: modify any function applied to a S4 class

I've been developing an S4 class which is essentially a data.frame with a little bit of extra information. For the purposes of this question, the "extra" features of this class are irrelevant. What matters is that the class contains a data.frame object stored in one of its slots. (I put the data.frame in a slot, instead of naming it a superclass, because I find that S4 classes which contain data.frames simplify the data.frames to lists for some reason).
Here's a basic example:
setClass('tmp_class', slots = c(df = 'data.frame'))
test_object <- new('tmp_class', df = data.frame(Num = 1:10, Let = letters[1:10]))
Now what I'd like to do is make it so that essentially any function applied to an object of this class is applied to the data.frame in slot @df. It's easy to write methods for specific functions to do this, like:
setMethod('dim', signature = c(x = 'tmp_class'), function(x) dim(x@df))
But I'm limited to only the functions I can think of, and any function invented by a user wouldn't work.
It is a simple matter to write a sort of wrapper/closure to modify a function to work on my class, like this:
tmp_classize <- function(func){
function(tmp, ...){ func(tmp@df, ...) }
}
So, rather than writing methods for, say, colnames() or ncol(), I could just run:
tmp_classize(colnames)(test_object)
or
tmp_classize(ncol)(test_object)
But what I'd like to do is somehow invoke my "tmp_classize" function on any function applied to my class, automatically. I can't figure out how to do it. I was thinking that if I could somehow call a "universal method" with an input signature of class "tmp_class", and then use sys.function() to grab the actual function being called, maybe I could make something work, but A) there are recursion problems and B) I don't know how to call such a "universal" method. It seems to me that the solution, if it exists at all, might necessitate non-standard evaluation, which I'd rather avoid, but might use if necessary.
Thanks!
P.S. I realize this undertaking may be unwise/poor programming technique, and I may never actually implement it in a package. Still I'm curious to know if it is possible.
P.P.S. I'd also be interested in the same idea applied to S3 classes!
In principle, what you could do is make a classUnion for your class and data.frame and write methods for your class that deal with all of the ways to read and write to data.frames, such as $, [, dim(), <- and many more. Then when other functions seek to use your new class as a data.frame there will be methods for this to work. This is somewhat explained in John Chambers's "Software for Data Analysis" starting on page 375. That said, this system may be very difficult to implement.
A simpler system may be to just add an extra attribute to your data.frame with the extra info you need. For example:
x<-data.frame(a=1:3,b=4:6)
attr(x,"Info")<-"Extra info I need"
attributes(x)$Info
[1] "Extra info I need"
This is not as elegant as a S4 class but will do everything a data.frame does. I suspect that someone who is familiar with S3 classes could improve on this idea quite a bit.
The simplest solution is to have your class contain data.frame instead of having it as one of the slots. For example here is a data.frame with a timestamp:
setClass(
"timestampedDF",
slots=c(timestamp="POSIXt"),
contains="data.frame"
)
Now all functions which work for a data.frame (such as head) will automatically work for timestampedDF objects. If you need to get at the "data frame part", then that is held in a hidden slot object@.Data.
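A short usage sketch of this approach. The slot type POSIXct (rather than POSIXt) and the example data are assumptions for illustration; the data frame is passed to new() as an unnamed argument so that it initialises the inherited part.

```r
library(methods)

setClass(
  "timestampedDF",
  slots = c(timestamp = "POSIXct"),   # assumed slot type for this sketch
  contains = "data.frame"
)

tdf <- new(
  "timestampedDF",
  data.frame(Num = 1:10, Let = letters[1:10]),
  timestamp = Sys.time()
)

is(tdf, "data.frame")   # TRUE
dim(tdf)                # 10 2, via the ordinary data.frame method
names(tdf)              # "Num" "Let"
tdf@timestamp           # the extra slot is still there
```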

Dispatch of constructor in R (S3)

Is it OK from an architectural point of view to dispatch the constructor in R (S3 system)?
I have a constructor for class returns and I want to dispatch it in a way like: returns.zoo, returns.data.frame etc.
Just my opinion, but I think there is an (unwritten) convention to use the as. prefix in this case. For example, as.data.frame coerces various objects to a data frame.
Same with as.matrix, as.Date and as.list ...
Often a "non-as" function calls the generic as function (e.g. data.frame function calls as.data.frame).
It is also good practice to implement a function with the is. prefix.
For example: is.data.frame, is.list.
But sometimes this is not so. For example formula is a generic "coercer"
and as.formula is not. There are also a lot of packages with mixed practice. For example, igraph includes an as.igraph generic but uses from_data_frame to create an object from a data frame.
So I guess as.returns.zoo will look aligned with existing practice but
returns.zoo is not wrong either.
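For what it's worth, a dispatched constructor might look roughly like the sketch below. The method bodies (simple returns computed from a price series) and the is.returns helper are assumptions for illustration, not something from the question.

```r
# Generic constructor: dispatches on the class of its input.
returns <- function(x, ...) UseMethod("returns")

# Hypothetical method: treat each column as a price series.
returns.data.frame <- function(x, ...) {
  prices <- as.matrix(x)
  out <- diff(prices) / prices[-nrow(prices), , drop = FALSE]
  structure(list(values = out), class = "returns")
}

# Numeric vectors are wrapped and handed to the data.frame method.
returns.numeric <- function(x, ...) {
  returns(data.frame(price = x), ...)
}

# Anything else is rejected with an informative error.
returns.default <- function(x, ...) {
  stop("don't know how to build 'returns' from class ", class(x)[1])
}

# Companion predicate, following the is. convention.
is.returns <- function(x) inherits(x, "returns")

r <- returns(c(100, 110, 121))
is.returns(r)   # TRUE
```

The same methods could just as well be named as.returns.data.frame and so on; the dispatch mechanism is identical either way.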

How does one pass multiple data types in llply?

I have a function that requires both a S4 object and a data frame as arguments.
But functions like lapply and llply will only allow one list and one function.
example: new_list=llply(list, function)
I could make a single list with alternating S4 object and data but llply will push one list item at a time which means that it will either be the S4 object or the data (the function cannot perform with just one or the other).
In some sense what I am looking for is akin to a 2D list (where each row has the S4 obj and data and a row gets pushed through at a time).
So how would I make this work?
Here's a more general version of my problem. If I have a function like so:
foobar <- function(dat, threshold=0.5, max=.99)
{
...
}
and I wanted to push a list through this function, I could do:
new_list=llply(list, foobar)
but if I also wanted to pass a non-default value for threshold or max, how would I do so in this context?
Functions like lapply typically have a ... parameter of arguments which get passed to the function. Eg:
lapply(list, foobar, somearg='nondefaultvalue')
If you have multiple varying parameters (eg a different somearg value for each element in list), then you would either pack them as pairs in a list, or turn to a function like mapply:
mapply(foobar, list, somearg=c('vectorof', 'nondefault', 'values'))
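Both cases in a small runnable sketch. The body of foobar is invented here, since the question elides it, and Map is used as the list-returning form of mapply.

```r
# Hypothetical stand-in for the function in the question.
foobar <- function(dat, threshold = 0.5, max = 0.99) {
  sum(dat > threshold & dat < max)
}

inputs <- list(c(0.1, 0.6, 0.9), c(0.7, 0.8, 1.0))

# Same non-default threshold for every element: pass it through `...`.
fixed <- lapply(inputs, foobar, threshold = 0.65)

# A different threshold per element: iterate over both in parallel.
varying <- Map(foobar, inputs, threshold = c(0.05, 0.75))

unlist(fixed)     # 1 2
unlist(varying)   # 3 1
```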
Maybe you can try this:
Make each list item itself a list, which contains a S4 object and a data frame.
Just a suggestion, I'm not quite sure if this works.
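A sketch of that suggestion, assuming a toy S4 class scaler and a worker function apply_scaler (both hypothetical). Each list item is itself a list holding one S4 object and its data frame:

```r
library(methods)

# Hypothetical S4 class and worker function, for illustration only.
setClass("scaler", slots = c(mult = "numeric"))

apply_scaler <- function(obj, df) {
  df$value <- df$value * obj@mult
  df
}

# Each list item bundles one S4 object with its data frame.
pairs <- list(
  list(obj = new("scaler", mult = 2),  df = data.frame(value = 1:3)),
  list(obj = new("scaler", mult = 10), df = data.frame(value = 4:5))
)

# One "row" (object plus data) is pushed through at a time.
results <- lapply(pairs, function(p) apply_scaler(p$obj, p$df))

results[[1]]$value   # 2 4 6
results[[2]]$value   # 40 50
```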
