I am (probably) NOT referring to the "all other variables" meaning like var1~. here.
I was pointed to plyr once again and looked into mlply and wondered why parameters are defined with a leading dot like this:
function (.data, .fun = NULL, ..., .expand = TRUE, .progress = "none",
.parallel = FALSE)
{
if (is.matrix(.data) & !is.list(.data))
.data <- .matrix_to_df(.data)
f <- splat(.fun)
alply(.data = .data, .margins = 1, .fun = f, ..., .expand = .expand,
.progress = .progress, .parallel = .parallel)
}
<environment: namespace:plyr>
What's the use of that? Is it just personal preference, naming convention or more? Often R is so functional that I miss a trick that's long been done before.
A dot in a function name can mean any of the following:
nothing at all
a separator between method and class in S3 methods
to hide the function name
Possible meanings
1. Nothing at all
The dot in data.frame doesn't separate data from frame, other than visually.
2. Separation of methods and classes in S3 methods
plot is one example of an S3 generic. Thus plot.lm and plot.glm are the underlying method definitions that are used when calling plot(lm(...)) or plot(glm(...)).
3. To hide internal functions
When writing packages, it is sometimes useful to use leading dots in function names because these functions are somewhat hidden from general view. Functions that are meant to be purely internal to a package sometimes use this.
In this context, "somewhat hidden" simply means that the variable (or function) won't normally show up when you list objects with ls(). To force ls to show these variables, use ls(all.names=TRUE). A leading dot only affects this listing; the variable is still looked up and used like any other. For example:
x <- 3
.x <- 4
ls()
[1] "x"
ls(all.names=TRUE)
[1] ".x" "x"
x
[1] 3
.x
[1] 4
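A small sketch of what the leading dot does and does not affect, using a throwaway environment so the result doesn't depend on your workspace (the names env, visible and .hidden are made up for illustration). As far as I know, the dot only changes what ls() lists by default; lookup with exists() and get() works the same either way:

```r
env <- new.env()
assign("visible", 1, envir = env)
assign(".hidden", 2, envir = env)

listed_default <- ls(env)                  # the dotted name is left out
listed_all <- ls(env, all.names = TRUE)    # the dotted name is included

# Lookup is unaffected by the leading dot
found <- exists(".hidden", envir = env, inherits = FALSE)
value <- get(".hidden", envir = env)
```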
4. Other possible reasons
In Hadley's plyr package, he follows the convention of using leading dots in argument names (such as .data and .fun in the question). This is a mechanism to try to ensure that, when resolving variable names, the values resolve to the user's variables rather than to the function's internal variables.
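To make the name-clash idea concrete, here is a minimal sketch. The helpers apply_dotted, apply_plain and scale_by are hypothetical and not taken from plyr; they only illustrate how a dotted parameter name keeps a user-supplied argument with an ordinary name from being swallowed by the wrapper:

```r
# Hypothetical helpers, not plyr code
apply_dotted <- function(.data, .fun, ...) lapply(.data, .fun, ...)
apply_plain  <- function(data, fun, ...) lapply(data, fun, ...)

# A user function that happens to have an argument named `data`
scale_by <- function(x, data) x * data

# `data = 10` cannot match `.data`, so it travels through `...` to scale_by
ok <- apply_dotted(1:3, scale_by, data = 10)

# Here `data = 10` is captured by apply_plain's own `data` parameter,
# 1:3 ends up in `fun`, and the call fails
bad <- tryCatch(apply_plain(1:3, scale_by, data = 10),
                error = function(e) e)
```

With the dotted version, ok holds 10, 20 and 30; with the plain version, bad holds the error condition raised when lapply tries to use 1:3 as a function.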
Complications
This mishmash of different uses can lead to very confusing situations, because these different uses can all get mixed up in the same function name.
For example, to convert a data.frame to a list you use as.list():
as.list(iris)
In this case as.list is an S3 generic, and you are passing a data.frame to it. Thus the S3 method that gets called is as.list.data.frame:
> as.list.data.frame
function (x, ...)
{
x <- unclass(x)
attr(x, "row.names") <- NULL
x
}
<environment: namespace:base>
And for something truly spectacular, load the data.table package and look at the function as.data.table.data.frame:
> library(data.table)
> methods(as.data.table)
[1] as.data.table.data.frame* as.data.table.data.table* as.data.table.matrix*
Non-visible functions are asterisked
> data.table:::as.data.table.data.frame
function (x, keep.rownames = FALSE)
{
if (keep.rownames)
return(data.table(rn = rownames(x), x, keep.rownames = FALSE))
attr(x, "row.names") = .set_row_names(nrow(x))
class(x) = c("data.table", "data.frame")
x
}
<environment: namespace:data.table>
At the start of a name it works like the UNIX filename convention to keep objects hidden by default.
ls()
character(0)
.a <- 1
ls()
character(0)
ls(all.names = TRUE)
[1] ".a"
It can be just a token with no special meaning; it does nothing more than any other allowed token.
my.var <- 1
my_var <- 1
myVar <- 1
It's used for S3 method dispatch. So, if I define a simple class "myClass" and create objects with that class attribute, then generic functions such as print() will automatically dispatch to my specific print method.
myvar <- 1
print(myvar)
class(myvar) <- c("myClass", class(myvar))
print.myClass <- function(x, ...) {
print(paste("a special message for myClass objects, this one has length", length(x)))
return(invisible(NULL))
}
print(myvar)
There is an ambiguity in the syntax for S3, since you cannot tell from a function's name whether it is an S3 method or just a dot in the name. But, it's a very simple mechanism that is very powerful.
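A minimal sketch of that ambiguity, using a made-up generic describe and a made-up class "frame". The function describe.frame is written as if it were an ordinary helper, yet dispatch still picks it up purely because of its name:

```r
# A toy generic with a default method (names are hypothetical)
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "default method"

# Intended as a plain helper, but the name reads as <generic>.<class>
describe.frame <- function(x, ...) "helper called as a method"

obj <- structure(list(), class = "frame")
via_dispatch <- describe(obj)   # dispatches to describe.frame
via_default  <- describe(1)     # falls back to describe.default
```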
There's a lot more to each of these three aspects, and you should not take my examples as good practice, but they are the basic differences.
If a user defines a function .doSomething and is too lazy to write all the roxygen documentation for its parameters, this will not generate errors when the package is built.
I have a list of functions which also contains one user-defined function:
> fun <- function(x) {x}
> funs <- c(median, mean, fun)
Is it possible to get the function names as strings from this list? My only workaround so far was to create a vector which contains the function names as strings:
> fun.names <- c("median", "mean", "fun")
When I want to get a variable name I use this trick (please correct me if it is wrong), but as you can see it only works for a single variable, not for a list:
> as.character(substitute(mean))
[1] "mean"
> as.character(substitute(funs))
[1] "funs"
Is there something that will also work for a list? Does it make any difference whether the list contains functions or data types?
EDIT:
I need to pass this list of functions (plus other data) to another function. The functions from the list will then be applied to a dataset. Function names are needed because, if several functions are passed in the list, I want to be able to determine which function was applied. So far I've been using this:
window.size <- c(1,2,3)
combinations <- expand.grid(window.size, c(median, mean))
combinations <- cbind(combinations, rep(c("median","mean"), each = length(window.size)))
Generally speaking, this is not possible. Consider this definition of funs:
funs <- c(median,mean,function(x) x);
In this case, there's no name associated with the user-defined function at all. There's no rule in R that says all functions must be bound to a name at any point in time.
If you want to start making some assumptions about whether and where all such lambdas are defined, then possibilities open up.
One idea is to search the closure environment of each function for an entry that matches (identically) to the function itself, and then use that name. This will incur a performance penalty due to the comparison work, but may be tolerable if you don't have to run it repetitively:
getFunNameFromClosure <- function(fun) names(which(do.call(c,eapply(environment(fun),identical,fun)))[1L]);
Demo:
fun <- function(x) x;
funs <- c(median,mean,fun);
sapply(funs,getFunNameFromClosure);
## [1] "median" "mean" "fun"
Caveats:
1: As explained earlier, this will not work on functions that were never bound to a name. Furthermore, it will not work on functions whose closure environment does not contain a binding to the function. This could happen if the function was bound to a name in a different environment than its closure (via a return value, superassignment, or assign() call) or if its closure environment was explicitly changed.
2: It is possible to bind a function to multiple names. Thus, the name you get as a result of the eapply() search may not be the one you expect. Here's a good demonstration of this:
getFunNameFromClosure(ls); ## gets wrong name
## [1] "objects"
identical(ls,objects); ## this is why
## [1] TRUE
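Given these caveats, one way to sidestep the search entirely is to record the names at the moment the list is built. The helper named_funs below is a hypothetical sketch, not an existing function; it captures the expressions passed in ... with substitute() and uses them as list names. It assumes each argument is written as a plain name; an anonymous function would simply get its deparsed source as its name:

```r
named_funs <- function(...) {
  # Unevaluated argument expressions, dropping the leading `list` symbol
  exprs <- as.list(substitute(list(...)))[-1L]
  nms <- vapply(exprs,
                function(e) paste(deparse(e), collapse = ""),
                character(1))
  stats::setNames(list(...), nms)
}

fun <- function(x) x
funs <- named_funs(median, mean, fun)
fun_names <- names(funs)
```

The names then travel with the list, so whatever function receives funs can read names(funs) without needing to know where the functions were defined.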
Here is a hacky approach:
funs <- list(median, mean)
fun_names = sapply(funs, function(x) {
s = as.character(deparse(eval(x)))[[2]]
gsub('UseMethod\\(|[[:punct:]]', '', s)
})
names(funs) <- fun_names
funs
$median
function (x, na.rm = FALSE)
UseMethod("median")
<bytecode: 0x103252878>
<environment: namespace:stats>
$mean
function (x, ...)
UseMethod("mean")
<bytecode: 0x103ea11b8>
<environment: namespace:base>
combinations <- expand.grid(window.size, fun_names, c(median, mean))
I developed an S3 class in R that behaves very similarly to a factor variable, though not exactly. The only snafu that I have left in my implementation is that factor and as.factor are not generics.
I got around this limitation for my own personal use by overriding base::factor in the .onload function within my package as follows:
.onAttach <- function(libname,pkgname){
# note that as.factor is not a generic -- need to override it
methods:::bind_activation(on = TRUE)
# TODO: make a better attempt to determine if base::factor is a generic or not.
if(!length(ls(pattern='^as\\.factor\\.default$', envir=as.environment('package:base'),all.names=TRUE))){
# bind the current implementation of 'as.factor' to 'as.factor.default'
assign('as.factor.default',
base:::as.factor,
envir=as.environment('package:base'))
# unlock the binding for 'as.factor'
unlockBinding('as.factor', as.environment('package:base'))
# bind the generic to 'as.factor' in the 'package:base'
assign('as.factor',
function (x,...) UseMethod('as.factor') ,
envir=as.environment('package:base'))
# re-lock the binding for 'as.factor'
lockBinding('as.factor', as.environment('package:base'))
}
[similar code for making 'factor' and 'table' behave as generics excluded]
}
However I know modifying base would never fly on CRAN, so I'm curious if there's a workaround. As #BondedDust points out, I could of course rename my function which is responsible for coercion to ordinary factors (currently named as.factor.MYCLASS) to something like As.factor, but I'd rather not go that route, since it means users would have to write code like this:
#coerce x to a factor
if(inherits(x,'MYCLASS'))
x <- As.factor(x)
else
x <- as.factor(x)
or
if(inherits(x,'MYCLASS'))
x <- Factor(x)
else
x <- factor(x)
It just feels odd that coercion to factors is not implemented as a generic.
I also tried this implementation of the .onAttach
.onAttach <- function(libname,pkgname){
setOldClass(c("MYCLASS"),
where=as.environment('package:MyPackage'))
setMethod('factor',
signature(x='MYCLASS'),
factor.MYCLASS,
where=as.environment('package:MyPackage'))
}
But I get this error message:
Error in rematchDefinition(definition, fdef, mnames, fnames, signature) :
methods can add arguments to the generic ‘factor’ only if '...' is an
because factor does not use the dots argument and my factor.MYCLASS has one additional argument.
There’s absolutely no need to replace the base functions. Just override them in your package to make them generic.
So, inside your package, do:
factor = function (...)
UseMethod('factor')
factor.default = base::factor
factor.MyClass = function (...) your logic
Since your package will be attached after base, this factor redefinition will be found first.
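A runnable sketch of this idea outside a package, with a toy class "MyClass" standing in for the real one. Here the generic is given an explicit first argument x to dispatch on, and the method body is only a placeholder assumption (it strips the class and defers to base::factor); your own logic would go there:

```r
# Mask base::factor with an S3 generic
factor <- function(x, ...) UseMethod("factor")

# Everything without a more specific method goes to the original
factor.default <- base::factor

# Placeholder method for the toy class
factor.MyClass <- function(x, ...) {
  base::factor(unclass(x), ...)
}

obj <- structure(c("a", "b", "a"), class = "MyClass")
f_custom  <- factor(obj)                        # uses factor.MyClass
f_regular <- factor(c("x", "y"), levels = "y")  # uses factor.default
```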
Answered my own question. The code below has replaced the original .onLoad function in my package. This didn't completely satisfy my desire for users to be able to call as.factor(obj,arg='arg') where obj is an object with class MYCLASS, so I put the code from the original .onLoad method above into a function called setGenerics() which creates S3 generics for factor and as.factor at the user's request.
I'm pretty happy with this solution. I'm just hoping that this satisfies CRAN's requirements.
# create a virtual S4 class from my S3 class
setOldClass(c("MYCLASS"))
# set methods for the virtual S4 classes of 'ordered','factor'
setMethod('as.ordered',
signature(x='MYCLASS'),
function(x)as.factor.MYCLASS(x,ordered=TRUE))
setMethod('as.factor',
signature(x='MYCLASS'),
function(x)as.factor.MYCLASS(x))
setMethod('factor',
signature(x='MYCLASS'),
# re-capitulate the signature for base::factor()
function (x , levels, labels = levels, exclude = NA,
ordered = is.ordered(x), nmax = NA) {
ARGS <- list(x=x)
# collect only the arguments that were actually supplied
if(!missing(levels))
ARGS[['levels']] <- levels
if(!missing(labels))
ARGS[['labels']] <- labels
if(!missing(exclude))
ARGS[['exclude']] <- exclude
if(!missing(ordered))
ARGS[['ordered']] <- ordered
if(!missing(nmax))
warning('unused argument `nmax` in factor.MYCLASS')
do.call(as.factor.MYCLASS,ARGS)
})
setGenerics <- function(){
[contents from the original .onLoad method]
}
.onAttach <- function(libname,pkgname)
cat('Call setGenerics() for increased compatibility with `factor`, `as.factor`, and `table`.\n')
In my package, I want to subclass a class TheBaseClass from a contributed package (so it is out of my reach). There is a function for creating objects of this class. Here is a minimal example for that code.
setClass("TheBaseClass", representation(a="numeric"))
initBase <- function() new("TheBaseClass", a=1) # in reality more complex
Now I simply want to use initBase as the constructor for my subclass, but I do not know how to set the new class:
setClass("MyInheritedClass", contains="TheBaseClass")
initInher <- function() {
res <- initBase()
class(res) <- "MyInheritedClass" # this does not work for S4
}
How can I alter the last line to make it work? Copying and pasting the initBase function is not an option, since it involves a .C call. I read about setIs, but this does not seem to be the right function here.
Any hint appreciated!
Perhaps this answer provides more extensive explanation. One pattern is to provide an instance of the base class as an unnamed argument to your class constructor
.MyInheritedClass <- setClass("MyInheritedClass", contains="TheBaseClass")
.MyInheritedClass(initBase())
(setClass returns a generator function, which is really no different from calling new but seems cleaner; I use . in front, because generators are maybe a little too crude for "end users", e.g., there is no hint about what the arguments are supposed to be, just ...). This assumes that you have not written an initialize method for your class, or that your initialize method has been constructed in a way that is consistent with the contract of initialize,ANY-method, with a slightly more complicated class
.A <- setClass("A", contains="TheBaseClass",
representation=representation(x="numeric"))
setMethod(initialize, "A",
function(.Object, ..., x)
{
x <- log(x) # your class-specific initialization...
callNextMethod(.Object, ..., x = x) # passed to parent constructor
})
This pattern requires that the initialize method of the base class has been designed correctly. In action:
> .A(initBase(), x=1:2)
An object of class "A"
Slot "x":
[1] 0.0000000 0.6931472
Slot "a":
numeric(0)
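Applied to the classes from the question, the same pattern can be sketched end to end. This assumes no custom initialize method is defined for either class, so the default one treats the unnamed base-class instance as the source of the inherited slots:

```r
library(methods)

setClass("TheBaseClass", representation(a = "numeric"))
initBase <- function() new("TheBaseClass", a = 1)  # stands in for the real constructor

setClass("MyInheritedClass", contains = "TheBaseClass")

# Pass the base instance as an unnamed argument to new()
initInher <- function() new("MyInheritedClass", initBase())

obj <- initInher()
```

The object returned by initInher() is a MyInheritedClass instance whose slot a was copied from the object created by initBase().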