By playing around with a function in R, I found out there is more to it than meets the eye.
Consider this simple function assignment, typed directly in the console:
f <- function(x)x^2
The usual "attributes" of f, in a broad sense, are (i) the list of formal arguments, (ii) the body expression and (iii) the environment that will be the enclosure of the function evaluation frame. They are accessible via:
> formals(f)
$x
> body(f)
x^2
> environment(f)
<environment: R_GlobalEnv>
Moreover, str returns more info attached to f:
> str(f)
function (x)
- attr(*, "srcref")=Class 'srcref' atomic [1:8] 1 6 1 19 6 19 1 1
.. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x00000000145a3cc8>
Let's try to reach them:
> attributes(f)
$srcref
function(x)x^2
This is printed as text, but it's stored as a numeric vector:
> c(attributes(f)$srcref)
[1] 1 6 1 19 6 19 1 1
And this object also has its own attributes:
> attributes(attributes(f)$srcref)
$srcfile
$class
[1] "srcref"
The first one is an environment, with 3 internal objects:
> mode(attributes(attributes(f)$srcref)$srcfile)
[1] "environment"
> ls(attributes(attributes(f)$srcref)$srcfile)
[1] "filename" "fixedNewlines" "lines"
> attributes(attributes(f)$srcref)$srcfile$filename
[1] ""
> attributes(attributes(f)$srcref)$srcfile$fixedNewlines
[1] TRUE
> attributes(attributes(f)$srcref)$srcfile$lines
[1] "f <- function(x)x^2" ""
There you are! This is the string used by R to print attributes(f)$srcref.
So the questions are:
Are there any other objects linked to f? If so, how to reach them?
If we strip f of its attributes, using attributes(f) <- NULL, it doesn't seem to affect the function. Are there any drawbacks of doing this?
As far as I know, srcref is the only attribute typically attached to S3 functions. (S4 functions are a different matter, and I wouldn't recommend messing with their sometimes numerous attributes).
The srcref attribute is used for things like enabling printing of comments included in a function's source code, and (for functions that have been sourced in from a file) for setting breakpoints by line number, using utils::findLineNum() and utils::setBreakpoint().
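As a rough sketch of how the attribute can be inspected and dropped: the example function and its comment are made up for illustration, and parse(keep.source = TRUE) is used only so that the result does not depend on the session's keep.source setting.

```r
# Build a function that carries source references, independent of
# the current value of options("keep.source").
f <- eval(parse(text = "function(x) {\n  # square the input\n  x^2\n}",
                keep.source = TRUE))

sr <- utils::getSrcref(f)        # the same object as attr(f, "srcref")
src_lines <- as.character(sr)    # original source text, comment included

g <- utils::removeSource(f)      # copy of f without source references
is.null(attr(g, "srcref"))       # the attribute is gone
g(3) == f(3)                     # the computation itself is unaffected
```

Dropping the source references only changes how the function prints and what the debugging tools can show; the formals, body and environment stay the same.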
If you don't want your functions to carry such additional baggage, you can turn off recording of srcref by doing options(keep.source=FALSE). From ?options (which also documents the related keep.source.pkgs option):
‘keep.source’: When ‘TRUE’, the source code for functions (newly
defined or loaded) is stored internally allowing comments to
be kept in the right places. Retrieve the source by printing
or using ‘deparse(fn, control = "useSource")’.
Compare:
options(keep.source=TRUE)
f1 <- function(x) {
## This function is needlessly commented
x
}
options(keep.source=FALSE)
f2 <- function(x) {
## This one is too
x
}
length(attributes(f1))
# [1] 1
f1
# function(x) {
# ## This function is needlessly commented
# x
# }
length(attributes(f2))
# [1] 0
f2
# function (x)
# {
# x
# }
I just figured out that compiled functions (package compiler) have an attribute that is not available via attributes or str: the bytecode.
Example:
require(compiler)
f <- function(x){ y <- 0; for(i in 1:length(x)) y <- y + x[i]; y }
g <- cmpfun(f)
The result is:
> print(f, useSource=FALSE)
function (x)
{
y <- 0
for (i in 1:length(x)) y <- y + x[i]
y
}
> print(g, useSource=FALSE)
function (x)
{
y <- 0
for (i in 1:length(x)) y <- y + x[i]
y
}
<bytecode: 0x0000000010eb29e0>
However, this doesn't show with normal commands:
> identical(f, g)
[1] TRUE
> identical(f, g, ignore.bytecode=FALSE)
[1] FALSE
> identical(body(f), body(g), ignore.bytecode=FALSE)
[1] TRUE
> identical(attributes(f), attributes(g), ignore.bytecode=FALSE)
[1] TRUE
It seems to be accessible only via .Internal(bodyCode(...)):
> .Internal(bodyCode(f))
{
y <- 0
for (i in 1:length(x)) y <- y + x[i]
y
}
> .Internal(bodyCode(g))
<bytecode: 0x0000000010eb29e0>
For an arbitrary function
f <- function(x, y = 3){
z <- x + y
z^2
}
I want to be able to take the argument names of f
> argument_names(f)
[1] "x" "y"
Is this possible?
formalArgs and formals are two functions that would be useful in this case. If you just want the parameter names then formalArgs will be more useful as it just gives the names and ignores any defaults. formals gives a list as the output and provides the parameter name as the name of the element in the list and the default as the value of the element.
f <- function(x, y = 3){
z <- x + y
z^2
}
> formalArgs(f)
[1] "x" "y"
> formals(f)
$x
$y
[1] 3
My first inclination was to just suggest formals; if you only want the names of the parameters, you could use names, as in names(formals(f)). The formalArgs function is just a wrapper that does that for you, so either way works.
Edit: Note that technically primitive functions don't have "formals", so this method will return NULL if used on primitives. A way around that is to first wrap the function in args before passing it to formalArgs. This works regardless of whether the function is primitive or not.
> # formalArgs will work for non-primitives but not primitives
> formalArgs(f)
[1] "x" "y"
> formalArgs(sum)
NULL
> # But wrapping the function in args first will work in either case
> formalArgs(args(f))
[1] "x" "y"
> formalArgs(args(sum))
[1] "..." "na.rm"
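Putting the pieces together, a minimal sketch of the argument_names function asked for in the question could look like this. The helper name comes from the question and is not part of base R; it uses names(formals(...)) directly, which is what formalArgs wraps.

```r
# Hypothetical helper named after the question; not part of base R.
argument_names <- function(fn) {
  # args() returns a closure with the same argument list,
  # which also works for primitives such as sum
  names(formals(args(fn)))
}

f <- function(x, y = 3) {
  z <- x + y
  z^2
}

argument_names(f)    # "x" "y"
argument_names(sum)  # "..." "na.rm"
```

This is only a sketch: for the few primitives that are language elements, args() itself may return NULL, so the helper would not report any argument names for them.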
I am implementing a replacement for the subset operator in an S3 class. I followed the advice on
How to define the subset operators for a S4 class?
However, I am having a special problem: how do I distinguish in R code whether someone wrote x[i] or x[i,]? In both cases, the variable j just comes back missing.
setOldClass("myclass")
'[.myclass' <- function(x, i, j, ..., drop=TRUE) {
print(missing(j))
return(invisible(NULL))
}
And as a result I get:
x <- structure(list(), class="myclass")
> x[i]
[1] TRUE
> x[i,]
[1] TRUE
> x[i,j]
[1] FALSE
I don't see a way to distinguish between the two. I assume the internal C code does it by looking at the length of the argument pairlist, but is there a way to do the same in native R?
Thanks!
From alexis_laz's comment:
See, perhaps, how [.data.frame handles arguments and nargs()
Inside the function, call nargs() to see how many arguments were supplied, including missing ones.
> myfunc = function(i, j, ...) {
+ nargs()
+ }
>
> myfunc()
[1] 0
> myfunc(, )
[1] 2
> myfunc(, , )
[1] 3
> myfunc(1)
[1] 1
> myfunc(1, )
[1] 2
> myfunc(, 1)
[1] 2
> myfunc(1, 1)
[1] 2
This should be enough to help you figure out which arguments were passed in the same fashion as [.data.frame.
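Applied to the method from the question, a minimal sketch might look like the following. The returned labels are made up for illustration, and subtracting one for a supplied drop argument is an assumption about how you want to count index slots, not a copy of what [.data.frame does.

```r
`[.myclass` <- function(x, i, j, ..., drop = TRUE) {
  # nargs() counts x plus every index slot, including empty ones,
  # plus drop when it was supplied by name.
  n_index <- nargs() - 1L - !missing(drop)
  if (n_index <= 1L) "vector-style" else "matrix-style"
}

x <- structure(list(), class = "myclass")

x[1]                # one index slot:  x[i]
x[1, ]              # two index slots: x[i, ]
x[1, 2]             # two index slots: x[i, j]
x[1, drop = FALSE]  # still one index slot
```

With this count, x[i] and x[i,] take different branches even though missing(j) is TRUE in both.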
I have the following data frame:
> coc_comp_model[1:3,]
Relationship Output Input |r-Value| Y-Intercept Gradient
1 DG-r ~ DG-cl DG-r DG-cl 0.8271167 0.0027217513 12.9901380
2 CA3-r ~ CA3-cl CA3-r CA3-cl 0.7461309 0.0350767684 27.6107963
3 CA2-r ~ CA2-cl CA2-r CA2-cl 0.9732584 -0.0040992226 35.8299582
I want to create a simple function for each row of the data frame. Here's what I've tried:
for(i in 1:nrow(coc_comp_model)) {
coc_glm_f[i] <- function(x)
x*coc_comp_model$Gradient[i] + coc_comp_model$Y-Intercept[i]
}
I also tried making a vector of functions, which does not work either.
Thanks for reading this/helping.
Something like this:
myfunc<-function(datrow, x){
x*as.numeric(datrow[6]) + as.numeric(datrow[5] )
}
Then you can use apply to call it on each row, changing x as desired:
apply(coc_comp_model, 1, myfunc, x = 0.5)
Note: using dput() to share your data is much easier than pasting in a subset.
There is no such thing as a vector of functions. There are 6 atomic vector types in R: raw, logical, integer, double, complex, and character, plus there is the heterogeneous list type, and finally there is the lesser known expression type, which is basically a vector of parse trees (such as you get from a call to the substitute() function). Those are all the vector types in R.
printAndType <- function(x) { print(x); typeof(x); };
printAndType(as.raw(1:3));
## [1] 01 02 03
## [1] "raw"
printAndType(c(T,F));
## [1] TRUE FALSE
## [1] "logical"
printAndType(1:3);
## [1] 1 2 3
## [1] "integer"
printAndType(as.double(1:3));
## [1] 1 2 3
## [1] "double"
printAndType(c(1i,2i,3i));
## [1] 0+1i 0+2i 0+3i
## [1] "complex"
printAndType(letters[1:3]);
## [1] "a" "b" "c"
## [1] "character"
printAndType(list(c(T,F),1:3,letters[1:3]));
## [[1]]
## [1] TRUE FALSE
##
## [[2]]
## [1] 1 2 3
##
## [[3]]
## [1] "a" "b" "c"
##
## [1] "list"
printAndType(expression(a+1,sum(1,2+3*4),if (T) 1 else 2));
## expression(a + 1, sum(1, 2 + 3 * 4), if (T) 1 else 2)
## [1] "expression"
If you want to store multiple functions in a single object, you have to use a list, and you must use the double-bracket indexing operator in the lvalue to assign to it:
fl <- list();
for (i in 1:3) fl[[i]] <- (function(i) { force(i); function(a) a+i; })(i);
fl;
## [[1]]
## function (a)
## a + i
## <environment: 0x600da11a0>
##
## [[2]]
## function (a)
## a + i
## <environment: 0x600da1ab0>
##
## [[3]]
## function (a)
## a + i
## <environment: 0x600da23f8>
sapply(fl,function(f) environment(f)$i);
## [1] 1 2 3
sapply(fl,function(f) f(3));
## [1] 4 5 6
The code above also demonstrates the proper way to close over a loop variable. This requires creating a temporary function evaluation environment to hold a copy of i; the returned function then closes over that environment so that it can access the iteration-specific i. The same holds for other languages that support dynamic functions and closures, such as JavaScript. R has the additional requirement of forcing the promise with force(). Otherwise each generated function's promise would not be resolved until that function is first called, and it would then lock in whatever value the promise target (the global i variable in this case) has at that moment. It should also be mentioned that this is an extremely wasteful design: every iteration generates and evaluates a temporary function, which in turn creates a new evaluation environment with a copy of the loop variable.
If you wanted to use this design then your code would become:
coc_glm_f <- list();
for (i in 1:nrow(coc_comp_model)) {
coc_glm_f[[i]] <- (function(i) { force(i); function(x) x*coc_comp_model$Gradient[i] + coc_comp_model$`Y-Intercept`[i]; })(i);
};
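The same idea can be sketched with a named function factory instead of an anonymous function inside the loop. The names make_line, toy_model and line_fns are hypothetical, and the coefficients are toy values made up for illustration, not the data from the question.

```r
# Function factory: each call gets its own environment holding slope and intercept.
make_line <- function(slope, intercept) {
  force(slope)      # resolve the promises now rather than at first use
  force(intercept)
  function(x) x * slope + intercept
}

# Toy coefficients, made up for illustration only.
toy_model <- data.frame(Gradient = c(2, 3),
                        `Y-Intercept` = c(1, -1),
                        check.names = FALSE)

# One function per row, stored in a list.
line_fns <- Map(make_line, toy_model$Gradient, toy_model$`Y-Intercept`)

line_fns[[1]](10)  # 2 * 10 + 1 = 21
line_fns[[2]](10)  # 3 * 10 - 1 = 29
```

Each generated function carries only the two numbers it needs, so it keeps working even if the data frame is later modified or removed.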
However, it probably doesn't make sense to create a separate function for every row of the data.frame. If you intended the x parameter to take a scalar value (by which I mean a one-element vector), then you can define the function as follows:
coc_glm_f <- function(x) x*coc_comp_model$Gradient + coc_comp_model$`Y-Intercept`;
This function is vectorized, meaning you can pass a vector for x, where each element of x would correspond to a row of coc_comp_model. For example:
coc_comp_model <- data.frame(Relationship=c('DG-r ~ DG-cl','CA3-r ~ CA3-cl','CA2-r ~ CA2-cl'),Output=c('DG-r','CA3-r','CA2-r'),Input=c('DG-cl','CA3-cl','CA2-cl'),`|r-Value|`=c(0.8271167,0.7461309,0.9732584),`Y-Intercept`=c(0.0027217513,0.0350767684,-0.0040992226),Gradient=c(12.9901380,27.6107963,35.8299582),check.names=F);
coc_glm_f(seq_len(nrow(coc_comp_model)));
## [1] 12.99286 55.25667 107.48578
I am really struggling to understand the following behaviour of R. Let's say we want to define a function f, which is supposed to return whether its argument exists as a variable; but we want to pass the argument without quotes. So for example to check whether variable y exists, we would call f(y).
f <- function(x) {
xchar <- deparse(substitute(x))
exists(xchar)
}
So I start a brand new R session and define f, but no other variables. I then get
f(y)
# [1] FALSE
f(z)
# [1] FALSE
f(f)
# [1] TRUE
f(x)
# [1] TRUE
The first three calls (on y, z, f) give the expected result. But there is no variable named x:
exists("x")
# [1] FALSE
EDIT I now realise that this is because of the use of substitute, which will create the variable x. But is there a way around this?
The object x does exist inside the function since it is the name of the parameter.
If you modify the function
f <- function(...) {
xchar <- deparse(substitute(...))
exists(xchar)
}
you can see the expected output:
f(x)
# FALSE
You may want to just search the global environment
f <- function(x) {
xchar <- deparse(substitute(x))
exists(xchar,where=globalenv())
}
in which case you get:
> f(y)
[1] FALSE
> f(f)
[1] TRUE
> f(x)
[1] FALSE
> f(z)
[1] FALSE
> f(mean)
[1] TRUE
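A slightly more general sketch lets the caller choose the environment to search, defaulting to the environment the function was called from. The name var_exists and the env parameter are hypothetical, introduced here for illustration only.

```r
# Hypothetical helper: look the name up in an environment chosen by the caller,
# so the function's own parameter x is never part of the search.
var_exists <- function(x, env = parent.frame()) {
  name <- deparse(substitute(x))
  exists(name, envir = env)
}

e <- new.env(parent = emptyenv())  # isolated environment for the demo
var_exists(x, env = e)             # FALSE: e is empty
assign("x", 1, envir = e)
var_exists(x, env = e)             # TRUE
var_exists(mean)                   # TRUE: found via the search path
```

Because exists() follows enclosing environments by default, pass inherits = FALSE inside the helper if only the given environment itself should be checked.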
The title is the self-contained question. An example clarifies it: Consider
x=list(a=1, b="name")
f <- function(){
assign('y[["d"]]', FALSE, parent.frame() )
}
g <- function(y) {f(); print(y)}
g(x)
$a
[1] 1
$b
[1] "name"
whereas I would like to get
g(x)
$a
[1] 1
$b
[1] "name"
$d
[1] FALSE
A few remarks. I knew what is wrong in my original example, but am using it to make clear my objective. I want to avoid <<-, and want x to be changed in the parent frame.
I think my understanding of environments is primitive, and any references are appreciated.
The first argument to assign must be a variable name, not the character representation of an expression. Try replacing f with:
f <- function() with(parent.frame(), y$d <- FALSE)
Note that a, b and d are list components, not list attributes. If we wanted to add an attribute "d" to y in f's parent frame we would do this:
f <- function() with(parent.frame(), attr(y, "d") <- FALSE)
Also, note that depending on what you want to do it may (or may not) be better to have x be an environment or a proto object (from the proto package).
assign's first argument needs to be an object name. Your use of assign is basically the same as the counter-example at the end of the assign help page. Observe:
> x=list(a=1, b="name")
> f <- function(){
+ assign('x["d"]', FALSE, parent.frame() )
+ }
> g <- function(y) {f(); print(`x["d"]`)}
> g(x)
[1] FALSE # a variable with the name `x["d"]` was created
This may be where you want to use "<<-" but it's generally considered suspect.
> f <- function(){
+ x$d <<- FALSE
+ }
> g <- function(y) {f(); print(y)}
> g(x)
$a
[1] 1
$b
[1] "name"
$d
[1] FALSE
A further thought, offered without knowing the goal of this exercise and setting aside the term "attributes", which, as Gabor has pointed out, has a specific meaning in R that may not be what you intended. If all you want is for the output to match your specs, then the following achieves that, but note that x in the global environment is not altered.
> f <- function(){
+ assign('y', c(x, d=FALSE), parent.frame() )
+ }
> g <- function(y) {f(); print(y)}
> g(x)
$a
[1] 1
$b
[1] "name"
$d
[1] FALSE
> x # `x` is unchanged
$a
[1] 1
$b
[1] "name"
The parent.frame for f is what might be called the "interior of g", but the alteration does not propagate out to the global environment.
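If the aim is to add a component to the caller's y without <<-, one possible sketch is to fetch the object from the parent frame, modify the copy, and assign the whole object back under its plain name. The variable names caller and res are made up for illustration.

```r
f <- function() {
  caller <- parent.frame()          # frame of the function that called f
  y <- get("y", envir = caller)     # fetch the caller's y
  y[["d"]] <- FALSE                 # modify the local copy
  assign("y", y, envir = caller)    # write the whole object back by name
  invisible(NULL)
}

g <- function(y) { f(); y }

x <- list(a = 1, b = "name")
res <- g(x)
names(res)  # "a" "b" "d"
names(x)    # "a" "b"
```

As with the previous example, only the y inside g changes; x in the global environment is left untouched.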