parameter passing mechanism in R - r

The following function is used to multiply a sequence 1:x by y
f1<-function(x,y){return (lapply(1:x, function(a,b) b*a, b=y))}
Looks like a is used to represent the element in the sequence 1:x, but I do not know how to understand this parameter passing mechanism. In other OO languages, like Java or C++, there have call by reference or call by value.

Short answer: R is call by value. Long answer: it can do both.
Call By Value, Lazy Evaluation, and Scoping
You'll want to read through: the R language definition for more details.
R mostly uses call by value but this is complicated by its lazy evaluation:
So you can have a function:
f <- function(x, y) {
x * 3
}
If you pass in two big matrixes to x and y, only x will be copied into the callee environment of f, because y is never used.
But you can also access variables in parent environments of f:
y <- 5
f <- function(x) {
x * y
}
f(3) # 15
Or even:
y <- 5
f <- function() {
x <- 3
g <- function() {
x * y
}
}
f() # returns function g()
f()() # returns 15
Call By Reference
There are two ways for doing call by reference in R that I know of.
One is by using Reference Classes, one of the three object oriented paradigms of R (see also: Advanced R programming: Object Oriented Field Guide)
The other is to use the bigmemory and bigmatrix packages (see The bigmemory project). This allows you to create matrices in memory (underlying data is stored in C), returning a pointer to the R session. This allows you to do fun things like accessing the same matrix from multiple R sessions.

To multiply a vector x by a constant y just do
x * y
The (some prefix)apply functions works very similar to each other, you want to map a function to every element of your vector, list, matrix and so on:
x = 1:10
x.squared = sapply(x, function(elem)elem * elem)
print(x.squared)
[1] 1 4 9 16 25 36 49 64 81 100
It gets better with matrices and data frames because you can now apply a function over all rows or columns, and collect the output. Like this:
m = matrix(1:9, ncol = 3)
# The 1 below means apply over rows, 2 would mean apply over cols
row.sums = apply(m, 1, function(some.row) sum(some.row))
print(row.sums)
[1] 12 15 18

If you're looking for a simple way to multiply a sequence by a constant, definitely use #Fernando's answer or something similar. I'm assuming you're just trying to determine how parameters are being passed in this code.
lapply calls its second argument (in your case function(a, b) b*a) with each of the values of its first argument 1, 2, ..., x. Those values will be passed as the first parameter to the second argument (so, in your case, they will be argument a).
Any additional parameters to lapply after the first two, in your case b=y, are passed to the function by name. So if you called your inner function fxn, then your invocation of lapply is making calls like fxn(1, b=4), fxn(2, b=4), .... The parameters are passed by value.

You should read the help of lapply to understand how it works. Read this excellent answer to get and a good explanation of different xxpply family functions.
From the help of laapply:
lapply(X, FUN, ...)
Here FUN is applied to each elementof X and ... refer to:
... optional arguments to FUN.
Since FUN has an optional argument b, We replace the ... by , b=y.
You can see it as a syntax sugar and to emphasize the fact that argument b is optional comparing to argument a. If the 2 arguments are symmetric maybe it is better to use mapply.

Related

What is the calling function in R?

One of the most important things to know about the evaluation of arguments to a function is that supplied arguments and default arguments are treated differently. The supplied arguments to a function are evaluated in the evaluation frame of the calling function. The default arguments to a function are evaluated in the evaluation frame of the function.
I don't quite understand what it is meant by calling function. Is it the function that is invoked (like in interactive sesion with function that has named assigned you type name and hit enter). If yes how evaluation frame of the callinig function differs from evaluation frame of the function?
First change to standard terms. The arguments that are used in the function definition are the formal arguments and the arguments that are passed to the function when calling it are the actual arguments. (The quoted passage in the question is referring to the actual arguments when it uses the nonstandard term, supplied arguments.)
Consider two cases via example.
Case 1
Below f has the formal argument x and when f is called in the last line of code there are no actual arguments.
Now when f is called in the last line of code x gets the value 2 because x is not set until it is used and when it is used a is looked up within the function where it has the value 2, not in the caller where it has the value 1.
a <- 1
f <- function(x = a) {
a <- 2
x
}
f()
## [1] 2
Case 2
On the other hand the actual arguments are evaluated in the caller. In the last line of code below x is set to 1 because that is the value of b in the caller. Again, x is not evaluated until it is used but now even though b has been set to 2 in the function itself this has no effect on x. x is set to 1, not 2.
b <- 1
g <- function(x) { b <- 2; x + b }
g(b)
## [1] 3
Other
Although this covers the two cases in the quote note that there exists another case which is the situation that occurs when x is referred to in a function but is not defined in the function. In the code below a is a free variable in g since a is not an argument or otherwise defined in g. In this case when gg (which equals g) is called R attempts to look up a in the function g and fails but the next place it looks is not the caller (where a is 1) but the environment in which the function was defined, i.e. the environment where the word function appears and a is 2 in that environment.
a <- 1
f <- function() {
a <- 2
g <- function() a
}
gg <- f()
gg()
## [1] 2
This is referred to as lexical scoping since one can tell where the free variables are looked up by simply looking at the function definitions.

what is the purpose of 'NULL' in processing loops?

sqr = seq(1, 100, by=2)
sqr.squared = NULL
for (n in 1:50)
{
sqr.squared[n] = sqr[n]^2
}
I came accross the loop above, for a beginner this was simple enough. To further understand r what was the precise purpose of the second line? For my research I gather it has something to do with resetting the vector. If someone could elaborate it'd be much appreciated.
sqr.squared <- NULL
is one of many ways initialize the empty vector sqr.squared prior to running it through a loop. In general, when the length of the resulting vector is known, it is much better practice to allocate the vector's length. So here,
sqr.squared <- vector("integer", 50)
would be much better practice. And faster too. This way you are not building the new vector in the loop. But since ^ is vectorized, you could also simply do
sqr[1:50] ^ 2
and ditch the loop all together.
Another way to think about it is to remember that everything in r is a function call, and functions need input (usually).
say you calculated y and want to store that value somewhere. You can do x <- y without initializing an x object (r does this for you unlike in other languages, c for example), but say you want to store it in a specific place in x.
So note that <- (or = in your example) is a function
y <- 1
x[2] <- y
# Error in x[2] <- y : object 'x' not found
This is a different function than <-. Since you want to put y at x[2], you need the function [<-
`[<-`(x, 2, y)
# Error: object 'x' not found
But this still doesn't work because we need the object x to use this function, so initialize x to something.
(x <- numeric(5))
# [1] 0 0 0 0 0
# and now use the function
`[<-`(x, 2, y)
# [1] 0 1 0 0 0
This prefix notation is easier for computers to parse (eg, + 1 1) but harder for humans (me at least), so we prefer infix notation (eg, 1 + 1). R makes such functions easier to use x[2] <- y rather than how I did above.
The first answer is correct, when you assign a NULL value to a variable, the purpose is to initialize a vector. In many cases, when you are working checking numbers or with different types of variables, you will need to set NULL this arrays, matrix, etc.
For example, in you want to create a some type of element, in some cases you will need to put something inside them. This is the purpose of to use NULL. In addition, sometimes you will require NA instead of NULL.

R: passing by parameter to function and using apply instead of nested loop and recursive indexing failed

I have two lists of lists. humanSplit and ratSplit. humanSplit has element of the form::
> humanSplit[1]
$Fetal_Brain_408_AGTCAA_L001_R1_report.txt
humanGene humanReplicate alignment RNAtype
66 DGKI Fetal_Brain_408_AGTCAA_L001_R1_report.txt 6 reg
68 ARFGEF2 Fetal_Brain_408_AGTCAA_L001_R1_report.txt 5 reg
If you type humanSplit[[1]], it gives the data without name $Fetal_Brain_408_AGTCAA_L001_R1_report.txt
RatSplit is also essentially similar to humanSplit with difference in column order. I want to apply fisher's test to every possible pairing of replicates from humanSplit and ratSplit. Now I defined the following empty vector which I will use to store the informations of my fisher's test
humanReplicate <- vector(mode = 'character', length = 0)
ratReplicate <- vector(mode = 'character', length = 0)
pvalue <- vector(mode = 'numeric', length = 0)
For fisher's test between two replicates of humanSplit and ratSplit, I define the following function. In the function I use `geneList' which is a data.frame made by reading a file and has form:
> head(geneList)
human rat
1 5S_rRNA 5S_rRNA
2 5S_rRNA 5S_rRNA
Now here is the main function, where I use a function getGenetype which I already defined in other part of the code. Also x and y are integers :
fishertest <-function(x,y) {
ratReplicateName <- names(ratSplit[x])
humanReplicateName <- names(humanSplit[y])
## merging above two based on the one-to-one gene mapping as in geneList
## defined above.
mergedHumanData <-merge(geneList,humanSplit[[y]], by.x = "human", by.y = "humanGene")
mergedRatData <- merge(geneList, ratSplit[[x]], by.x = "rat", by.y = "ratGene")
## [here i do other manipulation with using already defined function
## getGenetype that is defined outside of this function and make things
## necessary to define following contingency table]
contingencyTable <- matrix(c(HnRn,HnRy,HyRn,HyRy), nrow = 2)
fisherTest <- fisher.test(contingencyTable)
humanReplicate <- c(humanReplicate,humanReplicateName )
ratReplicate <- c(ratReplicate,ratReplicateName )
pvalue <- c(pvalue , fisherTest$p)
}
After doing all this I do the make matrix eg to use in apply. Here I am basically trying to do something similar to double for loop and then using fisher
eg <- expand.grid(i = 1:length(ratSplit),j = 1:length(humanSplit))
junk = apply(eg, 1, fishertest(eg$i,eg$j))
Now the problem is, when I try to run, it gives the following error when it tries to use function fishertest in apply
Error in humanSplit[[y]] : recursive indexing failed at level 3
Rstudio points out problem in following line:
mergedHumanData <-merge(geneList,humanSplit[[y]], by.x = "human", by.y = "humanGene")
Ultimately, I want to do the following:
result <- data.frame(humanReplicate,ratReplicate, pvalue ,alternative, Conf.int1, Conf.int2, oddratio)
I am struggling with these questions:
In defining fishertest function, how should I pass ratSplit and humanSplit and already defined function getGenetype?
And how I should use apply here?
Any help would be much appreciated.
Up front: read ?apply. Additionally, the first three hits on google when searching for "R apply tutorial" are helpful snippets: one, two, and three.
Errors in fishertest()
The error message itself has nothing to do with apply. The reason it got as far as it did is because the arguments you provided actually resolved. Try to do eg$i by itself, and you'll see that it is returning a vector: the corresponding column in the eg data.frame. You are passing this vector as an index in the i argument. The primary reason your function erred out is because double-bracket indexing ([[) only works with singles, not vectors of length greater than 1. This is a great example of where production/deployed functions would need type-checking to ensure that each argument is a numeric of length 1; often not required for quick code but would have caught this mistake. Had it not been for the [[ limit, your function may have returned incorrect results. (I've been bitten by that many times!)
BTW: your code is also incorrect in its scoped access to pvalue, et al. If you make your function return just the numbers you need and the aggregate it outside of the function, your life will simplify. (pvalue <- c(pvalue, ...) will find pvalue assigned outside the function but will not update it as you want. You are defeating one purpose of writing this into a function. When thinking about writing this function, try to answer only this question: "how do I compare a single rat record with a single human record?" Only after that works correctly and simply without having to overwrite variables in the parent environment should you try to answer the question "how do I apply this function to all pairs and aggregate it?" Try very hard to have your function not change anything outside of its own environment.
Errors in apply()
Had your function worked properly despite these errors, you would have received the following error from apply:
apply(eg, 1, fishertest(eg$i, eg$j))
## Error in match.fun(FUN) :
## 'fishertest(eg$i, eg$j)' is not a function, character or symbol
When you call apply in this sense, it it parsing the third argument and, in this example, evaluates it. Since it is simply a call to fishertest(eg$i, eg$j) which is intended to return a data.frame row (inferred from your previous question), it resolves to such, and apply then sees something akin to:
apply(eg, 1, data.frame(...))
Now that you see that apply is being handed a data.frame and not a function.
The third argument (FUN) needs to be a function itself that takes as its first argument a vector containing the elements of the row (1) or column (2) of the matrix/data.frame. As an example, consider the following contrived example:
eg <- data.frame(aa = 1:5, bb = 11:15)
apply(eg, 1, mean)
## [1] 6 7 8 9 10
# similar to your use, will not work; this error comes from mean not getting
# any arguments, your error above is because
apply(eg, 1, mean())
## Error in mean.default() : argument "x" is missing, with no default
Realize that mean is a function itself, not the return value from a function (there is more to it, but this definition works). Because we're iterating over the rows of eg (because of the 1), the first iteration takes the first row and calls mean(c(1, 11)), which returns 6. The equivalent of your code here is mean()(c(1, 11)) will fail for a couple of reasons: (1) because mean requires an argument and is not getting, and (2) regardless, it does not return a function itself (in a "functional programming" paradigm, easy in R but uncommon for most programmers).
In the example here, mean will accept a single argument which is typically a vector of numerics. In your case, your function fishertest requires two arguments (templated by my previous answer to your question), which does not work. You have two options here:
Change your fishertest function to accept a single vector as an argument and parse the index numbers from it. Bothing of the following options do this:
fishertest <- function(v) {
x <- v[1]
y <- v[2]
ratReplicateName <- names(ratSplit[x])
## ...
}
or
fishertest <- function(x, y) {
if (missing(y)) {
y <- x[2]
x <- x[1]
}
ratReplicateName <- names(ratSplit[x])
## ...
}
The second version allows you to continue using the manual form of fishertest(1, 57) while also allowing you to do apply(eg, 1, fishertest) verbatim. Very readable, IMHO. (Better error checking and reporting can be used here, I'm just providing a MWE.)
Write an anonymous function to take the vector and split it up appropriately. This anonymous function could look something like function(ii) fishertest(ii[1], ii[2]). This is typically how it is done for functions that either do not transform as easily as in #1 above, or for functions you cannot or do not want to modify. You can either assign this intermediary function to a variable (which makes it no longer anonymous, figure that) and pass that intermediary to apply, or just pass it directly to apply, ala:
.func <- function(ii) fishertest(ii[1], ii[2])
apply(eg, 1, .func)
## equivalently
apply(eg, 1, function(ii) fishertest(ii[1], ii[2]))
There are two reasons why many people opt to name the function: (1) if the function is used multiple times, better to define once and reuse; (2) it makes the apply line easier to read than if it contained a complex multi-line function definition.
As a side note, there are some gotchas with using apply and family that, if you don't understand, will be confusing. Not the least of which is that when your function returns vectors, the matrix returned from apply will need to be transposed (with t()), after which you'll still need to rbind or otherwise aggregrate.
This is one area where using ddply may provide a more readable solution. There are several tutorials showing it off. For a quick intro, read this; for a more in depth discussion on the bigger picture in which ddply plays a part, read Hadley's Split, Apply, Combine Strategy for Data Analysis paper from JSS.

Vector with elements equal to a function evaluated at a, a+1,... b .in R

I have two integers a and b (with a less than b), as well as a function f(x). Is there a way of getting the vector
x<-(f(a), ..., f(b))
from R without having to explicitly having to write it out? as my a and b vary.
Thanks for your help.
You can try something like the following :
foo <- function(x) x+1
a <- 1
b <- 5
sapply(a:b, foo)
But note that if you need this kind of behavior, you should vectorize your function, ie make it accept a vector as argument instead of a single integer. In my previous example, the sapply is not needed at all : + is vectorized, so I can just do :
foo(a:b)

Why are arguments to replacement functions not evaluated lazily?

Consider the following simple function:
f <- function(x, value){print(x);print(substitute(value))}
Argument x will eventually be evaluated by print, but value never will. So we can get results like this:
> f(a, a)
Error in print(x) : object 'a' not found
> f(3, a)
[1] 3
a
> f(1+1, 1+1)
[1] 2
1 + 1
> f(1+1, 1+"one")
[1] 2
1 + "one"
Everything as expected.
Now consider the same function body in a replacement function:
'g<-' <- function(x, value){print(x);print(substitute(value))}
(the single quotes should be fancy quotes)
Let's try it:
> x <- 3
> g(x) <- 4
[1] 3
[1] 4
Nothing unusual so far...
> g(x) <- a
Error: object 'a' not found
This is unexpected. Name a should be printed as a language object.
> g(x) <- 1+1
[1] 4
1 + 1
This is ok, as x's former value is 4. Notice the expression passed unevaluated.
The final test:
> g(x) <- 1+"one"
Error in 1 + "one" : non-numeric argument to binary operator
Wait a minute... Why did it try to evaluate this expression?
Well the question is: bug or feature? What is going on here? I hope some guru users will shed some light about promises and lazy evaluation on R. Or we may just conclude it's a bug.
We can reduce the problem to a slightly simpler example:
g <- function(x, value)
'g<-' <- function(x, value) x
x <- 3
# Works
g(x, a)
`g<-`(x, a)
# Fails
g(x) <- a
This suggests that R is doing something special when evaluating a replacement function: I suspect it evaluates all arguments. I'm not sure why, but the comments in the C code (https://github.com/wch/r-source/blob/trunk/src/main/eval.c#L1656 and https://github.com/wch/r-source/blob/trunk/src/main/eval.c#L1181) suggest it may be to make sure other intermediate variables are not accidentally modified.
Luke Tierney has a long comment about the drawbacks of the current approach, and illustrates some of the more complicated ways replacement functions can be used:
There are two issues with the approach here:
A complex assignment within a complex assignment, like
f(x, y[] <- 1) <- 3, can cause the value temporary
variable for the outer assignment to be overwritten and
then removed by the inner one. This could be addressed by
using multiple temporaries or using a promise for this
variable as is done for the RHS. Printing of the
replacement function call in error messages might then need
to be adjusted.
With assignments of the form f(g(x, z), y) <- w the value
of z will be computed twice, once for a call to g(x, z)
and once for the call to the replacement function g<-. It
might be possible to address this by using promises.
Using more temporaries would not work as it would mess up
replacement functions that use substitute and/or
nonstandard evaluation (and there are packages that do
that -- igraph is one).
I think the key may be found in this comment beginning at line 1682 of "eval.c" (and immediately followed by the evaluation of the assignment operation's RHS):
/* It's important that the rhs get evaluated first because
assignment is right associative i.e. a <- b <- c is parsed as
a <- (b <- c). */
PROTECT(saverhs = rhs = eval(CADR(args), rho));
We expect that if we do g(x) <- a <- b <- 4 + 5, both a and b will be assigned the value 9; this is in fact what happens.
Apparently, the way that R ensures this consistent behavior is to always evaluate the RHS of an assignment first, before carrying out the rest of the assignment. If that evaluation fails (as when you try something like g(x) <- 1 + "a"), an error is thrown and no assignment takes place.
I'm going to go out on a limb here, so please, folks with more knowledge feel free to comment/edit.
Note that when you run
'g<-' <- function(x, value){print(x);print(substitute(value))}
x <- 1
g(x) <- 5
a side effect is that 5 is assigned to x. Hence, both must be evaluated. But if you then run
'g<-'(x,10)
both the values of x and 10 are printed, but the value of x remains the same.
Speculation:
So the parser is distinguishing between whether you call g<- in the course of making an actual assignment, and when you simply call g<- directly.

Resources