Applying a simple function over a vector using vapply - r

This question is somewhat two-fold. First, I'm trying to figure out the best way to achieve a simple implementation of a function over a vector. Second, I'm trying to understand the meaning of FUN.VALUE in vapply.
I've defined a simple halfwave rectification function:
Fun = function(x) x[x<0]=0
I then define a vector:
A = -10:10
I wish to apply this function to the vector. I've tried a number of approaches, but can't figure out how to do it easily.
I could always do
A[A<0]=0
but that defeats the purpose of using a function.
I've tried defining the function so that it applies to each element of the vector, rather than the vector as a whole:
Fun = function(x) {if (x<0) {x=0} }
I can't use apply, since apply only seems to work when the object that you're applying the function to has dimensions, and a vector doesn't have dimensions in R for some reason.
If I use lapply, I get NULL, or 0 for each element of the output object, depending on which of the two above functions I use.
For vapply, I can't make any sense out of the documentation for FUN.VALUE
Here's what it says:
FUN.VALUE
a (generalized) vector; a template for the return value from FUN. See ‘Details’.
and in Details:
vapply returns a vector or array of type matching the FUN.VALUE. If length(FUN.VALUE) == 1 a vector of the same length as X is returned, otherwise an array. If FUN.VALUE is not an array, the result is a matrix with length(FUN.VALUE) rows and length(X) columns, otherwise an array a with dim(a) == c(dim(FUN.VALUE), length(X)).
This makes no sense to me. In my case, A is a vector with 21 elements, and clearly I want the output to be a 21-element vector. So how do I specify a generalized 21-element vector template? And wouldn't the length of such a template be 21, and not 1?
I'd really appreciate some help here. I'm finding the documentation to be incredibly frustrating, and it's hard to tell whether it's because I'm missing something obvious, or whether the documentation is vague.

Your function has a bug in it. Instead of
Fun = function(x) x[x<0]=0
You actually want:
correct_fun = function(x){
x[x<0]=0
return(x)
}
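As written, your function does not return the modified vector. It changes a local copy of x, but the value of a function is the value of its last expression, and here that is the assignment, which evaluates to 0 (returned invisibly). The modified x is discarded when the function exits. Executing correct_fun on A I get: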
[1] 0 0 0 0 0 0 0 0 0 0 0 1 2 3 4 5 6 7 8 9 10
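On the FUN.VALUE part of the question: FUN.VALUE is a template for what a single call to FUN returns, not for the whole result. A minimal sketch, assuming an element-wise version of the function (rectify_one is a name made up here for illustration):

```r
A <- -10:10

# Hypothetical element-wise rectifier: takes one number, returns one number.
rectify_one <- function(x) if (x < 0) 0 else x

# numeric(1) says "each call returns a single numeric value", so the
# result is a plain vector with one entry per element of A.
rectified <- vapply(A, rectify_one, FUN.VALUE = numeric(1))

# The vectorized base function pmax() gives the same values without any apply call.
same <- pmax(A, 0)
```

Because length(FUN.VALUE) is 1, the result is a vector as long as A; a longer template such as numeric(2) would produce a matrix with two rows instead.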

Related

Iterating over the argument of a function (grep) passed to lapply

I currently am doing an operation similar to below:
v<-c("my","pig","is","big","with","a","name")
s<-c("m","g")
for(i in c(1:length(s))){
print(grep(v,pattern=s[i]))
}
Which prints
[1] 1 7
[1] 2 4
I would like to instead vectorize this operation where the return values are stored in a vector. I tried
mynewvector<-lapply(v,grep,pattern=s,x=v)
but the problem is that I don't know how to get lapply to iterate over the elements passed as arguments (e.g. iterating over s). I saw this answer, but I don't think mapply works here because I am trying to hold one argument constant (x=v) and iterate over the other argument (pattern=s).
How would I do this?
Following up on d.b.'s response, the clearest solution is
lapply(s, function(a) grep(pattern = a, x = v))
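As a rough, runnable version of the above using the data from the question (the names matches and short are only illustrative): lapply iterates over s, and v is held constant inside the anonymous function.

```r
v <- c("my", "pig", "is", "big", "with", "a", "name")
s <- c("m", "g")

# One grep call per pattern in s; v stays fixed.
matches <- lapply(s, function(a) grep(pattern = a, x = v))
names(matches) <- s   # label each result with its pattern

# Equivalent shorthand: pattern is grep's first argument, so lapply
# fills it with each element of s, while x = v is passed through "...".
short <- lapply(s, grep, x = v)
```

The result is a list rather than a vector because each pattern can match a different number of elements.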

what is the purpose of 'NULL' in processing loops?

sqr = seq(1, 100, by=2)
sqr.squared = NULL
for (n in 1:50)
{
sqr.squared[n] = sqr[n]^2
}
I came across the loop above; for a beginner it was simple enough. To further understand R: what is the precise purpose of the second line? From my research I gather it has something to do with resetting the vector. If someone could elaborate, it'd be much appreciated.
sqr.squared <- NULL
is one of many ways to initialize the empty vector sqr.squared before filling it in a loop. In general, when the length of the resulting vector is known, it is much better practice to allocate the vector at its full length up front. So here,
sqr.squared <- numeric(50)
would be much better practice, and faster too, because the vector is not regrown on every iteration. (numeric rather than integer, since sqr and its squares are doubles.) But since ^ is vectorized, you could also simply do
sqr[1:50] ^ 2
and ditch the loop altogether.
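To make the comparison concrete, here is a small sketch of the three approaches side by side (the names grown, prealloc and vectorized are illustrative only). All three should give the same values; they differ only in how the result vector is built.

```r
sqr <- seq(1, 100, by = 2)   # 50 odd numbers, stored as doubles

# 1. Start from NULL and let the vector grow on each assignment.
grown <- NULL
for (n in seq_along(sqr)) {
  grown[n] <- sqr[n]^2
}

# 2. Preallocate the full length, then fill in place.
prealloc <- numeric(length(sqr))
for (n in seq_along(sqr)) {
  prealloc[n] <- sqr[n]^2
}

# 3. No loop at all: ^ works on the whole vector.
vectorized <- sqr^2
```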
Another way to think about it is to remember that everything that happens in R is a function call, and functions (usually) need input.
Say you calculated y and want to store that value somewhere. You can do x <- y without first creating an x object (R does this for you, unlike C, for example), but say you want to store it in a specific place in x.
Note that <- (or = in your example) is a function:
y <- 1
x[2] <- y
# Error in x[2] <- y : object 'x' not found
This is a different function than <-. Since you want to put y at x[2], you need the function [<-
`[<-`(x, 2, y)
# Error: object 'x' not found
But this still doesn't work because we need the object x to use this function, so initialize x to something.
(x <- numeric(5))
# [1] 0 0 0 0 0
# and now use the function
`[<-`(x, 2, y)
# [1] 0 1 0 0 0
This prefix notation is easier for computers to parse (e.g., + 1 1) but harder for humans (me at least), so we prefer infix notation (e.g., 1 + 1). R lets you write x[2] <- y instead of calling the function explicitly as I did above.
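A short sketch of how the two notations relate, and of why NULL works as a starting value (copy and z are illustrative names):

```r
y <- 1
x <- numeric(5)

# Calling `[<-` directly returns a modified copy; x itself is unchanged.
copy <- `[<-`(x, 2, y)

# The usual form both computes that copy and assigns it back to x.
x[2] <- y

# NULL is a valid starting object for `[<-`: the vector is created
# and padded with NA as needed.
z <- NULL
z[3] <- 9
```

After this, x and copy hold the same values, and z is NA NA 9.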
The first answer is correct: when you assign NULL to a variable, the purpose is to initialize a vector. In many cases, whether you are working with numbers or with other types of variables, you will need to initialize arrays, matrices, etc. in this way before filling them.
For example, if you want to create some kind of object, in some cases you will need to put something inside it first. That is the purpose of using NULL. In addition, sometimes you will need NA instead of NULL.

parameter passing mechanism in R

The following function is used to multiply a sequence 1:x by y
f1<-function(x,y){return (lapply(1:x, function(a,b) b*a, b=y))}
It looks like a represents each element of the sequence 1:x, but I do not understand this parameter-passing mechanism. Other OO languages, like Java or C++, have call by reference or call by value.
Short answer: R is call by value. Long answer: it can do both.
Call By Value, Lazy Evaluation, and Scoping
For more details, read through the R language definition.
R mostly uses call by value, but this is complicated by its lazy evaluation.
So you can have a function:
f <- function(x, y) {
x * 3
}
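If you pass two big matrices as x and y, only x is ever evaluated inside f, because y is never used: an argument is a promise that is forced only when its value is first needed. (Even x is not copied unless the function modifies it; R copies on modification.)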
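Both behaviours can be checked with a small sketch (zero_first, orig and changed are hypothetical names for this illustration):

```r
# Call by value: the function works on its own copy of v,
# so the caller's vector is left untouched.
zero_first <- function(v) {
  v[1] <- 0
  v
}
orig <- c(5, 6, 7)
changed <- zero_first(orig)

# Lazy evaluation: y is never used inside f, so the stop() call
# passed as y is never run and no error is raised.
f <- function(x, y) {
  x * 3
}
result <- f(2, stop("y was evaluated"))
```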
But you can also access variables in parent environments of f:
y <- 5
f <- function(x) {
x * y
}
f(3) # 15
Or even:
y <- 5
f <- function() {
x <- 3
g <- function() {
x * y
}
}
f() # returns function g()
f()() # returns 15
Call By Reference
There are two ways for doing call by reference in R that I know of.
One is by using Reference Classes, one of the three object oriented paradigms of R (see also: Advanced R programming: Object Oriented Field Guide)
The other is to use the bigmemory package and its big.matrix objects (see The bigmemory project). This allows you to create matrices in memory (underlying data is stored in C), returning a pointer to the R session. This allows you to do fun things like accessing the same matrix from multiple R sessions.
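As a minimal sketch of the first option, here is a Reference Class whose objects are modified in place when passed to a function. The Counter class and the bump function are invented for this example and are not from any package.

```r
library(methods)  # Reference Classes live in the methods package

# Hypothetical Counter class used only for this illustration.
Counter <- setRefClass(
  "Counter",
  fields = list(count = "numeric"),
  methods = list(
    increment = function() {
      count <<- count + 1   # modifies the object in place
      invisible(.self)
    }
  )
)

# bump() changes the caller's object, because RC objects are not copied
# when they are passed as arguments.
bump <- function(counter) counter$increment()

cnt <- Counter$new(count = 0)
bump(cnt)
bump(cnt)
```

After the two calls, cnt$count is 2, even though bump never returns or reassigns cnt in the calling code.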
To multiply a vector x by a constant y just do
x * y
The *apply functions all work in much the same way: each maps a function over every element of a vector, list, matrix and so on:
x = 1:10
x.squared = sapply(x, function(elem)elem * elem)
print(x.squared)
[1] 1 4 9 16 25 36 49 64 81 100
It gets better with matrices and data frames because you can now apply a function over all rows or columns, and collect the output. Like this:
m = matrix(1:9, ncol = 3)
# The 1 below means apply over rows, 2 would mean apply over cols
row.sums = apply(m, 1, function(some.row) sum(some.row))
print(row.sums)
[1] 12 15 18
If you're looking for a simple way to multiply a sequence by a constant, definitely use @Fernando's answer or something similar. I'm assuming you're just trying to determine how parameters are being passed in this code.
lapply calls its second argument (in your case function(a, b) b*a) with each of the values of its first argument 1, 2, ..., x. Those values will be passed as the first parameter to the second argument (so, in your case, they will be argument a).
Any additional parameters to lapply after the first two, in your case b=y, are passed to the function by name. So if you called your inner function fxn and y were 4, your invocation of lapply would make calls like fxn(1, b=4), fxn(2, b=4), .... The parameters are passed by value.
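The description above can be checked directly. In this sketch, fxn is the hypothetical name used in the text for the inner function, and x = 3, y = 4 are arbitrary example values:

```r
f1 <- function(x, y) {
  lapply(1:x, function(a, b) b * a, b = y)
}

res <- f1(3, 4)

# The same three calls written out by hand.
fxn <- function(a, b) b * a
by_hand <- list(fxn(1, b = 4), fxn(2, b = 4), fxn(3, b = 4))
```

Both res and by_hand are lists holding 4, 8 and 12.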
You should read the help page for lapply to understand how it works. Read this excellent answer for a good explanation of the different *apply family functions.
From the help of lapply:
lapply(X, FUN, ...)
Here FUN is applied to each element of X, and ... refers to:
... optional arguments to FUN.
Since FUN has an optional argument b, we replace the ... with b=y.
You can see it as syntactic sugar that emphasizes that argument b is optional, in contrast to argument a. If the two arguments are symmetric, it may be better to use mapply.
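For comparison, a small sketch of the mapply alternative with arbitrary example inputs (paired and fixed are illustrative names). mapply walks over several arguments in parallel, and MoreArgs holds any arguments that should stay constant.

```r
# Both arguments vary together: a[i] is paired with b[i].
paired <- mapply(function(a, b) b * a, 1:3, 4:6)

# One argument held constant via MoreArgs, which gives the same
# values as lapply(1:3, function(a, b) b * a, b = 4).
fixed <- mapply(function(a, b) b * a, 1:3, MoreArgs = list(b = 4))
```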

Vector with elements equal to a function evaluated at a, a+1,... b .in R

I have two integers a and b (with a less than b), as well as a function f(x). Is there a way of getting the vector
x<-(f(a), ..., f(b))
from R without having to write it out explicitly, since my a and b vary?
Thanks for your help.
You can try something like the following :
foo <- function(x) x+1
a <- 1
b <- 5
sapply(a:b, foo)
But note that if you need this kind of behavior, you should vectorize your function, i.e. make it accept a vector as its argument instead of a single integer. In my previous example, the sapply is not needed at all: + is vectorized, so I can just do:
foo(a:b)
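When the function cannot easily be vectorized, a type-checked loop with vapply is one option. This is only a sketch: g is a made-up function that handles a single value at a time because if() needs a single TRUE/FALSE condition.

```r
a <- 1
b <- 5

# Hypothetical scalar-only function: halve even numbers, 3x + 1 for odd ones.
g <- function(x) if (x %% 2 == 0) x / 2 else 3 * x + 1

# numeric(1) declares that each call returns exactly one number.
res <- vapply(a:b, g, numeric(1))
```

Here res holds g(1) through g(5), that is 4, 1, 10, 2 and 16.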

vectorize a bidimensional function in R

I have some true and predicted labels
truth <- factor(c("+","+","-","+","+","-","-","-","-","-"))
pred <- factor(c("+","+","-","-","+","+","-","-","+","-"))
and I would like to build the confusion matrix.
I have a function that works on single elements
f <- function(x,y){ sum(y==pred[truth == x])}
however, when I apply it to the outer product, to build the matrix, R seems unhappy.
outer(levels(truth), levels(truth), f)
Error in outer(levels(x), levels(x), f) :
dims [product 4] do not match the length of object [1]
What is the recommended strategy for this in R ?
I can always go through higher order stuff, but that seems clumsy.
I sometimes fail to understand where outer goes wrong, too. For this task I would have used the table function:
> table(truth,pred) # arguably a lot less clumsy than your effort.
     pred
truth - +
    - 4 2
    + 1 3
In this case, you are testing whether a multivalued vector is "==" to a scalar.
outer assumes that the function passed to FUN can take vector arguments and work properly with them. If m and n are the lengths of the two vectors passed to outer, it will first create two vectors of length m*n such that every combination of inputs occurs, and pass these as the two new vectors to FUN. In return, outer expects FUN to give back another vector of length m*n.
The function described in your example doesn't really do this. In fact, it doesn't handle vectors correctly at all.
One way is to define another function that can handle vector inputs properly, or alternatively, if your program actually requires a simple matching, you could use table() as in @DWin's answer.
If you're redefining your function, outer expects a function that can be called with inputs such as:
f(c("+","+","-","-"), c("+","-","+","-"))
and per your example, ought to return,
c(3,1,2,4)
There is also the small matter of decoding the actual meaning of the error:
Again, if m and n are the lengths of the two vectors passed to outer, it will first create a vector of length m*n, and then reshape it using (basically)
dim(output) = c(m,n)
This is the line that gives an error, because outer is trying to shape the output into a 2x2 matrix (total 2*2 = 4 items) while the function f, assuming no vectorization, has given only 1 output. Hence,
Error in outer(levels(x), levels(x), f) :
dims [product 4] do not match the length of object [1]
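If you want to keep the original scalar function, one common workaround is to wrap it with base R's Vectorize so that it returns one value per input pair, which is what outer needs. A sketch using the data from the question (vf and cm are illustrative names):

```r
truth <- factor(c("+", "+", "-", "+", "+", "-", "-", "-", "-", "-"))
pred  <- factor(c("+", "+", "-", "-", "+", "+", "-", "-", "+", "-"))

# Original function: only meaningful for a single x and a single y.
f <- function(x, y) sum(y == pred[truth == x])

# Vectorize() wraps f so it is called once per (x, y) pair.
vf <- Vectorize(f)

cm <- outer(levels(truth), levels(pred), vf)
dimnames(cm) <- list(truth = levels(truth), pred = levels(pred))
```

The resulting matrix should contain the same counts as table(truth, pred). This is still a loop under the hood, so table() remains the simpler choice for a plain confusion matrix.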
