FUN.VALUE argument in vapply - r

I don't really get the FUN.VALUE argument in vapply.
Here is my example:
a = list(list(1,2), list(1), list(1,2,3))
# give the lengths of each list in a
sapply(a, length)
Now, I try to make it type-safe using vapply instead of sapply
# gives me same result as sapply
vapply(a, length, FUN.VALUE=1)
# same result, but why?
vapply(a, length, FUN.VALUE=1000)
# gives me error
vapply(a, length, FUN.VALUE="integer")
# gives me error
vapply(a, length, FUN.VALUE="vector")
# gives me error
vapply(a, length, FUN.VALUE=c(1,2))
From ?vapply I read that FUN.VALUE can be a scalar, vector or matrix, and is used to match the type of the output. Any hints for why vapply behaves this way?

From the vapply documentation,
FUN.VALUE (generalized) vector; a template for the return value from FUN. See ‘Details’.
Notice that it says "a template for the return value", not "a character string describing the return value". Looking in the Details section provides more guidance:
This function checks that all values of FUN are compatible with the FUN.VALUE, in that they must have the same length and type. [Emphasis added]
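The function in your example, length(), returns an integer vector of length 1. When you use FUN.VALUE = 1, the template is a double of length 1: the length matches, and an integer result may be promoted to double, so the check passes. FUN.VALUE = 1000 works for the same reason, because only the type and length of the template matter, not its value. When you use FUN.VALUE = "integer", "integer" is a character vector of length 1, so the length check passes but the type check fails. FUN.VALUE = c(1,2) fails the other way round: the type is acceptable, but the template has length 2 while length() returns a single value.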
It continues:
(Types may be promoted to a higher type within the ordering logical < integer < double < complex, but not demoted.)
So, if you were vapplying a function that might return a value that is integer or double, you should make sure to specify something like FUN.VALUE = 1.0
The documentation continues to talk about how arrays/matrices are handled. I won't copy/paste it here.
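The length and type checks described above can be tried out directly. The following is a minimal sketch using the list from the question; the helper name fails() is made up for this illustration and is not part of R.

```r
a <- list(list(1, 2), list(1), list(1, 2, 3))

# Template with exactly the type length() returns: integer, length 1.
exact <- vapply(a, length, FUN.VALUE = integer(1))

# A double template is also accepted; the integer results are promoted.
promoted <- vapply(a, length, FUN.VALUE = numeric(1))

# Hypothetical helper: TRUE if vapply() signals an error for this template.
fails <- function(template) {
  inherits(try(vapply(a, length, FUN.VALUE = template), silent = TRUE),
           "try-error")
}

typeof(exact)
typeof(promoted)
fails(character(1))  # wrong type
fails(numeric(2))    # right type family, wrong length
fails(logical(1))    # integer cannot be demoted to logical
```

Writing the template as integer(1), numeric(1) or character(1) rather than as a literal such as 1 tends to make the intended type and length easier to read.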

Related

I'm having trouble calling the vectors in an array rather than the whole array as a single vector

I'm creating an array which contains ten-thousand vectors, where each vector has 4 character vectors, which can be either "win" or "lose".
I then want to take each individual vector and use the "any" function to return TRUE if any one element of that vector is "win", and FALSE otherwise. In other words, if the vector is c("lose", "lose", "lose", "lose"), it returns FALSE, and otherwise TRUE.
I of course wish to do this all at once. I thought it could be done either by passing the array of vectors to the "any" function and getting an array back, as some other functions allow, or by using the "apply" function with the array and "any() == TRUE" as arguments.
B <- 10000
set.seed(1)
a <- replicate(B, sample(c("lose","win"), 4, replace = TRUE, prob = c(0.6, 0.4)))
Option 1
celtic_wins <- any(a[,1:10000] == "win")
OR
Option 2
celtic_wins <- apply(a, any() == "win")
What actually happens in both cases (I think, but can't be sure) is that the array gets flattened into a single vector of 40,000 elements, which R then checks for a single "win" anywhere in the whole lot (which is the case with probability 99.99999999...%), and thus returns a single TRUE rather than 10,000 Boolean values.
If this is the case, I don't know how to create a work around; please help?
Does this give you what you want?
apply(a,MARGIN = 2, FUN = function(x) {any(x=="win")})
As @Gregor mentioned below, this can be simplified to:
apply(a == "win", MARGIN = 2, any)
The first version may help you understand the apply() function better and what the arguments are doing, but once you understand what apply() is doing I would use the second version (@Gregor's version) in production as it is simpler and cleaner.
You can see if 'win' is part of a column x by checking whether the sum of x == 'win' is positive. a == 'win' will give a matrix with the same dimensions as a, with elements equal to TRUE if the corresponding element of a is 'win', and FALSE otherwise. colSums(a == 'win') creates a vector whose i-th element is the sum of column i in the matrix a == 'win'.
colSums(a == 'win') > 0
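As a quick check that the three suggestions agree, here is a small sketch. It uses 1,000 columns instead of 10,000 only to keep it fast; the variable names by_apply, by_apply_short and by_colsums are made up for this illustration.

```r
set.seed(1)
B <- 1000
a <- replicate(B, sample(c("lose", "win"), 4, replace = TRUE,
                         prob = c(0.6, 0.4)))

# One logical value per column (MARGIN = 2), three equivalent ways.
by_apply       <- apply(a, MARGIN = 2, FUN = function(x) any(x == "win"))
by_apply_short <- apply(a == "win", MARGIN = 2, any)
by_colsums     <- colSums(a == "win") > 0

length(by_apply)
identical(by_apply, by_apply_short)
identical(by_apply, by_colsums)

# Share of columns that contain at least one "win".
mean(by_colsums)
```

The colSums() version avoids calling an R function once per column, so it is likely to be the fastest of the three on large matrices.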

How can I pass NULL as an argument in R?

I have a vector arguments, where some values are NA. I would like to pass these arguments sequentially to a function like this:
myFunction(argument = ifelse(!is.na(arguments[i]),
arguments[i], NULL))
so that it will take the value in arguments[i] whenever it's not NA, and take the default NULL otherwise. But this generates an error.
If it makes any difference, the function in question is match_on(), from the optmatch package. The argument in question is caliper, because I would like to provide a caliper only when one is available (i.e. when the value in the vector of calipers is not NA). And the error message is this:
Error in ans[!test & ok] <- rep(no, length.out = length(ans))[!test & :
replacement has length zero
In addition: Warning message:
In rep(no, length.out = length(ans)) :'x' is NULL so the result will be NULL
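ifelse() is vectorised and always builds a result of the same length as its test, so it cannot return NULL (which has length zero); that is what the error message is complaining about. You can use switch() instead of ifelse():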
myFunction(argument = switch(is.na(arguments[i]) + 1, arguments[i], NULL))
Here's the help doc for switch -
switch(EXPR, ...)
Arguments
EXPR an expression evaluating to a number or a character string.
... the list of alternatives. If it is intended that EXPR has a
character-string value these will be named, perhaps except for one
alternative to be used as a ‘default’ value.
Details
switch works in two distinct ways depending whether the first argument
evaluates to a character string or a number.
If the value of EXPR is not a character string it is coerced to
integer. If the integer is between 1 and nargs()-1 then the
corresponding element of ... is evaluated and the result returned:
thus if the first argument is 3 then the fourth argument is evaluated
and returned
Basically, when argument is NA then EXPR evaluates to 2 which returns NULL and when it is not NA then EXPR evaluates to 1 and returns arguments[i].
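To see the selection in isolation, here is a small sketch. The values in arguments and the helper names pick_switch() and pick_if() are made up for this illustration; the plain if/else variant is an alternative that is not part of the answer above, shown only for comparison.

```r
arguments <- c(0.5, NA, 2)

# The switch() approach from the answer: position 1 when not NA, 2 when NA.
pick_switch <- function(x) switch(is.na(x) + 1, x, NULL)

# Alternative (an assumption, not from the answer): if/else works on a
# single value and, unlike ifelse(), is free to return NULL.
pick_if <- function(x) if (is.na(x)) NULL else x

pick_switch(arguments[1])
is.null(pick_switch(arguments[2]))
is.null(pick_if(arguments[2]))

# ifelse() fails here because NULL cannot fill a length-one result.
ifelse_fails <- inherits(
  try(suppressWarnings(ifelse(!is.na(arguments[2]), arguments[2], NULL)),
      silent = TRUE),
  "try-error"
)
ifelse_fails
```

Either helper could then be used as myFunction(argument = pick_if(arguments[i])).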

Confused by a vapply function using grepl internally (Part of datacamp course)

hits <- vapply(titles,
FUN = grepl,
FUN.VALUE = logical(length(pass_names)),
pass_names)
titles is a vector with titles such as "mr", pass_names is a list of names.
Two questions:
1. I don't understand the resulting matrix hits.
2. I don't understand why the last line is pass_names, nor how I am supposed to know about these 4 arguments. ?vapply specifies X, FUN and FUN.VALUE, but I cannot figure out how I am supposed to know that pass_names needs to be listed there.
I have looked online and could not find an answer, so I hope this will help others too. Thank you in advance for your answers, yes I am a beginner.
Extra info: This question uses the titanic package in R, pass_names is just titanic$Name, titles is just paste(",", c("Mr\\.", "Master", "Don", "Rev", "Dr\\.", "Major", "Sir", "Col", "Capt", "Jonkheer"))
You're right to be a bit confused.
The vapply code chunk in your question is equivalent to:
hits <- vapply(titles,
FUN = function(x) grepl(x, pass_names),
FUN.VALUE = logical(length(pass_names)))
vapply takes a ... argument, which collects any additional arguments and passes them on to FUN. If those arguments are not named (see @Roland's comment), the n-th argument in the ... position is passed as the (n+1)-th argument of FUN (the first argument of FUN is the current element of X, i.e. of titles in this case).
The resulting matrix has the same number of rows as the number of rows in titanic and has 10 columns, the length of titles. The [i, j]-th entry is TRUE if the i-th pass_names matches the j-th regular expression in titles, FALSE if it doesn't.
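The shape of the result is easier to see on a tiny input. The sketch below uses four example names and three titles chosen for this illustration rather than the full titanic data, so the vectors here are stand-ins for the real pass_names and titles.

```r
# Stand-in data (assumption): a few passenger-style names and titles.
pass_names <- c("Braund, Mr. Owen Harris",
                "Palsson, Master. Gosta Leonard",
                "Uruchurtu, Don. Manuel E",
                "Cumings, Mrs. John Bradley")
titles <- paste(",", c("Mr\\.", "Master", "Don"))

# pass_names lands in ... and becomes the second argument of grepl(),
# so each call is grepl(pattern = <one title>, x = pass_names).
hits <- vapply(titles,
               FUN = grepl,
               FUN.VALUE = logical(length(pass_names)),
               pass_names)

dim(hits)        # rows follow pass_names, columns follow titles
colnames(hits)   # column names are taken from titles
hits
```

Each column is the logical vector returned by one grepl() call, which is why FUN.VALUE has to be a logical template of length(pass_names).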
Essentially you are passing two vectors in your vapply which is equivalent to two nested for loops. Each pairing is then passed into the required arguments of grepl: grepl(pattern, x).
Specifically, on first loop of vapply the first item in titles is compared with every item of pass_names. Then on second loop, the second item in titles is compared again to all items of pass_names and so on until first vector, titles, is exhausted.
To illustrate, you can build an equivalent hits2 matrix using nested for loops that reproduces your vapply output, hits:
# one row per passenger name, one column per title
hits2 <- matrix(NA, nrow = length(pass_names), ncol = length(titles))
colnames(hits2) <- titles
for (i in seq_along(pass_names)) {
  for (j in seq_along(titles)) {
    hits2[i, j] <- grepl(pattern = titles[j], x = pass_names[i])
  }
}
all.equal(hits, hits2)
# [1] TRUE
Alternatively, you can run exactly the same call with sapply, which does not need the FUN.VALUE argument: sapply is a wrapper around lapply, and vapply is similar to sapply but with a pre-specified return type. vapply is generally preferred because you assert the shape and type of the output up front, whereas the structure sapply returns depends on what the function happens to return. For instance, with vapply you could produce an integer matrix with FUN.VALUE = integer(length(pass_names)).
hits3 <- sapply(titles, FUN = grepl, pass_names)
all.equal(hits, hits3)
# [1] TRUE
All in all, the apply family offers a more concise, compact way to run iterations: it returns a data structure directly, instead of requiring you to initialize a vector or matrix and fill it with for or while loops.
For further reading, consider this interesting SO post: Is the “*apply” family really not vectorized?

How does is.integer() work in R

I have a data frame of numerics, integers and strings. I would like to check which columns are integers, so I do
raw<-read.csv('./rawcorpus.csv',head=F)
ints<-sapply(raw,is.integer)
However, this gives me all FALSE, so I have to make a little change:
nums<-sapply(raw,is.numeric)
ints2<-sapply(raw[,nums],function(col){return(!(sum(col%%1)==0))})
The second case works fine. My question is: what does the 'is.integer' function actually check?
By default, a number typed without a suffix is stored as a double-precision floating point value, i.e. the numeric type. Three useful functions, class, typeof and storage.mode, will tell you how a value is stored. Try:
x <- 1
class(x)
typeof(x)
storage.mode(x)
If you want x to be the integer 1, you should add the suffix "L":
x <- 1L
class(x)
typeof(x)
storage.mode(x)
Or, you can cast numeric to integers by:
x <- as.integer(1)
class(x)
typeof(x)
storage.mode(x)
The is.integer function checks whether the storage mode is integer or not. Compare
is.integer(1)
is.integer(1L)
You should be aware that some functions return a numeric (double) even when the result is a whole number and you might expect an integer. These include round, floor and ceiling applied to doubles, and the mod operator %% when either operand is a double.
From R documentation:
is.integer(x) does not test if x contains integer numbers! For that, use round, as in the function is.wholenumber(x) in the examples.
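So is.integer(x) returns TRUE only when x is stored with type integer; whether its values happen to be whole numbers is irrelevant. A single number is itself a vector of length one in R, so is.integer(1) is FALSE because 1 is stored as a double, while is.integer(1L) is TRUE.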
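The difference between "stored as integer" and "contains whole numbers" can be sketched as follows. is.wholenumber() is the helper from the examples on the ?integer help page; the data frame and its column names are made up for this illustration.

```r
# From the examples in ?integer: tests values, not storage type.
is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) {
  abs(x - round(x)) < tol
}

# Hypothetical data: "id" holds whole numbers but is stored as double.
raw <- data.frame(id    = c(1, 2, 3),
                  score = c(1.5, 2, 3.25),
                  label = c("a", "b", "c"),
                  count = c(4L, 5L, 6L),
                  stringsAsFactors = FALSE)

# Which columns are stored with type integer?
stored_as_integer <- vapply(raw, is.integer, logical(1))

# Which numeric columns contain only whole numbers?
whole_valued <- vapply(raw,
                       function(col) is.numeric(col) && all(is.wholenumber(col)),
                       logical(1))

stored_as_integer
whole_valued
```

Columns containing NA would need extra handling (for example na.rm = TRUE inside all()), which this sketch leaves out.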
Hope that helps
Source: https://stat.ethz.ch/R-manual/R-devel/library/base/html/integer.html

Sapply over list of matrices - knowing index of current element

I am quite new to R and I have found one vector operation frustrating:
I just want to know the index of the current element of the list while using sapply, say to print it, but none of my attempts work, e.g.:
> test <- sapply(my.list.of.matrices,
function(x) print(which(my.list.of.matrices == x)))
Error in which(my.list.of.matrices == x) :
(list) object cannot be coerced to type 'logical'
In addition: Warning message:
In my.list.of.matrices == x :
longer object length is not a multiple of shorter object length
That's not possible unless you pass an index vector. sapply and lapply just pass the elements. And in that case it becomes a disguised for-loop.
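Passing an index vector could look like the sketch below: iterate over seq_along() of the list and look each element up by position inside the function. The three small matrices are made up for this illustration.

```r
# Hypothetical stand-in for the list from the question.
my.list.of.matrices <- list(matrix(1:4, nrow = 2),
                            matrix(1:6, nrow = 2),
                            matrix(1:9, nrow = 3))

# Iterate over positions instead of elements, so i is the current index.
test <- sapply(seq_along(my.list.of.matrices), function(i) {
  m <- my.list.of.matrices[[i]]
  sprintf("matrix %d has %d columns", i, ncol(m))
})

test
```

If the list has names, iterating over names(my.list.of.matrices) in the same way gives access to the name of the current element instead of its position.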
