outer reuses first element of X instead of doing its job - r

I have a two argument function that takes as its first input a triple of pairs of numbers in the form "(a, b)(c, d)(e, f)" (as a character string) and as second argument a pair of numbers (also written as a character string of the form "(a, b)") and outputs a logical that states if the pair (the second argument) is one of the three pairs in the triple (the first argument). I actually wrote two versions:
version1 <- function(x, y) { # x is a triple of pairs, y is a pair
  pairsfromthistriple <- paste(c("", "(", "("), strsplit(x, split = ")(", fixed = TRUE)[[1]], c(")", ")", ""), sep = "")
  y %in% pairsfromthistriple
}
version2 <- function(x, y) { # x is a triple of pairs, y is a pair
  y == substr(x, 1, 6) | y == substr(x, 7, 12) | y == substr(x, 13, 18)
}
I want to set this function loose on every triple-of-pairs from a vector of triples and every pair from some vector of pairs using outer. Here I'll use the following very short vectors:
triples <- c("(1, 2)(3, 4)(5, 6)", "(1, 2)(3, 5)(4, 6)")
names(triples) <- triples
pairs <- c("(5, 6)", "(3, 5)")
names(pairs) <- pairs
So here we go:
test1 <- outer(X = triples, Y = pairs, FUN = version1)
test2 <- outer(X = triples, Y = pairs, FUN = version2)
test2 evaluates to exactly what you expect, but test1 gives a nonsensical output:
> test1
                   (5, 6) (3, 5)
(1, 2)(3, 4)(5, 6)   TRUE  FALSE
(1, 2)(3, 5)(4, 6)   TRUE  FALSE
> test2
                   (5, 6) (3, 5)
(1, 2)(3, 4)(5, 6)   TRUE  FALSE
(1, 2)(3, 5)(4, 6)  FALSE   TRUE
The natural conclusion is that there is an error in version1, but it is not as simple as that. 'Manually' computing the terms in the matrix using version1 gives:
> version1(triples[1], pairs[1])
[1] TRUE
> version1(triples[1], pairs[2])
[1] FALSE
> version1(triples[2], pairs[1])
[1] FALSE
> version1(triples[2], pairs[2])
[1] TRUE
exactly as it should! So at least part of the fault is with the function outer. In fact what happens (in this small example it is not so clear, but this is very visible in larger examples) is that outer correctly computes the first row of its output matrix, but then copies this first row over and over to make up the subsequent rows. Obviously this is not what I want. If I only wanted to compute version1(x, y) for all y in some vector but just one single x, I would have used sapply rather than outer.
What is going on here?

Note this detail from the documentation for ?outer:
X and Y must be suitable arguments for FUN. Each will be extended by rep to length the products of the lengths of X and Y before FUN is called.
FUN is called with these two extended vectors as arguments (plus any arguments in ...). It must be a vectorized function (or the name of one) expecting at least two arguments and returning a value with the same length as the first (and the second).
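A small probe makes the quoted behaviour visible. This is only an illustrative sketch: probe and calls are made-up names, not part of the question's code. It suggests that outer() calls FUN a single time, with both arguments already recycled to the full length of the result.

```r
# Hypothetical probe: records how outer() actually calls FUN.
calls <- 0
probe <- function(x, y) {
  calls <<- calls + 1                         # count invocations of FUN
  message("x: ", paste(x, collapse = " "))   # x: a b a b a b
  message("y: ", paste(y, collapse = " "))   # y: p p q q r r
  paste0(x, y)                                # vectorized: one result per (x, y) element
}

res <- outer(c("a", "b"), c("p", "q", "r"), probe)
calls     # 1: FUN was called once, not once per cell
dim(res)  # 2 3
```

Since strsplit(x, ...)[[1]] in version1 only looks at the first element of that long x, every pair ends up being compared against the first triple.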
Your version1 function is not vectorized properly like version2 is. You can see this by calling it directly on the original triples and pairs vectors. Each pair occurs in the triple at the same position, so a vectorized function should return TRUE for both elements.
version1(triples, pairs)
#> [1] TRUE FALSE
version2(triples, pairs)
#> (5, 6) (3, 5)
#> TRUE TRUE
Your version1 function seems designed for use with apply(), because you retrieve a list from strsplit() but then take only its first element, so every pair is compared against the pairs of the first triple. If you want to keep the approach of splitting the vector, then you would have to use the apply family of functions. Without them, you are going to expand the triples (the x vector) into something much longer than y, and you can't do an element-wise comparison.
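One way to keep the splitting approach, sketched here on the assumption that speed is not a concern: base R's Vectorize() wraps a scalar function in mapply(), so outer() can call it on the extended vectors. The name test1_fixed is only illustrative.

```r
# Same helper as in the question: checks ONE triple against ONE pair.
version1 <- function(x, y) {
  parts <- strsplit(x, split = ")(", fixed = TRUE)[[1]]
  pairsfromthistriple <- paste0(c("", "(", "("), parts, c(")", ")", ""))
  y %in% pairsfromthistriple
}

triples <- c("(1, 2)(3, 4)(5, 6)", "(1, 2)(3, 5)(4, 6)")
names(triples) <- triples
pairs <- c("(5, 6)", "(3, 5)")
names(pairs) <- pairs

# Vectorize() makes version1 handle each (x, y) combination separately.
test1_fixed <- outer(X = triples, Y = pairs, FUN = Vectorize(version1))
test1_fixed[1, 1]  # TRUE:  "(5, 6)" is in the first triple
test1_fixed[2, 1]  # FALSE: "(5, 6)" is not in the second triple
test1_fixed[2, 2]  # TRUE:  "(3, 5)" is in the second triple
```

This calls version1 once per cell, so it is closer to a loop than to a truly vectorized function like version2.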
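However, I would just use something very simple. stringr::str_detect is already vectorized over both string and pattern, so you can use it directly. Note that the pattern is interpreted as a regular expression by default, so the parentheses act as a group rather than as literal characters; it still matches here, and wrapping the pattern in stringr::fixed() would match it literally.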
library(stringr)
outer(X = triples, Y = pairs, FUN = str_detect)
#>                    (5, 6) (3, 5)
#> (1, 2)(3, 4)(5, 6)   TRUE  FALSE
#> (1, 2)(3, 5)(4, 6)  FALSE   TRUE

Related

ifelse is acting oddly by giving answer other than designed, in r [duplicate]

These two functions should give the same results, shouldn't they?
f1 <- function(x, y) {
  if (missing(y)) {
    out <- x
  } else {
    out <- c(x, y)
  }
  return(out)
}
f2 <- function(x, y) ifelse(missing(y), x, c(x, y))
Results:
> f1(1, 2)
[1] 1 2
> f2(1, 2)
[1] 1
This is not related to missing, but rather to your wrong use of ifelse. From help("ifelse"):
ifelse returns a value with the same shape as test which is filled with elements selected from either yes or no depending on whether the element of test is TRUE or FALSE.
The "shape" of your test is a length-one vector. Thus, a length-one vector is returned. ifelse is not just different syntax for if and else.
The same result occurs outside of the function:
> ifelse(FALSE, 1, c(1, 2))
[1] 1
The function ifelse is designed for use with vectorised arguments. For each element of arg1 (the test) it returns the corresponding element of arg2 if that element is TRUE, and the corresponding element of arg3 if it is FALSE. Here the test has length one, so only the first element of arg3 is returned and the trailing elements are ignored. That first element happens to equal the TRUE value in this case, which is the confusing part. It is clearer what is going on with different arguments:
> ifelse(FALSE, 1, c(2, 3))
[1] 2
> ifelse(c(FALSE, FALSE), 1, c(2,3))
[1] 2 3
It is important to remember that everything (even length 1) is a vector in R, and that some functions deal with each element individually ('vectorised' functions) and some with the vector as a whole.
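For the function in the question, a minimal fix is to use if/else as an expression instead of ifelse. The name f2_fixed is only illustrative.

```r
# if/else is an expression in R, so its value can be returned directly.
# Unlike ifelse(), it does not reshape the result to the length of the test.
f2_fixed <- function(x, y) if (missing(y)) x else c(x, y)

f2_fixed(1, 2)  # 1 2
f2_fixed(1)     # 1
```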

if-else with list-like true/false arguments

Is there an elegant way to simplify this call?
a <- list(1, 2, 3)
b <- list(4, 5)
conditional = TRUE
if (conditional) {
  x <- a
} else {
  x <- b
}
x
# [1, 2, 3]
I've tried x <- ifelse(TRUE, a, b), but it assumes the conditional is a vector which must be iterated, so in this case it returns a single value (in this case, 1).
dplyr's if_else, on the other hand, demands that the lists be of equal length. And even if they were, it also iterates through the conditional and would also output a single value 1.
So, is there some clean way of solving this or is the simple if{}else{} the way to go?
Here's a simple one-liner using switch.
x <- switch(TRUE + 1, b, a)
x
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
x <- switch(FALSE + 1, b, a)
x
[[1]]
[1] 4
[[2]]
[1] 5
This uses the switch behavior with integer EXPR as described in documentation -
switch works in two distinct ways depending whether the first argument
evaluates to a character string or a number.
If the value of EXPR is not a character string it is coerced to
integer. Note that this also happens for factors, with a warning, as
typically the character level is meant. If the integer is between 1
and nargs()-1 then the corresponding element of ... is evaluated and
the result returned: thus if the first argument is 3 then the fourth
argument is evaluated and returned.
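As an alternative to switch, the plain if/else from the question can be written on one line, because if returns the value of the branch it evaluates. A short sketch using the question's own variables:

```r
a <- list(1, 2, 3)
b <- list(4, 5)
conditional <- TRUE

# if/else used as an expression: the whole list is returned,
# with no element-wise iteration over the condition.
x <- if (conditional) a else b
length(x)  # 3
```

This requires conditional to be a single TRUE or FALSE, which is exactly the case the question describes.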

How to pass an arbitrary number of arguments to R function without for loop?

My question is about getting rid of a for loop while retaining the functionality of the code.
I have a matrix of pairwise orderings of elements A_1, A_2, ... A_N. Each ordering is represented as a row of a matrix. The code below shows an example.
# Matrix representing the relations
# A1 < A2, A1 < A5, A2 < A4
(mat <- matrix(c(1, 2, 1, 5, 2, 4), ncol = 2, byrow = TRUE))
#>      [,1] [,2]
#> [1,]    1    2
#> [2,]    1    5
#> [3,]    2    4
I want this whole matrix as a set of ordered pairs. The reason is that I later need to generate the transitive closure of these relations. I have been using the sets package and created the function below.
create_sets <- function(mat){
  # Empty set
  my_set <- sets::set()
  # For loop for adding pair elements to the set, one at a time
  # (seq_len() also copes with a matrix that has zero rows)
  for(i in seq_len(nrow(mat))){
    my_set <- sets::set_union(my_set,
                              sets::pair(mat[[i, 1]], mat[[i, 2]]))
  }
  return(my_set)
}
create_sets(mat)
#> {(1, 2), (1, 5), (2, 4)}
This function works well, but I believe the for loop is unnecessary, though I have not managed to replace it. For the particular example matrix above with exactly three rows, I could instead have used the following code:
my_set2 <- sets::set(
  sets::pair(mat[[1, 1]], mat[[1, 2]]),
  sets::pair(mat[[2, 1]], mat[[2, 2]]),
  sets::pair(mat[[3, 1]], mat[[3, 2]])
)
my_set2
#> {(1, 2), (1, 5), (2, 4)}
The reason why this works, is that sets::set takes any number of pairs.
args(sets::set)
#> function (...)
#> NULL
However, the matrix mat will have an arbitrary number of rows, and I want the function to be able to handle all possible cases. This is why I have not been able to get rid of the for loop.
My question is hence: Given a matrix mat in which each row represents an ordered pair, is there some generic way of passing the pairs in each row as separate arguments to sets::set, without looping?
The OP has asked
[...] is there some generic way of passing the pairs in each row as separate arguments to sets::set, without looping?
Yes, the do.call() function is probably what you are looking for. From help(do.call):
do.call constructs and executes a function call from a name or a function and a list of arguments to be passed to it.
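To make that concrete with base R functions only (max and paste are stand-ins here, not part of the sets solution): each list element becomes one argument of the call, and named elements become named arguments.

```r
# do.call(f, list(a, b, c)) builds and runs the call f(a, b, c).
args <- list(1, 5, 3)
same <- identical(do.call(max, args), max(1, 5, 3))
same    # TRUE

# Named list elements are passed as named arguments.
joined <- do.call(paste, list("a", "b", sep = "-"))
joined  # "a-b"
```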
So, OP's create_sets() function can be replaced by
do.call(sets::set, apply(mat, 1, function(x) sets::pair(x[1], x[2])))
{(1, 2), (1, 5), (2, 4)}
The second argument to do.call() requires a list. This is created by
apply(mat, 1, function(x) sets::pair(x[1], x[2]))
which returns the list
[[1]]
(1, 2)
[[2]]
(1, 5)
[[3]]
(2, 4)
apply(mat, 1, FUN) is a kind of implied for loop which loops over the rows of a matrix mat and takes the vector of row values as argument when calling function FUN.
Edit: as.tuple() instead of pair()
The pair() function requires exactly two arguments. This is why we were forced to define an anonymous function function(x) sets::pair(x[1], x[2]).
The as.tuple() function coerces the elements of an object into the elements of a tuple. So the code can be simplified even further:
do.call(sets::set, apply(mat, 1, sets::as.tuple))
{(1, 2), (1, 5), (2, 4)}
Here, as.tuple() takes the whole vector of row values and coerces it to a tuple.
Option 1: do nothing
for loops aren't always the end of the world. This one doesn't look too bad if your matrices aren't enormous.
Option 2: the split, apply, combine way (by way of a new function)
Write a function that turns one row into a pair (there is a shorter way to do this, but this makes your task explicit):
f <- function(x) {
  sets::pair(x[1], x[2])
}
Reduce(sets::set_union, lapply(split(mat, 1:nrow(mat)), f))
## {(1, 2), (1, 5), (2, 4)}
The Reduce does the same thing as the for loop (repeatedly apply set_union), and the lapply turns the matrix into a list of pairs (also like a for loop would)
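The equivalence between Reduce and the loop can be checked without the sets package. The sketch below swaps in base R's union on plain vectors purely for illustration, so the result is a flat vector of elements rather than a set of pairs.

```r
mat <- matrix(c(1, 2, 1, 5, 2, 4), ncol = 2, byrow = TRUE)
rows <- split(mat, seq_len(nrow(mat)))  # list of three row vectors

# Explicit loop: fold each row into the accumulator.
acc <- rows[[1]]
for (r in rows[-1]) acc <- union(acc, r)

# Same fold expressed with Reduce().
folded <- Reduce(union, rows)

identical(acc, folded)  # TRUE
folded                  # 1 2 5 4
```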

R identical returning False for strings that are identical [duplicate]

I have a data frame consisting of identical strings, but the identical() function is returning false when I compare them?
Example:
df <- data.frame("x" = rep("a", times = 10),
"y" = rep("a", times = 10))
checkEquality <- function(x) {
  y = x[1]
  z = x[2]
  return(identical(y, z))
}
apply(df[1:2], 1, checkEquality)
This code returns a vector of FALSE when it should return a vector of TRUE. I have no idea what's going on here. Any help appreciated.
It's because they're not totally identical. Your function takes the data frame row by row and then compares the first two elements of each row. Since you use the single bracket operator [] you keep the column and row names:
x = df[1,]
x[1]
  x
1 a
x[2]
  y
1 a
While the value is the same, the column names are different so the two vectors are not identical.
If you use the double bracket notation [[]], then it will extract just that one element, dropping the row and column names and it should work:
checkEquality <- function(x) {
  y = x[[1]]
  z = x[[2]]
  return(identical(y, z))
}
apply(df, 1, checkEquality)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
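The same effect can be reproduced without a data frame. In this sketch, row is a hypothetical stand-in for the named character vector that apply() hands to the function for each row.

```r
# One "row" as apply() would pass it: values are equal, names are not.
row <- c(x = "a", y = "a")

single <- identical(row[1], row[2])      # [ keeps the names "x" and "y"
double <- identical(row[[1]], row[[2]])  # [[ drops the names
single  # FALSE
double  # TRUE

# Stripping the names before comparing works as well.
identical(unname(row[1]), unname(row[2]))  # TRUE
```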
I haven't used identical() before, but have you tried ifelse()?
ifelse(col1==col2, 'TRUE', 'FALSE')

How to find a string in a vector in r?

I have created a function that essentially creates a vector of 1000 binary values. I have been able to count the longest streak of consecutive 1s by using rle.
I was wondering how to find a specific vector (say c(1,0,0,1)) in this larger vector? I would want it to return the number of occurrences of that vector. So c(1,0,0,1,1,0,0,1) should return 2, while c(1,0,0,0,1) should return 0.
Most solutions that I have found just find whether a sequence occurs at all and return TRUE or FALSE, or they give results for the individual values, not the specific vector that is specified.
Here's my code so far:
# creates a function where 1000 people choose either up or down.
updown <- function(){
  n = 1000
  X = rep(0,n)
  Y = rbinom(n, 1, 1 / 2)
  X[Y == 1] = "up"
  X[Y == 0] = "down"
  # calculate the length of the longest streak of ups:
  Y1 <- rle(Y)
  streaks <- Y1$lengths[Y1$values == 1]
  max(streaks, na.rm=TRUE)
}
# repeat this process 1000 times to find the average outcome.
longeststring <- replicate(1000, updown())
mean(longeststring)
This will also work:
library(stringr)
x <- c(1,0,0,1)
y <- c(1,0,0,1,1,0,0,1)
length(unlist(str_match_all(paste(y, collapse=''), '1001')))
[1] 2
y <- c(1,0,0,0,1)
length(unlist(str_match_all(paste(y, collapse=''), '1001')))
[1] 0
If you want to match overlapped patterns,
y <- c(1,0,0,1,0,0,1) # overlapped
length(unlist(gregexpr("(?=1001)",paste(y, collapse=''),perl=TRUE)))
[1] 2
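One caveat with the overlapping one-liner: gregexpr returns -1 when nothing matches, so taking length() of its result gives 1 rather than 0 in that case. A sliding-window count avoids strings altogether. The sketch below is one possible approach, and count_overlapping is a hypothetical helper name.

```r
# Count occurrences of `pattern` in `x`, allowing overlaps,
# by comparing every window of length(pattern) against the pattern.
count_overlapping <- function(pattern, x) {
  n <- length(x)
  k <- length(pattern)
  if (k == 0 || k > n) return(0L)
  starts <- seq_len(n - k + 1)
  hits <- vapply(starts,
                 function(i) all(x[i:(i + k - 1)] == pattern),
                 logical(1))
  sum(hits)
}

count_overlapping(c(1, 0, 0, 1), c(1, 0, 0, 1, 0, 0, 1))     # 2 (overlapping)
count_overlapping(c(1, 0, 0, 1), c(1, 0, 0, 1, 1, 0, 0, 1))  # 2
count_overlapping(c(1, 0, 0, 1), c(1, 0, 0, 0, 1))           # 0
```

Because it compares values rather than pasted characters, this also works when the vector contains numbers with more than one digit.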
Since Y is only 0s and 1s, we can paste it into a string and use regex, specifically gregexpr. Simplified a bit:
set.seed(47) # for reproducibility
Y <- rbinom(1000, 1, 1 / 2)
count_pattern <- function(pattern, x){
  sum(gregexpr(paste(pattern, collapse = ''),
               paste(x, collapse = ''))[[1]] > 0)
}
count_pattern(c(1, 0, 0, 1), Y)
## [1] 59
paste reduces the pattern and Y down to strings, e.g. "1001" for the pattern here, and a 1000-character string for Y. gregexpr searches for all occurrences of the pattern in Y and returns the indices of the matches (together with a little more information so they can be extracted, if one wanted). Because gregexpr will return -1 for no match, testing for numbers greater than 0 will let us simply sum the TRUE values to get the number of matches; in this case, 59.
The other sample cases mentioned:
count_pattern(c(1,0,0,1), c(1,0,0,1,1,0,0,1))
## [1] 2
count_pattern(c(1,0,0,1), c(1,0,0,0,1))
## [1] 0

Resources