R - Compare vector of objects using Reduce

I am trying to compare a vector of objects using Reduce.
all.equal does not work.
== works for numeric values but will not be sufficient for objects.
I would prefer a solution that uses only R core functions, not additional packages.
Example (Simplified to use numeric vectors instead of objects):
test <- c(1,1,1,1,1)
Reduce("==",test)
[1] TRUE
I do not understand why == works while all.equal does not
Reduce(all.equal,test)
[1] "Modes: character, numeric"
[2] "Lengths: 3, 1"
[3] "target is character, current is numeric"
Final remark:
This is not a duplicate. I am interested in a solution that compares objects, not numeric values.
For comparing the elements of a vector of numeric values, see the existing Stack Overflow question "Test for equality among all elements of a single numeric vector".

You can try identical in sapply and compare each with the first element.
x <- list(list(1), list(1))
all(sapply(x[-1], identical, x[[1]]))
#[1] TRUE
x <- list(list(1), list(2))
all(sapply(x[-1], identical, x[[1]]))
#[1] FALSE

Here is a function that returns TRUE if every element of a list is identical to the first one, which implies that all pairs of elements are identical:
all_pairs_equal <- function(elements) {
  # Compare each element (as a length-one sublist) with the first one.
  all(mapply(function(x, y) identical(elements[x], elements[y]), 1, seq(1, length(elements))))
}
all_pairs_equal(list(iris, iris, iris))
#> [1] TRUE
all_pairs_equal(list(1, 1, 1))
#> [1] TRUE
all_pairs_equal(list(iris, iris, 2))
#> [1] FALSE
Created on 2021-10-05 by the reprex package (v2.0.1)
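As for why Reduce behaves so differently for == and all.equal, here is a minimal sketch of what seems to be going on. Reduce feeds the result of each step into the next one, so from the second step onwards the left operand is the previous result, not an element of the vector. The helper name all_identical is made up for illustration.

```r
# Reduce(f, c(a, b, c)) computes f(f(a, b), c).
step1 <- all.equal(1, 1)      # TRUE
step2 <- all.equal(step1, 1)  # compares TRUE with 1 and reports the differences
is.character(step2)           # TRUE: a character vector, not a logical

# "==" only appears to work, because TRUE is coerced to 1.
Reduce("==", c(1, 1, 1))  # TRUE
Reduce("==", c(2, 2, 2))  # FALSE, although all elements are equal

# Hypothetical helper: compare every element with the first one instead.
all_identical <- function(x) {
  all(vapply(x, identical, logical(1), x[[1]]))
}

all_identical(c(2, 2, 2))        # TRUE
all_identical(list(iris, iris))  # TRUE
all_identical(list(iris, 2))     # FALSE
```

So the Reduce("==", test) result in the question is TRUE only because the vector happens to contain ones.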

Related

How do you test if a matrix exists in a matrix list? (Wordle Project)

I am an infrequent R user, so my apologies if any of my terminology is incorrect. I am working on a project around the game Wordle to see if a given Wordle submission in my family group chat is unique or if it has already been submitted before. The inspiration for this came from the Twitter account "Scorigami", which tracks every NFL game and tweets whether or not that score has occurred before in the history of the league.
To load the Wordle entries into R, I've decided to turn each submission into a Matrix where 0 = incorrect letter, 1 = right letter/wrong position, and 2 = right letter/correct position. In R this looks like this:
wordle_brendan <- rbind(c(1,0,0,0,0),c(2,2,0,0,0),c(2,2,0,0,0),c(2,2,2,2,2))
wordle_jack <- rbind(c(2,0,0,0,0),c(2,2,0,0,0),c(2,2,2,2,2))
I then combine them into a list that will be used to check against any future Wordle submissions to see if they have been previously submitted.
list <- list(wordle_brendan, wordle_jack)
I think I am on the right track, but I don't know how to create a new wordle matrix to test whether that submission has been given before. Say I recreated "wordle_brendan" with the same values but under a different name... How would I then get R to check if that matrix exists in my preexisting list of matrices? I've tried using the %in% function 1,000 different ways but can't get it to work.. Any help would be much appreciated! Thanks! (And if you can think of a better way to do this, please let me know!)
There are multiple ways to do this, but this is pretty simple. We need some samples to check:
new1 <- list[[2]] # The same as your second matrix
new2 <- new1
new2[3, 5] <- 0 # Change one position from 2 to 0.
To compare
any(sapply(list, identical, y=new1))
# [1] TRUE
any(sapply(list, identical, y=new2))
# [1] FALSE
So new1 matches an existing matrix, but new2 does not. To see which matrix:
which(sapply(list, identical, y=new1))
# [1] 2
which(sapply(list, identical, y=new2))
# integer(0)
So new1 matches the second matrix in list, but new2 does not match any matrix.
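If this check is going to be run for every new submission, it could be wrapped in a small helper. This is only a sketch: the names seen, is_repeat and wordle_new are made up for illustration, and seen is used instead of list so that the base function list() is not masked.

```r
wordle_brendan <- rbind(c(1,0,0,0,0), c(2,2,0,0,0), c(2,2,0,0,0), c(2,2,2,2,2))
wordle_jack <- rbind(c(2,0,0,0,0), c(2,2,0,0,0), c(2,2,2,2,2))
seen <- list(wordle_brendan, wordle_jack)  # hypothetical name for the history

# Hypothetical helper: TRUE if `submission` is identical to any matrix in `history`.
is_repeat <- function(submission, history) {
  any(vapply(history, identical, logical(1), submission))
}

# Same values as wordle_brendan, built under a different name.
wordle_new <- rbind(c(1,0,0,0,0), c(2,2,0,0,0), c(2,2,0,0,0), c(2,2,2,2,2))

is_repeat(wordle_new, seen)         # TRUE
is_repeat(wordle_new[-1, ], seen)   # FALSE: dropping the first row gives a new pattern
```

A new, unseen submission could then be appended with something like seen <- c(seen, list(wordle_new)).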
Here is a way with a matequal function. The base function identical compares the objects themselves, not just their values: if two matrices have the same values but different attributes, such as dimnames, identical returns FALSE.
This is often too strict. A function that compares values only will return TRUE in these cases.
I will use dcarlson's new1 to illustrate this point.
matequal <- function(x, y) {
  ok <- is.matrix(x) && is.matrix(y) && all(dim(x) == dim(y))
  ok && all(x == y)
}
wordle_brendan <- rbind(c(1,0,0,0,0),c(2,2,0,0,0),c(2,2,0,0,0),c(2,2,2,2,2))
wordle_jack <- rbind(c(2,0,0,0,0),c(2,2,0,0,0),c(2,2,2,2,2))
list <- list(wordle_brendan, wordle_jack)
new1 <- list[[2]] # The same as your second matrix
wordle_john <- wordle_jack
dimnames(wordle_john) <- list(1:3, letters[1:5])
list2 <- list(wordle_brendan, wordle_jack, wordle_john)
sapply(list2, identical, y=new1)
#> [1] FALSE TRUE FALSE
sapply(list2, matequal, y=new1)
#> [1] FALSE TRUE TRUE
Created on 2022-09-27 with reprex v2.0.2
Edit
identical is not a function to compare two objects' values, it's a function to compare the objects themselves. In the following example identical returns FALSE though x and y have equal values, in the usual sense of equal.
matequal <- function(x, y) {
  ok <- is.matrix(x) && is.matrix(y) && all(dim(x) == dim(y))
  ok && all(x == y)
}
x <- matrix(1:5, ncol = 1)
y <- matrix(1 + 0:4, ncol = 1)
all(x == y)
#> [1] TRUE
identical(x, y)
#> [1] FALSE
matequal(x, y)
#> [1] TRUE
Created on 2022-09-28 with reprex v2.0.2
This is because x and y have different internal storage types, which R borrows from the C language. One of the matrices stores its elements as "integer" and the other as "double" (which class() reports as "numeric"). Both matrices have the same class ("matrix" "array"); the storage type of their elements is the main difference.
In a comment it is asked
Thank you and dcarlson for the response! Regarding your two sapply lines, can you explain what the use would be behind using matequal as opposed to identical? Is the only difference that matequal takes into account the column and row names?
So the answer to the question in the comment is no: attributes, in this case dimnames, are not the only reason why identical is often not ideal for comparing R objects. The storage type matters too:
typeof(x)
#> [1] "integer"
typeof(y)
#> [1] "double"
class(x[1])
#> [1] "integer"
class(y[2])
#> [1] "numeric"
class(x)
#> [1] "matrix" "array"
class(y)
#> [1] "matrix" "array"
Created on 2022-09-28 with reprex v2.0.2
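If the storage type is the only difference, one possible alternative to a value-only comparison is to align the storage modes before calling identical. A minimal sketch, reusing x and y from above; the name x_dbl is made up for illustration.

```r
x <- matrix(1:5, ncol = 1)      # integer storage
y <- matrix(1 + 0:4, ncol = 1)  # double storage

x_dbl <- x
storage.mode(x_dbl) <- "double"  # changes the storage type, keeps the dim attribute

identical(x, y)      # FALSE: integer vs double
identical(x_dbl, y)  # TRUE: same type, same dim, same values
```

This does nothing about differing dimnames, so it complements matequal rather than replacing it.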

Which With a Logical Vector Returning Integer(0)

A little perplexed by this. I have a logical vector as such:
logical_vec <- c(TRUE, FALSE)
I am interested in capturing the indices of this logical vector so that I can subset another R object. For example, if I am interested in the position of the TRUE element, I thought I would use this:
which(TRUE, logical_vec)
[1] 1
But when trying to find which index is FALSE, I get integer(0) instead of a position.
which(FALSE, logical_vec)
integer(0)
Does which only return the positions that are TRUE, or am I doing something incorrect here?
Maybe this is what you want? Note that in which(TRUE, logical_vec) the second argument is matched to arr.ind, which is not what you want.
logical_vec <- c(TRUE, FALSE)
which(logical_vec == TRUE)
#> [1] 1
which(logical_vec == FALSE)
#> [1] 2
Created on 2021-09-06 by the reprex package (v2.0.1)
which takes a single vector for 'x'. When two arguments are passed, the first is matched to 'x' and the second to arr.ind, whose default is FALSE. According to ?which
which(x, arr.ind = FALSE, useNames = TRUE)
where
x - a logical vector or array. NAs are allowed and omitted (treated as if FALSE).
which(FALSE)
integer(0)
We may need to concatenate (c) to create a single vector instead of two arguments
which(c(FALSE, logical_vec))
[1] 2
Also, there is no need to use == on a logical vector: which by default returns the position index of the TRUE elements, and if we need to negate, we can use !
which(logical_vec)
[1] 1
which(!logical_vec)
[1] 2
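Since the original goal was to subset another object, here is a small sketch of how the indices could be used. The vector scores and the vector flags are made up for illustration.

```r
logical_vec <- c(TRUE, FALSE)
scores <- c(a = 10, b = 20)  # hypothetical object to subset

scores[which(logical_vec)]   # element "a"
scores[which(!logical_vec)]  # element "b"

# which() drops NA, so NA positions are reported neither as TRUE nor as FALSE.
flags <- c(TRUE, NA, FALSE)
which(flags)   # 1
which(!flags)  # 3
```

Indexing directly with the logical vector, as in scores[logical_vec], also works when there are no NA values.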

Remove the numbers < 4 digits in list in a data frame in R

I have a data frame like this, and I need to remove the values with fewer than 4 digits from the item column:
department item
xyz009 c("1","676547","2","434567","3","567369","4","987654","6","54546676732")
Output
department item
xyz009 676547,434567,567369,987654,54546676732
Thank you for your help
Maybe you can try nchar+subset
> subset(v, nchar(v) >= 4)
[1] "676547" "434567" "567369"
[4] "987654" "54546676732"
DATA
v <- c("1","676547","2","434567","3","567369","4","987654","6","54546676732")
1. Create a minimal reproducible example
xyz009 <- c("1","676547","2","434567","3","567369","4","987654","6","54546676732")
2. Suggested solution using base R:
The vector xyz009 is of type character
typeof(xyz009)
[1] "character"
In order to do maths with it (i.e. use >) we have to cast it to numeric using as.numeric
num_xyz <- as.numeric(xyz009)
Now we can use an index to 'filter' values where an expression evaluates to TRUE:
test_result <- num_xyz > 999 # a number with at least 4 digits is 1000 or more
The vector test_result consists of logical values
test_result
[1] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
We can use these booleans as an 'index' (R keeps only values where the index is TRUE):
num_xyz[test_result]
This returns:
[1] 676547 434567 567369 987654 54546676732
Using base R you can use unlist, and lapply:
xyz009<-c("1","676547","2","434567","3","567369","4","987654","6","54546676732")
unlist(lapply(xyz009,function(x) x[nchar(x)>3]))
The result is:
[1] "676547" "434567" "567369" "987654" "54546676732"
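The answers above work on a plain character vector. If item really is a list column in a data frame, as the question suggests, the same filter could be applied row by row. A sketch under that assumption; the data frame df is reconstructed from the question and may not match the real data exactly.

```r
# Assumed structure: one row per department, item is a list column.
df <- data.frame(department = "xyz009", stringsAsFactors = FALSE)
df$item <- list(c("1", "676547", "2", "434567", "3", "567369",
                  "4", "987654", "6", "54546676732"))

# Keep only the strings with at least 4 characters in each row's vector.
df$item <- lapply(df$item, function(v) v[nchar(v) >= 4])

df$item[[1]]
```

If a single comma-separated string per row is wanted instead, vapply(df$item, paste, character(1), collapse = ",") would collapse each vector.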

A problem on "identical()" function in R? How does "identical()" work for different types of objects?

(reproducible example added)
I cannot quite grasp why the following is FALSE (I am aware that they are double and integer, respectively):
identical(1, as.integer(1)) # FALSE
?identical reveals:
num.eq:
logical indicating if (double and complex non-NA) numbers should be compared using == (‘equal’), or by bitwise comparison. The latter (non-default) differentiates between -0 and +0.
sprintf("%.8190f", as.integer(1)) and sprintf("%.8190f", 1) return exactly equal bit pattern. So, I think that at least one of the following must return TRUE. But, I get FALSE in each of the following:
identical(1, as.integer(1), num.eq=TRUE) # FALSE
identical(1, as.integer(1), num.eq=FALSE) # FALSE
My current thinking is this: if sprintf shows the notation rather than the storage, then identical() must compare based on storage, i.e.
identical(bitpattern1, bitpattern1bitpattern2) returns FALSE. I could not find any other logical explanation to above FALSE/FALSE situation.
I do know that in both 32bit/64bit architecture of R, integers are stored as 32bit.
They are not identical precisely because they have different types. If you look at the documentation for identical you'll find the example identical(1, as.integer(1)) with the comment ## FALSE, stored as different types. That's one clue. The R language definition reminds us that:
Single numbers, such as 4.2, and strings, such as "four point two" are still vectors, of length 1; there are no more basic types (emphasis mine).
So, basically everything is a vector with a type (that's also why [1] shows up every time R returns something). You can check this by explicitly creating a vector with length 1 by using vector, and then comparing it to 0:
x <- vector("double", 1)
identical(x, 0)
# [1] TRUE
That is to say, both vector("double", 1) and 0 output vectors of type "double" and length == 1.
typeof and storage.mode point to the same thing, so you're kind of right when you say "this means identical() compares based on storage". I don't think this necessarily means that "bit patterns" are being compared, although I suppose it's possible. See what happens when you change the storage mode using storage.mode:
## Assign integer to x. This is really a vector length == 1.
x <- 1L
typeof(x)
# [1] "integer"
identical(x, 1L)
# [1] TRUE
## Now change the storage mode and compare again.
storage.mode(x) <- "double"
typeof(x)
# [1] "double"
identical(x, 1L) # This is no longer TRUE.
# [1] FALSE
identical(x, 1.0) # But this is.
# [1] TRUE
One last note: The documentation for identical states that num.eq is a…
logical indicating if (double and complex non-NA) numbers should be compared using == (‘equal’), or by bitwise comparison.
So, changing num.eq doesn't affect any comparison involving integers. Try the following:
# Comparing integers with integers.
identical(+0L, -0L, num.eq = TRUE) # TRUE
identical(+0L, -0L, num.eq = FALSE) # TRUE
# Comparing integers with doubles.
identical(+0, -0L, num.eq = TRUE) # FALSE
identical(+0, -0L, num.eq = FALSE) # FALSE
# Comparing doubles with doubles.
identical(+0.0, -0.0, num.eq = TRUE) # TRUE
identical(+0.0, -0.0, num.eq = FALSE) # FALSE
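When the goal is to compare values rather than storage, == or all.equal may be closer to what is wanted. A small sketch, assuming that equality within all.equal's default tolerance is acceptable; the names one_dbl and one_int are made up for illustration.

```r
one_dbl <- 1
one_int <- as.integer(1)

identical(one_dbl, one_int)              # FALSE: double vs integer
one_dbl == one_int                       # TRUE: the integer is coerced to double
isTRUE(all.equal(one_dbl, one_int))      # TRUE: values compared with a tolerance
identical(one_dbl, as.numeric(one_int))  # TRUE once both are stored as double
```

Wrapping all.equal in isTRUE matters because all.equal returns a character vector describing the differences, not FALSE, when the objects differ.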

R: How to compare a pair of single-element lists without loop

This should be simple, but a limitation with lapply (or at least in the way I understand to implement lapply) is only being allowed to pass a single list as the first argument. Here is a toy example of what I am trying to do:
a = list(1, 2, 3)
b = list(3, 2, 1)
a > b
What I want as output is:
[[1]]
[1] FALSE
[[2]]
[1] FALSE
[[3]]
[1] TRUE
which, through unlist(), I will convert to
[1] FALSE FALSE TRUE
Of course, you cannot do it using a > b, so instead, I get:
Error in a > b : comparison of these types is not implemented
What is the most elegant way to compare these lists without resorting to loops, yielding output similar to what I am looking for? Thanks!
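One possible approach, sketched here with base R only, is Map or mapply, which walk over several lists in parallel. The names as_list and as_vector are made up for illustration.

```r
a <- list(1, 2, 3)
b <- list(3, 2, 1)

# Map applies `>` to a[[1]] and b[[1]], then a[[2]] and b[[2]], and so on,
# and returns a list with one logical per pair.
as_list <- Map(`>`, a, b)

# mapply does the same but simplifies the result to a logical vector,
# which makes the extra unlist() step unnecessary.
as_vector <- mapply(`>`, a, b)

unlist(as_list)  # FALSE FALSE TRUE
as_vector        # FALSE FALSE TRUE
```

If both lists are known to hold single numbers, unlist(a) > unlist(b) gives the same vector.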
