imap() not working as I expected, what am I missing? - r

So here, i get errors with imap, when using the index (".y") to go through a list. below I have made it work with map2, but this is confusing, because the way I made the map2() function is exacly the same way I would've thought that imap would do it. But it clearly isn't, otherwise it wouldn't error out.
I would love to understand the purrr logic as good as possible, could anyone tell me what's going on?
library(purrr)
l1 <- list(a='a', b='b')
# single brackets - 'missing value where TRUE/FALSE needed'
imap(l1, ~{
y1 <- names(l1)[.y]
if(y1 == 'a') out1 <- TRUE
if(y1 == 'b') out1 <- FALSE
out
})
# double brackets - subscript out of bounds
imap(l1, ~{
y1 <- names(l1)[[.y]]
if(y1 == 'a') out1 <- TRUE
if(y1 == 'b') out1 <- FALSE
out
})
# emulating what I think imap() does
map2(l1, seq_along(l1), ~{
y1 <- names(l1)[.y]
if(y1 == 'a') out1 <- TRUE
if(y1 == 'b') out1 <- FALSE
out1
})

If the names are present in the list, .y is the name of the list and not it's index. So,
names(l1)['a'] #returns
#[1] NA
which explains 'missing value where TRUE/FALSE needed'.
and
names(l1)[['a']]
returns
Error in names(l1)[["a"]] : subscript out of bounds
What you need is -
purrr::imap(l1, ~{
if(.y == 'a') out1 <- TRUE
if(.y == 'b') out1 <- FALSE
out1
})
#$a
#[1] TRUE
#$b
#[1] FALSE

perhaps you simply need
imap(l1, ~ if(.y == 'a') TRUE else FALSE)
$a
[1] TRUE
$b
[1] FALSE

Related

Is there a better way to check if all elements in a list are named?

I want to check if all elements in a list are named. I've came up with this solution, but I wanted to know if there is a more elegant way to check this.
x <- list(a = 1, b = 2)
y <- list(1, b = 2)
z <- list (1, 2)
any(stringr::str_length(methods::allNames(x)) == 0L) # FALSE, all elements are
# named.
any(stringr::str_length(methods::allNames(y)) == 0L) # TRUE, at least one
# element is not named.
# Throw an error here.
any(stringr::str_length(methods::allNames(z)) == 0L) # TRUE, at least one
# element is not named.
# Throw an error here.
I am not sure if the following base R code works for your general cases, but it seems work for the ones in your post.
Define a function f to check the names
f <- function(lst) length(lst) == sum(names(lst) != "",na.rm = TRUE)
and you will see
> f(x)
[1] TRUE
> f(y)
[1] FALSE
> f(z)
[1] FALSE
We can create a function to check if the the names attribute is NULL or (|) there is blank ("") name, negate (!)
f1 <- function(lst1) is.list(lst1) && !(is.null(names(lst1))| '' %in% names(lst1))
-checking
f1(x)
#[1] TRUE
f1(y)
#[1] FALSE
f1(z)
#[1] FALSE
Or with allNames
f2 <- function(lst1) is.list(lst1) && !("" %in% allNames(lst1))
-checking
f2(x)
#[1] TRUE
f2(y)
#[1] FALSE
f2(z)
#[1] FALSE

Create indicator variables within a list

I have a list containing sequences of numbers. I want to create a list that indicates all non-zero elements up to the first element that matches a defined limit. I also want to create a list that indicates all non-zero elements after the first element to match the defined limit.
I prefer a base R solution. Presumably the solution will use lapply, but I have not been able to come up with a simple solution.
Below is a minimally reproducible example in which the limit is 2:
my.limit <- 2
my.samples <- list(0,c(1,2),0,c(0,1,1),0,0,0,0,0,c(1,1,2,2,3,4),c(0,1,2),0,c(0,0,1,1,2,2,3))
Here are the two desired lists:
within.limit <- list(0,c(1,1),0,c(0,1,1),0,0,0,0,0,c(1,1,1,0,0,0),c(0,1,1),0,c(0,0,1,1,1,0,0))
outside.limit <- list(0,c(0,0),0,c(0,0,0),0,0,0,0,0,c(0,0,0,1,1,1),c(0,0,0),0,c(0,0,0,0,0,1,1))
We can use match with nomatch argument as a very big number (should be greater than any length of the list, for some reason I couldn't use Inf here.)
within.limit1 <- lapply(my.samples, function(x)
+(x > 0 & seq_along(x) <= match(my.limit, x, nomatch = 1000)))
outside.limit1 <- lapply(my.samples, function(x)
+(seq_along(x) > match(my.limit, x, nomatch = 1000)))
Checking if output is correct to shown one :
all(mapply(function(x, y) all(x == y), within.limit, within.limit1))
#[1] TRUE
all(mapply(function(x, y) all(x == y), outside.limit, outside.limit1))
#[1] TRUE
I would do
within.limit <- lapply(my.samples, function(x)
+(x!=0 & (x<limit | cumsum(x == limit)==1)))
outside.limit <- lapply(my.samples, function(x)
+(x!=0 & (x>limit | cumsum(x == limit)>1)))
foo <- function(samples, limit, within = TRUE) {
`%cp%` <- if (within) `<=` else `>`
lapply(samples, function(x) pmin(x, seq_along(x) %cp% match(my.limit, x, nomatch = 1e8)))
}
> all.equal(foo(my.samples, my.limit, FALSE), outside.limit)
# [1] TRUE
> all.equal(foo(my.samples, my.limit, TRUE), within.limit)
# [1] TRUE
We can use findInterval
lapply(my.samples, function(x)
+(x > 0 & seq_along(x) <= findInterval(my.limit, x)-1))
and
lapply(my.samples, function(x) +(seq_along(x) > findInterval(my.limit, x)-1))

Identifying source of FALSE

My question is, does there exist a function that, given a logical statement, identifies the source of FALSE (if it is false)?
For example,
x=1; y=1; z=1;
x==1 & y==1 & z==2
Obviously it is the value of z that makes the statement false. In general though, is there a function that let's me identify the variable(s) in a logical statement who's value makes a logical statement false?
Instead of writing x==1 & y==1 & z==2 you could define
cn <- c(x == 1, y == 1, z == 2)
or
cn <- c(x, y, z) == c(1, 1, 2)
and use all(cn). Then
which(!cn)
# [1] 3
gives the source(s) of FALSE.
In general, no, there is no such function that you are looking for, but for different logical statements a similar approach should work, although it might be too lengthy to pursue.
Considering (!(x %in% c(1,2,3)) & y==3) | z %in% c(4,5), we get FALSE if z %in% c(4,5) is FALSE and (!(x %in% c(1,2,3)) & y==3) is FALSE simultaneously. So, if (!(x %in% c(1,2,3)) & y==3) | z %in% c(4,5) returns FALSE, we are sure about z and still need to check x and y, so that the list of problematic variables can be obtained as follows:
if(!((!(x %in% c(1,2,3)) & y==3) | z %in% c(4,5)))
c("x", "y", "z")[c(x %in% c(1,2,3), !y == 3, TRUE)]
# [1] "x" "y" "z"
or
a <- !(x %in% c(1,2,3))
b <- y == 3
c <- z %in% c(4,5)
if(!((a & b) | c))
c("x", "y", "z")[c(!a, !b, TRUE)]
# [1] "x" "y" "z"
I like #julius's answer but there is also the stopifnot function.
x <- 1; y <- 1; z <- 2
stopifnot(x == 1, y == 1, z == 1)
#Error: z == 1 is not TRUE
Not that the result is an error if there are any false statements and nothing if they're all true. It also stops at the first false statement so if you had something like
x <- T; y <- F; z <- F
stopifnot(x, y, z)
#Error: y is not TRUE
you would not be told that z is FALSE in this case.
So the result isn't a logical or an index but instead is either nothing or an error. This doesn't seem desirable but it is useful if the reason you're using it is for checking inputs to a function or something similar where you want to produce an error on invalid inputs and just keep on moving if everything is fine. I mention stopifnot because it seems like this might be the situation you're in. I'm not sure.
Here is a silly example where you might use it. In this case you apparently only want positive numbers as input and reject everything else:
doublePositiveNumber <- function(x){
stopifnot(is.numeric(x), x >= 0)
return(2*x)
}
which results in
> doublePositiveNumber("hey")
Error: is.numeric(x) is not TRUE
> doublePositiveNumber(-2)
Error: x >= 0 is not TRUE
> doublePositiveNumber(2)
[1] 4
So here you guarantee you get the inputs you want and produce and error message for the user that hopefully tells them what the issue is.

R, whether all the elements of X are present in Y

In R, how do you test for elements of one vector NOT present in another vector?
X <- c('a','b','c','d')
Y <- c('b', 'e', 'a','d','c','f', 'c')
I want to know whether all the elements of X are present in Y ? (TRUE or FALSE answer)
You can use all and %in% to test if all values of X are also in Y:
all(X %in% Y)
#[1] TRUE
You want setdiff:
> setdiff(X, Y) # all elements present in X but not Y
character(0)
> length(setdiff(X, Y)) == 0
[1] TRUE
A warning about setdiff : if your input vectors have repeated elements, setdiff will ignore the duplicates. This may or may not be what you want to do.
I wrote a package vecsets , and here's the difference in what you'll get. Note that I modified X to demonstrate the behavior.
library(vecsets)
X <- c('a','b','c','d','d')
Y <- c('b', 'e', 'a','d','c','f', 'c')
setdiff(X,Y)
character(0)
vsetdiff(X,Y)
[1] "d"

How do I test if three variables are equal [R]

I'm trying to do if else statement which includes a condition if three variables in the data frame equal each other.
I was hoping to use the identical function but not sure whether this works for three variables.
I've also used the following but R doesn't seem to like this:
geno$VarMatch <- ifelse((geno[c(1)] != '' & geno[c(2)] != '' & geno[c(3)] != '')
& (geno[c(5)] == geno[c(4)] == geno[c(6)]), 'Not Important', 'Important')
Keeps telling me:
Error: unexpected '=='
Am I supposed to specify something as data.frame/vector etc... Coming from an SPSS stand point, I'm slightly confused.
Sorry for the simplistic query.
I see so complicated results, mine is simple:
all(sapply(list(a,b,c,d), function(x) x == d))
returns TRUE, if all equals d all equals each other.
Here's a recursive function which generalises to any number of inputs and runs identical on them. It returns FALSE if any member of the set of inputs is not identical to the others.
ident <- function(...){
args <- c(...)
if( length( args ) > 2L ){
# recursively call ident()
out <- c( identical( args[1] , args[2] ) , ident(args[-1]))
}else{
out <- identical( args[1] , args[2] )
}
return( all( out ) )
}
ident(1,1,1,1,1)
#[1] TRUE
ident(1,1,1,1,2)
#[1] FALSE
If it's about numeric values, you can put the numbers in an array, then check the array's max and min, as well:
if(max(list) == min(list))
# all numbers in list are equal
else
# at least one element has a different value
You need to use:
geno$VarMatch <- ifelse((gene[c(1)] != '' & gene[c(2)] != '' &
gene[c(3)] != '') &
((gene[c(5)] == gene[c(4)]) &
(gene[c(4)] == gene[c(6)]))),
'Not Important', 'Important')
The == is a binary operator which returns a single logical value. R doesn't expect further input past your first evaluation, unless you feed it a Boolean & for vectors. You may want to modify this, but here's one attempt at a functional programming approach:
testEqual <- function(x, y) ifelse(x == y, x, FALSE)
all(!!Reduce(testEqual, list(1:10, 1:10))) # True
all(!!Reduce(testEqual, rep(T, 3))) # True
all(!!Reduce(testEqual, list(1, 5, 10))) # False
all(!!Reduce(testEqual, list(T, T, F))) # False
The double negation is used to convert values to logical vectors, and the all command returns a single Boolean. This only works for numeric values or logical vectors.
I'm throwing this out here just for fun. I'm not sure if I would actually use this approach, but any critiques are welcomed.
This answer is based on #John's comment under the OP. This is by far the easiest way to go about this.
geno$VarMatch <- ifelse((geno[c(1)] != '' & geno[c(2)] != '' & geno[c(3)] != '')
& (geno[c(5)] == geno[c(4)] & geno[c(5)] == geno[c(6)]), 'Not Important', 'Important')
Simpler than the other answers, and can be used with basic subsetting/ assignment too, e.g.
geno$VarMatch[geno[c(5)] == geno[c(4)] & geno[c(5)] == geno[c(6)]] <– 'Important'
I think you can just come up with simple generic function comparing three elements and then using mutate and rowwise from dplyr apply those to each combination.
library("tidyverse")
set.seed(123)
dta_sample <- tibble(
colA = sample(letters, 10000, TRUE),
colB = sample(letters, 10000, TRUE),
colC = sample(letters, 10000, TRUE)
)
compare_strs <- function(one, two, three) {
if (one == two) {
if (two == three) {
return(TRUE)
} else {
return(FALSE)
}
} else {
return(FALSE)
}
}
dta_sample %>%
rowwise() %>%
mutate(all_cols_identical = compare_strs(colA, colB, colC)) %>%
# For results
filter(all_cols_identical)
Preview
# A tibble: 25 x 4
# Rowwise:
colA colB colC all_cols_identical
<chr> <chr> <chr> <lgl>
1 w w w TRUE
2 k k k TRUE
3 m m m TRUE
4 b b b TRUE
5 y y y TRUE
6 n n n TRUE
7 e e e TRUE
8 j j j TRUE
9 q q q TRUE
10 a a a TRUE
# … with 15 more rows

Resources