Given a list of objects that are either symbols or language, what is the best way to search for containment?
For example:
> a = list(substitute(1 + 2), substitute(2 + 3))
> substitute(1 + 2) %in% a
Error in match(x, table, nomatch = 0L) :
'match' requires vector arguments
> a == substitute(1 + 2)
[1] TRUE FALSE
Warning message:
In a == substitute(1 + 2) :
longer object length is not a multiple of shorter object length
The second method seems to work, but I am unsure of what the warning means.
Another idea is to use deparse and then compare the resulting strings, but this becomes complicated when the expressions are long enough to exceed deparse's width.cutoff (deparse then returns several strings instead of one).
Not sure why you would need to do this, but you can use identical to do the comparison. However, since identical only compares two arguments, you will have to loop over your list, preferably using lapply:
lapply( a , function(x) identical( substitute(1 + 2) , x ) )
#[[1]]
#[1] TRUE
#[[2]]
#[1] FALSE
Similarly, you can still use ==. Inspecting substitute(1 + 2) shows that it is a language object of length 3, whereas your list a has length 2, hence the warning about vector recycling. So you just need to loop over the elements of your list:
lapply( a , `==` , substitute(1 + 2) )
#[[1]]
#[1] TRUE
#[[2]]
#[1] FALSE
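If a single TRUE/FALSE answer to "is this expression in the list?" is wanted, the identical() approach can be wrapped with vapply, which returns a plain logical vector instead of a list, and any() then collapses it. This is a minimal sketch; the helper name expr_in is hypothetical:

```r
# Hypothetical helper: compare `expr` against every element of `lst` with
# identical(), so no deparsing, width.cutoff or length recycling is involved
expr_in <- function(expr, lst) {
  vapply(lst, identical, logical(1), expr)
}

a <- list(substitute(1 + 2), substitute(2 + 3))
expr_in(substitute(1 + 2), a)       # TRUE FALSE
any(expr_in(substitute(1 + 2), a))  # TRUE: the expression is contained in a
```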
A little perplexed by this. I have a logical vector as such:
logical_vec <- c(TRUE, FALSE)
I am interested in capturing the indices of this logical vector so that I can subset another R object. For example, if I am interested in the position of the TRUE element, I thought I would use this:
which(TRUE, logical_vec)
[1] 1
But when trying to find which index is FALSE, I get integer(0):
which(FALSE, logical_vec)
integer(0)
Does which only return conditions that satisfy as TRUE or am I doing something incorrect here?
Maybe this is what you want? Note that your second argument is being passed as arr.ind, which is not what you intended.
logical_vec <- c(TRUE, FALSE)
which(logical_vec == TRUE)
#> [1] 1
which(logical_vec == FALSE)
#> [1] 2
Created on 2021-09-06 by the reprex package (v2.0.1)
which takes a single argument x. When you pass two arguments, the first one is taken as x and the second one is matched to arr.ind (whose default is FALSE). According to ?which:
which(x, arr.ind = FALSE, useNames = TRUE)
where
x - a logical vector or array. NAs are allowed and omitted (treated as if FALSE).
which(FALSE)
integer(0)
We may need to concatenate with c() to create a single vector instead of passing two arguments:
which(c(FALSE, logical_vec))
[1] 2
Also, there is no need to use == on a logical vector: by default, which returns the positions of the TRUE elements, and to get the FALSE positions you can negate the vector with !:
which(logical_vec)
[1] 1
which(!logical_vec)
[1] 2
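Since the goal is to subset another object, note that which() and plain logical indexing differ once NA values appear: ?which says NAs are omitted, whereas a logical index turns an NA into an NA element. A small sketch; other is a hypothetical object to subset:

```r
logical_vec <- c(TRUE, FALSE, NA)
other <- c("first", "second", "third")  # hypothetical object to subset

which(logical_vec)           # 1: the NA position is dropped
which(!logical_vec)          # 2: !NA is still NA, so it is dropped too
other[which(!logical_vec)]   # "second"
other[!logical_vec]          # "second" NA: logical indexing keeps the NA
```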
I know of the switch statement in R, but I'm curious if there's a way to assign the same action/value to multiple patterns in the same arm, similar to how it's done in Rust:
let x = 1;
match x {
1 | 2 => println!("one or two"),
3 => println!("three"),
_ => println!("anything"),
}
I don't need to write two separate cases for 1 and 2; I can combine them into one with '|'. It would also be helpful if I could define a default case ("_") for when no earlier pattern matched.
In switch, a named argument with no value falls through to the next argument that has a value, so several values can share one arm. An unnamed final argument serves as the default:
switch(
as.character(x),
"1"=,
"2"="one or two",
"3"="three",
"anything"
)
I use as.character(x) instead of just x because a non-character EXPR (the first argument) may be interpreted as a position rather than matched by value. From ?switch:
If the value of 'EXPR' is not a character string it is coerced to
integer. Note that this also happens for 'factor's, with a
warning, as typically the character level is meant. If the
integer is between 1 and 'nargs()-1' then the corresponding
element of '...' is evaluated and the result returned: thus if the
first argument is '3' then the fourth argument is evaluated and
returned.
So if x is an integer between 1 and the number of other arguments, then it is interpreted as a positional indicator, as in
switch(3, 'a','z','y','f')
# [1] "y"
which means that the named arguments are effectively ignored, as in this very confusing example
switch(3, '1'='a','3'='z','2'='y','4'='f')
# [1] "y"
Note that the help does not mention non-strings that are greater than nargs()-1; those integers return NULL (invisibly, hence the extra parentheses):
(switch(9, '1'='a','3'='z','2'='y','4'='f'))
# NULL
Since it is the integer's value you want to match, you have to (confusingly) convert it to a string:
switch(as.character(3), '1'='a','3'='z','2'='y','4'='f')
# [1] "z"
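Putting the pieces together, a small wrapper can combine the fall-through arms, the as.character() conversion and an unnamed final argument as the default (the counterpart of Rust's _ arm). This is a sketch; the function name describe is hypothetical:

```r
# Hypothetical wrapper: match on the value of x, not on its position
describe <- function(x) {
  switch(
    as.character(x),
    "1" = ,               # no value: falls through to the next arm with a value
    "2" = "one or two",
    "3" = "three",
    "anything"            # unnamed last argument: used when nothing matches
  )
}

describe(1)  # "one or two"
describe(3)  # "three"
describe(9)  # "anything" (the default arm)
```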
Alternatively,
dplyr::case_when(
x %in% 1:2 ~ "one or two",
x == 3 ~ "three",
TRUE ~ "anything"
)
# [1] "one or two"
or
data.table::fcase(
x %in% 1:2 , "one or two",
x == 3 , "three",
rep(TRUE, length(x)), "anything"
)
(The rep(TRUE, length(x)) is needed because fcase requires all conditions to be exactly the same length, i.e., it does not allow the recycling that many R functions do. I personally would prefer that it allowed 1-or-N recycling instead of only N, but that isn't the case at the moment.)
These have the advantage of being naturally vectorized.
switch only works with a single (length-1) value. A workaround for a vector x could be
sapply(as.character(x), switch, '1'='a', '3'='z', '2'='y', '4'='f')
(or, better yet, vapply, which enforces the return class).
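The vapply variant might look like the sketch below. It converts x with as.character() (otherwise numeric values would again be matched by position) and adds an unnamed default, because vapply() requires every call to return exactly one string, while a bare switch() returns NULL when nothing matches. The input x and the default 'other' are assumptions for illustration:

```r
x <- c(1, 3, 4, 9)  # example input (assumed)

res <- vapply(
  as.character(x), switch, character(1),   # FUN.VALUE: one string per element
  '1' = 'a', '3' = 'z', '2' = 'y', '4' = 'f',
  'other'                                  # default for values with no arm
)
unname(res)  # "a" "z" "f" "other"
```

Because X is a character vector, vapply names the result after the inputs; unname() drops those names.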
I want to loop through a sequence of letters 'ABCDEFGHIJK', but a for loop in R steps through one value at a time. Is there a way to loop over 3 values at a time? In this case the sequence 'ABCDEFGHIJK' would be looped over as 'ABC', then 'DEF', and so on.
I've tried changing the length in the loop, but I still haven't found a way. I can do this in Python, but I couldn't find any information about it for R, including in R's help.
xp <-'ACTGCT'
for(i in 1:length(xp)){
if(i == 'ACG'){
print('T')
}
}
We can use the vectorized substring function:
substring('ABCDEFGHIJK', seq(1, nchar('ABCDEFGHIJK') - 1, 3), seq(3, nchar('ABCDEFGHIJK'), 3)) == 'ACG'
#[1] FALSE FALSE FALSE FALSE
NOTE: This only extracts complete 3-character chunks, so if 1 or 2 characters are left over at the end, they are not returned. For the above example, it outputs:
substring('ABCDEFGHIJK', seq(1, nchar('ABCDEFGHIJK') - 1, 3), seq(3, nchar('ABCDEFGHIJK'), 3))
#[1] "ABC" "DEF" "GHI" ""
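If the trailing partial chunk should be kept (here "JK"), one option is to clip the end positions with pmin() so that no end position runs past nchar(). A minimal sketch; chunk_string is a hypothetical helper and assumes a non-empty input string:

```r
# Hypothetical helper: split s into consecutive pieces of `size` characters,
# keeping a shorter final piece. Assumes nchar(s) >= 1 (seq() fails otherwise).
chunk_string <- function(s, size = 3) {
  n <- nchar(s)
  starts <- seq(1, n, by = size)
  substring(s, starts, pmin(starts + size - 1, n))
}

chunk_string('ABCDEFGHIJK')           # "ABC" "DEF" "GHI" "JK"
chunk_string('ABCDEFGHIJK') == 'ACG'  # FALSE FALSE FALSE FALSE
```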
Another option is to split the string every 3 characters and then do the comparison:
lapply(strsplit(v1, "(?<=.{3})", perl = TRUE), function(x) x== 'ACG')
#[[1]]
#[1] FALSE FALSE FALSE FALSE
data
v1 <- 'ABCDEFGHIJK'
Here is a stringr solution that outputs a named list indicating whether each string matches:
library(stringr)
# Split string into sequences of 3 (or fewer if length is not multiple of 3)
split_strings <- str_extract_all("ABCDEFGHIJK", ".{1,3}", simplify = TRUE)[1,]
# The strings you want to loop through / search for
x <- c("ABC", "DEF", "GHI", "LMN")
# Output is named list
sapply(x, `%in%`, split_strings, simplify = FALSE)
$ABC
[1] TRUE
$DEF
[1] TRUE
$GHI
[1] TRUE
$LMN
[1] FALSE
Or, if you only want to look for one element:
"ABC" %in% split_strings
[1] TRUE
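The same split can also be done in base R without stringr: gregexpr() with the pattern ".{1,3}" finds the same greedy 1-to-3-character chunks, and regmatches() extracts them. Because %in% is already vectorized, the sapply() loop can also be replaced by a single call. A sketch:

```r
s <- "ABCDEFGHIJK"
split_strings <- regmatches(s, gregexpr(".{1,3}", s))[[1]]
split_strings  # "ABC" "DEF" "GHI" "JK"

x <- c("ABC", "DEF", "GHI", "LMN")
x %in% split_strings               # TRUE TRUE TRUE FALSE
setNames(x %in% split_strings, x)  # a named vector instead of a named list
```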
1) Base R. Iterate over the sequence 1, 4, 7, ... and use substr to extract the 3-character portion of the input string starting at that position. Then perform whatever processing is desired. If the last chunk has fewer than 3 characters, whatever is available is used for that chunk. This is a particularly good approach if you want to exit early, since a break can be inserted into the loop.
for(i in seq(1, nchar(xp), 3)) {
s <- substr(xp, i, i+2)
print(s) # replace with desired processing
}
## [1] "ACT"
## [1] "GCT"
1a) lapply. If one iteration does not depend on another, we can translate the loop to lapply or sapply.
process <- function(i) {
s <- substr(xp, i, i+2)
s # replace with desired processing
}
sapply(seq(1, nchar(xp), 3), process)
## [1] "ACT" "GCT"
2) rollapply. Another possibility is to break the string up into single characters and then iterate over those, passing a 3-element vector of single characters to the indicated function. Here we have used toString to process each chunk, but that can be replaced with any other suitable function.
library(zoo)
rollapply(strsplit(xp, "")[[1]], 3, by = 3, toString, align = "left", partial = TRUE)
## [1] "A, C, T" "G, C, T"
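To illustrate the early-exit advantage mentioned for approach 1), here is a sketch of that loop stopping at the first chunk equal to a target. The target 'GCT' and the variable found are assumptions chosen for this example (the question's 'ACG' does not occur in xp):

```r
xp <- 'ACTGCT'
target <- 'GCT'  # assumed chunk to look for
found <- NA      # start position of the first matching chunk, if any

for (i in seq(1, nchar(xp), 3)) {
  s <- substr(xp, i, i + 2)
  if (s == target) {
    found <- i
    break        # no need to examine the remaining chunks
  }
}
found  # 4
```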
I have a list whose elements are either numeric vectors or lists of 3 elements, for example:
testList<-list(list(a=1,b=2,c=3),c(1,2),list(a=1,b=2,c=0))
I want a logical vector indicating the list elements whose third element (element c) is > 0.
If I run
sapply(testList,function(x)is.list(x) & x[[3]]>1)
I get:
Error in x[[3]] : subscript out of bounds.
But really the problem is that the x[[3]]>1 should only be applied to the lists, not the vectors.
The logical vector needs to have the same length as testList. Is there any simple way of doing this?
We can use an if/else condition to solve this. The if/else makes sure that the second condition is applied only to lists.
sapply(testList, function(x) if(is.list(x)) x[[3]] >0 else FALSE)
#[1] TRUE FALSE FALSE
According to ?"if"
if returns the value of the expression evaluated, or NULL invisibly if
none was (which may happen if there is no else).
This approach also works when there are NA values, without resorting to additional braces:
testList[[3]][[3]] <- NA
sapply(testList, function(x) if(is.list(x)) x[[3]] >0 & !is.na(x[[3]]) else FALSE)
#[1] TRUE FALSE FALSE
Change the & to && to get short-circuit (non-vectorized) evaluation:
sapply(testList, function(x) is.list(x) && x[[3]] > 0)
# [1] TRUE FALSE FALSE
According to ?&&:
& and && indicate logical AND and | and || indicate logical OR. The
shorter form performs elementwise comparisons in much the same way as
arithmetic operators. The longer form evaluates left to right
examining only the first element of each vector. Evaluation proceeds
only until the result is determined.
In other words, "evaluation proceeds only until the result is determined": for &&, the second operand is evaluated only if the first one is TRUE, so for a vector of length 2 the invalid x[[3]] is never checked. For example:
x = c(1, 2)
x[[3]]
# Error in x[[3]] : subscript out of bounds
TRUE && x[[3]]
# Error in x[[3]] : subscript out of bounds
FALSE && x[[3]]
# [1] FALSE # note here it doesn't check if x[[3]] is valid or not
This rule does not apply to &, however:
TRUE & x[[3]]
# Error in x[[3]] : subscript out of bounds
FALSE & x[[3]]
# Error in x[[3]] : subscript out of bounds
Both of them throw an error here.
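The two ideas can be combined: && keeps x[[3]] from being evaluated for the vector element, and wrapping the comparison in isTRUE() turns an NA comparison into FALSE. Using vapply() with logical(1) also guarantees exactly one logical per element. A sketch using the question's data with the third value set to NA:

```r
testList <- list(list(a = 1, b = 2, c = 3), c(1, 2), list(a = 1, b = 2, c = NA))

# is.list(x) is checked first; isTRUE(NA > 0) is FALSE rather than NA
res <- vapply(testList, function(x) is.list(x) && isTRUE(x[[3]] > 0), logical(1))
res  # TRUE FALSE FALSE
```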
I am using the bit64 package in some R code. I have created a vector of 64-bit integers and then tried to use sapply to iterate over the integers in the vector. Here is an example:
library(bit64)
v = c(as.integer64(1), as.integer64(2), as.integer64(3))
sapply(v, function(x){is.integer64(x)})
sapply(v, function(x){print(x)})
Both is.integer64(x) and print(x) give incorrect (or at least unexpected) answers (FALSE and incorrect float values).
I can circumvent this by directly indexing the vector v, but I have two questions:
Why does the type conversion happen? Is there some rule R uses in such a scenario?
Is there any way to avoid this type conversion?
TIA.
sapply calls lapply internally. Here is the code of lapply:
function (X, FUN, ...)
{
FUN <- match.fun(FUN)
if (!is.vector(X) || is.object(X))
X <- as.list(X)
.Internal(lapply(X, FUN))
}
Now check this:
!is.vector(v)
#TRUE
as.list(v)
#[[1]]
#[1] 4.940656e-324
#
#[[2]]
#[1] 9.881313e-324
#
#[[3]]
#[1] 1.482197e-323
From help("as.list"):
Attributes may be dropped unless the argument already is a list or
expression.
So either you create a list from the beginning, or you add the class attribute back:
v_list <- lapply(as.list(v), function(x) {
class(x) <- "integer64"
x
})
sapply(v_list, function(x){is.integer64(x)})
#[1] TRUE TRUE TRUE
The package authors should consider writing a method for as.list. It might be worth a feature request ...
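The mechanism can be reproduced in base R with a hypothetical S3 class. It also shows why indexing works where as.list() does not: as.list() falls back to the default method and returns bare doubles, while v[i] dispatches to the class's own [ method. bit64 provides such a method for integer64, which is presumably why direct indexing worked in the question. The class name myclass and its [ method are illustrative assumptions:

```r
# Hypothetical S3 class storing numbers in a double vector, like integer64 does
v <- structure(c(10, 20, 30), class = "myclass")
`[.myclass` <- function(x, i) structure(unclass(x)[i], class = "myclass")

# as.list() uses the default method: the elements lose the class
class(as.list(v)[[1]])  # "numeric"

# Indexing with [ dispatches to `[.myclass`, so each piece keeps the class
pieces <- lapply(seq_along(v), function(i) v[i])
vapply(pieces, inherits, logical(1), "myclass")  # TRUE TRUE TRUE
```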