String matching within a list of lists [duplicate]

String matching within a list of lists [duplicate] - r

I have a list like this:
map_tmp <- list("ABC",
c("EGF", "HIJ"),
c("KML", "ABC-IOP"),
"SIN",
"KMLLL")
> grep("ABC", map_tmp)
[1] 1 3
> grep("^ABC$", map_tmp)
[1] 1 # by using regex, I get the index of "ABC" in the list
> grep("^KML$", map_tmp)
[1] 5 # I wanted 3, but I got 5. Claiming the end of a string by "$" didn't help in this case.
> grep("^HIJ$", map_tmp)
integer(0) # the regex do not return to me the index of a string inside the vector
How can I get the index of a string (exact match) in the list?
I'm ok not to use grep. Is there any way to get the index of a certain string (exact match) in the list? Thanks!

Using lapply:
which(lapply(map_tmp, function(x) grep("^HIJ$", x))!=0)
The lapply function gives you a list of which for each element in the list (0 if there's no match). The which!=0 function gives you the element in the list where your string occurs.

Use either mapply or Map with str_detect to find the position, I have run only for one string "KML" , you can run it for all others. I hope this is helpful.
First of all we make the lists even so that we can process it easily
library(stringr)
map_tmp_1 <- lapply(map_tmp, `length<-`, max(lengths(map_tmp)))
### Making the list even
val <- t(mapply(str_detect,map_tmp_1,"^KML$"))
> which(val[,1] == T)
[1] 3
> which(val[,2] == T)
integer(0)
In case of "ABC" string:
val <- t(mapply(str_detect,map_tmp_1,"ABC"))
> which(val[,1] == T)
[1] 1
> which(val[,2] == T)
[1] 3
>

I had the same question. I cannot explain why grep would work well in a list with characters but not with regex. Anyway, the best way I found to match a character string using common R script is:
map_tmp <- list("ABC",
c("EGF", "HIJ"),
c("KML", "ABC-IOP"),
"SIN",
"KMLLL")
sapply( map_tmp , match , 'ABC' )
It returns a list with similar structure as the input with 'NA' or '1', depending on the result of the match test:
[[1]]
[1] 1
[[2]]
[1] NA NA
[[3]]
[1] NA NA
[[4]]
[1] NA
[[5]]
[1] NA

Related

making for loop for character vector in R

char_vector <- c("Africa", "identical", "ending" ,"aa" ,"bb", "rain" ,"Friday" ,"transport") # character vector
Suppose I have the above character vector
I would like to create a for loop to print on the screen only the elements in a vector that have more than 5 characters and starts with a vowel
and also delete from the vector those elements that do not start with a vowel
I created this for loop but it also gives null characters
for (i in char_vector){
if (str_length(i) > 5){
i <- str_subset(i, "^[AEIOUaeiou]")
print(i)
}
}
The result for the above is
[1] "Africa"
[1] "identical"
[1] "ending"
character(0)
character(0)
My desired result would only be the first 3 characters
I'm really new to R and facing huge difficulty with creating a for loop for this problem. Any help would be greatly appreciated!

Use grepl with the pattern ^[AEIOUaeiuo]\w{5,}$:
char_vector <- c("Africa", "identical", "ending" ,"aa" ,"bb", "rain" ,"Friday" ,"transport")
char_vector <- char_vector[grepl("^[AEIOUaeiuo]\\w{5,}$", char_vector)]
char_vector
[1] "Africa" "identical" "ending"
The regex pattern used here says to match words which:
^ from the start of the word
[AEIOUaeiuo] starts with a vowel
\w{5,} followed by 5 or more characters (total length > 5)
$ end of the word

You don't need for loop, because we use vectorized functions in R.
A simple solution using grep and substr (refer to Tim Blegeleisen answer for details):
substr(grep('^[aeiu].{4}', char_vector, T, , T), 1, 3)
# [1] "Afr" "ide" "end"

With stringr functions, you'd rather use str_detect instead of str_subset, and you can take advantage of the fact that those functions are vectorized:
library(stringr)
char_vector[str_length(char_vector) > 5 & str_detect(char_vector, "^[AEIOUaeiou]")]
#[1] "Africa" "identical" "ending"
or if you want your for loop as a single vector:
vec <- c()
for (i in char_vector){
if (str_length(i) > 5 & str_detect(i, "^[AEIOUaeiou]")){
vec <- c(vec, i)
}
}
vec
# [1] "Africa" "identical" "ending"

The first 3 characters?
library(stringr)
for (i in char_vector){
if (str_length(i) > 5 & str_detect(i, "^[AEIOUaeiou]")) {
word <- str_sub(i, 1, 3)
print(word)
}
}
output is:
[1] "Afr"
[1] "ide"
[1] "end"

Using only base R functions. No need for a loop. I wrapped the steps in a function so you can use the function with other character vectors. You could make this code shorter (see #utubun's answer) but I feel it is easier to understand the process with a "one line one step" approach.
char_vector <- c("Africa", "identical", "ending" ,"aa" ,"bb", "rain" ,"Friday" ,"transport")
yourfun <- function(char_vector){
char_vector <- char_vector[nchar(char_vector)>= 5] # grab only the strings that are at least 5 characters long
char_vector <- char_vector[grep(pattern = "^[AEIOUaeiou]", char_vector)] # grab strings that starts with vowel
return(char_vector) # print the first three strings
# remove comments to get the first three characters of each string
# out <- substring(char_vector, 1, 3) # select only the first 3 characters of each string
# return(out)
}
yourfun(char_vector = char_vector)
#> [1] "Africa" "identical" "ending"
Created on 2022-05-09 by the reprex package (v2.0.1)

grep exact match in vector inside a list in R

I have a list like this:
map_tmp <- list("ABC",
c("EGF", "HIJ"),
c("KML", "ABC-IOP"),
"SIN",
"KMLLL")
> grep("ABC", map_tmp)
[1] 1 3
> grep("^ABC$", map_tmp)
[1] 1 # by using regex, I get the index of "ABC" in the list
> grep("^KML$", map_tmp)
[1] 5 # I wanted 3, but I got 5. Claiming the end of a string by "$" didn't help in this case.
> grep("^HIJ$", map_tmp)
integer(0) # the regex do not return to me the index of a string inside the vector
How can I get the index of a string (exact match) in the list?
I'm ok not to use grep. Is there any way to get the index of a certain string (exact match) in the list? Thanks!

Using lapply:
which(lapply(map_tmp, function(x) grep("^HIJ$", x))!=0)
The lapply function gives you a list of which for each element in the list (0 if there's no match). The which!=0 function gives you the element in the list where your string occurs.

Use either mapply or Map with str_detect to find the position, I have run only for one string "KML" , you can run it for all others. I hope this is helpful.
First of all we make the lists even so that we can process it easily
library(stringr)
map_tmp_1 <- lapply(map_tmp, `length<-`, max(lengths(map_tmp)))
### Making the list even
val <- t(mapply(str_detect,map_tmp_1,"^KML$"))
> which(val[,1] == T)
[1] 3
> which(val[,2] == T)
integer(0)
In case of "ABC" string:
val <- t(mapply(str_detect,map_tmp_1,"ABC"))
> which(val[,1] == T)
[1] 1
> which(val[,2] == T)
[1] 3
>

I had the same question. I cannot explain why grep would work well in a list with characters but not with regex. Anyway, the best way I found to match a character string using common R script is:
map_tmp <- list("ABC",
c("EGF", "HIJ"),
c("KML", "ABC-IOP"),
"SIN",
"KMLLL")
sapply( map_tmp , match , 'ABC' )
It returns a list with similar structure as the input with 'NA' or '1', depending on the result of the match test:
[[1]]
[1] 1
[[2]]
[1] NA NA
[[3]]
[1] NA NA
[[4]]
[1] NA
[[5]]
[1] NA

Get indices of all character elements matches in string in R

I want to get indices of all occurences of character elements in some word. Assume these character elements I look for are: l, e, a, z.
I tried the following regex in grep function and tens of its modifications, but I keep receiving not what I want.
grep("/([leazoscnz]{1})/", "ylaf", value = F)
gives me
numeric(0)
where I would like:
[1] 2 3

To use grep work with individual characters of a string, you first need to split the string into separate character vectors. You can use strsplit for this:
strsplit("ylaf", split="")[[1]]
[1] "y" "l" "a" "f"
Next you need to simplify your regular expression, and try the grep again:
strsplit("ylaf", split="")[[1]]
grep("[leazoscnz]", strsplit("ylaf", split="")[[1]])
[1] 2 3
But it is easier to use gregexpr:
gregexpr("[leazoscnz]", "ylaf")
[[1]]
[1] 2 3
attr(,"match.length")
[1] 1 1
attr(,"useBytes")
[1] TRUE

Vector-list comparison in R

I am currently trying to check if a list(containing multiple vectors filled with values) is equal to a vector. Unfortunately the following functions did not worked for me: match(), any(), %in%. An example of what I am trying to achieve is given below:
Lets say:
lists=list(c(1,2,3,4),c(5,6,7,8),c(9,7))
vector=c(1,2,3,4)
answer=match(lists,vector)
When I execute this it does return False values instead of a positive result. When I compare a vector with a vector is working but when I compare a vector with a list it seems that it can not work properly.

I would use intersect, something like this :
lapply(lists,intersect,vector)
[[1]]
[1] 1 2 3 4
[[2]]
numeric(0)
[[3]]
numeric(0)

I'm not completely sure what you want the result to be (for example do you care about vector order?) but regardless you'll need to think about lapply. For example,
##Create some data
R> lists=list(c(1,2,3,4),c(5,6,7,8),c(9,7))
R> vector=c(1,2,3,4)
then we use lapply to go through each list element and apply a function. In this case, I've used the match function (since you mentioned that in your question):
R> lapply(lists, function(i) all(match(i, vector)))
[[1]]
[1] TRUE
[[2]]
[1] NA
[[3]]
[1] NA
It's probably worth converting to a vector, so
R> unlist(lapply(lists, function(i) all(match(i, vector))))
[1] TRUE NA NA
and to change NA to FALSE, something like:
m = unlist(lapply(lists, function(i) all(match(i, vector))))
m[is.na(m)] = FALSE

How do I use `[` correctly with (l|s)apply to select a specific column from a list of matrices?

Consider the following situation where I have a list of n matrices (this is just dummy data in the example below) in the object myList
mat <- matrix(1:12, ncol = 3)
myList <- list(mat1 = mat, mat2 = mat, mat3 = mat, mat4 = mat)
I want to select a specific column from each of the matrices and do something with it. This will get me the first column of each matrix and return it as a matrix (lapply() would give me a list either is fine).
sapply(myList, function(x) x[, 1])
What I can't seem able to do is use [ directly as a function in my sapply() or lapply() incantations. ?'[' tells me that I need to supply argument j as the column identifier. So what am I doing wrong that this does't work?
> lapply(myList, `[`, j = 1)
$mat1
[1] 1
$mat2
[1] 1
$mat3
[1] 1
$mat4
[1] 1
Where I would expect this:
$mat1
[1] 1 2 3 4
$mat2
[1] 1 2 3 4
$mat3
[1] 1 2 3 4
$mat4
[1] 1 2 3 4
I suspect I am getting the wrong [ method but I can't work out why? Thoughts?

I think you are getting the 1 argument form of [. If you do lapply(myList, `[`, i =, j = 1) it works.

After two pints of Britain's finest ale and a bit of cogitation, I realise that this version will work:
lapply(myList, `[`, , 1)
i.e. don't name anything and treat it like I had done mat[ ,1]. Still don't grep why naming j doesn't work...
...actually, having read ?'[' more closely, I notice the following section:
Argument matching:
Note that these operations do not match their index arguments in
the standard way: argument names are ignored and positional
matching only is used. So ‘m[j=2,i=1]’ is equivalent to ‘m[2,1]’
and *not* to ‘m[1,2]’.
And that explains my quandary above. Yeah for actually reading the documentation.

It's because [ is a .Primitive function. It has no j argument. And there is no [.matrix method.
> `[`
.Primitive("[")
> args(`[`)
NULL
> methods(`[`)
[1] [.acf* [.AsIs [.bibentry* [.data.frame
[5] [.Date [.difftime [.factor [.formula*
[9] [.getAnywhere* [.hexmode [.listof [.noquote
[13] [.numeric_version [.octmode [.person* [.POSIXct
[17] [.POSIXlt [.raster* [.roman* [.SavedPlots*
[21] [.simple.list [.terms* [.ts* [.tskernel*
Though this really just begs the question of how [ is being dispatched on matrix objects...

Categories

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

String matching within a list of lists [duplicate] - r

Using lapply: which(lapply(map_tmp, function(x) grep("^HIJ$", x))!=0) The lapply function gives you a list of which for each element in the list (0 if there's no match). The which!=0 function gives you the element in the list where your string occurs.

Related

making for loop for character vector in R

grep exact match in vector inside a list in R

Get indices of all character elements matches in string in R

Vector-list comparison in R

How do I use `[` correctly with (l|s)apply to select a specific column from a list of matrices?

Categories

Resources