How to use "and" with grepl? [duplicate] - r

This is what I have:
f=5.20
y=168.9850
dat=c("dat.txt","dat_5.20.txt","data_5.20_168.9850.txt")
Filter(function(x) grepl(f, x), dat)
# [1] "dat_5.20.txt" "data_5.20_168.9850.txt"
I need to grep only the element that contains both f and y.
How can I use both f and y in grepl?
The desired result would be:
"data_5.20_168.9850.txt"

One pure regex way of doing this would be to just use two lookahead assertions which independently check for the presence of each of the number strings:
f <- "5\\.20"
y <- "168\\.9850"
dat <- c("dat.txt","dat_5.20.txt","data_5.20_168.9850.txt")
grepl(paste0("(?=.*", f, ")(?=.*", y, ")"), dat, perl=TRUE)
[1] FALSE FALSE TRUE
The pattern used here is (?=.*5\.20)(?=.*168\.9850).

I suppose if you had a long set of search strings and you didn't want to have to type out everything you could do:
dat[Reduce("&", lapply(c(f,y), function(x, dat) grepl(x, dat), dat = dat))]
However, you could probably also avoid typing everything out with @TimBiegeleisen's method by doing something like: paste0("(?=.*", c(f,y), ")", collapse = "") and using the result as your search string.
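The two ideas above can be combined into a small helper. This is only a sketch: grepl_all is a hypothetical name (not a base R function), and it assumes the patterns are already regex-escaped, as f and y are in the lookahead answer.

```r
# Hypothetical helper: TRUE where *every* pattern in `patterns` matches.
# Assumes the patterns are valid (already escaped) regular expressions.
grepl_all <- function(patterns, x) {
  # One lookahead per pattern, e.g. "(?=.*5\\.20)(?=.*168\\.9850)"
  pattern <- paste0("(?=.*", patterns, ")", collapse = "")
  grepl(pattern, x, perl = TRUE)
}

f <- "5\\.20"
y <- "168\\.9850"
dat <- c("dat.txt", "dat_5.20.txt", "data_5.20_168.9850.txt")

dat[grepl_all(c(f, y), dat)]
# [1] "data_5.20_168.9850.txt"
```

Lookaheads need perl = TRUE; the default regex engine does not support them.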

We can do two grep's using any of these alternatives:
grep(y, grep(f, dat, value = TRUE), value = TRUE)
## [1] "data_5.20_168.9850.txt"
dat[grepl(f, dat) & grepl(y, dat)]
## [1] "data_5.20_168.9850.txt"
dat[ intersect(grep(f, dat), grep(y, dat)) ]
## [1] "data_5.20_168.9850.txt"
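One caveat with the numeric f and y from the question: a number is converted to a string before matching (as.character(5.20) is "5.2"), and an unescaped "." matches any character. A minimal sketch that sidesteps both by storing the values as strings and matching them literally with fixed = TRUE; the fourth file name is made up purely to show the difference:

```r
# Store the search terms as strings so trailing zeros are kept
f <- "5.20"
y <- "168.9850"
dat <- c("dat.txt", "dat_5.20.txt", "data_5.20_168.9850.txt",
         "data_5x20_168.9850.txt")  # made-up name; a regex "5.20" would match it

# fixed = TRUE treats each term as a literal string, not a regex
hits <- Reduce(`&`, lapply(c(f, y), grepl, x = dat, fixed = TRUE))
dat[hits]
# [1] "data_5.20_168.9850.txt"
```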

Related

How to get the most frequent character within a character string? [duplicate]

Suppose the next character string:
test_string <- "A A B B C C C H I"
Is there any way to extract the most frequent value within test_string?
Something like:
extract_most_frequent_character(test_string)
Output:
#C
We can use scan to read the string as a vector of individual elements by splitting at the spaces, get the frequency counts with table, find the named element with the highest count (which.max), and return its name:
extract_most_frequent_character <- function(x) {
names(which.max(table(scan(text = x, what = '', quiet = TRUE))))
}
Testing:
extract_most_frequent_character(test_string)
[1] "C"
Or with strsplit
extract_most_frequent_character <- function(x) {
names(which.max(table(unlist(strsplit(x, "\\s+")))))
}
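Note that which.max returns only the first maximum, so when several characters are tied, only the first one (in table's sorted order) is reported. A minimal sketch that returns every tied value instead; most_frequent_all is a hypothetical name, not something from the answers here:

```r
# Hypothetical variant: return all values that share the highest count
most_frequent_all <- function(x) {
  counts <- table(strsplit(x, "\\s+")[[1]])
  names(counts)[as.vector(counts) == max(counts)]
}

most_frequent_all("A A B B C C C H I")
# [1] "C"
most_frequent_all("A A B B")
# [1] "A" "B"
```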
Here is another base R option (not as elegant as @akrun's answer)
> intToUtf8(names(which.max(table(utf8ToInt(gsub("\\s", "", test_string))))))
[1] "C"
One possibility involving stringr could be:
names(which.max(table(str_extract_all(test_string, "[A-Z]", simplify = TRUE))))
[1] "C"
Or marginally shorter:
names(which.max(table(str_extract_all(test_string, "[A-Z]")[[1]])))
Here is solution using stringr package, table and which:
library(stringr)
test_string <- str_split(test_string, " ")
test_string <- table(test_string)
names(test_string)[which.max(test_string)]
[1] "C"

How can I reverse the words in a string in R without using strsplit? [duplicate]

I'm trying to teach myself R and in doing some sample problems I came across the need to reverse a string.
Here's what I've tried so far but the paste operation doesn't seem to have any effect.
There must be something I'm not understanding about lists? (I also don't understand why I need the [[1]] after strsplit.)
test <- strsplit("greg", NULL)[[1]]
test
# [1] "g" "r" "e" "g"
test_rev <- rev(test)
test_rev
# [1] "g" "e" "r" "g"
paste(test_rev)
# [1] "g" "e" "r" "g"
From ?strsplit, a function that'll reverse every string in a vector of strings:
## a useful function: rev() for strings
strReverse <- function(x)
sapply(lapply(strsplit(x, NULL), rev), paste, collapse="")
strReverse(c("abc", "Statistics"))
# [1] "cba" "scitsitatS"
stringi has had this function for quite a long time:
stringi::stri_reverse("abcdef")
## [1] "fedcba"
Also note that it's vectorized:
stringi::stri_reverse(c("a", "ab", "abc"))
## [1] "a" "ba" "cba"
As @mplourde points out, you want the collapse argument:
paste(test_rev, collapse='')
Most commands in R are vectorized, but how exactly the command handles vectors depends on the command. paste will operate over multiple vectors, combining the ith element of each:
> paste(letters[1:5],letters[1:5])
[1] "a a" "b b" "c c" "d d" "e e"
collapse tells it to operate within a vector instead.
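To make the difference concrete, here is a short sketch using the vector from the question. It also touches on the [[1]] part: strsplit returns a list with one element per input string, so [[1]] pulls out the character vector for the first (here, only) string.

```r
# strsplit() returns a list, one element per input string
length(strsplit(c("ab", "cd"), NULL))
# [1] 2

test_rev <- rev(strsplit("greg", NULL)[[1]])

paste(test_rev)                 # nothing to combine element-wise: unchanged
# [1] "g" "e" "r" "g"
paste(test_rev, collapse = "")  # collapse joins the elements of one vector
# [1] "gerg"

# sep joins across vectors, collapse then joins the results
paste(c("a", "b"), c("x", "y"), sep = "-", collapse = "+")
# [1] "a-x+b-y"
```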
The following can be a useful way to reverse a vector of strings x, and is slightly faster (and more memory efficient) because it avoids generating a list (as in using strsplit):
x <- rep( paste( collapse="", LETTERS ), 100 )
str_rev <- function(x) {
sapply( x, function(xx) {
intToUtf8( rev( utf8ToInt( xx ) ) )
} )
}
str_rev(x)
If you know that you're going to be working with ASCII characters and speed matters, there is a fast C implementation for reversing a vector of strings built into Kmisc:
install.packages("Kmisc")
Kmisc::str_rev(x)  # namespace-qualified so it is not confused with the str_rev() defined above
You can also use the IRanges package.
library(IRanges)
x <- "ATGCSDS"
reverse(x)
# [1] "SDSCGTA"
You can also use the Biostrings package.
library(Biostrings)
x <- "ATGCSDS"
reverse(x)
# [1] "SDSCGTA"
If your data is in a data.frame, you can use sqldf:
myStrings <- data.frame(forward = c("does", "this", "actually", "work"))
library(sqldf)
sqldf("select forward, reverse(forward) `reverse` from myStrings")
# forward reverse
# 1 does seod
# 2 this siht
# 3 actually yllautca
# 4 work krow
Here is a function that returns the whole reversed string, or optionally the reverse string keeping only the elements specified by index, counting backward from the last character.
revString = function(string, index = 1:nchar(string)){
paste(rev(unlist(strsplit(string, NULL)))[index], collapse = "")
}
First, define an easily recognizable string as an example:
(myString <- paste(letters, collapse = ""))
[1] "abcdefghijklmnopqrstuvwxyz"
Now try out the function revString with and without the index:
revString(myString)
[1] "zyxwvutsrqponmlkjihgfedcba"
revString(myString, 1:5)
[1] "zyxwv"
The easiest way to reverse a string:
#reverse string----------------------------------------------------------------
revString <- function(text){
paste(rev(unlist(strsplit(text,NULL))),collapse="")
}
#example:
revString("abcdef")
You can do this with the rev() function, as mentioned in a previous post:
X <- "MyString"
RevX <- paste(rev(unlist(strsplit(X,NULL))),collapse="")
RevX
# [1] "gnirtSyM"
Here's a solution with gsub. Although I agree that it's easier with strsplit and paste (as pointed out in the other answers), it may be interesting to see that it works with regular expressions too:
test <- "greg"
n <- nchar(test) # the number of characters in the string
gsub(paste(rep("(.)", n), collapse = ""),
paste("", seq(n, 1), sep = "\\", collapse = ""),
test)
# [1] "gerg"
##function to reverse the given word or sentence
reverse <- function(mystring){
n <- nchar(mystring)
revstring <- rep(NA, n)
b <- n:1
c <- rev(b)
for (i in 1:n) {
revstring[i] <- substr(mystring,c[(n+1)- i], b[i])
}
newrevstring <- paste(revstring, sep = "", collapse = "")
return (cat("your string =", mystring, "\n",
("reverse letters = "), revstring, "\n",
"reverse string =", newrevstring,"\n"))
}
Here is one more base-R solution:
# Define function
strrev <- function(x) {
nc <- nchar(x)
paste(substring(x, nc:1, nc:1), collapse = "")
}
# Example
strrev("Sore was I ere I saw Eros")
[1] "sorE was I ere I saw eroS"
Solution was inspired by these U. Auckland slides.
The following code takes input from the user and reverses the entire string:
revstring=function(s)
print(paste(rev(strsplit(s,"")[[1]]),collapse=""))
str=readline("Enter the string:")
revstring(str)
So apparently front-end JS developers get asked to do this (for interviews) in JS without using built-in reverse functions. It took me a few minutes, but I came up with:
string <- 'hello'
foo <- vector()
for (i in nchar(string):1) foo <- append(foo,unlist(strsplit(string,''))[i])
paste0(foo,collapse='')
Which all could be wrapped in a function...
What about higher-order functionals? Reduce?
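For what it's worth, a Reduce version might look like the sketch below. rev_reduce is a hypothetical name, and this is just one way to do it: each character is prepended to an accumulator that starts as the empty string.

```r
# Hypothetical helper: reverse a single string by folding over its characters
rev_reduce <- function(s) {
  chars <- strsplit(s, "")[[1]]
  # Prepend each character to the accumulated result; "" is the initial value
  Reduce(function(acc, ch) paste0(ch, acc), chars, "")
}

rev_reduce("hello")
# [1] "olleh"
```

It is unlikely to be faster than rev() plus paste(), since it builds a new string at every step.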

Find names that match either of two patterns [duplicate]

Is it possible to find the names in a vector that contain either id or group, or both, in the example below?
I have tried grepl() without success.
a = c("c-id" = 2, "g_idgroups" = 3, "z+i" = 4)
grepl(c("id", "group"), names(a)) # return name of elements that contain either `id` OR `group` OR both
You can use:
pattern <- c("id", "group")
grep(paste0(pattern, collapse = '|'), names(a), value = TRUE)
#[1] "c-id" "g_idgroups"
With grepl you get a logical vector instead:
grepl(paste0(pattern, collapse = '|'), names(a))
#[1] TRUE TRUE FALSE
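One thing to watch with the "|" approach: the pasted string is a regular expression, so a search term such as "z+i" would be read as "one or more z, then i" and would not match the name "z+i" itself. A minimal sketch, assuming the terms are meant literally, that matches each term with fixed = TRUE and combines the results with OR (the literals vector is made up for illustration):

```r
a <- c("c-id" = 2, "g_idgroups" = 3, "z+i" = 4)
literals <- c("z+i", "group")  # illustrative terms containing a metacharacter

# As a regex, "z+i" does not match the literal name "z+i"
grepl(paste0(literals, collapse = "|"), names(a))
# [1] FALSE  TRUE FALSE

# Match each term literally, then combine with OR
hits <- Reduce(`|`, lapply(literals, grepl, x = names(a), fixed = TRUE))
names(a)[hits]
# [1] "g_idgroups" "z+i"
```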
A stringr solution:
stringr::str_subset(names(a), paste0(pattern, collapse = '|'))
#[1] "c-id" "g_idgroups"
Using str_detect:
library(stringr)
names(a)[str_detect(names(a), 'id|groups')]
# [1] "c-id" "g_idgroups"
names(a)
# [1] "c-id" "g_idgroups" "z+i"

Matching multiple patterns

I want to check whether "001" or "100" or "000" occurs in a 4-character string of 0s and 1s. For example, the string could be "1100" or "0010" or "1001" or "1111". How do I match several patterns against a string with a single command?
I know grep could be used for pattern matching, but using grep, I can check only one string at a time. I want to know if multiple strings can be used with some other command or with grep itself.
Yes, you can. The | in a grep pattern has the same meaning as or. So you can test for your pattern by using "001|100|000" as your pattern. At the same time, grep is vectorised, so all of this can be done in one step:
x <- c("1100", "0010", "1001", "1111")
pattern <- "001|100|000"
grep(pattern, x)
[1] 1 2 3
This returns the indices of the elements of your vector that contain the pattern (in this case the first three).
Sometimes it is more convenient to have a logical vector that tells you which of the elements in your vector were matched. Then you can use grepl:
grepl(pattern, x)
[1] TRUE TRUE TRUE FALSE
See ?regex for help about regular expressions in R.
Edit:
To avoid creating pattern manually we can use paste:
myValues <- c("001", "100", "000")
pattern <- paste(myValues, collapse = "|")
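Putting the pieces together, a short sketch using the vectors from this answer. The last part goes slightly beyond the answer: it checks each value separately to show which one matched where, using fixed = TRUE on the assumption that the values are literal strings.

```r
x <- c("1100", "0010", "1001", "1111")
myValues <- c("001", "100", "000")

# Build "001|100|000" instead of typing it out
pattern <- paste(myValues, collapse = "|")
x[grepl(pattern, x)]
# [1] "1100" "0010" "1001"

# One column per value, one row per element of x
m <- sapply(myValues, grepl, x = x, fixed = TRUE)
rownames(m) <- x
m["1100", ]
#   001   100   000
# FALSE  TRUE FALSE
```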
Here is one solution using stringr package
require(stringr)
mylist = c("1100", "0010", "1001", "1111")
str_locate(mylist, "000|001|100")
With command-line grep (outside R), use the -e option to add additional patterns:
echo '1100' | grep -e '001' -e '110' -e '101'
If you want a logical vector, have a look at the stri_detect functions from the stringi package. In your case the pattern is a regex, so use this one:
library(stringi)
stri_detect_regex(x, pattern)
## [1] TRUE TRUE TRUE FALSE
And some benchmarks:
require(microbenchmark)
test <- stri_paste(stri_rand_strings(100000, 4, "[0-1]"))
head(test)
## [1] "0001" "1111" "1101" "1101" "1110" "0110"
microbenchmark(stri_detect_regex(test, pattern), grepl(pattern, test))
Unit: milliseconds
                             expr      min       lq     mean   median       uq      max neval
 stri_detect_regex(test, pattern) 29.67405 30.30656 31.61175 30.93748 33.14948 35.90658   100
             grepl(pattern, test) 36.72723 37.71329 40.08595 40.01104 41.57586 48.63421   100
Sorry for making this an additional answer, but it is too many lines for a comment.
I just wanted to point out that the number of items that can be pasted together via paste(..., collapse = "|") and used as a single matching pattern is limited; see below. Maybe somebody can tell where exactly the limit is? Admittedly, such a large number may not be realistic, but depending on the task it should not be ruled out entirely.
For a really large number of items, a loop would be required to check each item of the pattern.
set.seed(0)
samplefun <- function(n, x, collapse){
paste(sample(x, n, replace=TRUE), collapse=collapse)
}
words <- sapply(rpois(10000000, 8) + 1, samplefun, letters, '')
text <- sapply(rpois(1000, 5) + 1, samplefun, words, ' ')
#since execution takes a while, I have commented out the following lines
#result <- grepl(paste(words, collapse = "|"), text)
# Error in grepl(pattern, text) :
# invalid regular expression
# 'wljtpgjqtnw|twiv|jphmer|mcemahvlsjxr|grehqfgldkgfu|
# ...
#result <- stringi::stri_detect_regex(text, paste(words, collapse = "|"))
# Error in stringi::stri_detect_regex(text, paste(words, collapse = "|")) :
# Pattern exceeds limits on size or complexity. (U_REGEX_PATTERN_TOO_BIG)
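The loop mentioned above could be sketched roughly as follows. grepl_any and its chunk_size argument are hypothetical (not from any package), and the default of 1000 is an arbitrary guess rather than a known safe limit: the patterns are split into chunks, each chunk is pasted into its own "|" pattern, and the results are combined with OR.

```r
# Hypothetical helper: TRUE where at least one of `patterns` matches,
# testing the patterns in chunks so no single regex grows too large.
grepl_any <- function(patterns, x, chunk_size = 1000) {
  hit <- logical(length(x))  # all FALSE to start with
  chunks <- split(patterns, ceiling(seq_along(patterns) / chunk_size))
  for (chunk in chunks) {
    todo <- !hit             # only re-check elements not matched yet
    if (!any(todo)) break
    hit[todo] <- grepl(paste(chunk, collapse = "|"), x[todo])
  }
  hit
}

x <- c("1100", "0010", "1001", "1111")
grepl_any(c("001", "100", "000"), x, chunk_size = 2)
# [1]  TRUE  TRUE  TRUE FALSE
```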
You can also use the %like% operator from the data.table package.
library(data.table)
# input
x <- c("1100", "0010", "1001", "1111")
pattern <- "001|100|000"
# check for pattern
x %like% pattern
# [1] TRUE TRUE TRUE FALSE
