How to determine if a string "ends with" another string in R?


I want to filter out the rows of a table whose string value in one column ends with '*'. Only that column needs to be checked.
string_name = c("aaaaa", "bbbbb", "ccccc", "dddd*", "eee*eee")
zz <- sapply(tx$variant_full_name, function(x) {substrRight(x, -1) =="*"})
Error in FUN(c("Agno I30N", "VP2 E17Q", "VP2 I204*", "VP3 I85F", "VP1 K73R", :
could not find function "substrRight"
By this logic, the 4th value of zz should be TRUE.
In Python there is an endswith method for strings [ string_s.endswith('*') ].
Is there something similar in R?
Also, is the problem caused by '*' being a special character (as it means any character)? grepl is not working either.
> grepl("*^",'dddd*')
[1] TRUE
> grepl("*^",'dddd')
[1] TRUE

Base R now contains startsWith and endsWith (added in R 3.3.0). Thus the OP's question can be answered with endsWith:
> string_name = c("aaaaa", "bbbbb", "ccccc", "dddd*", "eee*eee")
> endsWith(string_name, '*')
[1] FALSE FALSE FALSE TRUE FALSE
This is much faster than substring(string_name, nchar(string_name)) == '*'.
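To tie this back to the filtering goal in the question, the logical vector from endsWith can index the rows directly. This is a minimal sketch, assuming a data frame shaped like the OP's tx; the values are taken from the error message in the question, but the data frame itself is a hypothetical stand-in.

```r
# Hypothetical stand-in for the OP's table (only the column of interest)
tx <- data.frame(
  variant_full_name = c("Agno I30N", "VP2 E17Q", "VP2 I204*", "VP3 I85F", "VP1 K73R"),
  stringsAsFactors = FALSE
)

# TRUE for rows whose name ends with a literal "*"
has_star <- endsWith(tx$variant_full_name, "*")

# Keep only the rows that do NOT end with "*"
tx_no_star <- tx[!has_star, , drop = FALSE]
print(tx_no_star$variant_full_name)
```

Note that endsWith treats its second argument as a plain string, so the asterisk needs no escaping.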

* is a quantifier in regular expressions. It tells the regular expression engine to attempt to match the preceding token "zero or more times". To match a literal *, you need to precede it with two backslashes (\\*) or place it inside a character class ([*]). To check if the string ends with a specific pattern, use the end-of-string anchor $.
> grepl('\\*$', c('aaaaa', 'bbbbb', 'ccccc', 'dddd*', 'eee*eee'))
# [1] FALSE FALSE FALSE TRUE FALSE
You can simply do this without implementing a regular expression in base R:
> x <- c('aaaaa', 'bbbbb', 'ccccc', 'dddd*', 'eee*eee')
> substr(x, nchar(x)-1+1, nchar(x)) == '*'
# [1] FALSE FALSE FALSE TRUE FALSE
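The two ways of writing a literal asterisk can be compared side by side, along with fixed = TRUE for the case where you only need to know whether a * occurs anywhere in the string. This is only a sketch using base R's grepl; the variable names are made up for illustration.

```r
x <- c("aaaaa", "bbbbb", "ccccc", "dddd*", "eee*eee")

# Escaped asterisk plus the end-of-string anchor
ends_escaped <- grepl("\\*$", x)

# Asterisk inside a character class plus the end-of-string anchor
ends_class <- grepl("[*]$", x)

# fixed = TRUE turns off regex interpretation: "contains a literal *"
has_star <- grepl("*", x, fixed = TRUE)

print(ends_escaped)
print(ends_class)
print(has_star)
```

The first two results agree; the third is also TRUE for "eee*eee" because it does not anchor the match to the end.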

This is simple enough that you don't need regular expressions.
> string_name = c("aaaaa", "bbbbb", "ccccc", "dddd*", "eee*eee")
> substring(string_name, nchar(string_name)) == "*"
[1] FALSE FALSE FALSE TRUE FALSE

I use something like this:
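strEndsWith <- function(haystack, needle)
{
  hl <- nchar(haystack)
  nl <- nchar(needle)
  # Vectorized over haystack: if() fails on a condition of length > 1,
  # so the length check is folded into the logical result instead.
  # A needle longer than the haystack can never match.
  hl >= nl & substr(haystack, hl - nl + 1, hl) == needle
}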

Here is a tidyverse solution:
library(stringr)
string_name = c("aaaaa", "bbbbb", "ccccc", "dddd*", "eee*eee")
str_sub(string_name, -1) == "*"
[1] FALSE FALSE FALSE TRUE FALSE
It has the benefit of being more readable, and it can be changed easily if a different location needs to be checked.
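As a sketch of the "different location" point: negative positions in str_sub count from the end of the string, so the same call can check a longer suffix or the first character. str_ends with fixed() is another stringr option (it should be available in stringr 1.4.0 and later). The variable names below are illustrative only.

```r
library(stringr)

string_name <- c("aaaaa", "bbbbb", "ccccc", "dddd*", "eee*eee")

# Last character
last_char <- str_sub(string_name, -1) == "*"

# Last three characters
last_three <- str_sub(string_name, -3) == "eee"

# First character
first_char <- str_sub(string_name, 1, 1) == "a"

# fixed() makes str_ends compare literally, so "*" needs no escaping
ends_star <- str_ends(string_name, fixed("*"))

print(last_char)
print(last_three)
print(first_char)
print(ends_star)
```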

Related

Pattern matching in R if string NOT followed by another string

I am trying to match the following in R using str_detect from the stringr package.
I want to detect whether a given string is followed or preceded by 'and' or '&'. For example, in:
string_1<-"A and B"
string_2<-"A B"
string_3<-"B and A"
string_4<-"A B and C"
I want str_detect(string_X) to be FALSE for string_1, string_3 and string_4 but TRUE for string_2.
I have tried:
str_detect(string_X,paste0(".*(?<!and |& )","A"))==TRUE & str_detect(string_X,paste0(".*","A","(?! and| &).*"))==TRUE
I use paste0 because I want to run this over different strings. This works for all the cases above except string_4. I am new to regex, and it also does not seem very elegant. Is there a more general solution?
Thank you.

First let's combine your four strings into a single vector:
strings <- c(string_1, string_2, string_3, string_4)
Now using
library(stringr)
str_detect(strings, "(A|B)(?=\\s(and|&))", negate = TRUE)
we look for "A" or "B" followed by "and" or "&"; with negate = TRUE, strings without such a match give TRUE. So this returns
#> [1] FALSE TRUE FALSE FALSE
You could wrap it into a function:
detector <- function(letters, strings) {
  pattern <- paste0("(", paste0(letters, collapse = "|"), ")(?=\\s(and|&))")
  str_detect(strings, pattern, negate = TRUE)
}
detector(c("A", "B"), strings)
#> [1] FALSE TRUE FALSE FALSE
detector(c("A"), strings)
#> [1] FALSE TRUE TRUE TRUE
detector(c("B"), strings)
#> [1] TRUE TRUE FALSE FALSE
detector(c("C"), strings)
#> [1] TRUE TRUE TRUE TRUE

You can use two negative lookahead assertions to make sure that the string contains neither A or B followed by and or &, nor and or & followed by A or B.
^(?!.*[AB] (?:and|&))(?!.*(?:and|&) [AB])
^ Start of string
(?!.*[AB] (?:and|&)) Assert that the string does not contain A or B followed by either and or &
(?!.*(?:and|&) [AB]) Assert that the string does not contain either and or & followed by either A or B
library(stringr)
string_1<-"A and B"
string_2<-"A B"
string_3<-"B and A"
string_4<-"A B and C"
string_5<-"& B"
strings <- c(string_1, string_2, string_3, string_4, string_5)
str_detect(strings, "^(?!.*[AB] (?:and|&))(?!.*(?:and|&) [AB])")
Output
[1] FALSE TRUE FALSE FALSE FALSE
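The same pattern can also be used without stringr. This is a sketch with base R's grepl; lookahead assertions are not supported by the default regex engine, so perl = TRUE is needed.

```r
strings <- c("A and B", "A B", "B and A", "A B and C", "& B")

# Two negative lookaheads anchored at the start of the string
pattern <- "^(?!.*[AB] (?:and|&))(?!.*(?:and|&) [AB])"

# perl = TRUE switches grepl to PCRE, which understands (?! ... )
result <- grepl(pattern, strings, perl = TRUE)
print(result)
```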

How do you do the equivalent of Excel's AND() and OR() operations in R?

drives_DF$block_device == ""
[1] TRUE TRUE TRUE FALSE TRUE
How do I reduce this down to a single FALSE like doing an AND() in Excel?
How do I reduce this down to a single TRUE like doing an OR() in Excel?

Wrapping your code with all() will return TRUE if all evaluated elements are TRUE
all(drives_DF$block_device == "")
[1] FALSE
Wrapping your code with any() will return TRUE if at least one of the evaluated elements is TRUE
any(drives_DF$block_device == "")
[1] TRUE

You can use the any and all functions available in R to get the required result like this:
# Considering a vector of boolean values
boolVector = c(FALSE, TRUE, FALSE, TRUE, FALSE)
print(all(boolVector, na.rm = FALSE)) # AND operation
print(any(boolVector, na.rm = FALSE)) # OR operation
The output of the print statements is:
[1] FALSE
[1] TRUE
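The na.rm argument matters once the logical vector contains missing values, which is the main way all and any differ from Excel's AND() and OR(). A small sketch with made-up vectors:

```r
x <- c(TRUE, NA, TRUE)
y <- c(FALSE, NA)

# With an NA present, the answer is NA unless another element already decides it
all_x <- all(x)                    # NA: the missing value could be FALSE
all_x_narm <- all(x, na.rm = TRUE) # TRUE: NA is dropped first
all_y <- all(y)                    # FALSE: one FALSE is enough to decide
any_y <- any(y)                    # NA: the missing value could be TRUE
any_y_narm <- any(y, na.rm = TRUE) # FALSE: NA is dropped first

print(c(all_x, all_x_narm, all_y, any_y, any_y_narm))
```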

R - Find if characters of a vector are in another vector

I have a question very similar to this topic: Find matches of a vector of strings in another vector of strings.
I have a vector of clients, and if the name indicates that it is a commercial client, I need to change the type in my data.frame.
So, suppose that:
commercial_names <- c("BAKERY","MARKET", "SCHOOL", "CINEMA")
clients <- c("JOHN XX","REESE YY","BAKERY ZZ","SAMANTHA WW")
I tried the code in the topic cited before, but I had an error:
> grepl(paste(commercial_names, collape="|"),clients)
[1] TRUE TRUE TRUE TRUE
Warning message:
In grepl(paste(commercial_names, collape = "|"), clients) :
argument 'pattern' has length > 1 and only the first element will be used
What am I doing wrong? I would appreciate any help.

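Your code is correct but for a typo (collape instead of collapse):
grepl(paste0(commercial_names, collapse = "|"), clients) # fixed: collapse, not collape
[1] FALSE FALSE TRUE FALSE
Given the typo, the commercial_names are not collapsed: paste treats collape = "|" as one more string to paste onto each name, so the result is still a vector of four patterns and grepl uses only the first.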
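Printing what paste actually returns in each case makes the difference visible. This is a small sketch using the vectors from the question; bad and good are illustrative names.

```r
commercial_names <- c("BAKERY", "MARKET", "SCHOOL", "CINEMA")
clients <- c("JOHN XX", "REESE YY", "BAKERY ZZ", "SAMANTHA WW")

# Misspelled argument: "|" is pasted onto each name, separated by a space
bad <- paste(commercial_names, collape = "|")
print(bad)

# Correct argument: the names are joined into one alternation pattern
good <- paste(commercial_names, collapse = "|")
print(good)

is_commercial <- grepl(good, clients)
print(is_commercial)
```

The first element of bad is "BAKERY |", an alternation whose second branch is empty, which is consistent with every client matching in the question's output.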

Not sure how to do this with a one-liner, but a loop seems to do the trick.
library(stringr)
sapply(clients, function(client) {
  any(str_detect(client, commercial_names))
})
#>     JOHN XX    REESE YY   BAKERY ZZ SAMANTHA WW
#>       FALSE       FALSE        TRUE       FALSE

I found another way to do this, with the %like% operator of the data.table package:
> clients %like% paste(commercial_names,collapse = "|")
[1] FALSE FALSE TRUE FALSE

You can do something like this too:
clients.first <- gsub(" ..", "", clients)
clients.first %in% commercial_names
This returns:
[1] FALSE FALSE TRUE FALSE
You might need to change the regular expression for gsub if your clients data is more heterogeneous though.
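One possible adjustment for more heterogeneous data is to keep everything before the first whitespace instead of removing a fixed number of characters. This is only a sketch: the fifth client below is a hypothetical addition, and it still assumes the commercial keyword is the first word of the name.

```r
commercial_names <- c("BAKERY", "MARKET", "SCHOOL", "CINEMA")
# "MARKET 24 HOURS" is a made-up entry to show a longer suffix
clients <- c("JOHN XX", "REESE YY", "BAKERY ZZ", "SAMANTHA WW", "MARKET 24 HOURS")

# Drop the first whitespace character and everything after it
first_word <- sub("\\s.*$", "", clients)

is_commercial <- first_word %in% commercial_names
print(first_word)
print(is_commercial)
```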

Full word match with grepl

I would like to have TRUE FALSE instead of the following. Any suggestion?
testLines <- c("buried","medium-buried")
grepl('\\<buried\\>',testLines)
[1] TRUE TRUE

Perhaps this?
testLines <- c("buried","medium-buried")
grepl('^buried$',testLines)
#[1] TRUE FALSE
My understanding (and regex is not my forte) is that ^ denotes the start of the string and $ the end.
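The original pattern matches "medium-buried" because a hyphen is not a word character, so there is a word boundary between "-" and "buried". A sketch of three alternatives, depending on what "full word" should mean; "deeply buried" is a hypothetical extra test case.

```r
testLines <- c("buried", "medium-buried", "deeply buried")

# Word boundaries: the hyphen and the space both count as boundaries
word_boundary <- grepl("\\<buried\\>", testLines)

# Whole string must equal the word: no regex needed
exact <- testLines == "buried"

# Word delimited by whitespace or the string edges (hyphen does not delimit)
token <- grepl("(^|\\s)buried(\\s|$)", testLines)

print(word_boundary)
print(exact)
print(token)
```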

How to check if the value is numeric?

Can someone help me modify the function below to check if a value is numeric?
# handy function that checks if something is numeric
check.numeric <- function(N){
!length(grep("[^[:digit:]]", as.character(N)))
}
check.numeric(3243)
#TRUE
check.numeric("sdds")
#FALSE
check.numeric(3.14)
#FALSE
I want check.numeric() to return TRUE when it's a decimal like 3.14.

You could use is.finite to test whether the value is numeric and non-NA. This will work for numeric, integer, and complex values (if both real/imaginary parts are finite).
> is.finite(NA)
[1] FALSE
> is.finite(NaN)
[1] FALSE
> is.finite(Inf)
[1] FALSE
> is.finite(1L)
[1] TRUE
> is.finite(1.0)
[1] TRUE
> is.finite("A")
[1] FALSE
> is.finite(pi)
[1] TRUE
> is.finite(1+0i)
[1] TRUE

Sounds like you want a function like this:
f <- function(x) is.numeric(x) & !is.na(x)
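If the goal is to keep the behaviour of the question's check.numeric (which also accepts text input) while allowing decimals, one option is to let as.numeric do the parsing. This is a sketch, not a drop-in standard function; check_numeric is a hypothetical name.

```r
check_numeric <- function(N) {
  # as.numeric() returns NA (with a warning) for text that is not a number;
  # the warning is suppressed because NA is the signal we are testing for
  !is.na(suppressWarnings(as.numeric(as.character(N))))
}

print(check_numeric(3243))
print(check_numeric("sdds"))
print(check_numeric(3.14))
print(check_numeric("3.14"))
print(check_numeric(NA))
```

Anything as.numeric can parse counts as numeric here, including scientific notation such as "1e5" and the string "Inf", so tighten the check if those should be rejected.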
