How to determine if a string "ends with" another string in R? - r

I want to filter out the rows of a table which contain '*' in the string value of the column. Checking just that column.
string_name = c("aaaaa", "bbbbb", "ccccc", "dddd*", "eee*eee")
zz <- sapply(tx$variant_full_name, function(x) {substrRight(x, -1) =="*"})
Error in FUN(c("Agno I30N", "VP2 E17Q", "VP2 I204*", "VP3 I85F", "VP1 K73R", :
could not find function "substrRight"
The 4th value of zz should be TRUE by this.
In Python there is an endswith method for strings, e.g. string_s.endswith('*').
Is there something similar in R?
Also, is the problem that '*' is a special character that means "any character"? grepl is not working either.
> grepl("*^",'dddd*')
[1] TRUE
> grepl("*^",'dddd')
[1] TRUE

Base R (since version 3.3.0) contains startsWith and endsWith. Thus the OP's question can be answered with endsWith:
> string_name = c("aaaaa", "bbbbb", "ccccc", "dddd*", "eee*eee")
> endsWith(string_name, '*')
[1] FALSE FALSE FALSE TRUE FALSE
This is much faster than substring(string_name, nchar(string_name)) == '*'.
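To tie this back to the original goal of filtering rows, here is a minimal sketch. The data frame `tx_demo` is a hypothetical stand-in for the asker's `tx` table, with a few values borrowed from the error message; only the column name is taken from the question. Since endsWith compares literally rather than as a regular expression, the '*' needs no escaping.

```r
# Hypothetical stand-in for the asker's table (not the real data)
tx_demo <- data.frame(
  variant_full_name = c("Agno I30N", "VP2 E17Q", "VP2 I204*", "VP3 I85F"),
  stringsAsFactors = FALSE
)

# endsWith() does a literal comparison, so "*" is not treated as a quantifier
has_star <- endsWith(tx_demo$variant_full_name, "*")

# Keep only the rows whose name does not end in "*"
kept <- tx_demo[!has_star, , drop = FALSE]
```

Dropping the `!` keeps only the starred rows instead.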

* is a quantifier in regular expressions. It tells the regular expression engine to attempt to match the preceding token "zero or more times". To match a literal *, you need to precede it with two backslashes (\\*) or place it inside a character class ([*]). To check whether the string ends with a specific pattern, use the end-of-string anchor $.
> grepl('\\*$', c('aaaaa', 'bbbbb', 'ccccc', 'dddd*', 'eee*eee'))
# [1] FALSE FALSE FALSE TRUE FALSE
You can also do this in base R without a regular expression:
> x <- c('aaaaa', 'bbbbb', 'ccccc', 'dddd*', 'eee*eee')
> substr(x, nchar(x)-1+1, nchar(x)) == '*'
# [1] FALSE FALSE FALSE TRUE FALSE
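The character-class form mentioned above, and a literal "contains" check, might look like the sketch below. The variable names are only illustrative. Using `fixed = TRUE` tells grepl to treat the pattern as a plain string, which may be closer to the question's wording of rows that "contain" a '*'.

```r
x <- c("aaaaa", "bbbbb", "ccccc", "dddd*", "eee*eee")

# Character class instead of backslashes: matches a literal "*" at the end
ends_with_star <- grepl("[*]$", x)

# fixed = TRUE disables regex interpretation: is there a "*" anywhere?
contains_star <- grepl("*", x, fixed = TRUE)
```

Here only "dddd*" ends with a star, while both "dddd*" and "eee*eee" contain one.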

This is simple enough that you don't need regular expressions.
> string_name = c("aaaaa", "bbbbb", "ccccc", "dddd*", "eee*eee")
> substring(string_name, nchar(string_name)) == "*"
[1] FALSE FALSE FALSE TRUE FALSE

I use something like this:
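strEndsWith <- function(haystack, needle)
{
  hl <- nchar(haystack)
  nl <- nchar(needle)
  # Vectorized over haystack: a needle longer than the haystack can never match,
  # otherwise compare the last nl characters with the needle
  (nl <= hl) & (substr(haystack, hl - nl + 1, hl) == needle)
}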

Here is a tidyverse solution, using str_sub from the stringr package:
library(stringr)
string_name = c("aaaaa", "bbbbb", "ccccc", "dddd*", "eee*eee")
str_sub(string_name, -1) == "*"
[1] FALSE FALSE FALSE TRUE FALSE
It has the benefit of being more readable, and it can be changed easily if a different position in the string needs to be checked.

Related

Pattern matching in R if string NOT followed by another string

I am trying to match the following in R using str_detect from the stringr package.
I want to detect whether a given string is followed or preceded by 'and' or '&'. For example, in:
string_1<-"A and B"
string_2<-"A B"
string_3<-"B and A"
string_4<-"A B and C"
I want str_detect(string_X) to be FALSE for string_1, string_3 and string_4 but TRUE for string_2.
I have tried:
str_detect(string_X,paste0(".*(?<!and |& )","A"))==TRUE & str_detect(string_X,paste0(".*","A","(?! and| &).*"))==TRUE
I use paste0 because I want to run this over different strings. This works for all the cases above except string_4. I am new to regex, and it also does not seem very elegant. Is there a more general solution?
Thank you.
First let's combine your four strings into a single vector:
strings <- c(string_1, string_2, string_3, string_4)
Now using
library(stringr)
str_detect(strings, "(A|B)(?=\\s(and|&))", negate = TRUE)
we look for "A" or "B" followed by a whitespace character and then "and" or "&". Because of negate = TRUE, strings without such a match give TRUE. So this returns
#> [1] FALSE TRUE FALSE FALSE
You could wrap it into a function:
detector <- function(letters, strings) {
pattern <- paste0("(", paste0(letters, collapse = "|"), ")(?=\\s(and|&))")
str_detect(strings, pattern, negate = TRUE)
}
detector(c("A", "B"), strings)
#> [1] FALSE TRUE FALSE FALSE
detector(c("A"), strings)
#> [1] FALSE TRUE TRUE TRUE
detector(c("B"), strings)
#> [1] TRUE TRUE FALSE FALSE
detector(c("C"), strings)
#> [1] TRUE TRUE TRUE TRUE
You can use two negative lookahead assertions to make sure that there is no A or B followed by and or &, and also none in the other order.
^(?!.*[AB] (?:and|&))(?!.*(?:and|&) [AB])
^ Start of string
(?!.*[AB] (?:and|&)) Assert that the string does not contain A or B followed by either and or &
(?!.*(?:and|&) [AB]) Assert that the string does not contain either and or & followed by either A or B
library(stringr)
string_1<-"A and B"
string_2<-"A B"
string_3<-"B and A"
string_4<-"A B and C"
string_5<-"& B"
strings <- c(string_1, string_2, string_3, string_4, string_5)
str_detect(strings, "^(?!.*[AB] (?:and|&))(?!.*(?:and|&) [AB])")
Output
[1] FALSE TRUE FALSE FALSE FALSE
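Since the asker wanted to run this over different strings, the same two negative lookaheads could be built with paste0. This is only a sketch under some assumptions: `not_joined` is a hypothetical helper name, base grepl with perl = TRUE is used in place of str_detect, and the terms are assumed to contain no regex metacharacters (they would need escaping otherwise).

```r
# Hypothetical helper: TRUE when none of `terms` is directly followed or
# preceded by " and " / " & " (single spaces, as in the pattern above)
not_joined <- function(strings, terms) {
  alt <- paste(terms, collapse = "|")
  pattern <- paste0(
    "^(?!.*(?:", alt, ") (?:and|&))",  # no term followed by and/&
    "(?!.*(?:and|&) (?:", alt, "))"    # no and/& followed by a term
  )
  # Lookaheads need the PCRE engine, hence perl = TRUE
  grepl(pattern, strings, perl = TRUE)
}

strings <- c("A and B", "A B", "B and A", "A B and C", "& B")
res_ab <- not_joined(strings, c("A", "B"))
res_c <- not_joined(strings, "C")
```

With the terms "A" and "B" this should reproduce the output shown above; with "C" only "A B and C" is rejected, because "and C" matches the second lookahead.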

How do you do the equivalent of Excel's AND() and OR() operations in R?

drives_DF$block_device == ""
[1] TRUE TRUE TRUE FALSE TRUE
How do I reduce this down to a single FALSE like doing an AND() in Excel?
How do I reduce this down to a single TRUE like doing an OR() in Excel?
Wrapping your code with all() will return TRUE if all evaluated elements are TRUE
all(drives_DF$block_device == "")
[1] FALSE
Wrapping your code with any() will return TRUE if at least one of the evaluated elements is TRUE
any(drives_DF$block_device == "")
[1] TRUE
You can use the any and all functions available in R to get the required result, like this:
#Considering a vector of boolean values
boolVector = c(F,T,F,T,F)
print(all(boolVector, na.rm = FALSE)) #AND OPERATION
print(any(boolVector, na.rm = FALSE)) #OR OPERATION
The output of the print statements is:
[1] FALSE
[1] TRUE
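One detail worth knowing about the na.rm argument used above: with missing values, all and any can return NA rather than TRUE or FALSE. A small sketch, with made-up vectors:

```r
flags <- c(TRUE, NA, TRUE)

all_default <- all(flags)               # NA: the missing value could be FALSE
all_no_na <- all(flags, na.rm = TRUE)   # TRUE: the NA is dropped first

any_unknown <- any(c(FALSE, NA))        # NA: the missing value could be TRUE
any_known <- any(c(TRUE, NA))           # TRUE: one TRUE is enough
```

If the column being tested can contain NA, passing na.rm = TRUE (or handling NA explicitly) avoids an NA result propagating into an if statement.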

R - Find if characters of a vector are in another vector

I have a question very similar to this topic: Find matches of a vector of strings in another vector of strings.
I have a vector of clients, and if the name indicates that it is a commercial client, I need to change the type in my data.frame.
So, suppose that:
commercial_names <- c("BAKERY","MARKET", "SCHOOL", "CINEMA")
clients <- c("JOHN XX","REESE YY","BAKERY ZZ","SAMANTHA WW")
I tried the code from the topic cited above, but I got a wrong result and a warning:
> grepl(paste(commercial_names, collape="|"),clients)
[1] TRUE TRUE TRUE TRUE
Warning message:
In grepl(paste(commercial_names, collape = "|"), clients) :
argument 'pattern' has length > 1 and only the first element will be used
What am I doing wrong? I would appreciate any help.
Your code is correct but for a typo:
grepl(paste0(commercial_names, collapse = "|"), clients) # typo: collape
[1] FALSE FALSE TRUE FALSE
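Because of the typo, collape = "|" is treated as just another string to paste rather than as the collapse argument, so the commercial_names are not collapsed into a single pattern.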
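To see what the typo does, it can help to print both versions of the pattern. A short sketch using the vectors from the question (the variable names `mistyped` and `collapsed` are only illustrative):

```r
commercial_names <- c("BAKERY", "MARKET", "SCHOOL", "CINEMA")
clients <- c("JOHN XX", "REESE YY", "BAKERY ZZ", "SAMANTHA WW")

# Misspelled argument: "|" is pasted onto every element (default sep = " ")
mistyped <- paste(commercial_names, collape = "|")
# "BAKERY |" "MARKET |" "SCHOOL |" "CINEMA |"

# Correct argument: a single alternation pattern
collapsed <- paste(commercial_names, collapse = "|")
# "BAKERY|MARKET|SCHOOL|CINEMA"

matches <- grepl(collapsed, clients)
```

grepl only uses the first element of the mistyped vector, "BAKERY |". Its empty alternative after the | matches every string, which is why the original call returned TRUE for all clients.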
Not sure how to do this with a one-liner, but a loop over the clients seems to do the trick (str_detect comes from the stringr package):
sapply(clients, function(client) {
any(str_detect(client, commercial_names))
})
    JOHN XX    REESE YY   BAKERY ZZ SAMANTHA WW
      FALSE       FALSE        TRUE       FALSE
I found another way to do this, with the %like% operator from the data.table package:
> clients %like% paste(commercial_names,collapse = "|")
[1] FALSE FALSE TRUE FALSE
You can do something like this too:
clients.first <- gsub(" ..", "", clients)
clients.first %in% commercial_names
This returns:
[1] FALSE FALSE TRUE FALSE
You might need to change the regular expression for gsub if your clients data is more heterogeneous though.

Full word match with grepl

I would like to have TRUE FALSE instead of the following. Any suggestion?
testLines <- c("buried","medium-buried")
grepl('\\<buried\\>',testLines)
[1] TRUE TRUE
Perhaps this?
testLines <- c("buried","medium-buried")
grepl('^buried$',testLines)
#[1] TRUE FALSE
My understanding (and regex is not my forte) is that ^ denotes the start of the string and $ the end.
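The original pattern matched "medium-buried" because a hyphen is not a word character, so \\< and \\> see a word boundary right before "buried". If whole-string equality is too strict, one possible middle ground is to require whitespace or a string boundary around the word. This is a sketch; the third test string is made up for illustration.

```r
x <- c("buried", "medium-buried", "deeply buried")

# Whole string must be exactly "buried" (same result as '^buried$')
exact <- x == "buried"

# "buried" as a whitespace-delimited token: a hyphen does not count as a break
token <- grepl("(^|[[:space:]])buried([[:space:]]|$)", x)
```

Here `exact` accepts only the first string, while `token` also accepts "deeply buried" but still rejects "medium-buried".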

How to check if the value is numeric?

Can someone help me modify the function below to check whether a value is numeric?
# handy function that checks if something is numeric
check.numeric <- function(N){
!length(grep("[^[:digit:]]", as.character(N)))
}
check.numeric(3243)
#TRUE
check.numeric("sdds")
#FALSE
check.numeric(3.14)
#FALSE
I want check.numeric() to return TRUE when it's a decimal like 3.14.
You could use is.finite to test whether the value is numeric, finite, and non-NA. This will work for numeric, integer, and complex values (if both real/imaginary parts are finite). Character input always gives FALSE.
> is.finite(NA)
[1] FALSE
> is.finite(NaN)
[1] FALSE
> is.finite(Inf)
[1] FALSE
> is.finite(1L)
[1] TRUE
> is.finite(1.0)
[1] TRUE
> is.finite("A")
[1] FALSE
> is.finite(pi)
[1] TRUE
> is.finite(1+0i)
[1] TRUE
Sounds like you want a function like this:
f <- function(x) is.numeric(x) & !is.na(x)
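If the input may also arrive as text, as the as.character call in the question's function suggests, another option is to try the conversion and see whether it succeeds. This is a sketch, not the asker's function: `check_numeric` is a hypothetical name, and it accepts anything as.numeric can parse, which includes forms such as "1e5" and "Inf".

```r
# Hypothetical replacement for check.numeric: TRUE when the value can be
# converted to a number, FALSE otherwise (vectorized over N)
check_numeric <- function(N) {
  # as.numeric() warns and yields NA for unparseable text; silence the warning
  parsed <- suppressWarnings(as.numeric(as.character(N)))
  !is.na(parsed)
}

r_int <- check_numeric(3243)
r_text <- check_numeric("sdds")
r_dec <- check_numeric(3.14)
r_dec_text <- check_numeric("3.14")
```

With this version, 3243, 3.14, and "3.14" give TRUE, while "sdds" gives FALSE.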
