Exact match on word in middle of string in R [duplicate]

This question already has answers here:
Using regex in R to find strings as whole words (but not strings as part of words)
(2 answers)
Closed 1 year ago.
I referred to this question (How to filter Exact match string using dplyr), but mine is slightly different: the word is not necessarily at the start and can occur anywhere in the string. I want TRUE to be returned only for the first element, not for the second and third.
library(stringr)
vec <- c("this should be selected", "thisus should not be selected","not selected thisis too")
str_detect(vec,"this")
Current output
TRUE TRUE TRUE
Expected output
TRUE FALSE FALSE

Use a word boundary (\\b):
stringr::str_detect(vec,"\\bthis\\b")
#[1] TRUE FALSE FALSE
In base R:
grepl('\\bthis\\b', vec)
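
When the word to match is held in a variable, the same pattern can be assembled with paste0. This is only a minimal sketch: the names word, pattern and whole_word are made up for illustration, and it assumes the word contains no regex metacharacters (those would need escaping first).

```r
# Input vector from the question
vec <- c("this should be selected",
         "thisus should not be selected",
         "not selected thisis too")

# Hypothetical variable holding the word to match as a whole word
word <- "this"

# Wrap the word in word boundaries: \\bthis\\b
pattern <- paste0("\\b", word, "\\b")

# TRUE only where "this" appears as a standalone word
whole_word <- grepl(pattern, vec)
print(whole_word)
```

The boundary \\b matches between a word character and a non-word character (or the start or end of the string), which is why "thisus" and "thisis" are rejected.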

Related

Select strings from list where logical value is TRUE [duplicate]

This question already has answers here:
List distinct values in a vector in R
(7 answers)
Closed 2 years ago.
I would like to extract duplicated strings from a list. As the unique function does not work on non-numerical data, I used the stringi package with the stri_duplicated function to obtain logical values (TRUE or FALSE). I would like to extract the strings that are duplicated from the list (the strings for which stri_duplicated reports TRUE).
Here is a minimal example:
library(stringi)
ex1 <- c("SE1", "SE2", "SE5", "SE2")
dupl <- stri_duplicated(ex1)
> dupl
[1] FALSE FALSE FALSE TRUE
Many thanks in advance.
In base R there is duplicated():
duplicated(ex1)
[1] FALSE FALSE FALSE TRUE
If you want to extract the duplicated items:
ex1[duplicated(ex1)]
[1] "SE2"
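
As a side note, unique() in base R does accept character vectors, so it can be combined with duplicated() when a value occurs more than twice. The following is a small sketch; the vector ex2 is a made-up variation of the example above.

```r
# Hypothetical variation of ex1 in which "SE2" occurs three times
ex2 <- c("SE1", "SE2", "SE5", "SE2", "SE2")

# duplicated() flags every occurrence after the first one
repeats <- ex2[duplicated(ex2)]

# unique() works on character vectors and collapses the repeats
dup_values <- unique(repeats)

# Distinct values of the whole vector
distinct_values <- unique(ex2)

print(repeats)
print(dup_values)
print(distinct_values)
```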

extract regular expressions from collapsed characters (including "|") [duplicate]

This question already has answers here:
Find the words in list of strings
(1 answer)
Regular expression pipe confusion
(5 answers)
Closed 3 years ago.
I want to detect (and then extract) month names from text using str_detect and str_extract.
For this, I create an object containing all month names and abbreviations.
m <- paste(c(month.name, month.abb), collapse = "|")
> m
[1] "January|February|March|April|May|June|July|August|September|October|November|December|Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
Then, I want to detect any of the entries occurring as a single word (surrounded by word boundaries):
stringr::str_detect(c("inJan", "Jan"), str_glue("\\b{m}\\b"))
This, however, returns TRUE TRUE (I expect FALSE TRUE, as the first is not a single word).
I suspect this is due to the collapsing of the list, as stringr::str_detect(c("inJan", "Jan"), str_glue("\\bJan\\b")) returns the expected FALSE TRUE.
I need to detect occurrences of m, however. What's the best way to go about this?
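
A likely explanation is operator precedence: alternation (|) binds more loosely than \\b, so in \\bJanuary|...|Dec\\b the first boundary applies only to "January" and the last only to "Dec", while the alternatives in between carry no boundary at all. Wrapping the alternation in a group applies the boundaries to every alternative. A minimal sketch in base R (the names x, ungrouped and grouped are illustrative):

```r
# All month names and abbreviations joined with "|"
m <- paste(c(month.name, month.abb), collapse = "|")
x <- c("inJan", "Jan")

# Without a group, only the first and last alternatives get a boundary
ungrouped <- grepl(paste0("\\b", m, "\\b"), x)

# With a group, \\b applies to whichever alternative matches
grouped <- grepl(paste0("\\b(", m, ")\\b"), x)

print(ungrouped)
print(grouped)
```

The same grouping should carry over to the stringr call from the question, for example str_glue("\\b({m})\\b").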

Matching a single character within a string in R [duplicate]

This question already has answers here:
grepl for a period "." in R?
(2 answers)
Closed 3 years ago.
I'm trying to check whether there is a "." in a string in R, but grepl always comes back FALSE. Can anyone explain where I'm going wrong?
Here is my code:
grepl("testtxt",".")
[1] FALSE
grepl("test.txt",".")
[1] FALSE
We need either fixed = TRUE
grepl("test.txt", pattern = ".", fixed = TRUE)
#[1] TRUE
NOTE: pattern is the first argument of grep/grepl. If we specify the arguments in a different order, we need to name them.
Alternatively, escape the . as \\., because . is a regex metacharacter that matches any character.
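
The options can be compared side by side. This is a small sketch with illustrative variable names, not the only way to write it:

```r
x <- c("test.txt", "testtxt")

# Literal match: fixed = TRUE turns off regex interpretation
has_dot_fixed <- grepl(".", x, fixed = TRUE)

# Escaped dot: \\. matches a literal period
has_dot_escaped <- grepl("\\.", x)

# A character class also treats the dot literally
has_dot_class <- grepl("[.]", x)

# Unescaped dot: matches any character, so both strings match
any_char <- grepl(".", x)

print(has_dot_fixed)
print(has_dot_escaped)
print(has_dot_class)
print(any_char)
```

In the question's code the arguments are swapped: grepl("test.txt", ".") searches for the pattern "test.txt" inside the string ".", which is why it returns FALSE.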

Regex for "is every character in string numeric?" [duplicate]

This question already has answers here:
Regex to check whether a string contains only numbers [duplicate]
(21 answers)
Check if string contains ONLY NUMBERS or ONLY CHARACTERS (R)
(3 answers)
Closed 4 years ago.
I'm wondering what regex pattern I could use to detect if every character in a string is numeric.
So, here's what I'm thinking that isn't working:
stringr::str_detect("311apple", "[0-9]")
#> [1] TRUE
That statement is TRUE because there exists a numeric character in that string, but I'm trying to find a pattern so that it's only TRUE if every character is numeric.
Any ideas? Thanks!
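
One common approach is to anchor the pattern so that it has to cover the whole string: ^ for the start, [0-9]+ for one or more digits, and $ for the end. A minimal sketch in base R, assuming "numeric" means the ASCII digits 0-9 only (no sign, decimal point or whitespace); the test vector x is made up:

```r
x <- c("311apple", "311", "", "3.11")

# TRUE only if the entire string consists of one or more digits
all_digits <- grepl("^[0-9]+$", x)

print(all_digits)
```

The same pattern should also work with stringr::str_detect(x, "^[0-9]+$"). Use * instead of + if the empty string should count as all numeric.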

Extract Last Upper cases from a string [duplicate]

This question already has answers here:
Extract the last word between | |
(5 answers)
Closed 4 years ago.
I am practicing with regular expressions in R.
I would like to extract the last occurrence of two upper case letters.
I tried
>str_extract("kjhdjkaYY,","[:upper:][:upper:]")
[1] "YY"
And it works perfectly fine. But what if I would like to extract the last occurrence of such a pattern? Example:
function("kKKjhdjkaYY,")
[1] "YY"
Thank you for your help
We can use stri_extract_last_regex from the stringi package:
library(stringi)
stri_extract_last_regex("AAkjhdjkaYY,","[:upper:][:upper:]")
#[1] "YY"
Or, if you want to stick with stringr, we can extract all the groups that match the pattern and then get the last one using tail:
library(stringr)
tail(str_extract_all("AAkjhdjkaYY,","[:upper:][:upper:]")[[1]], 1)
#[1] "YY"
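
For completeness, the same idea can be sketched in base R without any packages, using gregexpr and regmatches. Note that base R needs the POSIX class inside a bracket expression, [[:upper:]], and the variable names here are illustrative.

```r
x <- "kKKjhdjkaYY,"

# Find all non-overlapping runs of exactly two upper-case letters
all_matches <- regmatches(x, gregexpr("[[:upper:]]{2}", x))[[1]]

# Keep only the last match (character(0) if there is none)
last_match <- tail(all_matches, 1)

print(all_matches)
print(last_match)
```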
