Matching a single character within a string in R [duplicate] - r

This question already has answers here:
grepl for a period "." in R?
(2 answers)
Closed 3 years ago.
Im assert if there is a "." in a string in R, but grepl always comes back false. Can anyone explain where im going wrong?
Here is my code:
grepl("testtxt",".")
[1] FALSE
grepl("test.txt",".")
[1] FALSE

We need either fixed = TRUE
grepl("test.txt", pattern = ".", fixed = TRUE)
#[1] TRUE
NOTE: pattern is the first argument of grep/grepl If we specify it in different order, make sure to name the parameter
or escape (\\.) the . as . is a metacharacter that matches any character

Related

Regex: extracting matches preceding a pattern in R [duplicate]

This question already has answers here:
Remove part of string after "."
(6 answers)
Extract string before "|" [duplicate]
(3 answers)
Closed 1 year ago.
I'm trying to extract matches preceding a pattern in R. Lets say that I have a vector consisting of the next elements:
my_vector
> [1] "ABCC12|94160" "ABCC13|150000" "ABCC1|4363" "ACTA1|58"
[5] "ADNP2|22850" "ADNP|23394" "ARID1B|57492" "ARID2|196528"
I'm looking for a regular expression to extract all characters preceding the "|". The expected result must be something like this:
my_new_vector
> [1] "ABCC12" "ABCC13" "ABCC1" "ACTA1"
and so on.
I have already tried using stringr functions and regular expressions based on look arounds, but I failed.
I really appreciate your advices and help to solve my issue.
Thanks in advance!
We could use trimws and specify the whitespace as a regex that matches the | (metacharacter - so escape \\ followed by one or more character (.*)
trimws(my_vector, whitespace = "\\|.*")

Exact match on word in middle of string in R [duplicate]

This question already has answers here:
Using regex in R to find strings as whole words (but not strings as part of words)
(2 answers)
Closed 1 year ago.
I referred this question (How to filter Exact match string using dplyr) but mine is slightly different as the word is not the start but can occur anywhere in the string. I want TRUE to be returned only for first one not the second & third
library(stringr)
vec <- c("this should be selected", "thisus should not be selected","not selected thisis too")
str_detect(vec,"this")
Current output
TRUE TRUE TRUE
Expected output
TRUE FALSE FALSE
Use a word boundary (\\b)
stringr::str_detect(vec,"\\bthis\\b")
#[1] TRUE FALSE FALSE
In base R :
grepl('\\bthis\\b', vec)

extract regular expressions from collapsed characters (including "|") [duplicate]

This question already has answers here:
Find the words in list of strings
(1 answer)
Regular expression pipe confusion
(5 answers)
Closed 3 years ago.
I want to detect (and then extract) month names from text using str_detect and str_extract.
For this, I create an object containing all month names and abbreviations.
m <- paste(c(month.name, month.abb), collapse = "|")
> m
[1] "January|February|March|April|May|June|July|August|September|October|November|December|Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
Then, I want to detect any of the entries occurring as a single word (surrounded by word boundaries):
stringr::str_detect(c("inJan", "Jan"), str_glue("\\b{m}\\b"))
This, however, returns TRUE TRUE (I expect FALSE TRUE, as the first is not a single word.
I suspect this is due to the collapsing of the list, as stringr::str_detect(c("inJan", "Jan"), str_glue("\\bJan\\b")) returns the expected FALSE TRUE.
I need to detect occurrences of m, however. What's the best way to go about this?

Regex for "is every character in string numeric?" [duplicate]

This question already has answers here:
Regex to check whether a string contains only numbers [duplicate]
(21 answers)
Check if string contains ONLY NUMBERS or ONLY CHARACTERS (R)
(3 answers)
Closed 4 years ago.
I'm wondering what regex pattern I could use to detect if every character in a string is numeric.
So, here's what I'm thinking that isn't working:
stringr::str_detect("311apple", "[0-9]")
#> [1] TRUE
That statement is TRUE because there exists a numeric character in that string, but I'm trying to find a pattern so that it's only TRUE if every character is numeric.
Any ideas? Thanks!

Remove characters in string before specific symbol(including it) [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Use gsub remove all string before first white space in R
(4 answers)
Closed 5 years ago.
at the beginning, yes - simillar questions are present here, however the solution doesn't work as it should - at least for me.
I'd like to remove all characters, letters and numbers with any combination before first semicolon, and also remove it too.
So we have some strings:
x <- "1;ABC;GEF2"
y <- "X;EER;3DR"
Let's do so gsub() with . and * which means any symbol with occurance 0 or more:
gsub(".*;", "", x)
gsub(".*;", "", y)
And as a result i get:
[1] "GEF2"
[1] "3DR"
But I'd like to have:
[1] "ABC;GEF2"
[1] "EER;3DR"
Why did it 'catch' second occurence of semicolon instead of first?
You could use
gsub("[^;]*;(.*)", "\\1", x)
# [1] "ABC;GEF2"

Resources