Delete initial (matching) pattern from a string [duplicate] - r

This question already has answers here:
Regular expression to match characters at beginning of line only
(8 answers)
Closed 2 years ago.
I have the following vector of character strings:
v<-c("RT #name1: hello world", "Hi guys, how are you?", "Hello RT I have no text", "RT #name2: Hello!")
I would like to delete only those RT that are positioned at the beginning of strings and store the results in another vector, e.g., w:
> w
"#name1: hello world" "Hi guys, how are you?" "Hello RT I have no text" "#name2: Hello!"
Maybe I could use function str_extract_all from the package stringr, but I can't apply it to my problem.

Use gsub and the 'anchor' ^, which signifies the beginning of a string:
w <- gsub("^RT\\s", "", v)

w <- stringr::str_replace(v, "^RT\\s", "")
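As a minimal base R sketch of the anchored approach above: because ^ can match only once per string, sub() is sufficient here, and \\s+ (an assumption beyond the original answers) also tolerates more than one space after RT.

```r
# Input vector from the question
v <- c("RT #name1: hello world", "Hi guys, how are you?",
       "Hello RT I have no text", "RT #name2: Hello!")

# "^" anchors the match to the start of each string, so the "RT"
# in the middle of the third element is left untouched.
w <- sub("^RT\\s+", "", v)
print(w)
```

Strings that do not start with RT are returned unchanged, so no separate filtering step is needed.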

Related

R: Regex for Phone Numbers [duplicate]

This question already has an answer here:
How to use regex character class extensions in R?
(1 answer)
Closed 3 months ago.
I am working with the R programming language.
I have a column of data that looks something like this:
string = c("a1 123-456-7899 hh", "b 124-123-9999 b3")
I would like to remove the "phone numbers" so that the final result looks like this:
[1] "a1 hh" "b b3"
I tried to apply the answer provided here Regular expression to match standard 10 digit phone number to my question:
gsub("^(\+\d{1,2}\s)?\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}$", "", string, fixed = TRUE)
But I get the following error: Error: '\+' is an unrecognized escape in character string starting ""^(\+"
Can someone please show me how to fix this?
Thanks!
Try:
library(stringr)
s <- c("a1 123-456-7899 hh", "b 124-123-9999 b3")
result <- str_replace(s, "\\d+[-]\\d+[-]\\d+\\s", "")
print(result)
OUTPUT:
[1] "a1 hh" "b b3"
This will look for:
\\d+ : one or more digits, followed by
[-] : a hyphen, followed by
\\d+ : one or more digits, followed by
[-] : a hyphen, followed by
\\d+ : one or more digits, followed by
\\s : a space
and replace the match with "" (nothing).
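The same idea can be sketched in base R. The original attempt failed for three separate reasons: backslashes must be doubled inside an R string literal, fixed = TRUE turns off regex matching altogether, and the ^ ... $ anchors only match when the phone number is the entire string. The stricter 3-3-4 digit pattern below is an assumption about the data, not part of the answer above.

```r
string <- c("a1 123-456-7899 hh", "b 124-123-9999 b3")

# Doubled backslashes, no fixed = TRUE, no ^/$ anchors.
# \\d{3}-\\d{3}-\\d{4} matches a 3-3-4 digit group; \\s* also
# consumes the whitespace that follows it.
cleaned <- gsub("\\d{3}-\\d{3}-\\d{4}\\s*", "", string)
print(cleaned)
```

Unlike str_replace(), which replaces only the first match in each string, gsub() would also remove a second phone number in the same element.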

Extract substring until "?" with sub() [duplicate]

This question already has answers here:
How do I deal with special characters like \^$.?*|+()[{ in my regex?
(2 answers)
Closed last year.
So, I want to extract the substring of a string like this
mystr <- "aa/bb/cc?rest"
I found the sub() function but executing sub("?.*", "", mystr) returns "" instead of "aa/bb/cc".
Why?
The reason is obviously that ? is a special character, but using backticks or "\?" doesn't solve the problem.
You need a double backslash to escape it, because R processes the string literal first and only then passes the result to the regex engine:
> mystr <- "aa/bb/cc?rest"
> sub("\?.*", "", mystr)
Error: '\?' is an unrecognized escape in character string starting ""\?"
> sub("\\?.*", "", mystr)
[1] "aa/bb/cc"
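For comparison, here is a short sketch of three equivalent ways to make ? literal. The character-class and raw-string variants are additions to the answer above; raw strings require R 4.0.0 or later.

```r
mystr <- "aa/bb/cc?rest"

# 1. Double backslash: the string literal "\\?" reaches the regex engine as \?
res_escape <- sub("\\?.*", "", mystr)

# 2. Character class: inside [...] the question mark has no special meaning
res_class <- sub("[?].*", "", mystr)

# 3. Raw string (R >= 4.0.0): backslashes are taken literally, so one is enough
res_raw <- sub(r"(\?.*)", "", mystr)

print(c(res_escape, res_class, res_raw))
```

All three return "aa/bb/cc"; which one to use is mostly a matter of readability.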

Unexpected regex behavior when executed in base R [duplicate]

This question already has answers here:
Extract a regular expression match
(12 answers)
Closed 1 year ago.
As per this link, I wrote a regex that does not give the expected result when executed for a specific string in R:
string <- "0,9% BB"
regex <- "^ ?\\d+[\\d ,\\.]*[B-DF-HJ-NP-TV-Z\\/]*%?"
grep(regex, string, value = T, perl = T)
The result output is
[1] "0,9% BB"
instead of the desired (and output by the link)
[1] "0,9%"
What am I missing to get the desired output? Preferably base R, please.
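This returns "0,9%" using only base R (grep() returns the whole element that contains a match, whereas regmatches() returns only the matched part):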
string <- "0,9% BB"
regex <- "^ ?\\d+[\\d ,\\.]*[B-DF-HJ-NP-TV-Z\\/]*%?"
regmatches(x = string, m = regexpr(regex,string,perl = TRUE))
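To see the two behaviours side by side, here is a minimal sketch using the question's own string and pattern; the variable names whole and part are only illustrative.

```r
string <- "0,9% BB"
regex <- "^ ?\\d+[\\d ,\\.]*[B-DF-HJ-NP-TV-Z\\/]*%?"

# grep(value = TRUE) selects elements: it returns each matching
# element in full, not the part that matched.
whole <- grep(regex, string, value = TRUE, perl = TRUE)

# regexpr() records where the match starts and how long it is;
# regmatches() uses that to cut out just the matched substring.
part <- regmatches(string, regexpr(regex, string, perl = TRUE))

print(whole)
print(part)
```

The regex itself was never the problem; the question's code asked which elements match rather than what was matched.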

Extract Text Starting and Ending with Punctuations in R [duplicate]

This question already has answers here:
Extracting a string between other two strings in R
(4 answers)
Closed 3 years ago.
I want to extract a group of strings between two punctuations using RStudio.
I tried to use str_extract command, but whenever I tried to use anchors (^ for starting char, and $ for ending char), it failed.
Here is the sample problem:
> text <- "Name : Dr. CHARLES DOWNING MAP ; POB : London; Age/DOB : 53 years / August 05, 1958;"
Here is the sample code I used:
> str_extract(text,"(Name : )(.+)?( ;)")
> str_match(str_extract(text,"(Name : )(.+)?( ;)"),"(Name : )(.+)?( ;)")[3]
But it seemed too verbose, and not flexible.
I only want to extract "Dr. CHARLES DOWNING MAP".
Can anyone help with my problem?
Can I tell the regex to start with any non-white-space character after "Name : " and end before " ; POB"?
This works once the space after "Name :" is included in the pattern, so that it is not captured:
> gsub(".*Name : (.*) ;.*", "\\1", text)
[1] "Dr. CHARLES DOWNING MAP"
With str_match
stringr::str_match(text, "^Name : (.*) ;")[, 2]
#[1] "Dr. CHARLES DOWNING MAP"
[, 2] is to get the contents from the capture group.
There is also qdapRegex::ex_between to extract string between left and right markers
qdapRegex::ex_between(text, "Name : ", ";")[[1]]
#[1] "Dr. CHARLES DOWNING MAP"
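If adding a package is not an option, the same extraction can be sketched in base R with perl = TRUE, which enables lazy quantifiers and lookarounds. Both variants below are illustrations rather than part of the answers above.

```r
text <- "Name : Dr. CHARLES DOWNING MAP ; POB : London; Age/DOB : 53 years / August 05, 1958;"

# Variant 1: capture lazily up to the first " ;" and replace the
# whole string with the captured group.
via_sub <- sub("^Name : (.*?) ;.*$", "\\1", text, perl = TRUE)

# Variant 2: lookbehind/lookahead match only the text between the
# markers, so no capture group is needed.
via_lookaround <- regmatches(
  text,
  regexpr("(?<=Name : ).*?(?= ;)", text, perl = TRUE)
)

print(via_sub)
print(via_lookaround)
```

The lazy .*? stops at the first " ;", which keeps the match correct even if a later field also ends in " ;".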

R: compare and subset two strings [duplicate]

This question already has answers here:
Replace specific characters within strings
(7 answers)
Closed 7 years ago.
Is there a function in R that meets this requirement:
if string1 exists in string2, then remove string1 from string2
I spent a day searching for such a function, so any help would be appreciated.
Edit:
I have a dataframe. Here's a part of it:
mark      name                                 ChekMark
Caudalie  Caudalie Eau démaquillante 200ml     TRUE
Mustela   Mustela Bébé lait hydra corps 300ml  TRUE
Lierac    Lierac Phytolastil gel prévention    TRUE
I want to create a new dataframe in which the mark does not appear in the product name.
That's my final goal.
You can use gsub and work with regular expressions:
gsub(" this part ", " ", "A Text where this part should be removed")
# [1] "A Text where should be removed"
gsub(" this part ", " ", "A Text where this 1 part should be removed")
# [1] "A Text where this 1 part should be removed"
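Applied to the dataframe from the question, the pattern differs for every row, so gsub() alone is not enough. One possible sketch pairs each mark with its name via mapply(); the column name product is hypothetical, and fixed = TRUE is an assumption that the marks should be treated as literal text rather than as regular expressions.

```r
df <- data.frame(
  mark = c("Caudalie", "Mustela", "Lierac"),
  name = c("Caudalie Eau démaquillante 200ml",
           "Mustela Bébé lait hydra corps 300ml",
           "Lierac Phytolastil gel prévention"),
  stringsAsFactors = FALSE
)

# Remove each row's own mark from its name (first occurrence only),
# then trim the whitespace left behind.
df$product <- trimws(mapply(
  function(m, n) sub(m, "", n, fixed = TRUE),
  df$mark, df$name,
  USE.NAMES = FALSE
))

print(df$product)
```

Using sub() removes only the first occurrence of the mark; switching to gsub() would remove every occurrence if a mark can repeat within a name.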
Are you looking for string2.replace(string1, '')? (Note that this is Python, not R.)
Or you could define a helper:
>>> R = lambda string1, string2: string2.replace(string1, '')
>>> R('abc', 'AAAabcBBB')
'AAABBB'
