Using Gsub in R to remove a string containing brackets [duplicate] - r

This question already has answers here:
How do I deal with special characters like \^$.?*|+()[{ in my regex?
(2 answers)
Closed 6 years ago.
I'm trying to use gsub to remove certain parts of a string. However, I can't get it to work, and I think it's because the string to be removed contains brackets. Is there any way around this? Thanks for any help.
The command I want to use:
gsub('(4:4aCO)_','', '(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)')
Returns:
#"(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)"
Expected output:
#"(5:3)_(4:4)_(5:3)_(4:4)_(6:2)_(4:4a)"
A quick test to see if brackets were the problem:
gsub('te','', 'test')
#[1] "st"
gsub('(te)','', '(te)st')
#[1] "()st"

This works if each parenthesis is placed inside square brackets, because ( and ) are regex metacharacters:
gsub('[(]4:4aCO[)]_','', '(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)')
Or use fixed = TRUE so that the pattern is matched as a literal string:
gsub('(4:4aCO)_','', '(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)', fixed = TRUE)
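As a minimal sketch, the options for matching literal parentheses can be compared side by side on the string from the question. The variable names (x, by_class, by_escape, by_fixed) are only illustrative, and the backslash-escaped form is an extra variant that the answer itself does not show.

```r
# Input string taken from the question above.
x <- "(5:3)_(4:4)_(5:3)_(4:4)_(4:4aCO)_(6:2)_(4:4a)"

# 1. Put each parenthesis in a character class so it is matched literally.
by_class <- gsub("[(]4:4aCO[)]_", "", x)

# 2. Escape each parenthesis; the backslash is doubled inside an R string.
by_escape <- gsub("\\(4:4aCO\\)_", "", x)

# 3. Switch off regex interpretation and match the pattern as plain text.
by_fixed <- gsub("(4:4aCO)_", "", x, fixed = TRUE)

print(by_class)
# [1] "(5:3)_(4:4)_(5:3)_(4:4)_(6:2)_(4:4a)"
```

All three calls should return the same string. fixed = TRUE is usually the simplest choice when the pattern contains no regex features at all.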

Related

How do I gsub ' with empty? [duplicate]

This question already has answers here:
How to remove single quote from a string in R?
(3 answers)
How do I deal with special characters like \^$.?*|+()[{ in my regex?
(2 answers)
Closed 11 months ago.
For example, given a string like this
NANYANG-GIRLS'-HIGH-SCHOOL
how do I use gsub to replace ' with an empty string, so that it becomes
NANYANG-GIRLS-HIGH-SCHOOL
When I try it in R, I get an error.
You can use either of the following two approaches:
sec_name <- gsub('\'', '', sec_name, fixed=TRUE)
sec_name <- gsub("'", "", sec_name, fixed=TRUE)
The first approach is a corrected version of what you were doing. It uses single quotes for the strings and escapes the inner single quote so that it is read as a literal single quote. The second approach avoids the escape by delimiting the strings with double quotes.
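A small self-contained check of both approaches, using the string from the question. The names escaped and double_quoted are illustrative only.

```r
sec_name <- "NANYANG-GIRLS'-HIGH-SCHOOL"

# Single-quoted pattern: the inner quote has to be escaped.
escaped <- gsub('\'', '', sec_name, fixed = TRUE)

# Double-quoted pattern: the single quote needs no escape.
double_quoted <- gsub("'", "", sec_name, fixed = TRUE)

print(escaped)
# [1] "NANYANG-GIRLS-HIGH-SCHOOL"
```

The two patterns are the same one-character string, so the results are identical. The only difference is how the quote is written in the source code.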

How to replace a caret with str_replace from stringr [duplicate]

This question already has answers here:
How do I deal with special characters like \^$.?*|+()[{ in my regex?
(2 answers)
Closed 3 years ago.
I need help removing a specific character from some strings.
I found that str_replace_all does the job. However, it works on other
characters like "/n", "/t" or "A", "B", "C", but not on "^". I want to remove
this sign, but I get an error message:
(Error in stri_replace_all_regex(string, pattern, fix_replacement(replacement), : Missing closing bracket on a bracket expression.(U_REGEX_MISSING_CLOSE_BRACKET))
Thanks for your help!
code=c("^GSPC","^FTSE","000001.SS","^HSI","^FCHI","^KS11","^TWII","^GDAXI","^STI")
str_replace_all(code, "([^])", "")
One option is to wrap the pattern in fixed(), so that ^ is matched as a literal character:
library(stringr)
str_replace_all(code, fixed("^"), "")
#[1] "GSPC" "FTSE" "000001.SS" "HSI" "FCHI" "KS11" "TWII" "GDAXI" "STI"
Also, as we are replacing with blank (""), another option is str_remove, which removes the first match in each string (str_remove_all removes every match):
str_remove(code, fixed("^"))
As for why the OP's code didn't work: inside square brackets, a leading ^ is not read as a literal character. It is a metacharacter that negates the class, so the class matches any character other than the ones listed. In ([^]) nothing is listed after it, so the bracket expression is never closed.
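For readers without stringr, a base R sketch of the same idea follows. It is not part of the original answer: it swaps in gsub for str_replace_all, uses a shortened version of the vector from the question, and the result names are illustrative.

```r
code <- c("^GSPC", "^FTSE", "000001.SS", "^HSI")

# Treat the caret as plain text.
no_caret_fixed <- gsub("^", "", code, fixed = TRUE)

# Or escape it so that the regex engine matches a literal caret.
no_caret_escaped <- gsub("\\^", "", code)

print(no_caret_fixed)
# [1] "GSPC"      "FTSE"      "000001.SS" "HSI"
```

Elements without a caret, such as "000001.SS", pass through unchanged.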

R regex using stringr::str_detect and grepl don't seem to be matching "\\+" when it is surrounded by "\\b" [duplicate]

This question already has answers here:
Why does is this end of line (\\b) not recognised as word boundary in stringr/ICU and Perl
(2 answers)
Closed 3 years ago.
I'm pretty new to regex and am trying to detect a word with the "+" symbol when surrounded by "\\b" in long strings of words but both stringr and grepl are giving me the wrong result.
This is the code that I have written:
library(stringr)
str_detect("coversyl +", "\\bcoversyl(plus| plus|\\+| \\+)\\b")
The output is FALSE which is wrong.
What would be the right way to do it?
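My guess is that your expression is nearly right. The trailing \\b cannot match after a final +, because + is not a word character and a word boundary needs a word character on one side. Dropping that boundary and requiring a space gives: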
\\bcoversyl\\b\\s(\\bplus\\b|\\+)
If more than one space may separate the two parts, change \\s to \\s+:
\\bcoversyl\\b\\s+(\\bplus\\b|\\+)
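The behaviour can be checked with a short sketch. It assumes grepl with perl = TRUE, so that \\b follows PCRE word-boundary rules, and the variable names and extra test strings are illustrative rather than taken from the question.

```r
x <- "coversyl +"

# Original pattern: the trailing \\b fails because "+" is followed by the
# end of the string, and neither side is a word character.
original <- grepl("\\bcoversyl(plus| plus|\\+| \\+)\\b", x, perl = TRUE)

# Same pattern without the trailing word boundary.
no_trailing <- grepl("\\bcoversyl(plus| plus|\\+| \\+)", x, perl = TRUE)

# Suggested pattern, tried on a few variants.
variants <- c("coversyl +", "coversyl  plus", "coversylplus")
suggested <- grepl("\\bcoversyl\\b\\s+(\\bplus\\b|\\+)", variants, perl = TRUE)

print(original)     # FALSE
print(no_trailing)  # TRUE
print(suggested)    # TRUE TRUE FALSE
```

Note that the suggested pattern requires whitespace after "coversyl", so it no longer matches "coversylplus". Keep the alternation from the question if that form also needs to be detected.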

How to replace "|" in R [duplicate]

This question already has answers here:
What special characters must be escaped in regular expressions?
(13 answers)
Closed 4 years ago.
I have a dataframe which contains
"HYD_SOA_UNBLOCK~SOA_BLOCK-UK|SOA_BLOCK-DE||SOA_BLOCK-FR||SOA_BLOCK-IT||SOA_BLOCK-ES|"
I want the result to be -
"HYD_SOA_UNBLOCK~SOA_BLOCK-UK|SOA_BLOCK-DE|SOA_BLOCK-FR|SOA_BLOCK-IT|SOA_BLOCK-ES|"
I tried:
leadtemp$collate = gsub("||","|",leadtemp$collate)
but it is not working.
Please help me replace "||" with "|"
As MrFlick suggested, include fixed = TRUE in your gsub statement. The problem occurs because "|" is the alternation operator in regular expressions. Using fixed = TRUE tells gsub to treat the pattern as a literal string rather than a regular expression.
leadtemp$collate = gsub("||","|",leadtemp$collate, fixed=TRUE)
Another (although more complicated) way of doing it would be to escape all the |s:
leadtemp$collate = gsub("\\|\\|","\\|",leadtemp$collate)
Try:
gsub("[|]{2}", "|", leadtemp$collate)
I have defined a character class containing only the pipe character and made gsub look for exactly two consecutive occurrences. The result is:
"HYD_SOA_UNBLOCK~SOA_BLOCK-UK|SOA_BLOCK-DE|SOA_BLOCK-FR|SOA_BLOCK-IT|SOA_BLOCK-ES|"
| is a metacharacter. Metacharacters need to be escaped with a \. Inside an R string, \ is itself an escape character, so it has to be doubled. So whenever you want to match a literal | in a regex pattern, you have to write \\|. This should make your code work:
leadtemp$collate = gsub("\\|\\|","\\|",leadtemp$collate)
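The three suggestions above can be compared on a shortened version of the string from the question. This is only a sketch: collate here is a plain character vector rather than a data frame column, and the result names are illustrative.

```r
collate <- "SOA_BLOCK-UK|SOA_BLOCK-DE||SOA_BLOCK-FR||"

# Literal matching: no escaping needed.
with_fixed <- gsub("||", "|", collate, fixed = TRUE)

# Regex matching with both pipes escaped in the pattern.
with_escape <- gsub("\\|\\|", "|", collate)

# Regex matching with a character class and a repetition count.
with_class <- gsub("[|]{2}", "|", collate)

print(with_fixed)
# [1] "SOA_BLOCK-UK|SOA_BLOCK-DE|SOA_BLOCK-FR|"
```

Only the pattern is interpreted as a regular expression, so a plain "|" is enough in the replacement argument.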

How to remove beginning-digits only in R [duplicate]

This question already has answers here:
Remove numbers at the beginning and end of a string
(3 answers)
Remove string from a vector in R
(4 answers)
Closed 5 years ago.
I have some strings with digits and alpha characters in them. Some of the digits are important, but the ones at the beginning of the string (and only these) are unimportant. This is due to a peculiarity in how email addresses are stored. So the best example is:
x<-'12345johndoe23#gmail.com'
Should be transformed to johndoe23#gmail.com
Unfortunately there are no spaces. I have tried gsub('[[:digit:]]+', '', x), but this removes all the digits, not just the ones at the beginning.
Edit: I have found some solutions in other languages: Python: Remove numbers at the beginning of a string
As per my comment, anchor the pattern to the start of the string:
^[[:digit:]]+
The ^ asserts position at the start of the string, so only the leading run of digits can match.
You can do this:
x<-'12345johndoe23#gmail.com'
gsub('^[[:digit:]]+', '', x) #added ^ as begin of string
Another option uses sub with the \\d shorthand:
sub('^\\d+','',x)
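A brief sketch contrasting the unanchored attempt from the question with the anchored patterns from the answers. The result names are illustrative.

```r
x <- "12345johndoe23#gmail.com"

# Unanchored: every run of digits is removed, including the wanted "23".
all_digits_removed <- gsub("[[:digit:]]+", "", x)

# Anchored with ^: only the leading digits are removed.
leading_removed_posix <- sub("^[[:digit:]]+", "", x)
leading_removed_short <- sub("^\\d+", "", x)

print(all_digits_removed)     # "johndoe#gmail.com"
print(leading_removed_posix)  # "johndoe23#gmail.com"
```

Because ^ can match only once per string, sub and gsub give the same result for the anchored patterns.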
