Remove parenthesis from a character string - r

I am trying to remove a parenthesis from a string in R and run into the following error:
string <- "log(M)"
gsub("log", "", string) # Works just fine
gsub("log(", "", string) #breaks
# Error in gsub("log(", "", test) :
# invalid regular expression 'log(', reason 'Missing ')''

Escape the parenthesis with a double-backslash:
gsub("log\\(", "", string)
(Obligatory: http://xkcd.com/234/)
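As a minimal sketch of why two backslashes are needed (the variable names are only illustrative): R consumes one backslash when it parses the string literal, so "\\(" reaches the regex engine as the two characters \( and is treated as a literal parenthesis.

```r
string <- "log(M)"

# The literal "\\(" holds two characters: a backslash and "(".
escaped_len <- nchar("\\(")                    # 2

# Remove the literal text "log(".
no_log <- gsub("log\\(", "", string)           # "M)"

# Remove "log(" and the closing ")" in one call via alternation.
only_arg <- gsub("log\\(|\\)", "", string)     # "M"
```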

Ben's answer gives you a good, generally applicable way of doing this.
Alternatively, in your situation you could use the fixed=TRUE argument, like this:
gsub("log(", "", string, fixed=TRUE)
# [1] "M)"
This is appropriate whenever the pattern argument to gsub() is the literal sequence of characters you are searching for. It is convenient because you can type the exact text you want to match, without escaping anything.
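A small sketch of the fixed = TRUE behaviour, assuming you want to strip both the "log(" prefix and the closing parenthesis; the intermediate variable names are made up for illustration.

```r
string <- "log(M)"

# With fixed = TRUE every character in the pattern is matched literally.
step1 <- gsub("log(", "", string, fixed = TRUE)     # "M)"
step2 <- gsub(")", "", step1, fixed = TRUE)         # "M"

# Regex features are switched off, so "|" no longer means "or".
# The literal text "(|)" does not occur in the input, so nothing changes.
unchanged <- gsub("(|)", "", string, fixed = TRUE)  # "log(M)"
```

The trade-off is that one fixed pattern can only describe one literal substring, so removing several different substrings takes several calls.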

If you are not a regex specialist (many of us are not!), I find it more straightforward to separate the removal of the unneeded text from the removal of the parens, provided your query supports that.
The question seems to indicate only wanting to remove parens, so you could use:
gsub(paste(c("[(]", "[)]"), collapse = "|"), "", string)
This returns the string without parens: "logM"
If you also want to remove the "M":
gsub(paste(c("M", "[(]", "[)]"), collapse = "|"), "", string)
This returns "log"
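To make the paste() trick less opaque, here is a short sketch that prints the pattern it builds and compares it with a single bracket expression. The variable names are illustrative only.

```r
string <- "log(M)"

# paste() joins the pieces with "|", the regex "or" operator.
pattern <- paste(c("[(]", "[)]"), collapse = "|")   # "[(]|[)]"

# A single bracket expression listing both characters matches the same set.
res_alternation <- gsub(pattern, "", string)        # "logM"
res_class <- gsub("[()]", "", string)               # "logM"
```

Inside square brackets the parentheses lose their special meaning, which is why neither form needs backslashes.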

Related

R/ Regex: Remove an immediate character in front of a pattern along with the pattern

I have this string:
cd/etc/init[BKSP][BKSP]it.d[ENTER]
I want the end result to be like this :
cd/etc/init.d[ENTER]
It should remove every [BKSP] substring along with the character immediately in front of it.
I have this sub call:
sub("(.?\\[BKSP\\]+)+", "", string, perl = TRUE)
But I get cd/etc/iniit.d[ENTER] instead.
Any help would be greatly appreciated! Thanks!
You may use
gsub("(?s).(?R)?\\[BKSP]", "", string, perl=TRUE)
See the regex demo
Details
(?s) - turns on the DOTALL modifier
. - matches any char
(?R)? - matches 1 or 0 occurrences of the whole pattern (recurses the whole pattern)
\\[BKSP] - a literal substring [BKSP].
R demo:
string <- c("cd/etc/init[BKSP][BKSP]it.d[ENTER]", "abcd[BKSP]e")
gsub("(?s).(?R)?\\[BKSP]", "", string, perl=TRUE)
## => [1] "cd/etc/init.d[ENTER]" "abce"
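If the recursive pattern feels too dense, the same effect can be sketched with a plain loop. The helper name apply_backspaces is hypothetical, and the sketch assumes every [BKSP] has an ordinary character somewhere before it to delete.

```r
# Hypothetical helper: simulate backspaces by deleting one
# "<character>[BKSP]" pair at a time until none are left.
apply_backspaces <- function(x) {
  pattern <- ".\\[BKSP\\]"          # any one character followed by a literal [BKSP]
  while (any(grepl(pattern, x))) {
    x <- sub(pattern, "", x)        # sub() removes only the leftmost pair per string
  }
  x
}

string <- c("cd/etc/init[BKSP][BKSP]it.d[ENTER]", "abcd[BKSP]e")
result <- apply_backspaces(string)
# result: "cd/etc/init.d[ENTER]" "abce"
```

Each pass shortens at least one string, so the loop terminates. It trades the single regex call for readability.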
You could use
test <- "cd/etc/init[BKSP][BKSP]it.d[ENTER]"
pattern <- "\\[BKSP\\]\\w*"
gsub(pattern, "", test)
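Which yields the following (note that this pattern deletes the word characters after each [BKSP], not the character before it, so it matches the expected output only for inputs like this one):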
[1] "cd/etc/init.d[ENTER]"

Extracting substring using R

I want to extract substring (description details) from the following strings:
string1 <- "#{self=https://somesite.atlassian.net/rest/api/2/status/1; description=The issue is open and ready for the assignee to start work on it.; iconUrl=https://somesite.atlassian.net/images/icons/statuses/open.png; name=Open; id=1; statusCategory=}"
string2 <- "#{self=https://somesite.atlassian.net/rest/api/2/status/10203; description=; iconUrl=https://somesite.atlassian.net/images/icons/statuses/generic.png; name=Full Curation; id=10203; statusCategory=}"
I am trying to get the following
ExtractedSubString1 = "The issue is open and ready for the assignee to start work on it."
ExtractedSubString2 = ""
I tried this:
library(stringr)
ExtractedSubString1 <- substr(string1, str_locate(string1, "description=")+12, str_locate(string1, "; iconUrl")-1)
ExtractedSubString2 <- substr(string2, str_locate(string2, "description=")+12, str_locate(string2, "; iconUrl")-1)
Looking for a better way to accomplish this.
Using only base R's sub and back referencing, you could do
sub(".*description=(.*?);.*", "\\1", c(string1, string2))
[1] "The issue is open and ready for the assignee to start work on it." ""
The ".*" matches any sequence of characters, "description=" is a literal match, and ".*?" also matches any sequence of characters, but the ? forces a lazy match rather than a greedy one. ";" is a literal, and the "()" capture the sub-expression that is lazily matched. The back reference "\\1" returns the sub-expression captured in the parentheses.
Using the base R functions regexec and regmatches gets a bit closer to the method in the OP. sapply with "[" is then used to extract the desired result.
sapply(regmatches(c(string1, string2),
regexec(".*description=(.*?);.*", c(string1, string2))),
"[", 2)
[1] "The issue is open and ready for the assignee to start work on it." ""
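The sub-with-back-reference approach above can be wrapped in a small function that pulls out any field. This is only a sketch: extract_field is a made-up name, the key is assumed to contain no regex metacharacters, and "[^;]*" is swapped in for the lazy ".*?" because it expresses "everything up to the next semicolon" without relying on lazy matching.

```r
# Hypothetical helper; `key` is assumed to contain no regex metacharacters.
extract_field <- function(x, key) {
  pattern <- paste0(".*", key, "=([^;]*);.*")
  sub(pattern, "\\1", x)
}

# Shortened stand-in for the strings in the question.
status <- "#{self=https://somesite.atlassian.net/rest/api/2/status/1; description=The issue is open.; name=Open; id=1; statusCategory=}"

desc <- extract_field(status, "description")   # "The issue is open."
name <- extract_field(status, "name")          # "Open"
```

When the key is absent, or the field is not followed by a semicolon, sub() finds no match and returns the input unchanged, so callers may want to check for that case.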
You could try:
test.1 <- gsub("description=", "", strsplit(string1, "; ")[[1]][2])
test.2 <- gsub("description=", "", strsplit(string2, "; ")[[1]][2])
This simply splits each string on "; ", which divides it into 6 elements. The square brackets select the 2nd element, and gsub replaces "description=" with nothing to remove it.
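Taking the splitting idea one step further, a rough sketch that turns the whole record into a named character vector, so fields are looked up by name instead of by position. It assumes no value contains "; " and that keys contain no "="; the sample string is a shortened stand-in, not the original data.

```r
# Shortened stand-in for the strings in the question.
s <- "#{self=https://somesite.atlassian.net/rest/api/2/status/1; description=; name=Open; id=1; statusCategory=}"

# Drop the leading "#{" and the trailing "}".
body <- gsub("^#\\{|\\}$", "", s)

# Split into "key=value" pieces on the literal separator "; ".
parts <- strsplit(body, "; ", fixed = TRUE)[[1]]

# Everything before the first "=" is the key, everything after it the value.
fields <- setNames(sub("^[^=]*=", "", parts), sub("=.*$", "", parts))

description <- fields[["description"]]   # ""
status_name <- fields[["name"]]          # "Open"
```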

Gsub transforming numbers

I ran into this problem.
I scraped some data from the web and, for instance, I obtained this:
"3.444.654" (As character)
If I use gsub("3.444.654", ".", "") in order to get 3444654...
R gives me
[1] ""
What can I do to get the integer?
> gsub(".", "", "3.444.654", fixed = TRUE)
[1] "3444654"
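Read the documentation for gsub for the argument order: it is gsub(pattern, replacement, x), and without fixed = TRUE the "." in a pattern is a regex metacharacter that matches any character. To then turn the string into a number, use as.numeric, as.integer etc.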
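Putting the pieces together, a short sketch that shows the unescaped dot, the two working alternatives, and the final conversion. The variable names are illustrative.

```r
x <- "3.444.654"

as_regex <- gsub(".", "", x)                 # "" because "." matches every character
escaped  <- gsub("\\.", "", x)               # "3444654"
literal  <- gsub(".", "", x, fixed = TRUE)   # "3444654"

n <- as.integer(literal)                     # 3444654L
```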

str_replace (package stringr) cannot replace brackets in r?

I have a string, say
fruit <- "()goodapple"
I want to remove the brackets in the string. I decided to use the stringr package because it can usually handle this kind of issue. I use:
str_replace(fruit,"()","")
But nothing is replaced, and the following is returned:
[1] "()good"
If I only want to replace the right half bracket, it works:
str_replace(fruit,")","")
[1] "(good"
However, the left half bracket does not work:
str_replace(fruit,"(","")
and the following error is shown:
Error in sub("(", "", "()good", fixed = FALSE, ignore.case = FALSE, perl = FALSE) :
invalid regular expression '(', reason 'Missing ')''
Does anyone have an idea why this happens? How can I remove the "()" in the string, then?
Escaping the parentheses does it...
str_replace(fruit,"\\(\\)","")
# [1] "goodapple"
You may also want to consider exploring the "stringi" package, which has a similar approach to "stringr" but has more flexible functions. For instance, there is stri_replace_all_fixed, which would be useful here since your search string is a fixed pattern, not a regex pattern:
library(stringi)
stri_replace_all_fixed(fruit, "()", "")
# [1] "goodapple"
Of course, basic gsub handles this just fine too:
gsub("()", "", fruit, fixed=TRUE)
# [1] "goodapple"
The accepted answer works for your exact problem, but not for the more general problem:
my_fruits <- c("()goodapple", "(bad)apple", "(funnyapple")
str_replace(my_fruits,"\\(\\)","")
## "goodapple" "(bad)apple" "(funnyapple"
This is because the regex exactly matches a "(" followed by a ")".
Assuming you care only about bracket pairs, this is a stronger solution:
str_replace(my_fruits, "\\([^()]{0,}\\)", "")
## "goodapple" "apple" "(funnyapple"
Building off of MJH's answer, this removes all ( or ):
my_fruits <- c("()goodapple", "(bad)apple", "(funnyapple")
str_replace_all(my_fruits, "[()]", "")
[1] "goodapple" "badapple" "funnyapple"
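For comparison, the same two behaviours can be sketched in base R without loading stringr. gsub() replaces all matches, so it corresponds to str_replace_all rather than str_replace; the result variable names are illustrative.

```r
my_fruits <- c("()goodapple", "(bad)apple", "(funnyapple")

# Remove only complete pairs, together with whatever they enclose.
pairs_removed <- gsub("\\([^()]*\\)", "", my_fruits)
# "goodapple" "apple" "(funnyapple"

# Remove every "(" or ")" wherever it appears.
all_removed <- gsub("[()]", "", my_fruits)
# "goodapple" "badapple" "funnyapple"
```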
