How to get rid of brackets () using gsub [duplicate] - r

I am trying to remove a parenthesis from a string in R and run into the following error:
string <- "log(M)"
gsub("log", "", string) # Works just fine
gsub("log(", "", string) #breaks
# Error in gsub("log(", "", test) :
# invalid regular expression 'log(', reason 'Missing ')''

Escape the parenthesis with a double-backslash:
gsub("log\\(", "", string)
(Obligatory: http://xkcd.com/234/)
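A minimal sketch of why two backslashes are needed, assuming the same `string` as in the question: R's string parser consumes one level of escaping before the regex engine ever sees the pattern. The variable names `pattern`, `n_chars` and `result` are only for illustration.

```r
string <- "log(M)"

# In R source code, "\\" is ONE backslash character, so this pattern
# holds the five characters  l o g \ (  once the string is parsed.
pattern <- "log\\("
n_chars <- nchar(pattern)

# The regex engine then reads \( as a literal opening parenthesis.
result <- gsub(pattern, "", string)

print(n_chars)
print(result)
```

The closing parenthesis is still present in `result`, because the pattern only covers `log(`.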

Ben's answer gives you the good generally applicable way of doing this.
Alternatively, in your situation you could use the fixed=TRUE argument, like this:
gsub("log(", "", string, fixed=TRUE)
# [1] "M)"
This is appropriate whenever the pattern argument to gsub() is the literal sequence of characters you are searching for. It lets you type the exact text you want to match, with no escaping.
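As a rough illustration, two fixed-pattern calls can be chained to strip both the `log(` prefix and the closing parenthesis. This assumes the same `string` as above; `without_log` and `inner` are hypothetical names.

```r
string <- "log(M)"

# With fixed = TRUE every character of the pattern is matched literally,
# so "(" and ")" need no escaping.
without_log <- gsub("log(", "", string, fixed = TRUE)
inner <- gsub(")", "", without_log, fixed = TRUE)

print(without_log)
print(inner)
```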

If you are not a regex specialist (many of us are not!), I find it more straightforward to separate the removal of the unneeded text and the parens, provided your query supports that.
The question seems to indicate only wanting to remove parens, so you could use:
gsub(paste(c("[(]", "[)]"), collapse = "|"), "", string)
This results in the string without parens: "logM"
If you also want to remove the "M":
gsub(paste(c("M", "[(]", "[)]"), collapse = "|"), "", string)
This yields "log".
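The same effect can probably be had with a single character class instead of pasting alternatives together. A short sketch, assuming the same `string`; inside square brackets the parentheses are ordinary characters, so no backslashes are needed.

```r
string <- "log(M)"

# "[()]" matches either "(" or ")".
no_parens <- gsub("[()]", "", string)

# Adding M to the class removes that character as well.
only_log <- gsub("[()M]", "", string)

print(no_parens)
print(only_log)
```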

Related

Find a regex statement to delete all occurences of "\" and the subsequent symbol

I have a use case where
x <- "test - hello\r\n 1...124"
and I would like to obtain "test - hello 1...124". I am aware that I can use gsub("[\r\n]", "", x) for this specific case. However, I am wondering how to more generally remove any backslash followed by any symbol (e.g. using something like "\." and escaping the backslash). Examples that did not work are:
gsub("\.", "", x) # error
gsub("\\.", "", x) # escapes "."?
gsub("\\\.", "", x) # error
gsub("\\\\.", "", x) # ??
...
Also I would be very thankful for an explanation as to why this is not working.
With the stringr package you can use str_squish, which removes leading and trailing whitespace and also collapses runs of whitespace inside the string (including \r and \n) into a single space.
x <- "test - hello\r\n 1...124"
stringr::str_squish(x)
#> [1] "test - hello 1...124"
gsub("\\r|\\n","", x)
gives the same result.
If you're looking to remove both newlines (or other escaped characters) and other strings that begin with \, you can just include both in the expression:
\r|\n|\t|\0|\\.
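As for why the attempts in the question do not work: in an R string literal, "\r" and "\n" are each a single control character, so `x` contains no backslash at all. "\." and "\\\." fail at parse time because `\.` is not a recognized escape in an R string. The sketch below checks what the two attempts that do parse actually match; the variable names are only illustrative.

```r
x <- "test - hello\r\n 1...124"

# The string holds a carriage return and a newline, but no backslash.
has_backslash <- grepl("\\", x, fixed = TRUE)

# "\\." is the regex \. (a literal dot), so this deletes the dots instead.
dots_removed <- gsub("\\.", "", x)

# "\\\\." is the regex \\. (a backslash followed by any character);
# nothing in x matches, so the string comes back unchanged.
unchanged <- gsub("\\\\.", "", x)

# Removing the control characters themselves gives the desired result.
cleaned <- gsub("[\r\n]", "", x)

print(has_backslash)
print(cleaned)
```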

Extracting substring using R

I want to extract substring (description details) from the following strings:
string1 <- "#{self=https://somesite.atlassian.net/rest/api/2/status/1; description=The issue is open and ready for the assignee to start work on it.; iconUrl=https://somesite.atlassian.net/images/icons/statuses/open.png; name=Open; id=1; statusCategory=}"
string2 <- "#{self=https://somesite.atlassian.net/rest/api/2/status/10203; description=; iconUrl=https://somesite.atlassian.net/images/icons/statuses/generic.png; name=Full Curation; id=10203; statusCategory=}"
I am trying to get the following
ExtractedSubString1 = "The issue is open and ready for the assignee to start work on it."
ExtractedSubString2 = ""
I tried this:
library(stringr)
ExtractedSubString1 <- substr(string1, str_locate(string1, "description=")+12, str_locate(string1, "; iconUrl")-1)
ExtractedSubString2 <- substr(string2, str_locate(string2, "description=")+12, str_locate(string2, "; iconUrl")-1)
Looking for a better way to accomplish this.
Using only base R's sub and back referencing, you could do
sub(".*description=(.*?);.*", "\\1", c(string1, string2))
[1] "The issue is open and ready for the assignee to start work on it." ""
The ".*" matches any sequence of characters, "description=" is a literal match, and ".*?" also matches any sequence of characters, but the ? makes the match lazy rather than greedy, so it stops at the first ";". The ";" is a literal, and the "()" capture the lazily matched sub-expression. The back reference "\\1" returns the sub-expression captured in the parentheses.
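To make the lazy versus greedy difference concrete, here is a small sketch on a shortened, made-up string `s` (not one of the strings from the question). It passes perl = TRUE for the first two calls only to pin down the matching semantics; that is an assumption of this example, not something the answer above requires. The third call avoids lazy quantifiers entirely by using a negated character class.

```r
s <- "self=a; description=hello; name=x; id=2"

# Lazy: the capture stops at the first ";" after "description=".
lazy <- sub(".*description=(.*?);.*", "\\1", s, perl = TRUE)

# Greedy: the capture runs on to the LAST ";" in the string.
greedy <- sub(".*description=(.*);.*", "\\1", s, perl = TRUE)

# Alternative: "[^;]*" can never cross a ";" in the first place.
negated <- sub(".*description=([^;]*);.*", "\\1", s)

print(lazy)
print(greedy)
print(negated)
```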
Using the base R functions regexec and regmatches gets a bit closer to the method in the OP. sapply with "[" is then used to extract the desired result.
sapply(regmatches(c(string1, string2),
regexec(".*description=(.*?);.*", c(string1, string2))),
"[", 2)
[1] "The issue is open and ready for the assignee to start work on it." ""
You could try:
test.1 <- gsub("description=", "", strsplit(string1, "; ")[[1]][2])
test.2 <- gsub("description=", "", strsplit(string2, "; ")[[1]][2])
This simply splits the string on "; ", which divides each string into 6 elements. The square brackets select the 2nd element, and gsub replaces "description=" with nothing to remove it.
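Building on the splitting idea, one could parse every key=value pair into a named character vector and then look fields up by name instead of by position. This is only a sketch: `parse_fields` is a hypothetical helper, and it assumes the inputs are stored as quoted strings, that fields are separated by "; ", and that no value contains that separator.

```r
string1 <- "#{self=https://somesite.atlassian.net/rest/api/2/status/1; description=The issue is open and ready for the assignee to start work on it.; iconUrl=https://somesite.atlassian.net/images/icons/statuses/open.png; name=Open; id=1; statusCategory=}"
string2 <- "#{self=https://somesite.atlassian.net/rest/api/2/status/10203; description=; iconUrl=https://somesite.atlassian.net/images/icons/statuses/generic.png; name=Full Curation; id=10203; statusCategory=}"

# Hypothetical helper: split "#{k1=v1; k2=v2; ...}" into a named vector.
parse_fields <- function(s) {
  # Drop the leading "#{" (or "{") and the trailing "}".
  body <- gsub("^#?[{]|[}]$", "", s)
  parts <- strsplit(body, "; ", fixed = TRUE)[[1]]
  # Key is everything before the first "=", value is everything after it.
  keys <- sub("=.*$", "", parts)
  vals <- sub("^[^=]*=", "", parts)
  names(vals) <- keys
  vals
}

fields1 <- parse_fields(string1)
fields2 <- parse_fields(string2)

print(fields1[["description"]])
print(fields2[["description"]])
```

Looking fields up by name means the result does not depend on "description" always being the second element.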

Gsub transforming numbers

I ran into this problem.
I scraped some data from the web and, for instance, I obtain this:
"3.444.654" (As character)
If I use gsub("3.444.654", ".", "") in order to get 3444654...
R gives me
[1] ""
What could I do to get the integer?
> gsub(".", "", "3.444.654", fixed = TRUE)
[1] "3444654"
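The arguments in the question are in the wrong order: gsub takes the pattern first, then the replacement, then the string to search. In addition, "." is a regex metacharacter that matches any character, so either pass fixed = TRUE as above or escape it as "\\.". To then turn the string into a number, use as.numeric, as.integer, etc.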
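Putting both steps together, a minimal sketch might look like this, assuming the dots are thousands separators; `raw`, `digits` and `value` are illustrative names.

```r
raw <- "3.444.654"

# Remove the literal dots (fixed = TRUE disables regex interpretation).
digits <- gsub(".", "", raw, fixed = TRUE)

# The escaped regex form gives the same string.
digits_regex <- gsub("\\.", "", raw)

# Convert the cleaned string to an integer.
value <- as.integer(digits)

print(digits)
print(value)
```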

str_replace (package stringr) cannot replace brackets in r?

I have a string, say
fruit <- "()goodapple"
I want to remove the brackets in the string. I decide to use stringr package because it usually can handle this kind of issues. I use :
str_replace(fruit,"()","")
But nothing is replaced, and the following is returned:
[1] "()goodapple"
If I only want to replace the right half bracket, it works:
str_replace(fruit,")","")
[1] "(goodapple"
However, the left half bracket does not work:
str_replace(fruit,"(","")
and the following error is shown:
Error in sub("(", "", "()goodapple", fixed = FALSE, ignore.case = FALSE, perl = FALSE) :
invalid regular expression '(', reason 'Missing ')''
Anyone has ideas why this happens? How can I remove the "()" in the string, then?
Escaping the parentheses does it...
str_replace(fruit,"\\(\\)","")
# [1] "goodapple"
You may also want to consider exploring the "stringi" package, which has a similar approach to "stringr" but has more flexible functions. For instance, there is stri_replace_all_fixed, which would be useful here since your search string is a fixed pattern, not a regex pattern:
library(stringi)
stri_replace_all_fixed(fruit, "()", "")
# [1] "goodapple"
Of course, basic gsub handles this just fine too:
gsub("()", "", fruit, fixed=TRUE)
# [1] "goodapple"
The accepted answer works for your exact problem, but not for the more general problem:
my_fruits <- c("()goodapple", "(bad)apple", "(funnyapple")
str_replace(my_fruits,"\\(\\)","")
## "goodapple" "(bad)apple" "(funnyapple"
This is because the regex exactly matches a "(" followed by a ")".
Assuming you care only about bracket pairs, this is a stronger solution:
str_replace(my_fruits, "\\([^()]{0,}\\)", "")
## "goodapple" "apple" "(funnyapple"
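For comparison, a base R sketch of the same bracket-pair idea, with no packages needed. Note one difference that is an artifact of this example: gsub replaces every matching pair in each string, whereas str_replace only replaces the first. `pairs_removed` is an illustrative name.

```r
my_fruits <- c("()goodapple", "(bad)apple", "(funnyapple")

# \\( and \\) are literal parentheses; [^()]* is any run of characters
# that contains no parenthesis, so only complete pairs are matched.
pairs_removed <- gsub("\\([^()]*\\)", "", my_fruits)

print(pairs_removed)
```

The unmatched "(" in the third element is left alone because no closing parenthesis follows it.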
Building off of MJH's answer, this removes all ( or ):
my_fruits <- c("()goodapple", "(bad)apple", "(funnyapple")
str_replace_all(my_fruits, "[\\(\\)]", "")
[1] "goodapple" "badapple" "funnyapple"

