Replace square bracket with bracket using gsub [duplicate] - r

I want to change "[" to "(" in a data.frame (class is string), but I get the following error:
Error in gsub("[", "(", df) :
invalid regular expression '[', reason 'Missing ']''
Doing the reverse works perfectly:
df <- gsub("]",")", df)
all "]" got replaced in the data.frame df
so in essence this is the problem
df <- gsub("[","(", df)
Error in gsub("[", "(", df) :
invalid regular expression '[', reason 'Missing ']''
Can anyone help me fix the code?
Or is there an alternative function to gsub that can accomplish the same thing?

The [ is a metacharacter, so we need either fixed = TRUE or to escape it as \\[
gsub("[", "(", df, fixed = TRUE)
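Note that gsub() coerces its input with as.character(), so passing a whole data.frame does not return a data.frame. A minimal sketch of applying the replacement column by column instead, assuming every column of df is character (the sample data below is made up for illustration):

```r
# Hypothetical sample data; the question does not show the real df
df <- data.frame(a = c("x[1]", "y[2]"),
                 b = c("[p]", "q"),
                 stringsAsFactors = FALSE)

# df[] <- keeps the data.frame structure while replacing every column
df[] <- lapply(df, function(col) gsub("[", "(", col, fixed = TRUE))
df[] <- lapply(df, function(col) gsub("]", ")", col, fixed = TRUE))

df$a
# [1] "x(1)" "y(2)"
```

With fixed = TRUE both brackets are treated as plain characters, so no escaping is needed in either call.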

We can also use the Hexadecimal representation of the ASCII character [ by prefixing it with \\x:
gsub('\\x5B', '(', '[')
# [1] "("
Just a preference, but I find this to be more readable in cases where the metacharacters [ and ] are mixed with their literal/escaped versions. For example, I find this:
gsub('[\\x5B\\x5D]+', '(', ']][[[', perl = TRUE)
more readable than these:
gsub('[\\]\\[]+', '(', ']][[[', perl = TRUE)
[1] "("
gsub('[][]+', '(', ']][[[', perl = TRUE)
[1] "("
gsub('[\\[\\]]+', '(', ']][[[', perl = TRUE)
[1] "("
especially when you have a long and complicated pattern.
Here is the ASCII table I used from http://www.asciitable.com/
The obvious disadvantage is that you have to look up the hex code in the table.
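If looking the codes up by hand is a nuisance, base R can report them. The following is only a sketch of one way to do it: charToRaw() returns the bytes of a string, and the pattern is then assembled from those hex values (the variable names hex and pattern are made up here):

```r
# Hex codes of the two bracket characters
hex <- as.character(charToRaw("[]"))
hex
# [1] "5b" "5d"

# Build the character class [\x5b\x5d]+ from the codes
pattern <- paste0("[", paste0("\\x", hex, collapse = ""), "]+")
writeLines(pattern)
# [\x5b\x5d]+

gsub(pattern, "(", "]][[[", perl = TRUE)
# [1] "("
```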

Related

How to get rid of brackets () using gsub [duplicate]

I am trying to remove a parenthesis from a string in R and run into the following error:
string <- "log(M)"
gsub("log", "", string) # Works just fine
gsub("log(", "", string) #breaks
# Error in gsub("log(", "", string) :
# invalid regular expression 'log(', reason 'Missing ')''
Escape the parenthesis with a double-backslash:
gsub("log\\(", "", string)
(Obligatory: http://xkcd.com/234/)
Ben's answer gives you the good generally applicable way of doing this.
Alternatively, in your situation you could use the fixed=TRUE argument, like this:
gsub("log(", "", string, fixed=TRUE)
# [1] "M)"
It is appropriate whenever the pattern argument to gsub() is a character string containing the literal sequence of characters you are searching for. Then, it's nice because it allows you to type the exact pattern that you are searching for, without escapes etc.
If you are not a regex specialist (many of us are not!), I find it more straightforward to separate the removal of the unneeded text and the parens, provided your query supports that.
The question seems to indicate only wanting to remove parens, so you could use:
gsub(paste(c("[(]", "[)]"), collapse = "|"), "", string)
This results in the string without parens: "logM"
If you also want to remove the "M":
gsub(paste(c("M", "[(]", "[)]"), collapse = "|"), "", string)
This results in "log"
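To tie the answers above together, here is a small sketch showing that the three ways of matching a literal "(" mentioned in this thread (backslash escape, bracket expression, and fixed = TRUE) give the same result on the question's string. The variable names are only for illustration:

```r
string <- "log(M)"

escaped   <- gsub("log\\(", "", string)               # escape the metacharacter
bracketed <- gsub("log[(]", "", string)               # ( is literal inside [ ]
literal   <- gsub("log(", "", string, fixed = TRUE)   # no regex at all

c(escaped, bracketed, literal)
# [1] "M)" "M)" "M)"
```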

Extract substring until "?" with sub() [duplicate]

So, I want to extract the substring of a string like this
mystr <- "aa/bb/cc?rest"
I found the sub() function but executing sub("?.*", "", mystr) returns "" instead of "aa/bb/cc".
Why?
The reason is obviously that ? is a special character, but using backticks or "\?" doesn't solve the problem.
You need a double backslash (\\) to escape it:
> mystr <- "aa/bb/cc?rest"
> sub("\?.*", "", mystr)
Error: '\?' is an unrecognized escape in character string starting ""\?"
> sub("\\?.*", "", mystr)
[1] "aa/bb/cc"
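The double backslash is needed because the R parser processes the string literal first, and only the resulting characters reach the regex engine. A short sketch of that, plus a few alternatives that avoid the doubling; note that the raw-string syntax r"(...)" assumes R 4.0.0 or later:

```r
mystr <- "aa/bb/cc?rest"

# The literal "\\?" is two characters once parsed: a backslash and a ?
nchar("\\?")
# [1] 2

escaped   <- sub("\\?.*", "", mystr)    # doubled backslash
raw_str   <- sub(r"(\?.*)", "", mystr)  # raw string, R >= 4.0.0 only
bracketed <- sub("[?].*", "", mystr)    # ? is literal inside [ ]

c(escaped, raw_str, bracketed)
# [1] "aa/bb/cc" "aa/bb/cc" "aa/bb/cc"
```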

Escape backslash with eval and parse (non-standard evaluation) [duplicate]

I need to use a regex in combination with non-standard evaluation.
The following works fine:
library(stringr)
> str_replace("2.5", "\\.", ",") # without non-standard evaluation
[1] "2,5"
> eval(parse(text = 'str_replace("2.5", ".", ",")')) # with non-standard evaluation
[1] ",.5"
The following does not work:
> eval(parse(text = 'str_replace("2.5", "\\.", ",")'))
Error: '\.' is an unrecognized escape in character string starting ""\."
I was thinking that I need to escape the backslash itself, however, this doesn't seem to work either:
> eval(parse(text = 'str_replace("2.5", "\\\.", ",")'))
Error: '\.' is an unrecognized escape in character string starting "'str_replace("2.5", "\\\."
The solution is to double-escape the backslash inside the non-standard evaluation:
> eval(parse(text = 'str_replace("2.5", "\\\\.", ",")'))
[1] "2,5"
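Four backslashes are needed because the text is parsed twice: once as the outer string literal and once more by parse(). The sketch below uses base sub() instead of str_replace() only to stay self-contained, and prints the intermediate text with writeLines(); the last line shows one way to sidestep the second parse by building the call directly:

```r
code <- 'sub("\\\\.", ",", "2.5")'

# What parse() actually receives after the first round of escaping
writeLines(code)
# sub("\\.", ",", "2.5")

parsed_result <- eval(parse(text = code))
parsed_result
# [1] "2,5"

# Building the call object avoids the extra level of escaping
call_result <- eval(call("sub", "\\.", ",", "2.5"))
call_result
# [1] "2,5"
```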

How do I remove suffix from a list of Ensembl IDs in R [duplicate]

I have a large list which contains expressed genes from many cell lines. Ensembl genes often come with version suffixes, but I need to remove them. I've found several references that describe this here or here, but they will not work for me, likely because of my data structure (I think it's a nested array within a list?). Can someone help me with the particulars of the code and with my understanding of my own data structures?
Here's some example data
>listOfGenes_version <- list("cellLine1" = c("ENSG001.1", "ENSG002.1", "ENSG003.1"), "cellLine2" = c("ENSG003.1", "ENSG004.1"))
>listOfGenes_version
$cellLine1
[1] "ENSG001.1" "ENSG002.1" "ENSG003.1"
$cellLine2
[1] "ENSG003.1" "ENSG004.1"
And what I would like to see is
>listOfGenes_trimmed
$cellLine1
[1] "ENSG001" "ENSG002" "ENSG003"
$cellLine2
[1] "ENSG003" "ENSG004"
Here are some things I tried, but did not work
>listOfGenes_trimmed <- str_replace(listOfGenes_version, pattern = ".[0-9]+$", replacement = "")
Warning message:
In stri_replace_first_regex(string, pattern, fix_replacement(replacement), :
argument is not an atomic vector; coercing
>listOfGenes_trimmed <- lapply(listOfGenes_version, gsub('\\..*', '', listOfGenes_version))
Error in match.fun(FUN) :
'gsub("\\..*", "", listOfGenes_version)' is not a function, character or symbol
Thanks so much!
An option would be to specify the pattern as . (a metacharacter, so escape it) followed by one or more digits (\\d+) at the end ($) of the string and replace it with blank ("")
lapply(listOfGenes_version, sub, pattern = "\\.\\d+$", replacement = "")
#$cellLine1
#[1] "ENSG001" "ENSG002" "ENSG003"
#$cellLine2
#[1] "ENSG003" "ENSG004"
The . is a metacharacter that matches any character, so we need to escape it to get the literal value as the mode is by default regex
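The lapply() attempt in the question failed because its second argument was the result of a gsub() call rather than a function. A minimal sketch of the same fix written with an explicit anonymous function, which some readers may find easier to follow (the argument name ids is arbitrary):

```r
listOfGenes_version <- list(
  cellLine1 = c("ENSG001.1", "ENSG002.1", "ENSG003.1"),
  cellLine2 = c("ENSG003.1", "ENSG004.1")
)

# lapply() calls the function once per list element; each element is a
# character vector, which sub() handles in a vectorised way
listOfGenes_trimmed <- lapply(
  listOfGenes_version,
  function(ids) sub("\\.\\d+$", "", ids)
)

listOfGenes_trimmed$cellLine2
# [1] "ENSG003" "ENSG004"
```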

str_replace (package stringr) cannot replace brackets in r?

I have a string, say
fruit <- "()goodapple"
I want to remove the brackets in the string. I decided to use the stringr package because it can usually handle this kind of issue. I use:
str_replace(fruit,"()","")
But nothing is replaced, and the following is returned:
[1] "()goodapple"
If I only want to replace the right half bracket, it works:
str_replace(fruit,")","")
[1] "(goodapple"
However, the left half bracket does not work:
str_replace(fruit,"(","")
and the following error is shown:
Error in sub("(", "", "()goodapple", fixed = FALSE, ignore.case = FALSE, perl = FALSE) :
invalid regular expression '(', reason 'Missing ')''
Does anyone know why this happens? How can I remove the "()" in the string, then?
Escaping the parentheses does it...
str_replace(fruit,"\\(\\)","")
# [1] "goodapple"
You may also want to consider exploring the "stringi" package, which has a similar approach to "stringr" but has more flexible functions. For instance, there is stri_replace_all_fixed, which would be useful here since your search string is a fixed pattern, not a regex pattern:
library(stringi)
stri_replace_all_fixed(fruit, "()", "")
# [1] "goodapple"
Of course, basic gsub handles this just fine too:
gsub("()", "", fruit, fixed=TRUE)
# [1] "goodapple"
The accepted answer works for your exact problem, but not for the more general problem:
my_fruits <- c("()goodapple", "(bad)apple", "(funnyapple")
str_replace(my_fruits,"\\(\\)","")
## "goodapple" "(bad)apple" "(funnyapple"
This is because the regex exactly matches a "(" followed by a ")".
Assuming you care only about bracket pairs, this is a stronger solution:
str_replace(my_fruits, "\\([^()]{0,}\\)", "")
## "goodapple" "apple" "(funnyapple"
Building off of MJH's answer, this removes all ( or ):
my_fruits <- c("()goodapple", "(bad)apple", "(funnyapple")
str_replace_all(my_fruits, "[\\(\\)]", "")
[1] "goodapple" "badapple" "funnyapple"
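For comparison, the same two behaviours can be sketched in base R without loading stringr. This is only an illustration of the patterns discussed above, with made-up variable names; inside a bracket expression the parentheses need no escaping:

```r
my_fruits <- c("()goodapple", "(bad)apple", "(funnyapple")

# Remove every ( or ), paired or not
no_parens <- gsub("[()]", "", my_fruits)
no_parens
## "goodapple" "badapple" "funnyapple"

# Remove only matched pairs together with their contents
no_pairs <- gsub("\\([^()]*\\)", "", my_fruits)
no_pairs
## "goodapple" "apple" "(funnyapple"
```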
