This question already has an answer here:
R - gsub replacing backslashes
(1 answer)
Closed 3 years ago.
I need to use a regex in combination with non-standard evaluation.
The following works fine:
library(stringr)
> str_replace("2.5", "\\.", ",") # without non-standard evaluation
[1] "2,5"
> eval(parse(text = 'str_replace("2.5", ".", ",")')) # with non-standard evaluation
[1] ",.5"
The following does not work:
> eval(parse(text = 'str_replace("2.5", "\\.", ",")'))
Error: '\.' is an unrecognized escape in character string starting ""\."
I was thinking that I need to escape the backslash itself, however, this doesn't seem to work either:
> eval(parse(text = 'str_replace("2.5", "\\\.", ",")'))
Error: '\.' is an unrecognized escape in character string starting "'str_replace("2.5", "\\\."
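The solution is to double the escaping. The text is parsed twice, once as the outer string literal and once more by parse(), so each literal backslash in the regex needs four backslashes in the source: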
> eval(parse(text = 'str_replace("2.5", "\\\\.", ",")'))
[1] "2,5"
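The two parsing layers can be made visible with cat(). This is a minimal sketch rather than the asker's exact code: it assumes base R's sub() in place of str_replace() so that it runs without stringr, and the raw-string variant assumes R 4.0.0 or later.

```r
# Outer literal: four backslashes in the source become two in the string.
code <- 'sub("\\\\.", ",", "2.5")'

# cat() prints the text exactly as parse() will see it:
# sub("\\.", ",", "2.5")
cat(code, "\n")

# parse() reads "\\." as the regex \. (a literal dot).
result <- eval(parse(text = code))
result  # "2,5"

# A raw string (R >= 4.0.0) removes the outer escaping layer,
# so the inner code is written exactly as it would be typed at the prompt.
code_raw <- r"[sub("\\.", ",", "2.5")]"
identical(code, code_raw)  # TRUE
```

Printing the code string with cat() before evaluating it is a quick way to check how many backslashes survive the first layer.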
This question already has answers here:
How do I deal with special characters like \^$.?*|+()[{ in my regex?
(2 answers)
Closed last year.
So, I want to extract the substring of a string like this
mystr <- "aa/bb/cc?rest"
I found the sub() function but executing sub("?.*", "", mystr) returns "" instead of "aa/bb/cc".
Why?
The reason is obviously that ? is a special character, but using backticks or "\?" doesn't solve the problem.
You need a double backslash (\\) to escape it:
> mystr <- "aa/bb/cc?rest"
> sub("\?.*", "", mystr)
Error: '\?' is an unrecognized escape in character string starting ""\?"
> sub("\\?.*", "", mystr)
[1] "aa/bb/cc"
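For comparison, here is a short sketch of three base-R ways to treat ? literally. The variable names by_escape, by_class and by_split are illustrative only.

```r
mystr <- "aa/bb/cc?rest"

# 1. Escape the metacharacter: "\\?" in the source is the regex \?
by_escape <- sub("\\?.*", "", mystr)

# 2. Put it in a character class, where ? has no special meaning
by_class <- sub("[?].*", "", mystr)

# 3. Skip regex entirely and split on the literal character
by_split <- strsplit(mystr, "?", fixed = TRUE)[[1]][1]

by_escape  # "aa/bb/cc"
by_class   # "aa/bb/cc"
by_split   # "aa/bb/cc"
```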
I'm trying to get rid of characters before or after special characters in a string.
My example string looks like this:
test <- c(">P01923|description", ">P19405orf|description2")
I'm trying to get the part between the > key and the | key, so that I'd be left with c("P01923", "P19405orf") only. I was trying to do this by using gsub twice, first to get rid of everything behind | and then to get rid of >.
I first tried this: gsub("|.*", "", test), but this seems to remove all the characters (not sure why?). I used the regex101.com website to check my regex and learned that | is a special character and that I need to use \| instead. This worked on regex101.com, so I tried gsub("\|.*", "", test), but this gave me the error '\|' is an unrecognized escape in character string starting ""\|". I'm having the same problem with >.
How can I get R to recognize special characters like | and > using regex?
If you use "..." to specify character constants, you also need to escape the \ itself, which leads to \\. Alternatively, you can use r"(...)" (available since R 4.0.0) to specify raw character constants, where a single \ is enough.
gsub(".*>|\\|.*", "", test)
[1] "P01923" "P19405orf"
gsub(r"(.*>|\|.*)", "", test)
[1] "P01923" "P19405orf"
Here .*> matches everything up to and including >, \|.* matches | and everything after it, and the unescaped | between them means "or".
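The two spellings produce the same two-character pattern, which can be checked directly. A small sketch, assuming R 4.0.0 or later for the raw string; the names escaped and raw_form are illustrative.

```r
escaped  <- "\\|"     # ordinary literal: backslash escaped for the R parser
raw_form <- r"(\|)"   # raw literal: backslash written once

nchar(escaped)                 # 2: one backslash, one pipe
identical(escaped, raw_form)   # TRUE
writeLines(escaped)            # prints \|, the text the regex engine receives

test <- c(">P01923|description", ">P19405orf|description2")
same <- identical(gsub(".*>|\\|.*", "", test),
                  gsub(r"(.*>|\|.*)", "", test))
same  # TRUE
```

print() shows the escaped form ("\\|"), while writeLines() and cat() show the actual characters, which is often the source of confusion.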
Alternatively regexpr and regmatches could be used like:
regmatches(test, regexpr("(?<=>)[^|]*", test, perl=TRUE))
#[1] "P01923" "P19405orf"
Here (?<=>) is a lookbehind for >, and [^|]* matches any run of characters other than |.
You can extract text between > and |. Special characters can be escaped with \\.
sub('>(.*)\\|.*', '\\1', test)
#[1] "P01923" "P19405orf"
Here is a regex split option. We can split the input string on [>|], which will leave the desired substring in the second position of the output vector.
test <- c(">P01923|description", ">P19405orf|description2")
unlist(lapply(strsplit(test, "[>|]"), function(x) x[2]))
[1] "P01923" "P19405orf"
library(stringr)
test <- c(">P01923|description", ">P19405orf|description2")
# if '>' is always the first character
str_sub(test, 2, -1) %>%
  str_replace('\\|.*$', '')
#> [1] "P01923" "P19405orf"
# if not
str_replace(test, '\\>', '') %>%
  str_replace('\\|.*$', '')
#> [1] "P01923" "P19405orf"
#alternative way
str_match(test, '\\>(.*)\\|')[, 2]
#> [1] "P01923" "P19405orf"
Created on 2021-06-30 by the reprex package (v2.0.0)
This question already has answers here:
How do I deal with special characters like \^$.?*|+()[{ in my regex?
(2 answers)
Closed 2 years ago.
I am trying to replace the hyphen - in a vector with a period ..
> samples_149_vector
$Patient.ID
[1] MB-0020 MB-0100 MB-0115 MB-0158 MB-0164 MB-0174 MB-0179 MB-0206
[9] MB-0214 MB-0238 MB-0259 MB-0269 MB-0278 MB-0333 MB-0347 MB-0352
[17] MB-0372 MB-0396 MB-0399 MB-0400 MB-0401 MB-0420 MB-0424 MB-0446
[25] MB-0464 MB-0476 MB-0481 MB-0489 MB-0494 MB-0495 MB-0500 MB-0502
The following code
library(stringr)
str_replace_all(samples_149_vector, "-", ".")
generates the following error:
> str_replace_all(samples_149_vector, "-", ".")
[1] "1:149"
[2] "function (length = 0) \n.Internal(vector(\"character\", length))"
Warning message:
In stri_replace_all_regex(string, pattern, fix_replacement(replacement), :
argument is not an atomic vector; coercing
Any ideas? I have tried so many things and combinations but the coercing atomic vector message seems to reoccur
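In the replacement argument, "." is an ordinary character and does not need escaping; only the pattern is interpreted as a regular expression. The warning "argument is not an atomic vector; coercing" points to the actual problem: judging by the $Patient.ID in the printed output, samples_149_vector is a list (or data frame) rather than a character vector. Try passing the element itself:
str_replace_all(samples_149_vector$Patient.ID, "-", ".")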
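The warning quoted in the question ("argument is not an atomic vector; coercing") suggests the input is a list rather than a character vector. Below is a minimal base-R sketch of that situation. The object samples and its three IDs are made up for illustration and are not the asker's data, and gsub() with fixed = TRUE stands in for str_replace_all().

```r
# Hypothetical stand-in for samples_149_vector: a list holding a factor
samples <- list(Patient.ID = factor(c("MB-0020", "MB-0100", "MB-0115")))

is.atomic(samples)  # FALSE: string functions have to coerce the whole list

# Extract the element first, then replace; fixed = TRUE treats "-" literally
ids <- samples$Patient.ID
fixed_ids <- gsub("-", ".", ids, fixed = TRUE)
fixed_ids  # "MB.0020" "MB.0100" "MB.0115"
```

gsub() converts the factor to its labels before replacing, so the result is a plain character vector.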
This question already has answers here:
Remove all special characters from a string in R?
(3 answers)
How do I deal with special characters like \^$.?*|+()[{ in my regex?
(2 answers)
Closed 3 years ago.
I want to change "[" to "(" in a data.frame (class is string), but I get the following error:
Error in gsub("[", "(", df) :
invalid regular expression '[', reason 'Missing ']''
Doing the reverse works perfectly:
df <- gsub("]",")", df)
All "]" got replaced in the data.frame df.
So in essence, this is the problem:
df <- gsub("[","(", df)
Error in gsub("[", "(", df) :
invalid regular expression '[', reason 'Missing ']''
Can anyone help fix the code?
Or is there an alternative function to gsub that can accomplish the same?
The [ is a metacharacter, so we need either fixed = TRUE or to escape it as \\[
gsub("[", "(", df, fixed = TRUE)
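One caveat: calling gsub() on a whole data frame coerces it to a character vector, so the result is no longer a data frame. Below is a hedged sketch of a column-wise version, assuming every column is character. The example data frame and its column names are invented for illustration. chartr() is shown as a base-R alternative to gsub() for single-character swaps.

```r
# Hypothetical data: all columns are character
df <- data.frame(a = c("[x]", "y"), b = c("[1]", "[2]"),
                 stringsAsFactors = FALSE)

# df[] <- keeps the data.frame structure while replacing each column
df[] <- lapply(df, function(col) gsub("[", "(", col, fixed = TRUE))
df$a  # "(x]" "y"
df$b  # "(1]" "(2]"

# Escaping instead of fixed = TRUE gives the same result
escaped <- gsub("\\[", "(", "a[1]")
escaped  # "a(1]"

# chartr() translates characters one-to-one and uses no regex at all
both <- chartr("[]", "()", "a[1]")
both  # "a(1)"
```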
We can also use the hexadecimal representation of the ASCII character [ by prefixing its code with \\x:
gsub('\\x5B', '(', '[')
# [1] "("
Just a preference, but I find this to be more readable in cases where the metacharacters [ and ] are mixed with their literal/escaped versions. For example, I find this:
gsub('[\\x5B\\x5D]+', '(', ']][[[', perl = TRUE)
more readable than these:
gsub('[\\]\\[]+', '(', ']][[[', perl = TRUE)
[1] "("
gsub('[][]+', '(', ']][[[', perl = TRUE)
[1] "("
gsub('[\\[\\]]+', '(', ']][[[', perl = TRUE)
[1] "("
especially when you have a long and complicated pattern.
Here is the ASCII table I used from http://www.asciitable.com/
The obvious disadvantage is that you have to look up the hex code in the table.
Q: How can I replace underscores "_" with backslash-underscores "\_" in an R string? I'd prefer to use the stringr package.
Also, can anyone explain why line 5 below fails to get the desired result? I was almost certain that would work.
library(stringr)
s <- "foo_bar_baz"
str_replace_all(s, "_", 5) # [1] "foo5bar5baz"
str_replace_all(s, "_", "\_") # Error: '\_' is an unrecognized escape in character string starting ""\_"
str_replace_all(s, "_", "\\_") # [1] "foo_bar_baz"
str_replace_all(s, "_", "\\\_") # Error: '\_' is an unrecognized escape in character string starting ""\\\_"
str_replace_all(s, "_", "\\\\_") # [1] "foo\\_bar\\_baz"
Context: I'm making a LaTeX table using xtable and need to sanitize my column names since they all have underscores and break LaTeX.
This is much easier than it looks. Use fixed("_") to replace a literal string with a literal string; no regex is needed.
> library(stringr)
> s <- "foo_bar_baz"
> str_replace_all(s, fixed("_"), "\\_")
[1] "foo\\_bar\\_baz"
And if you use cat:
> cat(str_replace_all(s, fixed("_"), "\\_"))
foo\_bar\_baz>
You will see that the result actually contains a single backslash before each underscore.
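The same result can be reached in base R, which makes it easy to see how many characters are really in the string. This is a sketch that uses gsub() rather than the stringr call above; the names out_fixed and out_regex are illustrative.

```r
s <- "foo_bar_baz"

# fixed = TRUE: pattern and replacement are both taken literally,
# so "\\_" (one backslash plus underscore) is inserted as-is
out_fixed <- gsub("_", "\\_", s, fixed = TRUE)

# Regex mode: the replacement is processed once more,
# so a literal backslash needs four backslashes in the source
out_regex <- gsub("_", "\\\\_", s)

identical(out_fixed, out_regex)  # TRUE
nchar(s)           # 11
nchar(out_fixed)   # 13: two single backslashes were added
writeLines(out_fixed)  # foo\_bar\_baz
```

print() displays each backslash doubled, whereas writeLines() and cat() show the characters that would be written to the LaTeX file.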