Regex for literal curly brackets in R [duplicate]

This question already has answers here:
Error: '\R' is an unrecognized escape in character string starting "C:\R"
(5 answers)
Closed 2 years ago.
I am not an expert on regex in R, but I feel I have read the docs long enough and still come up short, so I am posting here.
I am trying to replace the following string, all LITERALLY as written:
a = "\\begin{tabular}"
a = gsub("\\begin{tabular}", "\\scalebox{0.7}{
\\begin{tabular}", a)
Desired output is : cat('\\scalebox{0.7}{ \\begin{tabular}')
So I know I need to escape the first "\" with another backslash, but when I escape the brackets I get
Error: '\}' is an unrecognized escape in character string starting...

Since you are replacing a fixed string, you can simply set fixed = TRUE to avoid regular expressions entirely, and use \n for the newline:
a = "\\begin{tabular}"
a = gsub("\\begin{tabular}", "\\scalebox{0.7}{\n\\begin{tabular}", x = a, fixed = TRUE)
If you do want to use a regex, escape each curly bracket in the pattern with two backslashes rather than one: the R string parser consumes one backslash, and the regex engine needs the other.
e.g.,
a = "\\begin{tabular}"
gsub(pattern = "\\{|\\}", replacement = "_foo_", x=a)
[1] "\\begin_foo_tabular_foo_"
Alternatively, you can enclose the curly brackets in square brackets like so:
e.g.,
a = "\\begin{tabular}"
gsub(pattern = "[{]|[}]", replacement = "_foo_", x=a)
[1] "\\begin_foo_tabular_foo_"
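To tie this back to the original question, here is a minimal sketch comparing the fixed = TRUE route with a fully escaped regex. The variable names out_fixed and out_regex are just illustrative. Note that in regex mode the literal backslash needs four backslashes in both the pattern and the replacement.

```r
# The input is the characters \begin{tabular} (one literal backslash).
a <- "\\begin{tabular}"

# Option 1: fixed = TRUE, so pattern and replacement are taken literally.
out_fixed <- gsub("\\begin{tabular}", "\\scalebox{0.7}{\n\\begin{tabular}",
                  a, fixed = TRUE)

# Option 2: regex mode. The literal backslash is written with four
# backslashes, and each brace in the pattern is escaped with two.
out_regex <- gsub("\\\\begin\\{tabular\\}",
                  "\\\\scalebox{0.7}{\n\\\\begin{tabular}", a)

# Both routes should produce the same string.
print(identical(out_fixed, out_regex))
cat(out_fixed, "\n")
```

For a literal replacement like this one, fixed = TRUE is probably the less error-prone choice, since nothing in the pattern or replacement needs escaping beyond what R string literals already require.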

Related

Replace "$" in a string in R [duplicate]

This question already has answers here:
How do I deal with special characters like \^$.?*|+()[{ in my regex?
(2 answers)
Closed 1 year ago.
I would like to replace $ in my R strings. I have tried:
mystring <- "file.tree.id$HASHd15962267-44c21f1cee1057d95d6840$HASHe92451fece3b3341962516acfa962b2f$checked"
stringr::str_replace(mystring, pattern="$",
replacement="!")
However, it fails and my replacement character is put as the last character in my original string:
[1] "file.tree.id$HASHd15962267-44c21f1cee1057d95d6840$HASHe92451fece3b3341962516acfa962b2f$checked!"
I tried some variations such as pattern = "/$", but they fail as well. Can someone suggest a strategy for doing this?
In base R, you could use:
chartr("$","!", mystring)
[1] "file.tree.id!HASHd15962267-44c21f1cee1057d95d6840!HASHe92451fece3b3341962516acfa962b2f!checked"
Or even
gsub("$","!", mystring, fixed = TRUE)
fixed = TRUE is needed because by default the pattern is treated as a regex, where $ is the end-of-string anchor. With stringr, wrap the pattern in fixed() and use str_replace_all, since str_replace only replaces the first match:
stringr::str_replace_all(mystring, pattern = stringr::fixed("$"),
replacement = "!")
Alternatively, escape it (\\$) or place it in square brackets ([$]), but fixed should be faster.
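As a quick check of the three options in base R, here is a small sketch. The input is a shortened, hypothetical version of the string from the question.

```r
# Shortened, hypothetical input with two "$" characters.
mystring <- "id$HASHabc$checked"

by_escape <- gsub("\\$", "!", mystring)              # escaped metacharacter
by_class  <- gsub("[$]", "!", mystring)              # character class
by_fixed  <- gsub("$", "!", mystring, fixed = TRUE)  # no regex at all

# Unescaped, "$" is the end-of-string anchor, so "!" is appended instead.
anchored <- sub("$", "!", mystring)

print(c(by_escape, by_class, by_fixed, anchored))
```

The first three results should be identical, while the last one reproduces the behaviour described in the question.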

Trying to figure out regular expression in R for sub() [duplicate]

This question already has answers here:
Replace single backslash in R
(5 answers)
Closed 3 years ago.
I'm trying to use regular expression in a sub() function in order to replace all the "\" in a Vector
I've tried a number of different ways to get R to recognize the "\":
I've tried "\\\" but I keep getting errors.
I've tried "\.*"
I've tried "\\\.*"
data.frame1$vector4 <- sub(pattern = "\\\", replace = ", data.frame1$vector4)
The \ that I am trying to get rid of only appears occasionally in the vector and always in the middle of the string. I want to get rid of it and all the characters that follow it.
The error that I am getting
Error: '\.' is an unrecognized escape in character string starting "\."
Also I'm struggling to get Stack to print the "\" that I am typing above. It keeps deleting them.
1) 4 backslashes To insert a backslash into an R string literal, use a double backslash. A backslash is also a regex metacharacter, so it must itself be escaped with another backslash, which has to be doubled too. The regular expression therefore needs 4 backslashes.
s <- "a\\b\\c"
nchar(s)
## [1] 5
gsub("\\\\", "", s)
## [1] "abc"
2) character class Another way to effectively escape it is to surround it with [...]
gsub("[\\]", "", s)
## [1] "abc"
3) fixed argument Perhaps the simplest way is to use fixed=TRUE in which case special characters will not be regarded as regular expression metacharacters.
gsub("\\", "", s, fixed = TRUE)
## [1] "abc"
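The question also asks to drop the backslash together with everything after it. A minimal sketch of that, assuming a hypothetical character vector x in place of data.frame1$vector4, is to append .* to either of the regex forms above:

```r
# Hypothetical data: only the first element contains a backslash.
x <- c("keep\\drop this", "no backslash here")

# Four backslashes match one literal backslash; ".*" matches the rest.
trimmed <- sub("\\\\.*", "", x)

# Same idea with the backslash inside a character class.
trimmed_class <- sub("[\\].*", "", x)

print(trimmed)
```

Elements without a backslash are left unchanged, because sub returns the input as-is when the pattern does not match.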

Negating a string while matching others [duplicate]

This question already has answers here:
Regular expression that both includes and excludes certain strings in R
(3 answers)
Closed 5 years ago.
I would like to match some strings using regex while negating others in R. In the below example, I would like exclude subsections of strings that I would otherwise like to match. Example below using the answer from Regular expression to match a line that doesn't contain a word?.
My confusion is that when I try this, grepl throws an error:
Error in grepl(mypattern, mystring) :
invalid regular expression 'boardgames|(^((?!games).)*$)', reason 'Invalid regexp'
mypattern <- "boardgames|(^((?!games).)*$)"
mystring <- c("boardgames", "boardgames", "games")
grepl(mypattern, mystring)
Note that str_detect returns the desired results (i.e. TRUE, TRUE, FALSE), but I would like to use grepl.
We need perl = TRUE as the default option is perl = FALSE
grepl(mypattern, mystring, perl = TRUE)
#[1] TRUE TRUE FALSE
This is needed because the negative lookahead (?!games) is Perl-compatible syntax that the default regex engine does not support.
According to ?regexp
The perl = TRUE argument to grep, regexpr, gregexpr, sub, gsub and
strsplit switches to the PCRE library that implements regular
expression pattern matching using the same syntax and semantics as
Perl 5.x, with just a few differences.
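If you would rather avoid lookaheads altogether, one possible alternative (a sketch, not the approach from the linked answer) is to combine two plain grepl calls. The test vector here is hypothetical and adds "cards" to show a string that contains neither word.

```r
# Hypothetical test vector.
mystring <- c("boardgames", "games", "cards")

# PCRE version from the question.
with_pcre <- grepl("boardgames|(^((?!games).)*$)", mystring, perl = TRUE)

# Same logic without lookahead: keep strings that contain "boardgames",
# or that do not contain "games" at all.
without_lookahead <- grepl("boardgames", mystring, fixed = TRUE) |
  !grepl("games", mystring, fixed = TRUE)

print(with_pcre)
print(without_lookahead)
```

Both vectors should agree. The second form is longer but may be easier to read and does not depend on the regex engine.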

Trying to replace a () in a string in R using str_replace [duplicate]

This question already has answers here:
str_replace (package stringr) cannot replace brackets in r?
(3 answers)
Closed 6 years ago.
I am trying to replace a () in a string using the str_replace function in R, but the function appears to overlook the (). I am pretty new to coding and R, so I imagine it has something to do with how regular expressions treat ().
I just don't know how to make the code treat the () as regular characters.
example string:
tBodyAcc-mean()-X
Here is the function I am using:
mutate(feature,feature=str_replace(feature$feature,(),""))
Appreciate the help
Use sub or gsub, and escape special characters with \\.
If you want to remove ONLY parentheses in the middle of the string (that is, not at the start or at the end), require a character on each side with lookarounds, which need perl = TRUE:
text <- "tBodyAcc-mean()-X"
sub("(?<=.)\\(\\)(?=.)", "", text, perl = TRUE)
[1] "tBodyAcc-mean-X"
text <- "tBodyAcc-mean-X()"
sub("(?<=.)\\(\\)(?=.)", "", text, perl = TRUE)
[1] "tBodyAcc-mean-X()"
If you want to replace ANY parenthesis (including those at the end and at the start of the string)
text <- "tBodyAcc-mean()-X"
sub("\\(\\)", "", text)
EDIT: as pointed out in several comments, gsub replaces every "()" in a string, while sub replaces only the first:
text <- "t()BodyAcc-mean()-X"
sub("\\(\\)", "", text)
[1] "tBodyAcc-mean()-X"
gsub("\\(\\)", "", text)
[1] "tBodyAcc-mean-X"
You can do better using gsub. It will replace all occurrences.
# First argument is the pattern to find. Instead of () you write \\(\\) because it is a regular expression and you want the literal ()
# Second argument is the replacement string
# Third argument is the string in which the replacement takes place
gsub("\\(\\)", "REPLACE", "tBodyAcc-mean()-X")
Output:
[1] "tBodyAcc-meanREPLACE-X"
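Since "()" is a fixed string here, fixed = TRUE should also work and needs no escaping. Below is a small sketch applied to a data frame column like the one in the question. The second row is made up for illustration.

```r
# Hypothetical data frame shaped like the one in the question.
feature <- data.frame(
  feature = c("tBodyAcc-mean()-X", "tBodyAcc-std()-Y"),
  stringsAsFactors = FALSE
)

# fixed = TRUE treats "()" as plain characters, so no backslashes are needed.
feature$feature <- gsub("()", "", feature$feature, fixed = TRUE)

print(feature$feature)
```

gsub is vectorised over the column, so mutate is not required for this step.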

In R replace punctuation "." within a string [duplicate]

This question already has answers here:
Replacing commas and dots in R
(3 answers)
Closed 7 years ago.
I searched the web and found the page "In R, replace text within a string", which shows how to replace text within a string.
I tried the same method to replace the punctuation "." with another punctuation mark, "-", but it did not work.
group <- c("12357.", "12575.", "197.18", ".18947")
gsub(".", "-", group)
gives this output
[1] "------" "------" "------" "------"
instead of
[1] "12357-" "12575-" "197-18" "-18947"
Is there an alternate way to do this?
"." in regex language means "any character". To match an actual dot, you need to escape it, so:
gsub("\\.", "-", group)
#[1] "12357-" "12575-" "197-18" "-18947"
As mentioned by @akrun in the comments, if you prefer, you can also enclose it in square brackets, in which case you don't need to escape it:
gsub('[.]', '-', group)
[1] "12357-" "12575-" "197-18" "-18947"
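Two more options that skip regex entirely, sketched with the same group vector: gsub with fixed = TRUE, and chartr, which translates single characters one-to-one.

```r
group <- c("12357.", "12575.", "197.18", ".18947")

# fixed = TRUE treats "." as a plain character.
dash_fixed <- gsub(".", "-", group, fixed = TRUE)

# chartr maps each "." to "-" without any pattern matching.
dash_chartr <- chartr(".", "-", group)

print(dash_fixed)
```

chartr only works for single-character substitutions of equal length, so gsub remains the more general tool.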
