Replace latex with r strings using gsub [duplicate] - r

This question already has an answer here:
"'\w' is an unrecognized escape" in grep
(1 answer)
Closed 1 year ago.
I would like to find and replace tabular instances with tabularx. I tried gsub, but it quickly turned into a world of escaping pain. Following other questions and answers I found fixed=TRUE, which is the best I have so far. The code snippet below almost works, but \B is unrecognized. If I escape it twice I get \BEGIN as output!
texText <- '\begin{tabular}{rl}\begin{tabular}{rll}'
texText <- gsub("\begin{tabular}{rl}", "\BEGIN{tabular}{rll}", texText, fixed=TRUE)
I'm using BEGIN as my test to see what is happening. This is before I get to tackling the question of what goes on in the brackets {rl} {ll} {rrl} etc. Ideally I'm looking for a regex that would output:
\begin{tabularx}{rX}\begin{tabularx}{rlX}
That is, the final column is replaced by X.

Try using proper escaping:
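texText <- "\\begin{tabular}{rl}\\begin{tabular}{rll}"
output <- gsub("\\\\begin\\{tabular\\}", "\\\\begin{tabularx}", texText)
output
[1] "\\begin{tabularx}{rl}\\begin{tabularx}{rll}"
In an R string literal, a literal backslash is written as two backslashes (a bare "\b" is the backspace character, not a backslash followed by b). In a regex pattern or replacement the backslash has to be escaped once more, giving four backslashes in the source. Metacharacters such as { and } are escaped with two backslashes in the pattern. The console prints the escaped form; cat(output) shows the single backslashes.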
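The full replacement the question asks for, including turning the last column into X, could be sketched as follows. This assumes R 4.0.0 or later (for raw strings, written r"(...)") and assumes that a column specification contains only letters such as r, l and c. The names pattern and replacement are illustrative only.

```r
# Raw strings (R >= 4.0.0): no R-level escaping, only regex-level escaping remains.
texText <- r"(\begin{tabular}{rl}\begin{tabular}{rll})"

# Assumption: the column spec contains only letters (r, l, c, ...).
# Group 1 captures every column except the last one.
pattern <- r"(\\begin\{tabular\}\{([a-zA-Z]*)[a-zA-Z]\})"

# In the replacement, \\ emits one literal backslash and \1 re-inserts group 1.
replacement <- r"(\\begin{tabularx}{\1X})"

output <- gsub(pattern, replacement, texText, perl = TRUE)
cat(output, "\n")
```

Because the raw strings are taken verbatim, each backslash in the source is exactly what the regex engine sees, which removes one of the two layers of escaping.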

Related

How to use regex to match upto third forward slash in R using gsub? [duplicate]

This question already has answers here:
How to Select everything up to and including the 3rd slash (RegExp)?
(2 answers)
Extract a regular expression match
(12 answers)
Closed 2 years ago.
This question is specifically about how R handles regex: I would like a regex to use with gsub that keeps only the text before the third forward slash.
Here are some string examples:
/google.com/images/video
/msn.com/bing/chat
/bbc.com/video
I would like to obtain the following strings only:
/google.com/images
/msn.com/bing
/bbc.com/video
So it is not keeping the information after the 3rd forward slash.
I cannot seem to get any regex working along with using gsub to solve this!
The closest I have got is:
gsub(pattern = "/[A-Za-z0-9_.-]/[A-Za-z0-9_.-]*$", replacement = "", x = the_data_above )
I think R has some issues regarding forward slashes and escaping them.
From the start of the string, match two instances of a slash followed by non-slash characters, then everything after that, and replace the whole match with just those two instances.
paths <- c("/google.com/images/video", "/msn.com/bing/chat", "/bbc.com/video")
sub("^((/[^/]*){2}).*", "\\1", paths)
## [1] "/google.com/images" "/msn.com/bing" "/bbc.com/video"
You can take advantage of lazy (vs greedy) matching by adding the ? after the quantifier (+ in this case) within your capture group:
gsub("(/.+?/.+?)/.*", "\\1", text)
[1] "/google.com/images" "/msn.com/bing" "/bbc.com/video"
Data:
text <- c("/google.com/images/video",
"/msn.com/bing/chat",
"/bbc.com/video")
Try this out:
^\/[A-Za-z0-9_.-]+\/[A-Za-z0-9_.-]+
As seen here: https://regex101.com/r/9ZYppe/1
Your problem arises from the fact that [A-Za-z0-9_.-] without a quantifier matches only one such character. You need the + quantifier to match one or more of them. Also, the $ at the end is unnecessary here: anchoring the match at the start of the string with ^ is enough.
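A minimal sketch of how this pattern might be used in R to extract the match, rather than delete the remainder with gsub. It relies on base R's regexpr and regmatches. A forward slash has no special meaning in a regex, so the \/ escapes are dropped; R would reject "\/" inside a string literal as an unrecognized escape.

```r
paths <- c("/google.com/images/video", "/msn.com/bing/chat", "/bbc.com/video")

# Two "/segment" pieces anchored at the start of the string.
pattern <- "^/[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+"

# regexpr() locates the first match in each element;
# regmatches() pulls out the matched text.
result <- regmatches(paths, regexpr(pattern, paths))
print(result)
```

Note that regmatches drops elements that do not match at all, so the result can be shorter than the input when some paths have fewer than two segments.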

R regex using stringr::str_detect and grepl don't seem to be matching "\\+" when it is surrounded by "\\b" [duplicate]

This question already has answers here:
Why does is this end of line (\\b) not recognised as word boundary in stringr/ICU and Perl
(2 answers)
Closed 3 years ago.
I'm pretty new to regex and am trying to detect a word with the "+" symbol when surrounded by "\\b" in long strings of words but both stringr and grepl are giving me the wrong result.
This is the code that I have written:
library(stringr)
str_detect("coversyl +", "\\bcoversyl(plus| plus|\\+| \\+)\\b")
The output is FALSE which is wrong.
What would be the right way to do it?
My guess is that your expression is nearly fine. The trailing \\b is what fails: + is not a word character, so there is no word boundary between + and the end of the string. Dropping that final \\b and matching the space explicitly gives:
\\bcoversyl\\b\\s(\\bplus\\b|\\+)
Please see the demo for additional explanation.
If there may be more than one space, change \\s to \\s+:
\\bcoversyl\\b\\s+(\\bplus\\b|\\+)
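The difference between the two patterns can be checked with a short sketch. It uses base R's grepl with perl = TRUE so that the word-boundary semantics are those of PCRE; the variable names are illustrative only.

```r
x <- "coversyl +"

# Original pattern: the final \\b needs a word character on one side,
# but "+" is followed by the end of the string, so it cannot match.
with_boundary <- grepl("\\bcoversyl(plus| plus|\\+| \\+)\\b", x, perl = TRUE)

# Pattern from the answer: no boundary is demanded after "+".
without_boundary <- grepl("\\bcoversyl\\b\\s+(\\bplus\\b|\\+)", x, perl = TRUE)

# The same pattern still accepts the spelled-out form.
spelled_out <- grepl("\\bcoversyl\\b\\s+(\\bplus\\b|\\+)", "coversyl plus", perl = TRUE)

print(c(with_boundary = with_boundary,
        without_boundary = without_boundary,
        spelled_out = spelled_out))
```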

How to automatically handle strings/paths with backslashes? [duplicate]

This question already has answers here:
How to escape backslashes in R string
(3 answers)
Efficiently convert backslash to forward slash in R
(11 answers)
Closed 3 years ago.
I often want to read in csv files and I get the path by using shift + right click and then clicking "copy path".
I paste this path into my code. See an example below:
read_csv("C:\Users\me\data\file.csv")
Obviously this doesn't work because of the backslashes. My current solution is to escape each one, so that my code looks like this:
read_csv("C:\\Users\\me\\data\\file.csv")
It works, but it's annoying and occasionally I'll get errors because I missed one of the backslashes.
I wanted to create a function that automatically adds the extra slashes
fix_path <- function(string) str_replace(string, "\\\\", "\\\\\\\\")
but R won't recognize the string in the first place until the backslashes are taken care of.
Is there another way to deal with this? Python has the option of adding an "r" before strings to note that the backslashes should be treated just as regular backslashes, is there anything similar in R? To be clear, I know that I can escape the backslashes, but I am looking for a way to do it automatically.
You can use this hack. Suppose you had copied your path as mentioned then you could use
scan("clipboard", "character", quiet = TRUE)
scan reads the text copied to the clipboard and takes care of the backslashes. Then copy what scan returns and paste it into your code.
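As for an equivalent of Python's r prefix: R 4.0.0 and later support raw strings, written r"(...)", in which backslashes are kept as typed. A minimal sketch, assuming R >= 4.0.0 and reusing the illustrative path from the question:

```r
# Raw string (R >= 4.0.0): backslashes are kept as typed.
path <- r"(C:\Users\me\data\file.csv)"

# Same value as the manually escaped version from the question.
same <- identical(path, "C:\\Users\\me\\data\\file.csv")

# Optional: convert to forward slashes, which also work for Windows paths in R.
forward <- gsub("\\", "/", path, fixed = TRUE)

print(same)
print(forward)
```

The resulting path can then be passed to read_csv directly, without editing the pasted text.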

R - regex: W metacharacter not working when within square brackets [duplicate]

This question already has answers here:
regular expressions in base R: 'perl=TRUE' vs. the default (PCRE vs. TRE)
(3 answers)
Closed 3 years ago.
Let's take the following string:
x <- " hello world"
I would like to extract the first word. To do so, I am using the following regex ^\\W*([a-zA-Z]+).* with a back-reference to the first group.
> gsub("^\\W*([a-zA-Z]+).*", "\\1", x)
[1] "hello"
It works as expected.
Now, let's add a digit and underscore to our string:
x <- " 0_hello world"
I replace \\W by [\\W_0-9] to match the new characters.
> gsub("^[\\W_0-9]*([a-zA-Z]+).*", "\\1", x)
[1] " 0_hello world"
Now, it doesn't work and I do not understand why. It seems that the problem arises when putting \\W within [] but I am not sure why.
The regex works on online regex tester using PCRE though.
What am I doing wrong?
The quick solution is to use Perl-compatible regular expressions by adding the argument perl = TRUE.
By default, grep and gsub use extended regular expressions (see ?regex), where character classes are written in the form [:xxx:]. However, I could not find a character class that matches \W exactly.
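Both routes can be sketched as follows. The first call is the asker's pattern with perl = TRUE added. The second is an assumed alternative for the default engine: since the goal is to skip everything before the first letter, a negated POSIX class is used in place of \W inside brackets.

```r
x <- " 0_hello world"

# PCRE understands \W inside a bracket expression.
with_perl <- gsub("^[\\W_0-9]*([a-zA-Z]+).*", "\\1", x, perl = TRUE)

# Default engine: skip every leading character that is not a letter.
with_posix <- gsub("^[^[:alpha:]]*([a-zA-Z]+).*", "\\1", x)

print(with_perl)
print(with_posix)
```

The ?regex help page also describes \W as the negation of [[:alnum:]_], so [^[:alnum:]_] may serve where an exact stand-in for \W is needed.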

How to put \' in my string using paste0 function [duplicate]

This question already has answers here:
How to escape backslashes in R string
(3 answers)
Closed 5 years ago.
I have an array:
t <- c("IMCR01","IMFA02","IMFA03")
I want to make it look like this:
"\'IMCR01\'","\'IMFA02\'","\'IMFA03\'"
I tried different ways like:
paste0("\'",t,"\'")
paste0("\\'",t,"\\'")
paste0("\\\\'",t,"\\\\'")
But none of them is correct. Any other functions are OK as well.
Actually your second attempt is correct:
t <- paste0("\\'", t, "\\'")
To put a literal backslash in a string passed to paste0, you need to escape it once (but not twice, as you would within a regex pattern). Printing t then shows the following in the R console:
[1] "\\'IMCR01\\'" "\\'IMFA02\\'" "\\'IMFA03\\'"
The trick here is that R also escapes the backslash in its console output. If you were instead to write t to a text file, you would see only a single backslash, as you wanted:
write(t, file = "/path/to/your/file.txt")
But why does R escape the backslash when writing to its own console? One possibility is that if it printed a literal \n, the console would interpret it as a newline. Hence the need for escaping is still there.
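The difference between the printed representation and the actual characters can be checked without writing a file. A minimal sketch using base R only; the names ids and quoted are illustrative:

```r
ids <- c("IMCR01", "IMFA02", "IMFA03")
quoted <- paste0("\\'", ids, "\\'")

# print() shows the escaped representation, with doubled backslashes.
print(quoted)

# writeLines() emits the actual characters, one element per line.
writeLines(quoted)

# Each element holds 6 letters/digits plus a backslash and a quote on each side.
lengths_ok <- all(nchar(quoted) == 10)
print(lengths_ok)
```

The character counts confirm that each element contains only one backslash on each side, even though print displays two.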
