Using the gsub function in R to remove a backslash

Suppose I have a string that contains the following characters:
"\"------------080209060700030309080805\""
I want to use the gsub function in R to remove the "\ and \" parts and keep only the following characters:
"------------080209060700030309080805\"
Could anyone help me figure out how to do this properly?

Edit 1: Fixed bug (two backslashes required to create a backslash in a string):
s <- '\\"------------080209060700030309080805\\"'
s
gsub('\\"', "", s, fixed = TRUE)
results in
> s <- '\\"------------080209060700030309080805\\"'
> s
[1] "\\\"------------080209060700030309080805\\\""
> gsub('\\"', "", s, fixed = TRUE)
[1] "------------080209060700030309080805"
Please note that a single backslash in an R string literal starts an escape sequence and is NOT itself part of the string:
> charToRaw('\\"')
[1] 5c 22
> charToRaw('\"')
[1] 22
Therefore you have to use two backslashes in the quoted string to create one backslash internally. If you print this string, the backslash is escaped again, which looks confusing:
> print('\\"')
[1] "\\\""
If you want to print the unescaped content of the string use cat instead of print:
> cat('\\"')
\"
For more, see the R help page ?"'":
Character constants
Single and double quotes delimit character constants. They can be used
interchangeably but double quotes are preferred (and character
constants are printed using double quotes), so single quotes are
normally only used to delimit character constants containing double
quotes.
Backslash is used to start an escape sequence inside character
constants. Escaping a character not in the following table is an
error.
Single quotes need to be escaped by backslash in single-quoted
strings, and double quotes in double-quoted strings.
\n          newline
\r          carriage return
\t          tab
\b          backspace
\a          alert (bell)
\f          form feed
\v          vertical tab
\\          backslash \
\'          ASCII apostrophe '
\"          ASCII quotation mark "
\`          ASCII grave accent (backtick) `
\nnn        character with given octal code (1, 2 or 3 digits)
\xnn        character with given hex code (1 or 2 hex digits)
\unnnn      Unicode character with given code (1--4 hex digits)
\Unnnnnnnn  Unicode character with given code (1--8 hex digits)
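To check how many characters an escaped string really holds, you can count them instead of printing them. This is a minimal sketch; the variable names are illustrative only and not part of the answer above:

```r
# One backslash followed by a double quote: two characters in memory.
with_backslash <- '\\"'
# An escaped double quote only: one character in memory.
quote_only <- '\"'

n_with <- nchar(with_backslash)
n_quote <- nchar(quote_only)

# Escape sequences from the table also produce a single character each.
n_newline <- nchar("\n")
hex_letter <- "\x41"   # hex code 41 is the letter A

# writeLines(), like cat(), shows the unescaped content.
writeLines(with_backslash)
```

Counting with nchar() sidesteps the confusion caused by print(), which re-escapes backslashes and quotes on output.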

string <- "\\------------080209060700030309080805\\"
string <- gsub("^\\\\(.*)\\\\$", "\\1", string)
Notes: The regex I used is ^\\(.*)\\$, which matches everything between a leading and a trailing backslash, so it only matches strings that both begin and end with a backslash. In the call to gsub(), each literal backslash is written with four backslashes (\\\\) because it has to be escaped twice: once for the R string literal, and a second time for the regex engine.
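As a rough comparison, the anchored regex can be set next to a fixed-string replacement. This is only a sketch: the fixed-string variant is an alternative added here for illustration, and it removes every backslash rather than just the outer two.

```r
string <- "\\------------080209060700030309080805\\"

# Anchored regex: strips one leading and one trailing backslash.
# sub() is enough here because an anchored pattern matches at most once.
anchored <- sub("^\\\\(.*)\\\\$", "\\1", string)

# Fixed-string alternative: removes every backslash, wherever it occurs.
all_removed <- gsub("\\", "", string, fixed = TRUE)

# Both agree for this input, since its only backslashes are at the ends.
same_result <- identical(anchored, all_removed)
writeLines(anchored)
```

With fixed = TRUE the pattern is not a regex, so a single escaped backslash ("\\") is enough.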

Related

Applying a regular expression to a string in R

I'm just getting to know R; I previously worked with Python. The challenge is to replace the last character of each word in a string with *.
How it should look: for the input example text in string, the result should be exampl* tex* i* strin*
My code:
library(tidyverse)
library(stringr)
string_example = readline("Enter our text:")
string_example = unlist(strsplit(string_example, ' '))
string_example
result = str_replace(string_example, pattern = "*\b", replacement = "*")
result
I get an error:
> result = str_replace(string_example, pattern = "*\b", replacement = "*")
Error in stri_replace_first_regex(string, pattern, fix_replacement(replacement), :
Syntax error in regex pattern. (U_REGEX_RULE_SYNTAX, context=``)
Please help me solve this task.
Oh, I noticed an error: the pattern should be .\b. With that change the code runs, but nothing is replaced in the string.
If you mean words consisting of letters only, you can use
string_example <- "example text in string"
library(stringr)
str_replace_all(string_example, "\\p{L}\\b", "*")
## => [1] "exampl* tex* i* strin*"
Details:
\p{L} - a Unicode category (property) class matching any Unicode letter
\b - a word boundary; in this case, it makes sure there is no other word character immediately to the right. It fails the match if the letter matched with \p{L} is immediately followed by a letter, digit or _ (these are all word chars). If you want to limit this to a letter check, replace \b with (?!\p{L}).
Note the backslashes are doubled because in regular string literals backslashes are used to form string escape sequences, and thus need escaping themselves to introduce literal backslashes in string literals.
Some more things to consider
If you do not want to change one-letter words, add a non-word boundary at the start: "\\B\\p{L}\\b".
If you want to avoid matching letters that are followed by - and another letter (e.g. in some compound words), you can add a lookahead check: "\\p{L}\\b(?!-)".
You may combine the lookarounds and (non-)word boundaries as you need.
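The three patterns discussed above can be tried side by side. The sketch below swaps stringr for base R's gsub() with perl = TRUE (an assumption made only to avoid the package dependency), and mask_last_letter is a hypothetical helper name:

```r
# Hypothetical helper: replace the last letter of each word with "*".
# perl = TRUE enables PCRE, which supports \p{L} and lookaheads.
mask_last_letter <- function(x, pattern = "\\p{L}\\b") {
  gsub(pattern, "*", x, perl = TRUE)
}

# Default pattern: every word loses its last letter.
basic <- mask_last_letter("example text in string")

# \B at the start leaves one-letter words untouched.
keep_single <- mask_last_letter("a big cat", "\\B\\p{L}\\b")

# (?!-) skips a letter that is directly followed by a hyphen.
skip_hyphen <- mask_last_letter("well-known fact", "\\p{L}\\b(?!-)")

print(c(basic, keep_single, skip_hyphen))
```

Passing the pattern as an argument keeps the variants easy to compare on the same input.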

How to replace "&" with "\&"

I find it difficult to replace "&" with "\&" using R's base gsub() function -
gsub("&", "\&", "A&B")
This gives the following error:
Error: '\&' is an unrecognized escape in character string starting ""\&"
Is there any way to achieve this substitution?
You may use
gsub("&", "\\&", "A&B",fixed=TRUE) # Fixed string replacement
gsub("(&)", "\\\\\\1", "A&B") # Regex replacement
The fixed string replacement is straightforward: every & is replaced with \&. The double \ is used in the string literal to denote a literal \.
In the regex replacement, the & is matched and captured into Group 1. Since a backslash is a special character in the regex replacement pattern, it must be doubled, and, because a literal backslash is written as \\ inside a string literal, we need \\\\ in the replacement. The \1 is the backreference to the Group 1 value, and its \ must also be doubled in the string literal, hence \\1. That is why there are 6 backslashes in a row.
The result contains only a single backslash; you can check this with cat or by saving the contents to a text file:
cat(gsub("&", "\\&", "A&B", fixed=TRUE), "\n") # cat() has no collapse argument; pass the newline as text
cat(gsub("(&)", "\\\\\\1", "A&B"))
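One way to confirm that both calls produce the same four-character string is to compare them directly. This is a small sketch; the variable names are illustrative:

```r
# Fixed-string replacement: "\\&" is the two characters \ and &.
fixed_res <- gsub("&", "\\&", "A&B", fixed = TRUE)

# Regex replacement: \\\\ yields one literal backslash, \\1 is Group 1.
regex_res <- gsub("(&)", "\\\\\\1", "A&B")

same_result <- identical(fixed_res, regex_res)
n_chars <- nchar(fixed_res)   # A, \, &, B

# writeLines() shows the unescaped content, one backslash only.
writeLines(fixed_res)
```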

Split a character string in R on a single backslash [duplicate]

I am trying to extract the part of the string before the first backslash, but I can't seem to get it to work properly.
I have tried multiple ways of getting it to work, based on the manual page for strsplit and after searching online.
In my actual situation the strings are in a dataframe which I get from a database connection but I can simplify the situation with the following:
> strsplit("BLAAT1\022E:\\BLAAT2\\BLAAT3","\\",fixed=TRUE)
[[1]]
[1] "BLAAT1\022E:" "BLAAT2" "BLAAT3"
> strsplit("BLAAT1\022E:\\BLAAT2\\BLAAT3","\\",fixed=FALSE)
Error in strsplit("BLAAT1\022E:\\BLAAT2\\BLAAT3", "\\", fixed = FALSE) :
invalid regular expression '\', reason 'Trailing backslash'
> strsplit("BLAAT1\022E:\\BLAAT2\\BLAAT3","\\\\",fixed=TRUE)
[[1]]
[1] "BLAAT1\022E:\\BLAAT2\\BLAAT3"
> strsplit("BLAAT1\022E:\\BLAAT2\\BLAAT3","\\\\",fixed=FALSE)
[[1]]
[1] "BLAAT1\022E:" "BLAAT2" "BLAAT3"
The expected output would also split on the \ between BLAAT1 and 022E:
Thanks in advance
When you use a regex with strsplit, a literal backslash must be written as two backslashes in the pattern, because \ is a regex metacharacter used to form regex escapes such as \d and \w. On top of that, R string literals have their own escape sequences (such as "\r" for a carriage return and "\n" for a newline), so each backslash in the pattern must itself be doubled in the string literal.
So, "\\" is a literal \, and a regex pattern to match a literal backslash char, being \\, should be coded with 4 backslashes, "\\\\".
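Note that "\022" in your string literal is an octal escape for a single non-printable character, not a backslash followed by digits. Here is a regex that you can use: it splits at either a backslash or a non-printable character: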
strsplit("BLAAT1\022E:\\BLAAT2\\BLAAT3","\\\\|[^[:print:]]",fixed=FALSE)
# [1] "BLAAT1" "E:" "BLAAT2" "BLAAT3"
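The sketch below illustrates the point about "\022" and then addresses the original goal of taking the part before the first backslash. The second string, y, is a hypothetical value assumed here to show what the data would look like if it really contained a backslash before 022:

```r
x <- "BLAAT1\022E:\\BLAAT2\\BLAAT3"

# "\022" is an octal escape: one control character with code 18.
ctrl_code <- utf8ToInt("\022")

# Split on a literal backslash or any non-printable character.
parts <- strsplit(x, "\\\\|[^[:print:]]")[[1]]

# Hypothetical case: the value holds a real backslash before 022.
y <- "BLAAT1\\022E:\\BLAAT2\\BLAAT3"

# Drop everything from the first backslash onward.
before_first <- sub("\\\\.*$", "", y)
print(parts)
print(before_first)
```

Values read from a database are not parsed as R string literals, so a backslash stored there stays a backslash, which is the situation y imitates.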

Replace double quotes to string empty

I want to remove the double quotes around "Portugal" and replace them with an empty string. How do I do this? I am using Replace("""," ") but it is not working.
Use a backslash to escape the quote instead:
Replace("\"", string.Empty);
The backslash ("\") character is a special escape character used to indicate other special characters such as quotation marks (\"). For other special escape characters, you can refer to this link:
https://msdn.microsoft.com/en-us/library/aa691087(v=vs.71).aspx

R character/string: '...' vs "..."

To declare a character string in R, one can use either of the following:
x <- 'Some string'
x <- "Some string"
Both work, but is there any difference ?
From ?"'":
Details
Three types of quotes are part of the syntax of R: single and double
quotation marks and the backtick (or back quote, `). In addition,
backslash is used to escape the following character inside character
constants.
Character constants
Single and double quotes delimit character constants. They can be used
interchangeably but double quotes are preferred (and character
constants are printed using double quotes), so single quotes are
normally only used to delimit character constants containing double
quotes.
Backslash is used to start an escape sequence inside character
constants. Escaping a character not in the following table is an
error.
Single quotes need to be escaped by backslash in single-quoted
strings, and double quotes in double-quoted strings.
No. These are identical.
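This is easy to verify with identical(). A minimal sketch, with illustrative variable names:

```r
single_quoted <- 'Some string'
double_quoted <- "Some string"

# The delimiter is not stored; only the characters are.
same_value <- identical(single_quoted, double_quoted)

# Single quotes save escaping when the text contains double quotes.
no_escape <- 'She said "hi"'
with_escape <- "She said \"hi\""
same_content <- identical(no_escape, with_escape)

print(c(same_value, same_content))
```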