I'd like to gain a better understanding of escape character sequences in R. I've tried searching for things like ?'\' (but that escapes itself) and ?'\\'.
I'd like to avoid this kind of behaviour with cat(). For example:
cat("\")
+
Versus:
cat("\\")
\
The help page you are looking for is ?Quotes (with a capital Q). String literal syntax is also described (less clearly, IMHO) at http://cran.r-project.org/doc/manuals/R-lang.html#Literal-constants.
The backslash escape works very nearly the same as it does in C and all the other languages that borrowed backslash escapes from C -- \n inserts a newline, \\ inserts a single backslash, \" in a double quoted string prevents the " from ending the string, etc.
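As a rough illustration of the C-style rules described above, here is a minimal sketch in Python rather than R (Python borrowed the same backslash escapes). The variable names are only for illustration.

```python
# Each escape sequence is typed as two characters in the source code
# but stored as a single character in the resulting string.
newline = "\n"      # newline
backslash = "\\"    # one literal backslash
quote = "\""        # a double quote inside a double-quoted string

print(len(newline), len(backslash), len(quote))  # 1 1 1
print(backslash)        # prints a single backslash
print(repr(backslash))  # repr shows the escaped, two-character source form
```

The difference between the last two lines mirrors the difference in R between cat("\\") and print("\\"): one shows the string's contents, the other shows how it would be typed.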
I have a document, and I need to find all the words (no spaces) bordered by ' (e.g. 'apple', 'hello'). What would be the regular expression?
I've tried ^''$ but it didn't work.
If there isn't a general solution, it doesn't have to match "any word"; it could instead match a word from a fixed list (e.g. apple, banana, lemon), but it must still have the (')s.
Thank you so much
Andrew
If you want to capture single-quoted strings, i.e. any run of characters other than single quotes that is enclosed in single quotes, use
/'[^']+'/
If you need single words, i.e. alphabetic characters but no spaces, try
/'[a-zA-Z]+'/
I'm assuming a couple of things here:
You're using a language that delimits regexes with slashes. This includes JavaScript and Perl to my knowledge, and probably a bunch of others. In some other languages, like C#, you should use double quotes to delimit, e.g. "'[a-zA-Z]+'"
You're using a flavor of regex that does not need to escape the plus sign.
You're trying to capture all such words within a long string. I.e., if the input string is "Here is a 'long' string with 'some' 'words' single-quoted" then you will capture three words: 'long','some', and 'words'.
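A minimal sketch of both patterns, assuming Python's re module (the question does not say which language is in use). Python has no slash delimiters, so the patterns are passed as strings; the input is the example sentence from the list above.

```python
import re

text = "Here is a 'long' string with 'some' 'words' single-quoted"

# Any run of characters other than single quotes, enclosed in single quotes.
quoted_runs = re.findall(r"'[^']+'", text)

# Only alphabetic words (no spaces) enclosed in single quotes.
quoted_words = re.findall(r"'[a-zA-Z]+'", text)

print(quoted_words)  # ["'long'", "'some'", "'words'"]
```

For this input both patterns return the same three matches; they differ only when the quoted text contains spaces, digits or punctuation.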
So, I am writing a small function to strip all the weird chars from a string, e.g. each of #$& will be replaced with a " "
The chars I am trying to remove are the following, defined in a string:
xChars = r"#$%()'^*\;:/|+_.–°ªº"
However, I keep getting the warning:
Anomalous backslash in string: '\;'. String constant might be missing an r prefix
However, when I use the r prefix, e.g. r"\", Python rules out some of the special chars I want to replace. It doesn't produce an error; it just seems to treat those chars as OK and skips them.
Any ideas on how to fix this?
Normally a backslash escapes the character that follows it, so it isn't clear whether \; was meant as an escape sequence or as a literal backslash followed by a semicolon. Try using a double backslash in a regular (non-raw) string to escape the backslash itself, like: xChars = "#$%()'^*\\;:/|+_.–°ªº"
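One way the whole function could look, as a sketch only: it assumes a regular (non-raw) string with a doubled backslash, and the names x_chars, table and strip_weird_chars are hypothetical, not taken from the question.

```python
# Characters to strip; in a regular string the backslash is written as \\.
x_chars = "#$%()'^*\\;:/|+_.–°ªº"

# Translation table mapping every listed character to a single space.
table = str.maketrans({ch: " " for ch in x_chars})

def strip_weird_chars(text):
    """Replace each character listed in x_chars with a space."""
    return text.translate(table)

print(strip_weird_chars("a#b\\c;d"))  # a b c d
```

Using str.translate avoids regular expressions entirely, so none of the characters need regex escaping on top of the string escaping.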
I am trying to use grep to search df$AllPrograms for at least one of the two quoted terms in the code below.
grep("Service & Product Provider (Partner;ACT)" | "Buildings (Prospect;INA)", df$AllPrograms)
This isn't working, and I suspect it is because grep is interpreting the &, ; and () as operators rather than as literal characters.
Use a double backslash ("\\") to escape these characters. The backslash is the escape character in extended regex, but inside an R string literal the backslash itself has to be escaped as well, hence two of them.
Also, in your example code you have incorrectly specified the OR statement. Try:
grep("Service \\& Product Provider \\(Partner\\;ACT\\)|Buildings \\(Prospect\\;INA\\)", df$AllPrograms)
If there are many other patterns that you'd like to check for, take a look at the related question "grep using a character vector with multiple patterns".
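The same idea, sketched in Python rather than R: escape each literal term, then join the terms with | to form the OR. The programs list is made-up sample data standing in for df$AllPrograms, and re.escape is Python's helper for backslash-escaping regex metacharacters.

```python
import re

# Hypothetical stand-in for df$AllPrograms.
programs = [
    "Service & Product Provider (Partner;ACT)",
    "Buildings (Prospect;INA)",
    "Something else (Other;XYZ)",
]

terms = [
    "Service & Product Provider (Partner;ACT)",
    "Buildings (Prospect;INA)",
]

# Escape each term so it matches literally, then OR the terms together.
pattern = re.compile("|".join(re.escape(term) for term in terms))

# Indices of the entries that contain at least one of the terms.
matches = [i for i, entry in enumerate(programs) if pattern.search(entry)]
print(matches)  # [0, 1]
```

Note that the OR lives inside the pattern string, as in the corrected grep call above, not between two separate strings.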
For example, in Unix, a backslash (\) is a common escape character. So to escape a full stop (.) in a regular expression, one does this:
\.
But with % encoding URL parameters, we have an escape character, %, and a control code, so an ampersand (&) doesn't become:
%&
Instead, it becomes:
%26
Any reason why? Seems to just make things more complicated, on the face of it, when we could just have one escape character and a mechanism to escape itself where necessary:
%%
Then it'd be:
simpler to remember; we just need to know which characters to escape, not which to escape and what to escape them to
encoding-agnostic, as we wouldn't be sending an ASCII or Unicode representation explicitly, we'd just be sending them in the encoding the rest of the URL is going in
easy to write an encoder: s/[!\*'();:#&=+$,/?#\[\] "%-\.<>\\^_`{|}~]/%&/g (untested!)
better because we could switch to using \ as an escape character, and life would be simpler and it'd be summer all year long
I might be getting carried away now. Someone shoot me down? :)
EDIT: replaced two uses of "delimiter" with "escape character".
Percent encoding happens not only to escape delimiters, but also so that you can transport bytes that are not allowed inside URIs (such as control characters or non-ASCII characters).
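A short sketch of that point using Python's standard urllib.parse module: the same %XX mechanism covers a reserved delimiter, a non-ASCII character (encoded as its UTF-8 bytes by default) and a control character, none of which could be carried by simply prefixing the raw character with an escape character.

```python
from urllib.parse import quote, unquote

# safe="" asks quote() to encode every reserved character.
ampersand = quote("&", safe="")   # reserved delimiter
e_acute = quote("é", safe="")     # non-ASCII, encoded as UTF-8 bytes
newline = quote("\n", safe="")    # control character

print(ampersand)  # %26
print(e_acute)    # %C3%A9
print(newline)    # %0A

# Decoding reverses the mapping.
print(unquote("%26"))  # &
```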
I guess it's because the URL specification, and specifically the HTTP part of it, only allows certain characters, so to escape the others one must replace them with characters that are allowed.
Also, some allowed characters have special meanings, like & and ?, so replacing them with a control code seems the only way to solve it.
If you find it hard to recognize them, bookmark this page
http://www.w3schools.com/tags/ref_urlencode.asp
Windows copies paths with backslashes (\), which R does not accept. So I wanted to write a function that converts \ to /. For example:
chartr0 <- function(foo) chartr('\','\\/',foo)
Then use chartr0 as...
source(chartr0('E:\RStuff\test.r'))
But chartr0 is not working. I guess I am unable to escape /, and escaping / may be important on many other occasions.
Also, is it possible to avoid calling chartr0 every time and instead convert all paths automatically, either by creating an environment in R that calls chartr0 or through some temporary setting such as options?
From R 4.0.0 you can use r"(...)" to write a path as raw string constant, which avoids the need for escaping:
r"(E:\RStuff\test.r)"
# [1] "E:\\RStuff\\test.r"
There is a new syntax for specifying raw character constants similar to the one used in C++: r"(...)" with ... any character sequence not containing the sequence )". This makes it easier to write strings that contain backslashes or both single and double quotes. For more details see ?Quotes.
Your fundamental problem is that R will signal an error as soon as it sees a single backslash before any character other than a few lower-case letters, backslashes themselves, quotes, or the conventions for entering octal, hex or Unicode sequences. That is because the interpreter reads the backslash as an instruction to "escape" the usual interpretation of the next character and do something else. If you want a single backslash in your character element, you need to type two backslashes. That will create one backslash:
nchar("\\")
#[1] 1
The "Character vectors" section of An Introduction to R says:
"Character strings are entered using either matching double (") or single (') quotes, but are printed using double quotes (or sometimes without quotes). They use C-style escape sequences, using \ as the escape character, so \\ is entered and printed as \\, and inside double quotes " is entered as \". Other useful escape sequences are \n, newline, \t, tab and \b, backspace -- see ?Quotes for a full list."
?Quotes
chartr0 <- function(foo) chartr('\\','/',foo)
chartr0('E:\\RStuff\\test.r')
You cannot write E:\Rxxxx, because R reads \R as an (invalid) escape sequence.
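For comparison, a minimal sketch of the same conversion in Python rather than R. It shows the two ways of getting literal backslashes into the source (a raw string, or doubled backslashes) and then swaps them for forward slashes. The helper name to_forward_slashes is hypothetical; the path is the example from the question.

```python
# Two equivalent ways to write the same Windows path in source code.
raw_path = r"E:\RStuff\test.r"        # raw string: backslashes stay literal
escaped_path = "E:\\RStuff\\test.r"   # regular string: each backslash doubled

def to_forward_slashes(path):
    """Return the path with every backslash replaced by a forward slash."""
    return path.replace("\\", "/")

print(raw_path == escaped_path)      # True
print(to_forward_slashes(raw_path))  # E:/RStuff/test.r
```

As in the chartr version above, only the backslash needs escaping in the search string; the forward slash is an ordinary character.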
The problem is that every single forward slash and backslash in your code is escaped incorrectly, resulting in either an invalid string or the wrong string being used. You need to read up on which characters need to be escaped and how. Take a look at the list of escape sequences in the link below. Anything not listed there (such as the forward slash) is treated literally and does not require any escaping.
http://cran.r-project.org/doc/manuals/R-lang.html#Literal-constants