Warning on regex string in Python - r

I am writing a small function to strip all the unusual characters from a string, e.g. each of #$& will be replaced with a " ".
The chars I am trying to remove are the following, defined into a string:
xChars = r"#$%()'^*\;:/|+_.–°ªº"
However, I keep getting the warning:
Anomalous backslash in string: '\;'. String constant might be missing an r prefix
However, when I use the r prefix, e.g. r"\", Python leaves out some of the special characters I want to replace. It doesn't raise an error; it just seems to treat those characters as acceptable and skips them.
Any ideas on how to fix this?

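Normally a backslash escapes the character that follows it, so the linter can't tell whether the backslash in '\;' was meant to start an escape sequence or to be a literal backslash. Try using a double backslash to escape the backslash itself, like: xChars = r"#$%()'^*\\;:/|+_.–°ªº". With the r prefix both backslashes stay in the string, which a regex character class then reads as one literal backslash.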
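If the string is meant to end up inside a regular expression, one way to sidestep the escaping question is to let re.escape() build the character class. A minimal sketch, assuming the goal is to replace each listed character with a single space; the names x_chars and strip_chars are hypothetical, not from the question:

```python
import re

# Characters to replace. The raw string keeps the backslash as a literal character.
x_chars = r"#$%()'^*\;:/|+_.–°ªº"

# re.escape() escapes every regex metacharacter, so the character class
# matches each listed character literally, including the backslash.
pattern = re.compile("[" + re.escape(x_chars) + "]")


def strip_chars(text):
    """Replace every character listed in x_chars with a single space."""
    return pattern.sub(" ", text)


print(strip_chars(r"a#b\c;d"))  # a b c d
```

Because the class is built from the escaped string, adding or removing characters in x_chars needs no further thought about which ones are special in a regex.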

Related

MariaDB removing backslashes from string in UPDATE query

I have an issue with a very basic query using MariaDB 10.3. I'm updating thousands of paths in a database with this code:
UPDATE il1_il8_localisation i SET i.`IL_17_CODE_PHOTO_1`="..\IPB\Photos\Foto1_261_ 3837.jpg" WHERE i.`IQ_1_NUMERO_DU_QUESTIONNAIRE`= 261;
and it fills the column IL_17_CODE_PHOTO_1 with the string
..IPBPhotosFoto1_ 261_ 3837.jpg instead of ..\IPB\Photos\Foto1_ 261_ 3837.jpg
I tried changing the data type from varchar(120) to TEXT, with no result.
MariaDB uses the backslash character (\) as an escape character. From the linked article:
Backslash (\), if not used as an escape character, must always be escaped. When followed by a character that is not [a valid escape sequence], backslashes will simply be ignored.
Replace each single backslash with a double backslash (to escape the escape character):
UPDATE il1_il8_localisation i SET i.IL_17_CODE_PHOTO_1="..\\IPB\\Photos\\Foto1_ 261_ 3837.jpg" WHERE i.IQ_1_NUMERO_DU_QUESTIONNAIRE= 261;

substitute single backslash in R

I have read some questions and answers on this topic on Stack Overflow but still don't know how to solve this problem:
My purpose is to transform file directory strings from Windows Explorer into a form that R recognizes, e.g. C:\Users\Public needs to be transformed to C:/Users/Public; basically, each single backslash should be replaced with a forward slash. However, R can't store the original string "C:\Users\Public" because \U and \P are treated as escape sequences.
dirTransformer <- function(str){
str.trns <- gsub("\\", "/", str)
return(str.trns)
}
str <- "C:\Users\Public"
dirTransformer(str)
> Error: '\U' used without hex digits in character string starting ""C:\U"
What I am actually writing is a GUI in which the user types or pastes the directory into an entry field and pushes a button, and the program then processes it automatically.
Would someone please suggest to me how to solve this problem?
When you need a backslash in a string in R, you have to write a double backslash. Also, when you use gsub("\\", "/", str), the first argument is parsed as a regex, and it is not valid because it contains only a single literal backslash, which must escape something. You need to make gsub treat it as plain text with fixed=TRUE.
However, you might want to use normalizePath, see this SO thread.
dirTransformer <- function(str){
str.trns <- gsub("\\\\", "/", str)
return(str.trns)
}
str <- readline()
C:\Users\Public
dirTransformer(str)
I'm not sure how you intend the user to input the path into the GUI, but when using readline() and then typing C:\Users\Public unquoted, R reads that in as:
> str
[1] "C:\\Users\\Public"
We then want to replace "\\" with "/", but to escape the "\\" we need "\\\\" in the gsub.
I can't be sure how the user's input is going to be read into R in your GUI, but R will most likely escape the \s in the string as it does in the readline example. The string you're trying to create, "C:\Users\Public", wouldn't normally occur.
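The same distinction between what is typed and how it is displayed shows up in other languages. A small sketch in Python rather than R, purely as an illustration; the variable typed is a hypothetical stand-in for text a user pasted into an entry field:

```python
# Simulate text typed into a GUI field: the characters are stored as-is.
# A raw string literal is used here only so that the source code contains
# the same characters the user would type.
typed = r"C:\Users\Public"

# The string holds single backslashes; repr() merely displays them doubled.
print(len(typed))   # 15
print(repr(typed))  # 'C:\\Users\\Public'

# "\\" in source code is a one-character string (a single backslash),
# so a plain, non-regex replace is enough.
converted = typed.replace("\\", "/")
print(converted)    # C:/Users/Public
```

Escape processing applies to literals in source code, not to text that arrives at run time, which is why the doubled backslashes appear only when the value is printed in its quoted form.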

Searching a backslash in a string received from external source

I have a string I received from my DB, so in R it looks like:
a <- c("www", "x", "yes", "\303\243")
> a
[1] "www" "x" "yes" "ã"
What I want to do is to find which of the elements has backslash in it.
I tried:
grepl('\\',a[4])
But I keep getting the error
invalid regular expression '\', reason 'Trailing backslash'
no matter whether I use cat or fixed=T.
How do I find that backslash in the list?
You need to escape the backslash twice: once for the string literal in R and once for the regular expression. grepl("\\", a[4]) applies the regexp \, while grepl("\\\\", a[4]) applies the regexp \\. To see what a string literal actually contains, you can use cat("\\").
But I think your string does not contain any backslash at all, because in the definition the backslash occurs as part of an escape sequence, not as a character itself.
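That an escape sequence in a literal leaves no backslash in the value can be checked in Python, whose octal escapes behave similarly. A hedged sketch, in which a Python bytes literal stands in for the R string; this is an analogy, not the asker's code:

```python
# "\303\243" written as octal escapes yields two bytes, not eight characters.
raw = b"\303\243"
print(len(raw))             # 2
print(raw.decode("utf-8"))  # ã

# No backslash byte is present in the value.
print(b"\\" in raw)         # False

# A backslash only appears if it is escaped in the literal itself.
with_backslash = b"\\303\\243"
print(b"\\" in with_backslash)  # True
print(len(with_backslash))      # 8
```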

Escaping backslash (\) in string or paths in R

Windows copies paths with backslashes (\), which R does not accept. So I wanted to write a function that converts \ to /. For example:
chartr0 <- function(foo) chartr('\','\\/',foo)
Then use chartr0 as...
source(chartr0('E:\RStuff\test.r'))
But chartr0 is not working. I guess I am unable to escape /. I guess escaping / may be important on many other occasions.
Also, is it possible to avoid calling chartr0 every time and instead convert all paths automatically, by creating an environment in R that calls chartr0, or through some temporary setting such as options?
From R 4.0.0 you can use r"(...)" to write a path as raw string constant, which avoids the need for escaping:
r"(E:\RStuff\test.r)"
# [1] "E:\\RStuff\\test.r"
There is a new syntax for specifying raw character constants similar to the one used in C++: r"(...)" with ... any character sequence not containing the sequence )". This makes it easier to write strings that contain backslashes or both single and double quotes. For more details see ?Quotes.
Your fundamental problem is that R will signal an error condition as soon as it sees a single back-slash before any character other than a few lower-case letters, backslashes themselves, quotes or some conventions for entering octal, hex or Unicode sequences. That is because the interpreter sees the back-slash as a message to "escape" the usual translation of characters and do something else. If you want a single back-slash in your character element you need to type 2 backslashes. That will create one backslash:
nchar("\\")
#[1] 1
The "Character vectors" section of An Introduction to R says:
"Character strings are entered using either matching double (") or single (') quotes, but are printed using double quotes (or sometimes without quotes). They use C-style escape sequences, using \ as the escape character, so \\ is entered and printed as \\, and inside double quotes " is entered as \". Other useful escape sequences are \n, newline, \t, tab and \b, backspace—see ?Quotes for a full list."
?Quotes
chartr0 <- function(foo) chartr('\\','/',foo)
chartr0('E:\\RStuff\\test.r')
You cannot write E:\Rxxxx, because R tries to interpret \R as an escape sequence.
The problem is that every single forward slash and backslash in your code is escaped incorrectly, resulting in either an invalid string or the wrong string being used. You need to read up on which characters need to be escaped and how. Take a look at the list of escape sequences in the link below. Anything not listed there (such as the forward slash) is treated literally and does not require any escaping.
http://cran.r-project.org/doc/manuals/R-lang.html#Literal-constants

Where can I find documentation on escape characters like "\"

I'd like to gain a better understanding of escape character sequences in R. I've tried searching for things like ?'\' (but that escapes the closing quote) and ?'\\'.
I'd like to avoid this kind of behaviour with cat(). For example:
cat("\")
+
Versus:
cat("\\")
\
The help page you are looking for is ?Quotes (with the capital Q). String literal syntax is also described (less clearly IMHO) at http://cran.r-project.org/doc/manuals/R-lang.html#Literal-constants.
The backslash escape works very nearly the same as it does in C and all the other languages that borrowed backslash escapes from C -- \n inserts a newline, \\ inserts a single backslash, \" in a double quoted string prevents the " from ending the string, etc.
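Because the convention is shared, the behaviour can be illustrated outside R as well. A short Python sketch, chosen only as an example of another language with C-style escapes; Python's raw strings play a role comparable to R's r"(...)":

```python
# Each escape sequence in a normal literal produces a single character.
newline = "\n"
backslash = "\\"
quote = "\""

print(len(newline), len(backslash), len(quote))  # 1 1 1

# In a raw string the backslash is kept literally, so r"\n" is two characters.
raw = r"\n"
print(len(raw))      # 2
print(raw == "\\n")  # True

# print() shows the actual characters; repr() shows the escaped source form.
print(backslash)        # \
print(repr(backslash))  # '\\'
```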
