fread escapes quotes when not necessary - r

I'm reading a csv file with quoted fields using the fread function. In some of the fields escaped quotes (\") appear. I don't understand why the fread function escapes these quotes that are already escaped.
I reproduce the behavior with a simple example. I created a file with a single line and a single field:
"Hello \"World\" "
If I run the following R command:
table <- fread(input = "/tmp/quoteprova.csv", header=FALSE, sep = "\t")
the table variable will look like this:
V1
1: Hello \\"World\\"
I would expect instead this result:
V1
1: Hello \"World\"
Am I missing an option that would give the expected behavior?

You are getting what you want. \\" is how R displays two characters: a literal backslash \ followed by a ". Because \ is the escape character in R strings, a backslash followed by another character is read as an escape sequence, so a literal backslash is itself displayed in escaped form, as \\. Therefore the additional \ (the first one) tells you that the second \ is not escaping the " and should be taken as is.
See this example:
> nchar('\\"')
[1] 2
> nchar('\"')
[1] 1
See also this R FAQ.
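To make the difference between the stored string and its printed form concrete, here is a minimal base R sketch. It does not call fread; it assumes the field was read as a literal backslash before each quote, and the variable names are only illustrative.

```r
# Assumed content of the field after reading: a literal backslash before each quote.
# Typed as an R literal, every backslash and every quote is escaped once.
s <- "Hello \\\"World\\\""

n_chars <- nchar(s)         # number of characters actually stored
shown <- encodeString(s)    # escaped form, as print() displays it

print(s, quote = FALSE)     # display form: each backslash is shown doubled
writeLines(s)               # raw content: one backslash before each quote
```

print() shows an escaped representation, while writeLines() and cat() show the characters themselves, so the extra backslash exists only in the display.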

Related

r - Remove single backslash

I have a string with backslashes and I want to remove them.
test = "m \"#\""
I have tried the following, but none of them works:
gsub( "\\\\", "", test )
gsub( "\\\\", "", test, fixed = T )
gsub( "\\", "", test )
gsub( "\\", "", test, fixed = T )
I have looked into similar questions, but none of the solutions work:
Replace single backslash in R
Remove Single Backslash String R
Edit: Actually, this text is going to be passed to the system() function to run a mosquitto client. The user will give various parameters as input and the command will be created on the fly.
The full command looks like this: mosquitto_sub -h test.mosquitto.org -q 0 -k 60 -t \"#\"
However, it is expected to look like this: mosquitto_sub -h test.mosquitto.org -q 0 -k 60 -t "#"
Otherwise system() does not accept it. Hence the requirement to remove the backslashes.
The parameters 0, 60 and # are supplied by the user, so I use paste0() to build this string. After the string is created, the backslashes show up.
The string given in test here is just a short, reproducible example.
I think JvdV is right: the literal "m "#"" cannot be written in R source without the backslashes. The backslash makes " an ordinary character rather than the closing quotation mark, which in turn keeps the # inside the string instead of letting it start a comment.
If you had """" you'd get an error, as you have a set of quotation marks within another set, which is not possible. The same might occur for "m "#"". However, if you input "m "#"", the hash acts as a comment symbol, making everything after it a comment, and you get "m ". You need the backslashes so that " is a character rather than a quotation mark and # stays part of the string rather than starting a comment.
There is no backslash in your string:
grepl("\\", test, fixed = TRUE)
# FALSE
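The point above can be checked directly. The following is a small sketch in base R; the names qos, keepalive and topic are hypothetical stand-ins for the user-supplied parameters, and the command is only built and displayed, not passed to system().

```r
test <- "m \"#\""

n_chars <- nchar(test)                            # m, space, ", #, "
has_backslash <- grepl("\\", test, fixed = TRUE)  # FALSE: nothing to remove

print(test)        # escaped display form, with backslashes
writeLines(test)   # actual content, without backslashes

# Hypothetical user-supplied parameters
qos <- 0
keepalive <- 60
topic <- "#"

# Build the command with paste0(); \" in the source is a plain " in the string
cmd <- paste0("mosquitto_sub -h test.mosquitto.org -q ", qos,
              " -k ", keepalive, " -t \"", topic, "\"")
writeLines(cmd)    # shows the command as system() would receive it
```

Inspecting the string with writeLines() or cat() rather than print() shows that the backslashes belong to the printed representation only.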

gsub remove backslash and numbers from string [duplicate]

I'm writing strings which contain backslashes (\) to a file:
x1 = "\\str"
x2 = "\\\str"
# Error: '\s' is an unrecognized escape in character string starting "\\\s"
x2="\\\\str"
write(file = 'test', c(x1, x2))
When I open the file named test, I see this:
\str
\\str
If I want to get a string containing 5 backslashes, should I write 10 backslashes, like this?
x = "\\\\\\\\\\str"
[...] If I want to get a string containing 5 \, should I write 10 \ [...]
Yes, you should. To write a single \ in a string, you write it as "\\".
This is because the \ is a special character, reserved to escape the character that follows it. (Perhaps you recognize \n as newline.) It's also useful if you want to write a string containing a single ". You write it as "\"".
\\\str is invalid because it is interpreted as \\ (which corresponds to a single \) followed by \s, which is not valid, since "escaped s" has no meaning.
Have a read of this section about character vectors.
In essence, it says that when you enter character string literals you enclose them in a pair of quotes (" or '). Inside those quotes, you can create special characters using \ as an escape character.
For example, \n denotes a new line, and \" can be used to enter a " without R thinking it's the end of the string. Since \ is an escape character, you need a way to enter an actual \. This is done by using \\. Escaping the escape!
Note that the doubling of backslashes is because you are entering the string at the command line and the string is first parsed by the R parser. You can enter strings in different ways, some of which don't need the doubling. For example:
> tmp <- scan(what='')
1: \\\\\str
2:
Read 1 item
> print(tmp)
[1] "\\\\\\\\\\str"
> cat(tmp, '\n')
\\\\\str
>
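As a quick check of the counting rule, here is a minimal sketch in base R. It writes to a temporary file rather than to test, and the raw string literal on the last line assumes R 4.0.0 or later; on older versions that line would need to be dropped.

```r
x <- "\\\\\\\\\\str"     # ten backslashes in the source

n_chars <- nchar(x)      # five backslashes plus "str"

# Same string built without typing ten backslashes
y <- paste0(strrep("\\", 5), "str")

# Round trip through a file: the file holds five backslashes
f <- tempfile()
writeLines(x, f)
back <- readLines(f)

# Raw string literal (assumes R >= 4.0.0): no doubling needed
z <- r"(\\\\\str)"
same <- identical(x, y) && identical(x, z) && identical(x, back)
```

The doubling is a property of ordinary string literals in source code only; the string in memory and in the file has one backslash per escaped pair.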

How to write "\" character using cat() in R?

I am trying to use the cat() function in R to write data to a file. I would like to write a "\" character to the output, but it seems that the cat() function interprets this as a formatting command. Any ideas on how I can write this in the middle of formatting commands (e.g. "\t\t\t \ \n")?
In R, \ is the escape character in strings, so you need to write \\ to get a single backslash in the output of cat(): the first backslash escapes the second. This can easily be verified by calling cat("\\").
Here are a few examples:
> cat("a\nb\tc") ## standard output
a
b c
> cat("a\\nb\\tc") ## prints the control characters in the string
a\nb\tc
> cat("a\\nb\\t\\c") ## prints the control characters in the string, and one backslash before "c"
a\nb\t\c
> cat("a\tb\tc\t\\\nd") ## read as "a<tab>b<tab>c<tab>\<newline>d"
a b c \
d
Also, I've found this wikibooks link to be quite useful for learning about text processing with R.
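For the file-writing case in the question, a minimal sketch might look like the following. It uses the question's own format string with the backslash doubled, and a temporary file as an assumed output target.

```r
out <- tempfile(fileext = ".txt")   # assumed output location

# Three tabs, a space, one literal backslash, a space, then a newline
cat("\t\t\t \\ \n", file = out)

line <- readLines(out)
n_chars <- nchar(line)              # 3 tabs + space + backslash + space
fifth <- substr(line, 5, 5)         # the single backslash that was written
```

Reading the line back confirms that only one backslash reached the file, even though two were typed in the source.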

Resources