How to pass a chr variable into r"(...)"? - r

I've seen that since 4.0.0, R supports raw strings using the syntax r"(...)". Thus, I could do:
r"(C:\THIS\IS\MY\PATH\TO\FILE.CSV)"
#> [1] "C:\\THIS\\IS\\MY\\PATH\\TO\\FILE.CSV"
While this is great, I can't figure out how to make this work with a variable, or better yet with a function. See this comment which I believe is asking the same question.
This one can't even be evaluated:
construct_path <- function(my_path) {
r"my_path"
}
Error: malformed raw string literal at line 2
}
Error: unexpected '}' in "}"
Nor this attempt:
construct_path_2 <- function(my_path) {
paste0(r, my_path)
}
construct_path_2("(C:\THIS\IS\MY\PATH\TO\FILE.CSV)")
Error: '\T' is an unrecognized escape in character string starting ""(C:\T"
Desired output
# pseudo-code
my_path <- "C:\THIS\IS\MY\PATH\TO\FILE.CSV"
construct_path(path)
#> [1] "C:\\THIS\\IS\\MY\\PATH\\TO\\FILE.CSV"
EDIT
In light of #KU99's comment, I want to add the context to the problem. I'm writing an R script to be run from command-line using WIndows's CMD and Rscript. I want to let the user who executes my R script to provide an argument where they want the script's output to be written to. And since Windows's CMD accepts paths in the format of C:\THIS\IS\MY\PATH\TO, then I want to be consistent with that format as the input to my R script. So ultimately I want to take that path input and convert it to a path format that is easy to work with inside R. I thought that the r"()" thing could be a proper solution.

I think you're getting confused about what the string literal syntax does. It just says "don't try to escape any of the following characters". For external inputs like text input or files, none of this matters.
For example, if you run this code
path <- readline("> enter path: ")
You will get this prompt:
> enter path:
and if you type in your (unescaped) path:
> enter path: C:\Windows\Dir
You get no error, and your variable is stored appropriately:
path
#> [1] "C:\\Windows\\Dir"
This is not in any special format that R uses, it is plain text. The backslashes are printed in this way to avoid ambiguity but they are "really" just single backslashes, as you can see by doing
cat(path)
#> C:\Windows\Dir
The string literal syntax is only useful for shortening what you need to type. There would be no point in trying to get it to do anything else, and we need to remember that it is a feature of the R interpreter - it is not a function nor is there any way to get R to use the string literal syntax dynamically in the way you are attempting. Even if you could, it would be a long way for a shortcut.

Related

Comparing the MD5 sum of a string to the contents of a file

I am trying to compare a string (in memory) to the contents of a file to see if they are the same. Boring details on motivation are below the question if anyone cares.
My confusion is that when I hash file contents, I get a different result than when I hash the string.
library(readr)
library(digest)
# write the string to the file
the_string <- "here is some stuff"
the_file <- "fake.txt"
readr::write_lines(the_string, the_file)
# both of these functions (predictably) give the same hash
tools::md5sum(the_file)
# "44b0350ee9f822d10f2f9ca7dbe54398"
digest(file = the_file)
# "44b0350ee9f822d10f2f9ca7dbe54398"
# now read it back to a string and get something different
back_to_a_string <- readr::read_file(the_file)
# "here is some stuff\n"
digest(back_to_a_string)
# "03ed1c8a2b997277100399bef6f88939"
# add a newline because that's what write_lines did
orig_with_newline <- paste0(the_string, "\n")
# "here is some stuff\n"
digest(orig_with_newline)
# "03ed1c8a2b997277100399bef6f88939"
What I want to do is just digest(orig_with_newline) == digest(file = the_file) to see if they're the same (they are) but that returns FALSE because, as shown, the hashes are different.
Obviously I could either read the file back to a string with read_file or write the string to a temp file, but both of those seem a bit silly and hacky. I guess both of those are actually fine solutions, I really just want to understand why this is happening so that I can better understand how the hashing works.
Boring details on motivation
The situation is that I have a function that will write a string to a file, but if the file already exists then it will error unless the user has explicitly passed .overwrite = TRUE. However, if the file exists, I would like to check whether the string about to be written to the file is in fact the same thing that's already in the file. If this is the case, then I will skip the error (and the write). This code could be called in a loop and it will be obnoxious for the user to continually see this error that they are about to overwrite a file with the same thing that's already in it.
Short answer: I think you need to set serialize=FALSE. Supposing that the file doesn't contain the extra newline (see below),
digest(the_string,serialize=FALSE) == digest(file=the_file) ## TRUE
(serialize has no effect on the file= version of the command)
dealing with newlines
If you read ?write_lines, it only says
sep: The line separator ... [information about defaults for different OSes]
To me, this seems ambiguous as to whether the separator will be added after the last line or not. (You don't expect a "comma-separated list" to end with a comma ...)
On the other hand, ?base::writeLines is a little more explicit,
sep: character string. A string to be written to the connection
after each line of text.
If you dig down into the source code of readr you can see that it uses
output << na << sep;
for each line of code, i.e. it's behaving the same way as writeLines.
If you really just want to write the string to the file with no added nonsense, I suggest cat():
identical(the_string, { cat(the_string,file=the_file); readr::read_file(the_file) }) ## TRUE

How to parse #{TEST TAGS} into only the Tags, eliminating current formatting?

Situation.. I have two tags defined, then I try to output them to the console. What comes out seems to be similar to an array, but I'd like to remove the formatting and just have the actual words outputted.
Here's what I currently have:
[Tags] ready ver10
Log To Console \n#{TEST TAGS}
And the result is
['ready', 'ver10']
So, how would I chuck the [', the ', ' and the '], thus only retaining the words ready and ver10?
Note: I was getting [u'ready', u'ver10'] - but once I got some advice to make sure I was running Python3 RobotFramework - after uninstalling robotframework via pip, and now only having robotframework installed via pip3, the u has vanished. That's great!
There are several ways to do it. For example, you could use a loop, or you could convert the list to a string before calling log to console
Using a loop.
Since the data is a list, it's easy to iterate over the list:
FOR ${tag} IN #{Test Tags}
log to console ${tag}
END
Converting to a string
You can use the evaluate keyword to convert the list to a string of values separated by a newline. Note: you have to use two backslashes in the call to evaluate since both robot and python use the backslash as an escape character. So, the first backslash escapes the second so that python will see \n and convert it to a newline.
${tags}= evaluate "\\n".join($test_tags)
log to console \n${tags}

How to process latex commands in R?

I work with knitr() and I wish to transform inline Latex commands like "\label" and "\ref", depending on the output target (Latex or HTML).
In order to do that, I need to (programmatically) generate valid R strings that correctly represent the backslash: for example "\label" should become "\\label". The goal would be to replace all backslashes in a text fragment with double-backslashes.
but it seems that I cannot even read these strings, let alone process them: if I define:
okstr <- function(str) "do something"
then when I call
okstr("\label")
I directly get an error "unrecognized escape sequence"
(of course, as \l is faultly)
So my question is : does anybody know a way to read strings (in R), without using the escaping mechanism ?
Yes, I know I could do it manually, but that's the point: I need to do it programmatically.
There are many questions that are close to this one, and I have spent some time browsing, but I have found none that yields a workable solution for this.
Best regards.
Inside R code, you need to adhere to R’s syntactic conventions. And since \ in strings is used as an escape character, it needs to form a valid escape sequence (and \l isn’t a valid escape sequence in R).
There is simply no way around this.
But if you are reading the string from elsewhere, e.g. using readLines, scan or any of the other file reading functions, you are already getting the correct string, and no handling is necessary.
Alternatively, if you absolutely want to write LaTeX-like commands in literal strings inside R, just use a different character for \; for instance, +. Just make sure that your function correctly handles it everywhere, and that you keep a way of getting a literal + back. Here’s a suggestion:
okstr("+label{1 ++ 2}")
The implementation of okstr then needs to replace single + by \, and double ++ by + (making the above result in \label{1 + 2}). But consider in which order this needs to happen, and how you’d like to treat more complex cases; for instance, what should the following yield: okstr("1 +++label")?

R customize error message when string contains unrecognized escape

I would like to give a more informative error message when users of my R functions supply a string with an unrecognized escape
my_string <- "sql\sql"
# Error: '\s' is an unrecognized escape in character string starting ""sql\s"
Something like this would be ideal.
my_string <- "sql\sql"
# Error: my_string contains an unrecognized escape. Try sql\\sql with double backslashes instead.
I have tried an if statement that looks for single backslashes
if (stringr::str_detect("sql\sql", "\")) stop("my error message")
but I get the same error.
Almost all of my users are Windows users running R 3.3 and up.
Code execution in R happens in two phases. First, R takes the raw string you enter and parses that into commands that can be run; then, R actually runs those commands. The parsing step makes sure what you've written actually makes sense as code. If it doesn't make any sense, then R can't even turn it into anything it can attempt to run.
The error message you are getting about the unrecognized escape sequence is happening at the parsing stage. That means R isn't really even attempting to execute the command, it just straight up can't understand what you are saying. There is no way to catch in error like this in code because there's no user code that's running at that point.
So if you are counting on your users writing code like my_string <- "something", then they need to write valid code. They can't change how strings are encoded or what the assignment operator looks like or how variables can be named. They also can't type !my_string! <=== %something% because R can't parse that either. R can't parse my_string <- "sql\sql" but it can parse my_string <- "sql\\sql" (slashes much be escaped in string literals). If they are not savy users, you might want to consider providing an alternative interface that can sanitize user input before trying to run it as code. Maybe make a shiny front end or have users pass arguments to your scripts via command line parameters.
If you're capturing your user input correctly, for a string input of\, R will store that in my_string as \\.
readline()
\
[1] "\\"
readline()
sql\sql
[1] "sql\\sql"
That means internally in R:
my_string <- "sql\\sql"
However
cat(my_string)
sql\sql
To check the input, you need to escape each escape, because you're looking for \\
stringr::str_detect(my_string, "\\\\")
Which returns TRUE if the input string is sql\sql. So the full line is:
if (stringr::str_detect("sql\\sql", "\\\\")) stop("my error message")

How can I keep toJSON from quoting my JSON string in R?

I am using OpenCPU and R to create a web API that takes in some inputs and returns a topoJSON file from a database, as well as some other information. OpenCPU automatically pushes the output through toJSON, which results in JSON output that has quoted JSON in it (i.e., the topoJSON). This is obviously not ideal--especially since it then gets incredibly cluttered with backticked quotes (\"). I tried using fromJSON to convert it to an R object, which could then be converted back (which is incredibly inefficient), but it returns a slightly different syntax and the result is that it doesn't work.
I feel like there should be some way to convert the string to some other type of object that results in toJSON calling a different handler that tells it to just leave it alone, but I can't figure out how to do that.
> s <- '{"type":"Topology","objects":{"map": "0"}}'
> fromJSON(s)
$type
[1] "Topology"
$objects
$objects$map
[1] "0"
> toJSON(fromJSON(s))
{"type":["Topology"],"objects":{"map":["0"]}}
That's just the beginning of the file (I replaced the actual map with "0"), and as you can see, brackets appeared around "Topology" and "0". Alternately, if I just keep it as a string, I end up with this mess:
> toJSON(s)
["{\"type\":\"Topology\",\"objects\":{\"0000595ab81ec4f34__csv\": \"0\"}}"]
Is there any way to fix this so that I just get the verbatim string but without quotes and backticks?
EDIT: Note that because I'm using OpenCPU, the output needs to come from toJSON (so no other function can be used, unfortunately), and I can't do any post-processing.
To it seems you just want the values rather than vectors. Set auto_unbox=TRUE to turn length-one vectors into scalar values
toJSON(fromJSON(s), auto_unbox = TRUE)
# {"type":"Topology","objects":{"map":"0"}}
That does print without escaping for me (using jsonlite_1.5). Maybe you are using an older version of jsonlite. You can also get around that by using cat() to print the result. You won't see the slashes when you do that.
cat(toJSON(fromJSON(s), auto_unbox = TRUE))
You can manually unbox the relevant entries:
library(jsonlite)
s <- '{"type":"Topology","objects":{"map": "0"}}'
j <- fromJSON(s)
j$type <- unbox(j$type)
j$objects$map <- unbox(j$objects$map)
toJSON(j)
# {"type":"Topology","objects":{"map":"0"}}

Resources