Removing '|' from object names? - r

I have created a data frame in R with the following name:
table_file1_C.txt|file2_C.txt
This name is was generated by the assign() function, in reference to a single .txt file that was generated by a program run on command line. Here is a sample from the loop that created this object:
assign(x=paste("table_",
dir(file.dir, pattern="\\.txt$")[i],
sep=''),
value=tmpTables[[i]])#tmpTables holds the data I'm manipulating, as read in from readHTMLtable
The issue is that I an unable to reference this object after its creation;
>table_file1_C.txt|file2_C.txt
Error: object 'file2_C.txt' not found
I believe that R is seeing the '|' character, and reading it as an instruction, not a part of the object's name, even though it already accepted it as part of the object's name.
So, I need to strip the | from the object's name. I planned to accomplish this with gsub() embedded within the assign() function, using something like this:
assign(x=paste("table_",#creating the name of the object
gsub(x=dir(file.dir, pattern="\\.txt$")[i],
pattern="|",
replacement="."),#need to remove the | characters!!
sep=''),
value=tmpTables[[i]])
However, this output gives something like this:
[1] ".t.a.b.l.e._.f.i.l.e.1...t.x.t.|.f.i.l.e.2...t.x.t."
As you can see, the name has been mangled, and the | has not actually been removed.
I need to find a way to remove the | from the name, so I can process the object that I have created. Or, prevent it from being included in the name in the first place. I can only do this within R, as I cannot modify the output of the program that I used to generate the data.
Does this make sense? Let me know if more information is needed. Thank you for taking the time to read this.

You need to escape the | character in the regular expression. Otherwise it is an empty pattern, which matches everything.
Escaping the character with brackets (character class):
x <- 'a|b'
gsub('[|]', '.', x)
## [1] "a.b"
Escaping with a backslash:
gsub('\\|', '.', x)
## [1] "a.b"
If you don't escape the | character, it is an "or" operation in the regular expression. Nothing or nothing, same as matching nothing. Thus it inserts the . between each character:
gsub('', '.', x)
## [1] ".a.|.b."
gsub('|', '.', x) # Same as above
## [1] ".a.|.b."

For some reason, escaping with ' ' as per Matthew Lundberg did not work correctly for me, but escaping with ` did.
> 'file1.txt|file2.txt'
[1] "file1.txt|file2.txt"
>`denovo_AR_C.txt|FOXA1_C.txt`
*data*
Thanks go to Matthew

Related

How to pass a chr variable into r"(...)"?

I've seen that since 4.0.0, R supports raw strings using the syntax r"(...)". Thus, I could do:
r"(C:\THIS\IS\MY\PATH\TO\FILE.CSV)"
#> [1] "C:\\THIS\\IS\\MY\\PATH\\TO\\FILE.CSV"
While this is great, I can't figure out how to make this work with a variable, or better yet with a function. See this comment which I believe is asking the same question.
This one can't even be evaluated:
construct_path <- function(my_path) {
r"my_path"
}
Error: malformed raw string literal at line 2
}
Error: unexpected '}' in "}"
Nor this attempt:
construct_path_2 <- function(my_path) {
paste0(r, my_path)
}
construct_path_2("(C:\THIS\IS\MY\PATH\TO\FILE.CSV)")
Error: '\T' is an unrecognized escape in character string starting ""(C:\T"
Desired output
# pseudo-code
my_path <- "C:\THIS\IS\MY\PATH\TO\FILE.CSV"
construct_path(path)
#> [1] "C:\\THIS\\IS\\MY\\PATH\\TO\\FILE.CSV"
EDIT
In light of #KU99's comment, I want to add the context to the problem. I'm writing an R script to be run from command-line using WIndows's CMD and Rscript. I want to let the user who executes my R script to provide an argument where they want the script's output to be written to. And since Windows's CMD accepts paths in the format of C:\THIS\IS\MY\PATH\TO, then I want to be consistent with that format as the input to my R script. So ultimately I want to take that path input and convert it to a path format that is easy to work with inside R. I thought that the r"()" thing could be a proper solution.
I think you're getting confused about what the string literal syntax does. It just says "don't try to escape any of the following characters". For external inputs like text input or files, none of this matters.
For example, if you run this code
path <- readline("> enter path: ")
You will get this prompt:
> enter path:
and if you type in your (unescaped) path:
> enter path: C:\Windows\Dir
You get no error, and your variable is stored appropriately:
path
#> [1] "C:\\Windows\\Dir"
This is not in any special format that R uses, it is plain text. The backslashes are printed in this way to avoid ambiguity but they are "really" just single backslashes, as you can see by doing
cat(path)
#> C:\Windows\Dir
The string literal syntax is only useful for shortening what you need to type. There would be no point in trying to get it to do anything else, and we need to remember that it is a feature of the R interpreter - it is not a function nor is there any way to get R to use the string literal syntax dynamically in the way you are attempting. Even if you could, it would be a long way for a shortcut.

How can I keep toJSON from quoting my JSON string in R?

I am using OpenCPU and R to create a web API that takes in some inputs and returns a topoJSON file from a database, as well as some other information. OpenCPU automatically pushes the output through toJSON, which results in JSON output that has quoted JSON in it (i.e., the topoJSON). This is obviously not ideal--especially since it then gets incredibly cluttered with backticked quotes (\"). I tried using fromJSON to convert it to an R object, which could then be converted back (which is incredibly inefficient), but it returns a slightly different syntax and the result is that it doesn't work.
I feel like there should be some way to convert the string to some other type of object that results in toJSON calling a different handler that tells it to just leave it alone, but I can't figure out how to do that.
> s <- '{"type":"Topology","objects":{"map": "0"}}'
> fromJSON(s)
$type
[1] "Topology"
$objects
$objects$map
[1] "0"
> toJSON(fromJSON(s))
{"type":["Topology"],"objects":{"map":["0"]}}
That's just the beginning of the file (I replaced the actual map with "0"), and as you can see, brackets appeared around "Topology" and "0". Alternately, if I just keep it as a string, I end up with this mess:
> toJSON(s)
["{\"type\":\"Topology\",\"objects\":{\"0000595ab81ec4f34__csv\": \"0\"}}"]
Is there any way to fix this so that I just get the verbatim string but without quotes and backticks?
EDIT: Note that because I'm using OpenCPU, the output needs to come from toJSON (so no other function can be used, unfortunately), and I can't do any post-processing.
To it seems you just want the values rather than vectors. Set auto_unbox=TRUE to turn length-one vectors into scalar values
toJSON(fromJSON(s), auto_unbox = TRUE)
# {"type":"Topology","objects":{"map":"0"}}
That does print without escaping for me (using jsonlite_1.5). Maybe you are using an older version of jsonlite. You can also get around that by using cat() to print the result. You won't see the slashes when you do that.
cat(toJSON(fromJSON(s), auto_unbox = TRUE))
You can manually unbox the relevant entries:
library(jsonlite)
s <- '{"type":"Topology","objects":{"map": "0"}}'
j <- fromJSON(s)
j$type <- unbox(j$type)
j$objects$map <- unbox(j$objects$map)
toJSON(j)
# {"type":"Topology","objects":{"map":"0"}}

How to paste special characters in R [duplicate]

This question already has an answer here:
paste quotation marks into character string, within a loop
(1 answer)
Closed 5 years ago.
I have started learning R and am trying to create vector as below:
c(""check"")
I need the output as : "check". But am getting syntax error. How to escape the quotes while creating a vector?
As #juba mentioned, one way is directly escaping the quotes.
Another way is to use single quotes around your character expression that has double quotes in it.
> x <- 'say "Hello!"'
> x
[1] "say \"Hello!\""
> cat(x)
say "Hello!"
Other answers nicely show how to deal with double quotes in your character strings when you create a vector, which was indeed the last thing you asked in your question. But given that you also mentioned display and output, you might want to keep dQuote in mind. It's useful if you want to surround each element of a character vector with double quotes, particularly if you don't have a specific need or desire to store the quotes in the actual character vector itself.
# default is to use "fancy quotes"
text <- c("check")
message(dQuote(text))
## “check”
# switch to straight quotes by setting an option
options(useFancyQuotes = FALSE)
message(dQuote(text))
## "check"
# assign result to create a vector of quoted character strings
text.quoted <- dQuote(text)
message(text.quoted)
## "check"
For what it's worth, the sQuote function does the same thing with single quotes.
Use a backslash :
x <- "say \"Hello!\""
And you don't need to use c if you don't build a vector.
If you want to output quotes unescaped, you may need to use cat instead of print :
R> cat(x)
say "Hello!"

How to modify i in an R loop?

I have several large R objects saved as .RData files: "this.RData", "that.RData", "andTheOther.RData" and so on. I don't have enough memory, so I want to load each in a loop, extract some rows, and unload it. However, once I load(i), I need to strip the ".RData" part of (i) before I can do anything with objects "this", "that", "andTheOther". I want to do the opposite of what is described in How to iterate over file names in a R script? How can I do that? Thx
Edit: I omitted to mention the files are not in the working directory and have a filepath as well. I came across Getting filename without extension in R and file_path_sans_ext takes out the extension but the rest of the path is still there.
Do you mean something like this?
i <- c("/path/to/this.RDat", "/another/path/to/that.RDat")
f <- gsub(".*/([^/]+)", "\\1", i)
f1 <- gsub("\\.RDat", "", f)
f1
[1] "this" "that"
On windows' paths you have to use "\\" instead of "/"
Edit: Explanation. Technically, these are called "regular
expressions" (regexps), not "patterns".
. any character
.* arbitrary number (including 0) of any kind of characters
.*/ arbitrary number of any kind of characters, followed by a
/
[^/] any character but not /
[^/]+ arbitrary number (1 or more) of any kind of characters,
but not /
( and ) enclose groups. You can use the groups when
replacing as \\1, \\2 etc.
So, look for any kind of character, followed by /, followed by
anything but not the path separator. Replace this with the "anything
but not separator".
There are many good tutorials for regexps, just look for it.
A simple way to do this using would be to extract the base name from the filepaths with base::basename() and then remove the file extension with tools::file_path_sans_ext().
paths_to_files <- c("./path/to/this.RData", "./another/path/to/that.RData")
tools::file_path_sans_ext(
basename(
paths_to_files
)
)
## Returns:
## [1] "this" "that"

skip lines in reading files using reg ex

i have files with similar contents
!software version: $Revision$
!date: 07/06/2016 $
!
! from Mouse Genome Database (MGD) & Gene Expression Database (GXD)
!
MGI
I am using read.csv to read the files. But I need to skip the lines with "!" in the beginning. How can I do that?
The read.csv function and read.table that it is based on have an argument called comment.char which can be used to specify a character that if seen will ignore the rest of that line. Setting that to "!" may be enough to do what you want.
If you really need a regular expression, then the best approach is to read the file using readLines (or similar function), then apply the regular expression to the resulting vector of character strings to drop to unwanted elements (rows), then pass the result to the text argument to read.table (or use a text connection).
To calculate the first line that doesn't start with a !,
to_skip <- min(grep('^[^!]', trimws(readLines('file.csv'))))
df <- read.csv('file.csv', skip = to_skip)

Resources