Importing, editing and saving JSON in a simple way? - R

I'm having problems with editing a JSON file and saving the results in a usable form.
My starting point is: modify-json-file-using-some-condition-and-create-new-json-file-in-r
In fact I want to do something even simpler and it still doesn't work! I'm using the jsonlite package.
An equivalent sample would look like this ...
$Apples
$Apples$Origin
$Apples$Origin$Id
[1] 2615
$Apples$Origin$season
[1] "Fall"
$Oranges
$Oranges$Origin
$Oranges$Origin$Id
[1] 2615
$Oranges$Origin$airportLabel
[1] "Orange airport"
$Oranges$Shipping
$Oranges$Shipping$ShipperId
[1] 123
$Oranges$Shipping$ShipperLabel
[1] "Brighter Orange"
I read the file, make some changes and save the resulting file back to disk. Nothing simpler, right?
json_list = read_json(path = "../documents/dummy.json")
json_list$Apples$Origin$Id = 1234
json_list$Oranges$Origin$Id = 4567
json_list$Oranges$Shipping$ShipperLabel = "Suntan Blue"
json_modified <- toJSON(json_list, pretty = TRUE)
write_json(json_modified, path = "../documents/dummy_new.json")
json_list appears as character format under the RStudio file type column.
json_modified appears as json format under the RStudio file type column.
Why the difference?
Now if I use the original file it works, but the modified file fails. The JSON format checks out and I can't see any errors.
The real file is bigger than the example above but the method I've used is the same.
Am I doing something wrong in the way I edit or save the file?
I'm really new to JSON and this is really frustrating!
Any Ideas?
Thanks

In the absence of reproducible data, I can diagnose at least one potential problem.
Background
Within the jsonlite package, there exist functions that are mutual inverses:
jsonlite::fromJSON() converts from raw text (in JSON format) to R objects.
jsonlite::toJSON() converts from R objects to raw text (in JSON format).
Now this raw text (txt) might be
a JSON string, URL or file
As for jsonlite::read_json() and jsonlite::write_json(), they are also a pair of mutual inverses, which are like the former pair
except [that] they explicitly distinguish between path and literal input, and do not simplify by default.
That is, the latter are simply designed to handle file(path)s rather than strings of raw text.
So toJSON(fromJSON(txt = ...)) should return unchanged the text passed to txt, just as write_json(read_json(path = ...)) should write a file identical to that passed to path.
In short, toJSON() belongs with fromJSON(); while write_json() belongs with read_json().
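A minimal sketch of the two pairings (auto_unbox = TRUE is added here so length-one values come back as scalars rather than one-element arrays):

```r
library(jsonlite)

txt <- '{"id":1,"season":"Fall"}'

# Pair 1: raw text in, raw text out
x <- fromJSON(txt)
toJSON(x, auto_unbox = TRUE)              # same JSON text as txt

# Pair 2: file in, file out
f_in  <- tempfile(fileext = ".json")
f_out <- tempfile(fileext = ".json")
writeLines(txt, f_in)
write_json(read_json(f_in), f_out, auto_unbox = TRUE)
readLines(f_out)                          # same JSON content as f_in
```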
The Problem
However, you have added a spurious step by mingling toJSON() with read_json() and write_json():
json_list = read_json(...)
# ...
json_modified <- toJSON(json_list, ...) # SPURIOUS STEP
# ...
write_json(json_modified, ...)
You see, write_json() already converts "to JSON", so toJSON() is wholly unnecessary. Indeed, toJSON() actually sabotages the process, since its textual return value is passed on (in json_modified) to write_json(), which expects (a structure of) R objects rather than text.
The Fix
Once you're done modifying json_list, just go straight to writing it:
json_list = read_json(path = "../documents/dummy.json")
json_list$Apples$Origin$Id = 1234
# Further modifications...
write_json(json_list, path = "../documents/dummy_new.json", pretty = TRUE)
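To see the corrected pipeline end to end, here is a self-contained sketch (temp files stand in for the real paths; auto_unbox = TRUE is an optional extra that writes length-one values as scalars rather than one-element arrays):

```r
library(jsonlite)

f_in  <- tempfile(fileext = ".json")
f_out <- tempfile(fileext = ".json")
writeLines('{"Apples":{"Origin":{"Id":2615,"season":"Fall"}}}', f_in)

json_list <- read_json(f_in)
json_list$Apples$Origin$Id <- 1234
write_json(json_list, f_out, pretty = TRUE, auto_unbox = TRUE)

# Reading the new file back confirms the edit survived the round trip
read_json(f_out)$Apples$Origin$Id       # 1234
```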


How to pass a chr variable into r"(...)"?

I've seen that since 4.0.0, R supports raw strings using the syntax r"(...)". Thus, I could do:
r"(C:\THIS\IS\MY\PATH\TO\FILE.CSV)"
#> [1] "C:\\THIS\\IS\\MY\\PATH\\TO\\FILE.CSV"
While this is great, I can't figure out how to make this work with a variable, or better yet with a function. See this comment which I believe is asking the same question.
This one can't even be evaluated:
construct_path <- function(my_path) {
r"my_path"
}
Error: malformed raw string literal at line 2
}
Error: unexpected '}' in "}"
Nor this attempt:
construct_path_2 <- function(my_path) {
paste0(r, my_path)
}
construct_path_2("(C:\THIS\IS\MY\PATH\TO\FILE.CSV)")
Error: '\T' is an unrecognized escape in character string starting ""(C:\T"
Desired output
# pseudo-code
my_path <- "C:\THIS\IS\MY\PATH\TO\FILE.CSV"
construct_path(my_path)
#> [1] "C:\\THIS\\IS\\MY\\PATH\\TO\\FILE.CSV"
EDIT
EDIT: In light of #KU99's comment, I want to add some context to the problem. I'm writing an R script to be run from the command line using Windows's CMD and Rscript. I want to let the user who executes my R script provide an argument saying where the script's output should be written. And since Windows's CMD accepts paths in the format C:\THIS\IS\MY\PATH\TO, I want to be consistent with that format as the input to my R script. So ultimately I want to take that path input and convert it to a path format that is easy to work with inside R. I thought the r"()" syntax could be a proper solution.
I think you're getting confused about what the raw string literal syntax does. It just tells the parser not to interpret escape sequences in the characters that follow. For external inputs like text input or files, none of this matters.
For example, if you run this code
path <- readline("> enter path: ")
You will get this prompt:
> enter path:
and if you type in your (unescaped) path:
> enter path: C:\Windows\Dir
You get no error, and your variable is stored appropriately:
path
#> [1] "C:\\Windows\\Dir"
This is not in any special format that R uses, it is plain text. The backslashes are printed in this way to avoid ambiguity but they are "really" just single backslashes, as you can see by doing
cat(path)
#> C:\Windows\Dir
The string literal syntax is only useful for shortening what you need to type. There would be no point in trying to get it to do anything else, and we need to remember that it is a feature of the R interpreter - it is not a function nor is there any way to get R to use the string literal syntax dynamically in the way you are attempting. Even if you could, it would be a long way for a shortcut.
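For the command-line scenario in the question's EDIT, the same point applies: a path arriving via Rscript is plain text already, so no raw-string machinery is needed. A hypothetical sketch (the helper name and argument handling are invented for illustration):

```r
# Hypothetical helper: takes the first trailing argument passed to
# Rscript (e.g. C:\THIS\IS\MY\PATH\TO typed in CMD) and normalizes
# separators for use inside R. The backslashes in an external argument
# are already single, plain characters; only paths typed inside R
# source code need \\ or r"(...)".
get_out_dir <- function(args = commandArgs(trailingOnly = TRUE)) {
  normalizePath(args[1], winslash = "/", mustWork = FALSE)
}
```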

Comparing the MD5 sum of a string to the contents of a file

I am trying to compare a string (in memory) to the contents of a file to see if they are the same. Boring details on motivation are below the question if anyone cares.
My confusion is that when I hash file contents, I get a different result than when I hash the string.
library(readr)
library(digest)
# write the string to the file
the_string <- "here is some stuff"
the_file <- "fake.txt"
readr::write_lines(the_string, the_file)
# both of these functions (predictably) give the same hash
tools::md5sum(the_file)
# "44b0350ee9f822d10f2f9ca7dbe54398"
digest(file = the_file)
# "44b0350ee9f822d10f2f9ca7dbe54398"
# now read it back to a string and get something different
back_to_a_string <- readr::read_file(the_file)
# "here is some stuff\n"
digest(back_to_a_string)
# "03ed1c8a2b997277100399bef6f88939"
# add a newline because that's what write_lines did
orig_with_newline <- paste0(the_string, "\n")
# "here is some stuff\n"
digest(orig_with_newline)
# "03ed1c8a2b997277100399bef6f88939"
What I want to do is just digest(orig_with_newline) == digest(file = the_file) to see if they're the same (they are) but that returns FALSE because, as shown, the hashes are different.
Obviously I could either read the file back to a string with read_file or write the string to a temp file, but both of those seem a bit silly and hacky. I guess both of those are actually fine solutions, I really just want to understand why this is happening so that I can better understand how the hashing works.
Boring details on motivation
The situation is that I have a function that will write a string to a file, but if the file already exists then it will error unless the user has explicitly passed .overwrite = TRUE. However, if the file exists, I would like to check whether the string about to be written to the file is in fact the same thing that's already in the file. If this is the case, then I will skip the error (and the write). This code could be called in a loop and it will be obnoxious for the user to continually see this error that they are about to overwrite a file with the same thing that's already in it.
Short answer: I think you need to set serialize=FALSE. Supposing that the file doesn't contain the extra newline (see below),
digest(the_string,serialize=FALSE) == digest(file=the_file) ## TRUE
(serialize has no effect on the file= version of the command)
dealing with newlines
If you read ?write_lines, it only says
sep: The line separator ... [information about defaults for different OSes]
To me, this seems ambiguous as to whether the separator will be added after the last line or not. (You don't expect a "comma-separated list" to end with a comma ...)
On the other hand, ?base::writeLines is a little more explicit,
sep: character string. A string to be written to the connection
after each line of text.
If you dig down into the source code of readr you can see that it uses
output << na << sep;
for each line of output, i.e. it behaves the same way as writeLines.
If you really just want to write the string to the file with no added nonsense, I suggest cat():
identical(the_string, { cat(the_string,file=the_file); readr::read_file(the_file) }) ## TRUE
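Putting the pieces together for the motivating use case, a hypothetical overwrite guard (the function name is invented here) could compare the in-memory string against the file before raising the error:

```r
library(digest)

# Hypothetical helper: TRUE when `path` already holds exactly `string`
same_content <- function(string, path) {
  file.exists(path) &&
    digest(string, serialize = FALSE) == digest(file = path)
}

tmp <- tempfile()
cat("here is some stuff", file = tmp)    # cat() adds no trailing newline
same_content("here is some stuff", tmp)  # TRUE: skip the overwrite error
same_content("something else", tmp)      # FALSE: raise it as usual
```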

How can I keep toJSON from quoting my JSON string in R?

I am using OpenCPU and R to create a web API that takes in some inputs and returns a topoJSON file from a database, as well as some other information. OpenCPU automatically pushes the output through toJSON, which results in JSON output that has quoted JSON in it (i.e., the topoJSON). This is obviously not ideal--especially since it then gets incredibly cluttered with backslash-escaped quotes (\"). I tried using fromJSON to convert it to an R object, which could then be converted back (which is incredibly inefficient), but it returns a slightly different syntax and the result is that it doesn't work.
I feel like there should be some way to convert the string to some other type of object that results in toJSON calling a different handler that tells it to just leave it alone, but I can't figure out how to do that.
> s <- '{"type":"Topology","objects":{"map": "0"}}'
> fromJSON(s)
$type
[1] "Topology"
$objects
$objects$map
[1] "0"
> toJSON(fromJSON(s))
{"type":["Topology"],"objects":{"map":["0"]}}
That's just the beginning of the file (I replaced the actual map with "0"), and as you can see, brackets appeared around "Topology" and "0". Alternately, if I just keep it as a string, I end up with this mess:
> toJSON(s)
["{\"type\":\"Topology\",\"objects\":{\"0000595ab81ec4f34__csv\": \"0\"}}"]
Is there any way to fix this so that I just get the verbatim string but without quotes and backticks?
EDIT: Note that because I'm using OpenCPU, the output needs to come from toJSON (so no other function can be used, unfortunately), and I can't do any post-processing.
It seems you just want the values rather than vectors. Set auto_unbox = TRUE to turn length-one vectors into scalar values:
toJSON(fromJSON(s), auto_unbox = TRUE)
# {"type":"Topology","objects":{"map":"0"}}
That does print without escaping for me (using jsonlite_1.5). Maybe you are using an older version of jsonlite. You can also get around that by using cat() to print the result. You won't see the slashes when you do that.
cat(toJSON(fromJSON(s), auto_unbox = TRUE))
You can manually unbox the relevant entries:
library(jsonlite)
s <- '{"type":"Topology","objects":{"map": "0"}}'
j <- fromJSON(s)
j$type <- unbox(j$type)
j$objects$map <- unbox(j$objects$map)
toJSON(j)
# {"type":"Topology","objects":{"map":"0"}}

R - write/read data frame to file including attributes

I have a data frame with somewhat difficult-to-understand feature names, so I have an attribute that gives more description for each column. However, if I write this file using write.csv the attribute doesn't get included in the file. The function write.arff does actually write something called "attribute" to file, but it's just additional column information like data type and factor labels/levels. When reading the file back in to R with read.arff, it doesn't appear that this additional info is brought back in with the rest of the data.
Is there a way to write a data frame to standard file type (csv, xml, fixed-width text, tab-separated) with its attributes intact?
I understand that there are functions like save and dput, but I need the body data to be able to be read by other programs too, like Excel, KNIME, or SPSS. The attributes I'll probably be using only in R, but if possible I'd like to keep them in the same file with the data. Thanks!
Here's an example of the attribute:
> attr(hipparcos, 'desc')
[1] "Identifier (HIP number) (H1)" "*[HT] Proximity flag (H2)"
[3] "? Magnitude in Johnson V (H5)" "*[1,3]? Coarse variability flag (H6)"
[5] "*[GHT] Source of magnitude (H7)" "*? alpha, degrees (ICRS, Epoch=J1991.25) (H8)"
[7] "*? delta, degrees (ICRS, Epoch=J1991.25) (H9)" "*[*+A-Z] Reference flag for astrometry (H10)"
[9] "? Trigonometric parallax (H11)" "*? Proper motion mu_alpha.cos(delta) ICRS (H12)"
Correction: the .arff file does not write the attribute to file - it writes data type, factor labels and column name and calls them "attributes". My mistake!
In case it helps someone, I have used the following code:
library(rlist)
file <- attr(hipparcos, 'desc')
list.save(file, 'your//pathway//file.yaml')
The YAML file can be opened with a text editor.
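A self-contained sketch of the full round trip, with a small stand-in for the hipparcos data (list.load() is rlist's counterpart to list.save(); the YAML route needs the yaml package installed):

```r
library(rlist)

# Stand-in data frame with a description attribute
df <- data.frame(HIP = 1:3, Vmag = c(9.1, 8.7, 10.2))
attr(df, "desc") <- c("Identifier (HIP number) (H1)",
                      "? Magnitude in Johnson V (H5)")

csv_path  <- tempfile(fileext = ".csv")
yaml_path <- tempfile(fileext = ".yaml")

# Body goes to CSV for Excel/KNIME/SPSS; attribute goes to a side-car file
write.csv(df, csv_path, row.names = FALSE)
list.save(as.list(attr(df, "desc")), yaml_path)

# Later, in R: rebuild the annotated data frame
df2 <- read.csv(csv_path)
attr(df2, "desc") <- unlist(list.load(yaml_path))
```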

reading configuration from text file

I have a txt file which has entries
indexUrl=http://192.168.2.105:9200
jarFilePath = /home/soumy/lib
How can I read this file from R and get the value of jarFilePath?
I need this to set the classpath with .jaddClassPath(). I have a problem copying the jar to the classpath because of the difference in slashes between Windows and Linux.
In Linux I want to use
.jaddClassPath(dir("target/mavenLib", full.names=TRUE ))
but in Windows
.jaddClassPath(dir("target\\mavenLib", full.names=TRUE ))
So I'm thinking of reading the location of the jar from a property file!
If there is any other alternative, please let me know that too.
As of Sept 2016, CRAN has the package properties.
It handles = in property values correctly (but does not handle spaces after the first = sign).
Example:
Contents of properties file /tmp/my.properties:
host=123.22.22.1
port=798
user=someone
pass=a=b
R code:
install.packages("properties")
library(properties)
myProps <- read.properties("/tmp/my.properties")
Then you can access the properties as myProps$host, etc. In particular, myProps$pass is "a=b" as expected.
I do not know whether a package offers a specific interface.
If not, I would first load the data in a data frame using read.table:
myProp <- read.table("path/to/file/filename.txt", header=FALSE, sep="=", row.names=1, strip.white=TRUE, na.strings="NA", stringsAsFactors=FALSE)
sep="=" is obviously the separator, this will nicely separate your property names and values.
row.names=1 says the first column contains your row names, so you can index your data properties this way to retrieve each property you want.
For instance: myProp["jarFilePath", 1] will return "/home/soumy/lib" (with row.names=1 consuming the first column, the values sit in the first remaining column).
strip.white=TRUE will strip leading and trailing spaces you probably don't care about.
One could conveniently convert the loaded data frame into a named vector for a cleaner way to retrieve the property values: myPropVec <- setNames(myProp[[1]], rownames(myProp)).
Then to retrieve a property value from its name: myPropVec["jarFilePath"] will return "/home/soumy/lib" as well.
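A self-contained version of the read.table approach, run against the question's two entries (a temp file stands in for the real path):

```r
# Recreate the question's property file
prop_path <- tempfile(fileext = ".txt")
writeLines(c("indexUrl=http://192.168.2.105:9200",
             "jarFilePath = /home/soumy/lib"), prop_path)

# "=" splits names from values; strip.white drops the stray spaces
myProp <- read.table(prop_path, header = FALSE, sep = "=",
                     row.names = 1, strip.white = TRUE,
                     stringsAsFactors = FALSE)

myProp["jarFilePath", 1]                  # "/home/soumy/lib"

# Named-vector form for cleaner lookups
myPropVec <- setNames(myProp[[1]], rownames(myProp))
myPropVec["jarFilePath"]
```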
