Reading in JSON files in R with Unicode strings

I have a .txt file that contains a list of JSON objects; one of the keys holds an emoji written as Java-style escape sequences (JavaCode format).
I want to parse the JSON file and convert it to CSV with a column called emoji, keeping each emoji in its JavaCode representation, for example "\uD83D\uDEB6\u200D\u2642\uFE0F".
Is there any way to do that? Currently I'm using jsonlite to read the file, but it seems to convert the emoji data automatically to something like this (which I assume is its Unicode representation): "<U+0001F6B6><U+200D><U+2640><U+FE0F>"
This is the code to read it in:
df <- fromJSON(data_file)
head(df)
Edited the question to include a sample json object from the list:
{
"x": [
{
"y": "a",
"emoji": "\uD83D\uDEB6\u200D\u2642\uFE0F"
}
],
"z": "b"
}
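One possible approach, sketched under assumptions: let jsonlite decode the strings as usual, then re-escape the emoji column just before writing the CSV. The helper name to_java_escapes below is hypothetical (it is not part of jsonlite). It turns every non-ASCII code point into \uXXXX, using a UTF-16 surrogate pair for code points above U+FFFF, and leaves ASCII characters untouched.

```r
# Hypothetical helper: re-escape non-ASCII characters as Java-style \uXXXX.
to_java_escapes <- function(x) {
  vapply(x, function(s) {
    cps <- utf8ToInt(s)                       # Unicode code points of the string
    out <- vapply(cps, function(cp) {
      if (cp < 128L) {
        intToUtf8(cp)                         # keep plain ASCII as-is
      } else if (cp < 0x10000L) {
        sprintf("\\u%04X", cp)                # fits in one UTF-16 code unit
      } else {
        v  <- cp - 0x10000L                   # split into a surrogate pair
        hi <- 0xD800L + v %/% 0x400L
        lo <- 0xDC00L + v %% 0x400L
        sprintf("\\u%04X\\u%04X", hi, lo)
      }
    }, character(1))
    paste(out, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

# The emoji from the sample object, built from its code points.
emoji   <- intToUtf8(c(0x1F6B6, 0x200D, 0x2642, 0xFE0F))
escaped <- to_java_escapes(emoji)
cat(escaped, "\n")
```

The function would be applied to whichever column holds the emoji after fromJSON(), and the result written out with write.csv(). Note that this sketch does not escape backslashes, quotes, or control characters.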

Related

r json mongodb query $in operator syntax error due to double quotes?

I'm building a json query to pass to a mongodb database in R.
In one scenario, I have a vector of dates and I want to query the database to return all records which have a date in the relevant field that matches a date in my vector of dates.
The second scenario is the same as the first, but this time I have a vector of character strings (IDs) and need to return all the records with matching IDs.
I understood the correct way to do this in a json query is to use the $in operator, and then put my vector in an array.
However, when I pass the query to my mongodb database, the exportLogId returns NULL. I'm quite sure that the problem is something to do with how I am representing the $in operator in the final query, since I have very similarly structured queries without the $in operator and they are all working. If I look for just one of my target dates or character strings, I get the desired result.
I followed the mongodb manual here to construct my query, and the only issue I can see is that the $in operator in the output of jsonlite::toJSON() is enclosed in double quotes; whereas I think it might need to be in single quotes (or no quotes at all, but I don't know how to write the syntax for that).
I'm creating my query in two steps:
Create the query as a series of nested lists
Convert the list object to json with jsonlite::toJSON()
Here is my code:
# Load libraries:
library(jsonlite)
# Create list of example dates to query in mongodb format:
sampledates <- c("2022-08-11T00:00:00.000Z",
"2022-08-15T00:00:00.000Z",
"2022-08-16T00:00:00.000Z",
"2022-08-17T00:00:00.000Z",
"2022-08-19T00:00:00.000Z")
# Create query as a list object:
query_list_l <- list(filter =
# Add where clause:
list(where =
# Filter results by list of sample dates:
list(dateSampleTaken = list('$in' = sampledates),
# Define format of column names and values:
useDbColumns = "true",
dontTranslateValues = "true",
jsonReplaceUndefinedWithNull = "true"),
# Define columns to return:
fields = c("id",
"updatedAt",
"person.visualId",
"labName",
"sampleIdentifier",
"dateSampleTaken",
"sequence.hasSequence")))
# Convert list object to JSON:
query_json = jsonlite::toJSON(x = query_list_l,
pretty = TRUE,
auto_unbox = TRUE)
The JSON query now looks like this:
> query_json
{
"filter": {
"where": {
"dateSampleTaken": {
"$in": ["2022-08-11T00:00:00.000Z", "2022-08-15T00:00:00.000Z", "2022-08-16T00:00:00.000Z", "2022-08-17T00:00:00.000Z", "2022-08-19T00:00:00.000Z"]
},
"useDbColumns": "true",
"dontTranslateValues": "true",
"jsonReplaceUndefinedWithNull": "true"
},
"fields": ["id", "updatedAt", "person.visualId", "labName", "sampleIdentifier", "dateSampleTaken", "sequence.hasSequence"]
}
}
As you can see, $in is now enclosed in double quotes, even though I put it in single quotes when I created the query as a list object. I have tried replacing with sprintf() but that just adds a lot of backslashes to my query. I also tried:
query_fixed <- gsub(pattern = "\\"\\$\\in\\"",
replacement = "\\'$in\\'",
x = query_json)
... but this fails with an error.
I would be very grateful to know if:
The syntax problem that is preventing $in from working is actually the double quotes?
If double quotes is the problem, how do I replace them with single quotes without messing up the JSON format?
UPDATE:
The issue seems to occur when R is passing the query to the database, but I still can't work out exactly why.
If I try the query out in loopback explorer in the database, it works and using the export log ID produced, I can then fetch the results with httr::GET() in R. Example query results are shown below (sorry for the hashes - the main point is you can see the format of the returned values):
[1] "[{\"_id\":\"e59953b6-a106-4b69-9e25-1c54eef5264a\",\"updatedAt\":\"2022-09-12T20:08:39.554Z\",\"dateSampleTaken\":\"2022-08-16T00:00:00.000Z\",\"labName\":\"LNG_REFERENCE_DATA_CATEGORY_LAB_NAME_LAB_A\",\"sampleIdentifier\":\"LS0044-SCV2-PCR\",\"sequence\":{\"hasSequence\":false},\"person\":{\"visualId\":\"C-2022-0002\"}},{\"_id\":\"af5cd9cc-4813-4194-b60b-7d130bae47bc\",\"updatedAt\":\"2022-09-12T20:11:07.467Z\",\"dateSampleTaken\":\"2022-08-17T00:00:00.000Z\",\"labName\":\"LNG_REFERENCE_DATA_CATEGORY_LAB_NAME_LAB_A\",\"sampleIdentifier\":\"LS0061-SCV2-PCR\",\"sequence\":{\"hasSequence\":false},\"person\":{\"visualId\":\"C-2022-0003\"}},{\"_id\":\"b5930079-8d57-43a8-85c0-c95f7e0338d9\",\"updatedAt\":\"2022-09-12T20:13:54.378Z\",\"dateSampleTaken\":\"2022-08-16T00:00:00.000Z\",\"labName\":\"LNG_REFERENCE_DATA_CATEGORY_LAB_NAME_LAB_A\",\"sampleIdentifier\":\"LS0043-SCV2-PCR\",\"sequence\":{\"hasSequence\":false},\"person\":{\"visualId\":\"C-2022-0004\"}}]"
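On the quoting question, a small check may help. In R, '$in' and "$in" are the same string, and JSON itself requires double quotes around keys, so the toJSON() output above is the standard form. The sketch below uses a shortened, hypothetical version of the query, and also shows how the failing gsub() call could be written with fixed = TRUE; the result of that replacement is no longer valid JSON. The validation step assumes jsonlite is installed.

```r
# Single and double quotes delimit the same string in R.
same_key <- identical('$in', "$in")

# Shortened stand-in for the query text (hypothetical, for illustration only).
query_json <- '{"dateSampleTaken": {"$in": ["2022-08-11T00:00:00.000Z"]}}'

# fixed = TRUE treats the pattern literally, so no regex escaping is needed.
single_quoted <- gsub('"$in"', "'$in'", query_json, fixed = TRUE)

# JSON keys must be double-quoted, so only the original text validates.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  print(isTRUE(jsonlite::validate(query_json)))
  print(isTRUE(jsonlite::validate(single_quoted)))
}
```

If the double-quoted form works in the loopback explorer, as the update describes, the quotes themselves are unlikely to be the cause.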

Error in fromJSON("employee.json") : not all data was parsed (0 chars were parsed out of a total of 13 chars)

I was trying to read a JSON file in RStudio, to learn how to read JSON files, but I got a parsing error.
employee.json
{
"id" : ["1","2","3","4","5","6","7","8" ],
"name" : ["Shubham","Nishka","Gunjan","Sumit","Arpita","Vaishali","Anisha","Ginni" ],
"salary" : ["623","552","669","825","762","882","783","964"],
"start_date" : [ "1/1/2012","9/15/2013","11/23/2013","5/11/2014","3/27/2015","5/21/2013","7/30/2013","6/17/2014"],
"dept" : [ "IT","Operations","Finance","HR","Finance","IT","Operations","Finance"]
}
.R file
library(rjson)
emp = fromJSON("employee.json")
e = as.data.frame(emp)
print(e)
The first argument to rjson::fromJSON is a JSON string. So your code is interpreting "employee.json" (note it has 13 characters) as JSON.
If you have saved a file named employee.json, you need to specify file = :
emp <- rjson::fromJSON(file = "employee.json")
This is not an issue when using jsonlite::fromJSON because the first argument can be a string, file or URL.
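As a quick illustration of the point above, the "13 chars" in the error message is simply the length of the file name that was passed in as if it were JSON text. The sketch below writes a small, hypothetical two-row version of employee.json to a temporary directory and reads it by path; the rjson part only runs if that package is installed.

```r
# The string that rjson tried to parse as JSON was the file name itself.
json_arg <- "employee.json"
n_chars  <- nchar(json_arg)
print(n_chars)

# Write a tiny stand-in file (hypothetical contents) and read it by path.
path <- file.path(tempdir(), "employee.json")
writeLines('{"id": ["1", "2"], "name": ["Shubham", "Nishka"]}', path)

if (requireNamespace("rjson", quietly = TRUE)) {
  emp <- rjson::fromJSON(file = path)   # file = tells rjson this is a path
  e   <- as.data.frame(emp)
  print(e)
}
```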

R Jsonlite - How to iterate a JSON list of objects?

I'm very new to R, but I was tasked with reading a JSON file that looks like the following:
{
"revisions" : [
{"number": 1, "description" : "first revision"},
{"number": 2, "description" : "second revision"},
{"number": 3, "description" : "third revision"}
]
}
I need to do some data manipulation iterating over revisions, but I can't understand what type of data structure jsonlite is transforming this list into, it seems it transposed it.
This is what I've tried :
json = fromJSON('data.json')
for (revision in json$revisions) {
print(revision$number) # Doesn't work
print(revision['number']) # Doesn't work
}
How can I read the json file in the way I'm trying above?
Using R 3.6.1, ideally I need to keep it to the base functions
json$revisions is a data.frame so you can try something like
for (i in seq_len(nrow(json$revisions))) {
print(json$revisions$number[i])
}
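To make the row-wise loop concrete, here is a base R sketch. The revisions data.frame is built by hand as a stand-in for json$revisions, assuming jsonlite's default simplification of an array of objects into one row per object. A for loop over a data.frame walks its columns, which is why revision$number failed in the original attempt.

```r
# Stand-in for json$revisions: one row per JSON object.
revisions <- data.frame(
  number      = 1:3,
  description = c("first revision", "second revision", "third revision"),
  stringsAsFactors = FALSE
)

# Iterate over row indices, pulling out one row at a time.
seen <- character(0)
for (i in seq_len(nrow(revisions))) {
  revision <- revisions[i, ]            # a one-row data.frame
  seen <- c(seen, paste(revision$number, revision$description))
}
print(seen)
```

Inside the loop, revision$number and revision$description behave the way the question expected.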

Is there a way in R to add commas to incorrect JSON format?

I'm trying to work with a JSON file in R, but unfortunately the JSON file is unreadable by jsonlite in its current state. It's missing commas between the objects (array elements?). My objective is to form a data frame from this almost-JSON file. Example JSON file, code, and result below.
[
{"Source":"ADSB","Id":43061,"FlightId":"N668XX","Latitude":44.000083,"Longitude":-96.654788,"Alt":4450}
{"Source":"ADSB","Id":43062,"FlightId":"N683XX","Latitude":44.000083,"Longitude":-96.654788,"Alt":4450}
{"Source":"ADSB","Id":43063,"FlightId":"N652XX","Latitude":44.000083,"Longitude":-96.654788,"Alt":4450}
]
> jsondata = fromJSON("asdf.json")
Error in parse_con(txt, bigint_as_char) :
parse error: after array element, I expect ',' or ']'
"Heading":280,"Speed":124} {"Source":"ADSB","Id":43062,"Fl
(right here) ------^
After inserting commas between the objects in the JSON file, it works with no problem.
[
{"Source":"ADSB","Id":43061,"FlightId":"N668XX","Latitude":44.000083,"Longitude":-96.654788,"Alt":4450},
{"Source":"ADSB","Id":43062,"FlightId":"N683XX","Latitude":44.000083,"Longitude":-96.654788,"Alt":4450},
{"Source":"ADSB","Id":43063,"FlightId":"N652XX","Latitude":44.000083,"Longitude":-96.654788,"Alt":4450}
]
> jsondata = fromJSON("asdf.json")
> names(jsondata)
[1] "Source" "Id" "FlightId" "Latitude" "Longitude" "Alt"
How do I insert commas throughout this JSON file between all of the curly brackets (i.e. "}{" --> "},{")?
Or is there another way for R to read my incomplete JSON file?
I'm less than a novice, so any help is much appreciated, thanks!!
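One way this could be done, as a minimal sketch: read the file as text, insert a comma wherever a closing brace is followed by an opening brace (allowing whitespace and newlines in between), and pass the repaired text to jsonlite. The broken vector below is a shortened, hypothetical stand-in for the lines of asdf.json. The pattern assumes that "}" followed by "{" never occurs inside a string value.

```r
# Shortened stand-in for readLines("asdf.json").
broken <- c(
  "[",
  '{"Source":"ADSB","Id":43061,"FlightId":"N668XX"}',
  '{"Source":"ADSB","Id":43062,"FlightId":"N683XX"}',
  "]"
)

txt <- paste(broken, collapse = "\n")

# Put a comma between a closing brace and the next opening brace.
fixed <- gsub("\\}\\s*\\{", "},\n{", txt, perl = TRUE)
cat(fixed, "\n")

# Parse the repaired text if jsonlite is available.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  jsondata <- jsonlite::fromJSON(fixed)
  print(names(jsondata))
}
```

With the real file, broken would come from readLines("asdf.json") and the rest stays the same.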

Altering Json output in Drupal

I am outputting a JSON document from a view that looks like this:
[
{
"NewsTitle": "asdas",
"ItemDescription": "asdasdasd\r\n",
"NumofViews": "3",
"Likes": "0",
"PostDate": "10 Mar, 2016",
"ImageUrl": "6_n_0.jpg",
"NewsType": "8",
"ShareURL": "",
"VideoURL": "https://www.youtube.com/",
}
]
I want to remove the brackets at the beginning so that it is output in this form:
{
"menu": {
"NewsTitle": "asdas",
"ItemDescription": "asdasdasd\r\n",
"NumofViews": "3",
"Likes": "0",
"PostDate": "10 Mar, 2016",
"ImageUrl": "6_n_0.jpg",
"NewsType": "8",
"ShareURL": "",
"VideoURL": "https://www.youtube.com/",
}
}
This is the configuration I am using:
in Views:
FORMAT
Format:JSON data document | Settings
FIELDS
in Format:JSON data document | Settings
Root object name
*empty*
The name of the root object in the JSON document. e.g nodes or users or forum_posts
Top-level child object
*empty*
The name of each top-level child object in the JSON document. e.g node or user or forum_post
Field output
Normal *chosen*
Raw
For each row in the view, fields can be output as either the field rendered by Views, or by the raw content of the field.
Plaintext output *selected*
For each row in the view, strip all markup from the field output.
Remove newlines
Strip newline characters from the field output.
JSON data format
Simple *selected*
MIT Simile/Exhibit
To be consumed by jqGrid
What object format will be used for JSON output.
JSONP prefix
If used the JSON output will be enclosed with parentheses and prefixed by this label, as in the JSONP format.
Content-Type
Default: application/json *selected*
text/json
application/javascript
The Content-Type header that will be sent with the JSON output.
Views API mode
With Views API mode the JSON will be embedded as normal content so normal page processing is used. Leave it unchecked when JSON should be printed directly to the client.
Object arrays
Outputs an object rather than an array when a non-associative array is used. Especially useful when the recipient of the output is expecting an object and the array is empty.
Numeric strings
Encodes numeric strings as numbers.
Numeric strings *selected*
Encodes large integers as their original string value.
Pretty print *selected*
Use whitespace in returned data to format it.
Unescaped slashes *selected*
Don't escape forward slashes /.
Unescaped unicode
Encode multibyte Unicode characters literally (default is to escape as \uXXXX).
