Get all possible keys / key paths in a JSON string in R

How can I get all the different possible JSON paths in a JSON string? Often I get huge JSON strings and want to see every key path they contain.
For example, I would like to get something back like:
result <- data.frame(paths = c('name',
                               'name.first',
                               'name.last',
                               'address',
                               'address.city',
                               'address.state',
                               'age',
                               'income',
                               'block'))
result
given something like this...
myjson <- '{
  "name": {
    "first": "jack",
    "last": "smith"
  },
  "address": {"city": "bigtown", "state": "texas"},
  "age": "21",
  "income": "123",
  "block": ["abc", "xyz"]
}'
I've tried experimenting with jsonlite::fromJSON, but that doesn't seem to get me exactly what I'm after.

This will get you the full paths:
data.frame(result = names(as.data.frame(jsonlite::fromJSON(myjson))))
result
1 name.first
2 name.last
3 address.city
4 address.state
5 age
6 income
7 block
If you need all partial paths along with all full paths:
data.frame(
  result = sort(unique(c(
    names(jsonlite::fromJSON(myjson)),
    names(as.data.frame(jsonlite::fromJSON(myjson)))
  )))
)
result
1 address
2 address.city
3 address.state
4 age
5 block
6 income
7 name
8 name.first
9 name.last
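If you would rather not go through as.data.frame at all, a small recursive helper can collect the same paths from the parsed list. This is a minimal sketch (the all_paths name is just for illustration), assuming the JSON is parsed to plain nested lists with simplifyVector = FALSE:
library(jsonlite)

# walk a nested named list and collect every partial and full key path
all_paths <- function(x, prefix = NULL) {
  if (!is.list(x) || is.null(names(x))) return(character(0))
  unlist(lapply(names(x), function(nm) {
    path <- if (is.null(prefix)) nm else paste(prefix, nm, sep = ".")
    c(path, all_paths(x[[nm]], path))
  }))
}

sort(all_paths(fromJSON(myjson, simplifyVector = FALSE)))
Unnamed arrays such as block are reported as a single path, which matches the output shown above.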

Related

I'm having trouble with parsing a JSON file

I am attempting to use a .json file I found online, but I'm starting to think that there is an underlying issue with the file. I am not very knowledgeable in .json files, so I am trying to convert it into a CSV file. I have yet to find a website that can do that for me.
I've tried using R to convert the file, since the file is also quite large and I can only assume that most websites have a size limit. I have tried flattening it in R with this code:
library(jsonlite)
library(tidyr)
library(tidyverse)
json_string <- readLines("data.json")
json_data <- fromJSON(json_string)
json_data <- flatten(json_data)
df <- as_data_frame(json_data)
write_csv(df, "output.csv")
but it returns this error:
! Tibble columns must have compatible sizes.
* Size 2: Columns `A-Alrund, God of the Cosmos // A-Hakka, Whispering Raven`, `A-Blessed Hippogriff // A-Tyr's Blessing`, `A-Emerald Dragon // A-Dissonant Wave`, `A-Monster Manual // A-Zoological Study`, `A-Rowan, Scholar of Sparks // A-Will, Scholar of Frost`, and 484 more.
* Size 3: Column `Smelt // Herd // Saw`.
* Size 5: Column `Who // What // When // Where // Why`.
* Size 6: Columns `Everythingamajig`, `Garbage Elemental`, `Ineffable Blessing`, `Knight of the Kitchen Sink`, `Scavenger Hunt`, and 4 more.
i Only values of size one are recycled.
Backtrace:
1. tibble::as_data_frame(json_data)
3. tibble:::as_tibble.list(x, ...)
4. tibble:::lst_to_tibble(x, .rows, .name_repair, col_lengths(x))
5. tibble:::recycle_columns(x, .rows, lengths)
Here is what the first 2 items of the .json file look like
{"data": {"\"Ach! Hans, Run!\"": [{"colorIdentity": ["G", "R"], "colors": ["G", "R"], "convertedManaCost": 6.0, "foreignData": [], "identifiers": {"scryfallOracleId": "a2c5ee76-6084-413c-bb70-45490d818374"}, "isFunny": true, "layout": "normal", "legalities": {}, "manaCost": "{2}{R}{R}{G}{G}", "manaValue": 6.0, "name": "\"Ach! Hans, Run!\"", "printings": ["UNH"], "purchaseUrls": {"cardKingdom": "https://mtgjson.com/links/84dfefe718a51cf8", "cardKingdomFoil": "https://mtgjson.com/links/d8c9f3fc1e93c89c", "cardmarket": "https://mtgjson.com/links/b9d69f0d1a9fb80c", "tcgplayer": "https://mtgjson.com/links/c51d2b13ff76f1f0"}, "rulings": [], "subtypes": [], "supertypes": [], "text": "At the beginning of your upkeep, you may say \"Ach! Hans, run! It's the . . .\" and the name of a creature card. If you do, search your library for a card with that name, put it onto the battlefield, then shuffle. That creature gains haste. Exile it at the beginning of the next end step.", "type": "Enchantment", "types": ["Enchantment"]}], "\"Brims\" Barone, Midway Mobster": [{"colorIdentity": ["B", "W"], "colors": ["B", "W"], "convertedManaCost": 5.0, "foreignData": [], "identifiers": {"scryfallOracleId": "c64c31f2-c1be-414e-9dff-c3b77ba97545"}, "isFunny": true, "layout": "normal", "leadershipSkills": {"brawl": false, "commander": true, "oathbreaker": false}, "legalities": {}, "manaCost": "{3}{W}{B}", "manaValue": 5.0, "name": "\"Brims\" Barone, Midway Mobster", "power": "5", "printings": ["UNF"], "purchaseUrls": {"cardKingdom": "https://mtgjson.com/links/d1e320bd9d6813c0", "cardKingdomFoil": "https://mtgjson.com/links/18f86e8a04682c34", "cardmarket": "https://mtgjson.com/links/d5a3d8cfb60767d4", "tcgplayer": "https://mtgjson.com/links/980f45f2bc8c3733"}, "rulings": [], "subtypes": ["Human", "Rogue"], "supertypes": ["Legendary"], "text": "When \"Brims\" Barone, Midway Mobster enters the battlefield, put a +1/+1 counter on each other creature you control that has a hat.\n\"Brims\" Barone, Midway Mobster has menace as long as you're wearing a hat.", "toughness": "4", "type": "Legendary Creature — Human Rogue", "types": ["Creature"]}]}
I am hoping that the resulting csv file has the keys as the column names, and the values to be assigned to the columns based on their keys.
EDIT:
I have now attached a screenshot of what the json_data structure looks like.
Assuming it's one of the JSON dumps from scryfall, try this:
library(jsonlite)
library(tidyverse)   # attaches tidyr, purrr, readr, tibble, etc.

todo <- list.files(pattern = "\\.json$")            # assumes exactly one .json dump in the working directory
json_data <- fromJSON(todo)
json_data_flat_jsl <- jsonlite::flatten(json_data)  # explicitly jsonlite::flatten; a bare flatten() would be purrr's once the tidyverse is attached
df <- as_tibble(json_data_flat_jsl)
write_csv(df, "output.csv")
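For what it's worth, the error itself comes from handing as_tibble() / as_data_frame() a named list whose elements have different numbers of rows: a tibble requires every column, including data-frame columns, to have the same length. A minimal illustration with made-up data:
library(tibble)

# two "cards" with different row counts, like the card entries in the dump above
x <- list(a = data.frame(v = 1:2), b = data.frame(v = 1:3))
as_tibble(x)   # Error: Tibble columns must have compatible sizes.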

Group nested array objects to parent key in JQ

I have JSON coming from an external application, formatted like so:
{
  "ticket_fields": [
    {
      "url": "https://example.com/1122334455.json",
      "id": 1122334455,
      "type": "tagger",
      "custom_field_options": [
        {
          "id": 123456789,
          "name": "I have a problem",
          "raw_name": "I have a problem",
          "value": "help_i_have_problem",
          "default": false
        },
        {
          "id": 456789123,
          "name": "I have feedback",
          "raw_name": "I have feedback",
          "value": "help_i_have_feedback",
          "default": false
        }
      ]
    },
    {
      "url": "https://example.com/6677889900.json",
      "id": 6677889900,
      "type": "tagger",
      "custom_field_options": [
        {
          "id": 321654987,
          "name": "United States",
          "raw_name": "United States",
          "value": "location_123_united_states",
          "default": false
        },
        {
          "id": 987456321,
          "name": "Germany",
          "raw_name": "Germany",
          "value": "location_456_germany",
          "default": false
        }
      ]
    }
  ]
}
The end goal is to be able to get the data into a TSV in the sense that each object in the custom_field_options array is grouped by the parent ID (ticket_fields.id), and then transposed such that each object would be represented on a single line, like so:
Ticket Field ID   Name               Value
1122334455        I have a problem   help_i_have_problem
1122334455        I have feedback    help_i_have_feedback
6677889900        United States      location_123_united_states
6677889900        Germany            location_456_germany
I have been able to export the data successfully to TSV already, but it reads per-line, and without preserving order, like so:
Using jq -r '.ticket_fields[] | select(.type=="tagger") | [.id, .custom_field_options[].name, .custom_field_options[].value] | @tsv'
Ticket Field ID   Name               Name              Value                        Value
1122334455        I have a problem   I have feedback   help_i_have_problem          help_i_have_feedback
6677889900        United States      Germany           location_123_united_states   location_456_germany
Each of the custom_field_options arrays in production may consist of any number of objects (not limited to 2 each). But I seem to be stuck on how to appropriately group or map these objects to their parent ticket_fields.id and to transpose the data in a clean manner. The select(.type=="tagger") is mentioned in the query as there are multiple values for ticket_fields.type which need to be filtered out.
Based on another answer on here, I did try variants of jq -r '.ticket_fields[] | select(.type=="tagger") | map(.custom_field_options |= from_entries) | group_by(.custom_field_options.ticket_fields) | map(map( .custom_field_options |= to_entries))' without success. Any assistance would be greatly appreciated!
You need two nested iterations, one in each array. Save the value of .id in a variable to access it later.
jq -r '
  .ticket_fields[] | select(.type=="tagger") | .id as $id
  | .custom_field_options[] | [$id, .name, .value]
  | @tsv
'
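Applied to the sample input above, this emits one tab-separated row per custom field option, keyed by the parent ticket field id, which matches the desired layout:
1122334455    I have a problem    help_i_have_problem
1122334455    I have feedback     help_i_have_feedback
6677889900    United States       location_123_united_states
6677889900    Germany             location_456_germany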

Is it possible to explode JSON array on ingestion stage?

Azure Data Explorer is receiving data through Event Hub subscription. The payload is compressed JSON of the type:
{
  "foo": "bar",
  "why": 42,
  "data": [
    {"field1": "abc", "field2": 123},
    {"field1": "xyz", "field2": 456},
    {"field1": "pqr", "field2": 789}
  ]
}
I need to convert data into tabular format:
field1 field2
-------------
abc 123
xyz 456
pqr 789
or even better:
foo why field1 field2
---------------------------
bar 42 abc 123
bar 42 xyz 456
bar 42 pqr 789
I would need to create an ingestion mapping, which is a kind of data mapping, but looking at the path syntax I cannot figure out how to express such a transformation.
Is it possible? If not, what is the best way to set up such a transformation during ingestion?
You can achieve that using an update policy.
There's an example you can follow here: https://learn.microsoft.com/en-us/azure/data-explorer/ingest-json-formats?tabs=kusto-query-language#ingest-json-records-containing-arrays
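For illustration, here is a rough sketch of that pattern, assuming the compressed JSON lands in a staging table with a single dynamic column; all names below (RawEvents, Flattened, ExpandData) are placeholders, and the linked article is the authoritative walkthrough:
.create table RawEvents (Payload: dynamic)

.create table Flattened (foo: string, why: int, field1: string, field2: int)

// explode the "data" array: one output row per array element
.create function ExpandData() {
    RawEvents
    | mv-expand d = Payload.data
    | project foo = tostring(Payload.foo),
              why = toint(Payload.why),
              field1 = tostring(d.field1),
              field2 = toint(d.field2)
}

// run ExpandData() on every ingestion into RawEvents and append the result to Flattened
.alter table Flattened policy update
    @'[{"IsEnabled": true, "Source": "RawEvents", "Query": "ExpandData()", "IsTransactional": true}]'
With this in place, every ingested document yields one row per element of the data array, with foo and why repeated on each row.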

Find an edge that is already connected with vertices to a specific vertex ID, and merge it with the result

Consider the people-list scenario of a Facebook-style search result. I want to get all the people from the database (hasLabel('person')), and for each of these people I want to know whether the logged-in person is already connected to them and follows them. What is the best solution to get this in Gremlin (ideally avoiding duplication)?
g.addV('person').property('id',1).as('1').
addV('person').property('id',2).as('2').
addV('person').property('id',3).as('3').
addV('person').property('id',4).as('4').
addE('connected').from('1').to('2').
addE('connected').from('2').to('3').
addE('connected').from('3').to('1').
addE('connected').from('4').to('2').
addE('follows').from('1').to('2').
addE('follows').from('1').to('3').
addE('follows').from('1').to('4').
addE('follows').from('2').to('1').
addE('follows').from('2').to('3').
addE('follows').from('3').to('1').
addE('follows').from('3').to('4').
addE('follows').from('4').to('2').
addE('follows').from('4').to('3').iterate()
For instance, if the logged-in person id is 2, the formatted JSON response will be
[
  {
    "id": 1,
    "follows": true,
    "connected": true
  },
  {
    "id": 3,
    "follows": true,
    "connected": false
  },
  {
    "id": 4,
    "follows": false,
    "connected": true
  }
]
and if the logged-in person id is 4
[
  {
    "id": 1,
    "follows": false,
    "connected": false
  },
  {
    "id": 2,
    "follows": true,
    "connected": true
  },
  {
    "id": 3,
    "follows": true,
    "connected": false
  }
]
Note: the JSON responses are only provided to illustrate the desired outcome; I just want the Gremlin query that produces it.
Below is the general pattern you are looking for; however, based on the script you listed above and the direction of the edges, it's unclear exactly when to return true and when not to.
g.V().
  hasLabel('person').
  not(has('id', 2)).                                          // exclude the logged-in person (id 2)
  project('id', 'follows', 'connected').
    by('id').
    by(__.in('follows').
         has('id', 2).                                        // does person 2 follow this person?
         fold().                                              // fold to a list (empty if no such edge)
         coalesce(unfold().constant(true), constant(false))). // true if the edge exists, else false
    by(__.out('connected').
         has('id', 2).                                        // does this person have an outbound 'connected' edge to person 2?
         fold().
         coalesce(unfold().constant(true), constant(false)))
Based on the script you provided, there is no way to get exactly the answers you asked for. Let's look at just the connected edges.
For vertex 2 (looking at the connected edges incident to it):
using in() we would get true for 1 and 4, and false for 3
using out() we would get true for 3, and false for 1 and 4
using both() all would be true
So based on the results above it looks like you want the in() edges. However, when we apply the same logic to vertex 4, all the results would be false.

How to use the function "table:get" (table extension) when 2 keys are required?

I have a file .txt with 3 columns: ID-polygon-1, ID-polygon-2 and distance.
When I import my file into NetLogo, I obtain 3 lists [[list1][list2][list3]], which correspond to the 3 columns.
I used table:from-list to create a table with the content of the 3 lists.
I obtain {{table: [[1 1] [67 518] [815 127]]}} (the table displays the first two lines of my dataset).
For example, I would like to get the value of distance (list3) between ID-polygon-1 = 1 (list1) and ID-polygon-2 = 67 (list2), that is, 815.
How can I use table:get table key when I need two keys (ID-polygon-1 and ID-polygon-2)?
Thanks very much for your help.
Using table:from-list will not help you there: it expects "a list of two element lists, or pairs", where "the first element in the pair is the key and the second element is the value." That's not what you have in your original list.
Furthermore, NetLogo tables (and associative arrays in general) cannot have two keys. They are always just key-value pairs. Nothing prevents the value from being another table, however, and in your case, that is what you need: a table of tables!
There is no primitive to build that directly, however. You will need to build it yourself:
extensions [ table ]
globals [ t ]

to setup
  let lists [
    [ 1 1 ]     ; ID-polygon-1 column
    [ 67 518 ]  ; ID-polygon-2 column
    [ 815 127 ] ; distance column
  ]
  set t table:make
  ; note: the "?" task syntax below is NetLogo 5; NetLogo 6 uses anonymous procedures instead
  foreach n-values length first lists [ ? ] [
    let id1 item ? (item 0 lists)
    let id2 item ? (item 1 lists)
    let dist item ? (item 2 lists)
    if not table:has-key? t id1 [
      table:put t id1 table:make
    ]
    table:put (table:get t id1) id2 dist
  ]
end
Here is what you get when you print the resulting table:
{{table: [[1 {{table: [[67 815] [518 127]]}}]]}}
And here is a small reporter to make it convenient to get a distance from the table:
to-report get-dist [ id1 id2 ]
  report table:get (table:get t id1) id2
end
Using get-dist 1 67 will give the 815 result you were looking for.
