Use jq to extract a subset of a nested object's fields? - jq

I have a 26MB JSON file containing UN/LOCODE data that I want to restructure and remove some data from so that it takes less space in my app's binary package.
The JSON contains an array of objects like this:
{
"Change": null,
"Coordinates": "4234N 00135E",
"Country": "AD",
"Date": "0307",
"Function": "--3-----",
"IATA": null,
"Location": "CAN",
"Name": "Canillo",
"NameWoDiacritics": "Canillo",
"Remarks": null,
"Status": "RL",
"Subdivision": null
}
The desired structure is an object rather than an array, keyed on the concatenation of the Country and Location fields, but the only nested fields that I am interested in are "Name" and "Coordinates".
I have been able to accomplish the first step with:
jq 'INDEX("\(.Country)-\(.Location)")'
giving me:
{
"AD-CAN": {
"Change": null,
"Coordinates": "4234N 00135E",
"Country": "AD",
"Date": "0307",
"Function": "--3-----",
"IATA": null,
"Location": "CAN",
"Name": "Canillo",
"NameWoDiacritics": "Canillo",
"Remarks": null,
"Status": "RL",
"Subdivision": null
},
...
}
but I cannot figure out how to get only the desired keys from the nested objects inside the new top-level object.
If this can't be done with jq I'll have to resort to a custom script to do it.

Just reshape the objects using map_values:
jq 'INDEX("\(.Country)-\(.Location)") | map_values({Name, Coordinates})'
Or do it all in one go using reduce:
jq 'reduce .[] as $i ({}; ."\($i.Country)-\($i.Location)" = ($i|{Name, Coordinates}))'

You can use map_values, which applies the filter to every property's value in the object:
INDEX("\(.Country)-\(.Location)") | map_values({Name, Coordinates})
Output:
{
"AD-CAN": {
"Name": "Canillo",
"Coordinates": "4234N 00135E"
}
}
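If you do end up with a custom script after all, the same reshaping is short. This is a minimal Python sketch of what the INDEX and map_values pipeline does, not a replacement for it; the function name index_and_pick and the inline sample record are assumptions for illustration, and it assumes Country and Location are always strings.

```python
import json

def index_and_pick(records, fields=("Name", "Coordinates")):
    """Key each record by "Country-Location" and keep only the chosen fields."""
    result = {}
    for rec in records:
        key = f"{rec['Country']}-{rec['Location']}"
        # Later records with the same key overwrite earlier ones, as with INDEX.
        result[key] = {field: rec.get(field) for field in fields}
    return result

# Sample record copied from the question.
records = [{
    "Change": None, "Coordinates": "4234N 00135E", "Country": "AD",
    "Date": "0307", "Function": "--3-----", "IATA": None,
    "Location": "CAN", "Name": "Canillo", "NameWoDiacritics": "Canillo",
    "Remarks": None, "Status": "RL", "Subdivision": None,
}]

slim = index_and_pick(records)
print(json.dumps(slim, indent=2))
```

For the real file you would load the array with json.load and write the result back with json.dump.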


Using nested fields in the projection of a DynamoDB GSI

I've got a Dynamo table storing documents that look like this:
{
"guid": "<some UUID>",
"created_at": 1550778260030,
"display_name": "person",
"updated_at": 1550778260030,
"info": {
"locked": false,
"confirmed": true,
"deactivated": false,
"email": "person#example.com"
}
}
The table has a Global Secondary Index managed by Terraform defined as such:
global_secondary_index {
name = "display_name_index"
hash_key = "display_name"
projection_type = "INCLUDE"
non_key_attributes = [
"updated_at",
"info.email",
"created_at"
]
}
However when I query the table the info.email field isn't returned:
aws dynamodb query \
--table-name "accounts" \
--index-name "display_name_index" \
--key-condition-expression "display_name = :display_name" \
--expression-attribute-values '{":display_name":{"S":"person"}}'
{
"Count": 1,
"Items": [
{
"created_at": {
"N": "1550778260030"
},
"display_name": {
"S": "person"
},
"updated_at": {
"N": "1550778260030"
}
}
],
"ScannedCount": 1,
"ConsumedCapacity": null
}
If I change the non_key_attributes to include info, the query returns the full info blob just fine, and I can then use a projection-expression of info.email to retrieve only that field:
{
"Count": 1,
"Items": [
{
"info": {
"M": {
"email": {
"S": "person#example.com"
}
}
}
}
],
"ScannedCount": 1,
"ConsumedCapacity": null
}
The Dynamo docs do specify that index keys have to be top-level, but they don't mention anything about non-key attributes in a projection having to be top-level. Therefore I'd assume that anything that works in a projection-expression should work in an index projection, but that seems to not be the case?
Am I doing something wrong with this index definition or the query? Or does Dynamo just not support nested non-key attributes as part of an index's projection?
Simply put, a nested attribute cannot be used in a GSI projection. DynamoDB does not support this yet.
I ran into the same thing. I see there are no answers yet to your question. I'm not sure I have the right answer, but maybe this can help you out.
First of all, I think it's very odd that when creating a GSI the API allows you to add a projection of "info.email" (it is also visible on the index overview page), yet the attribute can never be retrieved.
I found that with a GSI you are stuck with the attributes you provided when creating it.
With an LSI, on the other hand, you can use the attributes you provided when creating the LSI.
You can find a little about this in this document (search for "Projected Attributes"):
https://docs.amazonaws.cn/en_us/amazondynamodb/latest/developerguide/SecondaryIndexes.html
I hope you can do something with this info.
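Since the question confirms that projecting the whole info attribute works, one workaround is to project info and pull the nested field out on the client. The following is a minimal Python sketch under that assumption; the helper name extract_emails is hypothetical, and the response dict is hand-copied from the query output in the question rather than fetched from DynamoDB.

```python
def extract_emails(response):
    """Collect info.email from a query response in DynamoDB attribute-value JSON."""
    emails = []
    for item in response.get("Items", []):
        # "M" wraps a map attribute and "S" wraps a string attribute.
        info = item.get("info", {}).get("M", {})
        email = info.get("email", {}).get("S")
        if email is not None:
            emails.append(email)
    return emails

# Response shape copied from the question (index projecting the whole "info" map).
response = {
    "Count": 1,
    "Items": [
        {"info": {"M": {"email": {"S": "person#example.com"}}}}
    ],
    "ScannedCount": 1,
    "ConsumedCapacity": None,
}

emails = extract_emails(response)
print(emails)
```

The trade-off is that the index stores the whole info map, so it is larger than an index holding only the email would be.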

Get vertices with a simpler format

Is there a way to get a list of vertices with a simpler format?
Currently, the following query:
g.V().has(label, 'Quantity').has('text', '627 km');
returns an object like this:
{
"id": 42545168,
"label": "Quantity",
"type": "vertex",
"properties": {
"sentence": [
{
"id": "pkbgi-pbw28-745",
"value": "null"
}
],
"updated_text": [
{
"id": "pk9vm-pbw28-5j9",
"value": "627 km"
}
],[...]
And when I get a list of edges it is formatted in a simpler format:
g.E().has(label, 'locatedAt').has('out_entity_id','41573-41579');
returns:
{
"id": "ozfnt-ip8o-2mtx-g8vs",
"label": "locatedAt",
"type": "edge",
"inVLabel": "Location",
"outVLabel": "Location",
"inV": 758008,
"outV": 872520,
"properties": {
"sentence": "Bolloré is a corporation (société anonyme) with a Board of Directors whose registered office is located at Odet, 29500 Ergué-Gabéric in France.",
"in_entity_id": "41544-41548",
"score": "0.795793",
"out_entity_id": "41573-41579"
}
}
How so?
Is there a way to get vertices formatted this way?
My advice is that, rather than having your query return the whole vertex, you return only the specific properties you are interested in: for example the vertex ID, a few selected properties, or a valueMap. Otherwise what you get back is essentially everything. This is the same principle as avoiding "select *" in SQL and selecting only what you really care about.
Edited to add an example that returns the IDs of matching vertices.
g.V().has(label, 'Quantity').has('text', '627 km').id().fold()
This will yield a result that looks like this:
{"requestId":"73f40519-87c8-4037-a9fc-41be82b3b227","status":{"message":"","code":200,"attributes":{}},"result":{"data":[[20608,28920,32912,106744,123080,135200,139296,143464,143488,143560,151584,155688,155752,159784,188520,254016,282688,286968,311360,323832,348408,4344,835648,8336,1343616,12352]],"meta":{}}}
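If you still need whole vertices and cannot change the query, the vertex JSON can also be flattened on the client so that it resembles the edge format. This is only a sketch in Python, assuming the vertex shape shown in the question (each property is a list of objects with "id" and "value"); the function name simplify_vertex is hypothetical, and the sample contains just the two properties visible in the truncated output.

```python
def simplify_vertex(vertex):
    """Collapse each property's list of {id, value} entries into plain values."""
    props = {}
    for name, entries in vertex.get("properties", {}).items():
        values = [entry["value"] for entry in entries]
        # Single-valued properties become a scalar; multi-valued ones stay a list.
        props[name] = values[0] if len(values) == 1 else values
    return {
        "id": vertex["id"],
        "label": vertex["label"],
        "type": vertex["type"],
        "properties": props,
    }

# The two properties visible in the question's (truncated) vertex output.
vertex = {
    "id": 42545168,
    "label": "Quantity",
    "type": "vertex",
    "properties": {
        "sentence": [{"id": "pkbgi-pbw28-745", "value": "null"}],
        "updated_text": [{"id": "pk9vm-pbw28-5j9", "value": "627 km"}],
    },
}

simple = simplify_vertex(vertex)
print(simple["properties"])
```

Selecting the properties in the traversal itself, as suggested above, still transfers less data than flattening afterwards.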

How to gather Freebase Aliases for type location/location?

I'd like to pull information (MID and US English name) about all locations in Freebase AND also their Korean names and any Korean aliases via an MQL query. This is as far as I've gotten:
[{
"id": null,
"name": null,
"mid": null,
"type": "/location/location",
"Korean:name": [{
"lang": "/lang/ko",
"value": null
}]
}]
I'm only getting the Korean name, but not any Korean aliases. I don't know how to write a query that outputs properties of two different types in the same query. Can you get data about both /location/location AND /common/topic/alias for the same entity in the same MQL query/output? Is my approach just wrong here?
Any help appreciated.
When you need to combine properties from many different types you need to use the fully qualified property ID like this:
[{
"id": null,
"name": null,
"mid": null,
"type": "/location/location",
"Korean:name": [{
"lang": "/lang/ko",
"value": null
}],
"/common/topic/alias": [{
"lang": "/lang/ko",
"value": null,
"optional": true
}]
}]
Whenever you use shortened property IDs, they are assumed to belong to the type you specify in your query (or /type/object if no type is given). So, for example, if you were to use "geolocation" in your query, it would be interpreted as "/location/location/geolocation". The only exceptions are "id", "name" and "type", which you can use without spelling out the full IDs, e.g. "/type/object/name".
You'll also note that I made aliases "optional" so that it would return results for locations that don't have any aliases.

freebase city regions above neighborhood

Using Freebase, how can I find, say, all the boroughs/subcities of NY (Queens, Brooklyn, etc.)?
And will it be similar to other cities? Say if I want to know the subdivisions of Prague (Zizkov, Old Town, etc.) or Berlin, etc?
I've tried various combos but haven't hit one yet.
{
"id": "/en/new_york",
"guid": null,
"name": null,
"/location/location/containedby": [
],
"/location/location/contains" : [],
"/location/place_with_neighborhoods/neighborhoods": [
]
}
The property /location/location/contains is the one that you want, but you're going to have two problems:
1. It's only sparsely populated
2. It has multiple levels of containment as a hack to work around API limitations
There's not much you can do about #1 unless you want to work on improving the data yourself. For #2, you can subtract the set of locations which are contained in another location in the "contains" set.
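The subtraction for #2 can be sketched in Python as follows. This is an illustration of the set logic only, not a Freebase API call: the function name top_level and the sample mapping are assumptions, and the sample is hand-written rather than real query output.

```python
def top_level(contains_of):
    """Return the locations that are not contained in any other location of the set.

    contains_of maps each location in the parent's "contains" set to the set of
    locations that it contains itself.
    """
    nested = set()
    for inner in contains_of.values():
        nested |= inner
    # Anything that shows up inside a sibling is a deeper level of containment.
    return set(contains_of) - nested

# Hand-written sample: neighborhoods appear alongside the boroughs containing them.
sample = {
    "Brooklyn": {"Williamsburg"},
    "Queens": {"Astoria"},
    "Williamsburg": set(),
    "Astoria": set(),
}

boroughs = top_level(sample)
print(sorted(boroughs))
```

To build the mapping you would need the "contains" list of every location returned for the city, which costs either a nested query or one extra query per location.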
Someone might be able to give a better answer, but this will get the major districts for a city like NY. It probably won't work for smaller cities, whose subdivisions are more like regions.
{
"id": "/en/new_york",
"guid": null,
"name": null,
"/location/location/containedby": [
],
"/location/location/contains" : [{
"name": null,
"type": "/location/citytown"
}]
}
Or, to match any of several types, something like this might do it:
{
"id": "/en/new_york",
"guid": null,
"name": null,
"/location/location/containedby": [
],
"/location/location/contains" : [{
"name": null,
"type|=" : [
"/location/citytown",
"/location/neighborhood",
"/location/administrative_division",
"/location/de_borough",
"/location/place_with_neighborhoods/neighborhoods"
]
}]
}

Freebase: Get name & Wikipedia ID in one query in a certain language

Is it possible to do one query in MQL to obtain the name and the wikipedia ID for a certain language from Freebase? If that's possible, is it also possible to do this for a set of languages (eg. german & english)?
Asked and answered, but here's a slightly better form of the query:
[{
"id": "/en/white_house",
"mid": null,
"de:name": {
"lang": "/lang/de",
"value": null
},
"en:name": null,
"wiki_de:key": {
"/type/key/namespace": "/wikipedia/de_id",
"value": null,
"optional": true
},
"wiki_en:key": {
"/type/key/namespace": "/wikipedia/en_id",
"value": null,
"optional": true
}
}]
The Wikipedia keys will be escaped if they contain special characters, so you should consult
http://wiki.freebase.com/wiki/MQL_key_escaping for how to unescape them.
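As a rough illustration of the unescaping step, here is a minimal Python sketch. It assumes the scheme is a dollar sign followed by four hexadecimal digits of the character's code point (for example $0028 for an opening parenthesis); check the page above before relying on it. The function name unescape_key and the sample key are made up for the example.

```python
import re

# Assumed escape form: "$" followed by exactly four hex digits.
_ESCAPE = re.compile(r"\$([0-9A-Fa-f]{4})")

def unescape_key(key):
    """Replace each $XXXX escape with the character at that code point."""
    return _ESCAPE.sub(lambda match: chr(int(match.group(1), 16)), key)

# Made-up key for illustration, not taken from Freebase.
decoded = unescape_key("Foo$0028bar$0029")
print(decoded)
```

Keys without any escapes pass through unchanged, so the function can be applied to every value returned by the query.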
Some of the reasons this query is better include:
English is the default language, so it doesn't need to be specified for names.
It removes the ambiguity of the namespace lookup. Your original query is actually looking for the key "white_house" in any namespace (and finding it in "/en", which is equivalent to the id "/en/white_house").
Note that you don't need to do the lookup by ID. You can use any lookup facility that MQL provides such as looking up by one of the Wikipedia keys or using "name~=":"white house" to find all topics containing that string or anything else that works for your starting data and your use case.
After playing around with MQL I finally came up with the following query (for white_house):
[{
"id": null,
"mid": null,
"de:name": {
"lang": "/lang/de",
"value": null
},
"en:name": {
"lang": "/lang/en",
"value": null
},
"key": {
"value": "white_house"
},
"wiki_de:key": {
"/type/key/namespace": "/wikipedia/de_id",
"value": null
},
"wiki_en:key": {
"/type/key/namespace": "/wikipedia/en_id",
"value": null
}
}]
