HIVE: how to access elements in a MAP object - dictionary

In my Hive table MYTABLE I have a column "MYCOL" that contains this:
{"id": "a651b57f",
"items": {
"ITEM1": {
"code": "CODE1",
"name": "NAME1"},
"ITEM2": {
"code": "CODE2",
"name": "NAME2"}},
"myinfo": {
"c7daf1a9": {
"id": "c7daf1a9",
"name": "newname",
"type": "newtype",
"appliedto": ["ITEM1", "ITEM2"]}},
"info2": 12}
I would like to access the elements in "myinfo", so I tried something like this:
select GET_JSON_OBJECT(t.MYCOL,'$.myinfo') FROM MYTABLE
but it doesn't work.
Can someone help me?
Thanks

Make sure the data in the HDFS file has exactly one line per JSON row (not one row spread over multiple lines).
If a JSON row contains newlines, remove them from each row before storing the file in HDFS.
Example:
HDFS file data:
{"id": "a651b57f","items": {"ITEM1": {"code": "CODE1","name": "NAME1"},"ITEM2": {"code": "CODE2","name": "NAME2"}},"myinfo": {"c7daf1a9": {"id": "c7daf1a9","name": "newname","type": "newtype","appliedto": ["ITEM1", "ITEM2"]}},"info2": 12}
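Why the layout matters: Hive's default text format reads every line as a separate row, so with the pretty-printed layout from the question MYCOL would presumably hold fragments such as `{"id": "a651b57f",`. Those are not valid JSON, and get_json_object returns NULL for invalid JSON. A minimal JavaScript sketch of the newline cleanup, assuming the file is a sequence of complete top-level JSON objects or arrays (`toOneRecordPerLine` is a hypothetical helper name): it finds each top-level value by tracking bracket depth, ignoring brackets inside string literals, and re-serialises it with JSON.stringify, which emits no newlines when no indent is given.

```javascript
// Split text that contains one or more (possibly pretty-printed) JSON
// values into newline-delimited JSON: exactly one record per line.
// Assumes every top-level value is an object or an array.
function toOneRecordPerLine(text) {
  const records = [];
  let depth = 0;        // current nesting depth of {} / []
  let inString = false; // inside a "..." string literal?
  let escaped = false;  // previous char was a backslash inside a string?
  let start = -1;       // index where the current top-level value began

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      // Brackets inside strings must not change the depth.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === "{" || ch === "[") {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === "}" || ch === "]") {
      depth--;
      if (depth === 0) {
        // JSON.stringify without an indent argument emits no newlines.
        records.push(JSON.stringify(JSON.parse(text.slice(start, i + 1))));
      }
    }
  }
  return records.join("\n");
}

// The first record is a trimmed version of the question's data;
// the second one is made up for illustration.
const pretty = `{"id": "a651b57f",
"items": {
"ITEM1": {"code": "CODE1", "name": "NAME1"}},
"info2": 12}
{"id": "b2",
"info2": 13}`;

console.log(toOneRecordPerLine(pretty));
// {"id":"a651b57f","items":{"ITEM1":{"code":"CODE1","name":"NAME1"}},"info2":12}
// {"id":"b2","info2":13}
```

Re-parsing rather than deleting "\n" characters blindly also validates each record on the way, so a truncated record fails loudly here instead of silently turning into NULLs in Hive.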
Hive:
with cte as (select string('{"id": "a651b57f","items": {"ITEM1": {"code": "CODE1","name": "NAME1"},"ITEM2": {"code": "CODE2","name": "NAME2"}},"myinfo": {"c7daf1a9": {"id": "c7daf1a9","name": "newname","type": "newtype","appliedto": ["ITEM1", "ITEM2"]}},"info2": 12}') as my_col) --sample data
select get_json_object(my_col,'$.myinfo') as jsn from cte;
Output:
{"c7daf1a9":{"id":"c7daf1a9","name":"newname","type":"newtype","appliedto":["ITEM1","ITEM2"]}}
Update
-- to access the name subfield, specify the full JSON path to it
hive> select get_json_object(my_col,'$.myinfo.c7daf1a9.name') as jsn from <table_name>;
--result
newname
hive> select get_json_object(my_col,'$.myinfo.c7daf1a9.appliedto') as jsn from <table_name>;
--result
["ITEM1","ITEM2"]
hive> select get_json_object(my_col,'$.myinfo.c7daf1a9.appliedto[0]') as jsn from <table_name>;
--result
ITEM1
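To make the path syntax concrete, here is a simplified JavaScript model of how a path such as `$.myinfo.c7daf1a9.appliedto[0]` is walked: start at the root `$`, follow each `.key` or `[index]` step, return strings unquoted and objects/arrays as compact JSON text, and return null for a missing path or invalid JSON. It is a sketch under those assumptions, not Hive's implementation: the hypothetical `getJsonObject` helper rejects anything beyond `.key` and `[n]`, whereas Hive also accepts a `*` wildcard inside `[]`.

```javascript
// Simplified model of resolving a get_json_object-style path such as
// '$.myinfo.c7daf1a9.appliedto[0]'. Handles only '.key' and '[index]'
// steps; this is an illustration, not Hive's implementation.
function getJsonObject(jsonText, path) {
  if (!path.startsWith("$")) return null;

  // Tokenise '$.a.b[0]' into ['a', 'b', 0], rejecting anything else.
  const steps = [];
  const stepRe = /\.([^.[\]]+)|\[(\d+)\]/g;
  let consumed = 1; // the leading '$'
  let m;
  while ((m = stepRe.exec(path)) !== null) {
    if (m.index !== consumed) return null; // unsupported syntax in between
    steps.push(m[1] !== undefined ? m[1] : Number(m[2]));
    consumed = stepRe.lastIndex;
  }
  if (consumed !== path.length) return null;

  let node;
  try {
    node = JSON.parse(jsonText);
  } catch (e) {
    return null; // invalid JSON -> null
  }

  // Walk down one step at a time; a missing key or index yields null.
  for (const step of steps) {
    if (node === null || typeof node !== "object") return null;
    node = node[step];
    if (node === undefined) return null;
  }

  if (node === null) return null;
  // Strings are returned unquoted; objects and arrays as compact JSON text.
  return typeof node === "string" ? node : JSON.stringify(node);
}

const myCol = '{"id":"a651b57f","myinfo":{"c7daf1a9":{"id":"c7daf1a9","name":"newname","type":"newtype","appliedto":["ITEM1","ITEM2"]}},"info2":12}';

console.log(getJsonObject(myCol, "$.myinfo.c7daf1a9.name"));         // newname
console.log(getJsonObject(myCol, "$.myinfo.c7daf1a9.appliedto"));    // ["ITEM1","ITEM2"]
console.log(getJsonObject(myCol, "$.myinfo.c7daf1a9.appliedto[0]")); // ITEM1
```

Note that the key `c7daf1a9` has to appear literally in the path; if that key differs from row to row, a fixed get_json_object path cannot reach it.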

Related

How to extrapolate values in one AWS CLI output with values from two separate CLI outputs as input files?

I am trying to build an audit/compliance report from IAM Identity Center. We need a list of groups and their respective members. We currently have 1,500+ users and 700+ groups across 120 AWS accounts.
There isn't a single API command that outputs this data, so I'm combining a few commands to extract users, groups and memberships to files in CloudShell. Then I need to cross-reference them and put everything into a CSV that the auditors can filter in Excel.
Retrieve UserName and UserID - store in UserIds.json
aws identitystore list-users --identity-store-id d-123456789 | jq '.Users[] | {Name: .UserName, ID: .UserId}' > UserIds.json
Retrieve Groups and GroupIDs - store in GroupsID.json
aws identitystore list-groups --identity-store-id d-123456789| jq '.Groups[] | {GroupName: .DisplayName, ID:.GroupId}' > GroupsID.json
Retrieve list of All Users per Group - store in GroupMembers.json
result=$(aws identitystore list-groups --identity-store-id d-123456789 | jq -r '.Groups[].GroupId')
for val in $result; do
aws identitystore list-group-memberships --identity-store-id d-123456789 --group-id "$val" | jq '.GroupMemberships[] |
{GroupID: .GroupId, Member: .MemberId.UserId}' >> GroupMembers.json
done
Example output from UserIds.json:
{
"Name": "first.last#example.com",
"ID": "123456789-9876543210-ABCD-4321-1234"
}
{
"Name": "last.first#example.com",
"ID": "12345678-4321-1234-2233-9876543210"
}
Example output from GroupsID.json:
{
"GroupName": "sso-aws-zone-role-CloudCoreOps",
"ID": "123456789-55668877-1234-5522-2255-987654321"
}
{
"GroupName": "sso-aws-zone-role-CloudCoreRO",
"ID": "1234567890-11224455-2255-5522-1343-9876543210"
}
Example output from GroupMembers.json:
{
"GroupID": "123456789-55668877-1234-5522-2255-987654321",
"Member": "123456789-9876543210-ABCD-4321-1234"
}
{
"GroupID": "1234567890-11224455-2255-5522-1343-9876543210",
"Member": "12345678-4321-1234-2233-9876543210"
}
Now I just need to correlate them. I have read that you can use jq like sed, so I should be able to replace the values in GroupMembers.json: first replace each GroupID with the matching GroupName from GroupsID.json, then replace each Member with the user name whose ID matches in UserIds.json.
I think this can be done in a loop, but I want to learn not only how to do this, but also the best way to do it.
It should be doable with jq's built-in INDEX and JOIN, nested two levels deep:
jq --slurpfile users UserIds.json --slurpfile groups GroupsID.json '
JOIN($groups | INDEX(.ID);
JOIN($users | INDEX(.ID); .; .Member; add);
.GroupID; add) | {Name, GroupName}
' GroupMembers.json
{
"Name": "first.last#example.com",
"GroupName": "sso-aws-zone-role-CloudCoreOps"
}
{
"Name": "last.first#example.com",
"GroupName": "sso-aws-zone-role-CloudCoreRO"
}
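Since the end goal is a CSV for Excel, the same two lookups can also be written as an ordinary hash join. This JavaScript sketch builds a Map from ID to name for users and for groups (the role INDEX(.ID) plays in the jq program), resolves each membership, and quotes every field CSV-style. It assumes the three files have already been turned into JSON arrays (for example with `jq -s . UserIds.json`) and inlines the sample records from the question instead of reading files; `membershipsToCsv` and `csvField` are hypothetical names.

```javascript
// Sample records from the question, as arrays (e.g. after `jq -s .`).
const users = [
  { Name: "first.last#example.com", ID: "123456789-9876543210-ABCD-4321-1234" },
  { Name: "last.first#example.com", ID: "12345678-4321-1234-2233-9876543210" },
];
const groups = [
  { GroupName: "sso-aws-zone-role-CloudCoreOps", ID: "123456789-55668877-1234-5522-2255-987654321" },
  { GroupName: "sso-aws-zone-role-CloudCoreRO", ID: "1234567890-11224455-2255-5522-1343-9876543210" },
];
const memberships = [
  { GroupID: "123456789-55668877-1234-5522-2255-987654321", Member: "123456789-9876543210-ABCD-4321-1234" },
  { GroupID: "1234567890-11224455-2255-5522-1343-9876543210", Member: "12345678-4321-1234-2233-9876543210" },
];

// Quote a CSV field: wrap in double quotes and double any embedded quotes.
function csvField(value) {
  return '"' + String(value).replace(/"/g, '""') + '"';
}

function membershipsToCsv(users, groups, memberships) {
  // Build lookup tables once, so each membership is resolved in O(1)
  // instead of rescanning the user and group lists.
  const userName = new Map(users.map(u => [u.ID, u.Name]));
  const groupName = new Map(groups.map(g => [g.ID, g.GroupName]));

  const rows = [["GroupName", "UserName"]];
  for (const m of memberships) {
    rows.push([
      groupName.get(m.GroupID) ?? m.GroupID, // keep the raw ID if unknown
      userName.get(m.Member) ?? m.Member,
    ]);
  }
  return rows.map(r => r.map(csvField).join(",")).join("\n");
}

console.log(membershipsToCsv(users, groups, memberships));
```

Unknown IDs fall back to the raw ID rather than being dropped, so an audit still shows every membership. Staying in jq, CSV rows should come out of the program above if `{Name, GroupName}` is replaced by `[.GroupName, .Name] | @csv` and jq is run with `-r`.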

Print the key and a subset of fields if a field is not a specific value

I am new to jq and can't seem to quite get the syntax right for what I want to do. I am executing a command and piping its JSON output into jq. The structure looks like this:
{
"timestamp": 1658186185,
"nodes": {
"x3006c0s13b1n0": {
"Mom": "x3006c0s13b1n0.hsn.cm",
"Port": 15002,
"state": "free",
"pcpus": 64,
"resources_available": {
"arch": "linux",
"gputype": "A100",
"host": "x3006c0s13b1n0",
"mem": "527672488kb",
"ncpus": 64,
"ngpus": 4,
"system": "polaris",
"tier0": "x3006-g1",
"tier1": "g1",
"vnode": "x3006c0s13b1n0"
},
"resources_assigned": {},
"comment": "CHC- Offlined due to node health check failure",
"resv_enable": "True",
"sharing": "default_shared",
"license": "l",
"last_state_change_time": 1658175652,
"last_used_time": 1658175652
},
And so on, with a record for each node. In pseudocode, what I want to do is this:
if state is not free then display nodename : {comment = "Why is the node down"}
The nodename is the key, but could be extracted from a field inside the record. However, for future reference, I would like to understand how to get the key. I figured out (I think) that you can't use == on strings, but instead have to use the regex functions.
This gives me the if state is not free part:
<stdin> | jq '.nodes[] | .state | test("free") | not'
This gives me an object with the Mom (which includes the key) and the comment:
jq '.nodes[] | {Mom: .Mom, comment: .comment}'
The question is how do I put all that together? And as for the keys, this gives me a list of the keys: jq '.nodes | keys' but that uses the non-array version of nodes.
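One way that does not need the keys is to keep only the nodes that match the condition, using select, and map each remaining value to its comment using map_values. (The outputs shown assume the node's state is something other than "free"; with the sample exactly as posted, that node would be filtered out.)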
jq '.nodes | map_values(select(.state != "free").comment)'
{
"x3006c0s13b1n0": "CHC- Offlined due to node health check failure"
}
Keeping each comment wrapped in an object, which is closer to your desired output, works similarly:
jq '.nodes | map_values(select(.state != "free") | {comment})'
{
"x3006c0s13b1n0": {
"comment": "CHC- Offlined due to node health check failure"
}
}
Accessing the keys directly is still possible though. You may want to have a look at keys, keys_unsorted or to_entries.
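With to_entries, each node becomes a `{key, value}` pair, so the node name stays available as `.key`; in jq that could look like `jq '.nodes | to_entries[] | select(.value.state != "free") | {(.key): {comment: .value.comment}}'`. The JavaScript sketch below expresses the same idea with Object.entries, which likewise yields [key, value] pairs. `downNodes` is a hypothetical name, and the second node (state "offline") is invented purely so the filter has something to report, since the sample node as posted is "free".

```javascript
// Equivalent of jq's
//   .nodes | to_entries[] | select(.value.state != "free")
// Object.entries yields [key, value] pairs, so the node name (the key)
// is available next to its record.
function downNodes(status) {
  const result = {};
  for (const [nodeName, node] of Object.entries(status.nodes)) {
    if (node.state !== "free") {
      result[nodeName] = { comment: node.comment };
    }
  }
  return result;
}

const nodeStatus = {
  timestamp: 1658186185,
  nodes: {
    // Trimmed from the question's sample: this node is "free", so it is skipped.
    x3006c0s13b1n0: { state: "free", comment: "CHC- Offlined due to node health check failure" },
    // Hypothetical node, added only so that something is reported.
    x3006c0s13b1n1: { state: "offline", comment: "Why is the node down" },
  },
};

console.log(JSON.stringify(downNodes(nodeStatus)));
// {"x3006c0s13b1n1":{"comment":"Why is the node down"}}
```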

How to use jq package to parse name and id from json?

I am getting output in this format:
[
{
"_class": "hudson.model.FreeStyleProject",
"name": "my-name",
"id": "123"
},
{
"_class": "hudson.model.FreeStyleProject",
"name": "my-name2",
"id": "456"
},
{
"_class": "hudson.model.FreeStyleProject",
"name": "my-name3",
"id": "789"
}
]
How can I parse the name and id using jq?
I tried to use [].name
but I get curl: (23) Failed writing body (320 != 1338)
Any help will be appreciated. Thank you.
You failed to mention the relevant error:
jq: error (at <stdin>:17): Cannot index array with string "name"
The program should be
.[].name
Because you provided an incorrect program to jq, it exited earlier than it normally would. This closed the pipe between curl and jq, so curl could no longer write to the pipe, which is why curl emitted the error message you did provide.
Demo
https://jqplay.org/s/nolGbk3sD1
To get both name and id, use the filter:
.[] | .name, .id
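For reference, `.[] | .name, .id` emits a flat stream: the first element's name, then its id, then the next element's name, and so on, each value printed on its own line. A JavaScript equivalent using flatMap (the helper name `namesAndIds` is made up for illustration):

```javascript
// Same flattening that jq's `.[] | .name, .id` performs: for every
// element, its name followed by its id, as one flat sequence.
function namesAndIds(jobs) {
  return jobs.flatMap(job => [job.name, job.id]);
}

const jobs = [
  { _class: "hudson.model.FreeStyleProject", name: "my-name", id: "123" },
  { _class: "hudson.model.FreeStyleProject", name: "my-name2", id: "456" },
  { _class: "hudson.model.FreeStyleProject", name: "my-name3", id: "789" },
];

// One value per line, like the unquoted output of `jq -r '.[] | .name, .id'`.
console.log(namesAndIds(jobs).join("\n"));
```

If each pair should stay on one line instead, jq's string interpolation handles that: `jq -r '.[] | "\(.name) \(.id)"'`.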

JQ to filter only value of id

The following is the JSON data. I need to get only the value of the id key.
{apps:[ {
"id": "/application1/4b693882-ffba-4c93-a0f2-cccafcb4d7dd",
"cmd": null,
"args": null,
"user": null,
"env": {},
"constraints": [
[
"hostname",
"GROUP_BY",
"5"
]
},
{
"id": "/application2/4b693882-ffba-4c93-a0f2-cccafcb4d7dd",
"cmd": null,
"args": null,
"user": null,
"env": {},
"constraints": [
[
"hostname",
"GROUP_BY",
"5"
]
]},
The expected output is:
/application1/4b693882-ffba-4c93-a0f2-cccafcb4d7dd
/application2/4b693882-ffba-4c93-a0f2-cccafcb4d7dd
Thanks in advance
After fixing the errors in your JSON, we can use the following jq filter to get the desired output:
.apps[] | .id
Result jq -r '.apps[] | .id':
/application1/4b693882-ffba-4c93-a0f2-cccafcb4d7dd
/application2/4b693882-ffba-4c93-a0f2-cccafcb4d7dd
In JavaScript, you can use map() to create an array from one property of each object. Try this:
let data = {apps:[{"id":"/application1/4b693882-ffba-4c93-a0f2-cccafcb4d7dd","cmd":null,"args":null,"user":null,"env":{},"constraints":["hostname","GROUP_BY","5"]},{"id":"/application2/4b693882-ffba-4c93-a0f2-cccafcb4d7dd","cmd":null,"args":null,"user":null,"env":{},"constraints":["hostname","GROUP_BY","5"]}]}
let ids = data.apps.map(o => o.id);
console.log(ids);
Note that I corrected the invalid brace/bracket combinations in the data structure you posted in the question. I assume this is just a typo in that example, otherwise there would be parsing errors in the console.

jq replace values based on external map

I would like to change a field in my json file as specified by another json file. My input file is something like:
{"id": 10, "name": "foo", "some_other_field": "value 1"}
{"id": 20, "name": "bar", "some_other_field": "value 2"}
{"id": 25, "name": "baz", "some_other_field": "value 10"}
I have an external override file that specifies how name in certain objects should be overridden, for example:
{"id": 20, "name": "Bar"}
{"id": 10, "name": "foo edited"}
As shown above, the override may be shorter than input, in which case the name should be unchanged. Both files can easily fit into available memory.
Given the above input and the override, I would like to obtain the following output:
{"id": 10, "name": "foo edited", "some_other_field": "value 1"}
{"id": 20, "name": "Bar", "some_other_field": "value 2"}
{"id": 25, "name": "baz", "some_other_field": "value 10"}
Being a beginner with jq, I wasn't really sure where to start. While there are some questions that cover similar ground (the closest being this one), I couldn't figure out how to apply the solutions to my case.
There are many possibilities, but probably the simplest efficient solution uses the built-in function INDEX/2, e.g. as follows:
jq -n --slurpfile dict f2.json '
(INDEX($dict[]; .id) | map_values(.name)) as $d
| inputs
| .name = ($d[.id|tostring] // .name)
' f1.json
This uses inputs with the -n option to read the input file (f1.json) so that each JSON object can be processed in turn, while --slurpfile loads the whole override file (f2.json) into $dict as an array.
Since the solution is so short, it should be easy enough to figure it out with the aid of the online jq manual.
Caveat
This solution assumes that there are no "collisions" between ids in the dictionary caused by the use of tostring (e.g. if both {"id": 10} and {"id": "10"} occurred).
If the dictionary does or might have such collisions, then the above solution can be tweaked accordingly, but it is a bit tricky.
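For comparison, here is a JavaScript sketch of the same lookup-then-override idea, with hypothetical names (`applyOverrides`, `nameById`). It keys a Map by the raw id value; because a Map treats the number 10 and the string "10" as different keys, this version avoids the tostring collision caveat. One behavioural difference from the jq version: jq's `//` also falls back to the old name when the override's name is null or false, whereas this sketch applies any override whose id matches.

```javascript
// Apply name overrides keyed by id, analogous to
//   INDEX($dict[]; .id) | map_values(.name)   then   .name = ($d[...] // .name)
// Map keys are compared without string conversion, so 10 and "10" differ.
function applyOverrides(records, overrides) {
  const nameById = new Map(overrides.map(o => [o.id, o.name]));
  return records.map(rec =>
    // Spread copies the record (keeping key order); name is replaced in place.
    nameById.has(rec.id) ? { ...rec, name: nameById.get(rec.id) } : rec
  );
}

const input = [
  { id: 10, name: "foo", some_other_field: "value 1" },
  { id: 20, name: "bar", some_other_field: "value 2" },
  { id: 25, name: "baz", some_other_field: "value 10" },
];
const overrides = [
  { id: 20, name: "Bar" },
  { id: 10, name: "foo edited" },
];

// Print one JSON object per line, like the jq command's output.
for (const rec of applyOverrides(input, overrides)) {
  console.log(JSON.stringify(rec));
}
```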
