Compare json files but ignore values

Compare json files but ignore values - jq

I would like to compare two JSON files and report differences, but I am interested in keys only and not values. So, for example, the "json-diff" between the following two files (of course the real ones are much more complicated):
{
"http": {
"https": true,
"swagger": {
"enabled": false
},
"scalingFactors": [0.1, 0.2]
}
}
{
"http": {
"https": true,
"swagger": {
"enabled": true
},
"scalingFactors": [0.1, 0.1],
"test": true
}
}
should report that there is missing key:
http.test
but
should not report that the following keys have different values:
http.swagger.enabled
http.scalingFactors
I looked at the jq tool but I am not sure how to ignore values.

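Ignoring potential complications having to do with arrays, looking at the "symmetric difference" of the sets of paths to scalars would make sense. Note that paths(scalars) omits paths whose value is false or null (select treats those values as falsy), so "enabled": false in the first file would be reported as a missing key; testing the type instead avoids this. As a starting point, you could thus consider:
jq -c '
def leaf: type != "object" and type != "array";
[paths(leaf)] as $f1
| [input | paths(leaf)] as $f2
| ($f1 - $f2) + ($f2 - $f1)' file1.json file2.json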
You might want to stringify the paths, but then again, it might be wise to avoid doing so if the mapping to the strings is not invertible.
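If arrays are present, you might want to compare the paths while ignoring the array indices (the type test is used in place of scalars so that paths to false and null values are kept):
def p: [paths(type != "object" and type != "array") | map(select(type=="string"))] | unique;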
p as $f1
| (input | p) as $f2
| ($f1 - $f2) + ($f2 - $f1)
| .[]
The last line ensures that the result is a (possibly empty) stream. This makes it easy to check the return code to determine whether any difference was detected: simply use the -e command-line option. If there are no differences, the stream is empty and the return code will be 4.
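For readers who want to cross-check the jq result, the same idea (collect the key paths to all leaf values, drop array indices, take the symmetric difference) can be sketched in Python. This is only an illustration, not part of the jq answer; the function names key_paths and key_diff are made up here, and the two documents are the ones from the question.

```python
import json

def key_paths(value, prefix=()):
    """Yield the key path of every leaf value, ignoring array indices."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from key_paths(child, prefix + (key,))
    elif isinstance(value, list):
        # Array elements share the path of the array itself.
        for child in value:
            yield from key_paths(child, prefix)
    else:
        # Any scalar counts as a leaf, including false and null.
        yield prefix

def key_diff(a, b):
    """Return the dotted key paths present in only one of the two documents."""
    paths_a, paths_b = set(key_paths(a)), set(key_paths(b))
    return sorted(".".join(path) for path in paths_a ^ paths_b)

file1 = json.loads('{"http": {"https": true, "swagger": {"enabled": false},'
                   ' "scalingFactors": [0.1, 0.2]}}')
file2 = json.loads('{"http": {"https": true, "swagger": {"enabled": true},'
                   ' "scalingFactors": [0.1, 0.1], "test": true}}')

print(key_diff(file1, file2))  # ['http.test']
```

Joining the path with dots is convenient for display but, as noted above for stringified paths, it is not invertible if a key itself contains a dot.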

Related

Print the key and a subset of fields if a field is not a specific value

I am new to jq and can't seem to quite get the syntax right for what I want to do. I am executing a command and piping its JSON output into jq. The structure looks like this:
{
"timestamp": 1658186185,
"nodes": {
"x3006c0s13b1n0": {
"Mom": "x3006c0s13b1n0.hsn.cm",
"Port": 15002,
"state": "free",
"pcpus": 64,
"resources_available": {
"arch": "linux",
"gputype": "A100",
"host": "x3006c0s13b1n0",
"mem": "527672488kb",
"ncpus": 64,
"ngpus": 4,
"system": "polaris",
"tier0": "x3006-g1",
"tier1": "g1",
"vnode": "x3006c0s13b1n0"
},
"resources_assigned": {},
"comment": "CHC- Offlined due to node health check failure",
"resv_enable": "True",
"sharing": "default_shared",
"license": "l",
"last_state_change_time": 1658175652,
"last_used_time": 1658175652
},
And so on with a record for each node. In pseudocode, what I want to do is this:
if state is not free then display nodename : {comment = "Why is the node down"}
The nodename is the key, but could be extracted from a field inside the record. However, for future reference, I would like to understand how to get the key. I figured out (I think) that you can't use == on strings, but instead have to use the regex functions.
This gives me the if state is not free part:
<stdin> | jq '.nodes[] | .state | test("free") | not'
This gives me an object with the Mom (which includes the key) and the comment:
jq '.nodes[] | {Mom: .Mom, comment: .comment}'
The question is how do I put all that together? And as for the keys, this gives me a list of the keys: jq '.nodes | keys' but that uses the non-array version of nodes.

One way without touching the keys would be to select only those entries of .nodes that match the condition, and map each remaining entry's value to the comment itself using map_values:
jq '.nodes | map_values(select(.state != "free").comment)'
{
"x3006c0s13b1n0": "CHC- Offlined due to node health check failure"
}
Keeping the whole comments object, which is closer to your desired output, would be similar:
jq '.nodes | map_values(select(.state != "free") | {comment})'
{
"x3006c0s13b1n0": {
"comment": "CHC- Offlined due to node health check failure"
}
}
Accessing the keys directly is still possible though. You may want to have a look at keys, keys_unsorted or to_entries.
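As a rough illustration of what iterating over key/value pairs (the idea behind to_entries) looks like, here is the same filter sketched in Python rather than jq. The input is a trimmed, hypothetical version of the structure in the question: the "down" state and the second node name are assumptions made for the example, and down_nodes is a made-up helper name.

```python
import json

# Hypothetical, trimmed input: only the fields the filter needs.
status = json.loads("""
{
  "nodes": {
    "x3006c0s13b1n0": {"state": "down",
                       "comment": "CHC- Offlined due to node health check failure"},
    "example-free-node": {"state": "free", "comment": null}
  }
}
""")

def down_nodes(nodes):
    """Map each node name to its comment, for nodes whose state is not 'free'."""
    return {
        name: {"comment": info.get("comment")}
        for name, info in nodes.items()   # name is the key, info the record
        if info.get("state") != "free"    # plain string comparison is enough
    }

print(json.dumps(down_nodes(status["nodes"]), indent=2))
```

Here the node name is taken from the key itself, so nothing has to be extracted from the Mom field.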

Trying to get the correct output from JQ

I'm trying to get this to output the device name "test".
My filter is .[] | [.deviceName] and it's returning error: (at :7): Cannot index array with string "deviceName"
{
"test": [
{
"deviceName": "test",
"monitoringServer": "server1"
}
]
}

Presumably you meant:
jq '.test[] | [.deviceName]'
or perhaps:
jq '.[][] | [.deviceName]'
but without knowing your requirements, it's hard to say. That's one of the reasons why the http://stackoverflow.com/help/mcve guidelines were formulated.

Parsing JSON dict of CloudFormation parameters for '--parameter-overrides'

I'm using AWS CloudFormation at the moment, and I need to parse out parameters due to differences between stack creation and deployment. Command aws cloudformation create accepts a JSON file, but aws cloudformation deploy only accepts inlined application parameters of Key=Value type.
I have this JSON file:
[
{
"ParameterKey": "EC2KeyPair",
"ParameterValue": "$YOUR_EC2_KEY_PAIR"
},
{
"ParameterKey": "SSHLocation",
"ParameterValue": "$YOUR_SSH_LOCATION"
},
{
"ParameterKey": "DjangoEnvVarDebug",
"ParameterValue": "$YOUR_DJANGO_ENV_VAR_DEBUG"
},
{
"ParameterKey": "DjangoEnvVarSecretKey",
"ParameterValue": "$YOUR_DJANGO_ENV_VAR_SECRET_KEY"
},
{
"ParameterKey": "DjangoEnvVarDBName",
"ParameterValue": "$YOUR_DJANGO_ENV_VAR_DB_NAME"
},
{
"ParameterKey": "DjangoEnvVarDBUser",
"ParameterValue": "$YOUR_DJANGO_ENV_VAR_DB_USER"
},
{
"ParameterKey": "DjangoEnvVarDBPassword",
"ParameterValue": "$YOUR_DJANGO_ENV_VAR_DB_PASSWORD"
},
{
"ParameterKey": "DjangoEnvVarDBHost",
"ParameterValue": "$YOUR_DJANGO_ENV_VAR_DB_HOST"
}
]
And I want to turn it into this:
'EC2KeyPair=$YOUR_EC2_KEY_PAIR SSHLocation=$YOUR_SSH_LOCATION DjangoEnvVarDebug=$YOUR_DJANGO_ENV_VAR_DEBUG DjangoEnvVarSecretKey=$YOUR_DJANGO_ENV_VAR_SECRET_KEY DjangoEnvVarDBName=$YOUR_DJANGO_ENV_VAR_DB_NAME DjangoEnvVarDBUser=$YOUR_DJANGO_ENV_VAR_DB_USER DjangoEnvVarDBPassword=$YOUR_DJANGO_ENV_VAR_DB_PASSWORD DjangoEnvVarDBHost=$YOUR_DJANGO_ENV_VAR_DB_HOST'
This would be the equivalent Python code:
>>> import json
>>> thing = json.load(open('stack-params.example.json', 'r'))
>>> convert = lambda item: f'{item["ParameterKey"]}={item["ParameterValue"]}'
>>> print(list(map(convert, thing)))
['EC2KeyPair=$YOUR_EC2_KEY_PAIR', 'SSHLocation=$YOUR_SSH_LOCATION', 'DjangoEnvVarDebug=$YOUR_DJANGO_ENV_VAR_DEBUG', 'DjangoEnvVarSecretKey=$YOUR_DJANGO_ENV_VAR_SECRET_KEY', 'DjangoEnvVarDBName=$YOUR_DJANGO_ENV_VAR_DB_NAME', 'DjangoEnvVarDBUser=$YOUR_DJANGO_ENV_VAR_DB_USER', 'DjangoEnvVarDBPassword=$YOUR_DJANGO_ENV_VAR_DB_PASSWORD', 'DjangoEnvVarDBHost=$YOUR_DJANGO_ENV_VAR_DB_HOST']
>>> ' '.join(map(convert, thing))
'EC2KeyPair=$YOUR_EC2_KEY_PAIR SSHLocation=$YOUR_SSH_LOCATION DjangoEnvVarDebug=$YOUR_DJANGO_ENV_VAR_DEBUG DjangoEnvVarSecretKey=$YOUR_DJANGO_ENV_VAR_SECRET_KEY DjangoEnvVarDBName=$YOUR_DJANGO_ENV_VAR_DB_NAME DjangoEnvVarDBUser=$YOUR_DJANGO_ENV_VAR_DB_USER DjangoEnvVarDBPassword=$YOUR_DJANGO_ENV_VAR_DB_PASSWORD DjangoEnvVarDBHost=$YOUR_DJANGO_ENV_VAR_DB_HOST'
I have this little snippet:
$ cat stack-params.example.json | jq '.[] | "\(.ParameterKey)=\(.ParameterValue)"'
"EC2KeyPair=$YOUR_EC2_KEY_PAIR"
"SSHLocation=$YOUR_SSH_LOCATION"
"DjangoEnvVarDebug=$YOUR_DJANGO_ENV_VAR_DEBUG"
"DjangoEnvVarSecretKey=$YOUR_DJANGO_ENV_VAR_SECRET_KEY"
"DjangoEnvVarDBName=$YOUR_DJANGO_ENV_VAR_DB_NAME"
"DjangoEnvVarDBUser=$YOUR_DJANGO_ENV_VAR_DB_USER"
"DjangoEnvVarDBPassword=$YOUR_DJANGO_ENV_VAR_DB_PASSWORD"
"DjangoEnvVarDBHost=$YOUR_DJANGO_ENV_VAR_DB_HOST"
But I'm not sure how to join the strings together. I was looking at reduce but I think it only works on lists, and streams of strings aren't lists. So I'm thinking the correct approach is to convert each key : value association into a 'key=value' string within the list, then join them all together, though I have trouble working with the regex. Does anybody have any tips?

The goal as exemplified by the illustrative output seems highly dubious, but it can easily be achieved using the -r command-line option together with the filter:
map("\(.ParameterKey)=\(.ParameterValue)") | "'" + join(" ") + "'"
Footnote
I was looking at reduce but I think it only works on lists, and streams of strings aren't lists.
To use reduce on a list, say $l, you could simply use [] as in:
reduce $l[] as $x (_;_)

How to expand variable value to build a

Sorry if this is included somewhere, looked for a good 30-60 minutes for something along these lines. I am sure I just missed something! Total jq nub!
Basically I am trying to do a pick operation that is dynamic. My thought process was to do something like this:
pickJSON() {
getSomeJSON | jq -r --arg PICK "$1" '{ $PICK }'
}
pickJSON "foo, bar"
but this produces
{ "PICK": "foo, bar" }
Is there a way to essentially ask it to expand shell-style?
Desired Result:
pickJSON() {
getSomeJSON | jq -r --arg PICK "$1" '{ $PICK }'
# perhaps something like...
# getSomeJSON | jq -r --arg PICK "$1" '{ ...$PICK }'
}
pickJSON "foo, bar"
{ "foo": "foovalue", "bar": "barvalue" }
Note that I am new to jq and I just simplified what I am doing, so if the syntax is broken, that is why :-D My actual implementation has a few pipes in there, and it does work if I don't try to pick the values out of it.

After a fairly long experimentation phase trying to make this work, I finally came up with what seems like a feasible and reliable solution without the extremely unsettling flaws that could come from utilizing eval.
To better highlight the overall final solution, I am providing a bit more of the handling that I am currently working with below:
Goal
Grab a secret from AWS Secrets Manager
Parse the returned JSON, which looks like this:
{
"ARN": "arn:aws:secretsmanager:us-west-2:123456789012:secret:MyTestDatabaseSecret-a1b2c3",
"Name": "MyTestDatabaseSecret",
"VersionId": "EXAMPLE1-90ab-cdef-fedc-ba987EXAMPLE",
"SecretString": "{\n \"username\":\"david\",\n \"password\":\"BnQw&XDWgaEeT9XGTT29\"\n}\n",
"VersionStages": [
"AWSPREVIOUS"
],
"CreatedDate": 1523477145.713
}
Run some modifications on the JSON string received and pick only the statically requested keys from the secret
Set and export those values as environment variables
Script
# Capture a AWS Secret from secretsmanager, parse the JSON and expand the given
# variables within it to pick them from the secret and return given portion of
# the secret requested.
# #note similar to _.pick(obj, ["foo", "bar"])
getKeysFromSecret() {
aws secretsmanager get-secret-value --secret-id "$1" \
| jq -r '.SecretString | fromjson' \
| jq -r "{ $2 }"
}
# Uses `getKeysFromSecret` to capture the requested keys from the secret
# then transforms the JSON into a string that we can read and loop through
# to set each resulting value as an exported environment variable.
#
## Transformation Flow:
# { "foo": "bar", "baz": "qux" }
# -->
# foo=bar
# baz=qux
# -->
# export foo=bar
# export baz=qux
exportVariablesInSecret() {
while IFS== read -r key value; do
if [ -n "$value" ]; then
export "${key}"="${value}";
fi
done < <(getKeysFromSecret "$1" "$2" | jq -r 'to_entries | .[] | .key + "=" + .value')
}
Example JSON
{
...othervalues
"SecretString": "{\"foo\": \"bar\", \"baz\": \"qux\"}"
}
Example Usage
exportVariablesInSecret MY_SECRET "foo, bar"
echo $foo
# bar
Some Notes / Context
This is meant to set a given set of values as variables so that we aren't just setting an entire arbitrary JSON object as variables that could possibly cause issues / shadowing if someone adds a value like "path" to a secret
A critical goal was to absolutely never use eval to prevent possible injection situations. Far too easy to inject things otherwise.
Happy to see if anyone has a nicer way of accomplishing this. I saw many people recommending the use of declare, but that sets the variable in the local function scope only, so it's essentially useless here.
Thanks to @cas https://unix.stackexchange.com/a/413917/308550 for getting me on the right track!
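The script above splices "$2" into the text of the jq program. A variant in which the requested keys are handled purely as data can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's method: pick_from_secret and export_picked are hypothetical names, and the response string below is a made-up stand-in for what aws secretsmanager get-secret-value returns.

```python
import json
import os

def pick_from_secret(response_text, wanted):
    """Parse a get-secret-value style response and keep only the wanted keys.

    Missing keys and empty values are skipped, like the [ -n "$value" ] test
    in the shell version.
    """
    secret = json.loads(json.loads(response_text)["SecretString"])
    return {key: str(secret[key]) for key in wanted
            if secret.get(key) not in (None, "")}

def export_picked(response_text, wanted):
    """Add the picked values to this process's environment.

    Child processes started afterwards inherit them; the parent shell does not.
    """
    picked = pick_from_secret(response_text, wanted)
    os.environ.update(picked)
    return picked

# Hypothetical response, shaped like the example JSON above.
response = json.dumps({
    "Name": "MY_SECRET",
    "SecretString": json.dumps({"foo": "bar", "baz": "qux"}),
})

print(export_picked(response, ["foo", "bar"]))  # {'foo': 'bar'}
```

Because the key names are only ever used as dictionary lookups, an unexpected value in the key list cannot change what the program does.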

fish shell --- how to simulate or implement a hash table, associative array, or key-value store

I am migrating from ksh to fish. I am finding that I miss the ability to define an associative array, hash table, dictionary, or whatever you wish to call it. Some cases can be simulated as in
set dictionary$key $value
eval echo '$'dictionary$key
But this approach is heavily limited; for example, $key may contain only letters, numbers, and underscores.
I understand that the fish approach is to find an external command when one is available, but I am a little reluctant to store key-value information in the filesystem, even in /run/user/<uid>, because that limits me to "universal" scope.
How do fish programmers work around the lack of a key-value store? Is there some simple approach that I am just missing?
Here's an example of the sort of problem I would like to solve: I would like to modify the fish_prompt function so that certain directories print not using prompt_pwd but using special abbreviations. I could certainly do this with a switch command, but I would much rather have a universal dictionary so I can just look up a directory and see if it has an abbreviation. Then I could change the abbreviations using set instead of having to edit a function.

You can store the keys in one variable and values in the other, and then use something like
if set -l index (contains -i -- foo $keys) # `set` won't modify $status, so this succeeds if `contains` succeeds
echo $values[$index]
end
to retrieve the corresponding value.
Other possibilities include alternating between key and value in one variable, though iterating through this is a pain, especially when you try to do it only with builtins. Or you could use a separator character and store a key-value pair as one element, though this won't work for directories because variables cannot contain \0 (which is the only possible separator for paths).

Here is how I implemented the alternative solution mentioned by @faho.
I'm using '__' as the separator.
function set_field --argument-names dict key value
set -g $dict'__'$key $value
end
function get_field --argument-names dict key
eval echo \$$dict'__'$key
end

If you wanted to use a single variable with paired key/values, it's possible but, as @faho mentioned, it is more complicated. Here's how you could do it:
function dict_keys -d "Print keys from a key/value paired list"
for idx in (seq 1 2 (count $argv))
echo $argv[$idx]
end
end
function dict_values -d "Print values from a key/value paired list"
for idx in (seq 2 2 (count $argv))
echo $argv[$idx]
end
end
function dict_get -a key -d "Get the value associated with a key in a k/v paired list"
test (count $argv) -gt 2 || return 1
set -l keyseq (seq 2 2 (count $argv))
# we can't simply use `contains` because it won't distinguish keys from values
for idx in $keyseq
if test $key = $argv[$idx]
echo $argv[(math $idx + 1)]
return
end
end
return 1
end
Then you could use these functions like this:
$ set -l mydict \
yellow banana \
red cherry \
green grape \
blue berry
$ dict_keys $mydict
yellow
red
green
blue
$ dict_values $mydict
banana
cherry
grape
berry
$ dict_get blue $mydict
berry
$ dict_get purple $mydict || echo "not found"
not found
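The alternating key/value layout itself is language-independent. As a small illustration in Python (not fish; pairs_get is a made-up name), the lookup walks the even positions as keys and returns the element that follows, which is why a plain membership test on the whole list is not enough:

```python
def pairs_get(key, pairs):
    """Return the value paired with key in [k1, v1, k2, v2, ...], else None."""
    # Slicing with a step of 2 separates keys (even positions) from values.
    for k, v in zip(pairs[0::2], pairs[1::2]):
        if k == key:
            return v
    return None

mydict = ["yellow", "banana", "red", "cherry",
          "green", "grape", "blue", "berry"]

print(pairs_get("blue", mydict))    # berry
print(pairs_get("banana", mydict))  # None: "banana" is a value, not a key
```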

@faho's answer got me thinking about this and there are a few things I wanted to add.
At first I wrote a small set of fish functions (A sort of library, if you will) that dealt with serialization, you would call a dict function with a key name, an operation (get, set, add or del) and it would use global variables to keep track of keys and their values. Works fine for flat hashes/dicts/objects, but felt somewhat unsatisfactory.
Then I realized I could use something like jq to (de-)serialize JSON. That would also make it a lot easier to deal with nesting, plus that allows having different dicts which use the same name for a key without any issues. It also separates "dealing-with-environment-variables" and "dealing-with-dicts(/hashes/etc)", which seems like a good idea. I'll focus on jq here, but the same applies to yq or pretty much anything, the core point is: Serialize data before storing, de-serialize when reading, and use some tool to work with such data.
I then proceeded to rewrite my functions using jq. However, I soon realized it was easier to just use jq without any functions. To summarize the workflow, let's consider OP's scenario and imagine we want to use abbreviations for user folders or, even better, icons for such folders. To do that, let's assume we use Nerdfonts and have their icons available. A quick search for folders on Nerdfont's cheat sheet shows we only have folder icons for the home folder (f74b), downloads (f74c) and images (f74e), so I'll use Material Design Icon's "File document box" (f719) for documents, and Material Design Icon's "Video" (fa66) for videos.
So our Codepoints are:
User folder: \uf74b
Downloads \uf74c
Images: \uf74e
Documents: \uf719
Videos: \ufa66
So our JSON is:
{"~":"\uf74b","downloads":"\uf74c","images":"\uf74e","documents":"\uf719","videos":"\ufa66"}
I kept it in a single line for a reason which will become obvious now. Let's visualize this using jq:
echo '{"~":"\uf74b","downloads":"\uf74c","images":"\uf74e","documents":"\uf719","videos":"\ufa66"}' | jq 
For completeness sake, here's how it looks with Nerdfonts installed:
Now let's store this as a variable:
set -g FOLDER_ICONS (echo '{"~":"\uf74b","downloads":"\uf74c","images":"\uf74e","documents":"\uf719","videos":"\ufa66"}' | jq -c)
jq -c interprets JSON and outputs JSON in a compact structure, i.e., a single line. Ideal for storing variables.
If you need to edit something, you can use jq. Let's say you want to change the abbreviation for documents to "doc" instead of an icon. Just do:
set -g FOLDER_ICONS (echo $FOLDER_ICONS | jq -c '.["documents"]="doc"')
The echo part is for reading a variable, and the set -g is for updating the variable. So those can be ignored if you're not working with variables.
As for retrieving values, jq also does that, obviously. Let's say you want to get the abbreviation for the documents folder, you can simply do:
echo $FOLDER_ICONS | jq -r '.["documents"]'
It will return doc. If you leave out the -r it will return "doc", with quotes, since strings are quoted in JSON.
You can also remove keys pretty easily, i.e.:
set -g FOLDER_ICONS (echo $FOLDER_ICONS | jq -c 'del(."documents")')
will set the variable FOLDER_ICONS to the result of reading it and passing its contents to jq -c 'del(."documents")', which tells jq to delete the key "documents" and output a compact representation of the JSON, i.e. a single line.
Everything I tried worked perfectly fine with nested JSON objects, so it seems like a pretty good solution. It's just a matter of keeping the operations in mind:
reading .["key"]
writing .["key"]="value"
deleting del(."key")
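The same serialize-on-write, deserialize-on-read cycle can be sketched with Python's json module, for anyone who prefers a script to jq. This is only an illustration of the three operations listed above; the helper names json_get, json_set and json_del are made up, and the stored string stands in for the contents of a variable such as FOLDER_ICONS.

```python
import json

def json_get(serialized, key):
    """Read: deserialize and look up one key (None if absent)."""
    return json.loads(serialized).get(key)

def json_set(serialized, key, value):
    """Write: deserialize, assign, and re-serialize on a single compact line."""
    data = json.loads(serialized)
    data[key] = value
    return json.dumps(data, separators=(",", ":"))

def json_del(serialized, key):
    """Delete: deserialize, drop the key if present, and re-serialize."""
    data = json.loads(serialized)
    data.pop(key, None)
    return json.dumps(data, separators=(",", ":"))

stored = '{"downloads":"dl","documents":"docs"}'
stored = json_set(stored, "documents", "doc")
print(json_get(stored, "documents"))   # doc
print(json_del(stored, "documents"))   # {"downloads":"dl"}
```

The separators argument plays the role of jq -c here: it keeps the output on one line with no extra spaces, which is convenient for storing in a variable.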
jq also has many other nice features. I wanted to showcase a few of them, so I looked for things that might be nice to include here. One of the things I use jq for is dealing with Wayland, especially swaymsg -t get_tree, which I've just run and which, with a mere 4 workspaces with a single window in each, outputs a 706-line JSON from hell (it was 929 lines when I first wrote this, with 6 windows across 5 workspaces; I later closed 2 windows I was done with and re-ran the command to share the lowest possible value).
To give a more complex example of how jq might be used, here's parsing the swaymsg -t get_tree:
swaymsg -t get_tree | jq -C '{"id": .id, "type": .type, "name": .name, "nodes": (.nodes | map(.nodes) | flatten | map({"id": .id, "type": .type, "name": .name, "nodes": (.nodes | map(.nodes) | flatten | map({"id": .id, "type": .type, "name": .name}))}))}'
This will give you a tree with only id, type, name and nodes, where nodes is an array of objects, each consisting of the id, type, name and nodes of the children, with the children nodes also being an array of objects, now consisting of only id, type and name. In my case, it returned:
{
"id": 1,
"type": "root",
"name": "root",
"nodes": [
{
"id": 2147483646,
"type": "workspace",
"name": "__i3_scratch",
"nodes": []
},
{
"id": 184,
"type": "workspace",
"name": "1",
"nodes": []
},
{
"id": 145,
"type": "workspace",
"name": "2",
"nodes": []
},
{
"id": 172,
"type": "workspace",
"name": "3",
"nodes": [
{
"id": 173,
"type": "con",
"name": "Untitled-4 - Code - OSS"
}
]
},
{
"id": 5,
"type": "workspace",
"name": "4",
"nodes": []
}
]
}
You can also easily make a flattened version of that with jq by slightly changing the command:
swaymsg -t get_tree | jq -C '[{"id": .id, "type": .type, "name": .name}, (.nodes | map(.nodes) | flatten | map([{"id": .id, "type": .type, "name": .name}, (.nodes | map(.nodes) | flatten | map({"id": .id, "type": .type, "name": .name}))]))] | flatten'
Now instead of having a key nodes, the child nodes are also in the parent's array, flattened, in my case:
[
{
"id": 1,
"type": "root",
"name": "root"
},
{
"id": 2147483646,
"type": "workspace",
"name": "__i3_scratch"
},
{
"id": 184,
"type": "workspace",
"name": "1"
},
{
"id": 145,
"type": "workspace",
"name": "2"
},
{
"id": 172,
"type": "workspace",
"name": "3"
},
{
"id": 173,
"type": "con",
"name": "Untitled-4 - Code - OSS"
},
{
"id": 5,
"type": "workspace",
"name": "4"
}
]
It's pretty nifty, not limited to environment variables, and solves pretty much every problem I can think of. The only con is verbosity, so it may be a good idea to write a few fish functions for dealing with that, but that's beyond the scope here, as I'm focusing on a general approach to (de-)serialization of key-value mappings (i.e., dicts, hashes, objects etc), which can be (also) used with environment variables. For reference, a good starting point if dealing with variables might be:
function dict
switch $argv[2]
case write
read data
set -xg $argv[1] "$data"
case read '*'
echo $$argv[1]
end
end
This simply deals with reading and writing to a variable, the only reason it's worth sharing is, first, that it allows piping something to a variable, and second, that it sets a starting point to make something more complex, i.e. automatically piping the echoed value to jq, or adding an add operation or whatever.
There's also the option of writing a script to deal with that, instead of using jq. Ruby's Marshal and to_yaml seem like interesting options, since I like Ruby, but each person has their own preferences. For Python, pickle, pyyaml and json seem worth mentioning.
It's worth mentioning I'm not affiliated to jq in any way, never contributed nor even posted anything on issues or whatever, I just use it, and as someone who used to write scripts whenever I had to deal with JSON or YAML, it was quite surprising when I realized how powerful it was.

I finally needed this for an application, and I'm not super comfortable with fish builtins, so here is an implementation in Lua: https://gist.github.com/nrnrnr/b302db5c59c600dd75c38d460423cc3d. This code uses the alternating key/value representation:
key1 value1 key2 value2 ...
