How to extract the operationId from an OpenAPI spec using jq

The OpenAPI spec defines operationId as a child element of the method (get, post) of a path (e.g. /customers), which is itself a child of paths. As dot notation it would look like:
.paths."/customers".get.operationId
.paths."/customers".post.operationId
.paths."/reports/daily".get.operationId
Or written with wildcards: .paths.*.*.operationId
I looked at the following information:
How to use jq wildcard
https://github.com/stedolan/jq/issues/82
https://unix.stackexchange.com/questions/320145/wildcard-in-jq-with-comparatives
and tried .path | .. | .. | .operationId but with no success.
My goal is to get a list of operationIds. What am I missing?
Update (as requested): Sample JSON OpenAPI spec
Update To clarify, following another question: the operationId might occur in positions other than three levels below paths; I don't need those.

My goal is to get a list of operationIds
Since you only want the operationId values under .paths, you could write:
.paths | .. | objects | select(has("operationId")) | .operationId
or if you want more control over the admissible paths, you could use the following as a template:
.paths
| (paths | select(.[-1] == "operationId")) as $p
| getpath($p)
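For example, to honor the update above (keep only operationIds that sit directly under an HTTP method, i.e. exactly three levels below .paths), a minimal variant of this template would be:
.paths
| (paths | select(length == 3 and .[-1] == "operationId")) as $p
| getpath($p)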
"Wildcard" approach
.. is a recursive operation that in effect follows all "paths" under a JSON entity; in this context, therefore, the wildcard "*" expression is more akin to jq's .[]? expression (i.e. postfix []?). So your meta-expression .paths.*.*.operationId could be written in jq as:
.paths[]?[]?.operationId
or
.paths[]?[]?.operationId?
depending on your interpretation of the expression with wildcards.
With the sample input, either of these produces:
"listPets"
"createPets"
"showPetById"

Related

Does `FIND PATH` clause support `WHERE` in NebulaGraph Database

The FIND PATH clause is useful for graph traversal. For example:
FIND SHORTEST PATH FROM "player102" TO "team204" OVER * YIELD path AS p;
But filter conditions are commonly used in traversal, i.e., the WHERE clause.
The manual doesn't give any examples or syntax for how to write a WHERE clause. Some statements work and some don't:
FIND ALL PATH FROM "player100" TO "team204" OVER * WHERE follow.degree is EMPTY or follow.degree >=0 YIELD path AS p;
+------------------------------------------------------------------------------+
| p |
+------------------------------------------------------------------------------+
| <("player100")-[:serve#0 {}]->("team204")> |
| <("player100")-[:follow#0 {}]->("player125")-[:serve#0 {}]->("team204")> |
| <("player100")-[:follow#0 {}]->("player101")-[:serve#0 {}]->("team204")> |
|... |
+------------------------------------------------------------------------------+
FIND ALL PATH FROM "player100" TO "team204" OVER * where player.age is EMPTY or follow.degree >=0
--- a syntax error occurs.
Does it support WHERE, and how do I write the WHERE clause?
FIND PATH only supports filtering properties of edges with WHERE clauses, such as WHERE follow.degree is EMPTY or follow.degree >=0.
Filtering on vertex properties is not currently supported (as of version 3.3.0).

What is the structure of the Presentation-Data-Value in P-Data-TF?

I found this for the Presentation Data Value:
0 | 0 | 0 | 0 | 0 | 0 | last-or-not | command-or-dataset | **Some Message**
But I couldn't find the Some Message part. I suppose this part includes C-FIND, C-GET, etc. How can I know this structure?
Where did you get this from? In fact it is a bit different.
Your example should read
0 | 0 | 0 | 0 | 0 | 0 | last-or-not | command-or-data*set* | **Some Message**
So the "command-or-dataset" flag indicates whether the following bytes are encoding a command (as defined in PS3.7) or a dataset as defined in PS3.3 or 3.4 respectively).
E.g. for DICOM queries, there is a C-FIND command defined in PS3.7, Chapter 9.1.2.1. In C-FIND, the query criteria are part of the command (the "Identifier") in Table 9.1-2. How the Identifier is formed, and all of its semantics, is the subject of the Query/Retrieve Service Class as defined in PS3.4, C.4.1.
For transferring objects, there is a C-STORE command, also defined in PS3.7 (Chapter 9.1.1.1). The Data-Set is also part of the C-STORE command, and its contents depend on the type of data (SOP Class). This is referred to as an Information Object Definition (IOD) and defined in PS3.3. The protocol for Storage is also defined in PS3.4 (Annex B).
However, the length limitation of the PDV often does not allow the whole object to be encoded in a single PDV, so it needs to be split. For the following PDVs, no command set will be present but only a fragment of the dataset. In this case, the "command-or-dataset" bit must be set to 0.
I hope I could make it a bit clearer. It is a bit difficult in the beginning of learning DICOM to know all the terms and their interrelationships.
Encoding
Logically, command sets and data sets are encoded in the same way. The data dictionary (Part 6) is a complete list of all possible attributes, and the major difference between command and data set attributes is that command attributes have group number 0 while data set attributes have "any even number but 0".
For each attribute, the data dictionary gives you the Value Representation (VR), which needs to be considered for encoding the value. E.g. "PN" for Patient Name, "UI" for Unique Identifier, and so forth. The VRs are defined in PS3.5, Chapter 6.2.
The encoding of attributes is then
group | element | (VR) | length (always even) | value
How this is transformed to the binary level depends on the Transfer Syntax (TS) that was agreed for the service during association negotiation. For this reason "VR" is enclosed in brackets above: whether it must or must not be present depends on whether the TS is implicit or explicit.
There are some more things to consider (endianness, sequence encoding) when encoding command sets or datasets in binary form. Basically everything about it is described in various chapters of PS3.5.

How do I flatten a large, complex, deeply nested JSON file into multiple CSV files with a linking identifier

I have a complex JSON file (~8GB) containing publicly available data for businesses. We have decided to split the file up into multiple CSV files (or tabs in a .xlsx), so clients can easily consume the data. These files will be linked by the NZBN column/key.
I'm using R and jsonlite to read a small sample in (before scaling up to the full file). I'm guessing I need some way to specify which keys/columns go in each file (i.e. the first file will have headers australianBusinessNumber, australianCompanyNumber, australianServiceAddress; the second file will have headers annualReturnFilingMonth, annualReturnLastFiled, countryOfOrigin...)
Here's a sample of two businesses/entities (I've bunged some of the data as well so ignore the actual values): test file
I've read almost every post on S/O with similar questions and none seem to be giving me any luck. I've tried variations of purrr, *apply commands, custom flattening functions, and jqr (an R version of jq; it looks promising but I can't seem to get it running).
Here's an attempt at creating my separate files, but I'm unsure how to include the linking identifier (NZBN), and I keep running into further nested lists (I'm unsure how many levels of nesting there are):
bulk <- jsonlite::fromJSON("bd_test.json")
coreEntity <- data.frame(bulk$companies)
coreEntity <- coreEntity[,sapply(coreEntity, is.list)==FALSE]
company <- bulk$companies$entity$company
company <- purrr::reduce(company, dplyr::bind_rows)
shareholding <- company$shareholding
shareholding <- purrr::reduce(shareholding, dplyr::bind_rows)
shareAllocation <- shareholding$shareAllocation
shareAllocation <- purrr::reduce(shareAllocation, dplyr::bind_rows)
I'm not sure if it's easier to split the files up during the flattening/wrangling process, or just completely flatten the whole file so I just have one line per business/entity (and then gather columns as needed) - my only concern is that I need to scale this up to ~1.3million nodes (8GB JSON file).
Ideally I would want the csv files split every time there is a new collection, and the values in the collection would become the columns for the new csv/tab.
Any help or tips would be much appreciated.
------- UPDATE ------
Updated, as my question was a little vague: I think all I need is some code to produce one of the CSVs/tabs, and I can replicate it for the other collections.
Say for example, I wanted to create a csv of the following elements:
entityName (unique linking identifier)
nzbn (unique linking identifier)
emailAddress__uniqueIdentifier
emailAddress__emailAddress
emailAddress__emailPurpose
emailAddress__emailPurposeDescription
emailAddress__startDate
How would I go about that?
I'm unsure how many levels of nesting there are
This will provide an answer to that quite efficiently:
jq '
  def max(s): reduce s as $s (null;
    if . == null then $s elif $s > . then $s else . end);
  max(paths|length)' input.json
(With the test file, the answer is 14.)
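If memory is not a concern, the same number can also be computed with the built-in max by materializing all the path lengths first, though for an ~8GB input the streaming reduce above is preferable:
jq '[paths | length] | max' input.json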
To get an overall view (schema) of the data, you could run:
jq 'include "schema"; schema' input.json
where schema.jq is available at this gist. This will produce a structural schema.
"Say for example, I wanted to create a csv of the following elements:"
Here's a jq solution, apart from the headers:
.companies.entity[]
| [.entityName, .nzbn]
+ (.emailAddress[] | [.uniqueIdentifier, .emailAddress, .emailPurpose, .emailPurposeDescription, .startDate])
| @csv
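To actually write the CSV including a header row, one possibility (a sketch assuming the same .companies.entity[] structure and the bd_test.json filename from the question) would be:
jq -r '
  (["entityName","nzbn","uniqueIdentifier","emailAddress","emailPurpose","emailPurposeDescription","startDate"] | @csv),
  (.companies.entity[]
   | [.entityName, .nzbn]
     + (.emailAddress[] | [.uniqueIdentifier, .emailAddress, .emailPurpose, .emailPurposeDescription, .startDate])
   | @csv)
' bd_test.json > emailAddress.csv
The -r option is needed so that @csv emits raw CSV lines rather than JSON strings.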
shareholding
The shareholding data is complex, so in the following I've used the to_table function defined elsewhere on this page.
The sample data does not include a "company name" field, so in the following I've added a 0-based "company index" field:
.companies.entity[]
| [.entityName, .nzbn] as $ix
| .company
| range(0;length) as $cix
| .[$cix]
| $ix + [$cix] + (.shareholding[] | to_table(false))
jqr
The above solutions use the standalone jq executable, but all going well, it should be trivial to use the same filters with jqr, though to use jq's include, it might be simplest to specify the path explicitly, as for example:
include "schema" {search: "~/.jq"};
If the input JSON is sufficiently regular, you might find the following flattening function helpful, especially as it can emit a header in the form of an array of strings based on the "paths" to the leaf elements of the input, which can be arbitrarily nested:
# to_table produces a flat array.
# If hdr == true, then ONLY emit a header line (in prettified form, i.e. as an array of strings);
# if hdr is an array, it should be the prettified form and is used to check consistency.
def to_table(hdr):
  def prettify: map( (map(tostring)|join(":") ));
  def composite: type == "object" or type == "array";
  def check:
    select(hdr|type == "array")
    | if prettify == hdr then empty
      else error("expected header is \(hdr) but imputed header is \(.)")
      end ;
  . as $in
  | [paths(composite|not)]   # the paths in array-of-array form
  | if hdr==true then prettify
    else check, map(. as $p | $in | getpath($p))
    end;
For example, to produce the desired table (without headers) for .emailAddress, one could write:
.companies.entity[]
| [.entityName, .nzbn] as $ix
| $ix + (.emailAddress[] | to_table(false))
| @tsv
(Adding the headers and checking for consistency are left as an exercise for now, but are dealt with below.)
Generating multiple files
More interestingly, you could select the level you want, and produce multiple tables automagically. One way to partition the output into separate files efficiently would be to use awk. For example, you could pipe the output obtained using this jq filter:
["entityName", "nzbn"] as $common
| .companies.entity[]
| [.entityName, .nzbn] as $ix
| (to_entries[] | select(.value | type == "array") | .key) as $key
| ($ix + [$key] | join("-")) as $filename
| (.[$key][0]|to_table(true)) as $header
# First emit the line giving all the headers:
| $filename, ($common + $header | @tsv),
# Then emit the rows of the table:
(.[$key][]
| ($filename, ($ix + to_table(false) | @tsv)))
to
awk -F\\t 'fn {print >> fn; fn=0;next} {fn=$1".tsv"}'
This will produce headers in each file; if you want consistency checking, change to_table(false) to to_table($header).
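Putting the two together: if the filter above (together with the to_table definition) is saved in a file, say split.jq (a hypothetical name), the whole pipeline might look like:
jq -r -f split.jq bd_test.json | awk -F\\t 'fn {print >> fn; fn=0;next} {fn=$1".tsv"}'
The -r flag matters here, since the @tsv output must reach awk as raw, tab-separated text.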

jq: error (at <stdin>:0): Cannot iterate over null (null)

I've been working with an API call to structure it in JSON format so I might later push it into a database. The code looks like this:
getPage() {
curl --fail -X GET 'https://api.app.com/v1/test?page=1&pageSize=1000&sort=desc' \
-H 'Authorization: Bearer 123abc456pickupsticks789' \
-H 'cache-control: no-cache'
}
getPage \
| jq -c '.items | .[] | {landing_id: .landing_id, submitted_at: .submitted_at, answers: .answers, email: .hidden.email}' \
> testpush.json
When I run it though, it produces this error: jq: error (at <stdin>:0): Cannot iterate over null (null)
I've looked at solutions such as this one, or this one from this site, and this response.
The common solution seemed to be using a ? in front of [] and I tried it in the jq line towards the bottom, but it still does not work. It just produces an empty json file.
Am I misreading the takeaway from those other answers and not putting my ? in the right place?
To protect against the possibility that .items is not an array, you could write:
.items | .[]?
or even more robustly:
try .items[]
which is equivalent to (.items[])?.
In summary:
try E is equivalent to try E catch empty
try E is equivalent to (E)?
(Note that the expressions .items[]? and (.items[])? are not identical.)
However none of these will provide protection against input that is invalid JSON.
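To see the difference mentioned in the note above, consider an input on which .items itself cannot be evaluated (a bare number, purely for illustration):
jq '.items[]?' <<< '5'     # error: Cannot index number with "items"
jq '(.items[])?' <<< '5'   # no output: here the error from .items is suppressed as well
In the first form the ? only guards the iteration ([]), not the .items lookup that precedes it.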
p.s. In future, please follow the mcve guidelines (http://stackoverflow.com/help/mcve); in the present case, it would have helped if you had provided an illustrative JSON snippet based on the output produced by the curl command.
It is necessary to let jq know that it can continue after hitting an unexpected value while iterating over that array. try or ? are perfect options for that.
Bear in mind that you either have to guarantee the shape of the data or let the interpreter know that it is OK to continue. It may sound redundant, but it is something like a fail-safe approach to prevent unexpected results that are harder to track down or notice.
Also, it is necessary to be aware of the difference between ? and try.
Assuming that $sample meets JSON standards, the code below will always work:
sample='{"myvar":1,"var2":"foo"}'
jq '{newVar: ((.op[]? | .item) // 0)}' <<< $sample
So, .op is null for the $sample above, but it is clear to jq that it can continue without asking for your intervention.
But if you assume ? is the same as try, you may get an error (it took me a while to learn this, and it is not clear in the documentation). As an example of improper use of ?, we have:
sample='{"myvar":1,"var2":"foo"}'
jq '{newVar: (.op[].item? // 0)}' <<< $sample
So, since .op is null, this leads to an error: you are telling jq to ignore an error while retrieving .item, but nothing suppresses the error raised when trying to iterate over null (in this case .op[]), and that happens before .item is ever reached.
On the other hand, try would work in this case:
sample='{"myvar":1,"var2":"foo"}'
jq '{newVar: (try .op[].item catch 0)}' <<< $sample
This is a small difference in usage that can lead to a large difference in the result.

Dictionary as variable in Robot Framework: code runs ok but the IDE yields error

I'm trying to set up a dictionary as a variable (so I can use it as a Resource and access its values from another file) and there is something that is driving me crazy.
Here is the code I have (just for testing purposes):
*** Settings ***
Documentation Suite description
Library Collections
*** Variables ***
&{SOME DICT} key1=value1 key2=value2
*** Test Cases ***
Dict Test # why $ instead of &?
${RANDOM VAR}= Get From Dictionary ${SOME DICT} key1
Log ${RANDOM VAR} WARN
If I run that, I get the expected result ([ WARN ] value1), BUT the IDE (PyCharm) complains that the ${SOME DICT} variable is not defined, and the dictionary declaration is not highlighted the same way as a variable or a list.
If I change that to &{SOME DICT} the IDE won't complain anymore, but the test fails with the following output:
Dict Test | FAIL |
Keyword 'Collections.Get From Dictionary' got positional argument after named arguments.
That is puzzling me to no end: why do I have to use $ instead of & to make it work if it's a dictionary? Is there something I am doing wrong, and is it just working by luck?
Thanks for any advice or guidance you may have!
Have a look at the "Get From Dictionary" libdoc; it looks like the example shows the same thing as your working snippet:
Name: Get From Dictionary
Source: Library (Collections)
Arguments: [dictionary, key]
Returns a value from the given ``dictionary`` based on the given ``key``.
If the given ``key`` cannot be found from the ``dictionary``, this
keyword fails.
The given dictionary is never altered by this keyword.
Example:
| ${value} = | Get From Dictionary | ${D3} | b |
=>
| ${value} = 2
Keyword implementation details are as follows:
try:
    return dictionary[key]
except KeyError:
    raise RuntimeError("Dictionary does not contain key '%s'." % key)
So indeed, Robot sends the dictionary's contents rather than the dictionary's name, and thus the value for the key can be returned.
This is the same as a direct call in Python:
a = {u'key1': u'value1', u'key2': u'value2'}
print(a['key1'])
In the end, the libdoc for that keyword is not straightforward, but your PyCharm plugin for Robot does not work properly in this case.
In RED Robot Editor (Eclipse based), the proper case does not raise any warnings in the editor, while the wrong case produces an error marker about the arguments (better, but still not clear what exactly is wrong; blame the minimalistic libdoc info).
PS: To be clear, I am the lead of the RED project.
A simple example of using a key/value (dictionary) variable in Robot Framework:
Set a value in the dictionary
Get a value from the dictionary
&{initValues} Create Dictionary key1=value1 key2=value2
Set To Dictionary ${initValues} key1=newvalue1
Set To Dictionary ${initValues} key2=newvalue2
Set To Dictionary ${initValues} key3=newvalue3
${value} Get From Dictionary ${initValues} key1
