JSON to CSV conversion using jq
I have a 1GB JSON file that I would like to convert to CSV format. The file contains information about people with significant control (PSC) over UK companies.
File source: http://download.companieshouse.gov.uk/en_pscdata.html
Here is a snippet of the PSC data product:
{"company_number":"04502074","data":{"address":{"address_line_1":"Grove Hall","address_line_2":"Ashbourne Green","locality":"Ashbourne","postal_code":"DE6 1JD","region":"Derbyshire"},"country_of_residence":"England","date_of_birth":{"month":11,"year":1964},"etag":"f9a632332f63b61f004569f99d6b15e3e6d28192","kind":"individual-person-with-significant-control","links":{"self":"/company/04502074/persons-with-significant-control/individual/34zTsx2BFGMyn0lJe2REL656U8w"},"name":"Mr Philip Anthony Donlan","name_elements":{"forename":"Philip","middle_name":"Anthony","surname":"Donlan","title":"Mr"},"nationality":"British","natures_of_control":["ownership-of-shares-50-to-75-percent"],"notified_on":"2016-04-06"}}
{"company_number":"10260075","data":{"address":{"country":"England","locality":"Widnes","postal_code":"WA8 9DH","premises":"1 Stockswell Farm Court"},"country_of_residence":"England","date_of_birth":{"month":12,"year":1978},"etag":"dbf13fc08cb9136450089681b6e9364eb8458129","kind":"individual-person-with-significant-control","links":{"self":"/company/10260075/persons-with-significant-control/individual/Br24rkYIl3ZKam3C9fT4o_9uF7k"},"name":"Mr Daniel Thomas Ross","name_elements":{"forename":"Daniel","middle_name":"Thomas","surname":"Ross","title":"Mr"},"nationality":"English","natures_of_control":["significant-influence-or-control"],"notified_on":"2016-07-01"}}
{"company_number":"SC539354","data":{"address":{"address_line_1":"5 West Victoria Dock Road","locality":"Dundee","postal_code":"DD1 3JT","premises":"Begbies Traynor (Central) Llp, River Court"},"country_of_residence":"Scotland","date_of_birth":{"month":4,"year":1980},"etag":"37599a22ede050072457db60af6e75ba8e237246","kind":"individual-person-with-significant-control","links":{"self":"/company/SC539354/persons-with-significant-control/individual/T7LPjXkKRuaunfMRjfFrnWiHEnI"},"name":"Mr Stuart Hemple","name_elements":{"forename":"Stuart","surname":"Hemple","title":"Mr"},"nationality":"British","natures_of_control":["ownership-of-shares-75-to-100-percent","voting-rights-75-to-100-percent","right-to-appoint-and-remove-directors","significant-influence-or-control"],"notified_on":"2016-07-01"}}
{"company_number":"02722495","data":{"address":{"address_line_1":"Beechdene","address_line_2":"108 Coventry Road","locality":"Warwick","postal_code":"CV34 5HH"},"country_of_residence":"England","date_of_birth":{"month":12,"year":1953},"etag":"610138d3809ab3237b609f3cb93bfe4bf89d7581","kind":"individual-person-with-significant-control","links":{"self":"/company/02722495/persons-with-significant-control/individual/8g4ED3usT4wLPEqra7dE97eqmHE"},"name":"Mr Marshall Fenn Stephenson","name_elements":{"forename":"Marshall","middle_name":"Fenn","surname":"Stephenson","title":"Mr"},"nationality":"British","natures_of_control":["ownership-of-shares-25-to-50-percent"],"notified_on":"2016-07-01"}}
{"company_number":"05495733","data":{"address":{"address_line_1":"Brompton Road","country":"England","locality":"London","postal_code":"SW3 2AS","premises":"253"},"country_of_residence":"Italy","date_of_birth":{"month":4,"year":1953},"etag":"d45e5d2aa905e769e6fd3aa364e301f73b047985","kind":"individual-person-with-significant-control","links":{"self":"/company/05495733/persons-with-significant-control/individual/Oqp-z-D5JTX0mjXTtmOqmct1vR4"},"name":"Mr Roberto Gavazzi","name_elements":{"forename":"Roberto","surname":"Gavazzi","title":"Mr"},"nationality":"Italian","natures_of_control":["ownership-of-shares-50-to-75-percent","voting-rights-50-to-75-percent","right-to-appoint-and-remove-directors-as-firm","significant-influence-or-control-as-firm"],"notified_on":"2016-06-30"}}
{"company_number":"SC539355","data":{"address":{"address_line_1":"6 Dryden Road","country":"Scotland","locality":"Loanhead","postal_code":"EH20 9LZ","premises":"Bilston Glen Business Centre"},"country_of_residence":"Scotland","date_of_birth":{"month":10,"year":1990},"etag":"b03abb8bb1b6f95039dd896210d7c231d8784c31","kind":"individual-person-with-significant-control","links":{"self":"/company/SC539355/persons-with-significant-control/individual/tYyjuJrp6Ifp327VxThVGeRswMM"},"name":"Mr David John Kelly","name_elements":{"forename":"David","middle_name":"John","surname":"Kelly","title":"Mr"},"nationality":"Scottish","natures_of_control":["ownership-of-shares-75-to-100-percent"],"notified_on":"2016-07-01"}}
{"company_number":"SC539356","data":{"address":{"address_line_1":"Scholes","country":"England","locality":"Wigan","postal_code":"WN1 1YF","premises":"106 Douglas House"},"country_of_residence":"England","date_of_birth":{"month":3,"year":1961},"etag":"2f15d0fbacc68763b00e203ab0820b0911ac5906","kind":"individual-person-with-significant-control","links":{"self":"/company/SC539356/persons-with-significant-control/individual/0eATs-Ecoj9ie0_pBCq29L6UtlM"},"name":"Mr Mark Edward Sowery","name_elements":{"forename":"Mark","middle_name":"Edward","surname":"Sowery","title":"Mr"},"nationality":"British","natures_of_control":["ownership-of-shares-75-to-100-percent"],"notified_on":"2016-07-01"}}
{"company_number":"07674942","data":{"address":{"address_line_1":"Old Gloucester Street","country":"England","locality":"London","postal_code":"WC1N 3AX","premises":"27"},"country_of_residence":"Sierra Leone","date_of_birth":{"month":2,"year":1979},"etag":"f2e9cc0033cd5ef24fcda06baeff06bb8ea72654","kind":"individual-person-with-significant-control","links":{"self":"/company/07674942/persons-with-significant-control/individual/D31C5Na0B1I4rqM1RYwpy3J8oKA"},"name":"Mr Muhammad Umar Babar","name_elements":{"forename":"Muhammad","middle_name":"Umar","surname":"Babar","title":"Mr"},"nationality":"Pakistani","natures_of_control":["ownership-of-shares-75-to-100-percent"],"notified_on":"2016-07-01"}}
{"company_number":"09639364","data":{"address":{"address_line_1":"Galmington Road","country":"United Kingdom","locality":"Taunton","postal_code":"TA1 5NP","premises":"58b","region":"Somerset"},"country_of_residence":"United Kingdom","date_of_birth":{"month":12,"year":1977},"etag":"25ff7d41f9b8f257f0d41ae82e88202017beff34","kind":"individual-person-with-significant-control","links":{"self":"/company/09639364/persons-with-significant-control/individual/qlPpucOQopiIgq1xzZIb6xjO5JQ"},"name":"Mr Li Ying Cao","name_elements":{"forename":"Li","middle_name":"Ying","surname":"Cao","title":"Mr"},"nationality":"British","natures_of_control":["ownership-of-shares-75-to-100-percent"],"notified_on":"2016-07-01"}}
{"company_number":"08541397","data":{"address":{"address_line_1":"Hedley Avenue","locality":"Blyth","postal_code":"NE24 3JP","premises":"27","region":"Northumberland"},"country_of_residence":"England","date_of_birth":{"month":11,"year":1949},"etag":"b843664ca67a4274ee6f6cc9816ab35cd8367190","kind":"individual-person-with-significant-control","links":{"self":"/company/08541397/persons-with-significant-control/individual/KuddC6fZH17ifaXSAVWEcC2ba74"},"name":"Mr David Harwood","name_elements":{"forename":"David","surname":"Harwood","title":"Mr"},"nationality":"British","natures_of_control":["ownership-of-shares-75-to-100-percent"],"notified_on":"2016-05-23"}}
{"company_number":"02832188","data":{"address":{"address_line_1":"Lodge Road","country":"England","locality":"London","postal_code":"NW4 4DD","premises":"1"},"country_of_residence":"England","date_of_birth":{"month":5,"year":1952},"etag":"0c21b2b560ee43ca0c2ffd9f07d5ca564536b6e2","kind":"individual-person-with-significant-control","links":{"self":"/company/02832188/persons-with-significant-control/individual/Rh8pb-L7fEZzkyhttuCwVjjL_eA"},"name":"Mrs Rachel Weissman","name_elements":{"forename":"Rachel","surname":"Weissman","title":"Mrs"},"nationality":"British","natures_of_control":["ownership-of-shares-25-to-50-percent","voting-rights-25-to-50-percent","right-to-appoint-and-remove-directors","significant-influence-or-control"],"notified_on":"2016-07-01"}}
I have created an input file, in.json, containing my JSON data as provided by Companies House (file source: http://download.companieshouse.gov.uk/en_pscdata.html).
I have created an output file, out.csv.
I am trying to run the following code:
jq -r 'map({company_number,address_line_1,country,locality,postal_code,premises,ceased_on,country_of_residence,month,year,etag,kind}) | (first | keys_unsorted) as $keys | map([to_entries[] | .value]) as $rows | $keys,$rows[] | @csv' in.json > out.csv
I'm getting the following error:
jq: error (at in.json:0): Cannot index string with string "company_number"
Please advise on what I am doing wrong and how to get this done.
Since you are selecting bits of data from different levels of the input objects, you will need to specify the selection more precisely.
As your input consists of a stream of JSON objects, let's start with a function for reading one of those objects:
# Input and output: a JSON object
def get:
{company_number} as $number
| .data
| (.address | {address_line_1,country,locality,postal_code,premises}) as $address
| {ceased_on,country_of_residence} as $details
| (.date_of_birth | {month, year}) as $dob
| $number + $address + $details + $dob + {etag,kind}
;
There are several ways to read JSON streams, but it's quite convenient to use input and inputs with the -n command-line option.
To make things easy to read, let's next define another helper function for producing an array of the relevant data:
def getRow:
get | [.[]];
Putting it all together:
(input|get)
| keys_unsorted,
[.[]],
(inputs | getRow)
| @csv
Don't forget the -r and -n command-line options!
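For comparison, the same record-by-record conversion can be sketched in Python rather than jq. This is only an illustration under stated assumptions: the names COLUMNS, flatten and convert are hypothetical, the input is assumed to hold one JSON object per line as in the snippet, and missing fields become empty cells. Note that Python's csv module quotes fields only when needed, so the quoting will differ from jq's @csv.

```python
import csv
import io
import json

# Column order mirrors the keys selected by the jq `get` function above.
COLUMNS = [
    "company_number", "address_line_1", "country", "locality", "postal_code",
    "premises", "ceased_on", "country_of_residence", "month", "year",
    "etag", "kind",
]


def flatten(record):
    """Pull the selected fields from their different nesting levels."""
    data = record.get("data") or {}
    address = data.get("address") or {}
    dob = data.get("date_of_birth") or {}
    row = {"company_number": record.get("company_number")}
    for key in ("address_line_1", "country", "locality", "postal_code", "premises"):
        row[key] = address.get(key)
    for key in ("ceased_on", "country_of_residence", "etag", "kind"):
        row[key] = data.get(key)
    row["month"] = dob.get("month")
    row["year"] = dob.get("year")
    return row


def convert(lines, out):
    """Read one JSON object per line and write a header plus CSV rows to `out`."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for line in lines:
        if line.strip():  # skip blank lines
            writer.writerow(flatten(json.loads(line)))


if __name__ == "__main__":
    # A trimmed copy of the first record in the question, used as sample input.
    sample = json.dumps({
        "company_number": "04502074",
        "data": {
            "address": {"address_line_1": "Grove Hall", "locality": "Ashbourne",
                        "postal_code": "DE6 1JD"},
            "country_of_residence": "England",
            "date_of_birth": {"month": 11, "year": 1964},
            "etag": "f9a632332f63b61f004569f99d6b15e3e6d28192",
            "kind": "individual-person-with-significant-control",
        },
    })
    buffer = io.StringIO()
    convert([sample], buffer)
    print(buffer.getvalue(), end="")
```

To process the real files, convert can be given open file objects for in.json and out.csv (the latter opened with newline=""); because it reads line by line, the 1GB input is never held in memory at once.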
Footnote:
In general, using [.[]] to "flatten" a JSON object to a flat array of values is ill-advised, but in the present case, we have ensured a consistent ordering of keys in get, and it is reasonable to assume that none of the values in the selected fields are compound, as suggested by the snippet and the 500,000 records in one of the snapshot files. A "robustification" would, however, be trivial to achieve (e.g. using tostring), and might therefore be advisable.
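The "robustification" mentioned in the footnote can be illustrated with a small Python sketch (not jq). The helper names to_cell and to_row are hypothetical. Unlike jq's tostring, which converts every non-string value to JSON text, this version only serializes compound values (objects and arrays) and leaves scalars untouched.

```python
import json


def to_cell(value):
    """Return a CSV-safe scalar: compound values become compact JSON text."""
    if isinstance(value, (dict, list)):
        return json.dumps(value, separators=(",", ":"))
    return value


def to_row(obj, keys):
    """Build a row in a fixed key order instead of relying on object order."""
    return [to_cell(obj.get(key)) for key in keys]


if __name__ == "__main__":
    # natures_of_control is an array in the snippet, so it would break a
    # naive "flatten to values" step if it were ever selected as a column.
    record = {
        "name": "Mr Stuart Hemple",
        "natures_of_control": [
            "ownership-of-shares-75-to-100-percent",
            "voting-rights-75-to-100-percent",
        ],
    }
    print(to_row(record, ["name", "natures_of_control"]))
```

Passing the key list explicitly also avoids depending on key order, which is the same concern raised for gojq.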
If you were using gojq (the Go implementation of jq), you would have to do things slightly differently as gojq does not respect user-specified ordering of keys. You'd have to generate the header row differently and make minor changes to get.
Related
How can you filter on "Keys" using jq?
I am looking to filter a JSON stream based on its keys. Here is the public JSON file that I am trying to wrangle: https://s3.amazonaws.com/okta-ip-ranges/ip_ranges.json
When I filter this for keys with jq 'keys', I get the following output:
[
  "apac_cell_1",
  "emea_cell_1",
  "emea_cell_2",
  "preview_cell_1",
  "preview_cell_2",
  "preview_cell_3",
  "us_cell_1",
  "us_cell_10",
  "us_cell_11",
  "us_cell_12",
  "us_cell_2",
  "us_cell_3",
  "us_cell_4",
  "us_cell_5",
  "us_cell_6",
  "us_cell_7"
]
I am trying to get all the ip_ranges associated with the keys starting with "us_cell_*" and I have not found a way to do it. Most of the filtering seems to be focused on the values and not the keys.
You can use the following:
to_entries | map(select(.key | startswith("us_cell_")) | .value.ip_ranges) | add
to_entries maps the root object into an array of objects with key and value fields corresponding to the fields of the original object. We filter that to retain only those which have a key starting with "us_cell_", map it further to keep only the IP ranges, and finally merge those arrays together.
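The same select-by-key-prefix-then-merge idea can be sketched in Python (not jq). The function name ranges_for_prefix is hypothetical, and the sample document below is made up: it only imitates the shape implied by the question and uses reserved documentation address ranges rather than real data.

```python
def ranges_for_prefix(doc, prefix):
    """Concatenate the ip_ranges of every top-level key starting with `prefix`."""
    merged = []
    for key, value in doc.items():
        if key.startswith(prefix):
            merged.extend(value.get("ip_ranges", []))
    return merged


if __name__ == "__main__":
    # Hypothetical sample in the assumed shape {cell_name: {"ip_ranges": [...]}}.
    doc = {
        "emea_cell_1": {"ip_ranges": ["198.51.100.0/24"]},
        "us_cell_1": {"ip_ranges": ["192.0.2.0/24"]},
        "us_cell_10": {"ip_ranges": ["203.0.113.0/24"]},
    }
    print(ranges_for_prefix(doc, "us_cell_"))
```

One difference from the jq filter: when no key matches, this returns an empty list, whereas add on an empty array yields null in jq.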
Combine output of jq onto single line per record output
I currently have output from jq that looks like this:
1§
{"id":"1","name":"River Street , Clerkenwell","lastUpdate":"1601461560941"}
2§
{"id":"2","name":"Phillimore Gardens, Kensington","lastUpdate":"1601461560941"}
and I would like to join the output lines per record, i.e.:
1§{"id":"1","name":"River Street , Clerkenwell","lastUpdate":"1601461560941"}
2§{"id":"2","name":"Phillimore Gardens, Kensington","lastUpdate":"1601461560941"}
Here is the sample input and current filter: https://jqplay.org/s/qBfGyriA5B
If I use -j then I get everything on the same line, which is not what I want:
1§{"id":"1","name":"River Street , Clerkenwell","lastUpdate":"1601461560941"}2§{"id":"2","name":"Phillimore Gardens, Kensington","lastUpdate":"1601461560941"}
One way would be to use the join function with an empty string on the created objects. Note that the object you are creating needs to be converted to string type for it to work. Use the filter below with the --raw-output/-r mode:
.stations.station[] + { lastUpdate: .stations."#lastUpdate" } | [ .id + "§", tostring ] | join("")
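As a rough Python analogue of that filter (not jq), the idea is to serialize the object to a string first and only then concatenate it with the id prefix, producing one line per record. The function name format_station is hypothetical, and the input shape is assumed from the output shown in the question.

```python
import json


def format_station(station, last_update):
    """Return '<id>§<compact JSON>' for one station, on a single line."""
    record = {**station, "lastUpdate": last_update}
    # The object must be turned into a string before it can be joined.
    text = json.dumps(record, separators=(",", ":"), ensure_ascii=False)
    return record["id"] + "§" + text


if __name__ == "__main__":
    stations = [
        {"id": "1", "name": "River Street , Clerkenwell"},
        {"id": "2", "name": "Phillimore Gardens, Kensington"},
    ]
    for station in stations:
        print(format_station(station, "1601461560941"))
```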
kusto query with dynamic object value without key
I have a lot of data looking like:
{"tuesday":"<30, 60>"}
{"friday":"<0, 5>"}
{"saturday":"<5, 10>"}
{"friday":"<0, 5>"}
{"saturday":"<5, 10>"}
{"sunday":"0"}
{"monday":"<0, 5>"}
All I want is the value, regardless of the key. My query:
customEvents
| where name == "eventName"
| extend d = parse_json(tostring(customDimensions.['Properties']))
| project d
| take 7
d is a dynamic object and I can do d.monday for the value, but I'd like to get the value without the key. Is this possible with Kusto? Thanks
For the case of a single property, as you've demonstrated above, using the parse operator could work:
datatable(d:dynamic)
[
,dynamic({"tuesday":"<30, 60>"})
,dynamic({"friday":"<0, 5>"})
,dynamic({"saturday":"<5, 10>"})
,dynamic({"friday":"<0, 5>"})
,dynamic({"saturday":"<5, 10>"})
,dynamic({"sunday":"0"})
,dynamic({"monday":"<0, 5>"})
]
| parse d with * ':"' value '"' *
| project value
Notes:
In case your values are not necessarily encapsulated in double quotes (e.g. are numerics), then you should be able to specify kind=regex for the parse operator, and use a conditional expression for the existence of the double quotes.
In case you have potentially more than 1 property per property bag, using extract_all() is an option.
Relevant Docs:
https://learn.microsoft.com/en-us/azure/kusto/query/parseoperator
https://learn.microsoft.com/en-us/azure/kusto/query/extractallfunction
Application Insights Extract Nested CustomDimensions
I have some data in Application Insights Analytics that has a dynamic object as a property of custom dimensions. For example:
| timestamp               | name    | customDimensions                 | etc |
|-------------------------|---------|----------------------------------|-----|
| 2017-09-11T19:56:20.000 | Spinner | {                                | ... |
                                         MyCustomDimension: "hi"
                                         Properties:
                                             context: "ABC"
                                             userMessage: "Some other"
                                      }
Does that make sense? So a key/value pair inside of customDimensions.
I'm trying to bring up the context property to be a proper column in the results. So expected would be:
| timestamp               | name    | customDimensions                 | context| etc |
|-------------------------|---------|----------------------------------|--------|-----|
| 2017-09-11T19:56:20.000 | Spinner | {                                | ABC    | ...
                                         MyCustomDimension: "hi"
                                         Properties:
                                             context: "ABC"
                                             userMessage: "Some other"
                                      }
I've tried this:
customEvents | where name == "Spinner" | extend Context = customDimensions.Properties["context"]
and this:
customEvents | where name == "Spinner" | extend Context = customDimensions.Properties.context
but neither seems to work. They give me a column at the end named "Context" but the column is empty - no values. Any ideas?
Edited to working answer:
customEvents
| where name == "Spinner"
| extend Properties = todynamic(tostring(customDimensions.Properties))
| extend Context = Properties.context
You need an extra tostring and todynamic in here to get what you expect (and what I expected!).
The explanation I was given:
Dynamic field "promises" you the upper/outer level of key / value access (this is how you access customDimensions.Properties). Accessing internal structure of that json depends on the exact format of customDimensions.Properties content. It doesn't have to be json by itself. Even if it looks like a well structured json, it still may be just a string that is not exactly well formatted json.
So basically, by default it won't attempt to parse strings inside of a dynamic/json block, because they don't want to spend a lot of time possibly trying and failing to convert nested content to json infinitely.
I still think that extra tostring shouldn't be required inside there, since todynamic should already be allowing both string and dynamic in validly, so I'm checking to see if the team that owns the query stuff can make that step better.
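The underlying situation, a JSON document whose inner property is itself a JSON-encoded string, can be illustrated outside Kusto with a short Python sketch. This is an analogy only, not how Application Insights is implemented; the function name get_context is hypothetical and the sample value is taken from the question.

```python
import json


def get_context(custom_dimensions):
    """Return Properties.context, parsing Properties first if it is a string."""
    properties = custom_dimensions.get("Properties")
    if isinstance(properties, str):
        # The inner value only looks like an object; it must be parsed explicitly.
        properties = json.loads(properties)
    return (properties or {}).get("context")


if __name__ == "__main__":
    row = {
        "MyCustomDimension": "hi",
        "Properties": '{"context": "ABC", "userMessage": "Some other"}',
    }
    print(get_context(row))
```

Indexing straight into the unparsed string would not find the key, which is the same kind of empty result the question describes.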
Thanks sooo much.. just to expand on the answer from John. We needed to graph duration of end-points using custom events. This query made it so we could specify the duration as our Y-axis in the chart:
customEvents
| extend Properties = todynamic(tostring(customDimensions.Properties))
| extend duration = todouble(todecimal(Properties.duration))
| project timestamp, name, duration
JQ: Nested JSON Array transformation
For some months I have been using a jq transformation (jq 1.5 on Windows 10), and until recently the following command worked very well:
"[{nid, title, nights, company: .operator.shortTitle, zone: .zones[0].title} + (.sails[] | { sails_nid: .nid, arrival, departure } ) + (.sails[].cabins[] | { cabinname: .cabinType.title, cabintype: .cabinType.kindName, cabinnid: .cabinType.nid, catalogPrice, discountPrice, discountPercentage, currency } )]"
A few days ago the API started delivering "bigger" JSON files. With this jq command I now get a lot of duplicates (with the attached file I got around 3146 objects; the expected number is around 250). I tried to change the jq command to avoid the duplicates but had no luck. The JSON file contains a variable number of sails (10 in this case), while each sail has a variable number of cabins (25 in this case).
Any tips on how I can achieve that?
Regards, Timo
This is probably what you're looking for:
[{nid, title, nights, company: .operator.shortTitle, zone: .zones[0].title} +
  (.sails[]
   | ({ sails_nid: .nid, arrival, departure } +
      (.cabins[]
       | { cabinname: .cabinType.title,
           cabintype: .cabinType.kindName,
           cabinnid: .cabinType.nid,
           catalogPrice, discountPrice, discountPercentage, currency } )))
]
Hopefully the layout will clarify the difference with your jq filter.
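Why the original filter produced duplicates can be shown with a small Python sketch (not jq). In the question's filter, .sails[] and .sails[].cabins[] are iterated independently, so every sail is paired with every cabin of every sail; in the answer, the cabins are iterated inside each sail. The function names and the simplified sample data (a cabin with just a title field) are hypothetical.

```python
def rows_cross(cruise):
    """Mimic the question's filter: all sails paired with all cabins."""
    sails = cruise["sails"]
    all_cabins = [cabin for sail in sails for cabin in sail["cabins"]]
    return [
        {"sails_nid": sail["nid"], "cabinname": cabin["title"]}
        for sail in sails
        for cabin in all_cabins
    ]


def rows_nested(cruise):
    """Mimic the answer's filter: each sail paired only with its own cabins."""
    return [
        {"sails_nid": sail["nid"], "cabinname": cabin["title"]}
        for sail in cruise["sails"]
        for cabin in sail["cabins"]
    ]


if __name__ == "__main__":
    # Hypothetical, heavily simplified cruise: 2 sails with 2 and 1 cabins.
    cruise = {
        "sails": [
            {"nid": 1, "cabins": [{"title": "Inside"}, {"title": "Balcony"}]},
            {"nid": 2, "cabins": [{"title": "Suite"}]},
        ]
    }
    print(len(rows_cross(cruise)), "rows from the cross product")
    print(len(rows_nested(cruise)), "rows from the nested iteration")
```

With this sample the cross product yields 2 x 3 = 6 rows, while the nested version yields the 3 rows that actually exist.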