How can I flatten this object stream without creating duplicate objects? - jq

I want to use a relational database to analyze information from Songkick's JSON API for local events.
The event objects in the response are complex and deeply nested, so I want to filter and flatten them and convert them to CSV so I can load them with standard tools.
Can I use jq to filter and flatten the events?
A typical response from the API is too large to show here. I will show a simplified version with the same relative structure.
Applying the filter .resultsPage.results.event[] to the response produces a stream of event objects like this.
{
"start": {
"date": "2014-10-28"
},
"performance": [
{
"artist": {
"displayName": "James Keelaghan",
"identifier": [
{
"mbid": "08e5954e-efc0-4a95-95ac-d74cca5b79ff"
}
]
}
}
],
"venue": {
"displayName": "Live At The Star"
}
}
{
"start": {
"date": "2014-10-28"
},
"performance": [
{
"artist": {
"displayName": "Katy B",
"identifier": [
{
"mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57"
}
]
}
},
{
"artist": {
"displayName": "Becky Hill",
"identifier": [
{
"mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010"
}
]
}
}
],
"venue": {
"displayName": "O2 ABC"
}
}
Next I want to produce one output object for each object in the performance list. These new objects should have attributes from the containing event object, such as date and venue.
The correct output for the example would look like this.
{
"venue_name": "Live At The Star",
"artist_mbid": "08e5954e-efc0-4a95-95ac-d74cca5b79ff",
"artist_name": "James Keelaghan",
"start_date": "2014-10-28"
}
{
"venue_name": "O2 ABC",
"artist_mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57",
"artist_name": "Katy B",
"start_date": "2014-10-28"
}
{
"venue_name": "O2 ABC",
"artist_mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010",
"artist_name": "Becky Hill",
"start_date": "2014-10-28"
}
If I ignore the mbid, this jq filter gives me what I want.
{
start_date: .start.date,
artist_name: .performance[].artist.displayName,
venue_name: .venue.displayName
}
The result looks like this.
{
"venue_name": "Live At The Star",
"artist_name": "James Keelaghan",
"start_date": "2014-10-28"
}
{
"venue_name": "O2 ABC",
"artist_name": "Katy B",
"start_date": "2014-10-28"
}
{
"venue_name": "O2 ABC",
"artist_name": "Becky Hill",
"start_date": "2014-10-28"
}
I tried this filter to get the mbid as well.
{
start_date: .start.date,
artist_name: .performance[].artist.displayName,
artist_mbid: .performance[].artist.identifier[].mbid,
venue_name: .venue.displayName
}
The result looks like this.
{
"venue_name": "Live At The Star",
"artist_mbid": "08e5954e-efc0-4a95-95ac-d74cca5b79ff",
"artist_name": "James Keelaghan",
"start_date": "2014-10-28"
}
{
"venue_name": "O2 ABC",
"artist_mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57",
"artist_name": "Katy B",
"start_date": "2014-10-28"
}
{
"venue_name": "O2 ABC",
"artist_mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010",
"artist_name": "Katy B",
"start_date": "2014-10-28"
}
{
"venue_name": "O2 ABC",
"artist_mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57",
"artist_name": "Becky Hill",
"start_date": "2014-10-28"
}
{
"venue_name": "O2 ABC",
"artist_mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010",
"artist_name": "Becky Hill",
"start_date": "2014-10-28"
}
Each object looks right, but there are too many of them! The "Katy B"
and "Becky Hill" objects are duplicated.
What is the correct way to do this in jq?

This filter should work:
.resultsPage.results.event | map(
{
venue_name: .venue.displayName,
start_date: .start.date
}
+
(.performance[].artist | {
artist_mbid: .identifier[].mbid,
artist_name: .displayName
})
)
The fields aren't in the same order as in your expected output, but you could always reorder them if needed:
[
{
"venue_name": "Live At The Star",
"start_date": "2014-10-28",
"artist_mbid": "08e5954e-efc0-4a95-95ac-d74cca5b79ff",
"artist_name": "James Keelaghan"
},
{
"venue_name": "O2 ABC",
"start_date": "2014-10-28",
"artist_mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57",
"artist_name": "Katy B"
},
{
"venue_name": "O2 ABC",
"start_date": "2014-10-28",
"artist_mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010",
"artist_name": "Becky Hill"
}
]
You're trying to create an object for every corresponding performance so you'll have to flatten it down a bit before you start collecting results.
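To see why the second filter in the question over-produces, it can help to mirror both approaches outside jq. The following is a minimal Python sketch, not part of the original answer; the function names cross_product_rows and flatten_event are made up for illustration. Each [] inside a jq object constructor is an independent generator, so two of them yield every combination, while iterating over the performances once keeps each name paired with its own mbid.

```python
from itertools import product

# Simplified event taken from the question (the two-artist case).
event = {
    "start": {"date": "2014-10-28"},
    "performance": [
        {"artist": {"displayName": "Katy B",
                    "identifier": [{"mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57"}]}},
        {"artist": {"displayName": "Becky Hill",
                    "identifier": [{"mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010"}]}},
    ],
    "venue": {"displayName": "O2 ABC"},
}


def cross_product_rows(ev):
    """Mimic two independent .performance[] generators: every name meets every mbid."""
    names = [p["artist"]["displayName"] for p in ev["performance"]]
    mbids = [i["mbid"] for p in ev["performance"] for i in p["artist"]["identifier"]]
    return [
        {"start_date": ev["start"]["date"], "artist_name": n,
         "artist_mbid": m, "venue_name": ev["venue"]["displayName"]}
        for n, m in product(names, mbids)
    ]


def flatten_event(ev):
    """Iterate the performances once, so name and mbid come from the same artist."""
    base = {"venue_name": ev["venue"]["displayName"], "start_date": ev["start"]["date"]}
    return [
        {**base, "artist_mbid": ident["mbid"], "artist_name": perf["artist"]["displayName"]}
        for perf in ev["performance"]
        for ident in perf["artist"]["identifier"]
    ]


print(len(cross_product_rows(event)))  # 4 rows: 2 names x 2 mbids
print(len(flatten_event(event)))       # 2 rows: one per performance
```

As with the jq filter, an artist whose identifier list is empty produces no row in this sketch.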

Related

jq Concatenate arrays from two different files and save the output in the first file

Here's what I'm looking to do.
file1.json
{
"info": {
"id": "",
"name": "Text Fields",
"schema": "url"
},
"item": [
{
"name": "CompanyName Field",
"item": [
{
"name": "CompanyName is CompanyName1"
}
]
}
]
}
file2.json
[
{
"name": "Phone Field",
"item": [
{
"name": "Phone is 1234"
}
]
},
{
"name": "Job Field",
"item": [
{
"name": "Job is Job1"
}
]
}
]
Expected output after running jq
file1.json
{
"info": {
"id": "",
"name": "Text Fields",
"schema": "url"
},
"item": [
{
"name": "CompanyName Field",
"item": [
{
"name": "CompanyName is CompanyName1"
}
]
},
{
"name": "Phone Field",
"item": [
{
"name": "Phone is 1234"
}
]
},
{
"name": "Job Field",
"item": [
{
"name": "Job is Job1"
}
]
}
]
}
As a first step, I tried simply concatenating the arrays from the two files and printing the result, before attempting to write it back into the first file, but even that does not work.
Here's what I tried
jq '.item .' file1.json file2.json
but I get the following error:
jq: error: syntax error, unexpected $end, expecting FORMAT or QQSTRING_START (Unix shell quoting issues?) at <top-level>, line 1:
.item .
jq: 1 compile error
I tried searching a lot, trust me. There are a lot of queries with similar titles but they all seem to be very specific problems when you look into each one. Please help.
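Use --argfile to read the second file into a variable, then use += to append it to the existing array in .item. (The jq manual discourages --argfile in favour of --slurpfile; with --slurpfile the file's contents are wrapped in an array, so the filter becomes '.item += $f[0]'.)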
jq --argfile f file2.json '.item += $f' file1.json
{
"info": {
"id": "",
"name": "Text Fields",
"schema": "url"
},
"item": [
{
"name": "CompanyName Field",
"item": [
{
"name": "CompanyName is CompanyName1"
}
]
},
{
"name": "Phone Field",
"item": [
{
"name": "Phone is 1234"
}
]
},
{
"name": "Job Field",
"item": [
{
"name": "Job is Job1"
}
]
}
]
}
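The command above prints the merged document to standard output rather than modifying file1.json. As a rough illustration of the remaining "save it in the first file" step, here is a minimal Python sketch; the helper name append_items is hypothetical, and it assumes file1.json holds an object with an "item" array and file2.json holds a plain array, as in the question.

```python
import json
import os
import tempfile


def append_items(target_path, extra_path):
    """Append the array stored in extra_path to the "item" array in target_path."""
    with open(target_path, encoding="utf-8") as fh:
        target = json.load(fh)
    with open(extra_path, encoding="utf-8") as fh:
        extra = json.load(fh)

    target["item"] += extra  # same effect as the jq update `.item += $f`

    # Write to a temporary file first, then swap it in, so the original
    # is never left half-written.
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(target, fh, indent=2)
    os.replace(tmp_path, target_path)
    return target
```

Calling append_items("file1.json", "file2.json") would rewrite file1.json with the three entries under "item". The same temporary-file-then-rename pattern is a common way to save jq output back to its input file from the shell.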

Use jq extract two values and build new output

I'm looking to extract values from the api_http array. I want output that looks like the following: each element should have the name, plus the URL value attached to a key called api.
{ "name": "lookproduct1", "api": "http://testapi.api.com"}
{ "name": "lookproduct2", "api": "http://testapi2.api.com"}
{ "name": "lookproduct3", "api": "http://testapi3.api.com"}
{ "name": "lookproduct4", "api": "http://testapi4.api.com"}
the JSON data:
{
"meta": {
"details": {
"value": "Details"
},
"network": {
"label": "Network:",
"value": "test"
},
"title": {
"value": "Test Report"
},
"update": {
"label": "Validation last update:",
"value": "2020-07-15 17:40 UTC"
}
},
"report": {
"api_http": [
[
{
"html_name": "Product 1",
"name": "lookproduct1",
"rank": 3
},
"http://testapi.api.com",
"GB",
"TEST"
],
[
{
"html_name": "Product 2",
"name": "lookproduct2",
"rank": 3
},
"http://testapi2.api.com",
"GB",
"TEST"
],
[
{
"html_name": "Product 3",
"name": "lookproduct3",
"rank": 3
},
"http://testapi3.api.com",
"GB",
"TEST"
],
[
{
"html_name": "Product 4",
"name": "lookproduct4",
"rank": 3
},
"http://testapi.api.com",
"GB",
"TEST"
]
]
}
}
I got as far as the following, but I'm unsure how to extract those final two values and create the new output.
.report[] | .[]
Try:
.report.api_http[]|{name:values[0]["name"],api:values[1]}
My output is:
{
"name": "lookproduct1",
"api": "http://testapi.api.com"
}
{
"name": "lookproduct2",
"api": "http://testapi2.api.com"
}
{
"name": "lookproduct3",
"api": "http://testapi3.api.com"
}
{
"name": "lookproduct4",
"api": "http://testapi.api.com"
}
You could use the -c command-line option in conjunction with the following jq filter:
.report.api_http[]
| {name: .[0].name, api: .[1]}
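For comparison, the same positional lookup can be sketched in Python. This is only an illustration of what the filter does, assuming each api_http entry is a list whose first element is the product object and whose second element is the URL; the function name name_api_pairs is made up, and the sample data is a shortened copy of the question's input.

```python
import json

# Shortened copy of the question's data: two of the four api_http entries.
report = {"report": {"api_http": [
    [{"html_name": "Product 1", "name": "lookproduct1", "rank": 3},
     "http://testapi.api.com", "GB", "TEST"],
    [{"html_name": "Product 2", "name": "lookproduct2", "rank": 3},
     "http://testapi2.api.com", "GB", "TEST"],
]}}


def name_api_pairs(doc):
    """Pick .[0].name and .[1] from every entry of .report.api_http."""
    return [{"name": entry[0]["name"], "api": entry[1]}
            for entry in doc["report"]["api_http"]]


# One compact JSON object per line, similar to running jq with -c.
for pair in name_api_pairs(report):
    print(json.dumps(pair))
```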

Cosmos DB SQL Query for nested objects

I have an accounts collection in Cosmos DB. I tried several queries without success. What is the equivalent SQL query to fetch only the accounts that have a selected subscription?
I tried this query, but it failed:
SELECT *
FROM account a
JOIN s IN c.subscriptions
WHERE s.id = "e5969a3c-2729-cb3c-a01b-2e62e0473646"
Account Collection Records
[
{
"id": "8c549b95-480e-47f9-acd6-13339179399f",
"odoo_id": "UpdatedDAta",
"entity_name": "Lakes High School123",
"entity_type": "family | teacher | school | district",
"contacts": [
{
"name": "Mr. Garcia1",
"email": "Garcia#junk.com"
},
{
"name": "Mr. Garcia3",
"email": "Garcia#junk.com"
}
],
"subscriptions": [
{
"id": null,
"type": "group | profile",
"group_name": "Year 4",
"teachers": [
"Ms Jones"
],
"start_date": "25/7/2018",
"end_date": "24/7/2019",
"seats": 4,
"group_key": "red-limping-pigeon"
},
{
"id": "e5969a3c-2729-cb3c-a01b-2e62e0473646",
"type": "group | profile",
"group_name": "Year 4",
"teachers": [
"Ms Jones",
"Waqar"
],
"start_date": "25/7/2018",
"end_date": "24/7/2021",
"seats": 4,
"group_key": "red-limping-pigeon"
}
],
"_rid": "bjcNANQrW3oGAAAAAAAAAA==",
"_self": "dbs/bjcNAA==/colls/bjcNANQrW3o=/docs/bjcNANQrW3oGAAAAAAAAAA==/",
"_etag": "\"01001c87-0000-0000-0000-5b7966850000\"",
"_attachments": "attachments/",
"_ts": 1534682757
}
]
Please use the SQL below to fetch your documents:
SELECT * FROM c
where ARRAY_CONTAINS(c.subscriptions,{"id": "e5969a3c-2729-cb3c-a01b-2e62e0473646"},true)
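ARRAY_CONTAINS returns a Boolean indicating whether the array contains the specified value. Passing true as the third argument enables partial matching, so a subscription object matches on its id alone.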
Hope it helps you.
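To make the effect of the third argument concrete, here is a simplified Python sketch of the matching rule. It is an approximation for illustration, not Cosmos DB's implementation: it only compares top-level keys, the function name array_contains is a stand-in, and the second account in the sample data is hypothetical.

```python
def array_contains(array, value, partial=False):
    """Simplified stand-in for ARRAY_CONTAINS(array, value, partial)."""
    for element in array:
        if element == value:
            return True
        if partial and isinstance(element, dict) and isinstance(value, dict):
            # Partial match: every key/value pair in `value` must appear in `element`.
            if all(key in element and element[key] == val for key, val in value.items()):
                return True
    return False


accounts = [
    {"id": "8c549b95-480e-47f9-acd6-13339179399f",
     "subscriptions": [
         {"id": None, "group_name": "Year 4"},
         {"id": "e5969a3c-2729-cb3c-a01b-2e62e0473646", "group_name": "Year 4"},
     ]},
    # Hypothetical second account with no matching subscription.
    {"id": "hypothetical-second-account",
     "subscriptions": [{"id": None, "group_name": "Year 5"}]},
]

wanted = {"id": "e5969a3c-2729-cb3c-a01b-2e62e0473646"}
matches = [a for a in accounts if array_contains(a["subscriptions"], wanted, partial=True)]
print([a["id"] for a in matches])
```

Without the partial flag the lookup would require a subscription object equal to {"id": ...} in full, which is why the third argument matters for this query.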

Training LUIS to recognise a job ticket number

I'm trying to train LUIS to recognise a request for a status update on a job ticket (analogous to a JIRA/GitHub issue ID). The job ticket number will be of the format [Letter S or s][One or more digits]. E.g.:
"What is that status on S344?"
Intent: StatusUpdate
Entity: Ticket = S344
After labelling a number of utterances LUIS can recognise the intent with high confidence, but is never able to identify the Ticket entity, even when I use the exact ticket number I've labelled as the entity in a labelled utterance.
I've also tried adding a Regex feature [sS]{1}\d+, but that doesn't seem to make any difference.
Is there something special I need to do to make this work, or do I just need to persevere adding more training utterances?
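Independently of how LUIS is trained, the ticket format itself is easy to sanity-check. The snippet below is a small Python sketch, not a LUIS feature: it tests the pattern from the question (the {1} quantifier is redundant, so it is dropped) with word boundaries added as an assumption, and the names TICKET_RE and find_tickets are made up for illustration.

```python
import re

# Letter S or s followed by one or more digits, as a standalone word.
TICKET_RE = re.compile(r"\b[sS]\d+\b")


def find_tickets(utterance):
    """Return every ticket-like token found in the utterance."""
    return TICKET_RE.findall(utterance)


print(find_tickets("What is that status on S344?"))
print(find_tickets("please tell me the status of s4"))
print(find_tickets("no ticket mentioned here"))
```

A check like this could also serve as a fallback in the calling code when the entity is missing from the LUIS response.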
I just tried this myself and after 7 utterances, LUIS is recognizing the ticket just fine. What I did was:
Send a couple of utterances
Train
Send a new bunch of utterances (different tickets number and phrases)
Train again
I exported my LUIS app for you (below):
{
"luis_schema_version": "1.3.0",
"name": "testticket",
"desc": "",
"culture": "en-us",
"intents": [
{
"name": "None"
},
{
"name": "StatusUpdate"
}
],
"entities": [
{
"name": "Ticket"
}
],
"composites": [],
"bing_entities": [],
"actions": [],
"model_features": [],
"regex_features": [],
"utterances": [
{
"text": "what is that status on s344?",
"intent": "StatusUpdate",
"entities": [
{
"entity": "Ticket",
"startPos": 5,
"endPos": 5
}
]
},
{
"text": "status of s124",
"intent": "StatusUpdate",
"entities": [
{
"entity": "Ticket",
"startPos": 2,
"endPos": 2
}
]
},
{
"text": "what's the status of s4",
"intent": "StatusUpdate",
"entities": []
},
{
"text": "please tell me the status of s4",
"intent": "StatusUpdate",
"entities": [
{
"entity": "Ticket",
"startPos": 6,
"endPos": 6
}
]
},
{
"text": "whats the status of s5",
"intent": "StatusUpdate",
"entities": [
{
"entity": "Ticket",
"startPos": 4,
"endPos": 4
}
]
},
{
"text": "whats the status of s9",
"intent": "StatusUpdate",
"entities": [
{
"entity": "Ticket",
"startPos": 4,
"endPos": 4
}
]
},
{
"text": "please tell me the status of s24",
"intent": "StatusUpdate",
"entities": [
{
"entity": "Ticket",
"startPos": 6,
"endPos": 6
}
]
}
]
}

Calendar API sometimes returns NULL Event Attendees

Has anybody come across an issue with the V3 Calendar API where sometimes the Attendee data is returned empty even though there are valid attendees on an Event?
The Events always have valid attendees.
An Event Get returns Attendees = NULL
Here is a code snippet of an Event Query
calendarToolsV3 cv3 = new calendarToolsV3(true, calendar, int.Parse(organisationId));
EventsResource.GetRequest gr = new EventsResource.GetRequest(cv3.service, calendar, eventId);
gr.AlwaysIncludeEmail = true;
Event evv = gr.Execute();
litDiagnosis.Text = "Summary | " + evv.Summary + "<br/>";
litDiagnosis.Text += "Id | " + evv.Id + "<br/>";
litDiagnosis.Text += "RecurringEventId | " + evv.RecurringEventId + "<br/>";
litDiagnosis.Text += "Status | " + evv.Status + "<br/>";
litDiagnosis.Text += "Visibility | " + evv.Visibility + "<br/>";
// All-day events carry only Start.Date; timed events carry Start.DateTime.
litDiagnosis.Text += "Start | " + (evv.Start.DateTime == null ? evv.Start.Date : evv.Start.DateTime.ToString()) + "<br/>";
if (evv.Attendees != null && evv.Attendees.Count() > 0)
{
foreach (EventAttendee ea in evv.Attendees)
{
litDiagnosis.Text += "Attendee | " + ea.Email + "|" + ea.ResponseStatus + "<br/>";
}
}
****** EDIT **************
I have done some further testing and it seems that this occurs where the original APPT is a recurring APPT.
The user that I am interrogating the Event with has declined the Event
The attendees are no longer visible to that user.
The attendees are however visible to the Creator of the Event. When the Event is deleted the Attendees become visible again to that user?
When the Event is not Recurring this behaviour is not observed and all users can view all Attendees even when declined/deleted?
*********************** EXAMPLE CAPTURE FROM API EXPLORER ***************
Event Created by admin#i3000.co. Attendee Lisa.Jones#i3000.co queries the Event and can see the Attendees
{
"kind": "calendar#event",
"etag": "\"2869345662384000\"",
"id": "7d3pmni42o6pg6taeudsskhfh8",
"status": "confirmed",
"htmlLink": "https://www.google.com/calendar/event?eid=N2QzcG1uaTQybzZwZzZ0YWV1ZHNza2hmaDhfMjAxNTA2MTlUMjIwMDAwWiBsaXNhLmpvbmVzQGkzMDAwLmNv",
"created": "2015-06-19T00:13:22.000Z",
"updated": "2015-06-19T00:13:51.192Z",
"summary": "ATTENDEE TEST",
"colorId": "11",
"creator": {
"email": "admin#i3000.co",
"displayName": "Admin User"
},
"organizer": {
"email": "admin#i3000.co",
"displayName": "Admin User"
},
"start": {
"dateTime": "2015-06-20T08:00:00+10:00",
"timeZone": "America/New_York"
},
"end": {
"dateTime": "2015-06-20T09:00:00+10:00",
"timeZone": "America/New_York"
},
"recurrence": [
"RRULE:FREQ=WEEKLY;COUNT=2;BYDAY=FR"
],
"iCalUID": "7d3pmni42o6pg6taeudsskhfh8#google.com",
"sequence": 0,
"attendees": [
{
"email": "admin#i3000.co",
"displayName": "Admin User",
"organizer": true,
"responseStatus": "accepted"
},
{
"email": "lisa.jones#i3000.co",
"displayName": "lisa jones",
"self": true,
"responseStatus": "needsAction"
}
],
"extendedProperties": {
"private": {
"ilink": "recur7d3pmni42o6pg6taeudsskhfh8"
}
},
"hangoutLink": "https://plus.google.com/hangouts/_/i3000.co/admin-lisa-jone?hceid=YWRtaW5AaTMwMDAuY28.7d3pmni42o6pg6taeudsskhfh8",
"reminders": {
"useDefault": true
}
}
Lisa.Jones Removes the Event from her Calendar and Queries again - Attendees are not Visible
{
"kind": "calendar#event",
"etag": "\"2869346050176000\"",
"id": "7d3pmni42o6pg6taeudsskhfh8",
"status": "cancelled",
"htmlLink": "https://www.google.com/calendar/event?eid=N2QzcG1uaTQybzZwZzZ0YWV1ZHNza2hmaDhfMjAxNTA2MTlUMjIwMDAwWiBsaXNhLmpvbmVzQGkzMDAwLmNv",
"created": "2015-06-19T00:13:22.000Z",
"updated": "2015-06-19T00:17:05.088Z",
"summary": "ATTENDEE TEST",
"colorId": "11",
"creator": {
"email": "admin#i3000.co",
"displayName": "Admin User"
},
"organizer": {
"email": "admin#i3000.co",
"displayName": "Admin User"
},
"start": {
"dateTime": "2015-06-20T08:00:00+10:00",
"timeZone": "America/New_York"
},
"end": {
"dateTime": "2015-06-20T09:00:00+10:00",
"timeZone": "America/New_York"
},
"recurrence": [
"RRULE:FREQ=WEEKLY;COUNT=2;BYDAY=FR"
],
"iCalUID": "7d3pmni42o6pg6taeudsskhfh8#google.com",
"sequence": 0,
"extendedProperties": {
"private": {
"ilink": "recur7d3pmni42o6pg6taeudsskhfh8"
}
},
"hangoutLink": "https://plus.google.com/hangouts/_/i3000.co/admin-lisa-jone?hceid=YWRtaW5AaTMwMDAuY28.7d3pmni42o6pg6taeudsskhfh8",
"reminders": {
"useDefault": true
}
}
Admin#i3000.co now deletes the Event. When Lisa.Jones Queries the Event the Attendees are again visible
{
"kind": "calendar#event",
"etag": "\"2869346122438000\"",
"id": "7d3pmni42o6pg6taeudsskhfh8",
"status": "cancelled",
"htmlLink": "https://www.google.com/calendar/event?eid=N2QzcG1uaTQybzZwZzZ0YWV1ZHNza2hmaDhfMjAxNTA2MTlUMjIwMDAwWiBsaXNhLmpvbmVzQGkzMDAwLmNv",
"created": "2015-06-19T00:13:22.000Z",
"updated": "2015-06-19T00:17:41.219Z",
"summary": "ATTENDEE TEST",
"colorId": "11",
"creator": {
"email": "admin#i3000.co",
"displayName": "Admin User"
},
"organizer": {
"email": "admin#i3000.co",
"displayName": "Admin User"
},
"start": {
"dateTime": "2015-06-20T08:00:00+10:00",
"timeZone": "America/New_York"
},
"end": {
"dateTime": "2015-06-20T09:00:00+10:00",
"timeZone": "America/New_York"
},
"recurrence": [
"RRULE:FREQ=WEEKLY;COUNT=2;BYDAY=FR"
],
"iCalUID": "7d3pmni42o6pg6taeudsskhfh8#google.com",
"sequence": 1,
"attendees": [
{
"email": "lisa.jones#i3000.co",
"displayName": "lisa jones",
"self": true,
"responseStatus": "needsAction"
},
{
"email": "admin#i3000.co",
"displayName": "Admin User",
"organizer": true,
"responseStatus": "accepted"
}
],
"extendedProperties": {
"private": {
"ilink": "recur7d3pmni42o6pg6taeudsskhfh8"
}
},
"hangoutLink": "https://plus.google.com/hangouts/_/i3000.co/admin-lisa-jone?hceid=YWRtaW5AaTMwMDAuY28.7d3pmni42o6pg6taeudsskhfh8",
"reminders": {
"useDefault": true
}
}
