Merge all objects inside an array that share the same key - jq

I'm trying to deduplicate all objects inside the results array that share the same id, and merge their path values into an array.
JSON input:
[
{
"type": "apple",
"results": [
{
"id": "apple1",
"name": "appleName1",
"path": "/some/path/a"
},
{
"id": "apple1",
"name": "appleName1",
"path": "/some/path/b"
},
{
"id": "apple2",
"name": "appleName2",
"path": "/some/path/c"
}
]
},
{
"type": "orange",
"results": [
{
"id": "orange1",
"name": "orangeName1",
"path": "/some/path/a"
},
{
"id": "orange1",
"name": "orangeName1",
"path": "/some/path/b"
},
{
"id": "orange2",
"name": "orangeName2",
"path": "/some/path/c"
}
]
}
]
Expected output:
[
{
"type": "apple",
"results": [
{
"id": "apple1",
"name": "appleName1",
"path": [
"/some/path/a",
"/some/path/b"
]
},
{
"id": "apple2",
"name": "appleName2",
"path": [
"/some/path/c"
]
}
]
},
{
"type": "orange",
"results": [
{
"id": "orange1",
"name": "orangeName1",
"path": [
"/some/path/a",
"/some/path/b"
]
},
{
"id": "orange2",
"name": "orangeName2",
"path": [
"/some/path/c"
]
}
]
}
]
I've managed to get an approximate solution using:
jq '[{type: .[].type, results: .[].results | group_by(.id) | map({id: .[0].id, name: .[0].name, path: (map(.path))})}]'
But my solution produces two additional elements that aren't supposed to be there.
I know there are some similar questions already answered but I didn't manage to get them to work with this example. Any help is appreciated!
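The extra elements come from jq's object construction: when a field expression inside {…} yields several values, one object is emitted per combination of them. Here .[].type yields two values and .[].results | … yields two arrays, so four objects come out, and two of them pair a type with the other type's results. The Python sketch below only mimics that combination behaviour with itertools.product, using a trimmed copy of the input above, to make the count visible:

```python
from itertools import product

# Trimmed version of the question's input: only the fields needed here.
data = [
    {"type": "apple", "results": [{"id": "apple1"}, {"id": "apple2"}]},
    {"type": "orange", "results": [{"id": "orange1"}, {"id": "orange2"}]},
]

# Analogue of .[].type and .[].results: each yields one value per top-level element.
types = [item["type"] for item in data]
results = [item["results"] for item in data]

# Like {type: ..., results: ...} with multi-valued fields: one object per combination.
objects = [{"type": t, "results": r} for t, r in product(types, results)]

print(len(objects))  # 4 objects instead of the expected 2
mismatched = [o for o in objects if not o["results"][0]["id"].startswith(o["type"])]
print(len(mismatched))  # 2 of them pair a type with the other type's results
```

Processing each top-level element separately, as map(...) does, keeps every type together with its own results.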

You could group_by the .id field, then for each group take the first item and replace its .path field with an array of the .path values of all group members:
jq 'map(.results |= (group_by(.id) | map(first + {path: map(.path)})))'
[
{
"type": "apple",
"results": [
{
"id": "apple1",
"name": "appleName1",
"path": [
"/some/path/a",
"/some/path/b"
]
},
{
"id": "apple2",
"name": "appleName2",
"path": [
"/some/path/c"
]
}
]
},
{
"type": "orange",
"results": [
{
"id": "orange1",
"name": "orangeName1",
"path": [
"/some/path/a",
"/some/path/b"
]
},
{
"id": "orange2",
"name": "orangeName2",
"path": [
"/some/path/c"
]
}
]
}
]
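For comparison outside jq, here is a minimal Python sketch of the same transformation, assuming the input has already been parsed into lists and dicts (for example with json.load). merge_by_id is a hypothetical helper name; like jq's group_by, it sorts by id before grouping, so groups come out in id order rather than input order:

```python
from itertools import groupby
from operator import itemgetter


def merge_by_id(results):
    """Group result objects by "id" and collect their "path" values into a list.

    Mirrors group_by(.id) | map(first + {path: map(.path)}): the first member
    of each group supplies the other fields, and "path" becomes a list.
    """
    merged = []
    # itertools.groupby only groups adjacent items, so sort by id first.
    # sorted() is stable, so paths keep their input order within a group.
    for _, group in groupby(sorted(results, key=itemgetter("id")), key=itemgetter("id")):
        members = list(group)
        merged.append({**members[0], "path": [m["path"] for m in members]})
    return merged


data = [
    {
        "type": "apple",
        "results": [
            {"id": "apple1", "name": "appleName1", "path": "/some/path/a"},
            {"id": "apple1", "name": "appleName1", "path": "/some/path/b"},
            {"id": "apple2", "name": "appleName2", "path": "/some/path/c"},
        ],
    },
]

# Equivalent of map(.results |= ...): rebuild each element with merged results.
output = [{**item, "results": merge_by_id(item["results"])} for item in data]
print(output)
```

As with first + {...} in jq, fields that differ between duplicates (other than path) are taken from the first member of the group.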

jq array filter for nested array elements

I am trying to add a new user to the policy item in the JSON below whose groups include NP01-RW. I can add a user without the NP01-RW filter, but I can't select only the users under NP01-RW and still return the full updated JSON.
{
"id": 181,
"guid": "c9b7dbde-63de-42cc-9840-1b4a06e13364",
"isEnabled": true,
"version": 17,
"service": "Np-Hue",
"name": "DATASCIENCE-CUROPT-RO",
"policyType": 0,
"policyPriority": 0,
"isAuditEnabled": true,
"resources": {
"database": {
"values": [
"hive_cur_acct_1dev",
"hive_cur_acct_1eng",
"hive_cur_acct_1rwy",
"hive_cur_acct_1stg",
"hive_opt_acct_1dev",
"hive_opt_acct_1eng",
"hive_opt_acct_1stg",
"hive_opt_acct_1rwy"
],
"isExcludes": false,
"isRecursive": false
},
"column": {
"values": [
"*"
],
"isExcludes": false,
"isRecursive": false
},
"table": {
"values": [
"*"
],
"isExcludes": false,
"isRecursive": false
}
},
"policyItems": [
{
"accesses": [
{
"type": "select",
"isAllowed": true
},
{
"type": "update",
"isAllowed": true
},
{
"type": "create",
"isAllowed": true
},
{
"type": "drop",
"isAllowed": true
},
{
"type": "alter",
"isAllowed": true
},
{
"type": "index",
"isAllowed": true
},
{
"type": "lock",
"isAllowed": true
},
{
"type": "all",
"isAllowed": true
},
{
"type": "read",
"isAllowed": true
},
{
"type": "write",
"isAllowed": true
}
],
"users": [
"user1",
"user2",
"user3"
],
"groups": [
"NP01-RW"
],
"conditions": [],
"delegateAdmin": false
},
{
"accesses": [
{
"type": "select",
"isAllowed": true
}
],
"users": [
"user1"
],
"groups": [
"NP01-RO"
],
"conditions": [],
"delegateAdmin": false
}
],
"denyPolicyItems": [],
"allowExceptions": [],
"denyExceptions": [],
"dataMaskPolicyItems": [],
"rowFilterPolicyItems": [],
"options": {},
"validitySchedules": [],
"policyLabels": [
"DATASCIENCE-CurOpt-RO_NP01"
]
}
Below is what I have tried, but it returns only the matching part of the JSON, not the full JSON:
jq --arg username "$sync_userName" '.policyItems[] | select(.groups[] | IN("NP01-RO")).users += [$username]' > ${sync_policyName}.json
Operator precedence in jq is not always intuitive. Your program is parsed as:
.policyItems[] | (select(.groups[] | IN("NP01-RO")).users += [$username])
This first streams the individual policyItems and only then updates them, so the output contains just the (modified) policy items instead of the whole document.
You need to make sure that the stream selects the correct values (here the users of the item whose groups contain NP01-RW), which you can then assign:
(.policyItems[] | select(.groups[] | IN("NP01-RW")).users) += [$username]
This performs the update but still outputs the full input document (.), with the change applied.
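The same "update the matching items, return the whole document" idea, sketched in Python for comparison. This is not a jq feature: add_user is a hypothetical helper, the sample document is trimmed from the question, and NP01-RW is the group the question asks about.

```python
import copy


def add_user(policy, group, username):
    """Append username to the users of every policy item whose groups contain group.

    Returns the whole (copied) policy document, just as
    (.policyItems[] | select(...).users) += [$username] returns the full input.
    """
    updated = copy.deepcopy(policy)  # leave the caller's document untouched
    for item in updated["policyItems"]:
        if group in item["groups"]:
            item["users"] = item["users"] + [username]
    return updated


policy = {
    "id": 181,
    "policyItems": [
        {"users": ["user1", "user2", "user3"], "groups": ["NP01-RW"]},
        {"users": ["user1"], "groups": ["NP01-RO"]},
    ],
}

result = add_user(policy, "NP01-RW", "user4")
print(result["policyItems"][0]["users"])  # ['user1', 'user2', 'user3', 'user4']
print(result["id"])  # 181: top-level fields are still present
```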

How to project values from a Gremlin traversal with nested and()/or() steps

I have the graph model below, which represents the sub-pattern I'd like to traverse or fetch; the nodes and their properties are shown below as well.
The expected response to my query would look something like the following, where 's', 'c', 'aid', 'qid', 'p', 'r1' and 'r2' are the nodes that make up the sub-pattern (subgraph):
[
{
"s": {
"id": "345fbdad-9c67-47bb-9f3b-cf50c8cdbee4",
"label": "severity",
"type": "vertex",
"properties": {
"severity": [
{
"id": "a6a9e38f-0802-48b6-ac37-490f45e824e9",
"value": "High"
}
],
"pk": [
{
"id": "345fbdad-9c67-47bb-9f3b-cf50c8cdbee4|pk",
"value": "pk"
}
]
}
},
"c": {
"id": "345fbdad-9c67-47bb-9f3b-cf50c8cdbee4",
"label": "cve",
"type": "vertex",
"properties": {
"cve_id": [
{
"id": "a6a9e38f-0802-48b6-ac37-490f45e824e9",
"value": "CVE-xxxx-xxxx"
}
],
"publishedOn": [
{
"id": "fc5dde4d-c027-4c19-9b16-b3314b2b10c6",
"value": "xxx"
}
],
"pk": [
{
"id": "345fbdad-9c67-47bb-9f3b-cf50c8cdbee4|pk",
"value": "pk"
}
]
}
},
"aid": {
"id": "345fbdad-9c67-47bb-9f3b-cf50c8cdbee4",
"label": "aid",
"type": "vertex",
"properties": {
"aid": [
{
"id": "a6a9e38f-0802-48b6-ac37-490f45e824e9",
"value": "xxxx-xxxx"
}
],
"pk": [
{
"id": "345fbdad-9c67-47bb-9f3b-cf50c8cdbee4|pk",
"value": "pk"
}
]
}
},
"qid": {
"id": "345fbdad-9c67-47bb-9f3b-cf50c8cdbee4",
"label": "qid",
"type": "vertex",
"properties": {
"qid": [
{
"id": "a6a9e38f-0802-48b6-ac37-490f45e824e9",
"value": "xxxx-xxxx"
}
],
"pk": [
{
"id": "345fbdad-9c67-47bb-9f3b-cf50c8cdbee4|pk",
"value": "pk"
}
]
}
},
"p": {
"id": "345fbdad-9c67-47bb-9f3b-cf50c8cdbee4",
"label": "package",
"type": "vertex",
"properties": {
"name": [
{
"id": "a6a9e38f-0802-48b6-ac37-490f45e824e9",
"value": "xxxxx"
}
],
"version": [
{
"id": "fc5dde4d-c027-4c19-9b16-b3314b2b10c6",
"value": "xxx"
}
],
"pk": [
{
"id": "345fbdad-9c67-47bb-9f3b-cf50c8cdbee4|pk",
"value": "pk"
}
]
}
},
"r1": {
"id": "345fbdad-9c67-47bb-9f3b-cf50c8cdbee4",
"label": "release",
"type": "vertex",
"properties": {
"source": [
{
"id": "a6a9e38f-0802-48b6-ac37-490f45e824e9",
"value": "xxxx-xxxx"
}
],
"status": [
{
"id": "fc5dde4d-c027-4c19-9b16-b3314b2b10c6",
"value": "xxx"
}
],
"pk": [
{
"id": "345fbdad-9c67-47bb-9f3b-cf50c8cdbee4|pk",
"value": "pk"
}
]
}
},
"r2": {
"id": "345fbdad-9c67-47bb-9f3b-cf50c8cdbee4",
"label": "release",
"type": "vertex",
"properties": {
"source": [
{
"id": "a6a9e38f-0802-48b6-ac37-490f45e824e9",
"value": "xxxx-xxxx"
}
],
"status": [
{
"id": "fc5dde4d-c027-4c19-9b16-b3314b2b10c6",
"value": "xxx"
}
],
"pk": [
{
"id": "345fbdad-9c67-47bb-9f3b-cf50c8cdbee4|pk",
"value": "pk"
}
]
}
}
},
{
....
....
},
{
....
..
}
]
My question is: how do I build my traversal to achieve this result?
What I have so far is below, but the project() step is not working as expected:
g.V().hasLabel('cve').as('c').and(
__.in('severity').as('s'),
__.out('cve_to_aid').as('aid').and(
__.out('has_qid').as('qid'),
__.in('package_to_aid').as('p'),
or(
__.in('r1_to_aid').has('status', 'Patched').as('r1'),
__.in('r2_to_aid').has('status', 'Patched').as('r2')
)
)
).project('c', 's', 'aid', 'qid', 'p', 'r1', 'r2').
by(('c').values('cve_id')).
by(('s').values('severity')).
by(('aid').values('aid')).
by(('qid').values('qid')).
by(('p').values()).
by(('r1').values()).
by(('r2').values()).
I am doing this on Cosmos DB, so please only provide answers that use the supported steps listed here: https://learn.microsoft.com/en-us/azure/cosmos-db/gremlin/support
It is possible to nest project() steps, e.g. on the TinkerGraph:
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V(1).as('x').project('x').by(
select('x').project('id', 'label','properties').by(id).by(label).by(
project('name').by(properties())
)
)
==>[x:[id:1,label:person,properties:[name:vp[name->marko]]]]
gremlin>
but then you end up coding your entire data model into your query.
In full TinkerPop you could turn your result into a subgraph() and write it to GraphSON with the io() step. In Cosmos DB you can instead add the returned vertices to a TinkerGraph instance client-side and then use the io() step to serialize that TinkerGraph to GraphSON.
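If you do return whole vertices and reshape them client-side instead, the GraphSON-like maps shown in the expected output (each property a list of {id, value} entries) can be reduced with a few lines of code. The Python sketch below is an assumption about how the results might be consumed, not something the answer prescribes; vertex_values is a hypothetical helper, and the sample is the "s" vertex from the question. It produces the same id/label/properties shape as the nested project() above.

```python
def vertex_values(vertex):
    """Reshape a GraphSON-style vertex map into {"id", "label", "properties"}.

    Each entry under "properties" is a list of {"id": ..., "value": ...} maps;
    only the values are kept, so multi-valued properties stay lists.
    """
    return {
        "id": vertex["id"],
        "label": vertex["label"],
        "properties": {
            key: [entry["value"] for entry in entries]
            for key, entries in vertex.get("properties", {}).items()
        },
    }


# The "s" vertex from the expected output in the question.
severity = {
    "id": "345fbdad-9c67-47bb-9f3b-cf50c8cdbee4",
    "label": "severity",
    "type": "vertex",
    "properties": {
        "severity": [{"id": "a6a9e38f-0802-48b6-ac37-490f45e824e9", "value": "High"}],
        "pk": [{"id": "345fbdad-9c67-47bb-9f3b-cf50c8cdbee4|pk", "value": "pk"}],
    },
}

# A result row keyed by step label ("s", "c", "aid", ...); only "s" is shown here.
row = {"s": severity}
shaped = {label: vertex_values(v) for label, v in row.items()}
print(shaped)
```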

Need help parsing json output with jq for a complex json

For the JSON below, I need to output results[].id and results[].name using jq, but only for the entries where
authorization.roles[].name == "Supervisor".
What is the jq command to do that? For the JSON below we expect only id 1231 and name AAAA as output, since that is the only entry with the Supervisor role.
{
"results": [{
"id": "1231",
"name": "AAAA",
"div": {
"id": "AAA",
"name": "DDSAA",
"selfUri": ""
},
"chat": {
"jabberId": "nn"
},
"department": "Shared Services Organization",
"email": "Test#gmail.com",
"primaryContactInfo": [{
"address": "Test#gmail.com",
"mediaType": "EMAIL",
"type": "PRIMARY"
}],
"addresses": [],
"state": "active",
"title": "AAA",
"username": "Test#gmail.com",
"version": 27,
"authorization": {
"roles": [{
"id": "01256689-c5ed-43a5-b370-58522402830d",
"name": "AA"
}, {
"id": "1e65b009-9f8f-4eef-9844-83944002c095",
"name": "BBB"
}, {
"id": "8a19f1ff-40e5-45d2-b758-14550a173323",
"name": "CCC"
}, {
"id": "d02250e2-7071-46bf-885b-43edff2d88a6",
"name": "Supervisor"
}]
}
}, {
"id": "1255",
"name": "BBBB",
"div": {
"id": "AAA",
"name": "DDSAA",
"selfUri": ""
},
"chat": {
"jabberId": "nn"
},
"department": "Shared Services Organization",
"email": "Test#gmail.com",
"primaryContactInfo": [{
"address": "Test#gmail.com",
"mediaType": "EMAIL",
"type": "PRIMARY"
}],
"addresses": [],
"state": "active",
"title": "AAA",
"username": "Test#gmail.com",
"version": 27,
"authorization": {
"roles": [{
"id": "01256689-c5ed-43a5-b370-58522402830d",
"name": "AA"
}, {
"id": "1e65b009-9f8f-4eef-9844-83944002c095",
"name": "BBB"
}, {
"id": "8a19f1ff-40e5-45d2-b758-14550a173323",
"name": "CCC"
}, {
"id": "d02250e2-7071-46bf-885b-43edff2d88a6",
"name": "Tester"
}]
}
}]
}
Don't put commas before closing brackets or curly braces (it's not valid JSON). Your input should look like this:
{
"results": [
{
"id": "1231",
"name": "AAAA",
"div": {
"id": "AAA",
"name": "DDSAA",
"selfUri": ""
},
"chat": {
"jabberId": "nn"
},
"department": "Shared Services Organization",
"email": "Test#gmail.com",
"primaryContactInfo": [
{
"address": "Test#gmail.com",
"mediaType": "EMAIL",
"type": "PRIMARY"
}
],
"addresses": [],
"state": "active",
"title": "AAA",
"username": "Test#gmail.com",
"version": 27,
"authorization": {
"roles": [
{
"id": "01256689-c5ed-43a5-b370-58522402830d",
"name": "AA"
},
{
"id": "1e65b009-9f8f-4eef-9844-83944002c095",
"name": "BBB"
},
{
"id": "8a19f1ff-40e5-45d2-b758-14550a173323",
"name": "CCC"
},
{
"id": "d02250e2-7071-46bf-885b-43edff2d88a6",
"name": "Supervisor"
}
]
}
},
{
"id": "1255",
"name": "BBBB",
"div": {
"id": "AAA",
"name": "DDSAA",
"selfUri": ""
},
"chat": {
"jabberId": "nn"
},
"department": "Shared Services Organization",
"email": "Test#gmail.com",
"primaryContactInfo": [
{
"address": "Test#gmail.com",
"mediaType": "EMAIL",
"type": "PRIMARY"
}
],
"addresses": [],
"state": "active",
"title": "AAA",
"username": "Test#gmail.com",
"version": 27,
"authorization": {
"roles": [
{
"id": "01256689-c5ed-43a5-b370-58522402830d",
"name": "AA"
},
{
"id": "1e65b009-9f8f-4eef-9844-83944002c095",
"name": "BBB"
},
{
"id": "8a19f1ff-40e5-45d2-b758-14550a173323",
"name": "CCC"
},
{
"id": "d02250e2-7071-46bf-885b-43edff2d88a6",
"name": "Tester"
}
]
}
}
]
}
Then you can use select to narrow down your target objects (here any checks whether at least one of the role names matches your string; thanks @ikegami), and output whichever parts of the resulting object(s) you need:
jq '
.results[]
| select(any(.authorization.roles[]; .name == "Supervisor"))
| {id, name}
'
{
"id": "1231",
"name": "AAAA"
}
If instead of a JSON output you need raw text, use the -r (or --raw-output) flag, and provide the fields you are interested in:
jq -r '
.results[]
| select(any(.authorization.roles[]; .name == "Supervisor"))
| .id, .name
'
1231
AAAA
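For comparison, a minimal Python sketch of the same filter (supervisors is a hypothetical helper, and the document is trimmed to the relevant fields). Using any() matters in jq as well: select(.authorization.roles[].name == "Supervisor") would emit a result once for every matching role, whereas any() emits it at most once.

```python
import json


def supervisors(doc, role="Supervisor"):
    """Return {id, name} for every result that has at least one role named role.

    Mirrors: .results[] | select(any(.authorization.roles[]; .name == "Supervisor")) | {id, name}
    """
    return [
        {"id": r["id"], "name": r["name"]}
        for r in doc["results"]
        if any(entry["name"] == role for entry in r["authorization"]["roles"])
    ]


doc = json.loads("""
{"results": [
  {"id": "1231", "name": "AAAA",
   "authorization": {"roles": [{"name": "AA"}, {"name": "Supervisor"}]}},
  {"id": "1255", "name": "BBBB",
   "authorization": {"roles": [{"name": "AA"}, {"name": "Tester"}]}}
]}
""")
print(supervisors(doc))  # [{'id': '1231', 'name': 'AAAA'}]
```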

Get the value after group by in gremlin?

g.V('JobDefinition1').out("JobDefinitionToJobHistory").has("Timestamp", between("2022-02-01T00:00:00Z", "2022-02-03T00:00:00Z")).group().by("ttl").by(limit(1))
I ran the Gremlin query above and got the result below.
[
{
"776": [
{
"id": "JobHistory-2-1-2022 12:19:15 AM",
"label": "JobHistory",
"type": "vertex",
"properties": {
"Timestamp": [
{
"id": "6d187ccf-160d-4d87-a360-48526b7a1461",
"value": "2022-02-01T00:00:00Z"
}
],
"ttl": [
{
"id": "JobHistory-2-1-2022 12:19:15 AM|ttl",
"value": "776"
}
]
}
}
],
"888": [
{
"id": "JobHistory-2-1-2022 12:19:15 AM",
"label": "JobHistory",
"type": "vertex",
"properties": {
"Timestamp": [
{
"id": "6d187ccf-160d-4d87-a360-48526b7a1461",
"value": "2022-02-01T00:00:00Z"
}
],
"ttl": [
{
"id": "JobHistory-2-1-2022 12:19:15 AM|ttl",
"value": "888"
}
]
}
}
]
}
]
But I only want the values of the group() result, without the keys. The expected result is shown below: as you can see, it doesn't contain key information such as "776" and "888". Is there a Gremlin step that can help me achieve this? Thanks!
[
{
"id": "JobHistory-2-1-2022 12:19:15 AM",
"label": "JobHistory",
"type": "vertex",
"properties": {
"Timestamp": [
{
"id": "6d187ccf-160d-4d87-a360-48526b7a1461",
"value": "2022-02-01T00:00:00Z"
}
],
"ttl": [
{
"id": "JobHistory-2-1-2022 12:19:15 AM|ttl",
"value": "776"
}
]
}
},
{
"id": "JobHistory-2-1-2022 12:19:15 AM",
"label": "JobHistory",
"type": "vertex",
"properties": {
"Timestamp": [
{
"id": "6d187ccf-160d-4d87-a360-48526b7a1461",
"value": "2022-02-01T00:00:00Z"
}
],
"ttl": [
{
"id": "JobHistory-2-1-2022 12:19:15 AM|ttl",
"value": "888"
}
]
}
}
]
You can get values from a Map with select(values):
gremlin> g.V().groupCount().by(label)
==>[software:2,person:4]
gremlin> g.V().groupCount().by(label).select(values)
==>[2,4]
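In the question's output each group value is itself a one-element array, so select(values) on its own would still give a list of arrays. If post-processing on the client is acceptable, a minimal Python sketch that drops the keys and flattens those arrays could look like this (group_values is a hypothetical helper; the data is trimmed from the question's result):

```python
def group_values(grouped):
    """Flatten a group() result map into a plain list of the grouped items.

    grouped maps each key (e.g. a ttl such as "776") to a list of items, as in
    the output of group().by("ttl").by(limit(1)); the keys are dropped.
    """
    return [item for items in grouped.values() for item in items]


# Trimmed version of the query result shown in the question.
result = [
    {
        "776": [{"id": "JobHistory-2-1-2022 12:19:15 AM", "label": "JobHistory",
                 "properties": {"ttl": [{"id": "JobHistory-2-1-2022 12:19:15 AM|ttl", "value": "776"}]}}],
        "888": [{"id": "JobHistory-2-1-2022 12:19:15 AM", "label": "JobHistory",
                 "properties": {"ttl": [{"id": "JobHistory-2-1-2022 12:19:15 AM|ttl", "value": "888"}]}}],
    }
]

vertices = group_values(result[0])
print(len(vertices))  # 2
```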

How to convert JSON data to tidy format in R

I have never worked with JSON data in R, and unfortunately I was sent a sample of data like this:
{
"task_id": "104",
"status": "succeeded",
"metrics": {
"requests_made": 2,
"network_errors": 0,
"unique_locations_visited": 0,
"requests_queued": 0,
"queue_items_completed": 2,
"queue_items_waiting": 0,
"issue_events": 9,
"caption": "",
"progress": 100
},
"message": "",
"issue_events": [
{
"id": "1234",
"type": "issue_found",
"issue": {
"name": "policy not enforced",
"type_index": 123456789,
"serial_number": "123456789183923712",
"origin": "https://test.com",
"path": "/robots.txt",
"severity": "low",
"confidence": "certain",
"caption": "/robots.txt",
"evidence": [
{
"type": "FirstOrderEvidence",
"detail": {
"band_flags": [
"in_band"
]
},
"request_response": {
"url": "https://test.com/robots.txt",
"request": [
{
"type": "DataSegment",
"data": "jaghsdjgasdgaskjdgasdgashdgsahdgasjkdgh==",
"length": 313
}
],
"response": [
{
"type": "DataSegment",
"data": "asudasjdgasaaasgdasgaksjdhgasjdgkjghKGKGgKJgKJgKJGKgh==",
"length": 303
}
],
"was_redirect_followed": false,
"request_time": "1234567890"
}
}
],
"internal_data": "jdfhgjhJHkjhdskfhkjhjs0sajkdfhKHKhkj=="
}
},
{
"id": "1235",
"type": "issue_found",
"issue": {
"name": "certificate",
"type_index": 12345845684,
"serial_number": "123456789165637150",
"origin": "https://test.com",
"path": "/",
"severity": "info",
"confidence": "certain",
"description": "The server description a valid, trusted certificate. This issue is purely informational.<br><br>The server presented the following certificates:<br><br><h4>Server certificate</h4><table><tr><td><b>Issued to:</b> </td><td>test.ie, test.com, www.test.com, www.test.ie</td></tr><tr><td><b>Issued by:</b> </td><td>GeoTrust EV RSA CA 2018</td></tr><tr><td><b>Valid from:</b> </td><td>Tue May 12 00:00:00 UTC 2020</td></tr><tr><td><b>Valid to:</b> </td><td>Tue May 17 12:00:00 UTC 2022</td></tr></table><h4>Certificate chain #1</h4><table><tr><td><b>Issued to:</b> </td><td>GeoTrust EV RSA CA 2018</td></tr><tr><td><b>Issued by:</b> </td><td> High Assurance EV Root CA</td></tr><tr><td><b>Valid from:</b> </td><td>Mon Nov 06 12:22:46 UTC 2017</td></tr><tr><td><b>Valid to:</b> </td><td>Sat Nov 06 12:22:46 UTC 2027</td></tr></table><h4>Certificate chain #2</h4><table><tr><td><b>Issued to:</b> </td><td> High Assurance EV Root CA</td></tr><tr><td><b>Issued by:</b> </td><td> High Assurance EV Root CA</td></tr><tr><td><b>Valid from:</b> </td><td>Fri Nov 10 00:00:00 UTC 2006</td></tr><tr><td><b>Valid to:</b> </td><td>Mon Nov 10 00:00:00 UTC 2031</td></tr></table>",
"caption": "/",
"evidence": [],
"internal_data": "sjhdgsajdggJGJHgjfgjhGJHgjhsdgfgjhGJHGjhsdgfjhsgfdsjfg098867hjhgJHGJHG=="
}
},
{
"id": "1236",
"type": "issue_found",
"issue": {
"name": "without flag set",
"type_index": 1254392,
"serial_number": "12345678965616",
"origin": "https://test.com",
"path": "/robots.txt",
"severity": "info",
"confidence": "certain",
"description": "my description text here....",
"caption": "/robots.txt",
"evidence": [
{
"type": "InformationListEvidence",
"request_response": {
"url": "https://test.com/robots.txt",
"request": [
{
"type": "DataSegment",
"data": "adjkhajksdhaskjdhkjHKJHjkhaskjdhkjasdhKHKJHkjsdhfkjsdhfkjsdhKHJKHjksdfhsdjkfhksdjhKHKJHJKhsdkfjhsdkfjhKHJKHjksdkfjhsdkjfhKHKJHjkhsdkfjhsdkjfhsdjkfhksdjfhKJHKjksdhfsdjkfhksdjfhsdkjhKHJKhsdkfhsdkjfhsdkfhdskjhKHKjhsdfkjhsdjkfh==",
"length": 313
}
],
"response": [
{
"type": "DataSegment",
"data": "adjkhajksdhaskjdhkjHKJHjkhaskjdhkjasdhKHKJHkjsdhfkjsdhfkjsdhKHJKHjksdfhsdjkfhksdjhKHKJHJKhsdkfjhsdkfjhKHJKHjksdkfjhsdkjfhKHKJHjkhsdkfjhsdkjfhsdjkfhksdjfhKJHKjksdhfsdjkfhksdjfhsdkjhKHJKhsdkfhsdkjfhsdkfhdskjhKHKjhsdfkjhsdjkfh=",
"length": 161
},
{
"type": "HighlightSegment",
"data": "adjkhajksdhaskjdhkjHKJHjkhaskjdhkjasdhKHKJHkjsdhfkjsdhfkjsdhKHJKHjksdfhsdjkfhksdjhKHKJHJKhsdkfjhsdkfjhKHJKHjksdkfjhsdkjfhKHKJHjkhsdkfjhsdkjfhsdjkfhksdjfhKJHKjksdhfsdjkfhksdjfhsdkjhKHJKhsdkfhsdkjfhsdkfhdskjhKHKjhsdf=",
"length": 119
},
{
"type": "DataSegment",
"data": "AasjkdhasjkhkjHKJSDHFJKSDFHKhjkHSKADJFHKhjkhjkh=",
"length": 23
}
],
"was_redirect_followed": false,
"request_time": "178454751191465"
},
"information_items": [
"Other: user_id"
]
}
],
"internal_data": "adjkhajksdhaskjdhkjHKJHjkhaskjdhkjasdhKHKJHkjsdhfkjsdhfkjsdhKHJKHjksdfhsdjkfhksdjhKHKJHJKhsdkfjhsdkfjhKHJKHjksdkfjhsdkjfhKHKJHjkhsdkfjhsdkjfhsdjkfhksdjfhKJHKjksdhfsdjkfhksdjfhsdkjhKHJKhsdkfhsdkjfhsdkfhdskjhKH=="
}
},
{
"id": "1237",
"type": "issue_found",
"issue": {
"name": "without flag set",
"type_index": 1234567,
"serial_number": "123456789056704",
"origin": "https://test.com",
"path": "/",
"severity": "info",
"confidence": "certain",
"description": "long description here zjkhasdjkh hsajkdhsajkd hasjkdhbsjkdash d",
"caption": "/",
"evidence": [
{
"type": "InformationListEvidence",
"request_response": {
"url": "https://test.com/",
"request": [
{
"type": "DataSegment",
"data": "adjkhajksdhaskjdhkjHKJHjkhaskjdhkjasdhKHKJHkjsdhfkjsdhfkjsdhKHJKHjksdfhsdjkfhksdjhKHKJHJKhsdkfjhsdkfjhKHJKHjksdkfjhsdkjfhKHKJHjkhsdkfjhsdkjfhsdjkfhksdjfhKJHKjksdhfsdjkfhksdjfhsdkjhKHJKhsdkfhsdkjfhsdkfhdskjhKHKjhsdfkjhsdjkfhsfdsfdsfdsfdsfdsfsdfdsf",
"length": 303
}
],
"response": [
{
"type": "DataSegment",
"data": "adjkhajksdhaskjdhkjHKJHjkhaskjdhkjasdhKHKJHkjsdhfkjsdhfkjsdhKHJKHjksdfhsdjkfhksdjhKHKJHJKhsdkfjhsdkfjhKHJKHjksdkfjhsdkjfhKHKJHjkhsdkfjhsdkjfhsdjkfhksdjfhKJHKjksdhfsdjkfhksdjfhsdkjhKHJKhsdkfhsdkjfhsdkfhdskjhKHKjhsdfkjhsdjkfh==",
"length": 151
},
{
"type": "HighlightSegment",
"data": "adjkhajksdhaskjdhkjHKJHjkhaskjdhkjasdhKHKJHkjsdhfkjsdhfkjsdhKHJKHjksdfhsdjkfhksdjhKHKJHJKhsdkfjhsdkfjhKHJKHjksdkfjhsdkjfhKHKJHjkhsdkfjhsdkjfhsdjkfhksdjfhKJHKjksdhfsdjkfhksdjfhsdkjhKHJKhsdkfhsdkjfhsdkfhdskjhKHKjhsdfkjhsdjkfh=",
"length": 119
},
{
"type": "DataSegment",
"data": "sdfdsfsdfSDFSDFdSFDS546SDFSDFDSFG657=",
"length": 23
}
],
"was_redirect_followed": false,
"request_time": "123541191466"
},
"information_items": [
"Other: user_id"
]
}
],
"internal_data": "adjkhajksdhaskjdhkjHKJHjkhaskjdhkjasdhKHKJHkjsdhfkjsdhfkjsdhKHJKHjksdfhsdjkfhksdjhKHKJHJKhsdkfjhsdkfjhKHJKHjksdkfjhsdkjfhKHKJHjkhsdkfjhsdkjfhsdjkfhksdjfhKJHKjksdhfsdjkfhksdjfhsd=="
}
},
{
"id": "1238",
"type": "issue_found",
"issue": {
"name": "parameter pollution",
"type_index": 4137000,
"serial_number": "123456789810290176",
"origin": "https://test.com",
"path": "/robots.txt",
"severity": "low",
"confidence": "firm",
"description": "very long description text here...",
"caption": "/robots.txt [URL path filename]",
"evidence": [
{
"type": "FirstOrderEvidence",
"detail": {
"payload": {
"bytes": "Q3jkeiZkcmg8MQ==",
"flags": 0
},
"band_flags": [
"in_band"
]
},
"request_response": {
"url": "https://test.com/%3fhdz%26drh%3d1",
"request": [
{
"type": "DataSegment",
"data": "W1QOIC8=",
"length": 5
},
{
"type": "HighlightSegment",
"data": "WRMnBGR6JTI2ZHJoJTNkMQ==",
"length": 16
},
{
"type": "DataSegment",
"data": "adjkhajksdhaskjdhkjHKJHjkhaskjdhkjasdhKHKJHkjsdhfkjsdhfkjsdhKHJKHjksdfhsdjkfhksdjhKHKJHJKhsdkfjhsdkfjhKHJKHjksdkfjhsdkjfhKHKJHjkhsdkfjhsdkjfhsdjkfhksdjfhKJHKjksdhfsdjkfhksdjfhsdkjhKHJKhsdkfhsdkjfhsdkfhdskjhKHKjhsdfkjhsdjkfhcvxxcvklxcvjkxclvjxclkvjxcklvjlxckjvlxckjvklxcjvxcklvjxcklvjxckljvlxckjvxcklvjxckljvxcklvjcklxjvcxkl==",
"length": 298
}
],
"response": [
{
"type": "DataSegment",
"data": "adjkhajksdhaskjdhkjHKJHjkhaskjdhkjasdhKHKJHkjsdhfkjsdhfkjsdhKHJKHjksdfhsdjkfhksdjhKHKJHJKhsdkfjhsdkfjhKHJKHjksdkfjhsdkjfhKHKJHjkhsdkfjhsdkjfhsdjkfhksdjfhKJHKjksdhfsdjkfhksdjfhsdkjhKHJKhsdkfhsdkjfhsdkfhdskjhKHKjhsdfkjhsdjkfh==",
"length": 130
},
{
"type": "HighlightSegment",
"data": "Q4jleiZkcmg9MQ==",
"length": 10
},
{
"type": "DataSegment",
"data": "adjkhajksdhaskjdhkjHKJHjkhaskjdhkjasdhKHKJHkjsdhfkjsdhfkjsdhKHJKHjksdfhsdjkfhksdjhKHKJHJKhsdkfjhsdkfjhKHJKHjksdkfjhsdkjfhKHKJHjkhsdkfjhsdkjfhsdjkfhksdjfhKJHKjksdhfsdjkfhksdjfhsdkjhKHJKhsdkfhsdkjfhsdkfhdskjhKHKjhsdfkjhsdjkfh==",
"length": 163
}
],
"was_redirect_followed": false,
"request_time": "51"
}
}
],
"internal_data": "adjkhajksdhaskjdhkjHKJHjkhaskjdhkjasdhKHKJHkjsdhfkjsdhfkjsdhKHJKHjksdfhsdjkfhksdjhKHKJHJKhsdkfjhsdkfjhKHJKHjksdkfjhsdkjfhKHKJHjkhsdkfjhsdkjfhsdjkfhksdjfhKJHKjksdhfsdjkfhksdjfhsdkjhKHJKhsdkfhsdkjfhsdkfhdskjhKHKjhsdfkjhsdjkfh="
}
}
],
"event_logs": [],
"audit_items": []
}
I read it in R using jsonlite:
library(jsonlite)
df_orig <- fromJSON('dast_sample_output.json', flatten = TRUE)
This gives a nested list. I want to convert it to a data frame in a tidy format, with all the arrays and sub-arrays unnested.
If you run str(df_orig), you can see the nested data frames inside it.
How do I convert it to a tidy format?
I tried unnest() and purrr, but I'm struggling to get it into a tidy format for analysis. Any pointers would be highly appreciated.
Cheers,
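Use the fromJSON() function from the jsonlite package, and set the option flatten = TRUE.
If the JSON comes from an API response, use content(x, 'text') to get the body as text before parsing and flattening it.
Here is a full example that converts the result to a data.table: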
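library(httr)        # GET(), content()
library(jsonlite)    # fromJSON()
library(data.table)  # as.data.table()
get.json <- GET( apicall.text )  # apicall.text: the URL of your API call
get.json.text <- content( get.json , 'text')  # response body as a character string
get.json.flat <- fromJSON( get.json.text , flatten = TRUE)
dt <- as.data.table( get.json.flat )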
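Note that flatten = TRUE only merges nested objects into columns; arrays of objects such as issue_events and evidence stay nested and still have to be unnested into rows. To illustrate those two steps (flatten objects into dotted column names, then emit one row per array element), here is a Python sketch. flatten and tidy_issue_rows are hypothetical helpers, the sample is trimmed from the question's JSON, and treating one evidence item per row as "tidy" is an assumption about the intended analysis:

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into one level, joining keys with "."; lists are left as values."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def tidy_issue_rows(report):
    """Return one row per (issue event, evidence item), repeating top-level fields."""
    # Top-level scalars and nested objects (e.g. metrics); top-level arrays are skipped.
    top = flatten({k: v for k, v in report.items() if not isinstance(v, list)})
    rows = []
    for event in report.get("issue_events", []):
        event_flat = flatten(event)
        evidence = event_flat.pop("issue.evidence", [])
        # An event without evidence still yields one row instead of disappearing.
        # Deeper arrays (e.g. request/response segments) remain list values.
        for item in evidence or [{}]:
            row = {**top, **event_flat}
            row.update(flatten(item, "evidence"))
            rows.append(row)
    return rows


report = {
    "task_id": "104",
    "status": "succeeded",
    "metrics": {"requests_made": 2},
    "issue_events": [
        {"id": "1234", "type": "issue_found",
         "issue": {"name": "policy not enforced", "severity": "low",
                   "evidence": [{"type": "FirstOrderEvidence"}]}},
        {"id": "1235", "type": "issue_found",
         "issue": {"name": "certificate", "severity": "info", "evidence": []}},
    ],
    "event_logs": [],
}

rows = tidy_issue_rows(report)
for row in rows:
    print(row)
```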
