query <- '{
"query": [
{
"code": "Region",
"selection": {
"filter": "item",
"values": [
"3010",
"4020"
]
}
},
{
"code": "Sex",
"selection": {
"filter": "item",
"values": [
"1",
"2"
]
}
}
],
"response": {
"format": "json-stat2"
}
}'
Now, how can I write the query string to a file and retrieve it? So far I've tried writeLines, save, etc., but I get back multiple strings rather than one long string.
The issue is that you have "\n" in your string, which forces new lines in your saved file; when you read it back you will get multiple strings. One option is to read it and then collapse it:
paste0(readLines("abc.txt"), collapse = "")
Another is to do away with the nuisances (the newlines and extra spaces) when writing the file; then you will get only one string:
writeLines(gsub('"', "'", gsub("\n| ", "", query)), "abc.txt")
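If keeping the JSON on one line is the goal, a third option (a sketch assuming the jsonlite package is available) is to minify the JSON before writing it; unlike the gsub() approach above, this leaves the double quotes intact, so the saved file is still valid JSON:
library(jsonlite)
# minify() strips insignificant whitespace (including the embedded newlines),
# so the saved file holds one long string and reads back as a single element.
writeLines(minify(query), "abc.txt")
readLines("abc.txt")  # returns one string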
I have very large JSON data like below:
{
"10.10.10.1": {
"asset_id": 1,
"referencekey": "ASSET-00001",
"hostname": "testDev01",
"fqdn": "ip-10-10.10.1.ap-northeast-2.compute.internal",
"network_zone": [
"DEV",
"Dev"
],
"service": {
"name": "TEST_SVC",
"account": "AWS_TEST",
"billing": "Testpay"
},
"aws": {
"tags": {
"Name": "testDev01",
"Service": "TEST_SVC",
"Usecase": "Dev",
"billing": "Testpay",
"OsVersion": "20.04"
},
"instance_type": "t3.micro",
"ami_imageid": "ami-e000001",
"state": "running"
}
},
"10.10.10.2": {
"asset_id": 3,
"referencekey": "ASSET-47728",
"hostname": "Infra_Live01",
"fqdn": "ip-10-10-10-2.ap-northeast-2.compute.internal",
"network_zone": [
"PROD",
"Live"
],
"service": {
"name": "Infra",
"account": "AWS_TEST",
"billing": "infra"
},
"aws": {
"tags": {
"Name": "Infra_Live01",
"Service": "Infra",
"Usecase": "Live",
"billing": "infra",
"OsVersion": "16.04"
},
"instance_type": "r5.large",
"ami_imageid": "ami-e592398b",
"state": "running"
}
}
}
Can I use jq to make the conversion like below?
Or is there an easier way to solve it?
Thank you
Expected result
_key,asset_id,referencekey,hostname,fqdn,network_zone/0,network_zone/1,service/name,service/account,service/billing,aws/tags/Name,aws/tags/Service,aws/tags/Usecase,aws/tags/billing,aws/tags/OsVersion,aws/instance_type,aws/ami_imageid,aws/state
10.10.10.1,1,ASSET-00001,testDev01,ip-10-10.10.1.ap-northeast-2.compute.internal,DEV,Dev,TEST_SVC,AWS_TEST,Testpay,testDev01,TEST_SVC,Dev,Testpay,20.04,t3.micro,ami-e000001,running
10.10.10.2,3,ASSET-47728,Infra_Live01,ip-10-10-10-2.ap-northeast-2.compute.internal,PROD,Live,Infra,AWS_TEST,infra,Infra_Live01,Infra,Live,infra,16.04,r5.large,ami-e592398b,running
jq lets you do the conversion to CSV easily. The following code produces the desired output:
jq -r 'to_entries
| map([.key,
.value.asset_id, .value.referencekey, .value.hostname, .value.fqdn,
.value.network_zone[0], .value.network_zone[1],
.value.service.name, .value.service.account, .value.service.billing,
.value.aws.tags.Name, .value.aws.tags.Service, .value.aws.tags.Usecase, .value.aws.tags.billing, .value.aws.tags.OsVersion,
.value.aws.instance_type, .value.aws.ami_imageid, .value.aws.state])
| ["_key","asset_id","referencekey","hostname","fqdn","network_zone/0","network_zone/1","service/name","service/account","service/billing","aws/tags/Name","aws/tags/Service","aws/tags/Usecase","aws/tags/billing","aws/tags/OsVersion","aws/instance_type","aws/ami_imageid","aws/state"]
, .[]
| @csv' "$INPUT"
Remarks
If some nodes in the input JSON are missing, the code does not break but fills in empty values in the CSV file.
If more than two network zones are given, only the first two are covered in the CSV file.
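If dropping extra zones is a concern, a variant sketch (note: this changes the column layout, so it is not the asker's expected format) collapses all zones into one delimited field instead:
jq -r 'to_entries
| map([.key, .value.asset_id, (.value.network_zone | join(";"))])
| ["_key","asset_id","network_zone"]
, .[]
| @csv' "$INPUT"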
I have one Parquet file and am trying to get the data into a table. One column contains JSON with multiple values. Can someone help me with how to do this in Kusto?
Here is the JSON's schema:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"path": {
"type": "string"
},
"partitionValues": {
"type": "object",
"properties": {
"deviceId": {
"type": "string"
},
"date": {
"type": "string"
}
},
"required": [
"deviceId",
"date"
]
},
"size": {
"type": "integer"
},
"modificationTime": {
"type": "integer"
},
"dataChange": {
"type": "boolean"
},
"stats": {
"type": "string"
}
},
"required": [
"path",
"partitionValues",
"size",
"modificationTime",
"dataChange",
"stats"
]
}
If I understood correctly, your Parquet file contains a column with JSON of the specified schema. If you want to ingest it as-is, ingest it into a Kusto column of type dynamic and query it later. If you'd like to ingest just part of this JSON data (such as some inner fields), use an ingestion mapping and provide the appropriate JSON path.
Another option is to ingest as-is into a source table that has a retention policy with a SoftDeletePeriod of zero, and to define an update policy with a KQL query; the transformed data will then be pushed into a target table.
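A minimal sketch of the update-policy option, with hypothetical table and function names (SourceAssets, TargetAssets, ExpandAssets) based on the schema above:
// Landing table: each JSON record lands in a single dynamic column.
.create table SourceAssets (Record: dynamic)

// Transformation to run on every ingestion; TargetAssets must already
// exist with a matching schema (its creation is omitted here).
.create function ExpandAssets() {
    SourceAssets
    | project path = tostring(Record.path),
              deviceId = tostring(Record.partitionValues.deviceId),
              date = tostring(Record.partitionValues.date),
              size = tolong(Record.size),
              modificationTime = tolong(Record.modificationTime),
              dataChange = tobool(Record.dataChange),
              stats = tostring(Record.stats)
}

// Push the transformed rows into the target table at ingestion time.
.alter table TargetAssets policy update
@'[{"IsEnabled": true, "Source": "SourceAssets", "Query": "ExpandAssets()"}]'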
Assuming we have the following data structure:
"data": [
{
"type": "node--press",
"id": "f04eab99-9174-4d00-bbbe-cdf45056660e",
"attributes": {
"nid": 130,
"uuid": "f04eab99-9174-4d00-bbbe-cdf45056660e",
"title": "TITLE OF NODE",
"revision_translation_affected": true,
"path": {
"alias": "/press/title-of-node",
"pid": 428,
"langcode": "es"
}
...
}
The data returned is compliant with the JSON API standard, and I have no problem retrieving and processing it, except that I need to be able to filter the returned nodes by the path pid.
How can I filter my data by path.pid?
I have tried:
- node-press?filter[path][pid]=428
- node-press?filter[path][pid][value]=428
to no avail
It's not well defined in the filters section of the specification, but other parameters such as include describe accessing nested keys with dot notation, so you could try ?filter[path.pid]=428 and parse the filter that way. For example, given the following structure:
"field_country": {
"data": {
"type": "taxonomy_term--country",
"id": "818f11ab-dd9d-406b-b1ca-f79491eedd73"
}
}
The above structure can be filtered by ?filter[field_country.id]=818f11ab-dd9d-406b-b1ca-f79491eedd73.
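As a usage sketch (the host and the jsonapi path prefix are assumptions following Drupal's JSON API conventions):
curl 'https://example.com/jsonapi/node/press?filter[path.pid]=428'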
I am trying to extract a large amount of data from a JSON document. There are about 1500 nodes per JSON document. When I attempt to load the body node, I get the 128KB limit error. I have found a way to load the node, but I have to go all the way down to the array list: JsonExtractor("body.nprobe.items[*]"). The issue I am having is that I cannot access any other part of the JSON document, and I need to get the metadata, like Id, SerialNumber, etc. Should the JSON file be changed in some way? The data I need is three levels down. The JSON has been obfuscated and shortened; the real file is about 33K lines of formatted JSON, about 1500 items with n-20 fields in each.
{
"headers": {
"pin": "12345",
"Type": "URJ201W-GNZGK",
"RTicks": "3345",
"SD": "211",
"Jov": "juju",
"Market": "Dal",
"Drst": "derre",
"Model": "qw22",
"DNum": "de34",
"API": "34f",
"Id": "821402444150002501"
},
"Id": "db5aacae3778",
"ModelType": "URJ",
"body": {
"uHeader": {
"ID": "821402444150002501",
"SerialNo": "ee028861",
"ServerName": "BRXTTY123"
},
"header": {
"form": 4,
"app": 0,
"Flg": 1,
"Version": 11056,
"uploadID": 1,
"uDate": "2016-04-14T18:29"
},
"nprobe": {
"items": [{
"purchaseDate": "2016-04-14T18:21:09",
"storeLoc": {
"latitude": 135.052335,
"longitude": 77.167005
},
"sr": {
"ticks": 3822,
"SkuId": "24",
"Data": {
"X": "0.00068",
"Y": "0.07246",
}
}
},
{
"purchaseDate": "2016-04-14T18:21:09",
"storeLoc": {
"latitude": 135.052335,
"longitude": 77.167005
},
"sr": {
"ticks": 3823,
"SkuId": "25",
"Data": {
"X": "0",
"Y": "2",
}
}
}]
}
},
"Messages": []
}
Thanks.
You'll have to use CROSS APPLY:
https://msdn.microsoft.com/en-us/library/azure/mt621307.aspx
and EXPLODE:
https://msdn.microsoft.com/en-us/library/azure/mt621306.aspx
See a worked out solution here:
https://github.com/algattik/USQLHackathon/blob/master/VS-Solution/USQLApplication/ciam-to-sqldw.usql
https://github.com/algattik/USQLHackathon/blob/master/Samples/Customer/customers%202016-08-10.json
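For reference, the pattern from those links looks roughly like this; a sketch only, where @rows and its items array column are hypothetical:
// Emit one output row per element of the items array.
@exploded =
    SELECT Id,
           item
    FROM @rows
    CROSS APPLY EXPLODE(items) AS t(item);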
-- IMPROVED ANSWER: --
As this solution will not work for you since your inner JSON is too large to fit in a string, you can instead parse the input twice: once for the top-level metadata and once for the items, then combine the two result sets (the metadata extraction yields a single row, so the CROSS JOIN simply appends the Id to every item):
DECLARE @input string = @"/so.json";
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
@meta =
EXTRACT Id string
FROM @input
USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();
@items =
EXTRACT purchaseDate string
FROM @input
USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor("body.nprobe.items[*]");
@itemsFull =
SELECT Id,
purchaseDate
FROM @meta
CROSS JOIN @items;
OUTPUT @itemsFull
TO "/items_full.csv"
USING Outputters.Csv();
I'm using the following query:
g.V(741440).outE('Notification').order().by('PostedDateLong', decr).range(0,1).as('notificationInfo').match(
__.as('notificationInfo').inV().as('postInfo'),
).select('notificationInfo','postInfo')
It is giving the following result:
{
"requestId": "9846447c-4217-4103-ac2e-de3536a3c62a",
"status": {
"message": "",
"code": 200,
"attributes": { }
},
"result": {
"data": [
{
"notificationInfo": {
"id": "c0zs-fw3k-347p-g2g0",
"label": "Notification",
"type": "edge",
"inVLabel": "Comment",
"outVLabel": "User",
"inV": 749664,
"outV": 741440,
"properties": {
"ParentPostId": "823488",
"PostedDate": "2016-05-26T02:35:52.3889982Z",
"PostedDateLong": 635998269523889982,
"Type": "CommentedOnPostNotification",
"NotificationInitiatedByVertexId": "1540312"
}
},
"postInfo": {
"id": 749664,
"label": "Comment",
"type": "vertex",
"properties": {
"PostImage": [
{
"id": "amto-g2g0-2wat",
"value": ""
}
],
"PostedByUser": [
{
"id": "am18-g2g0-2txh",
"value": "orbitpage#gmail.com"
}
],
"PostedTime": [
{
"id": "amfg-g2g0-2upx",
"value": "2016-05-26T02:35:39.1489483Z"
}
],
"PostMessage": [
{
"id": "aln0-g2g0-2t51",
"value": "hi"
}
]
}
}
}
],
"meta": { }
}
}
I want to get the information of the vertex referenced by the edge property "NotificationInitiatedByVertexId" in the response as well.
For that I tried the following query:
g.V(741440).outE('Notification').order().by('PostedDateLong', decr).range(0,2).as('notificationInfo').match(
__.as('notificationInfo').inV().as('postInfo'),
g.V(1540312).next().as('notificationByUser')
).select('notificationInfo','postInfo','notificationByUser')
Note: I tried directly with the vertex id in the subquery, as I wasn't aware of how to dynamically get the value from the edge property within the query itself.
It is giving an error. I have tried a lot but am not able to find any solution.
I'm assuming that you are storing a Titan-generated identifier in that edge property called NotificationInitiatedByVertexId. If so, please consider the following, even though this first part doesn't really answer your question. I don't think you should store a vertex identifier on the edge. Your graph model should explicitly track the NotificationInitiatedBy relationship with an edge, and by storing the identifier of the vertex on the edge itself you are bypassing that. Also, if you ever have to migrate your data in some way, the ids won't be preserved (Titan will generate new ones), and trying to sort that out will be a mess.
Even if that is not a Titan-generated identifier but a logical one you created, I would still look to adjust your graph schema and promote that Notification to a vertex. Then your Gremlin traversals would flow more easily.
Now, assuming you don't change that, I don't see a reason not to just issue two queries in the same request and then combine the results into one data structure. You just need to do a lookup with the vertex id, which is going to be pretty fast and inexpensive:
edgeStuff = g.V(741440).outE('Notification').
order().by('PostedDateLong', decr).range(0,1).as('notificationInfo').
... // whatever logic you have
select('notificationInfo','postInfo').next()
vertexStuff = g.V(edgeStuff.get('notificationInfo').value('NotificationInitiatedByVertexId')).next()
[notificationInitiatedBy: vertexStuff, notification: edgeStuff]
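For illustration, if Notification were promoted to a vertex as suggested above, the whole lookup could stay in one traversal. This is only a sketch; the hasNotification, initiatedBy, and about edge labels describe a hypothetical remodeled schema:
g.V(741440).out('hasNotification').
  order().by('PostedDateLong', decr).limit(1).as('notification').
  out('initiatedBy').as('notificationByUser').
  select('notification').out('about').as('postInfo').
  select('notification', 'notificationByUser', 'postInfo')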