Multi-line JSON read using U-SQL custom extractor

I could not find a proper way to use the Newtonsoft JsonExtractor to parse an input file that contains newline-delimited JSON.
With the JsonExtractor I can read the first line successfully when exploding with "$.d.results[*]", but it does not move on to the next line.
As each row is larger than 4 MB, I can't extract it as text, so the parsing needs to be done with a custom extractor.
Sample Input:
{"d":{"results":[{"data":{"Field_1":"1","Field_2":"2"},"Field_3":"3","Field_4":"4"}]}}
{"d":{"results":[{"data":{"Field_1":"11","Field_2":"21"},"Field_3":"31","Field_4":"41"}]}}
{"d":{"results":[{"data":{"Field_1":"12","Field_2":"22"},"Field_3":"32","Field_4":"42"}]}}
Expected Output:
Field_1|Field_2|Field_3|Field_4
1 |2 |3 |4
11 |21 |31 |41
12 |22 |32 |42
USQL Code:
CREATE ASSEMBLY IF NOT EXISTS [Microsoft.Analytics.Samples.Formats] FROM @"Microsoft.Analytics.Samples.Formats.dll";
CREATE ASSEMBLY IF NOT EXISTS [Newtonsoft.Json] FROM @"Newtonsoft.Json.dll";
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
REFERENCE ASSEMBLY [Newtonsoft.Json];
USING Microsoft.Analytics.Samples.Formats.Json;
DECLARE @DATA_SOURCE string = "Input.data";
@SOURCE =
    EXTRACT Field_1 string,
            Field_2 string,
            Field_3 string,
            Field_4 string
    FROM @DATA_SOURCE
    USING new JsonExtractor("$.d.results[*]");

Considering your source input as below, i.e. more than one JSON document in a file.
{
"d": {
"results": [
{
"data": {
"Field_1": "1",
"Field_2": "2"
},
"Field_3": "3",
"Field_4": "4"
}
]
}
}
{
"d": {
"results": [
{
"data": {
"Field_1": "11",
"Field_2": "21"
},
"Field_3": "31",
"Field_4": "41"
}
]
}
}
{
"d": {
"results": [
{
"data": {
"Field_1": "12",
"Field_2": "22"
},
"Field_3": "32",
"Field_4": "42"
}
]
}
}
// if more than one JSON document in a file
The JSON assembly includes a MultiLevelJsonExtractor, which lets you extract data from multiple JSON paths at differing levels within a single pass. See the underlying code and inline documentation on GitHub.
Use it by supplying multiple levels of JSON paths. They are assigned to the schema by index.
The code snippet shows the MultiLevelJsonExtractor in action.
The first parameter (rowpath) specifies the base path to start from.
The second parameter (bypassWarning) expects a boolean value. True = if a path isn't found, don't error, return null. False = if a path isn't found, error.
The third parameter (jsonPaths) is a list of JSON paths starting at the base path; otherwise the extractor will recurse to the top of the tree to locate it.
In your case:
@json =
    EXTRACT
        Field_1 string,
        Field_2 string,
        Field_3 string,
        Field_4 string
    FROM
        @DATA_SOURCE
    USING new MultiLevelJsonExtractor("d.results[*]",
        true,
        "data.Field_1",
        "data.Field_2",
        "Field_3",
        "Field_4"
    );
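To make the intended row shape concrete, here is a minimal sketch in Python (not U-SQL) of the same mapping: one JSON document per line, a row path of d.results[*], and the four field paths relative to each result. The function name extract_rows and the SAMPLE constant are illustrative only; this is not how the extractor is implemented.

```python
import json

# Sample input from the question: one JSON document per line.
SAMPLE = """\
{"d":{"results":[{"data":{"Field_1":"1","Field_2":"2"},"Field_3":"3","Field_4":"4"}]}}
{"d":{"results":[{"data":{"Field_1":"11","Field_2":"21"},"Field_3":"31","Field_4":"41"}]}}
{"d":{"results":[{"data":{"Field_1":"12","Field_2":"22"},"Field_3":"32","Field_4":"42"}]}}
"""


def extract_rows(text):
    """Yield one (Field_1, Field_2, Field_3, Field_4) tuple per entry in d.results."""
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between documents
        document = json.loads(line)
        for result in document["d"]["results"]:  # row path: d.results[*]
            data = result.get("data", {})
            yield (
                data.get("Field_1"),    # data.Field_1
                data.get("Field_2"),    # data.Field_2
                result.get("Field_3"),  # Field_3
                result.get("Field_4"),  # Field_4
            )


rows = list(extract_rows(SAMPLE))
for row in rows:
    print("|".join(row))
```

Missing paths come back as None here, which loosely mirrors passing true for bypassWarning.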

Related

JSON path evaluation inside JSON path expression

I've got this very simple json :
{
"authors": [
{
"id": 1,
"name": "Douglas Adams"
},
{
"id": 2,
"name": "John Doe"
}
],
"books": [
{
"name": "The Hitchhiker's Guide to the Galaxy",
"author_id": 1
}
]
}
I would like to request the name of the author of "The Hitchhiker's Guide to the Galaxy".
I've tried this JSON path but it doesn't work:
$.authors[?(@.id == $.books[?(@.name == "The Hitchhiker's Guide to the Galaxy")].author_id)].name
All online tools I tried indicate a syntax error which seems due to the presence of a JSON path inside my filter.
Could anyone please help me figure out what's wrong and what is the right syntax?
Thanks!
When you run this filter
$.books[?(@.name == "The Hitchhiker's Guide to the Galaxy")].author_id
it returns an array instead of a single value:
[
1
]
A syntax error occurs when you pass that array to be compared with the value of id:
$.authors[?(@.id == {the array value})].name
However, depending on the language you are using, you may not be able to extract the single value with JSONPath alone. See "Getting a single value from a JSON object using JSONPath".
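If the lookup can be done in the host language instead, the two filters can be run one after the other. The following is a minimal sketch in plain Python without any JSONPath library; the function name author_names_for is illustrative only.

```python
library = {
    "authors": [
        {"id": 1, "name": "Douglas Adams"},
        {"id": 2, "name": "John Doe"},
    ],
    "books": [
        {"name": "The Hitchhiker's Guide to the Galaxy", "author_id": 1},
    ],
}


def author_names_for(title, doc):
    """Return the names of all authors of books with the given title."""
    # Step 1: same result as $.books[?(@.name == title)].author_id (a list of ids)
    author_ids = [book["author_id"] for book in doc["books"] if book["name"] == title]
    # Step 2: keep the authors whose id appears in that list
    return [author["name"] for author in doc["authors"] if author["id"] in author_ids]


names = author_names_for("The Hitchhiker's Guide to the Galaxy", library)
print(names)
```

Because step 1 yields a list, step 2 tests membership rather than equality, which is the comparison the nested filter could not express.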

How to return true if x data exists in JSON or CSV from API on Wordpress website

Is there an easy way to call an API from a WordPress website and return true or false, depending on whether some data is there?
Here is the API:
https://api.covalenthq.com/v1/137/address/0x3FEb1D627c96cD918f2E554A803210DA09084462/balances_v2/?&format=JSON&nft=true&no-nft-fetch=true&key=ckey_docs
here is a JSON:
{
"data": {
"address": "0x3feb1d627c96cd918f2e554a803210da09084462",
"updated_at": "2021-11-13T23:25:27.639021367Z",
"next_update_at": "2021-11-13T23:30:27.639021727Z",
"quote_currency": "USD",
"chain_id": 137,
"items": [
{
"contract_decimals": 0,
"contract_name": "PublicServiceKoalas",
"contract_ticker_symbol": "PSK",
"contract_address": "0xc5df71db9055e6e1d9a37a86411fd6189ca2dbbb",
"supports_erc": [
"erc20"
],
"logo_url": "https://logos.covalenthq.com/tokens/137/0xc5df71db9055e6e1d9a37a86411fd6189ca2dbbb.png",
"last_transferred_at": "2021-11-13T09:45:36Z",
"type": "nft",
"balance": "0",
"balance_24h": null,
"quote_rate": 0.0,
"quote_rate_24h": null,
"quote": 0.0,
"quote_24h": null,
"nft_data": null
}
],
"pagination": null
},
"error": false,
"error_message": null,
"error_code": null
}
I want to check whether "PSK" appears in contract_ticker_symbol; if it exists and "balance" is > 0, then return true.
Is there a painless way to do this? I'm not a programmer.
The Python requests library can handle this. You'll have to install it with pip first (package installer for Python).
I also used a website called JSON Parser Online to see what was going on with all of the data first so that I would be able to make sense of it in my code:
import requests


def main():
    url = ("https://api.covalenthq.com/v1/137/address/"
           "0x3FEb1D627c96cD918f2E554A803210DA09084462/balances_v2/"
           "?&format=JSON&nft=true&no-nft-fetch=true&key=ckey_docs")
    try:
        response = requests.get(url).json()
        for item in response['data']['items']:
            # First, find 'PSK' in the list
            if item['contract_ticker_symbol'] == "PSK":
                # The balance arrives as a string, so convert it before
                # checking that it is greater than zero
                return int(item['balance']) > 0
    except requests.ConnectionError:
        print("Exception")
    # 'PSK' was not found, or the request failed
    return False


if __name__ == "__main__":
    print(main())
This is what is going on:
I am pulling all of the data from the API.
I am using a try/except clause because the code needs to handle the case where I can't make a connection to the site.
I am looping through all of the 'items' to find the correct 'item' that includes the contract ticker symbol for 'PSK'.
I am checking the balance in that item and returning the logic that you wanted.
The script runs itself at the end, but you can always rename this function and have some other code call it to perform the check.
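The check the question asks for ("PSK" present and balance greater than 0) can also be kept separate from the network call, which makes it testable against a saved response. This is a minimal sketch; the function name has_positive_balance and the trimmed sample payload are illustrative only, and the balance is assumed to arrive as a numeric string, as in the response shown in the question.

```python
def has_positive_balance(payload, ticker="PSK"):
    """Return True if `ticker` is present in payload['data']['items'] with balance > 0."""
    data = payload.get("data") or {}
    for item in data.get("items") or []:
        if item.get("contract_ticker_symbol") == ticker:
            # The sample response carries the balance as a string, e.g. "0".
            return int(item.get("balance") or 0) > 0
    return False


# Trimmed-down version of the response shown in the question.
sample = {"data": {"items": [{"contract_ticker_symbol": "PSK", "balance": "0"}]}}
print(has_positive_balance(sample))
```

With the sample balance of "0" the function returns False; a parsed API response can be passed in the same way.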

Karate: Using data-driven embedded template approach for API testing

I want to write data-driven tests that pass dynamic values read from an external file (CSV).
I am able to pass dynamic values from the CSV for simple strings (account number and affiliate ID below). But, using embedded expressions, how can I pass dynamic values from the CSV file for the "DealerReportFormats" JSON array below?
Any help is highly appreciated!
Scenario Outline: Dealer dynamic requests
Given path '/dealer-reports/retrieval'
And request read('../DealerTemplate.json')
When method POST
Then status 200
Examples:
| read('../DealerData.csv') |
DealerTemplate.json is below
{
"DealerId": "FIXED",
"DealerName": "FIXED",
"DealerType": "FIXED",
"DealerCredentials": {
"accountNumber": "#(DealerCredentials_AccountNumber)",
"affiliateId": "#(DealerCredentials_AffiliateId)"
},
"DealerReportFormats": [
{
"name": "SalesReport",
"format": "xml"
},
{
"name": "CustomerReport",
"format": "txt"
}
]
}
DealerData.csv:
DealerCredentials_AccountNumber,DealerCredentials_AffiliateId
testaccount1,123
testaccount2,12345
testaccount3,123456
CSV is only for "flat" structures, so trying to mix that with JSON is too ambitious in my honest opinion. Please look for another framework if needed :)
That said I see 2 options:
a) use proper quoting and escaping in the CSV
b) refer to JSON files
Here is an example:
Scenario Outline:
* json foo = foo
* print foo
Examples:
| read('test.csv') |
And test.csv is:
foo,bar
"{ a: 'a1', b: 'b1' }",test1
"{ a: 'a2', b: 'b2' }",test2
I leave it as an exercise to you if you want to escape double-quotes. It is possible.
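For the double-quote case, the usual CSV convention is to wrap the field in double quotes and double every embedded double quote. The sketch below uses Python's csv module only to show what such a file looks like; whether Karate's CSV reader accepts exactly this form is an assumption to verify, and the column names simply reuse foo and bar from the example above.

```python
import csv
import io
import json

rows = [
    {"foo": {"a": "a1", "b": "b1"}, "bar": "test1"},
    {"foo": {"a": "a2", "b": "b2"}, "bar": "test2"},
]

# Write the CSV: the csv module wraps the JSON column in double quotes and
# doubles every embedded double quote.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["foo", "bar"])
for row in rows:
    writer.writerow([json.dumps(row["foo"]), row["bar"]])
csv_text = buffer.getvalue()
print(csv_text)

# Read it back to confirm the JSON column survives the round trip.
parsed = [json.loads(record["foo"]) for record in csv.DictReader(io.StringIO(csv_text))]
print(parsed)
```

Generating the file this way avoids hand-escaping mistakes when the JSON values themselves contain commas or quotes.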
Option (b) is you can refer to stand-alone JSON files and read them:
foo,bar
j1.json,test1
j2.json,test2
And you can do * def foo = read(foo) in your feature.

How to present empty relationship in JSONAPI?

Looked at http://jsonapi.org/format/ but don't see any description about empty relationship format, for example:
{
"type": "articles",
"id": "1",
"attributes": {
"title": "Rails is Omakase"
},
"relationships": {
"comments": {
"data": []
}
}
}
This article doesn't have comments. What is the correct way to present an empty relationship?
"data": [], "data": null, or no "relationships" at all?
Thanks!
It's described in the resource linkage chapter of the JSON API specification:
Resource linkage MUST be represented as one of the following:
null for empty to-one relationships.
an empty array ([]) for empty to-many relationships.
a single resource identifier object for non-empty to-one relationships.
an array of resource identifier objects for non-empty to-many relationships.
Note that you don't have to use resource linkage at all. You could also use relationship links; more information about those can be found in the corresponding part of the specification.
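The four cases quoted above can be sketched as a small helper that builds the "data" member of a relationship. This is a minimal illustration in Python; the function name resource_linkage, its parameters, and the "author" relationship are hypothetical and not part of the question or the specification.

```python
import json


def resource_linkage(related, to_many):
    """Build the `data` member of a relationship object.

    `related` is a list of (type, id) tuples; `to_many` says which kind of
    relationship is being rendered. Both names are hypothetical.
    """
    identifiers = [{"type": t, "id": i} for t, i in related]
    if to_many:
        return identifiers  # [] when there is nothing related
    return identifiers[0] if identifiers else None  # null when empty


article = {
    "type": "articles",
    "id": "1",
    "attributes": {"title": "Rails is Omakase"},
    "relationships": {
        "comments": {"data": resource_linkage([], to_many=True)},
        "author": {"data": resource_linkage([], to_many=False)},  # hypothetical to-one
    },
}
print(json.dumps(article["relationships"]))
```

For the article in the question, comments is a to-many relationship, so the empty case serializes as "data": [].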

"Reverse formatting" Riak search results

Let's say I have an object in the test bucket in my Riak installation with the following structure:
{
"animals": {
"dog": "woof",
"cat": "miaow",
"cow": "moo"
}
}
When performing a search request for this object, the structure of the search results is as follows:
{
"responseHeader": {
"status": 0,
"QTime": 3,
"params": {
"q": "animals_cow:moo",
"q.op": "or",
"filter":"",
"wt": "json"
}
},
"response": {
"numFound": 1,
"start": 0,
"maxScore": "0.353553",
"docs": [
{
"id": "test",
"index": "test",
"fields": {
"animals_cat": "miaow",
"animals_cow": "moo",
"animals_dog": "woof"
},
"props": {}
}
]
}
}
As you can see, the way the object is stored, the cat, cow and dog keys are nested within animals. However, when the search results come back, none of the keys are nested, and are simply separated by _.
My question is this: Is there any way provided by Riak to "reverse format" the search, and return the fields of the object in the correct (nested) format? This becomes a problem when storing and returning user data that might possibly contain _.
I do see that the latest version of Riak (beta release) provides a search schema, but I can't seem to see whether my question would be answered by this.
What you receive back in the search result is what the object looked like after passing through the JSON analyzer. If you need the data formatted differently, you can use a custom analyzer. However, this will only affect newly put data.
For existing data, you can use the id field and issue a GET request for the original object, or use the Solr query as input to a MapReduce job.
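The concern about keys that contain _ is why fetching the original object by id is safer than splitting field names on the client. The sketch below mimics the flattened field naming seen in the search result (it is not Riak's code, and the user/first_name keys are made up) and shows two different objects collapsing to the same field name.

```python
def flatten(obj, prefix=""):
    """Join nested keys with '_' the way the search result fields appear."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


nested = {"animals": {"dog": "woof", "cat": "miaow", "cow": "moo"}}
print(flatten(nested))

# Two different objects that collapse to the same flat field name.
a = {"user": {"first_name": "Ann"}}
b = {"user_first": {"name": "Ann"}}
print(flatten(a) == flatten(b))
```

Since both objects produce the single field user_first_name, the nesting cannot be recovered from the flat names alone.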

Resources