Azure Data Explorer Data Connection DataFormat - azure-data-explorer

What is the difference between the JSON and MULTIJSON data formats? Isn't it all JSON?
When should we use one or the other?
How should the payload look when using one or the other?

Please see: https://learn.microsoft.com/en-us/azure/data-explorer/ingest-json-formats?tabs=kusto-query-language#the-json-format
Azure Data Explorer supports two JSON file formats:
json: Line separated JSON. Each line in the input data has exactly one JSON record.
multijson: Multi-lined JSON. The parser ignores the line separators and reads a record from the previous position to the end of a valid JSON.
See also: https://learn.microsoft.com/en-us/azure/data-explorer/ingestion-supported-formats
JSON: A text file with JSON objects delimited by \n or \r\n. See JSON Lines (JSONL).
MultiJSON: A text file with a JSON array of property bags (each representing a record), or any number of property bags delimited by whitespace, \n or \r\n. Each property bag can be spread on multiple lines. This format is preferred over JSON, unless the data is non-property bags.
You should choose according to how your source data is formatted. If in doubt, choose multijson, as it 'contains' json.
Example for json with 2 records:
{"Hello":"World"}
{"Foo":{"Bar":"x"}}
Example for multijson with 2 records:
{
  "Hello": "World"
}
{
  "Foo": {
    "Bar": "x"
  }
}
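To make the difference concrete, here is a minimal Python sketch of the two reading strategies described above. It is only an illustration using the standard json module, not Azure Data Explorer's actual parser; the function names parse_json_lines and parse_multijson are made up for this example, and the JSON-array variant of multijson is not handled.

```python
import json

def parse_json_lines(text):
    """'json' style: every non-empty line must be one complete JSON record."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def parse_multijson(text):
    """'multijson' style: ignore line breaks and read one complete value at a time."""
    decoder = json.JSONDecoder()
    records, pos = [], 0
    while pos < len(text):
        # Skip whitespace (including newlines) between records.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        # raw_decode returns the parsed value and the index where it ended.
        record, pos = decoder.raw_decode(text, pos)
        records.append(record)
    return records

compact = '{"Hello":"World"}\n{"Foo":{"Bar":"x"}}\n'
pretty = '{\n  "Hello": "World"\n}\n{\n  "Foo": {\n    "Bar": "x"\n  }\n}\n'

print(parse_json_lines(compact))   # two records
print(parse_multijson(pretty))     # the same two records
print(parse_multijson(compact))    # line-separated input also works here
try:
    parse_json_lines(pretty)       # the first line is just "{"
except json.JSONDecodeError as err:
    print("line-based parsing failed:", err)
```

The position-based reader accepts both payloads, while the line-based reader only accepts the compact one.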

Related

JanusGraph 3.x graphson import fails on prettified json

Does anyone else have this problem importing prettified JSON/GraphSON into JanusGraph?
Exactly the same file, when not prettified (no carriage returns, tabs, or whitespace), imports perfectly, but the prettified version fails with the following error:
graph.io(graphson()).readGraph("data/tgraph2.json")
Could not deserialize the JSON value as required. Nested exception: java.lang.InstantiationException: Cannot deserialize the value with the detected type contained in the JSON ('tinker:graph') to the type specified in parameter to the object mapper (class java.util.LinkedHashMap). Those types are incompatible. at [Source: (ByteArrayInputStream); line: 1, column: 3]
Note that if I manually edit the file to remove the first line break, the error message changes to ... at [Source: (ByteArrayInputStream); line: 1, column: 12] and so on, so it is definitely an issue with whitespace in the file.
Version 3.x
Surely this is not desired behaviour. JSON should work the same whether prettified with whitespace or minified.
Something to be aware of is that there are two forms of GraphSON. In one form, the entire file is not a single JSON document, instead each line is. Each line represents the adjacency list for a vertex.
That is the default GraphSON format and it is designed that way for streaming and so that the file can easily be broken up for multi-threaded operations. The other form is a single JSON document that contains all the vertices and then all the edges.
The formats are documented here
http://tinkerpop.apache.org/docs/3.4.1/dev/io/#graphson
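If a line-oriented reader has to consume a document that was pretty-printed, one option is to re-serialize it onto a single line first. A minimal Python sketch, assuming the prettified text is exactly one JSON document; the sample object is hypothetical and is not real GraphSON:

```python
import json

def to_single_line(pretty_json):
    """Re-serialize one JSON document without insignificant whitespace."""
    return json.dumps(json.loads(pretty_json), separators=(",", ":"))

# Hypothetical vertex-like object, for illustration only.
pretty_vertex = """{
  "id": 1,
  "label": "person"
}"""

print(to_single_line(pretty_vertex))  # {"id":1,"label":"person"}
```

This only changes whitespace between tokens; the parsed content is unchanged.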

How to avoid inconsistency when adding data to Firestore?

We are building an app where we want to display cities. Each city also has an image, which is stored in Firebase Storage. We add data to the db either using the Firebase console or programmatically. The problem arises when we add data that contains special characters. For instance, I have this URL:
https://firebasestorage.googleapis.com ... München.png
This is how it looks in the browser. If we add this URL using the Firebase console, it is saved exactly as above. However, when we do it programmatically, the URL is saved as:
https://firebasestorage.googleapis.com ... M%C3%BCnchen.png
So the following query:
db.collection("cities")
.whereEqualTo(
"cityPictureUrl",
"https://firebasestorage.googleapis.com ... München.png"
);
won't work, since the name in the database is M%C3%BCnchen and not München. How can we store the data in the most correct way to avoid this inconsistency?
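You may percent-encode the file name before querying, so that the value you query with matches the stored one. Encode only the file-name component: URLEncoder.encode() also escapes reserved characters such as ':' and '/', so passing the whole URL would not produce the stored value.
String fileName = "München.png";
// Yields "M%C3%BCnchen.png"
String fileNameEncoded = URLEncoder.encode(fileName, "utf-8");
String imageURIEncoded = "https://firebasestorage.googleapis.com ... " + fileNameEncoded;
db.collection("cities")
.whereEqualTo(
"cityPictureUrl",
imageURIEncoded
);
URLEncoder.encode() converts a string to the application/x-www-form-urlencoded format: letters, digits, and the characters '.', '-', '*' and '_' are left unchanged, a space becomes '+', and every other character is replaced by one or more %XX escape sequences representing its bytes in the given encoding (here UTF-8).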
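Another way to avoid the mismatch is to normalize the name to a single form before both storing and querying. The Python sketch below illustrates the idea with the standard urllib.parse module; normalize_name is a hypothetical helper, and it assumes names never contain a literal '%' that is meant to stay unescaped.

```python
from urllib.parse import quote, unquote

def normalize_name(name):
    """Return the percent-encoded form, whether or not `name` is already encoded."""
    # Decode first so an already-encoded name is not encoded a second time.
    return quote(unquote(name), safe="")

raw_name = "München.png"
encoded_name = "M%C3%BCnchen.png"

print(normalize_name(raw_name))      # M%C3%BCnchen.png
print(normalize_name(encoded_name))  # M%C3%BCnchen.png
```

If both the writer and the reader go through the same normalization, an equality query compares like with like.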

Jayway Jsonpath syntax for string array filter?

I am attempting to use the EvaluateJsonPath processor in NiFi, and am having trouble with the Jayway JsonPath syntax.
My object looks like the following:
{"text":"my stuff", "tags":["abc", "xyz", "beq"]}
I want to route messages based on the tags - I want everything containing "xyz" to be routed one way, and everything not containing it to be routed another way.
Using http://jsonpath.herokuapp.com/ I've been testing and trying to figure out the syntax to filter on a JSON object containing an array of strings. I can match based on an explicit index (so $.[?(@.tags[1] =~ /xyz/i)] works just fine), but I can't guarantee the order or number of entries in the tags field.
Is there a way to do this in the Jayway JSON module? I saw "filter the Json according to string in an array in JSONPATH", which I've tried, but it doesn't appear to work in the simulator above.
I do not know how to do this in one EvaluateJsonPath processor step. But it can certainly be done in a two-step process:
1. Use EvaluateJsonPath to filter the "xyz" tags out of the tags array, using a JsonPath expression like $.tags[?(@ =~ /xyz/i)] and setting the processor's return type to json so that an array can be returned. This results in ["xyz"] for a match and [] for non-matching files.
2. Use RouteOnAttribute to route based on the resulting array, with an expression like ${matchingTags:toLower():contains('xyz')}.
It might also be worth considering evaluating the JSON as text against a regular expression to match the tag.
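The logic of the two steps can be sketched in plain Python. This is not NiFi or Jayway code: matching_tags and route are hypothetical names, and the case-insensitive substring match via re.search is an assumption about the intended behaviour, which may differ from how the real =~ operator matches.

```python
import re

def matching_tags(record, pattern):
    """Step 1: keep only the tags that match the pattern (case-insensitive)."""
    return [tag for tag in record.get("tags", [])
            if re.search(pattern, tag, re.IGNORECASE)]

def route(record, pattern="xyz"):
    """Step 2: route on whether the filtered list is empty."""
    return "matched" if matching_tags(record, pattern) else "unmatched"

with_tag = {"text": "my stuff", "tags": ["abc", "xyz", "beq"]}
without_tag = {"text": "other stuff", "tags": ["abc", "beq"]}

print(matching_tags(with_tag, "xyz"), route(with_tag))
print(matching_tags(without_tag, "xyz"), route(without_tag))
```

Because the filter runs over every element, the position and number of tags no longer matter.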

"<" character in JSON data is serialized to \u003c

I have a JSON object where the value of one element is a string. In this string there are the characters "<RPC>". I take this entire JSON object and in my ASP.NET server code, I perform the following to take the object named rpc_response and add it to the data in a POST response:
var serializer = new System.Web.Script.Serialization.JavaScriptSerializer();
HttpContext.Current.Response.AddHeader("Pragma", "no-cache");
HttpContext.Current.Response.AddHeader("Cache-Control", "private, no-cache");
HttpContext.Current.Response.AddHeader("Content-Disposition", "inline; filename=\"files.json\"");
HttpContext.Current.Response.Write(serializer.Serialize(rpc_response));
HttpContext.Current.Response.ContentType = "application/json";
HttpContext.Current.Response.StatusCode = 200;
After the object is serialized, I receive it on the other end (not a web browser), and that particular string looks like: \u003cRPC\u003e.
What can I do to prevent these (and other) characters from being encoded this way, while still being able to serialize my JSON object?
The characters are being encoded "properly"! [1] Use a working JSON library to access the JSON data correctly; this is a valid JSON encoding.
Escaping these characters prevents HTML injection via JSON and makes the JSON XML-friendly. That is, even if the JSON is emitted directly into JavaScript (as is done fairly often, since JSON is a valid [2] subset of JavaScript), it cannot be used to terminate the <script> element early, because the relevant characters (e.g. <, >) are encoded within the JSON itself.
The standard JavaScriptSerializer does not have the ability to change this behavior. Such escaping might be configurable (or different) in the Json.NET implementation, but it shouldn't matter, because a valid JSON client/library must understand the \u escapes.
[1] From RFC 4627: The application/json Media Type for JavaScript Object Notation (JSON):
Any character may be escaped. If the character is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the character's code point ..
See also "C# To transform Facebook Response to proper encoded string" (which is also related to JSON escaping).
[2] There is a rare case when this does not hold, but ignoring (or accounting for) that..
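A quick way to confirm that the escaped and unescaped forms are the same data is to parse both with any conforming JSON library. A small Python sketch using the standard json module; the "msg" key is made up for the example:

```python
import json

# The payload as received, with < and > written as \u escapes.
escaped = r'{"msg": "\u003cRPC\u003e"}'
# The same payload with the characters written literally.
plain = '{"msg": "<RPC>"}'

print(json.loads(escaped)["msg"])                 # <RPC>
print(json.loads(escaped) == json.loads(plain))   # True
```

The escapes exist only in the serialized text; after parsing, the string contains the literal characters.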

Backslashes in JSON string

I'm not familiar with this format:
{"d":"{\"Table\":[{\"pCol\":12345,\"fCol\":\"jeff\",\"lCol\":\"Smith\",\"dId\":1111111,\"tDate\":\"\\/Date(1153033200000-0700)\\/\"}]}"}
I'm using Newtonsoft to serialize the DataSet that I'm returning from my ASP.NET web service. The above JSON string is what Firebug shows as the response. I have checked this JSON using JSLint and it is valid.
In Firebug I see the JSON data, and my first alert('success'); fires. However, when I try alert(msg.d.Table); I get nothing: no alert box and no error in Firebug. I think it has something to do with these backslashes, but I'm not sure.
Any ideas?
Those backslashes are escape characters. They are escaping the double quotes inside the string associated with d. The reason you can't alert msg.d.Table is that the value of d is a string. You have to use JSON.parse to parse that JSON string into a JavaScript object.
Then, you have to convert Table back to a string to alert it.
Something like this:
var dObj = JSON.parse(msg.d);
alert(JSON.stringify(dObj.Table, null, 2));
The ASP.NET web service is already serializing the return value to JSON (wrapped in a d property, for security reasons).
When you return pre-serialized JSON data, it thinks you're giving it a normal string, and proceeds to serialize the string as JSON.
Therefore, you get a JSON object with a d property that contains the raw JSON string (with correctly escaped quotes) that you returned.
You should return the raw object and let ASP.Net serialize it for you instead of serializing it yourself.
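The double serialization described above is easy to reproduce outside ASP.NET. The following Python sketch only illustrates the effect with the standard json module; the table data is a shortened, made-up version of the payload in the question.

```python
import json

table = {"Table": [{"pCol": 12345, "fCol": "jeff"}]}

# Serializing an already-serialized string: "d" ends up holding escaped JSON text.
double_encoded = json.dumps({"d": json.dumps(table)})
# Serializing the raw object once: "d" holds the object itself.
single_encoded = json.dumps({"d": table})

print(double_encoded)  # contains \" sequences, like the response in the question
print(single_encoded)

msg = json.loads(double_encoded)
print(type(msg["d"]).__name__)          # str, so msg["d"]["Table"] is not available
print(json.loads(msg["d"])["Table"])    # a second parse recovers the rows

print(json.loads(single_encoded)["d"]["Table"])  # one parse is enough here
```

Returning the raw object corresponds to the single_encoded case, where the client needs no extra parsing step.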

Resources