Convert large XML values into double-type JSON? - xquery

I'm forming an XML document, a snippet of which is:
<cache-properties>
  <list-cache-hit-rate>
    <units>hits/sec</units>
    <value>1.5308452E6</value>
  </list-cache-hit-rate>
  <list-cache-miss-rate>
    <units>misses/sec</units>
    <value>25422.167</value>
  </list-cache-miss-rate>
  <compressed-tree-cache-hit-rate>
    <units>hits/sec</units>
    <value>970.2339</value>
Notice that 1.5308452E6 is large enough that the value ends up stored in exponent notation when fn:sum() runs behind the scenes.
Later, I convert the XML to JSON with the following code:
let $arr := json:to-array(local:tojson($data))
return (($data))
and the converted value looks like this:
"cache-properties": {
  "list-cache-hit-rate": {
    "units": "hits/sec",
    "value": 1.5308452E6
  },
  "list-cache-miss-rate": {
    "units": "misses/sec",
    "value": "25422.167"
  },
  "compressed-tree-cache-hit-rate": {
    "units": "hits/sec",
    "value": "970.2339"
  },
Notice that all the values are enclosed in quotes except 1.5308452E6, which is not. What correction is needed here, or is this correct? I'd rather have all values in quotes. This is my custom transform function:
declare function local:tojson($func){
  let $custom :=
    let $config := json:config("custom")
    let $_ := map:put($config, "whitespace", "ignore")
    let $_ := map:put($config, "array-element-names", "Video")
    return $config
  return json:transform-to-json($func, $custom)
};

Take a look at the XML schema. Your snippets appear to be similar or identical to the MarkLogic system status XML schema; however, you mention 'fn:sum in the background', so I'm guessing you have applied a transformation which has changed the XSD type.
The JSON transformation code uses the XSD type, if one is in scope, to determine the typed output in JSON (for XML numeric types). Also, if the number is 'too large' it can convert it to a string to avoid JavaScript precision issues.
(It basically uses fn:data(value) to convert.)
If needed, you can either force a string type onto your XML, or specialize the transformation by overriding one of the json-custom: primitives in json/custom.xqy, supplying the appropriate mapping in the config. Look in the source for the full list of overridable functions. They are not fully documented, as they were not written with full generality in mind, and it may not be obvious, easy or even possible to change the behaviour in every conceivable way.
The strategies are to either:
Use an XML schema in scope that types the atomic values explicitly (in your case as xs:string); a rough no-schema variant of this is sketched after this list
Override one of the low-level functions in custom.xqy
Convert the JSON by post-processing and 'stringify' the desired elements
Roll your own (not too difficult with the samples shown)
All of the above
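For illustration, here is a minimal sketch of the "force plain string content onto your XML" idea without a schema, run over the data before handing it to the transform. The helper name local:force-strings is mine, and $data stands for the element in the question; whether this alone changes the JSON typing depends on how your values end up typed, so treat it as a starting point rather than a drop-in fix.
declare function local:force-strings($n as node()) as node()
{
  typeswitch ($n)
    (: rebuild <value> with plain string content :)
    case element(value)
      return element value { fn:string($n) }
    (: copy every other element, recursing into its children :)
    case element()
      return element { fn:node-name($n) }
             { $n/@*, for $c in $n/node() return local:force-strings($c) }
    default
      return $n
};

(: usage: local:tojson(local:force-strings($data)) :)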

Related

Passing additional arguments to _normalize_coerce methods in cerberus

I have some code (see the end of this question); it's by no means final, but it is the best way I've seen or conceived (so far) for validating multiple date formats in a somewhat performant way.
I'm wondering if there is a means to pass an additional argument to this kind of function (_normalize_coerce). It would be nice if the date format string could be defined in the schema, something like:
{
    "a_date": {
        "type": "datetime",
        "coerce": "to_datetime",
        "coerce_args": "%m/%d/%Y %H:%M"
    }
}
Versus making a code change in the function to support an additional date format. I've looked through the docs and not found anything striking. There's a fairly good chance I'm looking at this all wrong, but I figured asking the experts was the best approach. I think defining it within the schema is the cleanest solution to the problem, but I'm all eyes and ears for facts, thoughts and opinions.
Some context:
Performance is essential as this could be running against millions of rows in AWS lambdas (and Cerbie (my nickname for cerberus) isn't exactly a spring chicken :P ).
None of the schemas will be native python dicts as they're all defined in JSON/YAML, so it all needs to be string friendly.
Not using the built-in coercion, as the python types cannot be parsed from strings.
I don't need the datetime object, so regex is a possibility; it's just less explicit and less future-proof.
If this is all wrong and I'm grossly incompetent, please be gentle (づ。◕‿‿◕。)づ
from datetime import datetime
from typing import Union

def _normalize_coerce_to_datetime(self, value: Union[str, datetime, None]) -> Union[datetime, str, None]:
    '''
    Casts valid datetime strings to the datetime python type.
    :param value: (str, datetime, None): python datetime, datetime string
    :return: datetime, string, None. python datetime,
             invalid datetime string, or None if the value is empty or None
    '''
    datetime_formats = ['%m/%d/%Y %H:%M']
    if isinstance(value, datetime):
        return value
    if value and not value.isspace():
        for fmt in datetime_formats:
            try:
                return datetime.strptime(value, fmt)
            except ValueError:
                continue
        # no format matched: hand back the original (invalid) string
        return value
    else:
        return None
I have attempted to do this myself and have not found a way to pass additional arguments to a custom normalize_coerce rule. If you want to extend the Cerberus library with custom validators, you can include arguments and then access them through the constraints in the custom validator. Below is an example that I have used for a conditional-to-default coercer. Because I needed to specify the condition, the value to check against, and the value to return, I couldn't find a way to do this with normalize_coerce, so I applied it inside a validate rule and edited self.document, as seen in the code.
Schema:
{
    "columns": {
        "Customer ID": {
            "type": "number",
            "conditional_to_default": {
                "condition": "greater_than",
                "value_to_check_against": 100,
                "value_to_return": 22
            }
        }
    }
}
def _validate_conditional_to_default(self, constraint, field, value):
    """
    Test the values and transform if conditions are met.
    :param constraint: Dictionary with the args needed for the conditional check.
    :param field: Field name.
    :param value: Field value.
    :return: the new document value if applicable, or keep the existing document value if not
    """
    value_to_check_against = constraint["value_to_check_against"]
    value_to_return = constraint["value_to_return"]
    rule_name = 'conditional_to_default'
    condition_mapping_dict = {"greater_than": operator.gt, "less_than": operator.lt,
                              "equal_to": operator.eq,
                              "less_than_or_equal_to": operator.le,
                              "greater_than_or_equal_to": operator.ge}
    if constraint["condition"] in condition_mapping_dict:
        if condition_mapping_dict[constraint["condition"]](value, value_to_check_against):
            self.document[field] = value_to_return
            return self.document
        else:
            return self.document
    if constraint["condition"] not in condition_mapping_dict:
        custom_errors_list = []
        custom_error = cerberus.errors.ValidationError(
            document_path=(field,), schema_path=(field, rule_name),
            code=0x03, rule=rule_name,
            constraint="Condition must be one of: {condition_vals}".format(
                condition_vals=list(condition_mapping_dict.keys())),
            value=value, info=())
        custom_errors_list.append(custom_error)
        self._error(custom_errors_list)
        return self.document
This is probably the wrong way to do it, but I hope the above gives you some inspiration and gets you a bit further. Equally, I'm following this to see if anyone else has found a way to pass arguments to the _normalize_coerce function.
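For anyone wiring this up, here is a minimal, hypothetical usage sketch (not from the original answer) showing how such a rule is attached to a cerberus.Validator subclass and driven from the schema. The class name and the trimmed rule body are illustrative; in practice you would use the full rule above.
import operator

import cerberus


class ConditionalValidator(cerberus.Validator):
    def _validate_conditional_to_default(self, constraint, field, value):
        """Replace the value when the configured condition holds.

        The rule's arguments are validated against this schema:
        {'type': 'dict'}
        """
        ops = {"greater_than": operator.gt, "less_than": operator.lt,
               "equal_to": operator.eq}
        if ops[constraint["condition"]](value, constraint["value_to_check_against"]):
            self.document[field] = constraint["value_to_return"]


schema = {
    "Customer ID": {
        "type": "number",
        "conditional_to_default": {
            "condition": "greater_than",
            "value_to_check_against": 100,
            "value_to_return": 22,
        },
    }
}

v = ConditionalValidator(schema)
print(v.validate({"Customer ID": 150}))  # True
print(v.document)                        # expected: {'Customer ID': 22}, since 150 > 100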

JSONPath - Filter expression to print a field if an array contains a string

I have the following JSON and am trying to write a JSONPath expression which will return the ISBN number when there is an id of either '123456789' or '987654321'. I tried the following, but it did not work. Can anybody tell me what I am doing wrong, please? Thanks in advance.
JSON Path Expression
$.books[?(@.ids == '123456789')].isbnNumber
JSON
{
    "books": [{
        "title": "10",
        "isbnNumber": "621197725636",
        "ids": [
            "123456789",
            "987654321"
        ]
    }]
}
The (more traditional) JSONPath implementations that stick close(r) to Goessner's reference specification do not offer handy operators like in, which are available in extended implementations such as JayWay's JSONPath.
Using Gatling's JSONPath, one thing we could do, if the positions of the ids in question are fixed, is access their respective indices directly to make the comparison:
$.books[?(@.ids[0] == "123456789" || @.ids[1] == "987654321")].isbnNumber
This will give you the desired result for your example; however, if a book has only one of the two ids, or the id to compare against shows up at a different position, it won't work.
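For comparison, in an extended implementation such as JayWay's JSONPath this could be expressed with its anyof filter operator, which matches when the left-hand array shares at least one element with the right-hand list. This is a sketch from memory of JayWay's operator set, so verify it against the version you use:
$.books[?(@.ids anyof ['123456789', '987654321'])].isbnNumber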

Add element to arrays, that are values to a given key name (json transformation with jq)

I'm a jq newbie, and I'm trying to transform a JSON document (a Swagger spec). I want to add an element to the array value of the "parameters" keys:
{
  ...
  "paths": {
    "/great/endpoint1": {
      "get": {
        "parameters": []   <<--- add a value here
      }
    },
    "/great/endpoint2": {
      "post": {
        "parameters": []   <<--- and here too, etc.
  ....
The following jqplay almost works: it adds values to the right arrays, but it has the nasty side effect of also removing the "x-id" value from the root of the input JSON, probably because of a faulty if-condition. As the paths contain varying strings (the endpoint names), I don't know how to write a wildcard path expression to address them, which is why I have tried using walk instead:
https://jqplay.org/s/az56quLZa3
Since the sample data is incomplete, it's difficult to say exactly what you're looking for, but it looks like you should be testing for "parameters" in the call to walk:
walk(if type=="object" and has("parameters")
then .parameters += [{"extra": "value"}]
else . end)
If you want to restrict the walk to the top-level paths, you would preface the above with: .paths |=
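That is, something along these lines (a sketch combining the two pieces above; the {"extra": "value"} element is just the placeholder from the earlier filter):
.paths |= walk(if type == "object" and has("parameters")
               then .parameters += [{"extra": "value"}]
               else . end)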

eXist-db serialize: is expand-xincludes=no ignored?

In eXist-db 4.4, XQuery 3.1, I am compressing a number of XML files from a directory into a .zip. The compression process uses serialize().
The XML files have some large xincludes which, according to the documentation, are automatically expanded during serialization. I have attempted to 'turn off' the xinclude expansion in two places in the code (prolog declaration and serialization map), but the serializer is still expanding all xincludes:
declare option exist:serialize "expand-xincludes=no";

declare function zip:get-entries-for-zip()
{
  (: get documents prefixed by 'MS609' :)
  let $pref := "MS609"
  (: get list of document names :)
  let $doclist := xmldb:get-child-resources($globalvar:URIdata)[starts-with(., $pref)]
  (: output serialized entries :)
  let $entries :=
    for $n in $doclist
    return
      <entry name="{$n}" type='text' method='store'>
        {serialize(doc(concat($globalvar:URIdata, "/", $n)), map { "method": "xml", "expand-xincludes": "no" })}
      </entry>
  return $entries
};
The XML data with xincludes to reproduce this problem can be found here http://medieval-inquisition.huma-num.fr/downloads under the description "BM MS609 Edition (tei-xml)".
Many thanks in advance.
The expand-xincludes serialization parameter is specific to eXist and, as such (or at least at present), cannot be set using the fn:serialize() function. Instead, use the util:serialize() function:
util:serialize($document, "expand-xincludes=no")
Alternatively, since you're ultimately interested in zipping the contents of a collection, you can skip the explicit serialization step, declare your serialization options in the query's prolog (or set them inline using util:declare-option()), and simply provide the compression:zip() function with the URI path(s) to the collections/documents you want to zip. For example:
xquery version "3.1";

declare option exist:serialize "expand-xincludes=no";

let $sources := "/db/apps/my-app/my-data" (: or a sequence of paths to individual docs :) ! xs:anyURI(.)
let $preserve-collection-structure := false()
let $zip := compression:zip($sources, $preserve-collection-structure)
return
    xmldb:store("/db", "my-data.zip", $zip)
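(If you would rather set the option inside a function body than in the prolog, the inline form mentioned above would be a one-liner along these lines; this is a sketch, so check it against your eXist version.)
util:declare-option("exist:serialize", "expand-xincludes=no")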
For more on serialization options in eXist, see my earlier answer to a similar question: https://stackoverflow.com/a/49290616/659732.

idl: pass keyword dynamically to isa function to test structure read by read_csv

I am using IDL 8.4. I want to use the isa() function to determine the type of the input read by read_csv(). I want to use /number, /integer, /float and /string, since some fields must be float, others must be integer, and others I don't care about. I can do it like this, but it is not very readable to the human eye.
str = read_csv(filename, header=inheader)
; TODO check header
if not isa(str.(0), /integer) then stop
if not isa(str.(1), /number) then stop
if not isa(str.(2), /float) then stop
I am hoping I can do something like
expected_header = ['id', 'x', 'val']
expected_type = ['/integer', '/number', '/float']
str = read_csv(filename, header=inheader)
if not array_equal(strlowcase(inheader), expected_header) then stop
for i=0L, n_elements(expected_type)-1 do begin
if not isa(str.(i), expected_type[i]) then stop
endfor
The above doesn't work, as '/integer' is taken literally, and I guess isa() is looking for a named structure. How can you do something similar?
Ideally I want to pick the expected type based on the header read from the file, so that the script still works as long as the header specifies the expected fields.
EDIT:
My tentative solution is to write a wrapper for ISA(). It's not very pretty, but it does what I wanted... if there is a cleaner solution, please let me know.
Also, read_csv is defined to return only one of long, long64, double and string, so I could write the function to test within this limitation, but I just wanted to make it work in general so that I can reuse it for other similar cases.
function isa_generic, var, typ
  ; calls isa() http://www.exelisvis.com/docs/ISA.html with a keyword
  ; if 'n', test /number
  ; if 'i', test /integer
  ; if 'f', test /float
  ; if 's', test /string
  if typ eq 'n' then return, isa(var, /number)
  if typ eq 'i' then return, isa(var, /integer)
  if typ eq 'f' then return, isa(var, /float)
  if typ eq 's' then return, isa(var, /string)
  print, 'unexpected typename: ', typ
  stop
end
IDL has some limited reflection abilities, which will do exactly what you want:
expected_types = ['integer', 'number', 'float']
expected_header = ['id', 'x', 'val']
str = read_csv(filename, header=inheader)
if ~array_equal(strlowcase(inheader), expected_header) then stop
foreach type, expected_types, index do begin
if ~isa(str.(index), _extra=create_struct(type, 1)) then stop
endforeach
It's debatable if this is really "easier to read" in your case, since there are only three cases to test. If there were 500 cases, it would be a lot cleaner than writing 500 slightly different lines.
This snippet uses some rather esoteric IDL features, so let me explain what's happening a bit:
expected_types is just a list of (string) keyword names in the order they should be used.
The foreach part iterates over expected_types, putting the keyword string into the type variable and the iteration count into index.
This is equivalent to using for index = 0, n_elements(expected_types) - 1 do and then using expected_types[index] instead of type, but the foreach loop is easier to read IMHO. Reference here.
_extra is a special keyword that can pass a structure as if it were a set of keywords. Each of the structure's tags is interpreted as a keyword. Reference here.
The create_struct function takes one or more pairs of (string) tag names and (any type) values, then returns a structure with those tag names and values. Reference here.
Finally, I replaced not (bitwise not) with ~ (logical not). This step, like foreach vs for, is not necessary in this instance, but can avoid headache when debugging some types of code, where the distinction matters.
--
Reflective abilities like these can do an awful lot, and come in super handy. They're work-horses in other languages, but IDL programmers don't seem to use them as much. Here's a quick list of common reflective features I use in IDL, with links to the documentation for each:
create_struct - Create a structure from (string) tag names and values.
n_tags - Get the number of tags in a structure.
_extra, _strict_extra, and _ref_extra - Pass keywords by structure or reference.
call_function - Call a function by its (string) name.
call_procedure - Call a procedure by its (string) name.
call_method - Call a method (of an object) by its (string) name.
execute - Run complete IDL commands stored in a string.
Note: Be very careful using the execute function. It will blindly execute any IDL statement you (or a user, file, web form, etc.) feed it. Never ever feed untrusted or web user input to the IDL execute function.
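To make the list above concrete, here is a tiny illustrative sketch (mine, not from the answer) combining a few of these calls at the IDL prompt; the comments show the expected results:
s = create_struct('float', 1)              ; structure with a single tag named FLOAT
print, n_tags(s)                           ; 1
print, isa(3.14, _extra=s)                 ; 1 (true), same as isa(3.14, /float)
print, call_function('strlowcase', 'VAL')  ; val, calling STRLOWCASE by its string name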
You can't access the keywords quite like that, but there is a typename parameter to ISA that might be useful. This is untested, but should work:
expected_header = ['id', 'x', 'val']
expected_type = ['int', 'long', 'float']
str = read_csv(filename, header=inheader)
if not array_equal(strlowcase(inheader), expected_header) then stop
for i = 0L, n_elements(expected_type) - 1L do begin
if not isa(str.(i), expected_type[i]) then stop
endfor
