I'm trying to define a schema such that all elements of a sequence are matched to a nested schema.
E.g.
data = {"items": ["hello", "bob", 123]}
schema = {"items": {"type": "list", "schemas": [
    {"type": "string", "length": 5},
    {"type": "string", "length": 3},
    {"type": "integer"},
]}}
Validation should fail if any expected element is missing from the sequence, if multiple elements match the same schema, or if any element matches no schema.
I'm about to implement this as a custom validator, sketched in pseudocode as:
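def _validate_contains_all(self, constraints, field, values) -> bool:
    """Test that every rule matches one item and no item is left over."""
    remaining = list(values)  # work on a copy so the caller's list is not mutated
    for constraint in constraints:
        for i, value in enumerate(remaining):
            if self.validate(value, constraint):  # pseudocode: check one value against one rule
                del remaining[i]
                break
        else:
            return False  # no remaining item matched this rule
    return not remaining  # any leftover item matched no rule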
Is there an existing way to achieve this? The goal is to receive a JSON list of USB devices and validate that they match a specification of which devices should be present.
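To make the intended semantics concrete, here is a minimal standalone sketch that does not use any validation library. Each rule is modelled as a plain predicate function (`is_str_of_length`, `is_int`, and `matches_one_to_one` are hypothetical names chosen for illustration), and the check succeeds only if rules and items can be paired one-to-one. It backtracks rather than matching greedily, so an item that satisfies several rules does not block a later rule.

```python
from typing import Any, Callable, Sequence

Predicate = Callable[[Any], bool]


def matches_one_to_one(values: Sequence[Any], predicates: Sequence[Predicate]) -> bool:
    """Return True if every predicate can be paired with a distinct value
    and no value is left unpaired."""
    if len(values) != len(predicates):
        return False  # a missing or an extra element can never pair up

    def assign(index: int, used: frozenset) -> bool:
        # Try to pair predicate number `index` with any value not used yet.
        if index == len(predicates):
            return True
        for i, value in enumerate(values):
            if i not in used and predicates[index](value) and assign(index + 1, used | {i}):
                return True
        return False

    return assign(0, frozenset())


def is_str_of_length(n: int) -> Predicate:
    return lambda v: isinstance(v, str) and len(v) == n


def is_int(v: Any) -> bool:
    return isinstance(v, int) and not isinstance(v, bool)


rules = [is_str_of_length(5), is_str_of_length(3), is_int]
print(matches_one_to_one(["hello", "bob", 123], rules))  # True
print(matches_one_to_one(["hello", "bob"], rules))       # False
```

The backtracking search is exponential in the worst case, which should be acceptable for a short device list; a bipartite matching algorithm would be the choice for long sequences.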
What I want
Apply a JSONPath to given json response, to match specific elements by comparing their children's node keys with a value.
Input
{
"data": {
"ticket": {
"1": "foo",
"2": "bar",
"3": "baz"
}
}
}
Output (expected)
"3": "baz"
Case description
I want a JSONPath expression that filters ticket elements whose key is greater than 2, so in this case it should match only the third ticket, "baz".
Ticket keys are always integers in my data.
Code area
This matches all node keys aka ticket keys
$.data.ticket.*~
This is a basic example of filtering
$..book[?(@.price<10)] // -> filter all books cheaper than 10
I am trying somehow to combine them in order to achieve the desired result
Where I test it
https://jsonpath.com/
References
https://goessner.net/articles/JsonPath/
This is possible with jsonpath-plus, the library that https://jsonpath.com/ uses internally.
It has some convenient additions and elaborations that are not in the original JSONPath spec.
Use @property to compare against the key of the current node.
$.data.ticket[?(@property > 2)]
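For comparison, the same key filter can be written in plain Python without any JSONPath library. This is only a sketch of what the expression selects, assuming the keys are always integer-like strings as stated in the question; `tickets_with_key_above` is a hypothetical helper name.

```python
import json

response = json.loads("""
{"data": {"ticket": {"1": "foo", "2": "bar", "3": "baz"}}}
""")


def tickets_with_key_above(doc: dict, threshold: int) -> dict:
    """Keep only the tickets whose integer-like key is greater than threshold."""
    tickets = doc["data"]["ticket"]
    return {key: value for key, value in tickets.items() if int(key) > threshold}


print(tickets_with_key_above(response, 2))  # {'3': 'baz'}
```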
Example json:
{
"a": 1,
"c": {
"ca": 1.1
},
"d": {},
"e": [1,2,3],
"f": [
{
"fa": "vf1",
"fb": "vf2",
"fc": [],
"fffs232/232": {
"z": 1
}
},
{
"fa": "vf3",
"fb": "vf4",
"fc": [1.1,2.3],
"fffs232/232": {
"z": 2
}
}
]
}
I want a full-path jq expression that gives me the values of "z". The expression must not mention "fffs232/232" explicitly, since that key is dynamic.
Is this possible with jq?
Thanks!
You could use .., e.g. along the lines of:
jq '.. | objects | .z // empty'
If .z can take the value null, then adjust according to your requirements.
If the name is dynamic but the position is known, you can iterate over field candidates using .[] and check if a subfield "z" exists using select and has:
.f[][] | select(has("z")?).z
Alternatively, if the depths are also unknown, you can traverse the whole document using ..:
.. | select(has("z")?).z
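For readers who want to see what the recursive traversal does step by step, here is a rough Python equivalent of the last filter. It is a sketch, not a jq reimplementation: `values_for_key` is a hypothetical helper that visits every object at any depth and yields the value stored under the given key.

```python
from typing import Any, Iterator


def values_for_key(node: Any, key: str) -> Iterator[Any]:
    """Yield the value stored under `key` in every object, at any depth."""
    if isinstance(node, dict):
        if key in node:
            yield node[key]
        for child in node.values():
            yield from values_for_key(child, key)
    elif isinstance(node, list):
        for child in node:
            yield from values_for_key(child, key)
    # Scalars (numbers, strings, null) have no children and are skipped.


document = {
    "a": 1,
    "c": {"ca": 1.1},
    "d": {},
    "e": [1, 2, 3],
    "f": [
        {"fa": "vf1", "fb": "vf2", "fc": [], "fffs232/232": {"z": 1}},
        {"fa": "vf3", "fb": "vf4", "fc": [1.1, 2.3], "fffs232/232": {"z": 2}},
    ],
}

print(list(values_for_key(document, "z")))  # [1, 2]
```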
Let's say I have this schema:
source_id -> String, HashKey
created_at -> String, RangeKey
required_capabilities -> StringSet
required_capabilities is a Set of Strings that we need to provide in the query in order to be able to retrieve a particular element.
For example:
If I have these three items:
{
"source_id": "1",
"created_at": "2021-01-18T10:53:25Z",
"required_capabilities": ["Cap1", "Cap2", "Cap3"]
},
{
"source_id": "1",
"created_at": "2021-01-18T10:59:31Z",
"required_capabilities": ["Cap1", "Cap3"]
},
{
"source_id": "1",
"created_at": "2021-01-18T11:05:15Z"
}
I want to create a query, filtering for example source_id = "1" and providing a FilterExpression with the required_capabilities = ["Cap1", "Cap3", "Cap4"].
And I would expect as a result:
{
"source_id": "1",
"created_at": "2021-01-18T10:59:31Z",
"required_capabilities": ["Cap1", "Cap3"] // Since I've provided "Cap1", "Cap3" and "Cap4"
},
{
"source_id": "1",
"created_at": "2021-01-18T11:05:15Z" // Since it doesn't require any capability.
}
I've tried the IN operator as follows, since the stored StringSet should be IN (or Contained by) the given SS, but it didn't work.
aws dynamodb query --table-name TableName --key-condition-expression "source_id = :id" --filter-expression "required_capabilities IN (:rq)" --expression-attribute-values '{":id": {"S": "1"}, ":rq": { "SS": ["Cap1", "Cap3", "Cap4"] }}'
It works only when I provide exactly the same StringSet. If I provide a set that contains the stored one plus additional values, it returns nothing.
It seems your issue is with the IN keyword, which does not work with sets. From the docs on conditionals:
IN : Checks for matching elements in a list.
AttributeValueList can contain one or more AttributeValue elements of type String, Number, or Binary. These attributes are compared against an existing attribute of an item. If any elements of the input are equal to the item attribute, the expression evaluates to true.
I believe you want the CONTAINS keyword:
CONTAINS : Checks for a subsequence, or value in a set.
AttributeValueList can contain only one AttributeValue element of type String, Number, or Binary (not a set type). If the target attribute of the comparison is of type String, then the operator checks for a substring match. If the target attribute of the comparison is of type Binary, then the operator looks for a subsequence of the target that matches the input. If the target attribute of the comparison is a set ("SS", "NS", or "BS"), then the operator evaluates to true if it finds an exact match with any member of the set. CONTAINS is supported for lists: When evaluating "a CONTAINS b", "a" can be a list; however, "b" cannot be a set, a map, or a list.
Actually, I found out that DynamoDB doesn't support the use case I needed, so I found a workaround.
Basically, instead of modelling required_capabilities as a StringSet, I've created a field called required_capability that contains a single required capability (which is OK for me so far), and I use the IN operator to check it.
If in the future I need to check for more than one capability, I just need to add new fields required_capability_2 and required_capability_3.
It's clearly not ideal, but I guess it's good enough, considering I won't have a lot of required capabilities in a single record, it's usually one, maybe two.
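Another option, which is not from the posts above, is to query by source_id only and apply the subset check in application code. The sketch below assumes the items have already been fetched and converted to plain Python dicts (not DynamoDB's typed JSON); `satisfiable_items` is a hypothetical helper name.

```python
from typing import Iterable


def satisfiable_items(items: Iterable[dict], provided: Iterable[str]) -> list:
    """Keep items whose required_capabilities are all present in `provided`."""
    available = set(provided)
    return [
        item
        for item in items
        # A missing attribute means the item requires no capability.
        if set(item.get("required_capabilities", ())) <= available
    ]


items = [
    {"source_id": "1", "created_at": "2021-01-18T10:53:25Z",
     "required_capabilities": ["Cap1", "Cap2", "Cap3"]},
    {"source_id": "1", "created_at": "2021-01-18T10:59:31Z",
     "required_capabilities": ["Cap1", "Cap3"]},
    {"source_id": "1", "created_at": "2021-01-18T11:05:15Z"},
]

result = satisfiable_items(items, ["Cap1", "Cap3", "Cap4"])
print([item["created_at"] for item in result])
# ['2021-01-18T10:59:31Z', '2021-01-18T11:05:15Z']
```

The trade-off is that every item for the partition key is read and transferred before filtering, so this only makes sense when the number of items per source_id is small.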
We've built a Document schema where job documents are uniquely identified by an Id property, and represent their link to a parent by a ParentJobId property.
For example:
{
"Type": "Request",
"StateName": "Success",
"id": "4365b7ec-6eee-468a-94f6-ab65d6434611",
"ParentJobId": null
},
{
"Type": "Machine",
"StateName": "ChildJobFailed",
"id": "27040208-add5-97e4-6bd2-d991de73c9b5",
"ParentJobId": "4365b7ec-6eee-468a-94f6-ab65d6434611"
},
{
"Type": "Application",
"StateName": "Error",
"id": "7ef36990-c321-81dd-a0c7-3b04fd64c86f",
"ParentJobId": "27040208-add5-97e4-6bd2-d991de73c9b5"
}
How can I query for all documents that are related to the root parent job?
There is no way in CosmosDB to do that in a single query. You could, of course, recursively walk the tree with multiple round trips. You could even do it in one round trip by writing a stored procedure that issues the multiple requests.
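The recursive walk can be sketched in Python as follows. This is an in-memory illustration only: the documents are held in a list, and `descendants_of` is a hypothetical helper. Against a real database, each step that looks up children would be a separate query on ParentJobId.

```python
from collections import deque


def descendants_of(root_id: str, documents: list) -> list:
    """Return the root document and every document below it, breadth first."""
    by_id = {doc["id"]: doc for doc in documents}
    children = {}
    for doc in documents:
        children.setdefault(doc["ParentJobId"], []).append(doc)

    related, queue = [], deque([by_id[root_id]])
    while queue:
        doc = queue.popleft()
        related.append(doc)
        # In a database-backed version, this lookup is one query per visited job.
        queue.extend(children.get(doc["id"], []))
    return related


jobs = [
    {"Type": "Request", "StateName": "Success",
     "id": "4365b7ec-6eee-468a-94f6-ab65d6434611", "ParentJobId": None},
    {"Type": "Machine", "StateName": "ChildJobFailed",
     "id": "27040208-add5-97e4-6bd2-d991de73c9b5",
     "ParentJobId": "4365b7ec-6eee-468a-94f6-ab65d6434611"},
    {"Type": "Application", "StateName": "Error",
     "id": "7ef36990-c321-81dd-a0c7-3b04fd64c86f",
     "ParentJobId": "27040208-add5-97e4-6bd2-d991de73c9b5"},
]

related = descendants_of("4365b7ec-6eee-468a-94f6-ab65d6434611", jobs)
print([doc["Type"] for doc in related])  # ['Request', 'Machine', 'Application']
```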
However, I've found that the best way to model hierarchies (trees) for fast retrieval in NoSQL databases is as an array containing a materialized path. Look at this example:
documents = [
{id: 'A', hierarchy: [1, 2, 3]},
{id: 'B', hierarchy: [1, 2, 4]},
{id: 'C', hierarchy: [5]},
{id: 'D', hierarchy: [1, 6]},
]
"A" is "in" Project 3 whose parent is Project 2, whose parent is Project 1. "B" is "in" Project 4 whose parent is Project 2 which still has Project 1 as its parent. Project 5 is another root Project like Project 1; and "D" is "in" Project 6 which is a child of project 1.
Now send in a query like this:
SELECT * FROM c WHERE ARRAY_CONTAINS(c.hierarchy, 1)
It will return documents A, B, and D. Try:
SELECT * FROM c WHERE ARRAY_CONTAINS(c.hierarchy, 2)
It will return only documents A and B.
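The effect of those two queries can be checked with a small in-memory Python sketch. Here the membership test `project in doc["hierarchy"]` stands in for ARRAY_CONTAINS, and `in_subtree` is a hypothetical helper name.

```python
documents = [
    {"id": "A", "hierarchy": [1, 2, 3]},
    {"id": "B", "hierarchy": [1, 2, 4]},
    {"id": "C", "hierarchy": [5]},
    {"id": "D", "hierarchy": [1, 6]},
]


def in_subtree(docs: list, project: int) -> list:
    """Return the ids of documents whose materialized path passes through `project`."""
    return [doc["id"] for doc in docs if project in doc["hierarchy"]]


print(in_subtree(documents, 1))  # ['A', 'B', 'D']
print(in_subtree(documents, 2))  # ['A', 'B']
```

Because each document carries its full ancestor path, a single membership test finds a whole subtree without any recursion.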
One word of caution though, I don't know how performant this approach is in DocumentDB which I don't think allows indexes on array fields. Maybe one of the DocumentDB product managers that monitor Stack Overflow can chime in on this.
This approach is commonly used with NoSQL databases like CouchDB and MongoDB (combining materialized path and array of ancestors) and even SQL databases supporting array types like Postgres.
I have gone through most of the JSONPath documentation out there, and it all explains that script filters such as $.items[(@.length - 1)] apply only to arrays and not to JSON objects. This means that the path would work for the first JSON document below but not for the second one:
1:
{
"items": [
1,
2
]
}
2:
{
"items": {
"item1": 1,
"item2": 2
}
}
Can anyone confirm this? Also, if I am correct, is there a logical reason for this behavior? I can imagine that such a path could have been allowed to return the same value (2) in both cases.