We are running into several odd scenarios.
For example:
a) We are unable to order by _ts; the query returns empty results:
SELECT * FROM data ORDER BY data._ts DESC
b) We can ORDER BY ... ASC and get results (more than 100), but if we ORDER BY ... DESC we get zero results, which makes no sense to us :(
Assuming that c is an integer, this is the behavior we are seeing:
SELECT * FROM data ORDER BY data.c ASC = RESULTS
SELECT * FROM data ORDER BY data.c DESC = zero results
c) We have a UDF that does a case-insensitive contains, but it does not work in all cases. The JS function was tested outside DocumentDB and works there, so we don't understand what is happening:
SELECT * FROM data r where udf.TEST(r.c, "AS") = RESULTS
SELECT * FROM data r where udf.TEST(r.c, "health") = zero results (but I can find that value when filtering on another field)
Thanks a lot!
jamesjara and I synced offline... posting our discussion here for everyone else's benefit :)
1) Query response limits and continuation tokens
DocumentDB limits how much work a single query execution can do. These limits cover the query's resource consumption (you can ballpark this w/ the amount of provisioned RU/sec * 5 sec + an undisclosed buffer), response size (1 MB), and timeout (5 sec).
If these limits are hit, then a partial set of results may be returned. The work done by the query execution is preserved by passing the state back in the form of a continuation token (x-ms-continuation in the HTTP response header). You can resume the execution of the query by passing the continuation token in a follow-up query. The Client SDKs make this interaction easier by automatically paging through results via toList() or toArray() (depending on the SDK flavor).
It's possible to get an empty page in the result. This can happen when the resource consumption limit is reached before the query engine finds the first result (e.g. when scanning through a collection to look for few documents in a large dataset).
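If you page manually instead of relying on toList() / toArray(), the loop has to keep going while a continuation token is present, even when a page comes back empty. A minimal sketch of that loop follows. Note that fetchPage is a hypothetical stand-in for whatever call issues the query and returns one page plus its continuation token; it is not an SDK function.

```javascript
// Drain a paged query. `fetchPage(continuation)` is a hypothetical callback that
// issues one request and resolves to { results, continuation }.
async function drainQuery(fetchPage) {
  const all = [];
  let continuation; // undefined on the first request
  do {
    const page = await fetchPage(continuation);
    // A page may be empty while a continuation token is still present,
    // so the loop condition checks the token, not the page size.
    all.push(...page.results);
    continuation = page.continuation;
  } while (continuation);
  return all;
}
```

Stopping as soon as a page is empty would silently drop every result that lives on a later page.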
2) ORDER BY and Index Policies
In order to use ORDER BY or range comparisons (<, >, etc) within your queries, you should specify an index policy that contains a Range index with the maximum precision (precision = -1) over the JSON properties you sort on. This allows the query engine to take advantage of an index.
Otherwise, you can force a scan by specifying the x-ms-documentdb-query-enable-scan HTTP request header w/ the value set to true. In the client SDKs, this is exposed via the FeedOptions object.
Suggested Index Policy for ORDER BY:
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/*",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "Number",
          "precision": -1
        },
        {
          "kind": "Range",
          "dataType": "String",
          "precision": -1
        }
      ]
    },
    {
      "path": "/_ts/?",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "Number",
          "precision": -1
        }
      ]
    }
  ],
  "excludedPaths": []
}
3) UDFs and indexing
UDFs are not able to take advantage of indexes, and will result in a scan. Therefore, it is advised to include additional filters in your query's WHERE clause to reduce the number of documents to be scanned.
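Because a UDF may be invoked on every scanned document, it can also receive values that are missing or not strings. The UDF body is not shown in this thread, so the following is only a sketch of what a defensive case-insensitive contains could look like; the name containsInsensitive is made up for illustration.

```javascript
// Hypothetical body for a UDF such as udf.TEST(value, search).
// The real UDF from the question is not shown; this is an illustrative sketch.
function containsInsensitive(value, search) {
  // Treat missing or non-string inputs as "no match" instead of throwing.
  if (typeof value !== "string" || typeof search !== "string") {
    return false;
  }
  return value.toLowerCase().indexOf(search.toLowerCase()) !== -1;
}
```

Even with a correct UDF, the query can still return an empty page with a continuation token, for the reasons described in 1).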
I want to ensure that my users only set one of two fields:
rotations:
  - type: weekly
    time_restrictions:
      # Allow only ONE of the following fields:
      weekday_time_of_day: {...}
      time_of_day: [...]
I came across the OneOf pattern on Cuetorials, but it only seems to help when you want to enforce a schema while writing cue files.
#OneOfTimeRestrictions: {time_of_day: [...string]} | { weekday_time_of_day: {...string} }
rotations: [{
    type: *"weekly" | "daily"
    restrictions: #OneOfTimeRestrictions | {} # won't work, naturally, because nothing is "chosen"
}]
(The values of the mutually exclusive fields are actually additional, more complex structs, not strings, in case that might matter, but for the sake of a shorter example I've omitted them.)
However, I'm trying to vet YAML instead.
The problem is that when defining this:
#OneOfTimeRestrictions:
rotations: [{
    type: *"weekly" | "daily"
    restrictions: {time_of_day: [...string]} | { weekday_time_of_day: {...string} }
}]
Both fields are accepted, even when they are given at the same time.
Pointers?
I am doing a scan on a DynamoDB table. The strange thing is that it returns zero items on the first call, and then more than zero items when I provide the last evaluated key to the ExclusiveStartKey parameter.
let lastEvaluatedKey;
do {
  const { Items, LastEvaluatedKey } = await documentClient.scan({
    TableName: "myTable",
    FilterExpression: "begins_with(pk, :prefix)",
    ExpressionAttributeValues: {
      ":prefix": "something#",
    },
    ExpressionAttributeNames: {
      "#type": "type",
    },
    ProjectionExpression: "pk",
    Limit: 10,
    ExclusiveStartKey: lastEvaluatedKey,
  });
  console.log(`Scanned and found ${Items?.length ?? 0} items`);
  lastEvaluatedKey = LastEvaluatedKey;
} while (lastEvaluatedKey);
Output from above is
Scanned and found 2 items.
I am 100% certain that no rows were inserted between the calls. The pattern is consistent but with different numbers of calls with 0 items. I would have expected the 2 items to be returned on the first call. What is going on?
You have some number of partitions. Some partitions are empty or don’t have items that match your filter. Your scan happens to start with an empty/unmatching partition. Single calls don’t cross partition boundaries, so your first retrieval reads from the empty/unmatching partition then your next call reads from the next partition.
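Also keep in mind that Limit caps the number of items evaluated per call, and the FilterExpression is applied after that, so a call can read 10 items and return none of them. Either way, the practical rule is to keep paging until LastEvaluatedKey is absent and to treat an empty page as normal. Here is a minimal sketch of such a loop; scanPage is a hypothetical stand-in for the real scan call so the example runs without the AWS SDK.

```javascript
// Collect every matching item across pages. `scanPage(params)` is a hypothetical
// stand-in for the real scan call and must resolve to { Items, LastEvaluatedKey }.
async function scanAllMatches(scanPage, params) {
  const matches = [];
  let emptyPages = 0;
  let lastEvaluatedKey;
  do {
    const { Items = [], LastEvaluatedKey } = await scanPage({
      ...params,
      ExclusiveStartKey: lastEvaluatedKey,
    });
    if (Items.length === 0) emptyPages += 1; // an empty page is not the end of the scan
    matches.push(...Items);
    lastEvaluatedKey = LastEvaluatedKey;
  } while (lastEvaluatedKey);
  return { matches, emptyPages };
}
```

The emptyPages counter is only there to make the behavior visible; the loop itself depends solely on LastEvaluatedKey.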
I am learning Firebase after many years of using SQL RDBMSs. This is definitely a challenge.
Say I have a collection of objects. Each object can belong to any number of categories. Categories have user-editable labels (e.g. the user may rename a label after the fact).
SQL RDBMS
So, in RDBMS I would have:
Object table -> { object_id, ... }
Category table -> { category_id, label, ... }
ObjectCategory -> { object_id, category_id }
I see the following options to implement this in Firebase:
1. Objects collection with category label arrays in objects:
/user/objects -> [{ object_id, categories: [ 'category_label1', 'category_label2' ] }, ... ]
Seems yucky. Renaming/deleting a category will mean updating all the objects.
2. Objects referring categories by id
/user/objects -> [{ object_id, categories: [ 'category_id1', 'category_id2' ] }, ... ]
/user/categories -> [{category_id, label, is_deleted: false}, ...]
This seems more reasonable and maintainable. Except sometimes (I think pretty rarely) there will be 2 queries.
3. Collection of object and object categories
/user/objects -> [{object_id1, ...}, {object_id2, ...}]
/user/object_id1/labels -> [{categorylabel1}, {categorylabel2}]
This is largely comparable to option 1 but requires less churn on object documents and makes updates smaller. Renaming/deleting a category becomes a pain.
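To make option 2 concrete, here is a rough sketch in plain JavaScript data, with no Firebase calls. The labelsFor helper and the sample shapes are made up for illustration; the point is only that a rename touches a single category record.

```javascript
// Option 2 as plain data: objects store category ids, labels live in one place.
// `labelsFor` is a hypothetical helper, not a Firebase API.
function labelsFor(object, categoriesById) {
  return object.categories
    .map((id) => categoriesById[id])
    .filter((category) => category && !category.is_deleted)
    .map((category) => category.label);
}

const categoriesById = {
  category_id1: { label: "Work", is_deleted: false },
  category_id2: { label: "Home", is_deleted: false },
};
const object1 = { object_id: "object_id1", categories: ["category_id1", "category_id2"] };

// Renaming a category is a single write; the object itself is untouched.
categoriesById.category_id1.label = "Office";
const labelsAfterRename = labelsFor(object1, categoriesById);
```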
So, what is the recommended approach?
Let's say I have this schema:
source_id -> String, HashKey
created_at -> String, RangeKey
required_capabilities -> StringSet
required_capabilities is a Set of Strings that we need to provide in the query in order to be able to retrieve a particular element.
For example:
If I have these three items:
{
  "source_id": "1",
  "created_at": "2021-01-18T10:53:25Z",
  "required_capabilities": ["Cap1", "Cap2", "Cap3"]
},
{
  "source_id": "1",
  "created_at": "2021-01-18T10:59:31Z",
  "required_capabilities": ["Cap1", "Cap3"]
},
{
  "source_id": "1",
  "created_at": "2021-01-18T11:05:15Z"
}
I want to create a query, filtering for example source_id = "1" and providing a FilterExpression with the required_capabilities = ["Cap1", "Cap3", "Cap4"].
And I would expect as a result:
{
  "source_id": "1",
  "created_at": "2021-01-18T10:59:31Z",
  "required_capabilities": ["Cap1", "Cap3"] // Since I've provided "Cap1", "Cap3" and "Cap4"
},
{
  "source_id": "1",
  "created_at": "2021-01-18T11:05:15Z" // Since it doesn't require any capability.
}
I've tried the IN operator as follows, since the stored StringSet should be IN (or Contained by) the given SS, but it didn't work.
aws dynamodb query --table-name TableName --key-condition-expression "source_id = :id" --filter-expression "required_capabilities IN (:rq)" --expression-attribute-values '{":id": {"S": "1"}, ":rq": { "SS": ["Cap1", "Cap3", "Cap4"] }}'
It works only when I provide the exact same StringSet, but if I provide a set that contains the saved one and also has more values, it doesn't return anything.
It seems your issue is around the use of the IN keyword, which does not work with sets. From the docs on conditionals:
IN : Checks for matching elements in a list.
AttributeValueList can contain one or more AttributeValue elements of type String, Number, or Binary. These attributes are compared against an existing attribute of an item. If any elements of the input are equal to the item attribute, the expression evaluates to true.
I believe you want the CONTAINS keyword:
CONTAINS : Checks for a subsequence, or value in a set.
AttributeValueList can contain only one AttributeValue element of type String, Number, or Binary (not a set type). If the target attribute of the comparison is of type String, then the operator checks for a substring match. If the target attribute of the comparison is of type Binary, then the operator looks for a subsequence of the target that matches the input. If the target attribute of the comparison is a set ("SS", "NS", or "BS"), then the operator evaluates to true if it finds an exact match with any member of the set. CONTAINS is supported for lists: When evaluating "a CONTAINS b", "a" can be a list; however, "b" cannot be a set, a map, or a list.
Actually, I found out that DynamoDB doesn't support the use case I needed, so I found a workaround.
Basically, instead of modelling required_capabilities as a StringSet, I've created a field called required_capability that contains a single required capability (which is OK for me so far), and I use the IN operator to check it.
If in the future I need to check for more than one capability, I just need to add new fields required_capability_2 and required_capability_3.
It's clearly not ideal, but I guess it's good enough, considering I won't have a lot of required capabilities in a single record, it's usually one, maybe two.
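Another option, if the number of items per source_id is small, would be to keep the StringSet, query by source_id only, and do the subset check in application code. A minimal sketch of that check, assuming required_capabilities has already been unmarshalled to an array or a native Set (isUnlocked is a made-up helper name):

```javascript
// Keep an item only if every capability it requires was provided.
// Assumes required_capabilities is an array or a native Set, or is absent.
function isUnlocked(item, providedCapabilities) {
  const provided = new Set(providedCapabilities);
  const required = item.required_capabilities ?? []; // no requirements: always unlocked
  return [...required].every((capability) => provided.has(capability));
}

const items = [
  { source_id: "1", created_at: "2021-01-18T10:53:25Z", required_capabilities: ["Cap1", "Cap2", "Cap3"] },
  { source_id: "1", created_at: "2021-01-18T10:59:31Z", required_capabilities: ["Cap1", "Cap3"] },
  { source_id: "1", created_at: "2021-01-18T11:05:15Z" },
];
const unlocked = items.filter((item) => isUnlocked(item, ["Cap1", "Cap3", "Cap4"]));
```

The trade-off is that the query still reads (and is billed for) every item under the source_id, so this only makes sense for small result sets.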
So I want to scan the table and filter the result by the sum of values stored in 2 fields.
Currently my js code looks like this
var params = {
    TableName : "core-atv-quota-table-dev",
    FilterExpression: "#info.#branch = :branchId AND {{#info.#points / #info.#location}} > :score",
    ExpressionAttributeNames:{
        "#info": "matching_info",
        "#branch": "branch",
        "#points": "points",
        "#location": "location"
    },
    ExpressionAttributeValues: {
        ":branchId": "3",
        ":score": "8"
    },
    ReturnConsumedCapacity: 'TOTAL'
};
docClient.scan(params, function(err, data) {
    if (err) console.error(err); // an error occurred
    else console.log(data); // successful response
});
As you can see, {{#points/#location}} would not work. In this particular case, I can probably compute and store the value on insert and compare to that but in my use case I don't know which fields from matching_info will be used to filter the results.
I am looking for a way to get DynamoDB to grab values from an item, process them (addition, subtraction, etc.) and compare the computed result against a value given in the filter.
Unfortunately, this isn't possible. FilterExpression doesn't allow for arithmetic operations.
The syntax for a filter expression is identical to that of a condition
expression. Filter expressions can use the same comparators,
functions, and logical operators as a condition expression.
FilterExpression Docs
Comparison Operator and Function Reference
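One workaround is to keep only the equality condition (#info.#branch = :branchId) in the FilterExpression and compute the ratio in application code on the returned items. A minimal sketch of that client-side step, using the field names from the question; filterByRatio is a made-up helper, and the sample items are invented for illustration.

```javascript
// Client-side replacement for the unsupported "points / location > :score" filter.
// Field names come from the question; the helper itself is hypothetical.
function filterByRatio(items, branchId, minScore) {
  return items.filter((item) => {
    const info = item.matching_info;
    if (!info || info.branch !== branchId) return false;
    const points = Number(info.points);
    const location = Number(info.location);
    // Skip items with missing or non-numeric values, and avoid dividing by zero.
    if (!Number.isFinite(points) || !Number.isFinite(location) || location === 0) {
      return false;
    }
    return points / location > minScore;
  });
}

const sampleItems = [
  { matching_info: { branch: "3", points: 90, location: 10 } }, // ratio 9
  { matching_info: { branch: "3", points: 40, location: 10 } }, // ratio 4
  { matching_info: { branch: "2", points: 100, location: 1 } }, // other branch
];
const highScorers = filterByRatio(sampleItems, "3", 8);
```

Because the fields to combine are chosen at run time, the arithmetic can be passed in as a function instead of being hard-coded; the scan cost stays the same either way, since the filtering happens after the items are read.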