How to do collections-not for TDE with Marklogic - collections

I am looking for a way to implement collections-not in a MarkLogic TDE template.
The equivalent CTS query is
cts:not-query(cts:collection-query("archived"))
According to the TDE documentation, only AND and OR combinations of collections are supported. I am looking for a NOT on collections in a TDE schema.

EDIT: I changed the sample from: '.. = "include"..' to '... != "exclude"...' They both work. However, for the context of the question, not-equal makes more sense in an example.
As odd as it seems, the feature you are asking for is not available. However, you can use the context to get the same effect. It is still worth setting a collection or collection scope on the template, so that fewer documents have to be evaluated against the context path.
The approach is to use an XPath predicate with XQuery functions on the context as a filter.
Below is a working sample for Query Console. Please note the ';' in the code, as it is a multi-statement sample.
xquery version "1.0-ml";
let $_ := xdmp:document-insert("/llamas/Jalliue.xml", <llama><name>Jalliue</name></llama>, map:new()=>map:with("collections", ("llama", "include")))
let $_ := xdmp:document-insert("/llamas/Sven.xml", <llama><name>Sven</name></llama>, map:new()=>map:with("collections", ("llama", "exclude")))
return ();
let $docs := (fn:doc("/llamas/Jalliue.xml"), fn:doc("/llamas/Sven.xml"))
let $template :=
  <template xmlns="http://marklogic.com/xdmp/tde">
    <description>llama list</description>
    <context>/llama[xdmp:node-collections(.) != "exclude"]</context>
    <rows>
      <row>
        <schema-name>llama</schema-name>
        <view-name>list</view-name>
        <columns>
          <column>
            <name>name</name>
            <scalar-type>string</scalar-type>
            <val>name</val>
          </column>
        </columns>
      </row>
    </rows>
  </template>
return tde:node-data-extract($docs, $template)
The result shows that both documents were considered, but only the one with the collection "include" is parsed.
{
"/llamas/Jalliue.xml": [
{
"row": {
"schema": "llama",
"view": "list",
"data": {
"rownum": "1",
"name": "Jalliue"
}
}
}
],
"/llamas/Sven.xml": []
}
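One thing worth checking before relying on the `!=` form: in standard XPath, a general comparison against a sequence is existential, so `("llama", "exclude") != "exclude"` is true as soon as any one collection differs from "exclude". The Python sketch below only models those semantics; it does not call MarkLogic, and the helper names are made up for illustration. If your documents carry more than one collection, it is worth verifying the extract on your own server, and `fn:not(xdmp:node-collections(.) = "exclude")` may be the safer predicate, assuming `fn:not` is accepted in your TDE context expression.

```python
def general_ne(sequence, value):
    """Model of XPath `seq != value`: true if ANY item differs from value."""
    return any(item != value for item in sequence)


def not_general_eq(sequence, value):
    """Model of XPath `fn:not(seq = value)`: true only if NO item equals value."""
    return not any(item == value for item in sequence)


# Collections of the two sample documents from the answer above.
sven = ["llama", "exclude"]
jalliue = ["llama", "include"]

print(general_ne(sven, "exclude"))         # True: "llama" differs from "exclude"
print(not_general_eq(sven, "exclude"))     # False: one collection is "exclude"
print(not_general_eq(jalliue, "exclude"))  # True: no collection is "exclude"
```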

Does Boto3 DynamoDB have reserved attribute names for update_item with conditions expressions? Unexpected attribute SET behavior

I've implemented a simple object versioning scheme that allows the calling code to supply a current_version integer that will set the ConditionExpression. I've also implemented a simple timestamping scheme to set an attribute named auto_timestamp to the current unix timestamp.
When the ConditionExpression is supplied with the object's current version integer, the update occurs, but also sets auto_timestamp to the current version value, rather than the value supplied in ExpressionAttributeValues. This only occurs if the attribute names are #a0, #a1 ... and values are :v0, :v1 ...
For example, this runs as expected without the condition, and auto_timestamp is set to 1643476414 in the table. The if_not_exists is used to start the object version at 0 if the item does not yet exist or did not previously have an auto_object_version attribute.
update_kwargs = {
    "Key": {"user_id": user_id},
    "UpdateExpression": 'SET #a0 = :v0, #a1 = if_not_exists(#a1, :zero) + :v1',
    "ExpressionAttributeNames": {"#a0": "auto_timestamp", "#a1": "auto_object_version"},
    "ExpressionAttributeValues": {":v0": 1643476414, ":v1": 1, ":zero": 0}
}
table.update_item(**update_kwargs)
However, this example runs without raising an exception, but auto_timestamp is set to 1. This behavior continues for each subsequent increment of current_version on additional calls to update_item:
from boto3.dynamodb.conditions import Attr
update_kwargs = {
    "Key": {"user_id": user_id},
    "UpdateExpression": 'SET #a0 = :v0, #a1 = if_not_exists(#a1, :zero) + :v1',
    "ExpressionAttributeNames": {"#a0": "auto_timestamp", "#a1": "auto_object_version"},
    "ExpressionAttributeValues": {":v0": 1643476414, ":v1": 1, ":zero": 0},
    "ConditionExpression": Attr("auto_object_version").eq(1)
}
table.update_item(**update_kwargs)
While debugging, I changed the scheme by which I am labeling the attribute names and values to use #att instead of #a and :val instead of :v and the following works as desired and auto_timestamp is set to 1643476414:
from boto3.dynamodb.conditions import Attr
update_kwargs = {
    "Key": {"user_id": user_id},
    "UpdateExpression": 'SET #att0 = :val0, #att1 = if_not_exists(#att1, :zero) + :val1',
    "ExpressionAttributeNames": {"#att0": "auto_timestamp", "#att1": "auto_object_version"},
    "ExpressionAttributeValues": {":val0": 1643476414, ":val1": 1, ":zero": 0},
    "ConditionExpression": Attr("auto_object_version").eq(1)
}
table.update_item(**update_kwargs)
I couldn't find any documentation on reserved attribute names or values that shouldn't be used for keys in ExpressionAttributeNames or ExpressionAttributeValues.
Has anyone seen this behavior before? It is easily worked around by switching the string formatting used to generate the keys, but it was very unexpected.
There are no reserved attribute or value names, and I routinely use names like :v1 and #a1 in my own tests, and they seem to work fine.
Assuming you correctly copied-pasted your code into the question, it seems to me you simply have a syntax error in your code - you are missing a double-quote after the "auto_timestamp. What I don't understand, though, is how this compiles or why changing a to att changed anything. Please be more careful in pasting a self-contained code snippet that works or doesn't work.
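A possible explanation for the behaviour in the question, offered here as an assumption rather than something confirmed in this thread: when `ConditionExpression` is given as an `Attr(...)` object, boto3 builds the expression string itself and generates its own placeholders (to my understanding `#n0`, `#n1`, ... for names and `:v0`, `:v1`, ... for values), then merges them into the `ExpressionAttributeNames` and `ExpressionAttributeValues` you supplied. A hand-written `:v0` would then be overwritten by the generated one. The sketch below models that merge with plain dictionaries; `merge_generated_values` is a hypothetical helper, not a boto3 function.

```python
def merge_generated_values(user_values, generated_values):
    """Model of merging auto-generated placeholders into user-supplied ones.

    Assumption: generated entries win on a key clash, as dict.update does.
    """
    merged = dict(user_values)
    merged.update(generated_values)
    return merged


# Values written by hand for the UpdateExpression in the question.
user_values = {":v0": 1643476414, ":v1": 1, ":zero": 0}
# What a builder might generate for Attr("auto_object_version").eq(1).
generated = {":v0": 1}

clash = merge_generated_values(user_values, generated)
print(clash[":v0"])  # 1: the timestamp has been replaced by the version

# Renaming the hand-written placeholders avoids the clash.
safe_values = {":val0": 1643476414, ":val1": 1, ":zero": 0}
safe = merge_generated_values(safe_values, generated)
print(safe[":val0"])  # 1643476414
```

If this is indeed the cause, either keep hand-written placeholders out of the `:vN` and `#nN` namespaces, or pass the condition as a plain string that uses your own placeholders.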

DynamoDB transactional insert with multiple conditions (PK/SK attribute_not_exists and SK attribute_exists)

I have a table with PK (String) and SK (Integer) - e.g.
PK_id SK_version Data
-------------------------------------------------------
c3d4cfc8-8985-4e5... 1 First version
c3d4cfc8-8985-4e5... 2 Second version
I can do a conditional insert to ensure we don't overwrite the PK/SK pair using ConditionExpression (in the Go SDK):
putWriteItem := dynamodb.Put{
    TableName:           aws.String("example_table"),
    Item:                itemMap,
    ConditionExpression: aws.String("attribute_not_exists(PK_id) AND attribute_not_exists(SK_version)"),
}
However I would also like to ensure that the SK_version is always consecutive but don't know how to write the expression. In pseudo-code this is:
putWriteItem := dynamodb.Put{
TableName: "example_table",
Item: itemMap,
ConditionExpression: aws.String("attribute_not_exists(PK_id) AND attribute_not_exists(SK_version) **AND attribute_exists(SK_version = :SK_prev_version)**"),
}
Can someone advise how I can write this?
In SQL I'd do something like:
INSERT INTO example_table (PK_id, SK_version, Data)
SELECT {pk}, {sk}, {data}
WHERE NOT EXISTS (
SELECT 1
FROM example_table
WHERE PK_id = {pk}
AND SK_version = {sk}
)
AND EXISTS (
SELECT 1
FROM example_table
WHERE PK_id = {pk}
AND SK_version = {sk} - 1
)
Thanks
A condition check applies to a single item; it cannot span multiple items. In other words, you need multiple condition checks. DynamoDB has a TransactWriteItems API that performs multiple condition checks along with writes and deletes. The code below is in Node.js.
const previousVersionCheck = {
  TableName: 'example_table',
  Key: {
    PK_id: 'prev_pk_id',
    SK_version: 'prev_sk_version'
  },
  ConditionExpression: 'attribute_exists(PK_id)'
}
const newVersionPut = {
  TableName: 'example_table',
  Item: {
    // your item data
  },
  ConditionExpression: 'attribute_not_exists(PK_id)'
}
await documentClient.transactWrite({
  TransactItems: [
    { ConditionCheck: previousVersionCheck },
    { Put: newVersionPut }
  ]
}).promise()
The transaction has two operations: one validates that the previous version exists, and the other is a conditional write. If either condition check fails, the whole transaction fails.
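For readers on Python, here is a rough sketch of the same two-operation transaction as a request payload for the low-level client. It only builds the `TransactItems` list and does not talk to AWS; `build_versioned_put` is a hypothetical helper, the attribute names come from the question's table, and the string-typed `Data` attribute is an assumption.

```python
def build_versioned_put(table_name, pk, new_version, data):
    """Build TransactItems that insert version N only if version N-1 exists."""
    previous_key = {
        "PK_id": {"S": pk},
        "SK_version": {"N": str(new_version - 1)},
    }
    new_item = {
        "PK_id": {"S": pk},
        "SK_version": {"N": str(new_version)},
        "Data": {"S": data},  # assumed to be a string attribute
    }
    return [
        {
            # Fails the transaction if the previous version is missing.
            "ConditionCheck": {
                "TableName": table_name,
                "Key": previous_key,
                "ConditionExpression": "attribute_exists(PK_id)",
            }
        },
        {
            # Fails the transaction if the new version already exists.
            "Put": {
                "TableName": table_name,
                "Item": new_item,
                "ConditionExpression": "attribute_not_exists(PK_id)",
            }
        },
    ]


items = build_versioned_put("example_table", "example-pk", 3, "Third version")
print(items[0]["ConditionCheck"]["Key"]["SK_version"])  # {'N': '2'}

# With a real client (not executed here):
#   boto3.client("dynamodb").transact_write_items(TransactItems=items)
```

The very first version has no predecessor, so it would need a plain conditional put without the condition check.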
You are hitting your head on some of the differences between a SQL and a NoSQL database. DynamoDB is, of course, a NoSQL database. It does not, out of the box, support optimistic locking. I see two straightforward options:
Use a software layer to give you locking on your DynamoDB table. This may or may not be feasible depending on how often updates are made to your table. How fast 'versions' are generated and the maximum time your application can be gated on the lock will likely tell you if this can work for you. I am not familiar with Go, but the Java API supports this. Again, this isn't a built-in feature of DynamoDB. If there is no such Go API equivalent, you could use the technique described in the link to 'lock' the table for updates. Generally speaking, locking a NoSQL database isn't a typical pattern, as it isn't what it was created to do (part of which is achieving large scale on unstructured documents to allow fast access to many consumers at once).
Stop using an incrementor to guarantee uniqueness. Typically, incrementors are frowned upon in DynamoDB, in part due to the lack of intrinsic support for them and in part because, given how DynamoDB shards, you don't want a lot of similarity between records. Using a UUID will solve the uniqueness problem, but if you are porting an existing application, that means more changes to the elements that create that ID and updates to reading the ID (perhaps to include a creation-time field so you can tell which is the newest, or prepending or appending an epoch time to the UUID to do the same). Here is a pertinent link to a SO question explaining why to use UUIDs instead of incrementing integers.
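As a rough illustration of the "prepend an epoch time to the UUID" idea from the second option, here is a minimal Python sketch. The id layout (13-digit millisecond timestamp, a dash, then a random UUID) and the name `time_prefixed_id` are assumptions made for this example, not an established convention.

```python
import time
import uuid


def time_prefixed_id(now_ms=None):
    """Return '<13-digit epoch millis>-<uuid4>' so ids sort by creation time."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # Zero-padding keeps plain string comparison consistent with time order.
    return f"{now_ms:013d}-{uuid.uuid4()}"


older = time_prefixed_id(1643476414000)
newer = time_prefixed_id(1643476415000)
print(older < newer)  # True
```

Ids created within the same millisecond still differ because of the random UUID part, but their relative order is then arbitrary.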
Based on Hung Tran's answer, here is a Go example:
// id and prevVer are assumed to be *string; data is a map[string]*dynamodb.AttributeValue.
checkItem := dynamodb.TransactWriteItem{
    ConditionCheck: &dynamodb.ConditionCheck{
        TableName:           aws.String("example_table"),
        ConditionExpression: aws.String("attribute_exists(pk_id) AND attribute_exists(version)"),
        Key:                 map[string]*dynamodb.AttributeValue{"pk_id": {S: id}, "version": {N: prevVer}},
    },
}
putItem := dynamodb.TransactWriteItem{
    Put: &dynamodb.Put{
        TableName:           aws.String("example_table"),
        ConditionExpression: aws.String("attribute_not_exists(pk_id) AND attribute_not_exists(version)"),
        Item:                data,
    },
}
writeItems := []*dynamodb.TransactWriteItem{&checkItem, &putItem}
// The call returns an error if either condition does not hold, so do not discard it.
if _, err := db.TransactWriteItems(&dynamodb.TransactWriteItemsInput{TransactItems: writeItems}); err != nil {
    log.Printf("versioned insert failed: %v", err)
}

CosmosDB SQL query syntax for if statement

I'm trying to find the correct syntax for an If/Case type of statement in an Azure Cosmos DB SQL query. Here is the document that I have:
{
"CurrentStage": "Stage2",
"Stage1": {
"Title": "Stage 1"
},
"Stage2": {
"Title": "Stage 2"
},
"Stage3": {
"Title": "Stage 3"
}
}
What I want to do is create a query that looks something like
Select c.CurrentStage,
if (CurrentStage == 'Stage1') { c.Stage1.Title }
else if (CurrentStage == 'Stage2') { c.Stage2.Title }
else if (CurrentStage == 'Stage3') { c.Stage3.Title } as Title
From c
Obviously the document and query that I have are a lot more complicated than this, but this gives you the general idea of what I'm trying to do. One of the fields in the select varies based on other fields in the document.
While the UDF suggested by Jay Gong may be more convenient if you need to reuse this logic a lot, you can do this without a UDF by using the ternary operator.
For example:
select
    c.CurrentStage = 'Stage1' ? c.Stage1.Title
    : c.CurrentStage = 'Stage2' ? c.Stage2.Title
    : c.CurrentStage = 'Stage3' ? c.Stage3.Title
    : 'your default value should you wish one'
    as title
from c
Advice: the plain SQL solution has the benefit over a UDF that it is self-contained and does not require setting up the logic on the server before executing. Also note that logic versioning is simpler if the logic is stored entirely in the client apps, not shared across client and server as in the UDF case. UDFs do have their uses (e.g. heavy reuse across queries), but usually it is better to do without.
I suggest using a user-defined function (UDF) in Cosmos DB.
UDF code:
function stage(c){
    switch(c.CurrentStage){
        case "Stage1" : return c.Stage1.Title;
        case "Stage2" : return c.Stage2.Title;
        case "Stage3" : return c.Stage3.Title;
        default: return "";
    }
}
SQL :
Select c.CurrentStage,
udf.stage(c) as Title
From c
Hope it helps you.
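For the sample document in the question, either approach should return the title of the stage named by CurrentStage. The Python sketch below only models that selection logic so the expected row is easy to see; it does not run against Cosmos DB, and `select_title` is a name invented for this illustration. Note that the stage names are compared case-sensitively, so 'stage2' would not match "Stage2".

```python
def select_title(doc, default=""):
    """Pick the Title of whichever stage CurrentStage names (case-sensitive)."""
    stage = doc.get("CurrentStage")
    if stage in ("Stage1", "Stage2", "Stage3"):
        return doc.get(stage, {}).get("Title", default)
    return default


# The sample document from the question.
doc = {
    "CurrentStage": "Stage2",
    "Stage1": {"Title": "Stage 1"},
    "Stage2": {"Title": "Stage 2"},
    "Stage3": {"Title": "Stage 3"},
}
row = {"CurrentStage": doc["CurrentStage"], "Title": select_title(doc)}
print(row)  # {'CurrentStage': 'Stage2', 'Title': 'Stage 2'}
```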

How to handle inner Json when using JsonOutputter

I'm converting some csv files into Json using the JsonOutputter. In the csv files I have a field containing Json like this (pipe character is delimiter):
...|{ "type":"Point", "coordinates":[ 18.7726, 74.5091 ] }|...
When it's output to Json, the result looks like this:
"Location": "{ \"type\":\"Point\", \"coordinates\":[ 18.7726, 74.5091 ] }"
I would like to get rid of the outer quotes to make the Json look like this:
"Location": { "type":"Point", "coordinates":[ 18.7726, 74.5091 ] }
What is the best way to accomplish this? The output Json will be stored in Cosmos DB, so I guess the "cleaning up" of the Json could be done either in U-SQL or in Cosmos DB?
The sample outputter only generates flat JSON. Since we do not have a JSON data type, any string value has to be escaped so that it remains a string value.
You can write your own custom outputter that, for example, takes SqlMap instances for nested values and outputs them as nested JSON, or, if you know that some strings in the rowsets are really JSON and not just strings, serializes them without the quotes.
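To make the "serialize them without the quotes" idea concrete, here is a small Python sketch of the transformation itself, outside U-SQL. It assumes you know which columns hold JSON and that the delimiter never appears inside a value; `rows_to_json` and the sample rows are made up for illustration.

```python
import json


def rows_to_json(lines, json_columns, delimiter="|"):
    """Convert delimited text lines to a JSON array, parsing the named
    columns as nested JSON instead of emitting them as escaped strings."""
    header = lines[0].split(delimiter)
    rows = []
    for line in lines[1:]:
        record = dict(zip(header, line.split(delimiter)))
        for column in json_columns:
            # Parse the embedded JSON so it is written out as an object.
            record[column] = json.loads(record[column])
        rows.append(record)
    return json.dumps(rows)


sample = [
    "number|Location",
    '1|{ "type":"Point", "coordinates":[ 18.7726, 74.5091 ] }',
]
print(rows_to_json(sample, json_columns=["Location"]))
```

Columns not listed in `json_columns` stay plain strings, which matches the flat behaviour described above.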
If JsonOutputter is not the only option, you could convert the CSV file to JSON with custom code.
I tested it with the following CSV file.
number|Location
1|{ "type":"Point", "coordinates":[ 13.7726, 73.5091 ] }
2|{ "type":"Point", "coordinates":[ 14.7726, 74.5091 ] }
Please try the following code; it works correctly on my side.
var lines = File.ReadAllText(@"C:\Tom\tomtest.csv").Replace("\r", "").Split('\n');
var csv = lines.Select(l => l.Split('|')).ToList();
var headers = csv[0];
var dicts = csv.Skip(1).Select(row => headers.Zip(row, Tuple.Create).ToDictionary(p => p.Item1, p => p.Item2)).ToArray().Select(x=>new
{
number = x["number"],
location = JObject.Parse(x["Location"])
});
string json = JsonConvert.SerializeObject(dicts);
Console.WriteLine(json);

How do I prevent xdmp:node-delete() from adding whitespace in my xml doc

I am trying to MOVE a node from one xml document to another. Both documents are using the same namespace. I am trying to accomplish this by doing xdmp:node-insert-child() on the first document then xdmp:node-delete() on the second document in a sequence. The problem is that the xdmp:node-delete() is leaving spaces and returns in my xml doc. How can I keep this from happening?
Here is a code example...
let $documentId := 12345
let $newStatus := 123
let $processNode := $PROCESS-DOC//pex:process[@documentId = $documentId]
let $newNode :=
  element { QName($TNS, 'process') } {
    attribute status { $newStatus },
    attribute documentId { $processNode/@documentId }
  }
return
if ($processNode and $newNode) then
(xdmp:node-insert-child($PROCESS-COMPLETE-DOC/pex:processes, $newNode),xdmp:node-delete($processNode))
else ()
It sounds like the whitespace is held in text nodes on either side of the node you are deleting. You could verify this by inspecting xdmp:describe($processNode/preceding-sibling::text()) and xdmp:describe($processNode/following-sibling::text()). And if you like, you could xdmp:node-delete some or all of those text nodes too.
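The same effect is easy to reproduce outside MarkLogic. The Python sketch below uses the standard library's `xml.dom.minidom` on a made-up document to show that the indentation lives in a separate text node next to the element, and that removing that text node together with the element leaves no blank line behind. It is only an analogy for the XQuery case; `delete_with_whitespace` is a name invented for this example.

```python
from xml.dom import minidom


def delete_with_whitespace(node):
    """Remove node and a whitespace-only text node directly before it."""
    parent = node.parentNode
    previous = node.previousSibling
    if (previous is not None
            and previous.nodeType == previous.TEXT_NODE
            and previous.data.strip() == ""):
        parent.removeChild(previous)
    parent.removeChild(node)


xml = '<processes>\n  <process documentId="1"/>\n  <process documentId="2"/>\n</processes>'
doc = minidom.parseString(xml)
first = doc.getElementsByTagName("process")[0]
delete_with_whitespace(first)
print(doc.documentElement.toxml())
```

Removing only the element would leave its neighbouring text node in place, which is where the stray spaces and line breaks come from.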
