How to split into Sub Documents with Nesting separator? - azure-cosmosdb

I am migrating data from SQL Server to Cosmos DB using the Azure Cosmos DB Data Migration Tool.
Can someone please help me migrate the data into sub-documents, i.e. how do I specify the nesting separator?
SELECT TOP 5 P.ProjectDocumentId, P.ProjectId, PU.UpdatedByFullName
FROM [Docs].[ProjectDocuments] P
INNER JOIN [Docs].[ProjectDocumentUpdate] PU ON P.ProjectDocumentID = PU.ProjectDocumentID
WHERE P.ProjectDocumentId = '7DA0011B-7105-4B6C-AF13-12B5AC50B608'
Result:
Expected Document in Cosmos DB:
{
    "ProjectDocumentId": "7da0011b-7105-4b6c-af13-12b5ac50b608",
    "ProjectId": "ed1e0e47-ff1c-47be-b5e9-c235aef76161",
    "ProjectDocumentUpdate": [
        { "UpdatedByFullName": "Unnati" },
        { "UpdatedByFullName": "Eugene" },
        { "UpdatedByFullName": "Meghana" }
    ]
}

According to your description, what you need is not simply producing nested JSON in Cosmos DB: you want to produce JSON that contains a JSON array grouped by certain columns, i.e. merge the UpdatedByFullName values that share the same ProjectDocumentId and ProjectId.
Based on my tests and some research in the Migration Tool documentation, it seems the "Import data from SQL Server" feature can't produce a JSON array grouped by columns on its own.
So I figured out a workaround, guided by this case: SQL to JSON - Grouping Results into JSON Array, and this doc. I tested with a simple dbo.student table that has name and age columns.
SQL:
SELECT s1.name,
       'ageArr' = (
           SELECT age AS 'age'
           FROM dbo.student AS s2
           WHERE s2.name = s1.name
           FOR JSON PATH
       )
FROM dbo.student AS s1
GROUP BY s1.name
FOR JSON PATH;
The query produces JSON output along the lines of the sketch below; with that JSON in hand, you can import it into Cosmos DB directly.
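For illustration only, with a few made-up student rows the FOR JSON output would look roughly like this (the names and ages are invented):

[
    { "name": "Jack", "ageArr": [ { "age": 10 }, { "age": 12 } ] },
    { "name": "Rose", "ageArr": [ { "age": 11 } ] }
]

Applied to the tables in the question, an untested sketch of the same pattern (using only the table and column names that appear in your original query) might be:

SELECT P.ProjectDocumentId,
       P.ProjectId,
       'ProjectDocumentUpdate' = (
           SELECT PU.UpdatedByFullName AS 'UpdatedByFullName'
           FROM [Docs].[ProjectDocumentUpdate] PU
           WHERE PU.ProjectDocumentID = P.ProjectDocumentID
           FOR JSON PATH
       )
FROM [Docs].[ProjectDocuments] P
WHERE P.ProjectDocumentId = '7DA0011B-7105-4B6C-AF13-12B5AC50B608'
FOR JSON PATH;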

Related

PartiQL BatchExecuteStatementCommand in DynamoDB SDK v3

I'm attempting to send a batch of PartiQL statements in the NodeJS AWS SDK v3. The statement works fine for a single ExecuteStatementCommand, but the Batch command doesn't.
The statement looks like
const statement = `
SELECT *
FROM "my-table"
WHERE "partitionKey" = '1234'
AND "filterKey" = '5678'
`
This code snippet works as expected:
const result = await dynamodbClient.send(new ExecuteStatementCommand({
    Statement: statement
}))
The batch snippet does not:
const result = await dynamodbClient.send(new BatchExecuteStatementCommand({
    Statements: [
        {
            Statement: statement
        }
    ]
}))
The batch call produces the following error:
"Code": "ValidationError",
"Message": "Select statements within BatchExecuteStatement must specify the primary key in the where clause."
Any insight is greatly appreciated. Thanks for taking the time to read my question!
Seems like what I needed was a rubber duck.
A DynamoDB primary key consists of the partition key plus the sort key. My particular table has a sort key, which is missing from the statement. Batch statements cannot filter the results; each statement must match a single item in the database.
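As a sketch of the fix, the WHERE clause has to pin down the full primary key. Here "sortKey" and its value are placeholders; substitute your table's actual sort key attribute and the value of the item you want:

// "sortKey" is a hypothetical attribute name standing in for the table's real sort key
const statement = `
    SELECT *
    FROM "my-table"
    WHERE "partitionKey" = '1234'
      AND "sortKey" = 'abcd'
`

const result = await dynamodbClient.send(new BatchExecuteStatementCommand({
    Statements: [{ Statement: statement }]
}))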

Data ingestion issue with KQL update policy ; Query schema does not match table schema

I'm writing a function which takes in a raw data table (containing multijson telemetry data) and reformats it into multiple columns. I use .set MyTable <| myfunction | limit 0 to create my target table based on the function, and attach an update policy to populate my target table.
Here is the code :
.set-or-append MyTargetTable <|
    myfunction
    | limit 0

.alter table MyTargetTable policy update
@'[{ "IsEnabled": true, "Source": "raw", "Query": "myfunction()", "IsTransactional": false, "PropagateIngestionProperties": false}]'
But I'm getting ingestion failures. Here is the failure message:
Failed to invoke update policy. Target Table = 'MyTargetTable', Query = '
let raw = __table("raw", 'All', 'AllButRowStore')
| where extent_id() in (guid(659e3b3c-6859-426d-9c37-003623834455));
myfunction()': Query schema does not match table schema
I double-checked the query schema and the target table schema; they are the same. I'm not sure what this error means.
Also, I ran a count on both the raw and the target tables; there is a relatively large discrepancy (about 400 rows in my target table versus 2000 rows in the raw table).
Any advice will be appreciated.
Generally speaking - to find the root of the mismatch between schemas, you can run something along the following lines, and filter for differences:
myfunction
| getschema
| join kind=leftouter (
    table('MyTargetTable')
    | getschema
) on ColumnOrdinal, ColumnType
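To surface only the differences, one option (a sketch) is to keep the rows that found no match; Kusto renames the right-hand side's duplicate columns with a 1 suffix, so an unmatched row has an empty ColumnName1:

| where isempty(ColumnName1)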
In addition, you should make sure the output schema of the function you use in your update policy is 'stable', i.e. not affected by the input data. The output schema of some query plugins, such as pivot() and bag_unpack(), depends on the input data, and it is therefore not recommended to use them in update policies.
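For example, a minimal sketch of a schema-stable transformation projects explicit, typed columns out of the dynamic payload instead of relying on bag_unpack(). The Telemetry column and the field names below are hypothetical placeholders for your raw table's actual layout:

.create-or-alter function myfunction() {
    raw
    // extract explicit, typed columns so the output schema never depends on the input data
    | extend DeviceId  = tostring(Telemetry.deviceId),
             Timestamp = todatetime(Telemetry.timestamp),
             Value     = toreal(Telemetry.value)
    | project DeviceId, Timestamp, Value
}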

Use Rs mongolite to correctly (insert? update?) add data to existing collection

I have the following function written in R that (I think) is doing a poor job of updating my MongoDB collections.
library(mongolite)
con <- mongolite::mongo(collection = "mongo_collection_1", db = 'mydb', url = 'myurl')
myRdataframe1 <- con$find(query = '{}', fields = '{}')
rm(con)
con <- mongolite::mongo(collection = "mongo_collection_2", db = 'mydb', url = 'myurl')
myRdataframe2 <- con$find(query = '{}', fields = '{}')
rm(con)
... code to update my dataframes (rbind additional rows onto each of them) ...
# write dataframes to database
write.dfs.to.mongodb.collections <- function() {
    collections <- c("mongo_collection_1", "mongo_collection_2")
    my.dataframes <- c("myRdataframe1", "myRdataframe2")
    # loop over the dataframes, write the collections
    for (i in 1:length(collections)) {
        # connect and add data to this collection
        con <- mongo(collection = collections[i], db = 'mydb', url = 'myurl')
        con$remove('{}')
        con$insert(get(my.dataframes[i]))
        con$count()
        rm(con)
    }
}
write.dfs.to.mongodb.collections()
My dataframes myRdataframe1 and myRdataframe2 are very large dataframes, currently ~100K rows and ~50 columns. Each time my script runs, it:
- uses con$find('{}') to pull the mongodb collection into R, saved as a dataframe myRdataframe1
- scrapes new data from a data provider that gets appended as new rows to myRdataframe1
- uses con$remove() and con$insert() to fully remove the data in the mongodb collection, and then re-insert the entire myRdataframe1
This last bullet point is iffy, because I run this R script daily in a cronjob and I don't like that each time I am entirely wiping the mongo db collection and re-inserting the R dataframe to the collection.
If I remove the con$remove() line, I receive an error that states I have duplicate _id keys. It appears I cannot simply append using con$insert().
Any thoughts on this are greatly appreciated!
When you attempt to insert documents into MongoDB that already exist in the database (according to their primary key), you will get the duplicate key exception. To work around that, you can drop the _id column before the con$insert. Note that your loop stores the data frame names as strings, so fetch the object with get() first, and that _id has to be wrapped in backticks in R:
df <- get(my.dataframes[i])
df$`_id` <- NULL
Then insert df instead of get(my.dataframes[i]); the newly inserted documents will automatically get a new _id assigned.
Alternatively, you can use an upsert, which matches a document against the query condition: if a match is found it is updated, otherwise a new document is inserted.
First, separate the _id values from each data frame (again fetching the data frame with get() and using backticks around _id):
df <- get(my.dataframes[i])
ids <- df$`_id`
updateData <- df
updateData$`_id` <- NULL
Then call con$update() with upsert = TRUE, building the query and the $set body as JSON strings for each row; see the sketch below (there might be an easier way to handle the string building in R).
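A minimal per-row sketch, assuming jsonlite is available and that each row serializes cleanly to JSON (untested, and likely slow for ~100K rows):

library(jsonlite)

df <- get(my.dataframes[i])
for (j in seq_len(nrow(df))) {
    row <- df[j, ]
    id <- row$`_id`
    row$`_id` <- NULL
    # upsert this row by its original _id
    con$update(
        paste0('{"_id": "', id, '"}'),
        paste0('{"$set": ', toJSON(as.list(row), auto_unbox = TRUE), '}'),
        upsert = TRUE
    )
}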

How to remove collection or edge document using for loop in ArangoDB?

I'm using the latest ArangoDB 3.1 on Windows 10.
I want to remove vertex documents and edge documents using a FOR loop, but I'm getting an error like "document not found (vName)".
vName contains the various collection names, but I don't know how to use it in the FOR loop.
This is the AQL I am using to remove the documents from the graph:
LET op = (
    FOR v, e IN 1..1 ANY 'User/588751454' GRAPH 'my_graph'
        COLLECT vid = v._id, eid = e._id
        RETURN { vid, eid }
)
FOR doc IN op
    COLLECT vName = SPLIT(doc.vid, '/')[0],
            vid   = SPLIT(doc.vid, '/')[1],
            eName = SPLIT(doc.eid, '/')[0],
            eid   = SPLIT(doc.eid, '/')[1]
    REMOVE { _key: vid } IN vName
This is the return output I'm getting from the AQL (Web UI screenshot):
vName is a variable introduced by COLLECT. It is a string with the collection name of a vertex (extracted from vid / v._id). You then try to use it in the removal operation REMOVE { ... } IN vName.
However, AQL does not support dynamic collection names; collection names must be known at query compile time:
Each REMOVE operation is restricted to a single collection, and the collection name must not be dynamic.
Source: https://docs.arangodb.com/3.2/AQL/Operations/Remove.html
So you either have to hardcode the collection into the query, e.g. REMOVE { ... } IN User, or use the special bind parameter syntax for collections, e.g. REMOVE { ... } IN @@coll with bind parameters {"@coll": "User", ...}.
This also means that REMOVE can only delete documents in a single collection.
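For illustration, a sketch of the bind parameter variant (the key value here is just a placeholder):

FOR k IN @keys
    REMOVE { _key: k } IN @@coll

with the bind parameters:

{ "keys": ["588751454"], "@coll": "User" }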
It is possible to work around the limitation somewhat by using subqueries like this:
LET x1 = (FOR doc IN User REMOVE doc IN User)
LET x2 = (FOR doc IN relations REMOVE doc IN relations)
RETURN 1
The variables x1 and x2 are syntactically required and receive an empty array as subquery result. The query also requires a RETURN statement, even if we don't expect any result.
Do not attempt to remove from the same collection twice in the same query though, as it would raise an "access after data-modification" error.

Query DynamoDB with a hash key and a range key with Boto3

I am having trouble using AWS Boto3 to query DynamoDB with a hash key and a range key at the same time, using the recommended KeyConditionExpression. I have attached an example query:
import boto3
from boto3 import dynamodb
from boto3.session import Session

dynamodb_session = Session(aws_access_key_id=AWS_KEY,
                           aws_secret_access_key=AWS_PASS,
                           region_name=DYNAMODB_REGION)
dynamodb = dynamodb_session.resource('dynamodb')
table = dynamodb.Table(TABLE_NAME)

request = {
    'ExpressionAttributeNames': {
        '#n0': 'hash_key',
        '#n1': 'range_key'
    },
    'ExpressionAttributeValues': {
        ':v0': {'S': MY_HASH_KEY},
        ':v1': {'N': GT_RANGE_KEY}
    },
    'KeyConditionExpression': '(#n0 = :v0) AND (#n1 > :v1)',
    'TableName': TABLE_NAME
}
response = table.query(**request)
When I run this against a table with the following schema:
Table Name: TABLE_NAME
Primary Hash Key: hash_key (String)
Primary Range Key: range_key (Number)
I get the following error and I cannot understand why:
ClientError: An error occurred (ValidationException) when calling the Query operation: Invalid KeyConditionExpression: Incorrect operand type for operator or function; operator or function: >, operand type: M
From my understanding, type M is a map or dictionary type, whereas I am using type N, a number type, which matches my table schema for the range key. Could someone explain why this error is happening? I am also open to a different way of accomplishing the same query, even if you cannot explain why this error occurs.
The Boto 3 SDK constructs a Condition Expression for you when you use the Key and Attr functions imported from boto3.dynamodb.conditions:
from boto3.dynamodb.conditions import Key

response = table.query(
    KeyConditionExpression=Key('hash_key').eq(hash_value) & Key('range_key').eq(range_key_value)
)
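For the greater-than condition in the original question, the same pattern applies with gt (a sketch, assuming GT_RANGE_KEY can be converted to a number, since the Table resource takes plain Python values rather than typed dictionaries):

from boto3.dynamodb.conditions import Key

response = table.query(
    KeyConditionExpression=Key('hash_key').eq(MY_HASH_KEY) & Key('range_key').gt(int(GT_RANGE_KEY))
)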
Reference: Step 4: Query and Scan the Data
Hope it helps
Adding this solution as the accepted answer did not address why the query used did not work.
TL;DR: Using query on a Table resource in boto3 has subtle differences compared to using client.query(...) and requires a different syntax.
The syntax is valid for a query on a client, but not on a Table. The ExpressionAttributeValues on a Table do not require you to specify the data type. Also, if you are executing a query on a Table resource, you do not have to specify the TableName again.
Working solution:
from boto3.session import Session

dynamodb_session = Session(aws_access_key_id=AWS_KEY,
                           aws_secret_access_key=AWS_PASS,
                           region_name=DYNAMODB_REGION)
dynamodb = dynamodb_session.resource('dynamodb')
table = dynamodb.Table(TABLE_NAME)

request = {
    'ExpressionAttributeNames': {
        '#n0': 'hash_key',
        '#n1': 'range_key'
    },
    'ExpressionAttributeValues': {
        ':v0': MY_HASH_KEY,
        ':v1': GT_RANGE_KEY
    },
    'KeyConditionExpression': '(#n0 = :v0) AND (#n1 > :v1)',
}
response = table.query(**request)
I am the author of a package called botoful which might be useful to avoid dealing with these complexities. The code using botoful will be as follows:
import boto3
from botoful import Query

client = boto3.Session(
    aws_access_key_id=AWS_KEY,
    aws_secret_access_key=AWS_PASS,
    region_name=DYNAMODB_REGION
).client('dynamodb')

results = (
    Query(TABLE_NAME)
    .key(hash_key=MY_HASH_KEY, range_key__gt=GT_RANGE_KEY)
    .execute(client)
)

print(results.items)
