Using Java to stream indexes into Elasticsearch from DynamoDB using a trigger - amazon-dynamodb

I am a novice with DynamoDB and Elasticsearch.
I need to stream data from a DynamoDB table into Elasticsearch via a trigger using Java, i.e. whenever a new record is inserted into the DynamoDB table, the same record has to be indexed in Elasticsearch.
Most of the examples available on the web are either incomplete or implemented in Python/Node.js. An explanation of how to achieve this in Java, or any links/reference articles, would be welcome.
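A common pattern for this is to enable DynamoDB Streams on the table and attach an AWS Lambda trigger that indexes each inserted record into Elasticsearch. Below is a minimal Java sketch of such a handler, assuming the aws-lambda-java-events library (the AttributeValue import varies between library versions) and the Elasticsearch low-level REST client; the endpoint, index name ("my-index"), and attribute names ("id", "payload") are placeholder assumptions, not part of the question.

import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent;
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent.DynamodbStreamRecord;
import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

import java.util.Map;

// Lambda handler attached as a trigger to the table's DynamoDB Stream.
// The stream must be enabled with the NEW_IMAGE (or NEW_AND_OLD_IMAGES) view type.
public class DynamoDbToElasticsearchHandler implements RequestHandler<DynamodbEvent, Void> {

    // Placeholder Elasticsearch endpoint -- replace with your own domain.
    private final RestClient client = RestClient.builder(
            new HttpHost("my-es-domain.example.com", 443, "https")).build();

    @Override
    public Void handleRequest(DynamodbEvent event, Context context) {
        for (DynamodbStreamRecord record : event.getRecords()) {
            // Only handle inserts; add MODIFY here if updates should be re-indexed too.
            if (!"INSERT".equals(record.getEventName())) {
                continue;
            }
            Map<String, AttributeValue> newImage = record.getDynamodb().getNewImage();
            // Assumes a string key attribute "id" and a string attribute "payload".
            String id = newImage.get("id").getS();
            String json = "{\"payload\":\"" + newImage.get("payload").getS() + "\"}";
            try {
                // Indexing under the same id keeps retries and replays idempotent.
                Request request = new Request("PUT", "/my-index/_doc/" + id);
                request.setJsonEntity(json);
                client.performRequest(request);
            } catch (Exception e) {
                context.getLogger().log("Failed to index record " + id + ": " + e.getMessage());
            }
        }
        return null;
    }
}

After deploying the function, enable the stream on the table and create an event source mapping from the stream to the Lambda. If the Elasticsearch domain uses IAM authentication, each request must additionally be signed with AWS credentials.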

Related

Does AWS AppSync have scan operations to scan DynamoDB

I am building a serverless web app with AWS Amplify - GraphQL - DynamoDB. I want to know what exactly a scan operation is in this context. For example, I have a User table, and the queries listUsers and getUser were generated from the Amplify schema. Are they scan operations or queries?
Thank you for your answers in advance, as I could only find the definition of a scan operation, but there are no examples to help me identify one when it comes to GraphQL.
Amplify uses Filter Expressions, which are a type of Query.
You can see this yourself by looking at the .vtl files that amplify generates and uploads to appsync.
They are located here: amplify/#current-cloud-backend/api/[API NAME]/build/resolvers
In that folder you can open one of the Query.list[Model].req.vtl files. Even if you are not familiar with Velocity Template Language, you can still get the idea. You can see that it uses the expression $util.transform.toDynamoDBFilterExpression.
For more info about that util, see the docs for toDynamoDBFilterExpression.

Bulk delete support in Cosmos DB using .NET SDK

Based on the documentation for the Cosmos DB bulk executor (https://learn.microsoft.com/en-us/azure/cosmos-db/bulk-executor-dot-net), there is support for a bulk delete via the bulk executor.
However, the examples for the new bulk support within the .NET SDK (https://devblogs.microsoft.com/cosmosdb/introducing-bulk-support-in-the-net-sdk/) do not explicitly state anything about deletion.
I wanted to understand whether there are any drawbacks to attempting a delete on several documents using the new bulk execution support (here: https://devblogs.microsoft.com/cosmosdb/introducing-bulk-support-in-the-net-sdk/), or whether it is okay to proceed with a pattern similar to the "Create" flow described in the sample.
When Bulk mode is enabled, any point operation (ReadItem, CreateItem, UpsertItem, DeleteItem, ReplaceItem) will benefit from it. Just follow the same pattern of concurrent Tasks, but call DeleteItem instead of CreateItem (you could even mix different operation types).
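The question is about the .NET SDK, but for cross-reference the same bulk pattern exists in the Azure Cosmos Java SDK (azure-cosmos 4.x), where queued operations are grouped into batched service requests for you. A hedged sketch, with placeholder endpoint, key, database/container names, and ids (and assuming id doubles as the partition key):

import com.azure.cosmos.CosmosClient;
import com.azure.cosmos.CosmosClientBuilder;
import com.azure.cosmos.CosmosContainer;
import com.azure.cosmos.models.CosmosBulkOperations;
import com.azure.cosmos.models.CosmosItemOperation;
import com.azure.cosmos.models.PartitionKey;

import java.util.ArrayList;
import java.util.List;

public class BulkDeleteSketch {
    public static void main(String[] args) {
        // Placeholder account endpoint and key.
        CosmosClient client = new CosmosClientBuilder()
                .endpoint("https://my-account.documents.azure.com:443/")
                .key("<account-key>")
                .buildClient();
        CosmosContainer container = client.getDatabase("MyDatabase").getContainer("MyContainer");

        // Queue one delete per (id, partition key) pair -- the analogue of the
        // concurrent-Tasks pattern in the .NET sample, with DeleteItem instead of CreateItem.
        List<CosmosItemOperation> operations = new ArrayList<>();
        for (String id : List.of("doc1", "doc2", "doc3")) {
            operations.add(CosmosBulkOperations.getDeleteItemOperation(id, new PartitionKey(id)));
        }

        // The SDK batches the queued operations into bulk service requests.
        container.executeBulkOperations(operations).forEach(response ->
                System.out.println(response.getResponse().getStatusCode()));
        client.close();
    }
}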

Can Cosmos DB read data from a file blob, CSV, or JSON file at a batch size?

I am currently researching how to read data into Cosmos DB. Our current approach is a .NET Core C# application that uses the Cosmos DB SDK to read all the data from a file blob, CSV, or JSON file and then, in a for loop, pulls each item's information from Cosmos DB one by one to compare/insert/update it. This feels inefficient.
We're curious whether Cosmos DB can read a batch of data (say, 5000 records) from a file blob, CSV, or JSON file and, similar to SQL Server, perform a bulk insert or merge statement directly within Cosmos DB. Basically, the goal is to avoid performing the same operation one by one for each item interacting with Cosmos DB.
I've also researched the BulkExecutor; its BulkUpdate looks like a more straightforward way of directly updating an item without considering whether it should be updated. In my case, for example, if I have 1000 items and only 300 items' properties changed, I just need to update those 300 items without touching the remaining 700. Basically, I need a way for Cosmos DB to compare the data as a collection, not in a loop over each single item; it could either perform the update itself or output a collection that I can use for a later update.
Would a (.NET + SDK) application be able to do that, or could a Cosmos DB stored procedure handle a similar job? Any other Azure tool is welcome as well!
What you are looking for is the Cosmos DB Bulk Executor library.
It is designed to operate on millions of records in bulk and is very efficient.
You can find the .NET documentation here.
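The question and answer concern the .NET BulkExecutor, but the compare-then-bulk-write idea can also be sketched with the Azure Cosmos Java SDK. This is a sketch under stated assumptions: documents are plain JSON nodes, id doubles as the partition key, and since Cosmos DB has no server-side set comparison, the diff happens client-side before the bulk write.

import com.azure.cosmos.CosmosContainer;
import com.azure.cosmos.CosmosException;
import com.azure.cosmos.models.CosmosBulkOperations;
import com.azure.cosmos.models.CosmosItemOperation;
import com.azure.cosmos.models.PartitionKey;
import com.fasterxml.jackson.databind.node.ObjectNode;

import java.util.ArrayList;
import java.util.List;

public class CompareAndBulkUpsert {

    // Upsert only the incoming records whose stored version differs (or is missing),
    // then hand the queued operations to the SDK's bulk API in one go.
    static void sync(CosmosContainer container, List<ObjectNode> incoming) {
        List<CosmosItemOperation> operations = new ArrayList<>();
        for (ObjectNode record : incoming) {
            String id = record.get("id").asText();
            PartitionKey pk = new PartitionKey(id); // assumes id is also the partition key
            ObjectNode current = null;
            try {
                current = container.readItem(id, pk, ObjectNode.class).getItem();
            } catch (CosmosException e) {
                if (e.getStatusCode() != 404) throw e; // 404 means a brand-new document
            }
            if (current == null || !current.equals(record)) {
                operations.add(CosmosBulkOperations.getUpsertItemOperation(record, pk));
            }
        }
        // Only the ~300 changed items out of 1000 end up being written.
        container.executeBulkOperations(operations);
    }
}

Note that the point reads still cost one request per incoming item; if that is too expensive, the comparison can instead be driven by a query over the relevant ids or by the change feed.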

How to delete all data in a partition?

I have a CosmosDB collection with a number of different partitions. I want to delete all of the data in one of the partitions, so I tried to run the command:
db.myCollection.deleteAll({PartitionKey: 'pop-9q'})
where PartitionKey is the field that I partition/shard on. But when I execute this, it returns the not-very-helpful message:
ERROR: An Error has occurred
Why would I be getting this message and how can I either get more details on the cause or find a resolution?
Currently you are unable to perform a bulk delete. Please upvote and comment on this feature request: Add the ability to delete ALL data in a partition.
Additionally, which API are you consuming? For the Gremlin API you could execute something like the following: g.V().drop()
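For completeness, submitting such a traversal from Java can be done with the standard TinkerPop driver (org.apache.tinkerpop:gremlin-driver); the host, key, and graph names below are placeholders, and the has() step scopes the drop to one partition key value rather than the whole graph:

import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;

public class GremlinPartitionDrop {
    public static void main(String[] args) throws Exception {
        // Placeholder Cosmos DB Gremlin endpoint and credentials.
        Cluster cluster = Cluster.build("my-account.gremlin.cosmos.azure.com")
                .port(443)
                .enableSsl(true)
                .credentials("/dbs/myDatabase/colls/myGraph", "<account-key>")
                .create();
        Client client = cluster.connect();
        try {
            // Drop only the vertices in one logical partition; g.V().drop() would drop everything.
            client.submit("g.V().has('PartitionKey', 'pop-9q').drop()").all().get();
        } finally {
            client.close();
            cluster.close();
        }
    }
}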
The Microsoft.Azure.Cosmos SDK has added this ability; it is currently only available as a preview feature (which requires you to opt in via the portal).
See here for more details:
https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/how-to-delete-by-partition-key?tabs=dotnet-example
Sample code included there:
// Get reference to the container
var container = cosmosClient.GetContainer("DatabaseName", "ContainerName");
// Delete by logical partition key
ResponseMessage deleteResponse = await container.DeleteAllItemsByPartitionKeyStreamAsync(new PartitionKey("Contoso"));
if (deleteResponse.IsSuccessStatusCode) {
    Console.WriteLine($"Delete all documents with partition key operation has successfully started");
}
As @Mike said, a "delete all data" feature is not yet supported in the Cosmos DB SQL API or Mongo API. I notice that you have already commented on the feedback item linked above. As a workaround for the SQL API, you can use a bulk-delete stored procedure.
(sample code: https://gist.github.com/deepumi/2a23c5380202bddf0b85e83baf5833be)
For the Mongo API, unfortunately, even stored procedures are not supported. You could create an Azure HTTP-triggered Function to execute bulk-delete code whenever you want, or merge it into your application code.
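Since the Cosmos DB Mongo API speaks the MongoDB wire protocol, the bulk-delete body of such a function can simply use the standard MongoDB Java driver's deleteMany. A minimal sketch (the connection string, database, and collection names are placeholders):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.result.DeleteResult;
import org.bson.Document;

public class PartitionBulkDelete {
    public static void main(String[] args) {
        // Placeholder Cosmos DB Mongo API connection string.
        try (MongoClient client = MongoClients.create(
                "mongodb://my-account:<account-key>@my-account.mongo.cosmos.azure.com:10255/?ssl=true")) {
            MongoCollection<Document> collection =
                    client.getDatabase("myDatabase").getCollection("myCollection");

            // Removes every document whose shard/partition key matches the given value.
            DeleteResult result = collection.deleteMany(Filters.eq("PartitionKey", "pop-9q"));
            System.out.println("Deleted " + result.getDeletedCount() + " documents");
        }
    }
}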

DocumentDB Data Migration Tool - transformDocument procedure with partitions

I am trying to convert data from SQL Server to DocumentDB. I need to create embedded arrays in the DocumentDB document.
I am using the DocumentDB Data Migration Tool, and it describes using transformDocument for a bulk insert stored proc... unfortunately we are using partitioned collections, and they do not support bulk insert.
Am I missing something or is this not currently supported?
The migration tool only supports sequential data import to a partitioned collection. Please follow the sample below to bulk-import data efficiently into a partitioned collection.
https://github.com/Azure/azure-documentdb-dotnet/tree/master/samples/documentdb-benchmark
