Clear DocumentDB Collection before inserting documents - azure-cosmosdb

I need to know how to clear a DocumentDB collection before inserting new documents. I am using a Data Factory pipeline activity to fetch data from an on-premises SQL Server and insert it into a DocumentDB collection. The frequency is set to every 2 hours, so when the next cycle runs I want to first clear the existing data in the DocumentDB collection. How do I do that?

The easiest way is to programmatically delete the collection and recreate it with the same name. Our test scripts do this automatically. There is the potential for this to fail due to a subtle race condition, but we've found that adding a half second delay between the delete and recreate avoids this.
Alternatively, you could fetch every document id and then delete the documents one at a time. This is most efficiently done from a stored procedure (sproc) so you don't have to send every id over the wire, but it would still consume RUs and take time.
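As an illustration of the delete-and-recreate approach, here is a minimal sketch using the current Python azure-cosmos SDK; the account endpoint, key, database and container names, and the partition key path are placeholders rather than values from the question:

```python
import time

from azure.cosmos import CosmosClient, PartitionKey

# Placeholder endpoint and key; substitute your own account values.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
database = client.get_database_client("mydb")

# Drop the collection and recreate it with the same name before the next load.
database.delete_container("mycollection")
time.sleep(0.5)  # short delay to avoid the race condition mentioned above
database.create_container(id="mycollection", partition_key=PartitionKey(path="/id"))
```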

Related

How to structure data in Firestore for multiple user entries

This is my first time using a NoSQL database and I'm really struggling to work out how to structure my data.
I have an app that predicts a user's mood, and the user can then select whether that prediction is right or not. So I need to save both the prediction and the actual result. I want to be able to pull the latest result from Firebase and display it in the app.
I understand how I'd do this on an SQL DB and understand how to write an SQL query to get that data back out.
For my Firebase DB I thought of the following structure:
The document name is the user's ID, and it stores multiple arrays keyed by timestamp. But I can't seem to use orderBy on a document (only on a collection), so I'm not sure how to get this data back.
The fact that this seems so difficult leads me to believe I've structured the DB wrong to begin with.
Structure of DB is as follows:
I should add that it all works fine for the USER_TABLE, as it's one document ID with a single entry, so I've no problem retrieving that.
Thanks for your help!
orderBy is an instruction to the database to order documents on the server, before it returns them to your app. To sort the entries stored inside a single document, you will have to do that in your application code after it receives the document(s).
There is in itself nothing wrong with storing these entries in a single document. Just keep in mind that:
A document can be at most 1 MB in size, so make sure this fits your maximum number of entries.
Firestore only ever returns full documents, so you will either get all entries in a document, or none of them.
You won't be able to order or filter the entries inside a single document. If that is a requirement for you, consider storing each entry in its own document in a subcollection. Note that this will increase the number of documents each user reads though, which will increase the cost.
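As a rough sketch of the subcollection approach, using the Python Firestore client; the collection and field names below are made up for illustration:

```python
from google.cloud import firestore

db = firestore.Client()
user_id = "some-user-id"  # placeholder

# Store each prediction as its own document instead of packing them into one.
db.collection("users").document(user_id).collection("predictions").add({
    "predicted_mood": "happy",
    "actual_mood": "calm",
    "timestamp": firestore.SERVER_TIMESTAMP,
})

# Fetch only the latest entry: orderBy works here because it runs on a collection.
latest = (
    db.collection("users")
    .document(user_id)
    .collection("predictions")
    .order_by("timestamp", direction=firestore.Query.DESCENDING)
    .limit(1)
    .stream()
)
for doc in latest:
    print(doc.to_dict())
```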

How to use DynamoDB streams to maintain duplicated data consistency?

From what I understand, one of the use cases of DynamoDB Streams is to maintain/update duplicated data.
Let's say I have a User object, and its name attribute is replicated in many Invoice objects.
When a User edits/updates their name, I will have a Lambda triggered by DynamoDB Streams that then updates all Invoices related to this user with the new name.
There could be thousands of Invoices related to this user, so the update could take a while, especially because I will want to do a rate-limited batch_write so that this operation doesn't throttle my table.
The question is: how can my (web) application know that the Lambda has finished updating? For example, I want to show a loading screen to the client until the duplicated data has been updated, so that they don't see any outdated information in their browser.
Or are there other ways of rapidly updating thousands of duplicated records?
Why not capture the output of the Lambda? You can have the Lambda return a success status once all the updates have been persisted to DynamoDB.
Alternatively, the Invoice can keep a reference to the User object instead of storing the name itself, and fetch the name at the time of generating/printing the invoice.
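To tie back to the first suggestion: one way to let the web application know when the fan-out has finished is for the Lambda to record its progress in a status item the application can poll. A rough sketch with boto3; the table names, the GSI on user_id, and the status-item layout are all assumptions, not part of the original question:

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
invoices = dynamodb.Table("Invoices")        # assumed table name
sync_status = dynamodb.Table("SyncStatus")   # assumed table the web app polls

def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] != "MODIFY":
            continue
        new_image = record["dynamodb"]["NewImage"]
        user_id = new_image["user_id"]["S"]
        new_name = new_image["name"]["S"]

        # Mark the fan-out as in progress so the UI can show a loading screen.
        sync_status.put_item(Item={"user_id": user_id, "state": "UPDATING"})

        # Copy the new name onto every invoice for this user.
        resp = invoices.query(
            IndexName="user_id-index",  # assumed GSI on user_id
            KeyConditionExpression=Key("user_id").eq(user_id),
        )
        for item in resp["Items"]:
            invoices.update_item(
                Key={"invoice_id": item["invoice_id"]},
                UpdateExpression="SET user_name = :n",
                ExpressionAttributeValues={":n": new_name},
            )

        # Mark completion; the web app polls this item instead of waiting on the Lambda.
        sync_status.put_item(Item={"user_id": user_id, "state": "DONE"})
```

The browser can then poll the status item (or be notified over a WebSocket) and hide the loading screen once the state flips to DONE.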

Updating records one by one

I have 5000 records. I am calculating the salary of one user at a time and updating that user's data in the database, so it takes quite a long time to update all 5000 records. I want to calculate every user's salary first and then update the records in the database.
Is there any other way to update the database in a single operation?
It really depends on how you are managing your data access layer and what data you need for the calculation. Do you have all the data you need in one table, or do you need to fetch additional data from other tables for each record?
One way is to retrieve each record, do the calculation in a transaction, and then store the result in the database. This also lets you use an AJAX UI to inform the user about the progress of the calculation. For fetching the data, use SqlDataReader, as it is very optimized and has less overhead than DataSet and DataTable, and it avoids several type casts. You can optimize further by taking advantage of the TPL, or by making it configurable to fetch/update N records at a time. This approach works if you have the IDs of the records. You also need a field on each record to track the calculation progress, so that after a disconnection, a crash, or an iisreset you can resume the calculation instead of rerunning it from the start.
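The answer above is about .NET and SQL Server, but the calculate-everything-first, update-in-one-batch idea can be sketched generically. Here is a rough illustration using Python's built-in sqlite3 module as a stand-in; the table and column names are invented:

```python
import sqlite3

conn = sqlite3.connect("payroll.db")

# 1. Read everything needed for the calculation in one pass.
rows = conn.execute("SELECT id, base_pay, bonus FROM employees").fetchall()

# 2. Calculate all salaries in memory first.
updates = [(base_pay + bonus, emp_id) for emp_id, base_pay, bonus in rows]

# 3. Apply all updates in a single transaction instead of one round trip per record.
with conn:
    conn.executemany("UPDATE employees SET salary = ? WHERE id = ?", updates)

conn.close()
```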

Attaching two in-memory databases

I am collecting data every second and storing it in a ":memory:" database. Inserting data into this database happens inside a transaction.
Every time a request is sent to the server, the server reads data from the first in-memory database, does some calculation, stores the result in a second database, and sends it back to the client. For this I am creating another ":memory:" database to hold the aggregated information from the first one. I cannot use the same database because I need to do some large calculations to get the aggregated result, and this cannot be done inside the transaction (if one calculation takes 5 seconds I would lose the other 4 seconds of data). I also cannot create the table in the same database because I would not be able to write the aggregated data while the original data is being collected and inserted (that happens inside a transaction every second).
-- Sometimes I want to retrieve data from both databases. How can I link these two in-memory databases? Using the ATTACH DATABASE statement I can attach the second database to the first one, but the next time a request comes in, how do I check whether the second database already exists?
-- Suppose I attach the second in-memory database to the first one. Will it lock the second database when we write data to the first one?
-- Is there any other way to store this aggregated data?
As far as I understand your idea, I don't think you need two databases at all. I suspect you are misinterpreting how transactions work in SQL.
If you begin a transaction, other processes are still allowed to read data. And if you are only reading data, you probably don't need a database lock at all.
A possible workflow could look like the following:
Insert some data into the database (use a transaction just for the insertion process).
Perform the heavy calculations on the database (but do not use a transaction, otherwise it will prevent other processes from inserting any data into your database). Even if this step involves really heavy computation, you can still insert and read data from another process, as SELECT statements will not lock your database.
Write the results back to the database (again, using a transaction).
Just make sure that heavy calculations are not performed within a transaction.
If you want a more detailed description of this solution, look at the documentation about the file locking behaviour of sqlite3: http://www.sqlite.org/lockingv3.html
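Here is a minimal sketch of that workflow using Python's sqlite3 module. The table names and the calculation are placeholders, and a file-backed database is assumed (a pure ":memory:" database is private to the connection that created it, so another connection could not read it):

```python
import sqlite3

conn = sqlite3.connect("collector.db")  # placeholder; file-backed so other connections can read it
conn.execute("CREATE TABLE IF NOT EXISTS samples (ts REAL, value REAL)")
conn.execute("CREATE TABLE IF NOT EXISTS aggregates (ts REAL, avg_value REAL)")

def insert_sample(ts, value):
    # Short transaction around the insert only; committed as soon as the block exits.
    with conn:
        conn.execute("INSERT INTO samples (ts, value) VALUES (?, ?)", (ts, value))

def aggregate(since_ts):
    # Heavy calculation outside any write transaction; plain SELECTs do not
    # block inserts coming from other connections.
    rows = conn.execute("SELECT value FROM samples WHERE ts >= ?", (since_ts,)).fetchall()
    avg_value = sum(v for (v,) in rows) / len(rows) if rows else 0.0

    # Another short transaction, only for writing the result back.
    with conn:
        conn.execute("INSERT INTO aggregates (ts, avg_value) VALUES (?, ?)", (since_ts, avg_value))
    return avg_value
```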

MongoDB: automatically write into a capped collection

I need to manage the acquisition of many records per hour, about 1,000,000 records, and every second I need to get the last inserted value for every primary key. It works quite well with sharding. I was thinking of using a capped collection to hold only the last record for every primary key. To do this I make two separate inserts. Is there any way, in MongoDB, to create some kind of trigger that propagates an insert into one collection to another collection?
MongoDB does not have any support for triggers or similar behavior.
The only way to do this is to make it happen in your code. So the code that writes the first entry should also write the second.
People have definitely requested triggers. If they are necessary for your solution, please cast a vote on the feature request.
I disagree that triggers are needed. MongoDB was created to be very fast and to provide only basic functionality; that is the strength of this solution.
I think the best approach here is to implement this trigger logic inside your application, as part of the data access layer.
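As a minimal sketch of doing that in the data access layer with pymongo; the collection names, capped-collection size, and document shape are assumptions for illustration:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
db = client["acquisition"]

# Create the capped collection once, up front.
if "latest" not in db.list_collection_names():
    db.create_collection("latest", capped=True, size=64 * 1024 * 1024)

def record_measurement(doc):
    # Write the full history...
    db.measurements.insert_one(doc)
    # ...and propagate the same write to the capped collection from the same code path.
    # insert_one added an _id to doc; reusing it in a different collection is fine.
    db.latest.insert_one(doc)
```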
