best practice for bulk update in document DB - azure-cosmosdb

We have a scenario where we need to repopulate a collection every hour, whenever we receive a data file in a blob from external sources, and at the same time we do not want to impact live users while the collection is being updated.
So we have done the following:
Created two databases, each containing one collection.
Created another collection in a different database (a configuration database) with properties Active and Passive, whose values are Database1 and Database2.
Now, our web job runs whenever it sees a file in the blob. It checks the configuration database to identify which database is active and which is passive, processes the XML file, and updates the collection in the passive database, since that one is not used by the live feed. Once it is done, it swaps the two values so the freshly loaded database becomes the active one.
Our service always checks which database is active and which is passive, fetches the data accordingly, and shows it to the user.
Since the web job has to delete the old data and insert the new data, we wanted to know: is this the best design we could have come up with? Does deleting and inserting the data cost anything? And is there a better way to do bulk deletes and inserts, since we are doing them sequentially now?

Is this the best design we could have come up with?
As David Makogon said, with your solution you need to manage and pay for multiple databases. If possible, you could create the new documents in the same collection and control which documents are active in your program logic, for example as sketched below.
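A minimal sketch of that configuration-document lookup and swap, assuming the azure-cosmos Python SDK; the document id, property names, and database/collection names are all hypothetical:

```python
from azure.cosmos import CosmosClient

# Hypothetical endpoint/key and names; adjust to your account.
client = CosmosClient("https://<account>.documents.azure.com", credential="<key>")
config_container = client.get_database_client("configuration") \
                         .get_container_client("settings")

# Read the pointer document that says which database is live.
config = config_container.read_item(item="active-passive",
                                    partition_key="active-passive")
active, passive = config["active"], config["passive"]

# ... reload the collection in `passive` here ...

# Swap the pointers only after the reload succeeded.
config["active"], config["passive"] = passive, active
config_container.replace_item(item=config["id"], body=config)
```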
Does deleting and inserting the data cost anything?
Each operation/request consumes request units (RUs), which you are charged for. For details on Request Units and DocumentDB pricing, please refer to:
What is a Request Unit
DocumentDB pricing details
Is there a better way to do bulk deletes and inserts, since we are doing them sequentially now?
A stored procedure provides a way to group operations such as inserts and submit them in bulk. You could create the stored procedures and then execute them from your WebJobs function, along the lines of the sketch below.
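For illustration only: the stored procedure body is server-side JavaScript, and you can register and invoke it from Python with the azure-cosmos SDK. All names here are hypothetical, and note that a stored procedure executes within a single partition key:

```python
from azure.cosmos import CosmosClient

# Server-side JavaScript: inserts documents one by one and reports how
# many made it, so the caller can resume if the batch was cut short.
BULK_INSERT = {
    "id": "bulkInsert",
    "body": """
function bulkInsert(docs) {
    var coll = getContext().getCollection();
    var count = 0;
    function next() {
        if (count >= docs.length) {
            getContext().getResponse().setBody(count);
            return;
        }
        var accepted = coll.createDocument(coll.getSelfLink(), docs[count],
            function (err) {
                if (err) throw err;
                count++;
                next();
            });
        // Out of time/RUs: report progress so the caller can retry the rest.
        if (!accepted) getContext().getResponse().setBody(count);
    }
    next();
}
"""}

container = CosmosClient("https://<account>.documents.azure.com",
                         credential="<key>") \
    .get_database_client("passive-db").get_container_client("data")
container.scripts.create_stored_procedure(body=BULK_INSERT)

docs = [{"id": str(i), "value": i} for i in range(100)]  # sample payload
inserted = container.scripts.execute_stored_procedure(
    sproc="bulkInsert", params=[docs], partition_key="<pk-value>")
```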

Related

Scan entire dynamo db and update records based on a condition

We have a business requirement to deprecate certain field values ("**State**"). So we need to scan the entire DB, find these deprecated field values, take the last record for each partition key (there can be multiple records per partition key; the sort key is LastUpdatedTimeepoch), and update that record. Right now the table contains around 600k records. What is the best way to do this without bringing down the DB service in production?
I see this thread could help me
https://stackoverflow.com/questions/36780856/complete-scan-of-dynamodb-with-boto3
But my main concern is:
This is a one-time activity. Since it will take a while, we cannot run it in AWS Lambda, which is limited to 15 minutes. So where can I keep the code running?
Create an EC2 instance, assign it a role with access to DynamoDB, and run the function on the EC2 instance, for example with the sketch below.
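A hedged boto3 sketch of that one-off job, assuming a hypothetical table MyTable with partition key pk and sort key LastUpdatedTimeepoch (in production you would also throttle the loop so the scan does not starve live traffic):

```python
import boto3
from boto3.dynamodb.conditions import Attr, Key

# All names here are hypothetical; adjust to your schema.
table = boto3.resource("dynamodb").Table("MyTable")

def deprecated_partition_keys():
    """Scan the whole table page by page, yielding the partition key of
    every item that still carries a deprecated State value."""
    kwargs = {
        "FilterExpression": Attr("State").eq("OldValue"),
        "ProjectionExpression": "pk",
    }
    while True:
        page = table.scan(**kwargs)
        for item in page["Items"]:
            yield item["pk"]
        if "LastEvaluatedKey" not in page:
            break  # no more pages
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

for pk in set(deprecated_partition_keys()):
    # Fetch only the newest record for this partition key.
    latest = table.query(
        KeyConditionExpression=Key("pk").eq(pk),
        ScanIndexForward=False,  # newest LastUpdatedTimeepoch first
        Limit=1,
    )["Items"][0]
    # "State" is a DynamoDB reserved word, hence the name placeholder.
    table.update_item(
        Key={"pk": pk, "LastUpdatedTimeepoch": latest["LastUpdatedTimeepoch"]},
        UpdateExpression="SET #s = :new",
        ExpressionAttributeNames={"#s": "State"},
        ExpressionAttributeValues={":new": "NewValue"},
    )
```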

Identify deleted records in PDMlink by querying tables

PDMLink records are hard deleted from the backend tables in Windchill.
Users have access to delete the objects they created, hence I need a way to identify the deleted records.
Is there any table in the PDMLink database that provides this information?
Regards
Maha
There is a table called AUDITRECORD in the DB which has the details of all events that happened, including the Delete event. But the information will be available only if you have configured Audit Event Recording.
If you have configured it, you can get the details by using the query below.
select * from auditrecord where eventlabel='Delete';
Refer to the Windchill Help Center for the steps to configure audit event recording.
If you have not done this configuration and still want to fetch data about deleted objects, you can fall back on the Apache web server logs: run a negation query that matches the object identifiers found in the Apache logs against those still remaining in the database (current or a backup). It is a lot more work, but not impossible.

Can we insert data into the Firebase Realtime Database?

One child node of my Firebase Realtime Database has become huge (around 20 GB). I need to purge it and insert last month's data, extracted from a backup, into the Firebase Realtime Database using the Python Admin SDK.
In the documentation, I see the following options:
set - Write or replace data to a defined path, like messages/users/&lt;username&gt;
update - Update some of the keys for a defined path without replacing all of the data
push - Add to a list of data in the database. Every time you push a new node onto a list, your database generates a unique key, like messages/users/&lt;unique-user-id&gt;/&lt;username&gt;
transaction - Use transactions when working with complex data that could be corrupted by concurrent updates
However, I want to add/insert the data from the Firebase backup. It has to be an insert, because the app is used in production and I cannot afford to overwrite data.
Is there any method available to insert/add the data without overwriting existing data?
Any help/support is greatly appreciated.
There is no way to do this in Firebase Realtime Database without reading the current value of each location.
The only operation that allows you to update data based on its existing value is a transaction. A Firebase transaction gives you the (likely) current value at a location, and you then return what the new value should become.
But if the data you're restoring is (largely) the same as the data already in the database, you might be able to use an update() call with sufficiently deep paths, as in the sketch below.
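A sketch of that deep-path idea with the Python Admin SDK (the service account, URL, and keys are all made up). update() writes only the exact paths you pass and leaves their siblings untouched, but each given path is replaced wholesale, so the paths must be deep enough that nothing existing gets clobbered:

```python
import firebase_admin
from firebase_admin import credentials, db

# Hypothetical service account file and database URL.
cred = credentials.Certificate("service-account.json")
firebase_admin.initialize_app(cred, {
    "databaseURL": "https://<your-db>.firebaseio.com"})

# Keys are relative to /messages; only these exact paths are written.
restored = {
    "users/uid-1/2024-01/msg-001": {"text": "restored message"},
    "users/uid-2/2024-01/msg-042": {"text": "another restored message"},
}
db.reference("messages").update(restored)
```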

cosmosdb - archive data older than n years into cold storage

I have researched in several places and could not find any direction on the options for archiving old data from Cosmos DB into cold storage. I see it mentioned that for DynamoDB on AWS you can move data into S3, but I am not sure what the options are for Cosmos DB. I understand there is a time-to-live option, where data is deleted after a certain date, but I am interested in archiving rather than deleting. Any direction would be greatly appreciated. Thanks.
I don't think there is a single-click built-in feature in CosmosDB to achieve that.
Still, as you mentioned you would appreciate any direction, I suggest you consider the DocumentDB Data Migration Tool.
Notes about the Data Migration Tool:
You can specify a query to extract only the cold data (for example, by a creation date stored within the documents).
It supports exporting to various targets (a JSON file, blob storage, a DB, another Cosmos DB collection, etc.).
It compacts the data in the process: it can merge documents into a single array document and zip it.
Once you have the configuration set up, you can script the tool to be triggered automatically using your favorite scheduling tool.
You can easily reverse the source and target to restore the cold data to the active store (or to dev, test, backup, etc.).
To remove the exported data you could use the mentioned TTL feature, but that could cause data loss should your export step fail. I would instead suggest writing and executing a stored procedure that queries and deletes all exported documents in a single call. That stored procedure would not execute automatically, but it could be included in the automation script and executed only if the data was exported successfully first.
See: Azure Cosmos DB server-side programming: Stored procedures, database triggers, and UDFs.
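If you would rather script the export flow yourself instead of (or alongside) the Data Migration Tool, here is a rough sketch with the Python SDKs: query by the built-in _ts timestamp, upload to blob storage, and delete only after the export succeeded. Every name, including the pk partition-key property, is illustrative:

```python
import json
import time

from azure.cosmos import CosmosClient
from azure.storage.blob import BlobServiceClient

CUTOFF = int(time.time()) - 5 * 365 * 24 * 3600  # roughly five years ago

container = CosmosClient("https://<account>.documents.azure.com",
                         credential="<key>") \
    .get_database_client("mydb").get_container_client("mycoll")

# Cosmos DB stamps every document with _ts (last modified, epoch seconds).
old_docs = list(container.query_items(
    query="SELECT * FROM c WHERE c._ts < @cutoff",
    parameters=[{"name": "@cutoff", "value": CUTOFF}],
    enable_cross_partition_query=True))

# Upload first; only delete once the archive is safely in cold storage.
blob = BlobServiceClient.from_connection_string("<connection-string>") \
    .get_blob_client(container="cold-archive", blob=f"archive-{CUTOFF}.json")
blob.upload_blob(json.dumps(old_docs))

for doc in old_docs:
    container.delete_item(doc["id"], partition_key=doc["pk"])
```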
UPDATE:
These days Cosmos DB has added the Change Feed, which really simplifies writing a carbon copy somewhere else.

some basic oracle concepts

Hi:
In our new application we have to use Oracle as the DB, whereas we used MySQL/SQL Server before. When I come to Oracle I am confused by its concepts, for example the tablespace, the object, the schema, table, index, procedure, database link, ... :(
And the schema seems to be tied to the user, which I cannot make sense of.
With MySQL, I just knew that one database contains many tables and has many users, and users have different permissions on different tables.
But in Oracle, everything is different.
Can anyone tell me some basic concepts of Oracle, and point me to some quick-start docs?
Oracle has specific meanings for commonly-used terms, and you're right, it is confusing. I'll build a hierarchy of terms from the bottom up:
Database - In Oracle, the database is the collection of files that make up your overall collection of data. To get a handle on what Oracle means, picture the database management system (dbms) in a non-running state. All those files are your "database."
Instance - When you start the Oracle software, all those files become active, things get loaded into memory, and there's an entity to which you can connect. Many people would use the term "database" to describe a running dbms, but, once everything is up-and-running, Oracle calls it an, "instance."
Tablespace - An abstraction that allows you to think about a chunk of storage without worrying about the physical details. When you create a user, you ask Oracle to put that user's data in a specific tablespace. Oracle manages storage via the tablespace metaphor.
Data file - The physical files that actually store the data. Data files are grouped into tablespaces. If you use all the storage you have allocated to a user, or group of users, you add data files (or make the existing files bigger) to the tablespace they're configured to use.
User - An abstraction that encapsulates the privileges, authentication information, and default storage areas for an account that can log on to an Oracle instance.
Schema - The tables, indices, constraints, triggers, etc. that are owned by a particular user. There is a one-to-one correspondence between users and schemas. The schema has the same name as the user. The difference between the two is that the user concept is all about account information, while the schema concept deals with logical database objects.
This is a very simplified list of terms. There are different states of "running" for an Oracle instance, for example, and it's easy to get into very nuanced discussions of what things mean. Here's a practical exercise that will let you put your hands on these things, and will make the distinctions clearer:
Start an already-created Oracle instance. This step will transform a group of files, or as Oracle would say, a database, into a running Oracle instance.
Create a tablespace with the CREATE TABLESPACE command. You'll have to specify some data files to put into the tablespace, as well as some storage parameters.
Create a user with the CREATE USER command. You'll see that the items you have to specify have to do with passwords, privileges, quotas, and the like. Specify that the user's data be stored in the tablespace you created in step 2.
Connect to Oracle using the credentials of the new user from step 3. Type "SELECT * FROM CAT". Nothing should come back yet: your user has a schema, but it is empty.
Run a CREATE TABLE command. INSERT some data into the table. The schema now contains some objects.
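If you want to drive that exercise from code rather than SQL*Plus, a hedged sketch with the python-oracledb driver follows; every DSN, password, and object name is made up, and the DDL in steps 2-3 requires DBA privileges:

```python
import oracledb

# Step 1 assumed done: the instance is up. Connect as an admin account.
admin = oracledb.connect(user="system", password="adminpw",
                         dsn="localhost/XEPDB1")
cur = admin.cursor()

# Step 2: a tablespace backed by one data file.
cur.execute("""CREATE TABLESPACE demo_ts
               DATAFILE 'demo_ts01.dbf' SIZE 50M""")

# Step 3: a user whose objects default into that tablespace.
cur.execute("""CREATE USER demo IDENTIFIED BY demopw
               DEFAULT TABLESPACE demo_ts QUOTA UNLIMITED ON demo_ts""")
cur.execute("GRANT CREATE SESSION, CREATE TABLE TO demo")

# Steps 4-5: connect as the new user; its schema starts out empty.
user = oracledb.connect(user="demo", password="demopw",
                        dsn="localhost/XEPDB1")
ucur = user.cursor()
ucur.execute("SELECT * FROM CAT")
print(ucur.fetchall())   # [] -- the schema is empty

ucur.execute("CREATE TABLE t (id NUMBER)")
ucur.execute("INSERT INTO t VALUES (1)")
user.commit()
ucur.execute("SELECT * FROM CAT")
print(ucur.fetchall())   # now lists table T
```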
table spaces: these are basically storage definitions. When defining a table or index, etc., you can specify storage options simply by putting your table in a specific tablespace.
table, index, procedure: these are pretty much the same as in other databases.
user, schema: explained well above.
database link: you can join table A in instance A with table B in instance B using a database link between the two instances (while logged in to one of them).
object: has properties (like the columns in a table) and methods that operate on those properties (pretty much as in OO design); these are not widely used.
A few links:
Start page for 11g rel 2 docs http://www.oracle.com/pls/db112/homepage
Database concepts, Table of contents http://download.oracle.com/docs/cd/E11882_01/server.112/e16508/toc.htm
