Archive Data from Cosmos DB - MongoDB API

In the project I am working on, we have a database per tenant, and each tenant consists of at least one department. One of our requirements is that when an admin user deletes a department through the custom frontend we provide, the system must first archive that department's data to blob storage before the data is deleted. The same applies to tenants: we need to archive the data before that tenant's database is removed from the account.
Now, my question: is there any best practice for doing this? We are planning to retrieve all the data from all collections with a Mongo query filtered on the department id (which is also the partition key) and then send it to blob storage. The challenge is executing that query: the amount of data can be huge, and the RUs required for the operation may affect the performance of the system, because other users may be working in it while we remove the data.
I looked at mongodump and mongoexport, but these are standalone applications, so we cannot execute them from our code, can we?
Any ideas? Thanks a lot.

I think one way to solve this is by using the Change Feed, as it really helps and simplifies writing a carbon copy somewhere else.
However, as of now the change feed won't notify you about deleted documents, so you can't listen for deletes; that feature is currently planned.
Your best bet is to write a custom application that does the archiving using the query language support.
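For what it's worth, a minimal sketch of such a custom archiver could look like the following (Java, using the MongoDB driver and the classic azure-storage SDK; the connection strings, database/collection names and the departmentId field are placeholders, not details from the question):

// Sketch only: export one department's documents and upload the snapshot to Azure Blob Storage
// before the delete runs. All names and connection strings below are placeholders.
import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.blob.CloudBlobContainer;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

public class DepartmentArchiver {
    public static void archive(String departmentId) throws Exception {
        MongoClient mongo = MongoClients.create("mongodb://<account>:<key>@<account>.mongo.cosmos.azure.com:10255/?ssl=true");
        MongoCollection<Document> docs = mongo.getDatabase("tenantDb").getCollection("departmentData");

        // departmentId is the partition key, so this stays a single-partition query;
        // a small batch size keeps the RU consumption spread out rather than spiking.
        StringBuilder json = new StringBuilder("[");
        for (Document doc : docs.find(Filters.eq("departmentId", departmentId)).batchSize(100)) {
            if (json.length() > 1) json.append(",");
            json.append(doc.toJson());
        }
        json.append("]");

        // Write the snapshot to blob storage; only after this succeeds should the delete run.
        CloudStorageAccount account = CloudStorageAccount.parse("<blob-connection-string>");
        CloudBlobContainer container = account.createCloudBlobClient().getContainerReference("archives");
        container.getBlockBlobReference("department-" + departmentId + ".json").uploadText(json.toString());

        mongo.close();
    }
}

Running this collection by collection with a modest batch size helps keep the RU draw bounded while other users are still on the system; for very large departments you would stream to the blob instead of buffering the whole snapshot in memory.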

Related

Is it still a good idea to create a Cosmos DB collection without a partition key?

One colleague said that Cosmos DB will stop supporting collections without a partition key, but I can't find any information from Microsoft confirming this statement.
The application I'm working on has a collection of order records. A typical query returns tens of thousands of these records, so if I use order id as the partition key, it will always run cross-partition queries... And the requirement is to get all records across all tenants, so partitioning by tenant id isn't an option either.
I thought it would be fine to just create a collection without a partition key. I'll worry about archiving data later (probably with Azure Functions and the change feed).
Is it a good idea to do so?
Based on the tips shown in the Cosmos DB portal, that warning about non-partitioned collections is confined to the portal only so far.
You can still create a non-partitioned collection by using the SDK:
// Java SDK: create a collection without specifying a partition key (client is an existing DocumentClient)
DocumentCollection collection = new DocumentCollection();
collection.setId("jay");
ResourceResponse<DocumentCollection> createColl = client.createCollection("dbs/db", collection, null);
So, I think your service will not be affected for now. As for future trends, I suggest you keep an eye on Microsoft's official statements. If you have any special needs, you can submit feedback for help.

Combining Firebase Auth with Firebase Realtime Database or Google Cloud Datastore?

I'm planning a web application that requires user auth, plus the ability to display data for the users that is stored in a database. No interaction between the users is needed (yet); however, the users should be able to create objects and query their "own" objects. For example, I list 10 book names (10 book objects), and User A should be able to pick a book and create a new object, call it userNoteObject, that contains the name of the chosen book and a short note (that he/she writes).
In basic pseudocode, one book object would look like this:
bookObj = {"id": 1, "name": "book name"}
And the user's note object would be something like this:
userNoteObject = {
  "id": 1,
  "book_name": "random book name",
  "owner_userid": "a1b2c3d",
  "note": "some random string"
}
With MySQL I would create three tables: one for the users, one for the userNoteObject-s, and another for the bookObj-s. Every time a user saves a note, I would add it to the table that lists the saved notes. Then I can simply query the notes that belong to user X based on the owner_userid. It's quite simple functionality.
After reading about the possibilities, I've decided to go with Firebase Auth (because in the future I might need Android and iOS compatibility) plus either Google Cloud Datastore or the Firebase Realtime Database. However, I'm a little bit wary of the Firebase Realtime Database, since I've never worked with any DB like it. I also like to be able to modify records manually with something like phpMyAdmin, and I assume Cloud Datastore has a visual interface like that.
I'm familiar with JSON handling and creating JSON files, but a JSON-based database is unfamiliar to me at the moment, so I'm thinking the other option might be a better choice. It's very important that I don't need real-time DB features. I would load X number of entries into the table that holds the bookObj-s and sometimes update them. I assume that when the user creates a userNoteObject it would be saved quickly with either option, and after deleting a userNoteObject I could refresh the page close to real time with Datastore. But the table that holds the book objects must be able to store millions of entries easily.
So the important things:
One db table should be able to handle millions of records easily
Easy as possible querying
Visual interface for the DB (if it's possible)
I don't need realtime features like dynamic game score display/saving
Other info:
I would like to use Angular.js
I'm familiar with Python if it can help in something
So my question is: which database would be better for my needs? At the moment I lean towards Datastore, but I'm totally new to these services, so I'm not against the Realtime Database; Datastore just looks more suitable since it has a visual interface. However, I'm also not sure how Datastore would work together with Firebase. If there is a third option that combines both, for example the Realtime Database for the objects saved by users and Datastore for the static objects, I would love to hear about it too. My overall goal is to be able to write to and query the DB as easily and quickly as possible, and to use it easily with Firebase Auth.
UPDATE: I just discovered Firebase's Cloud Firestore, so if it can be more useful I could use it.
If you are going to use Firebase, I would recommend you use Cloud Firestore instead of either Cloud Datastore or the Firebase Realtime Database. You get the benefits of a real-time database plus a true document-based JSON data store. The one downside is that you don't have a UI to interact with the data. Datastore has one, but it's not as robust as, say, phpMyAdmin. And since these are NoSQL datastores, SQL support is pretty limited.
If you really want a true relational back-end you could try Cloud SQL which is basically MySQL running on Google Servers.
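To make the data model concrete, a rough sketch of storing and querying the userNoteObject per user in Cloud Firestore could look like this (server-side Java; the userNotes collection name is made up, and on the Angular client you would use the Firebase JS SDK with the signed-in user's UID plus security rules instead):

// Sketch only: save a note and fetch all notes owned by one user in Cloud Firestore.
// Assumes default Google credentials are configured; "userNotes" is an illustrative name.
import com.google.api.core.ApiFuture;
import com.google.cloud.firestore.Firestore;
import com.google.cloud.firestore.FirestoreOptions;
import com.google.cloud.firestore.QueryDocumentSnapshot;
import com.google.cloud.firestore.QuerySnapshot;
import java.util.HashMap;
import java.util.Map;

public class NotesExample {
    public static void main(String[] args) throws Exception {
        Firestore db = FirestoreOptions.getDefaultInstance().getService();

        // Create a userNoteObject, mirroring the pseudocode above.
        Map<String, Object> note = new HashMap<>();
        note.put("book_name", "random book name");
        note.put("owner_userid", "a1b2c3d");
        note.put("note", "some random string");
        db.collection("userNotes").add(note).get();

        // Query only the notes that belong to that user.
        ApiFuture<QuerySnapshot> query =
                db.collection("userNotes").whereEqualTo("owner_userid", "a1b2c3d").get();
        for (QueryDocumentSnapshot doc : query.get().getDocuments()) {
            System.out.println(doc.getString("book_name") + ": " + doc.getString("note"));
        }
    }
}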
For the Firestore console/UI, see https://firebase.google.com/docs/firestore/using-console. Is that the kind of thing you're looking for?

Application data views in Azure Cosmos DB

Our application uses Cosmos DB to store data. The DB is partitioned based on User ID and all backend services work fine using the User ID.
However, we have a UI that only admins have access to, and they are different from the users stored in the DB. They fetch data from the DB based on time, i.e. get reports from the DB for the last 3 days, regardless of user ID. In this case the query needs to fan out to all partitions. Moreover, stored procedures are scoped to a single partition and cannot be used in this scenario. Even though we are able to fetch the data through a query, it's a huge performance hit.
Can anybody please advise if there is a way in Cosmos DB to workaround this?
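For context, a minimal sketch of what that fan-out looks like with the (older) DocumentDB Java SDK is below; the collection link and the query are illustrative, and the point is simply that EnableCrossPartitionQuery has to be set and every physical partition gets charged for the scan:

// Illustrative only: time-based report query that fans out across all User ID partitions.
import com.microsoft.azure.documentdb.Document;
import com.microsoft.azure.documentdb.DocumentClient;
import com.microsoft.azure.documentdb.FeedOptions;
import java.util.List;

public class ReportQuery {
    public static List<Document> lastThreeDays(DocumentClient client, long cutoffEpochSeconds) {
        FeedOptions options = new FeedOptions();
        options.setEnableCrossPartitionQuery(true);   // no single partition key value, so fan out
        String sql = "SELECT * FROM c WHERE c._ts >= " + cutoffEpochSeconds;
        return client.queryDocuments("dbs/appdb/colls/reports", sql, options)
                     .getQueryIterable().toList();
    }
}

One common mitigation, in the spirit of the change-feed "carbon copy" mentioned in the first question above, is to maintain a second collection partitioned by date and serve the admin reports from that instead.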

How to automatically generate a database in SQL Server from an app?

I'm currently developing an app where users are first asked to create an account through a website (ASP.NET) in order to use the app. For a specific reason, I need to automatically generate a database on the hosted SQL Server for each customer who creates an account. The databases for all the customers are the same.
I was thinking of doing it like this: since I have the script for creating the database, I would put it in a stored procedure or a trigger that is launched as soon as the user has fully created their account.
I don't really see other solutions, maybe somebody could give me some guidelines? Thanks in advance.
I think such a design has been shown to not scale. I'd recommend redesigning the schema to allow multiple customers in a single database.
Amazon does no such thing. Neither should you.
I agree with duffymo that you would have scalability issues.
However, there are situations in which you might prefer a separate database per tenant as your multi-tenant data approach.
In my last project I had to adopt the separate-DB approach because the business wanted complete isolation for each customer. It was a school administration system, and the number of customers was not expected to grow beyond three digits within 5-10 years.
So in the solution I designed, I used the Entity Framework Code First approach. Every school has a unique school identifier, which is used to name that school's database. The connection string is generated at runtime, and a connection factory creates the appropriate DataContext based on the school identifier passed in. The database is created on first use if it does not exist, and at the same time a SQL script is executed to create the DB users if they do not exist.
If this approach sounds appealing I can share code snippet if that helps.
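As a rough, language-neutral illustration of that connection-factory idea (this is not the original EF Code First code; it's a hypothetical Java/JDBC sketch with invented names, requiring the mssql-jdbc driver):

// Illustrative connection factory: one SQL Server database per school identifier,
// created on first use. Names and the connection string are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SchoolDbFactory {
    private static final String SERVER = "jdbc:sqlserver://localhost:1433;user=sa;password=<pwd>";

    public static Connection connectionFor(String schoolId) throws Exception {
        String dbName = "School_" + schoolId;   // unique database per tenant; validate schoolId in real code
        try (Connection master = DriverManager.getConnection(SERVER + ";databaseName=master");
             Statement stmt = master.createStatement()) {
            // Create the tenant database on first use (schema and DB users would be scripted here too).
            stmt.executeUpdate("IF DB_ID('" + dbName + "') IS NULL CREATE DATABASE [" + dbName + "]");
        }
        // Hand back a connection scoped to the tenant's own database.
        return DriverManager.getConnection(SERVER + ";databaseName=" + dbName);
    }
}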

Best way to create a default Database setup via an .aspx page?

We are going to be selling a service that will be hosted by us, and each client we host will have their own database, but there will be one centralized website. I currently have a blank database with the few things that a new client will need. What is the best way to copy this database so I can set up another client? I want to be able to do this from an .aspx page. Thanks in advance!
Update:
By .aspx page, I just meant that I need to be able to kick off the process from an .aspx page.
Update2:
We're running SQL Server 2008.
Update 3: Referencing Cade Roux's answer... Thanks for a great answer, but...
What is the reason for merging all of the databases into one and then distinguishing clients based on an identifier in each table? Wouldn't this greatly complicate the architecture of the entire product? I would need to add these client ID columns to practically every table, and the DAL would need to know which client's data it is looking for. With the current setup I have, I just switch out the connection string in the DAL depending on which user is accessing the site. That way, once the connection string is set, I never need to worry about finding client-specific data! How do these approaches compare (and should I add this as a separate question)?
You have a few different options:
You can detach your empty database; then, when a user signs up, copy the database files, attach the copy under a unique name for them, and map it to their account in, say, your master database.
You can create a database from scratch using scripts and populate any base data either from an online template database or scripting the base data and map it to their account in your master database.
You should seriously consider going to a multi-tenant architecture where all users are in the same database (with most tables having CustomerID columns to segregate the data) if you are going to have more than a few dozen customers.
Regarding your notes about option 3 - it depends on your application. Multi-tenant can be difficult to retrofit. On the other hand, managing and upgrading hundreds of individual customer databases can be difficult in the long haul.
There are previous Stack Overflow questions regarding this:
What are the advantages of using a single database for EACH client?
One database or many?
I think I'll see about re-tagging them with multi-tenant-db or something. Anyhow, I think that this, even though it comes up as a consideration secondary to your question about a particular tactic, does show the importance of including details about your overall goals and strategy in every question on Stack Overflow.
Depending on what database you're using, there are several approaches. The simplest is to ask your database software to generate SQL code for creating the database and include that with your software. Another would be to just script out in C#/VB the steps needed to recreate your empty database.
Why the need for .aspx page?
You don't say what DB version you're using, but in SQL Server 2005-2008 you have the ability to "Script Database as" and then "CREATE To" and have it put the SQL in a query window. You could then work with that to create a stored procedure that can be called from your .aspx page.
SQL Server has a system database called 'model'. Any database objects (tables, views, stored procedures) that exist in the model are added to any new database created.
You could create your 'client database' schema as model, and any new database would have all the same tables...
But, if you need to change your database schema later, your best option is to write change scripts which are part of your code-behind file. Since changes to the 'model' database are not propagated to existing databases, the application needs to detect and upgrade the database schema as necessary.
Disadvantage to this approach: If you want a database which isn't a 'client database' then you would need to create the database, and then delete the 'client database' tables.
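To make the 'model' behaviour concrete, here is a tiny hypothetical sketch (Java/JDBC again, though any client works the same way): once the client-database tables have been added to model, a plain CREATE DATABASE is enough to get a fully formed copy.

// Hypothetical sketch: because the 'client database' tables were added to the model database,
// a plain CREATE DATABASE produces a new database that already contains them.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class NewClientDb {
    public static void main(String[] args) throws Exception {
        try (Connection cn = DriverManager.getConnection(
                 "jdbc:sqlserver://localhost:1433;databaseName=master;user=sa;password=<pwd>");
             Statement st = cn.createStatement()) {
            st.executeUpdate("CREATE DATABASE [Client42]");              // copies all objects from 'model'
            ResultSet rs = st.executeQuery("SELECT name FROM Client42.sys.tables");
            while (rs.next()) {
                System.out.println(rs.getString("name"));                // tables inherited from 'model'
            }
        }
    }
}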
