Cosmos DB data modelling to optimize search [closed] - azure-cosmosdb

I watched this video on data modelling in Cosmos DB.
In the video, it is explained that if you can model your data such that your most common queries are in-partition queries, then you can minimize RUs, which in turn minimizes cost and maximizes performance.
The example used in the video is a blogging system. They showed that by storing blog posts and comments as separate entities in the same collection, all partitioned by blogId, they could serve a common query for a low RU charge.
They then showed that searching for all blog posts by a specific user, being a cross-partition query, is very expensive. So they duplicate all blog post data and add each blog post as a separate entity to the users collection, which is already partitioned by userId. Searching for posts by a user is now cheap. The argument is that storage is much cheaper than compute time, so this is a fine thing to do.
My question is: do I continue to follow this pattern when I want to make more things efficiently searchable? For example, I want to be able to search on blog topic (of which there could be many per blog post), a discrete blog rating, and so on.
I feel like extending this pattern for each search term is unsustainable. In these cases, do I just have to live with high RU searches or is there some clever way of making things efficient?

This essentially comes down to knowing whether the cost of using change feed to copy data from one container to another is less than the cost of doing cross-partition queries. That requires knowing your application's access patterns, and it requires measuring the average cost of those cross-partition queries versus the cost of using change feed to maintain another copy. Change feed consumes 2 RU/s when it polls the container, then 1 RU for each 1 KB or less read from the source container and ~8 RU for each 1 KB or less inserted into the target container, depending on your indexing policy. Multiply that by the rate at which data is inserted or updated, then calculate it per day or per month to compare cost.
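To make that comparison concrete, here is a minimal sketch of the change-feed fan-out described above, using an Azure Functions Cosmos DB trigger together with the @azure/cosmos SDK. The container names (blog, postsByUser), the pk/userId/type properties, and the COSMOS_CONNECTION setting are assumptions for illustration, not anything from the video.

```typescript
import { AzureFunction, Context } from "@azure/functions";
import { CosmosClient } from "@azure/cosmos";

// Target container, partitioned by the duplicated key (e.g. /pk = userId).
const client = new CosmosClient(process.env.COSMOS_CONNECTION!);
const postsByUser = client.database("blog").container("postsByUser");

// The Cosmos DB trigger (bound in function.json to the source container,
// which is partitioned by /blogId) delivers batches of changed documents.
const fanOutPosts: AzureFunction = async (
  context: Context,
  documents: Array<Record<string, any>>
): Promise<void> => {
  for (const doc of documents) {
    if (doc.type !== "post") continue; // skip comments and other entities
    // Upsert a copy keyed by userId so "all posts by this user"
    // becomes a cheap single-partition query.
    await postsByUser.items.upsert({ ...doc, pk: doc.userId });
  }
};

export default fanOutPosts;
```

Each copied post costs roughly the read and insert RUs quoted above, so multiplying by your write rate gives the number to weigh against the cross-partition query cost.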
If what you're looking for is free-text search over your data, you may want to look at Azure Search. This is simpler than the change feed approach, but Azure Search can be quite expensive as well.

Related

Firebase Firestore huge amount of reading document concern [closed]

EDIT: Basically, do you get charged for the documents you query? The answer is no: if your query scans 100K documents but only returns 10, you only get charged for the 10 documents you get back from your query.
Hi, so I have been using Firebase Firestore and it's been great. However, I have some questions about it. Currently, I'm working on an app where users can place orders, and by default the 'isActive' property is true so that the admin can see the orders. When an order is completed the property turns false and the order no longer shows up. However, I will eventually accumulate thousands of orders, and my question is: will I get charged only for the documents I read where 'isActive' is true, or does it count as reading all thousands of documents even though I don't use them?
You will most definitely be charged for stored data regardless of whether you access it or not. If you do decide to access it, you'll be charged for that as well. Fortunately you get 50K free document reads a day, and you pay after that.
While I think that answers your question, note that if your app reads 20,000 documents in one action (open a page and it loads in everything), that's not going to scale well for you, and it's a lot of data: your client devices will have to process 20K documents, which is not ideal.
I'd advise using a limit and then doing pagination or infinite scrolling.
It's fairly simple: if your document is read from/on the server, you will be charged for a document read.
If your clients are not requesting documents where isActive is false, they won't generate document read charges for those documents. If your admin is reading those documents, they will generate read charges for them.
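Here is a minimal sketch of what both answers describe, using the Firebase Admin SDK in TypeScript. The orders collection, the isActive and createdAt fields, and the page size are assumptions; only the documents the query actually returns are billed as reads.

```typescript
import * as admin from "firebase-admin";

admin.initializeApp(); // picks up GOOGLE_APPLICATION_CREDENTIALS

async function listActiveOrders(
  pageSize = 25,
  startAfterDoc?: admin.firestore.QueryDocumentSnapshot
) {
  const db = admin.firestore();

  // Only documents matching the filter AND within the limit are returned,
  // and only those returned documents are billed as reads.
  // (An equality filter plus orderBy on another field needs a composite index.)
  let query = db
    .collection("orders")
    .where("isActive", "==", true)
    .orderBy("createdAt", "desc")
    .limit(pageSize);

  // Pagination / infinite scrolling: pass the last doc of the previous page.
  if (startAfterDoc) {
    query = query.startAfter(startAfterDoc);
  }

  const snapshot = await query.get();
  return {
    orders: snapshot.docs.map((d) => ({ id: d.id, ...d.data() })),
    lastDoc: snapshot.docs[snapshot.docs.length - 1], // cursor for the next page
  };
}
```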

Firestore - reducing amount of writes and reads by making changes locally, then pushing data to servers [closed]

I'm doing a fair bit of work on a set of Firestore collections and documents. It amounts to a good amount of writes and reads, as I'm setting two-way refs and whatnot. Multiple documents are being written to multiple times.
Since Firestore offers offline capability, is it possible to reduce the number of writes via preparing all the data locally, then sending it to the server?
I'm using the Admin SDK.
It depends on what you mean. One document write is always going to cost one document write, no matter when or how that document was written. Batch writes don't in any way reduce the number of documents written, they just make all the document writes take effect at the exact same moment in time.
If you're staging lots of changes to a single document to take effect later, then feel free to do that. Just write the document whenever you've figured out what the final document looks like, and no sooner.
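For example (a sketch with the Admin SDK; the collection and field names are made up), stage the changes in memory and write each document once:

```typescript
import * as admin from "firebase-admin";

admin.initializeApp();
const db = admin.firestore();

async function saveWhenFinal() {
  // Stage everything in plain objects first; nothing is billed yet.
  const post = {
    title: "Hello",
    tags: ["firestore", "writes"],
    updatedAt: admin.firestore.FieldValue.serverTimestamp(),
  };
  const authorSummary = { lastPostTitle: "Hello" };

  // One atomic commit, but still billed as two document writes,
  // because two documents are touched.
  const batch = db.batch();
  batch.set(db.doc("posts/post-123"), post, { merge: true });
  batch.set(db.doc("users/ada"), authorSummary, { merge: true });
  await batch.commit();
}
```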
I'm moving away from Google App Engine standard (Python 2.7, NDB) to Svelte, Firestore and RxFire.
I was able to dramatically reduce the number of reads and writes by batching hundreds of App Engine NDB entities (Datastore data objects) into a single document using a map of data objects.
Every data object has a batchId prop (batchId = docId) to optimize batched document writes.
Most of the querying is now done in the client using filters. This resulted in very simple reactive Firestore queries using RxFire observables. This also dramatically reduced the number of composite indexes.
doc:
    batchId: docId
    map: (data objects keyed by id)
        dataObject:
            batchId: docId
            other props ...
        ...
I also used maps of data objects for putting all kinds of configuration and definition data into single documents. This setup is easy to maintain and available with a single doc read. The doc reads are part of observables to react to doc changes.
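A rough sketch of that setup with the web SDK and RxFire (the batches collection, the map field, and the isActive filter are my own guesses at the structure described above):

```typescript
import { initializeApp } from "firebase/app";
import { doc, getFirestore } from "firebase/firestore";
import { docData } from "rxfire/firestore";
import { map } from "rxjs/operators";

const app = initializeApp({ projectId: "my-project" }); // config abbreviated
const db = getFirestore(app);

// One document read fetches the whole batch; filtering happens client-side.
const batchDoc = doc(db, "batches/batch-42");

const activeObjects$ = docData(batchDoc).pipe(
  map((batch: any) =>
    Object.values(batch?.map ?? {}).filter((obj: any) => obj.isActive)
  )
);

// The observable re-emits whenever the single batch document changes.
activeObjects$.subscribe((objs) => console.log(objs.length, "active objects"));
```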

Firebase Realtime Database, what is a read exactly? [closed]

I am aware that for Cloud Firestore a read is counted per document (whether the document has 5 or 50 nodes). How does this compare to the RTDB?
If I have a query that has a limit of 25, is this going to be 25 reads, or 25 times x amount of items in each node?
Cheers.
Your question is a bit of a non sequitur, as the Realtime Database doesn't bill by reads; it bills by data transferred (and storage, of course). So the thing that affects your cost is the size of the items transferred, which is only indirectly related to the number of items returned by a limited query. Currently, the cost is about US $1 per GB downloaded, assuming you are on the Blaze plan.
To compare this with the costs for Firestore would require knowing a lot more about the shape of your traffic -- how many reads and writes, average size of a read, etc. Note that Cloud Firestore also indirectly charges for data transferred, but at a much lower rate, as it is only the Google Cloud Network pricing.
This means you can generally get quite a large number of Firestore document reads for the cost RTDB charges to transfer 1 GB (e.g. at current prices for egress to most of the internet, excluding some Asia/Pacific destinations, the $1 that pays for 1 GB of RTDB transfer would cover 1 GB of egress plus over 1.4M Firestore document reads).
The documentation references several things you can do to help control costs, including (but not limited to):
Prefer the native SDKs to the REST API
Monitor your data usage and use the profiler tool to measure read operations.
Use fewer, longer-lived connections, as SSL and connection overhead can contribute to your costs (but generally are not the bulk of your cost).
Ensure your listeners are limited to only the data you care about, are attached as low in the database tree as possible, and only download updates (e.g. on() vs once()); see the sketch after this list.
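For instance (a sketch with the Admin SDK; the database URL, paths, and limits are placeholders), keep the listener narrow and bounded:

```typescript
import * as admin from "firebase-admin";

admin.initializeApp({
  databaseURL: "https://my-project-default-rtdb.firebaseio.com", // placeholder
});

// Listen low in the tree (a single user's orders), not at the root,
// and bound the result set so only a small slice is ever transferred.
const recentOrders = admin
  .database()
  .ref("orders/user-123")
  .orderByChild("createdAt")
  .limitToLast(25);

// A persistent listener receives the initial slice once and then only changes,
// which is usually cheaper than re-reading with once() on every refresh.
recentOrders.on("value", (snapshot) => {
  console.log("orders:", snapshot.numChildren());
});
```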

What are some good practices when returning LARGE amounts of data from an ASP.net web service? [closed]

I'm currently working on some web services for a client of ours. Before we make it available to them we'd like to optimize performance of our database calls, as very LARGE amounts of data can potentially be returned. (could be tens of thousands of objects, could be millions, each object containing about 12 lists of other objects)
We don't want to strain our servers, nor do we want to limit the web service unnecessarily.
One of the web service methods returns all data within a specified date range, I was thinking that if the amount of data being returned was larger than a set amount, return a message saying something like:
"Data too large, please reduce date range"
Is limiting the user's scope like that a good idea??
I have to limit the amount of data our client can retrieve in one shot, but still keep it as convenient as possible for them. I mean, they're programmers too, so it doesn't have to be too simple, but it should be simple enough to use.
What are some good practices concerning returning large amount of data through a web service??
Thanks!
You might be able to adapt the common technique of paging data displayed in a list or grid. The call to the database specifies the number of records to return, and the page number.
So, for example, if they are displaying 10 records on a page, only 10 records will be returned for display. Records 1 - 10 (or 0 - 9, if you prefer) are returned for page 1, and 11 - 20 for page 2, and so on.
Also often returned is the total number of records available.
This way, the user can continue scrolling through a large number of records, or they can choose to refine their search criteria to yield a smaller resultset.
You could consider this kind of paging or chunking approach for your web service. The web service call supplies the number of records to be sent in each chunk, and the "page" or "chunk" number. The web service returns the requested records, along with the total number of records available.
With this approach, the developer who is consuming the web service remains in control.
The calling code can be placed in a loop so that it continues requesting chunks, if that behavior is desirable. If someone really wants a truckload of records, they can just set the number of records argument to be a very large number (or you could make that an optional parameter, and return everything if it is null, zero, empty).
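In consuming code that might look like this (a TypeScript sketch; the endpoint, parameter names, and response shape are hypothetical):

```typescript
interface Page<T> {
  records: T[];
  totalRecords: number; // total available, so the caller can decide what to fetch next
}

// Request one page of results; the consumer stays in control of page size.
async function fetchPage<T>(
  baseUrl: string,
  page: number,
  pageSize = 100
): Promise<Page<T>> {
  const res = await fetch(`${baseUrl}?page=${page}&pageSize=${pageSize}`);
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  return (await res.json()) as Page<T>;
}
```

The caller can then loop over pages, stop early, or narrow the search criteria when totalRecords is too large.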
It really depends on your needs. This seems more of a design question than a coding question to me, but in our systems we have two approaches. I'll share them to give you some ideas to consider.
In the first one, we're the ones providing the data. We allow customers to download transaction data on their accounts, and for some customers this can be a fairly large amount of data. We limit them to X days worth of data, and they are fine with this.
In the second one, we're consuming data from a web service from an established vendor that tracks vehicle location data, and other data of interest to our dispatchers and management. Every truck in our fleet gives regular updates of their geolocation, plus other data (loading/unloading/driver on break, etc)
In these web services, we request a date range, and the service returns a set limit of records (1000 records per call).
In addition to passing in the date range, we pass in a "position" integer field. On the first call the "position" is set to zero.
The web service returns a "More Data Exists" boolean field.
If "More Data Exists"=true, then we call the web service again with the "position" parameter incremented, and repeat until "More Data Exists" = false (It's a simple while loop in our code)
I think the first one is great for either programmers OR end users. The second works fine when dealing with programmers.
That sounds like a potentially large amount of data for a web service.
But here's a page from MSDN which talks about setting up the server to send/receive 'large' volumes of data: http://msdn.microsoft.com/en-us/library/aa528822.aspx
You can give them a chunk of data at a time, e.g. 1000 objects. They have to specify a start index and the amount of data they want, and you just use that to pull your data.
In LINQ, Skip and Take (or OFFSET/FETCH in SQL) make this easy.

NoSQL DB for .NET document-based database (ECM) [closed]

I'm halfway through coding a basic multi-tenant SaaS ECM solution. Each client has its own instance of the database / datastore, but the .NET app is single instance. The documents are pretty much read-only (i.e. an image archive of TIFFs or PDFs).
I've used MSSQL so far, but then started thinking this might be viable in a NoSQL DB (e.g. MongoDB, CouchDB). The basic premise is that it stores documents, each with their own particular indexes. Each tenant can have multiple document types.
e.g. One tenant might have an invoice type, which has Customer ID, Invoice Number and Invoice Date. Another tenant might have an application form, which has Member Number, Application Number, Member Name, and Application Date.
So far I've used the old method which SharePoint used (uses?): a document table with int_field_1, int_field_2, date_field_1, date_field_2, etc. Then I've got a "mapping" table which stores the customer-specific index name and the database field it maps to. I've avoided the key-value pair model in the DB due to the volume of documents.
This way, we can support multiple document types in the one table, and get reasonably high performance out of it, and allow for custom document type searches (i.e. user selects a document type, then they're presented with a list of search fields).
However, a NoSQL DB might make this a lot simpler, as I don't need to worry about denormalizing the document. However, I've just got concerns about the rest of the data around a document. We store an "action history" against the document. This tracks views, whether someone emails the document from within the system, and other "future" functionality (e.g. faxing).
We have control over the document load process, so we can manipulate the data however it needs to be to get it in the document store (e.g. assign unique IDs). Users will not be adding in their own documents, so we shouldn't need to worry about ACID compliance, as the documents are relatively static.
So, my questions, I guess:
Is a NoSQL DB a good fit?
Is MongoDB the best for ASP.NET? (I saw Raven and Velocity, but they're still kinda beta)
Can I store a key for each document, and then store the action history in an MSSQL DB with this key? I don't need to do joins; it would only be used if a person clicks "View History" against a document.
How would performance compare between the two (NoSQL DB vs denormalized "document" table)?
Volumes would be up to 200,000 new documents per month for a single tenant. My current scaling plan with the SQL DB involves moving the SQL DB into a cluster when certain thresholds are reached, and then reviewing partitioning and indexing structures.
Answer:
For the document-oriented portions, and likely for the whole thing, a NoSQL solution should work fine.
I've played with, and heard good things about, MongoDB, and would probably recommend it first for a .NET project.
For the document-oriented portion, the performance should be excellent compared to a SQL database. At smaller scale it should be equivalent, but the ability to scale out later will be a huge benefit.
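To illustrate why the mapping table goes away, here is a rough sketch with the MongoDB Node.js driver (the C# driver exposes equivalent calls); the connection URI and the collection and field names are invented:

```typescript
import { MongoClient } from "mongodb";

async function main() {
  const client = new MongoClient("mongodb://localhost:27017"); // placeholder URI
  await client.connect();

  // One database per tenant; each document simply carries its own index fields,
  // so there is no int_field_1 / date_field_1 mapping table.
  const docs = client.db("tenant-acme").collection("documents");

  await docs.insertOne({
    docType: "invoice",
    customerId: "C-1001",
    invoiceNumber: "INV-2043",
    invoiceDate: new Date("2011-03-01"),
    blobPath: "archive/2011/03/inv-2043.tif",
  });

  // Index the fields this tenant's document type is searched on.
  await docs.createIndex({ docType: 1, customerId: 1, invoiceDate: -1 });

  await client.close();
}

main().catch(console.error);
```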
