Does MongoDB Projection in .NET happen in Database or Memory? - asp.net

Hi, I am new to MongoDB and ASP.NET. I was wondering whether a MongoDB projection retrieves only the requested fields from a document on the server itself, or whether it retrieves the entire document and filters it in memory.
For Example:
var filter = Builders<FoodItems>.Filter.Where(r => r.Fruits.Name == "Mango");
var result = Context.FruitCollection
    .Find(filter)
    .Project(r => new { r.Fruits.Cost, r.Fruits.Quantity })
    .ToList();
return result;
Here, the "Cost" and "Quantity" fields are retrieved from the database directly or the entire "FoodItems" Document is retrieved and the corresponding fields are retrieved from the memory?
Thanks in advance.

Projection happens on the database server side; it is typically used to limit the amount of data that MongoDB sends to applications.
One example of how MongoDB uses projection to optimise its own work is when all of the fields to be returned by a query are included in the index used to find the data. That is called a covered query: once MongoDB has found the matching index entries it can answer immediately, without having to go further to the actual collection data.
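For illustration, the same projection can also be written with an explicit projection definition; the class, field and collection names below simply follow the example in the question and are assumptions:
var projection = Builders<FoodItems>.Projection
    .Include(r => r.Fruits.Cost)
    .Include(r => r.Fruits.Quantity)
    .Exclude("_id");

var result = Context.FruitCollection
    .Find(r => r.Fruits.Name == "Mango")
    .Project(projection) // rendered as { "Fruits.Cost": 1, "Fruits.Quantity": 1, "_id": 0 } and applied on the server
    .ToList();
The lambda form .Project(r => new { ... }) used in the question behaves the same way in this respect: the driver works out which fields the expression needs, asks the server for only those fields, and builds the anonymous objects from the reduced documents it receives.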

Related

How to create one stream listening to multiple Firestore documents created from list of documents references in Flutter

I'm trying to create one stream that uses multiple document references that are stored in and fetched from Firebase Firestore.
Let's say I have two collections named users and documents. When a user is created he gets a document with his id in the users collection, with a field named documentsHasAccessTo that is a list of references to documents inside the documents collection. It is important that these documents can be located in different sub-collections inside the documents collection, so I don't want to query the whole documents collection and filter it; in order to save Firestore transfer and make it faster, I already know the paths to the documents stored in the documentsHasAccessTo field.
So for example, I can have a user with data inside the users/<user uid> document, with a documentsHasAccessTo field that stores 3 different document references.
I would like to achieve something like this (untested):
final userId = 'blablakfn1n21n4109';
final usersDocumentRef = FirebaseFirestore.instance.doc('users/$userId');

usersDocumentRef.snapshots().listen((snapshot) {
  final references = snapshot.data()['documentsHasAccessTo'] as List<DocumentReference>;
  final documentsStream = // create single query stream using all references from list
});
Keep in mind that it would also be great if this stream updated the query whenever documentsHasAccessTo changes, as in the example above; hence I used snapshots() on usersDocumentRef rather than a single get() fetch.
The more I think about this, the more I'm starting to believe it is simply impossible, or that there is a simpler and cleaner solution. I'm open to anything.
You could use rxdart's switchMap and MergeStream:
usersDocumentRef.snapshots().switchMap((snapshot) {
  final references = snapshot.data()['documentsHasAccessTo'] as List<DocumentReference>;
  return MergeStream(references.map((ref) => /* do something that creates a stream */));
});

How can I query for all new and updated documents since last query?

I need to query a collection and return all documents that are new or updated since the last query. The collection is partitioned by userId. I am looking for a value that I can use (or create and use) that would help facilitate this query. I considered using _ts:
SELECT * FROM collection WHERE userId=[some-user-id] AND _ts > [some-value]
The problem with _ts is that it is not granular enough and the query could miss updates made in the same second by another client.
In SQL Server I could accomplish this using an IDENTITY column in another table. Let's call that table version. In a transaction I would create a new row in the version table and do the updates to the other table (including updating its version column with the new value). To query for new and updated rows I would use a query like this:
SELECT * FROM table WHERE userId=[some-user-id] and version > [some-value]
How could I do something like this in Cosmos DB? The Change Feed seems like the right option, but without the ability to query the Change Feed, I'm not sure how I would go about this.
In case it matters, the (web/mobile) clients connect to data in Cosmos DB via a web api. I have control of the entire stack - from client to back-end.
As stated in this link:
Today, you see all operations in the change feed. The functionality where you can control the change feed for specific operations, such as updates only and not inserts, is not yet available. You can add a "soft marker" on the item for updates and filter based on that when processing items in the change feed. Currently the change feed doesn't log deletes. Similar to the previous example, you can add a soft marker on the items that are being deleted; for example, you can add an attribute in the item called "deleted", set it to "true", and set a TTL on the item so that it can be automatically deleted. You can read the change feed for historic items, for example items that were added five years ago. If the item is not deleted you can read the change feed as far as the origin of your container.
So the change feed alone does not meet your requirements.
My idea:
Use an Azure Function with a Cosmos DB trigger to collect all the operations on your specific Cosmos collection. Follow this document to configure the Azure Function's input as Cosmos DB, then follow this document to configure the output as Azure Queue storage.
Get the ids of the changed items and send them into queue storage as messages. When you want to query the changed items, just read the messages from the queue, consume them for a specific unit of time, and after that clear the entire queue. No items will be missed.
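A minimal sketch of that trigger-to-queue idea (the database, collection and queue names are placeholders, and the bindings assume the Microsoft.Azure.WebJobs Cosmos DB and Storage extensions):
using System.Collections.Generic;
using Microsoft.Azure.Documents;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class ChangesToQueue
{
    [FunctionName("ChangesToQueue")]
    public static void Run(
        [CosmosDBTrigger(
            databaseName: "mydb",
            collectionName: "items",
            ConnectionStringSetting = "CosmosDBConnection",
            LeaseCollectionName = "leases",
            CreateLeaseCollectionIfNotExists = true)] IReadOnlyList<Document> changes,
        [Queue("changed-item-ids")] ICollector<string> queue,
        ILogger log)
    {
        // Every inserted or updated document appears in the change feed batch;
        // push its id onto the queue so a later query can fetch just those items.
        foreach (var doc in changes)
        {
            queue.Add(doc.Id);
            log.LogInformation("Queued changed item {id}", doc.Id);
        }
    }
}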
With your approach, you can get the added/updated documents and save a reference value (the _ts and id fields) somewhere (like a blob):
SELECT * FROM collection WHERE userId=[some-user-id] AND _ts > [some-value] and id !='guid' order by _ts desc
This is similar to the approach we use to read data from Event Hubs and store checkpointing information (epoch number, sequence number and offset value) in a blob, where at any time only one function can take a lease on that blob.
If you go with the change feed, you can create a listener (Function or Job) that listens for all add/update data from the collection and stores those values in another collection; while saving the data you can add an Identity/version field to every document. This approach may increase your Cosmos DB bill.
This is what the transaction consistency levels are for: https://learn.microsoft.com/en-us/azure/cosmos-db/consistency-levels
Choose strong consistency and your queries will always return the latest write.
Strong: Strong consistency offers a linearizability guarantee. The reads are guaranteed to return the most recent committed version of an item. A client never sees an uncommitted or partial write. Users are always guaranteed to read the latest committed write.

Efficient way of paging with MongoDB and ASP.NET MVC

We are creating an application with MongoDB as the database, and we are using the official C# driver for MongoDB. We have one collection which contains thousands of records, and we want to create a list with paging. I have gone through the documentation but there is no efficient way of paging with the official MongoDB C# driver.
My requirement is to fetch exactly 50 records from the database. I have seen many examples, but they get the whole collection and perform Skip and Take via LINQ, which is not going to work in our case as we don't want to fetch thousands of records into memory.
Please provide any example code or link for that. Any help will be appreciated.
Thanks in advance for help.
You can use SetLimit on the cursor that represents the query. That will limit the results from MongoDB, not only in memory:
var cursor = collection.FindAll(); // Or any other query.
cursor.SetLimit(50); // Will only return 50.
foreach (var item in cursor)
{
    // Process item.
}
You can also use SetSkip to set a skip (surprisingly):
cursor.SetSkip(10);
Note: You must set those properties on the cursor before enumerating it. Setting those after will have no effect.
By the way, even if you do only use Linq's Skip and Take you won't be retrieving thousands of documents. MongoDB automatically batches the result by size (first batch is about 1mb, the rest are 4mb each) so you would only get the first batch and take the first 50 docs out of it. More on
Edit: I think there's some confusion about LINQ here:
that get all collection and perform skip and take via LINQ which is not going to work in our case as we don't want to fetch thousand of records in memory.
Skip and Take are extension methods on both IEnumerable and IQueryable. IEnumerable is meant for in-memory collections, but IQueryable operations are translated by the specific provider (the C# driver in this case). So the above code is equivalent to:
foreach (var item in collection.AsQueryable().Take(50))
{
    // Process item.
}
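For reference, the answer above uses the legacy cursor API; with the newer 2.x driver the same server-side paging can be written roughly like this (the Product class, collection variable and sort key are just placeholders):
var page = 3;
var pageSize = 50;

var items = collection
    .Find(Builders<Product>.Filter.Empty)
    .SortBy(p => p.Name)           // a stable sort makes paging deterministic
    .Skip((page - 1) * pageSize)   // translated to the server-side skip option
    .Limit(pageSize)               // translated to the server-side limit option
    .ToList();
Only the requested page is sent back by the server; nothing is skipped or trimmed in application memory.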

LINQ to SQL - updating records

Using ASP.NET 4 with C#.
In my data access layer I have methods for saving and updating records. Saving is easy enough but the updating is tedious.
I previously used SubSonic which was great as it had active record and knew that if I loaded a record, changed a few entries and then saved it again, it recognised it as an update and didn't try to save a new entry in the DB.
I don't know how to do the same thing in LINQ. As a result my workflow is like this:
Web page grabs 'Record A' from the DB
Some values in it are changed by the user.
'Record A' is passed back to the data access layer
I now need to load Record A again, calling it 'SavedRecord A', update all values in this object with the values from the passed 'Record A', and then update/save 'SavedRecord A'!
If I just save 'Record A' I end up with a new entry in the DB.
Obviously it would be nicer to just pass Record A and do something like:
RecordA.Update();
I'm presuming there's something I'm missing here but I can't find a straightforward answer on-line.
You can accomplish what you want using the Attach method on the Table instance, and committing via the SubmitChanges() method on the DataContext.
This process may not be as straightforward as we would like, but you can read David DeWinter's LINQ to SQL: Updating Entities for a more in-depth explanation/tutorial.
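A minimal sketch of that Attach approach, assuming a hypothetical Product entity whose table has a timestamp/rowversion column (which the asModified overload relies on):
using (var db = new MyDataContext())
{
    // Attach the detached entity that came back from the web page and mark it as modified.
    db.Products.Attach(updatedProduct, true); // true = treat the entity as modified
    db.SubmitChanges();                       // generates an UPDATE rather than an INSERT
}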
Let's say you have a Product class in your DB; then you would do this:
DbContext _db = new DbContext();
var _product = (from p in _db.Products
                where p.Id == 1    // suppose you are getting the first product
                select p).First(); // this will fetch the first record
_product.ProductName = "New Product";
_db.SaveChanges();

// this is for EF (LINQ to Entities)
_db.Entry(_product).State = EntityState.Modified;
_db.SaveChanges();
-------------------------------------------------------------------------------------
This is another example, using Attach:
-------------------------------------------------------------------------------------
public static void Update(IEnumerable<Sample> samples, DataClassesDataContext db)
{
    db.Samples.AttachAll(samples);
    db.Refresh(RefreshMode.KeepCurrentValues, samples);
    db.SubmitChanges();
}
If you attach your entities to the context and then Refresh (with KeepCurrentValues selected), LINQ to SQL will get those entities from the server, compare them, and mark as updated those that differ.
When LINQ-to-SQL updates a record in the database, it needs to know exactly what fields were changed in order to only update those. You basically have three options:
When the updated data is posted back to the web server, load the existing data from the database, assign all properties to the loaded object and call SubmitChanges(). Any properties that are assigned the existing value will not be updated.
Keep track of the unmodified state of the object and use Attach with both the unmodified and modified values.
Initialize a new object with all state required by the optimistic concurrency check (if enabled, which it is by default). Then attach the object and finally update any changed properties after the attach, so that the DataContext change tracker is aware of those updates.
I usually use the first option as it is easiest. There is a performance penalty with two DB calls but unless you're doing lots of updates it won't matter.
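A minimal sketch of that first option (the Product entity, MyDataContext and property names are illustrative):
public void Update(Product changed)
{
    using (var db = new MyDataContext())
    {
        // Load the current row, copy the posted values onto it, then submit.
        var existing = db.Products.Single(p => p.Id == changed.Id);
        existing.Name = changed.Name;
        existing.Price = changed.Price;
        db.SubmitChanges(); // only the columns whose values actually changed are updated
    }
}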

LINQ to Entities pulling back entire table

In my application I'm pulling back a user's "feed". This contains all of that user's activities, events, friend requests from other users, etc. When I pull back the feed I'm calling various functions to filter the request along the way.
var userFeed = GetFeed(db); // Query to pull back all data
userFeed = FilterByUser(userFeed, user, db); // Filter for the user
userFeed = SortFeed(userFeed, page, sortBy, typeName); // Sort it
The data that is returned is exactly what I need; however, when I look at a SQL Profiler trace I can see that the query that fetches this data does not filter it at the database level and instead selects ALL the data in the table(s).
This query does not execute until I iterate through the results on my view. All of these functions return an IEnumerable object.
I was under the impression that LINQ would take all of my filters and form one query to pull back the data I want instead of pulling back all the data and then filtering it on the server.
What am I doing wrong or what don't I understand about the way LINQ evaluates queries?
If GetFeed returns an IEnumerable, FilterByUser will receive an IEnumerable. When it calls a LINQ operator such as Where, it will use the IEnumerable version of Where, which will start pulling data and eventually download the entire table. Change the return type of GetFeed to IQueryable to make sure that IQueryable's LINQ operators are called instead, which keeps deferring the query so the filters are translated into the SQL sent to the database.
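A minimal sketch of that change (FeedItem, db.FeedItems and the field names are placeholders):
// Returning IQueryable lets the provider keep composing one query; nothing runs
// against the database until the results are enumerated (e.g. in the view).
public IQueryable<FeedItem> GetFeed(MyDbContext db)
{
    return db.FeedItems;                        // no data pulled yet
}

public IQueryable<FeedItem> FilterByUser(IQueryable<FeedItem> feed, int userId)
{
    return feed.Where(f => f.UserId == userId); // becomes a SQL WHERE clause
}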
