We are creating an application with MongoDB as the database, using the official C# driver for MongoDB. We have one collection that contains thousands of records, and we want to build a list with paging. I have gone through the documentation, but I could not find an efficient way of paging with the official MongoDB C# driver.
My requirement is to fetch exactly 50 records from the database. I have seen many examples, but they fetch the whole collection and perform skip and take via LINQ, which is not going to work in our case as we don't want to fetch thousands of records into memory.
Please provide any example code or a link. Any help will be appreciated.
Thanks in advance.
You can use SetLimit on the cursor that represents the query. That will limit the results from MongoDB, not only in memory:
var cursor = collection.FindAll(); // Or any other query.
cursor.SetLimit(50); // Will only return 50.
foreach (var item in cursor)
{
// Process item.
}
You can also use SetSkip to set a skip (surprisingly):
cursor.SetSkip(10);
Note: You must set those properties on the cursor before enumerating it. Setting those after will have no effect.
By the way, even if you only use LINQ's Skip and Take, you won't be retrieving thousands of documents. MongoDB automatically batches the results by size (the first batch is about 1 MB, the rest are 4 MB each), so you would only get the first batch and take the first 50 docs out of it. More on batching is in the MongoDB documentation on cursors.
Edit: I think there's some confusion about LINQ here:
they fetch the whole collection and perform skip and take via LINQ, which is not going to work in our case as we don't want to fetch thousands of records into memory.
Skip and Take are extension methods on both IEnumerable and IQueryable. IEnumerable is meant for in-memory collections, while IQueryable operations are translated by the specific provider (the C# driver in this case). So the above code is equivalent to:
foreach (var item in collection.AsQueryable().Take(50))
{
// Process item.
}
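For reference, if you are on the newer 2.x driver (which replaced the cursor's SetSkip/SetLimit with a fluent Find API), the same server-side paging would look roughly like the sketch below; the collection variable, page size and page number are just example values.
using MongoDB.Bson;
using MongoDB.Driver;

// Sketch: fetch one page of 50 documents; skip and limit are applied by the server.
int pageSize = 50;
int pageNumber = 3; // zero-based page index, for illustration

var page = collection // assumed to be an IMongoCollection<BsonDocument>
    .Find(Builders<BsonDocument>.Filter.Empty) // or any other filter
    .Sort(Builders<BsonDocument>.Sort.Ascending("_id")) // a stable sort keeps pages consistent
    .Skip(pageNumber * pageSize)
    .Limit(pageSize) // only these 50 documents cross the wire
    .ToList();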
Related
I've been building a serverless app using DynamoDB as the database, following the single-table design pattern (e.g. https://www.alexdebrie.com/posts/dynamodb-single-table/). Something I'm starting to come up against is the use of DynamoDB streams: I want to use a stream to keep an Elasticsearch instance up to date.
At the moment the single DynamoDB table holds about 10 different item types (and that number will keep growing). One of these item types, 'event' (as in a sporting event), will be sent to the Elasticsearch instance for complex querying/searching. Therefore any change to an 'event' item needs to be pushed to Elasticsearch by a Lambda function triggered by the stream.
What I am struggling with is that my Lambda will be triggered by an update to any item in the table, which could just as easily be one of the other 9+ item types. I get that inside the Lambda I can inspect the updated item and check its type, but it seems wasteful that pretty much any update to any item type triggers the Lambda, potentially far more often than needed.
Is there a better way to handle this that is less wasteful and targeted at only one item type? I'm thinking that as the app grows and more stream triggers are needed, at least there would already be an 'update' Lambda in which I could run some logic to see what type of item was updated, but I'm concerned I've missed something.
You can use Lambda Event Filtering. This allows you to prevent specific events from ever invoking your function. In the case of your single-table DynamoDB design, you can filter so that only records with type: EVENT invoke the function.
If you happen to be using the Serverless Framework, the following YAML snippet shows how you can easily implement this feature.
functionName:
  handler: src/functionName/function.handler
  # other properties
  events:
    - stream:
        type: dynamodb
        arn: !GetAtt DynamoDbTable.StreamArn
        maximumRetryAttempts: 1
        batchSize: 1
        filterPatterns:
          - eventName: [MODIFY]
            dynamodb:
              NewImage: # the pattern matches against the stream record's new item image
                type:
                  S: [EVENT]
Note that multiple comparison operators exist, such as "begins with", i.e. [{"prefix":"EVENT"}]; see Filter rule syntax for more.
Source: Pawel Zubkiewicz on Dev.to
Unfortunately, the approach you are describing (checking the item type inside the Lambda) is the only way to process DynamoDB streams. I went down the same path myself, thinking it could not be the correct usage, but it really is the only way.
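For completeness, here is a minimal C# sketch of that in-handler type check, assuming the item kind lives in an attribute named type with value EVENT (as in the filter example above) and using the Amazon.Lambda.DynamoDBEvents event types:
using Amazon.Lambda.Core;
using Amazon.Lambda.DynamoDBEvents;

public class StreamHandler
{
    // Invoked for every batch of stream records; only 'EVENT' items are processed further.
    public void FunctionHandler(DynamoDBEvent dynamoEvent, ILambdaContext context)
    {
        foreach (var record in dynamoEvent.Records)
        {
            var newImage = record.Dynamodb?.NewImage;

            // Skip anything that is not a sporting-event item.
            if (newImage == null ||
                !newImage.TryGetValue("type", out var typeAttr) ||
                typeAttr.S != "EVENT")
            {
                continue;
            }

            // Push the changed 'event' item to Elasticsearch here.
            context.Logger.LogLine($"Indexing event item, eventName={record.EventName}");
        }
    }
}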
I need to query a collection and return all documents that are new or updated since the last query. The collection is partitioned by userId. I am looking for a value I can use (or create and use) to facilitate this query. I considered using _ts:
SELECT * FROM collection WHERE userId=[some-user-id] AND _ts > [some-value]
The problem with _ts is that it is not granular enough and the query could miss updates made in the same second by another client.
In SQL Server I could accomplish this using an IDENTITY column in another table; let's call that table version. In a transaction I would create a new row in the version table and do the updates to the other table (including updating its version column with the new value). To query for new and updated rows I would use a query like this:
SELECT * FROM table WHERE userId=[some-user-id] and version > [some-value]
How could I do something like this in Cosmos DB? The Change Feed seems like the right option, but without the ability to query the Change Feed, I'm not sure how I would go about this.
In case it matters, the (web/mobile) clients connect to data in Cosmos DB via a web api. I have control of the entire stack - from client to back-end.
As stated in this link:
Today, you see all operations in the change feed. The functionality where you can control the change feed for specific operations, such as updates only and not inserts, is not yet available. You can add a "soft marker" on the item for updates and filter based on that when processing items in the change feed. Currently the change feed doesn't log deletes. Similar to the previous example, you can add a soft marker on the items that are being deleted; for example, you can add an attribute in the item called "deleted", set it to "true", and set a TTL on the item so that it can be automatically deleted. You can read the change feed for historic items, for example, items that were added five years ago. If the item is not deleted you can read the change feed as far back as the origin of your container.
So the change feed alone does not cover your requirement.
My idea:
Use an Azure Function with a Cosmos DB trigger to collect all the operations on your specific Cosmos DB collection. Follow this document to configure the Azure Function's Cosmos DB input, then follow this document to configure the output as Azure Queue storage.
Get the ids of the changed items and send them to Queue storage as messages. When you want to query the changed items, read the messages from the queue and consume them at a specific time, then clear the entire queue. No items will be missed.
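A rough sketch of that idea (the database/collection names and connection settings below are placeholders), using a Cosmos DB trigger plus a Queue storage output binding:
using System.Collections.Generic;
using Microsoft.Azure.Documents;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class ChangedItemsToQueue
{
    [FunctionName("ChangedItemsToQueue")]
    public static void Run(
        [CosmosDBTrigger(
            databaseName: "mydb",
            collectionName: "items",
            ConnectionStringSetting = "CosmosConnection",
            LeaseCollectionName = "leases",
            CreateLeaseCollectionIfNotExists = true)] IReadOnlyList<Document> changes,
        [Queue("changed-item-ids", Connection = "StorageConnection")] ICollector<string> queue,
        ILogger log)
    {
        // Every insert/update in the monitored collection shows up here;
        // forward just the ids so a later query can pick up the changed items.
        foreach (var doc in changes)
        {
            queue.Add(doc.Id);
            log.LogInformation("Queued changed item {Id}", doc.Id);
        }
    }
}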
With your approach, you can get the added/updated documents and save a reference value (the _ts and id fields) somewhere (like a blob):
SELECT * FROM collection WHERE userId=[some-user-id] AND _ts > [some-value] and id !='guid' order by _ts desc
This is similar to the approach we use to read data from Event Hubs and store checkpointing information (epoch number, sequence number and offset value) in a blob; at any time only one function can take a lease on that blob.
If you go with the change feed, you can create a listener (Function or Job) to listen for all adds/updates in the collection and store those values in another collection; while saving the data you can add an Identity/version field to every document. This approach may increase your Cosmos DB bill.
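A sketch of that query with the .NET SDK; container, userId and the lastTs/lastId checkpoint values are assumptions standing in for whatever you persist (e.g. in blob storage):
using Microsoft.Azure.Cosmos;

// Inside an async method: pull documents changed since the last checkpoint.
// _ts is the server-maintained timestamp in epoch seconds.
var query = new QueryDefinition(
        "SELECT * FROM c WHERE c.userId = @userId AND c._ts > @lastTs AND c.id != @lastId ORDER BY c._ts DESC")
    .WithParameter("@userId", userId)
    .WithParameter("@lastTs", lastTs)
    .WithParameter("@lastId", lastId);

var iterator = container.GetItemQueryIterator<dynamic>(query);
while (iterator.HasMoreResults)
{
    foreach (var doc in await iterator.ReadNextAsync())
    {
        // Process the changed document, then persist the new (_ts, id)
        // pair as the next checkpoint (e.g. back to the blob).
    }
}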
This is what the transaction consistency levels are for: https://learn.microsoft.com/en-us/azure/cosmos-db/consistency-levels
Choose strong consistency and your queries will always return the latest write.
Strong: Strong consistency offers a linearizability guarantee. The reads are guaranteed to return the most recent committed version of an item. A client never sees an uncommitted or partial write. Users are always guaranteed to read the latest committed write.
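For reference, a minimal sketch of requesting strong consistency from the .NET SDK (the endpoint and key are placeholders); note that the SDK can only relax the account's default consistency, so Strong has to be the account-level default for this to take effect:
using Microsoft.Azure.Cosmos;

var client = new CosmosClient(
    "https://your-account.documents.azure.com:443/", // placeholder account endpoint
    "your-account-key",                              // placeholder key
    new CosmosClientOptions
    {
        // Request strong consistency for this client's reads.
        ConsistencyLevel = ConsistencyLevel.Strong
    });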
Firebase allows having multiple projects in a single application.
// Initialize another app with a different config
var secondary = firebase.initializeApp(secondaryAppConfig, "secondary");
// Retrieve the database.
var secondaryDatabase = secondary.database();
Example:
Project 1 has my users collection; Project 2 has my friends collection (suppose there's a reason for that). When I add a new friend in the Project 2 database, I want to increment the friendsCount in the user document in Project 1. For this reason, I want to create a transaction/batch write to ensure consistency in the data.
How can I achieve this? Can I create a transaction or a batch write between different Firestore instances?
No, you cannot use the database transaction feature across multiple databases.
If absolutely required, I'd probably instead create a custom locking feature. From wiki,
To allow several users to edit a database table at the same time and also prevent inconsistencies created by unrestricted access, a single record can be locked when retrieved for editing or updating. Anyone attempting to retrieve the same record for editing is denied write access because of the lock (although, depending on the implementation, they may be able to view the record without editing it). Once the record is saved or edits are canceled, the lock is released. Records can never be saved so as to overwrite other changes, preserving data integrity.
In database management theory, locking is used to implement isolation among multiple database users. This is the "I" in the acronym ACID.
Source: https://en.wikipedia.org/wiki/Record_locking
It's been three years since the question, I know, but since I needed the same thing I found a working solution to perform the double (or even n-fold) transaction. You have to nest the transactions like this:
db1.runTransaction(t1 => db2.runTransaction(async t2 => {
    await t1.set(.....
    await t2.update(.....
    etc....
})).then(...).catch(...)
Since errors propagate through the nested promises, it is safe to execute the double transaction this way: a failure in either database results in an error in both.
Hi, I am new to MongoDB and ASP.NET. I was wondering whether MongoDB projection retrieves only the requested fields on the server itself, or whether it retrieves the entire document and filters it in memory.
For Example:
var filter = Builders<FoodItems>.Filter.Where(r => r.Fruits.Name == "Mango");
var result = Context.FruitCollection
.Find(filter)
.Project(r => new {r.Fruits.Cost, r.Fruits.Quantity})
.ToList();
return result;
Here, the "Cost" and "Quantity" fields are retrieved from the database directly or the entire "FoodItems" Document is retrieved and the corresponding fields are retrieved from the memory?
Thanks in advance.
Projection happens on the database server side; it is typically used to limit the amount of data that MongoDB sends to applications.
One particular example of how MongoDB uses projection to optimise its own execution is when all the fields to be returned by a query are included in the index used to find the data. That is called a covered query, and it means that once MongoDB has found the index entries it can answer immediately, without having to go on to the actual collection data.
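For illustration, here is a sketch of the same query with an explicit projection definition, reusing the FoodItems/FruitCollection names from the question. The projection is sent to the server as part of the find command, so only the listed fields come back over the wire (and with a compound index on Fruits.Name, Fruits.Cost and Fruits.Quantity it could even be a covered query):
using MongoDB.Bson;
using MongoDB.Driver;

// Sketch only: FoodItems and Context.FruitCollection are the types from the question.
var filter = Builders<FoodItems>.Filter.Where(r => r.Fruits.Name == "Mango");

var projection = Builders<FoodItems>.Projection
    .Include(r => r.Fruits.Cost)
    .Include(r => r.Fruits.Quantity)
    .Exclude("_id"); // _id comes back by default; exclude it if you want a covered query

var result = Context.FruitCollection
    .Find(filter)
    .Project<BsonDocument>(projection) // applied on the server, not in application memory
    .ToList();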
Just dipping my toes into a Linq2Sql project after years of rolling my own SQL Server DB access routines.
Before I spend too much time figuring out how to make Linq2Sql behave like my custom code used to, I want to check whether this isn't already built-in behavior that I can get just by setting up the relationships correctly in the designer...
Very simple example:
I have two tables: Person and Notes, with a 1 to many relationship (1 Person, many notes), linked by Person.ID->Note.PersonID.
I have stored procedures (all data access is done via SPs and I plan on continuing that, which makes Linq2Sql a bit more work for me):
sp_PersonGet(@ID int), which returns the person record, and sp_PersonNotesGet(@PersonID), which returns the set of related notes for this person.
So far so good, I have an object:
Dim myPerson As Person = db.PersonGet(pnID).Single
and I can access my fields: myPerson.Name, myPerson.Phone etc.
and I can also do a
Dim myNotes As Notes = db.PersonNotesGet(pnID)
to get a set of notes and I can iterate thru this list like:
For Each N As Note In myNotes
( do something)
Next
This all works fine... BUT... what I would prefer is that if I call:
myPerson = db.PersonGet(pnID)
that I also end up with a myPerson.Notes collection that I can iterate thru.
For Each N As Note In myPerson.Notes
( do something)
Next
Basically, Linq2Sql would need to call two stored procedures each time a Person record is loaded...
Is this doable "out of the box", or is this something I need to code around for myself?
This is normally what we would call child collections and they can be eager loaded or lazy loaded. Read these:
http://davidhayden.com/blog/dave/archive/2009/01/08/QuickExamplesLINQToSQLPerformanceTuningPerformanceProfilingORMapper.aspx
http://www.thinqlinq.com/default/Fetching-child-records-using-Stored-Procedures-with-LINQ-to-SQL.aspx
It uses partial classes. You can add your own "Notes" property to your Person class and initialize it in its getter. This would be better than populating the notes every time you load a person record.
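For illustration, a sketch of that partial-class approach (shown in C#, but it translates directly to VB; MyDataContext is a placeholder for your generated data context, and sp_PersonNotesGet is assumed to be mapped to return Note entities, as in the question):
using System.Collections.Generic;
using System.Linq;

// Extends the designer-generated Person class (same namespace) with a lazily
// loaded Notes collection, populated from the stored procedure on first access.
public partial class Person
{
    private List<Note> _notes;

    public List<Note> Notes
    {
        get
        {
            if (_notes == null)
            {
                using (var db = new MyDataContext()) // placeholder context name
                {
                    _notes = db.PersonNotesGet(this.ID).ToList();
                }
            }
            return _notes;
        }
    }
}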
I believe you can do this more or less out of the box, although I haven't tried it -- I don't use stored procedures with LINQ. What you would need to do is change the Insert/Delete/Update methods from using the runtime to using your stored procedures. Then you'd create an Association between your two entity tables, which creates an EntitySet of Notes on the Person class and an EntityRef of Person on the Note class. You can set this up to load automatically or with lazy loading.
The only tricky bit, as far as I can see, is the change from using the runtime-generated methods to using your stored procedures. I believe you have to add them to the data context as methods (by dropping them onto your table from Server Explorer in the designer) before they are available to use instead.