ChangeFeed - Last Successful Operation Processed - azure-cosmosdb

The code snippet below iterates over the change feed. If I need to track the last successfully processed record, is that position given by the continuation token plus the loop index (continuation + i), by the ETag of the document, or both? If there is a failure, how do I resume querying the change feed from that exact place? It isn't clear, because when I started at 0 and requested 1000, the continuation token in my test was 1120.
IDocumentQuery<Document> query = client.CreateDocumentChangeFeedQuery(
    collectionUri,
    new ChangeFeedOptions {
        PartitionKeyRangeId = pkRange.Id,
        StartFromBeginning = true,
        RequestContinuation = continuation,
        MaxItemCount = 1000
    });

while (query.HasMoreResults) {
    Dictionary<string, BlastTimeRange> br = new Dictionary<string, BlastTimeRange>();
    var readChangesResponse = query.ExecuteNextAsync<Document>().Result;
    int i = 0;
    foreach (Document changedDocument in readChangesResponse.AsEnumerable().ToList()) {
        // processing each one
        // do (continuation, i) represent the place, or is it better to store off the ETag?
    }
}

The best way to do this today is to track the continuation token (the same value as the ETag in the REST API) along with the list of _rid values for the documents you have already read within the current batch. When you read the next batch, exclude the _rid values you have processed before.
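For illustration, here is a minimal sketch of that bookkeeping against the snippet above. LoadCheckpointToken, LoadCheckpointRids, SaveCheckpoint, and ProcessDocument are hypothetical helpers you would back with durable storage; everything else uses the same SDK calls as the question:

string continuation = LoadCheckpointToken();        // hypothetical: last persisted token, or null
HashSet<string> processedRids = LoadCheckpointRids(); // hypothetical: _rids done within that batch
IDocumentQuery<Document> query = client.CreateDocumentChangeFeedQuery(
    collectionUri,
    new ChangeFeedOptions {
        PartitionKeyRangeId = pkRange.Id,
        StartFromBeginning = true,
        RequestContinuation = continuation,
        MaxItemCount = 1000
    });
while (query.HasMoreResults) {
    FeedResponse<Document> response = query.ExecuteNextAsync<Document>().Result;
    foreach (Document doc in response) {
        if (processedRids.Contains(doc.ResourceId)) continue; // replayed after a failure; skip
        ProcessDocument(doc);                                 // hypothetical per-document work
        processedRids.Add(doc.ResourceId);
        SaveCheckpoint(continuation, processedRids);          // durable progress within the batch
    }
    continuation = response.ResponseContinuation;             // same value the REST API returns as the ETag
    processedRids.Clear();
    SaveCheckpoint(continuation, processedRids);              // batch complete; advance the token
}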
The easiest way to do this without writing custom code is to use the DocumentDB team's ChangeFeedProcessor library (in preview). To get access, email askdocdb@microsoft.com.

Related

Server performance question about streaming from Cosmos DB

I read the article here about IAsyncEnumerable, more specifically in the context of a Cosmos DB data source:
public async IAsyncEnumerable<T> Get<T>(string containerName, string sqlQuery)
{
    var container = GetContainer(containerName);
    using FeedIterator<T> iterator = container.GetItemQueryIterator<T>(sqlQuery);
    while (iterator.HasMoreResults)
    {
        foreach (var item in await iterator.ReadNextAsync())
        {
            yield return item;
        }
    }
}
I am wondering how Cosmos DB handles this compared to paging, say 100 documents at a time. We have had some "429 - Request rate too large" errors in the past and I don't wish to create new ones.
So, how will this affect server load/performance?
I don't see a big difference from the server's perspective between the client streaming (while doing some quick checks) and the old way: looping with while (iterator.HasMoreResults) and collecting all the items into a list.
The SDK retrieves documents in batches whose size can be adjusted through QueryRequestOptions by changing MaxItemCount (which defaults to 100 if not set). It has no option, though, to throttle RU usage: it will simply run into 429 errors and use the retry mechanism the SDK offers to try again a while later. Depending on how generously you configure that retry mechanism, it will retry often and long enough to eventually get a proper response.
If you have a situation where you want to limit the RU usage, e.g. because multiple processes use your Cosmos account and you don't want them to run into 429 errors, you have to write the logic yourself.
An example of how something like that could look:
var qry = container
    .GetItemLinqQueryable<Item>(requestOptions: new() { MaxItemCount = 2000 })
    .ToFeedIterator();

var results = new List<Item>();
var stopwatch = new Stopwatch();
var targetRuMsRate = 200d / 1000; // target 200 RU/s
var previousElapsed = 0L;
var delay = 0;
stopwatch.Start();
var totalCharge = 0d;
while (qry.HasMoreResults)
{
    if (delay > 0)
    {
        await Task.Delay(delay);
    }
    previousElapsed = stopwatch.ElapsedMilliseconds;
    var response = await qry.ReadNextAsync();
    var charge = response.RequestCharge;
    var elapsed = stopwatch.ElapsedMilliseconds;
    var delta = elapsed - previousElapsed;
    // wait long enough that this batch's charge averages out to the target RU/ms rate
    delay = (int)((charge - targetRuMsRate * delta) / targetRuMsRate);
    foreach (var item in response)
    {
        results.Add(item);
    }
}
Edit:
Internally, the SDK calls the underlying Cosmos REST API. Once your code reaches iterator.ReadNextAsync(), it calls the query-documents method in the background. If you dig into the source code or intercept the messages sent to HttpClient, you can observe that the resulting request lacks the x-ms-max-item-count header that determines the number of documents the service will try to return (unless you have specified a MaxItemCount yourself). According to the Microsoft docs, it defaults to 100 if not set:
Query requests support pagination through the x-ms-max-item-count and x-ms-continuation request headers. The x-ms-max-item-count header specifies the maximum number of values that can be returned by the query execution. This can be between 1 and 1000, and is configured with a default of 100.
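If you want each round trip to carry fewer documents, a minimal sketch of the Get<T> method's iterator with an explicit page size could look like this (the value 50 is an arbitrary example, not a recommendation):

// Same container and sqlQuery as in the question; only requestOptions is new.
using FeedIterator<T> iterator = container.GetItemQueryIterator<T>(
    sqlQuery,
    requestOptions: new QueryRequestOptions { MaxItemCount = 50 }); // each page holds at most 50 docs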

Firestore rule to only add/remove one item of array

To optimize usage, I have a Firestore collection with only one document, consisting of a single field, which is an array of strings.
On the client side, the app simply retrieves the entire status document, picks one string at random, and then sends the entire array back minus the one it picked:
var all = await metaRef.doc("status").get();
List tokens = all['all'];
var r = new Random();
int numar = r.nextInt(tokens.length);
var ales = tokens[numar];
tokens.removeAt(numar);
metaRef.doc("status").set({"all": tokens});
Then it tries to do some stuff with the string, which may fail or succeed. If it succeeds, then no more writing to the database, but if it fails it fetches that array again, adds the string back and pushes it:
var all = await metaRef.doc("status").get();
List tokens = all['all'];
List<String> toate = tokens.map((element) => element as String).toList();
toate.add(ales.toString());
metaRef.doc("status").set({"all": toate});
You can use the methods associated with the Set type in security rules.
Here is an example to check that only 1 item was removed:
allow update: if checkremoveonlyoneitem();

function checkremoveonlyoneitem() {
  let set = resource.data.array.toSet();
  let setafter = request.resource.data.array.toSet();
  return set.size() == setafter.size() + 1
      && set.difference(setafter).size() == 1; // exactly one element removed, none added
}
You can add a mirrored check that only one item was added, and you should also add checks for the case where the array does not yet exist on your doc.
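A mirrored function for the add case could look like the sketch below (checkaddonlyoneitem is an illustrative name following the same pattern):

function checkaddonlyoneitem() {
  let set = resource.data.array.toSet();
  let setafter = request.resource.data.array.toSet();
  return setafter.size() == set.size() + 1
      && setafter.difference(set).size() == 1; // exactly one new element, none removed
}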
If you are not sure whether the app performs the task successfully, it is a good idea to implement this logic in the client code. You can write a simple conditional block that deletes the field from the document if the operation succeeds, and handles a failure (due to an offline condition or any other issue) accordingly. The following sample from the Firestore documentation shows how to do it; with just one write you can delete the field the user picked without updating the whole document.
city_ref = db.collection(u'cities').document(u'BJ')
city_ref.update({
    u'capital': firestore.DELETE_FIELD
})
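In the Flutter app above, the equivalent single-field deletion with the cloud_firestore package would presumably be a one-line update (a sketch, not tested against your schema):

metaRef.doc("status").update({"all": FieldValue.delete()}); // one write removes the whole field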

Flutter & Firebase: Get more than 10 Firebase Documents into a Stream<List<Map>>

With Flutter and Firestore, I am trying to get more than 10 documents into a Stream<List>. I can do this with a .where clause on a collection, mapping the QuerySnapshot. However, the 10-item limit is a killer.
I'm using the provider package in my app. So, in building a stream in Flutter with a StreamProvider, I can return:
1. A Stream<List<Map>> from the entire collection. Too expensive: 200-plus docs on these collections and too many users. I need to get more efficient.
2. A Stream<List<Map>> using a .where on a collection, which returns a stream of a list capped at 10. Doesn't cut the mustard.
3. A Stream<Map> from a document, which returns one stream of one document.
I need something in between 1 and 2.
I have a Collection with up to 500 Documents, and the user will choose any possible combination of those 500 to view. The user assembles class rosters to view their lists of users.
So I'm looking for a way to get individual streams of, say, 30 documents, and then compile them into a list. But I need this List<Stream<Map>> to be a Stream itself so each individual doc stays live, and I also need to filter and sort this list of streams. I'm using the provider package and, if possible, would like to stay consistent with that. Here's where I am currently stuck; this is my current effort:
Future<Stream<List<AttendeeData>>> getStreams() async {
  List<Stream<AttendeeData>> getStreamsOutput = [];
  for (var i = 0; i < teacherRosterList.length; i++) {
    Stream thisStream = await returnTeacherRosterListStream(facility, teacherRosterList[i]);
    getStreamsOutput.add(thisStream);
  }
  return StreamZip(getStreamsOutput).asBroadcastStream();
}
It feels like I'm cheating below: if I put the snapshot call directly into thisStream above and await it, I get an error because a Stream is not a Future; and if I don't await, it moves too fast and I get a null error.
Future<Stream<AttendeeData>> returnTeacherRosterListStream(String thisFacility, String thisID) async {
  return facilityList.doc(thisFacility).collection('attendance').doc(thisID).snapshots().map(_teacherRosterListFromSnapshot);
}
Example of how I'm mapping in _teacherRosterListFromSnapshot (not having any problem here):
AttendeeData _teacherRosterListFromSnapshot(DocumentSnapshot doc) {
  return AttendeeData(
    id: doc.data()['id'] ?? '',
    authorCreatedUID: doc.data()['authorCreatedUID'] ?? '',
  );
}
My StreamProvider Logic and the error:
return MultiProvider(
  providers: [
    StreamProvider<List<AttendeeData>>.value(
      value: DatabaseService(
        teacherRosterList: programList,
        facility: user.claimsFacility,
      ).getStreams(),
    ),
  ]

Error: The argument type 'Future<Stream<List<AttendeeData>>>' can't be assigned to the parameter type 'Stream<List<AttendeeData>>'.
AttendeeData is my Map Class name.
So, the summary of questions:
1. Can I even do this? I'm basically streaming a list of streams of maps... is this a thing?
2. If I can, how do I do it?
a. I can't get this into the StreamProvider because getStreams is a Future... how can I overcome this?
3. I can get the data in using another method from StreamProvider, but it doesn't behave like a Stream and the state isn't updating. I'm hoping to get this into Provider, as I'm comfortable there and can manage state easily that way. However, beggars can't be choosers.
I solved this myself, and since there is a dearth of good start-to-finish answers, I submit my example for the poor souls who come after me trying to learn these things on their own. I'm a beginner, so this was a slog.
Objective:
You have any number of docs in a collection, you want to submit a list of any number of those docs by their doc ID, and you want back a single stream of a list of those mapped documents. You want more than 10 (the Firestore limit on a .where query) but fewer than all the docs; somewhere between a QuerySnapshot and a DocumentSnapshot.
Solution: We get a list of QuerySnapshots, combine them, map them, and emit them as a single stream. So we fetch in chunks of 10 (the maximum) plus whatever odd number is left over. I plug mine into a Provider so I can get the data whenever and wherever I want.
So from my provider I call this as the Stream value:
Stream<List<AttendeeData>> filteredRosterList() {
  // Break a list of whatever size into chunks of 10.
  var chunks = [];
  for (var i = 0; i < teacherRosterList.length; i += 10) {
    chunks.add(teacherRosterList.sublist(
        i, i + 10 > teacherRosterList.length ? teacherRosterList.length : i + 10));
  }
  // Get a list of the streams, each covering 10 ids.
  List<Stream<QuerySnapshot>> combineList = [];
  for (var i = 0; i < chunks.length; i++) {
    combineList.add(*[point to your collection]*.where('id', whereIn: chunks[i]).snapshots());
  }
  // Now we combine all the streams... but the result is a list of QuerySnapshots,
  // so look closely at the map below: it iterates, consolidates, and returns a
  // single stream of List<AttendeeData>.
  CombineLatestStream<QuerySnapshot, List<QuerySnapshot>> mergedQuerySnapshot =
      CombineLatestStream.list(combineList);
  return mergedQuerySnapshot.map(rosterListFromTeacherListDocumentSnapshot);
}
Here's a look at how I mapped it for your reference (took out all the fields for brevity):
List<AttendeeData> rosterListFromTeacherListDocumentSnapshot(List<QuerySnapshot> snapshot) {
  List<AttendeeData> listToReturn = [];
  snapshot.forEach((element) {
    listToReturn.addAll(element.docs.map((doc) {
      return AttendeeData(
        id: doc.data()['id'] ?? '',
        authorCreatedUID: doc.data()['authorCreatedUID'] ?? '',
      );
    }).toList());
  });
  return listToReturn;
}
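For reference, wiring the new method into the StreamProvider from the question could look like the sketch below. Because filteredRosterList() returns a plain Stream rather than a Future, the original type error goes away (the fragment mirrors the question's snippet; the rest of the provider setup stays as it was):

return MultiProvider(
  providers: [
    StreamProvider<List<AttendeeData>>.value(
      value: DatabaseService(
        teacherRosterList: programList,
        facility: user.claimsFacility,
      ).filteredRosterList(), // a Stream now, not a Future
    ),
  ]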

DynamoDB Mapper Query Doesn't Respect QueryExpression Limit

Imagine the following function, which queries a Global Secondary Index and its associated range key in order to find a limited number of results:
@Override
public List<Statement> getAllStatementsOlderThan(String userId, String startingDate, int limit) {
    if (StringUtils.isNullOrEmpty(startingDate)) {
        startingDate = UTC.now().toString();
    }
    LOG.info("Attempting to find all Statements older than ({})", startingDate);
    Map<String, AttributeValue> eav = Maps.newHashMap();
    eav.put(":userId", new AttributeValue().withS(userId));
    eav.put(":receivedDate", new AttributeValue().withS(startingDate));
    DynamoDBQueryExpression<Statement> queryExpression = new DynamoDBQueryExpression<Statement>()
            .withKeyConditionExpression("userId = :userId and receivedDate < :receivedDate")
            .withExpressionAttributeValues(eav)
            .withIndexName("userId-index")
            .withConsistentRead(false);
    if (limit > 0) {
        queryExpression.setLimit(limit);
    }
    List<Statement> statementResults = mapper.query(Statement.class, queryExpression);
    LOG.info("Successfully retrieved ({}) values", statementResults.size());
    return statementResults;
}
List<Statement> results = statementRepository.getAllStatementsOlderThan(userId, UTC.now().toString(), 5);
assertThat(results.size()).isEqualTo(5); // NEVER passes
The limit isn't respected whenever I query against the database. I always get back all results that match my search criteria, so if I set startingDate to now, I get every item in the database, since they're all older than now.
You should use the queryPage method instead of query.
From the DynamoDBQueryExpression.setLimit documentation:
Sets the maximum number of items to retrieve in each service request to DynamoDB. Note that when calling DynamoDBMapper.query, multiple requests are made to DynamoDB if needed to retrieve the entire result set. Setting this will limit the number of items retrieved by each request, NOT the total number of results that will be retrieved. Use DynamoDBMapper.queryPage to retrieve a single page of items from DynamoDB.
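For illustration, a minimal sketch of the queryPage variant, reusing the queryExpression built in the question (only the mapper call changes):

QueryResultPage<Statement> page = mapper.queryPage(Statement.class, queryExpression);
List<Statement> firstPage = page.getResults();                    // at most `limit` items
Map<String, AttributeValue> lastKey = page.getLastEvaluatedKey(); // null when no further pages exist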
As the answer above rightly says, the setLimit and withLimit methods limit only the number of records fetched in each individual request; internally, multiple requests take place to fetch the full result set.
If you want to limit the number of records fetched across all the requests, you might want to use a Scan.
An example can be found here.

Asynchronous with IndexedDB problems

I am having a problem with a function in IndexedDB where I need to change the status of some meetings. The search feature collects the ID of each checked meeting; right after that, a for() loop walks the vector containing the IDs and performs one database access per iteration, passing that iteration's ID. The following code is an example:
var val = [];
var checkbox = $('input:checkbox[class^=checkReunioes]:checked');
if (checkbox.length > 0) {
    checkbox.each(function() {
        val.push($(this).val());
    });
}
for (var i = 0; i < val.length; i++) {
    var transaction = db.transaction(["tbl_REUNIOES"], "readwrite").objectStore("tbl_REUNIOES");
    var request = transaction.get(val[i]);
    request.onerror = function(event) {
        alert("BAD");
    };
    request.onsuccess = function(event) {
        var data = request.result;
        data.FLG_STATU_REUNI = 'I';
        var codigo_igreja = localStorage.getItem("igreja");
        var dataJSON = JSON.stringify(data);
        enviarFilaSincronismo("tbl_REUNIOES", "U", dataJSON, " WHERE COD_IDENT_REUNI = '" + val[i] + "' and COD_IDENT_IGREJ = '" + codigo_igreja + "'");
        var requestUpdate = transaction.put(data);
        requestUpdate.onerror = function(event) {
            alert("OK");
        };
        requestUpdate.onsuccess = function(event) {
            $("#listReunioes").html("");
            serchAll(w_key_celula);
        };
    };
}
In my view the problem occurs because IndexedDB is asynchronous: it moves on to the next lookup even before the first one has finished.
But how can I handle this?
What is the good practice for a case like this?
If you are inexperienced with writing asynchronous code, a good general rule is to never define functions inside loops; in particular, do not set request.onsuccess to a function from within the for loop.
You can perform multiple get and put requests on the same transaction when you do not expect the individual requests to fail for data-related reasons, such as the violation of a uniqueness constraint of an index, or because you are performing many thousands of requests on the same transaction and reaching processing limits.
You might find that using IDBObjectStore.prototype.openCursor together with IDBCursor.prototype.update is more convenient than using IDBObjectStore.prototype.get and IDBObjectStore.prototype.put.
Your example code assumes that a successful get request means data was retrieved, but that is not what actually happens. A successful get request just means the request completed without errors (e.g. against an object store that exists, against a database that is not blocked by other requests, against a database connection that is still valid). It does not mean that an object matched your query. You should check whether the request's result object is defined, and use that check, not the mere success of the request, to determine whether an object matched your get query.
You might want to spend more time organizing your code into smaller functions that use clearer names. Your example code is difficult to read.
It looks like you are using some type of global db variable. If you are not well experienced with writing asynchronous code, avoid using a global db variable. There is no guarantee the db variable will be defined and open when you decide to access it, which could lead to an unexpected error.
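To make the cursor suggestion concrete, here is a rough sketch under the same assumptions as the question's code (same db handle, object store, and val array of IDs); a single pass updates every checked meeting, and no function is defined inside a loop:

var store = db.transaction(["tbl_REUNIOES"], "readwrite").objectStore("tbl_REUNIOES");
var cursorRequest = store.openCursor();
cursorRequest.onsuccess = function(event) {
    var cursor = event.target.result;
    if (!cursor) {
        return; // iteration finished
    }
    if (val.indexOf(cursor.primaryKey) !== -1) {
        var data = cursor.value;
        data.FLG_STATU_REUNI = 'I';
        cursor.update(data); // put-equivalent, bound to this cursor position
    }
    cursor.continue();
};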
