Azure Durable Fan-Out/Fan-In Scenario with Cosmos Bulk Feature

I have created Azure Durable Functions to read a large volume of data from Azure Cosmos DB, perform some actions on it, and store the results back in a new Cosmos DB container. I'm following the fan-out/fan-in approach shown below:

1. An HTTP request (HttpTrigger function) is triggered by the client via the function URL, which in turn starts an orchestration (OrchestrationTrigger function).
2. The orchestration calls an activity function (GetStoresToBeProcessed) to get all the distinct stores to be processed.
3. The orchestration makes multiple parallel calls to an activity function (DirectToStoreWac) to process the transactional data of each store.
4. It waits for all calculations to complete.
This approach is working fine. I need suggestions on the two points below.

1. The activity function (GetStoresToBeProcessed) returns 3,000+ stores to the orchestrator, and from there 3,000+ tasks are created and awaited. Is there any way I can batch the stores instead of scheduling 3,000+ tasks in one go? (See the sketch after the snippet below.)
// Get all the Stores to be Processed.
int[] stores = await context.CallActivityAsync<int[]>(FuncConstants.GetStoresToBeProcessed, null);

var parallelTasks = new List<Task<int>>();
foreach (var store in stores)
{
    Task<int> task = context.CallActivityAsync<int>(FuncConstants.DirectToStoreWac, store);
    parallelTasks.Add(task);
}

// Wait for all results to come back.
await Task.WhenAll(parallelTasks);
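One way to batch the fan-out, sketched below, is to schedule the activity calls in fixed-size chunks and await each chunk before starting the next. This is only a sketch against the orchestrator code above; batchSize is an illustrative number, and System.Linq is assumed:

const int batchSize = 100; // illustrative: tune against your throughput and RU budget

int[] stores = await context.CallActivityAsync<int[]>(FuncConstants.GetStoresToBeProcessed, null);

int totalRecordsProcessed = 0;
for (int offset = 0; offset < stores.Length; offset += batchSize)
{
    // Schedule one batch of activity calls at a time.
    var batchTasks = stores
        .Skip(offset)
        .Take(batchSize)
        .Select(store => context.CallActivityAsync<int>(FuncConstants.DirectToStoreWac, store))
        .ToList();

    // Each WhenAll completes before the next batch is scheduled,
    // capping the number of in-flight activities.
    int[] results = await Task.WhenAll(batchTasks);
    totalRecordsProcessed += results.Sum();
}

The trade-off is reduced parallelism: a single slow store holds up its whole batch, so if store sizes vary a lot, a concurrency gate rather than strict waves may be worth considering.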
2. Each activity call (DirectToStoreWac) operates on a single store's data (approximately 80-90K records), and internally _calculationService.Calculate(storeId) calls the Cosmos bulk feature via await _wacDirectToStoreCosmosRepository.AddBulkAsync(list). Is it a correct approach to await the Cosmos bulk tasks of a store inside the outer await over the 3,000 store calls?
[FunctionName(FuncConstants.DirectToStoreWac)]
public async Task<int> RunAsync([ActivityTrigger] int storeId, ILogger log)
{
    log.LogInformation($"The Store ({storeId}) has been Queued for Processing at {DateTime.UtcNow}");

    int totalRecordsProcessed = 0;
    await _calculationService.Calculate(storeId).ContinueWith(t =>
    {
        if (t.IsCompletedSuccessfully)
        {
            totalRecordsProcessed += t.Result;
        }
        else
        {
            log.LogError(t.Exception, $"[{FuncConstants.DirectToStoreWac}] - Error occurred while processing Function");
        }
    });

    return totalRecordsProcessed;
}
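As a side note, the ContinueWith above swallows the failure path: when Calculate faults, the activity logs the error but still returns 0 as if it had succeeded, so the orchestrator never sees the exception. A minimal alternative sketch, assuming the same _calculationService dependency, is to await directly and let failures propagate (the orchestrator can then retry the activity via CallActivityWithRetryAsync):

[FunctionName(FuncConstants.DirectToStoreWac)]
public async Task<int> RunAsync([ActivityTrigger] int storeId, ILogger log)
{
    log.LogInformation($"The Store ({storeId}) has been Queued for Processing at {DateTime.UtcNow}");
    try
    {
        // Awaiting directly propagates failures to the Durable runtime.
        return await _calculationService.Calculate(storeId);
    }
    catch (Exception ex)
    {
        log.LogError(ex, $"[{FuncConstants.DirectToStoreWac}] - Error occurred while processing store {storeId}");
        throw;
    }
}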

Related

Waiting for multiple external events in Durable Azure Functions and acting on them as they occur

I'm using Durable Functions in a process that collects data from 12 different people. For this I chose the WaitForExternalEvent method. I need to be notified of those external events as they happen, and all events must be received within the next 1h.
I have created the following orchestration, but its behaviour is odd: it neither completes nor fails. I am using Durable Functions Monitor (dfMon) to inspect the logs. As you can see in the execution history, all 12 events are in fact received (before the 1h timeout). Still, the orchestrator:
- didn't execute the Fxbm_notifyOfNewReport activity function after each received event
- didn't exit the while loop after all 12 events
Also, more than 1h has elapsed and every timeout timer has fired, yet no exception was thrown and the orchestration is still in a Running state.
I took inspiration for this from this doc and this blog.
Does someone know why I am not seeing the expected behaviour?
public class Fxbm_workflow
{
    [FunctionName(nameof(Fxbm_workflow))]
    public async Task Run([OrchestrationTrigger] IDurableOrchestrationContext ctx, ILogger log)
    {
        var id = ctx.InstanceId;
        var trigger = ctx.GetInput<Trigger<OrchestrationInput2>>();

        // Sub-orchestration to distribute data to people.
        // The return value is int[]; these are the ids of the people.
        var input = (trigger, id);
        var childWorkflowId = $"{id}_{nameof(Fxbm_prep_workflow)}";
        var requiredCompanies = await ctx.CallSubOrchestratorAsync<int[]>(nameof(Fxbm_prep_workflow), instanceId: childWorkflowId, input: input);

        // For every distributed data package, a string response is expected within 1h at the latest.
        // (The lambda parameter is named companyId to avoid shadowing the outer 'id' local, which would not compile.)
        var expiresIn = TimeSpan.FromHours(1);
        var responseTasks = requiredCompanies.Select(companyId => ctx.WaitForExternalEvent<string>(companyId.ToString(), expiresIn)).ToList();

        // All tasks need to complete, or a timeout exception must be thrown.
        // I want to be notified of responses as they come in,
        // therefore Task.WhenAll() is not suitable.
        while (responseTasks.Any())
        {
            try
            {
                // Handle responses as they occur.
                var receivedResponse = await Task.WhenAny(responseTasks);
                responseTasks.Remove(receivedResponse);
                var stringResponse = await receivedResponse;

                // Notify via mail.
                await ctx.CallActivityAsync(nameof(Fxbm_notifyOfNewReport), stringResponse);
            }
            catch (TimeoutException)
            {
                // break;
                throw;
            }
        }
    }
}
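For comparison, the Durable Functions documentation implements explicit timeouts by racing a durable timer against the awaited work with Task.WhenAny. A hedged sketch of that shape for this orchestrator (not the original author's code; it drops the per-event timeout overload in favour of one shared timer):

var cts = new CancellationTokenSource();
var deadline = ctx.CurrentUtcDateTime.Add(expiresIn);
var timeoutTask = ctx.CreateTimer(deadline, cts.Token);

// Events are created WITHOUT the per-event timeout overload here.
var responseTasks = requiredCompanies
    .Select(companyId => ctx.WaitForExternalEvent<string>(companyId.ToString()))
    .ToList();

while (responseTasks.Any())
{
    var nextResponse = Task.WhenAny(responseTasks);
    var winner = await Task.WhenAny(nextResponse, timeoutTask);
    if (winner == timeoutTask)
    {
        throw new TimeoutException("Not all responses arrived within the deadline.");
    }

    var receivedResponse = await nextResponse;
    responseTasks.Remove(receivedResponse);
    await ctx.CallActivityAsync(nameof(Fxbm_notifyOfNewReport), await receivedResponse);
}

// All responses arrived; cancel the timer so the orchestration can complete.
cts.Cancel();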

Consecutive transactions

I was trying to simulate a situation where two users (on separate devices) both run a transaction at the same time. To imitate this, I made a List<String> of strings that would be added to the database without a delay between them.
However, only the first item in the list was added to the database; the second never arrived. What am I doing wrong? I am trying to have both items added to the database.
The call to the Transaction happens in the code below, along with the creation of the list:
List<String> items = new List<String>();
items.add("A test String 1");
items.add("A test String 2");
for (String q in items) // iterate the list declared above
{
  database.updateDoc( q );
}
The code I use for updating the data in my database:
void updateDoc( String item ) async
{
  var data = new Map<String, dynamic>();
  data['item'] = item;

  Firestore.instance.runTransaction((Transaction transaction) async {
    /// Generate a unique ID
    String uniqueID = await _generateUniqueQuestionCode();

    /// Update the existing list
    DocumentReference docRef = Firestore.instance
        .collection("questions")
        .document("questionList");

    List<String> questions;
    await transaction.get(docRef).then(
      (document) {
        /// Convert List<dynamic> to List<String>
        List<dynamic> existing = document.data['questions'];
        questions = existing.cast<String>().toList();
      }
    );

    if ( ! questions.contains(uniqueID) )
    {
      questions.add( uniqueID );
      var newData = new Map<String, dynamic>();
      newData['questions'] = questions;
      transaction.set(docRef, newData);
    }

    /// Save the question
    docRef = Firestore.instance
        .collection("questions")
        .document( uniqueID );
    transaction.set(docRef, data);
  });
}
In reality, I have a few more fields in the document I'm saving, but they would only complicate the code.
I keep track of a list of documents because I need to be able to retrieve a random document from the database.
When executing the first code snippet, only the first item in the list is added to the database and to the list that keeps track of the documents.
No error is thrown in the debug screen.
What am I missing here?
As explained by Doug Stevenson in the comments under my question:

That's not a typical use case for a single app instance. If you're trying to find out if transactions work, be assured that they do. They're meant to defend against cases where multiple apps or processes are making changes to a document, not a single app instance.

And also:

The way the Firestore SDK works is that it keeps a single connection open and pipelines each request through that connection. When there are multiple clients, you have multiple connections, and each request can hit the service at a different time. I'd suspect that what you're trying to simulate isn't really close to the real thing.

Change feed not triggered by multiple document inserts in Cosmos DB

I replicated the architecture below: I insert GPS positions as documents into Cosmos DB, and in the JavaScript client (Google Maps) the pin moves.
All the steps work: the document insert, the Azure Function trigger, and the SignalR link between the client and Cosmos DB.
The code to upload a document to Cosmos:
Microsoft.Azure.Documents.Document doc = client.CreateDocumentAsync(UriFactory.CreateDocumentCollectionUri(databaseName, collectionName), estimatedPathDocument).Result.Resource;
ret[0] = doc.Id;
Azure function:
public static async Task Run(IReadOnlyList<Document> input, IAsyncCollector<SignalRMessage> signalRMessages, ILogger log)
{
    if (input != null && input.Count > 0)
    {
        var val = input.Select((d) => new
        {
            genKey = d.GetPropertyValue<string>("genKey"),
            dataType = d.GetPropertyValue<string>("dataType")
        });

        await signalRMessages.AddAsync(new SignalRMessage
        {
            UserId = val.First().genKey,
            Target = "tripUpdated",
            Arguments = new[] { input }
        });
    }
}
When I insert only one position, the Azure Function records the event and fires, moving the pin.
The problem is when I insert a series of positions sequentially, almost instantaneously: the function is not triggered for any document after the first one.
Only if I insert a delay do some of the documents fire the trigger:

Microsoft.Azure.Documents.Document doc = client.CreateDocumentAsync(UriFactory.CreateDocumentCollectionUri(databaseName, collectionName), estimatedPathDocument).Result.Resource;
Thread.Sleep(3000);
ret[0] = doc.Id;

I don't know if I'm loading the documents correctly, but even when managing them asynchronously (see below), it almost seems as if the trigger only fires once the document in Cosmos DB is "really/physically" created.

Task.Run(async () => await AzureCosmosDB_class.MyDocumentAzureCosmosDB.CreateRealCoordDocumentIfNotExists_v1("axylog-cdb-01", "axylog-collection-01", realCoord, uri, key));

Could the solution be to put the documents in a queue and load them into Cosmos sequentially, with a delay of about ten seconds between each?
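As a side observation, neither snippet awaits the insert: the first blocks on .Result and the second fires Task.Run without observing the returned task, so failures and ordering are invisible to the caller. A minimal sketch of awaiting each insert with the same DocumentClient API (coords and ids are illustrative names, not from the question):

var collectionUri = UriFactory.CreateDocumentCollectionUri(databaseName, collectionName);

foreach (var coord in coords) // illustrative: the GPS positions to upload
{
    // Await each insert so the document is committed (and visible to the
    // change feed) before the next request is sent, and so exceptions surface here.
    var response = await client.CreateDocumentAsync(collectionUri, coord);
    ids.Add(response.Resource.Id);
}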

Best practice for long-running SQL queries in ASP.NET MVC

I have an action method that needs to complete 15~52 long-running SQL queries (all of them similar, each taking more than 5 seconds to complete) according to user-selected dates.
After doing a lot of research, it seems the best way to do this without blocking ASP.NET threads is to use async/await tasks for the SQL queries:
[HttpPost]
public async Task<JsonResult> Action() {
    // initialization stuff

    // create tasks to run async SQL queries
    ConcurrentBag<Tuple<DateTime, List<long>>> weeklyObsIdBag =
        new ConcurrentBag<Tuple<DateTime, List<long>>>();
    Task[] taskList = new Task[reportDates.Count()];
    int idx = 0;
    foreach (var reportDate in reportDates) { //15 <= reportDates.Count() <= 52
        var task = Task.Run(async () => {
            using (var sioDbContext = new SioDbContext()) {
                var historyEntryQueryable = sioDbContext.HistoryEntries
                    .AsNoTracking()
                    .AsQueryable<HistoryEntry>();
                var obsIdList = await getObsIdListAsync(
                    historyEntryQueryable,
                    reportDate
                );
                weeklyObsIdBag.Add(new Tuple<DateTime, List<long>>(reportDate, obsIdList));
            }
        });
        taskList[idx++] = task;
    }

    // wait for all the tasks to complete
    await Task.WhenAll(taskList);

    // consume the results from the long-running SQL queries,
    // which are stored in weeklyObsIdBag
}
private async Task<List<long>> getObsIdListAsync(
    IQueryable<HistoryEntry> historyEntryQueryable,
    DateTime reportDate
) {
    // apply the reportDate condition to historyEntryQueryable
    // run the async query
    List<long> obsIdList = await historyEntryQueryable.Select(he => he.ObjectId)
        .Distinct()
        .ToListAsync()
        .ConfigureAwait(false);
    return obsIdList;
}
After making this change, the time taken to complete this action is greatly reduced, since I can now execute multiple (15~52) async SQL queries simultaneously and await their completion rather than running them sequentially. However, users started to experience lots of timeout issues, such as (from the Elmah error log):

"Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached."

"The wait operation timed out"

Is this caused by thread starvation? I have a feeling that I might be using too many threads from the thread pool, but I thought that shouldn't be a problem because I used async/await to prevent the threads from being blocked.
If things don't work this way, what is the best practice for executing multiple long-running SQL queries?
Consider limiting the number of concurrent tasks being executed, for example:
int concurrentTasksLimit = 5;
List<Task> taskList = new List<Task>();
foreach (var reportDate in reportDates) { //15 <= reportDates.Count() <= 52
    var task = Task.Run(async () => {
        using (var sioDbContext = new SioDbContext()) {
            var historyEntryQueryable = sioDbContext.HistoryEntries
                .AsNoTracking()
                .AsQueryable<HistoryEntry>();
            var obsIdList = await getObsIdListAsync(
                historyEntryQueryable,
                reportDate
            );
            weeklyObsIdBag.Add(new Tuple<DateTime, List<long>>(reportDate, obsIdList));
        }
    });

    taskList.Add(task);

    if (concurrentTasksLimit == taskList.Count)
    {
        await Task.WhenAll(taskList);
        // before clearing the list, you should get the results and store them
        // in memory (e.g. another list) for later usage...
        taskList.Clear();
    }
}

// await the remaining tasks to complete
if (taskList.Any())
    await Task.WhenAll(taskList);
Note that I changed your taskList to an actual List<Task>; it just seems easier to use, since we need to add and remove tasks from the list.
Also, you should collect the results before clearing taskList, since you are going to use them later.
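An alternative to draining the list in fixed waves is a SemaphoreSlim gate, which keeps a steady number of queries in flight: as soon as one finishes, the next starts. A sketch under the same assumptions as the answer above (not part of the original answer):

var throttle = new SemaphoreSlim(concurrentTasksLimit); // at most N queries in flight
var tasks = reportDates.Select(async reportDate =>
{
    await throttle.WaitAsync();
    try
    {
        using (var sioDbContext = new SioDbContext())
        {
            var historyEntryQueryable = sioDbContext.HistoryEntries
                .AsNoTracking()
                .AsQueryable<HistoryEntry>();
            var obsIdList = await getObsIdListAsync(historyEntryQueryable, reportDate);
            weeklyObsIdBag.Add(new Tuple<DateTime, List<long>>(reportDate, obsIdList));
        }
    }
    finally
    {
        throttle.Release(); // let the next queued query start immediately
    }
}).ToList();

await Task.WhenAll(tasks);

This also drops Task.Run entirely: the query is I/O-bound, so the async EF call does not need a dedicated thread-pool thread, and the number of open connections stays bounded by the semaphore.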

Scatter / Gather using Rebus

I have a requirement to batch a number of web service calls on receipt of a single message appearing in an (MSMQ) queue.
Are sagas the way to go?
The interaction with the third-party web service is further complicated because I need to call it once, then poll for an acknowledgement using a correlation id returned in the reply to the initial call.
Yes, sagas could be the way to coordinate the process of initiating the call, polling until the operation has ended, and then doing something else when all the work is done.
If you don't care too much about accidentally making the web service call more than once, you can easily use Rebus' async capabilities to implement the polling - I am currently in the process of building something that basically does this:
public async Task<SomeReply> Handle(SomeMessage message)
{
    var response = await _client.Get<SomeResponse>("https://someurl");

    var pollUrl = response.PollUrl;
    var resultUrl = response.ResultUrl;

    while (true)
    {
        var result = await _client.Get<PollResult>(pollUrl);

        if (result.Status == PollStatus.Processing)
        {
            await Task.Delay(TimeSpan.FromSeconds(2));
            continue;
        }

        if (result.Status == PollStatus.Done)
        {
            var finalResult = await _client.Get<FinalResult>(resultUrl);
            return new SomeReply(finalResult);
        }

        throw new Exception($"Unexpected status while polling {pollUrl}: {result.Status}");
    }
}
thus taking advantage of async/await to poll the external web service while it is processing, while consuming minimal resources on our end.
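If holding a handler open for the whole polling loop is a concern, Rebus' bus.Defer offers another shape: handle one poll per message and, while the work is still running, re-schedule the same message a couple of seconds into the future. A hedged sketch (PollStatusMessage, PollCompleted, _client, and _bus are illustrative names, not from the answer above):

public async Task Handle(PollStatusMessage message)
{
    var result = await _client.Get<PollResult>(message.PollUrl);

    if (result.Status == PollStatus.Processing)
    {
        // Not done yet: have Rebus deliver this same message back to us in
        // 2 seconds, freeing the worker in the meantime.
        await _bus.Defer(TimeSpan.FromSeconds(2), message);
        return;
    }

    if (result.Status == PollStatus.Done)
    {
        var finalResult = await _client.Get<FinalResult>(message.ResultUrl);
        // Hand the result off; publishing an event is one option.
        await _bus.Publish(new PollCompleted(finalResult));
        return;
    }

    throw new Exception($"Unexpected status while polling {message.PollUrl}: {result.Status}");
}

The trade-off is that state has to travel in the message itself (here the poll and result URLs), since nothing is held in memory between polls.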
