Using ASP.NET Core on .NET 5, running on Windows.
Users upload large workbooks that need to be converted to a different format. Each conversion process is CPU intensive and takes around a minute to complete.
The idea is to use a pattern where the requests are queued in a background queue and then processed by background tasks.
So, I followed this Microsoft article.
The queuing part worked well, but the issue was that workbooks were executing sequentially in the background:
private async Task BackgroundProcessing(CancellationToken stoppingToken)
{
    while (!stoppingToken.IsCancellationRequested)
    {
        var workItem = await TaskQueue.DequeueAsync(stoppingToken);

        try
        {
            await workItem(stoppingToken);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex,
                "Error occurred executing {WorkItem}.", nameof(workItem));
        }
    }
}
If I queued 10 workbooks, workbook 2 wouldn't start until workbook 1 was done, workbook 3 wouldn't start until workbook 2 was done, and so on.
So, I modified the code to run the work items without await and hid the compiler warning with the discard operator (note that workItem is now an Action<CancellationToken>, not a Task-returning delegate):
while (!stoppingToken.IsCancellationRequested)
{
    var workItem = await TaskQueue.DequeueAsync(stoppingToken);

    _ = Task.Factory.StartNew(() =>
    {
        try
        {
            workItem(stoppingToken);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Error occurred executing {WorkItem}.", nameof(workItem));
        }
    }, TaskCreationOptions.LongRunning);
}
That works: all workbooks start processing at around the same time, and then they complete at around the same time too. But I am not sure whether doing this is dangerous and can lead to bugs, crashes, etc.
Is the second version a workable solution, or will it lead to some disaster in the future? Is there a better way to implement parallel workloads on background threads in ASP.NET?
Thanks.
Using an external queue has some advantages over in-memory queueing. In particular, the queue messages are stored in a reliable external store with features around retries, multiple consumers, etc. If your app crashes, the queue item remains and can be tried again.
In Azure, you can use several services including Azure Storage Queues and Service Bus. I like Service Bus because it uses push-based behavior to avoid the need for a polling loop in your code. Either way, you can create an instance of IHostedService that will watch the queue and process the work items in a separate thread with configurable parallelization.
Look for examples of using these within ASP.NET Core, for example:
https://damienbod.com/2019/04/23/using-azure-service-bus-queues-with-asp-net-core-services/
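To sketch the idea (not a drop-in implementation; the queue name, connection string, and the per-message conversion are placeholders), a hosted service built on the Azure.Messaging.ServiceBus processor might look like this, with MaxConcurrentCalls controlling how many messages are processed in parallel:

using System;
using System.Threading;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;
using Microsoft.Extensions.Hosting;

public class QueueWorker : IHostedService
{
    private readonly ServiceBusClient _client;
    private readonly ServiceBusProcessor _processor;

    public QueueWorker()
    {
        _client = new ServiceBusClient("<connection-string>"); // placeholder
        _processor = _client.CreateProcessor("workbooks", new ServiceBusProcessorOptions
        {
            MaxConcurrentCalls = 4,        // degree of parallelism
            AutoCompleteMessages = false   // complete explicitly after the work succeeds
        });

        _processor.ProcessMessageAsync += async args =>
        {
            // The CPU-bound conversion for one workbook would go here.
            await args.CompleteMessageAsync(args.Message);
        };

        _processor.ProcessErrorAsync += args =>
        {
            // Log args.Exception; the message is retried per the queue's policy.
            return Task.CompletedTask;
        };
    }

    public Task StartAsync(CancellationToken ct) => _processor.StartProcessingAsync(ct);
    public Task StopAsync(CancellationToken ct) => _processor.StopProcessingAsync(ct);
}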
The idea is to use a pattern where the requests are queued in a background queue and then processed by background tasks.
The proper solution for request-extrinsic code is to use a durable queue with a separate backend processor. Any in-memory solution will lose that work any time the application is shut down (e.g., during a rolling upgrade).
According to
If async-await doesn't create any additional threads, then how does it make applications responsive?
a C# task executed by await ... doesn't create a separate thread for the target Task. However, I observed that such a task is not always executed on the same thread; it can switch threads.
I still do not understand what's going on.
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

public class TestProgram
{
    private static async Task HandleClient(TcpClient clt)
    {
        using NetworkStream ns = clt.GetStream();
        using StreamReader sr = new StreamReader(ns);
        while (true)
        {
            string msg = await sr.ReadLineAsync();
            if (msg == null) break; // connection closed
            Console.WriteLine($"Received in {System.Threading.Thread.CurrentThread.ManagedThreadId} :({msg.Length} bytes):\n{msg}");
        }
    }

    private static async Task AcceptConnections(int port)
    {
        TcpListener listener = new TcpListener(IPAddress.Parse("127.0.0.1"), port);
        listener.Start();
        while (true)
        {
            var client = await listener.AcceptTcpClientAsync().ConfigureAwait(false);
            Console.WriteLine($"Accepted connection for port {port}");
            var task = HandleClient(client);
        }
    }

    public static async Task Main(string[] args)
    {
        var task1 = AcceptConnections(5000);
        var task2 = AcceptConnections(5001);
        await Task.WhenAll(task1, task2).ConfigureAwait(false);
    }
}
This example code creates two listeners, for ports 5000 and 5001. Each of them can accept multiple connections and read independently from the socket created.
Maybe it is not "nice", but it works, and I observed that messages received from different sockets are sometimes handled on the same thread, and that the thread used for execution even changes.
Accepted connection for port 5000
Accepted connection for port 5000
Accepted connection for port 5001
Received new message in 5 :(17 bytes):
Port-5000 Message from socket-1
Received new message in 7 :(18 bytes):
Port-5000 Message from socket-1
Received new message in 7 :(18 bytes):
Port-5000 Message from socket-1
Received new message in 7 :(20 bytes):
Port-5000 Message from socket-2
Received new message in 7 :(18 bytes):
Port-5000 Message from socket-2
Received new message in 7 :(18 bytes):
Port-5001 Message from socket-3
Received new message in 8 :(17 bytes):
Port-5001 Message from socket-3
(text manually edited for clarity; the byte lengths are not valid)
If there is heavy load (I didn't test it yet), how many threads would be involved in executing those parallel tasks? I have heard about a thread pool, but do not know how to influence it.
Or is it totally wrong to ask that, and I do not have to care at all about which particular thread is used and how many of them are involved?
a C# task executed by await ... doesn't create a separate thread for the target Task.
One important correction: a task is not "executed" by await. Asynchronous tasks are already in progress by the time they're returned. await is used by the consuming code to perform an "asynchronous wait"; i.e., it pauses the current method and resumes it when that task has completed.
However, I observed that such a task is not always executed on the same thread; it can switch threads.
I observed that messages received from different sockets are sometimes handled on the same thread, and that the thread used for execution even changes.
The task isn't "executed" anywhere. But the code in the async method does have to run, and it has to run on a thread. await captures a "context" when it pauses the method, and when the task completes it uses that context to resume executing the method. Console apps don't have a context, so the method resumes on any available thread pool thread.
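A small console experiment makes this visible: the thread ID printed after the await will often differ from the one printed before it, because there is no context to return to.

using System;
using System.Threading;
using System.Threading.Tasks;

class ContextDemo
{
    static async Task Main()
    {
        Console.WriteLine($"Before await: thread {Thread.CurrentThread.ManagedThreadId}");
        await Task.Delay(1000); // no SynchronizationContext in a console app
        // The method resumes on whichever thread pool thread is available.
        Console.WriteLine($"After await:  thread {Thread.CurrentThread.ManagedThreadId}");
    }
}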
If there is heavy load (I didn't test it yet), how many threads would be involved in executing those parallel tasks? I have heard about a thread pool, but do not know how to influence it.
Or is it totally wrong to ask that, and I do not have to care at all about which particular thread is used and how many of them are involved?
You usually do not have to know; as long as your code isn't blocking thread pool threads you're generally fine. It's important to note that zero threads are being used while doing I/O, e.g., while listening/accepting a new TCP socket. There's no thread being blocked there. Thread pool threads are only borrowed when they're needed.
For the most part, you don't have to worry about it. But if you need to, the thread pool has several knobs for tweaking.
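For example, you can inspect and raise the pool's minimums (a sketch; the numbers are arbitrary):

using System;
using System.Threading;

class PoolInfo
{
    static void Main()
    {
        ThreadPool.GetMinThreads(out int worker, out int io);
        Console.WriteLine($"Min worker threads: {worker}, min I/O threads: {io}");

        // Raising the minimum lets bursts of work start immediately instead of
        // waiting for the pool to inject new threads gradually.
        ThreadPool.SetMinThreads(workerThreads: 32, completionPortThreads: io);
    }
}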
All,
I am using the Change Feed Processor Library. I want to know the best way to handle service failures, along with exception/error scenarios, in the ProcessChangesAsync method. Below are the events I am referring to.
1) Service failure - The service hosting the processor library crashes in the middle of some operation. How do I restart the process from the same document (the doc at the failure instance)? Is there any built-in mechanism where the change feed will start with the last failed documents? E.g., let's assume the current batch has 10 docs; 5 are processed successfully and then the service breaks because of a network failure or some other reason. Will my process start with the 6th document once the service is restarted? How do I achieve this?
2) Exceptions and errors - Any errors in the ProcessChangesAsync method can be handled using try/catch at the global level, but how do I persist those failure records and make them available for the next batch? Again, I'm looking for any built-in mechanism available in the change feed processor.
1) The Processor Library, by default, checkpoints after a successful run of ProcessChangesAsync. In the latest library version, you can customize the checkpointer to do manual checkpoints in case you need to (see the sketch below). If for some reason the processor shuts down before checkpointing, then it will next start processing from the last successful checkpoint stored in the leases collection. In your case, it will start with the first document again, so you will never lose a change, but you could experience double processing (this is an "at least once" model).
2) There is no built-in mechanism that you can leverage; handling exceptions within ProcessChangesAsync is your responsibility. You could not only add a global try/catch but also, in case you are looping over the documents, add a try/catch inside the loop to handle a failing document (maybe send it to a queue for later analysis/post-processing) without losing the batch. If you require logging for those errors (I'm assuming that's what you mean by persisting errors?), then the latest version is compatible with LibLog, so plugging in your own custom logging is as simple as:
using Microsoft.Azure.Documents.ChangeFeedProcessor.Logging;

var hostName = "SampleHost";
var tracelogProvider = new TraceLogProvider(); // You can use any provider supported by LibLog
using (tracelogProvider.OpenNestedContext(hostName))
{
    LogProvider.SetCurrentLogProvider(tracelogProvider);
    // After this, create the IChangeFeedProcessor instance and start/stop it.
}
Source
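As a rough sketch of the manual checkpointing option (this assumes the v2 library, where ChangeFeedProcessorOptions.CheckpointFrequency.ExplicitCheckpoint can be set to true and the observer context exposes CheckpointAsync):

public async Task ProcessChangesAsync(IChangeFeedObserverContext context, IReadOnlyList<Document> documents, CancellationToken cancellationToken)
{
    foreach (var document in documents)
    {
        // Process the document...
    }

    // With explicit checkpointing enabled, record progress only after
    // the whole batch has succeeded.
    await context.CheckpointAsync();
}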
Extra info for the comments
To avoid exceptions halting the batch or causing a batch to be reprocessed, you can have handling like this:
public async Task ProcessChangesAsync(IChangeFeedObserverContext context, IReadOnlyList<Document> documents, CancellationToken cancellationToken)
{
    try
    {
        foreach (var document in documents)
        {
            try
            {
                // Do your work for the document
            }
            catch (Exception ex)
            {
                // Something happened with the current document: handle it, send it to a
                // queue / other storage to analyze, log it. This catch lets the loop
                // continue with the next document.
            }
        }
    }
    catch (Exception ex)
    {
        // Something unhandled happened; log it and avoid rethrowing so the next batch is processed
    }
}
I am working on an ASP.NET MVC 5 web application, deployed inside IIS 8, and I have a method inside my application that performs a long-running task which mainly scans our network for servers & VMs and updates our database with the scan results. Method execution might last between 30 and 40 minutes on the production environment. I am using a scheduling tool named Hangfire which will call this method twice a day.
Here is the job definition inside the Startup.cs file, which will call the method at 8:01 AM & 8:01 PM:
public void Configuration(IAppBuilder app)
{
    var options = new SqlServerStorageOptions
    {
        PrepareSchemaIfNecessary = false
    };
    GlobalConfiguration.Configuration.UseSqlServerStorage("scanservice", options);

    var ss = new Service(); // instance holding the Scan() method
    RecurringJob.AddOrUpdate(() => ss.Scan(), "01 8,20 * * *");
}
And here is the method which is called twice a day by the scheduling tool:
public void Scan()
{
    Service ss = new Service();
    ss.NetworkScan().Wait();
}
Finally, the method which does the real scan (I only provide a high-level description of what the method will do):
public async Task<ScanResult> NetworkScan()
{
    // Retrieve the server info from the DB.
    // Loop over all servers, then execute some PowerShell commands to scan the
    // network and retrieve the info for each server, one by one...
    // After the shell command completes for each server, update the related server info in the DB.
}
Currently I did some tests on our test environment and everything worked well; the scan took around 25 seconds for 2 test servers. But now we are planning to move the application to production, and we have around 120+ servers to scan, so I estimate the method execution will take around 30-40 minutes to complete on the production environment. So my question is: how can I make sure this execution never expires, so the NetworkScan() method runs to completion?
Instead of worrying about your task timing out, perhaps you could start a new task for each server. This way each task will be very short-lived, and any exceptions caused by scanning a single server will not affect all the others. Additionally, if your application is restarted in IIS, any scans which were not yet completed will be resumed. With all scans happening in one sequential task this is not possible. You will likely also see the total time to complete a scan of your entire network plummet, as the majority of time is likely spent waiting on remote servers.
public void Scan()
{
    Service ss = new Service();
    foreach (var server in ss.GetServers())
    {
        BackgroundJob.Enqueue<Service>(s => s.ServerScan(server));
    }
}
Now your scheduled task will simply enqueue one new job for each server.
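For completeness, ServerScan is a hypothetical per-server method you would add to Service; a minimal sketch (ServerInfo and ScanSingleServerAsync are placeholders for your own types) could be:

// Hypothetical method on Service; one Hangfire job scans one server.
public void ServerScan(ServerInfo server)
{
    // Run the PowerShell commands for this single server and update
    // its row in the database. Blocking here is acceptable: the job
    // runs on a Hangfire worker thread, not a request thread.
    ScanSingleServerAsync(server).Wait();
}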
I have looked at the documentation for both the synchronous and asynchronous approaches of the QuickBooks Online API V3. Both allow the creation of a data object, the adding of requests to a batch operation, and the execution of the batch. Both documents state:
"Batch items are executed sequentially in the order specified in the
request..."
This confuses me because I don't understand how asynchronous processing is allowed if the batch process executes each batch operation sequentially.
The documentation for asynchronous processing states at the top:
"To asynchronously access multiple data objects in a single request..."
I don't understand how this can occur if batch operations are executed sequentially within a batch process request.
Would someone kindly clarify?
In an async call (from the devkit), the calling thread doesn't wait for the response from the service. You can associate a handler which will take care of that.
For example:
public void asyncAddAccount() throws FMSException, Exception {
    Account accountIn = accountHelper.getBankAccountFields();
    try {
        service.addAsync(accountIn, new CallbackHandler() {
            @Override
            public void execute(CallbackMessage callbackMessage) {
                callbackMessageResult = callbackMessage;
                lock_add.countDown();
            }
        });
    } catch (FMSException e) {
        Assert.assertTrue(false, e.getMessage());
    }
    lock_add.await();

    Account accountOut = (Account) callbackMessageResult.getEntity();
    Assert.assertNotNull(accountOut);
    accountHelper.verifyAccountFields(accountIn, accountOut);
}
The server always executes the requests sequentially.
In a batch, if you specify multiple operations, the server will execute them sequentially (top-down). The asynchrony is on the client side: your thread is not blocked while the batch executes on the server.
Thanks
I have a web service that can be broken down into two main sections:
[WebMethod]
public void MyServiceCall()
{
    // Do stuff the client cares about

    // Do stuff I care about
}
What I'd like to do is run that second part on another thread, so that the client isn't waiting on it: once the user's logic has completed, send them their information immediately, but continue processing the stuff I care about (logging, etc.).
From a web service, what is the recommended way of running that 2nd piece asynchronously, to get the user back their information as quickly as possible? BackgroundWorker? QueueUserWorkItem?
You may want to look into Tasks, which are new in .NET 4.0.
They let you kick off an asynchronous operation, but also give you an easy way to check later whether it's done.
var task = Task.Factory.StartNew(() => DoSomeWork());
It'll kick off DoSomeWork() and continue without waiting, so you can carry on with your other processing. When you get to the point where you don't want to continue until your asynchronous task has finished, you can call:
task.Wait();
This will wait there until the task has completed. If you want to get a result back from the task, you can do this:
var task = Task.Factory.StartNew(() =>
{
    Thread.Sleep(3000);
    return "dummy value";
});

Console.WriteLine(task.Result);
A call to task.Result blocks until the result is available.
Here's a tutorial that covers Tasks in greater detail: http://www.codethinked.com/net-40-and-systemthreadingtasks
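One caveat for the fire-and-forget part of your question: if the background work throws, nothing observes the exception. A continuation that runs only on failure is a cheap way to log it (a sketch; DoStuffICareAbout and LogError are placeholders):

Task.Factory.StartNew(() => DoStuffICareAbout())
    .ContinueWith(t => LogError(t.Exception), // t.Exception is an AggregateException
                  TaskContinuationOptions.OnlyOnFaulted);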
The easiest way to fire off a new thread is probably:
new Thread(() =>
{
    // do whatever you want here
}).Start();
Be careful, though: if the service is hit very frequently, you could wind up creating a lot of threads, which is expensive and can degrade the whole application's performance.
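If you'd rather reuse pooled threads than create one per request, ThreadPool.QueueUserWorkItem (which you mentioned) avoids that cost, for example:

ThreadPool.QueueUserWorkItem(_ =>
{
    // do the logging / other stuff you care about here
});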