I have a few microservices (AWS Lambda) which when started will authenticate with Firebase, change state in an appropriate manner, and then stop execution.
I have found out tonight though that I must not be cleaning up after myself as I have 100 concurrent users and it's only me doing some testing work (albeit testing authentication).
I've seen references to the goOffline() method and possibly could use that but it seems a little hacky. Are there any good examples of how one can ensure that you're closing and/or reusing an existing connection?
When spinning up server-side processes that use Firebase it's vital that you make sure the process is completed when killed.
If the process is killed, then the connection will get killed as well. But, there's nothing hacky about calling Firebase.goOffline(), as it will close the connection.
Related
I have a busy ASP.NET 5 Core app (thousands of requests per second) that uses SQL Server. Recently we decided to try to switch some hot code paths to async database access and... the app didn't even start. I get this error:
The timeout period elapsed prior to obtaining a connection from the
pool. This may have occurred because all pooled connections were in
use and max pool size was reached.
And I see the number of threads in the thread pool growing to 40... 50... 100...
The code pattern we use is fairly simple:
using (var cn = new SqlConnection(connenctionStrng))
{
cn.Open();
var data = await cn.QueryAsync("SELECT x FROM Stuff WHERE id=#id"); //QueryAsync is from Dapper
}
I made a process dump and all threads are stuck on the cn.Open() line, just sitting there and waiting.
This mostly happens during application "recycles" on IIS, when the app process is restarted and HTTP requests are queued from one process to another. Resulting in tens of thousands requests in the queue, that need to be processed.
Well, yeah,, I get it. I think I know what's happening. async makes the app scale more. And while the database is busy responding to my query, the control is returned to other threads. Which try to open more, and more, and more connections in parallel. The connection pool maxes out. But why the closed connections are not returned to the pool immediately after the work is finished?
Switching from async to "traditional" code fixes the problem immediately.
What are my options?
Increasing max pool size from the default 100? Tried 200, didn't help. Should I try, like, 10000?
Using OpenAsync instead of Open? Didn't help.
I thought I'm running into this problem https://github.com/dotnet/SqlClient/issues/18 but nope, I'm on a newer version of SqlClient and it's said to be fixed. Supposedly.
Not use async with database access at all? Huh...
Do we really have to come up with our own throttling mechanisms when using async like this answer suggests? I'm surprised there's no built-in workaround...
P.S. Taking a closer look at the process dump - I checked the Tasks report and discovered literally tens of thousands blocked tasks in the waiting state. And there's exactly 200 db-querying tasks (which is the size of connection pool) waiting for queries to finish.
Well, after a bit of digging, investigating source codes and tons of reading, it appears that async is not always a good idea for DB calls.
As Stephen Cleary (the god of async who wrote many books about it) has nailed it - and it really clicked with me:
If your backend is a single SQL server database, and every single request
hits that database, then there isn't a benefit from making
your web service asynchronous.
So, yes, async helps you free up some threads, but the first thing these threads do is rush back to the database.
Also this:
The old-style common scenario was client <-> API <-> DB, and in that
architecture there's no need for asynchronous DB access
However if your database is a cluster or a cloud or some other "autoscaling" thing - than yes, async database access makes a lot of sense
Here's also an old archive.org article by RIck Anderson that I found useful: https://web.archive.org/web/20140212064150/http://blogs.msdn.com/b/rickandy/archive/2009/11/14/should-my-database-calls-be-asynchronous.aspx
I am testing a feature which requires a Firebase database write to happen at midnight everyday. Now it is possible that at this particular time, the client app might not be connected to the internet.
I have been using Firebase with persistence off as that can potentially cause issues of stale data in another feature of mine.
From my observation, if I disconnect the app before the write and keep it this way for a minute or so, Firebase eventually reconnects when I turn on the connectivity again and performs the write.
My main questions are:
Will this behaviour be consistent even if the connectivity is lost for quite a few hours?
Will Firebase timeout?
Since it is inside a forever running service, does it still need persistence to ensure that writes are not lost? (assume that the service does not restart).
If the service does restart, will the writes get lost?
I have some experience with this exact case, and I actually do NOT recommend the use of a background service for managing your Firebase requests. In fact, I wouldn't recommend managing Firebase requests at all (explained later).
Services, even though we can make them run forever, tend to get killed by the system quite a lot actually (unless you set their CPU priority to a higher level, but even then the system still might kill them).
If you call a Firebase Write call (of any kind), and your service gets killed, the write will get lost as you said. Unless, you create a sophisticated manager in which you store requests that haven't been committed into your internal storage, and load them up each time the service is restarted - but that is a very dirty work to do, considering the fact that Firebase Developers took care of us and made .setPersistenceEnabled(true) :)
I know, you mentioned you don't want to use it, but I STRONGLY advise you to do so. It works like charm, no services required, and you don't have to worry at all about managing your write requests. Perhaps it would be better to solve the other issue you have in order to make this possible.
To sum up, here's what I would do in your case:
I would call the .setPersistenceEnabled(true) someplace at the beginning (extending the Application class and calling it from onCreate() is recommended)
I would use Android's AlarmManager and register a BroadcastReceiver to receive an alarm at midnight (repetitive or not - you decide)
Inside the BroadcastReceiver, I'd simply call a write function of Firebase and worry about nothing :)
To make sure I covered all of your questions:
will this behaviour be consistent....
No. Case-scenario: Midnight time, your service has successfully received the call and is now trying to write into Firebase. If, for example, the user has no connection until 6 AM (just a case scenario), there is a very high chance that the system will kill it during those 6 hours, and your write will get lost. Flight Time, or staying in an area with no internet coverage - both are examples of risky scenarios that could break your app's consistency
Will Firebase Timeout?
It definitely could, as mentioned. I wouldn't take the risk and make a 80-90% working app. Use persistence and have a 100% working app :)
I believe I covered the rest of the questions..
Good luck!
I have a long running process in my MVC application(C#).
Its building data for many reports.
Some of the clients could take several minutes or longer to calculate. Is running the process in a separate thread the best option?
Is there another way to allow the process to run, while allowing the user to still use the rest of the site?
If threading is the best solution, any good sites or stackoverflow threads to look at on how to do this?
When I've had cases like those, I usually would build a service to asynchronously process requests, and return a handle that I could use to check on its status in a database. IMHO, splitting it off as a thread in the web application seems like you'd be trying to shove a square peg into a round hole.
I've used two methods to solve this. If the work is guaranteed to not be TOO long running, I've kicked off a thread to do the work and return immediately to the user. When we couldn't make that guarantee, we used a queue (we happened to use MSMQ) for executing long running tasks. This processing was done on a different server apart from IIS. A benefit of this is that we built in a wait and retry on failure mechanism. So besides it handling long running tasks, we also used it for anything that might fail in a way that was inconvenient to handle in our MVC app. The main example of this is sending an email. Rather than do that in the MVC app we would just toss an email task on the queue. We used the Command Pattern for the task objects placed on the queue. Once we had that mechanism in place, we stopped using the technique of spawning a thread from our MVC code.
I know there's a bunch of APIs out there that do this, but I also know that the hosting environment (being ASP.NET) puts restrictions on what you can reliably do in a separate thread.
I could be completely wrong, so please correct me if I am, this is however what I think I know.
A request typically timeouts after 120 seconds (this is configurable) but eventually the ASP.NET runtime will kill a request that's taking too long to complete.
The hosting environment, typically IIS, employs process recycling and can at any point decide to recycle your app. When this happens all threads are aborted and the app restarts. I'm however not sure how aggressive it is, it would be kind of stupid to assume that it would abort a normal ongoing HTTP request but I would expect it to abort a thread because it doesn't know anything about the unit of work of a thread.
If you had to create a programming model that easily and reliably and theoretically put a long running task, that would have to run for days, how would you accomplish this from within an ASP.NET application?
The following are my thoughts on the issue:
I've been thinking a long the line of hosting a WCF service in a win32 service. And talk to the service through WCF. This is however not very practical, because the only reason I would choose to do so, is to send tasks (units of work) from several different web apps. I'd then eventually ask the service for status updates and act accordingly. My biggest concern with this is that it would NOT be a particular great experience if I had to deploy every task to the service for it to be able to execute some instructions. There's also this issue of input, how would I feed this service with data if I had a large data set and needed to chew through it?
What I typically do right now is this
SELECT TOP 10 *
FROM WorkItem WITH (ROWLOCK, UPDLOCK, READPAST)
WHERE WorkCompleted IS NULL
It allows me to use a SQL Server database as a work queue and periodically poll the database with this query for work. If the work item completed with success, I mark it as done and proceed until there's nothing more to do. What I don't like is that I could theoretically be interrupted at any point and if I'm in-between success and marking it as done, I could end up processing the same work item twice. I might be a bit paranoid and this might be all fine but as I understand it there's no guarantee that that won't happen...
I know there's been similar questions on SO before but non really answers with a definitive answer. This is a really common thing, yet the ASP.NET hosting environment is ill equipped to handle long-running work.
Please share your thoughts.
Have a look at NServiceBus
NServiceBus is an open source
communications framework for .NET with
build in support for publish/subscribe
and long-running processes.
It is a technology build upon MSMQ, which means that your messages don't get lost since they are persisted to disk. Nevertheless the Framework has an impressive performance and an intuitive API.
John,
I agree that ASP.NET is not suitable for Async tasks as you have described them, nor should it be. It is designed as a web hosting platform, not a back of house processor.
We have had similar situations in the past and we have used a solution similar to what you have described. In summary, keep your WCF service under ASP.NET, use a "Queue" table with a Windows Service as the "QueueProcessor". The client should poll to see if work is done (or use messaging to notify the client).
We used a table that contained the process and it's information (eg InvoicingRun). On that table was a status (Pending, Running, Completed, Failed). The client would submit a new InvoicingRun with a status of Pending. A Windows service (the processor) would poll the database to get any runs that in the pending stage (you could also use SQL Notification so you don't need to poll. If a pending run was found, it would move it to running, do the processing and then move it to completed/failed.
In the case where the process failed fatally (eg DB down, process killed), the run would be left in a running state, and human intervention was required. If the process failed in an non-fatal state (exception, error), the process would be moved to failed, and you can choose to retry or have human intervantion.
If there were multiple processors, the first one to move it to a running state got that job. You can use this method to prevent the job being run twice. Alternate is to do the select then update to running under a transaction. Make sure either of these outside a transaction larger transaction. Sample (rough) SQL:
UPDATE InvoicingRun
SET Status = 2 -- Running
WHERE ID = 1
AND Status = 1 -- Pending
IF ##RowCount = 0
SELECT Cast(0 as bit)
ELSE
SELECT Cast(1 as bit)
Rob
Use a simple background tasks / jobs framework like Hangfire and apply these best practice principals to the design of the rest of your solution:
Keep all actions as small as possible; to achieve this, you should-
Divide long running jobs into batches and queue them (in a Hangfire queue or on a bus of another sort)
Make sure your small jobs (batched parts of long jobs) are idempotent (have all the context they need to run in any order). This way you don't have to use a quete which maintains a sequence; because then you can
Parallelise the execution of jobs in your queue depending on how many nodes you have in your web server farm. You can even control how much load this subjects your farm to (as a trade off to servicing web requests). This ensures that you complete the whole job (all batches) as fast and as efficiently as possible, while not compromising your cluster from servicing web clients.
Have thought about the use the Workflow Foundation instead of your custom implementation? It also allows you to persist states. Tasks could be defined as workflows in this case.
Just some thoughts...
Michael
What kind of multi-threading issues do you have to be careful for in asp.net?
It's risky to spawn threads from the code-behind of an ASP.NET page, because the worker process will get recycled occasionally and your thread will die.
If you need to kick off long-running processes as a result of user actions on web pages, your best bet is to drop a message off in MSMQ and have a separate background service monitoring the queue. The service could take as long as it wants to accomplish the task, and the web page would be finished with its work almost immediately. You could accomplish the same thing with an asynch call to a web method, but don't rely on getting the response when the web method is finished working. From code-behind, it needs to be a quick fire-and-forget.
One thing to watch out for at things that expire (I think httpContext does), if you are using it for operations that are "fire and forget" remember that all of a sudden if the asp.net cleanup code runs before your operation is done, you won't be able to access certain information.
If this is for a web service, you should definitely consider thread pooling. Too many threads will bring your application to a grinding halt because they will eventually start competing for CPU time.
Is this for file or network IO? If so, you should also consider using asynchronous IO. It can be a bit more of a pain to program, but you don't have to worry about spawning off too many threads at once.
Programmatic Caching is one area which immediately comes to my mind. It is a great feature which needs to be used carefully. Since it is shared across requests, you have to put locks around it before updating it.
Another place I would check is any code accessing filesystem like writing to log files. If one request has a read-write lock on a file, other concurrent requests will error out if not handled properly.
Isn't there a Limit of 25 Total Threads in the IIS Configuration? At least in IIS 6 i believe. If you exceed that limit, interesting things (read: loooooooong response times) may happen.
Depending on what you need, as far as multi threading is concerned, have you thought of spawning requests from the client. It's safe to spawn requests using AJAX, and then act on the results in a callback. Or use a service as a backgrounding mechanism, which runs every X minutes and processes in the background that way.