Workflow Service stops responding after 464 messages - workflow-foundation-4

I am having a peculiar issue while executing workflows.
I have tried everything I could think of and now need ideas.
Here is my configuration:
1. A WF4 Workflow Service (xamlx) hosted in IIS 7 that uses net.msmq/netMsmqBinding for transport (the MSMQ queue is transactional).
2. No workflow persistence is used.
3. I use a console app client to send messages to the workflow (each message creates a new workflow instance).
4. Each workflow looks like: Wait for START message -> Wait for END message (I only send START messages).
If I send 500 messages, 464 get processed correctly, but every message beyond that goes to the lock_* queue and then moves to the poison queue.
I have inspected the Debug and Analytic event logs, as well as the message and trace svclogs.
Here is the most detailed message I get:
System.TimeoutException, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
The operation did not complete within the allotted timeout of 00:00:30. The time allotted to this operation may have been a portion of a longer timeout.
at System.Runtime.AsyncResult.End[TAsyncResult](IAsyncResult result)
at System.ServiceModel.Activities.Dispatcher.PersistenceProviderDirectory.LoadOrCreateAsyncResult.HandleReserveThrottle(IAsyncResult result)
at System.Runtime.AsyncResult.AsyncCompletionWrapperCallback(IAsyncResult result)
System.TimeoutException: The operation did not complete within the allotted timeout of 00:00:30. The time allotted to this operation may have been a portion of a longer timeout.
at System.Runtime.AsyncResult.End[TAsyncResult](IAsyncResult result)
at System.ServiceModel.Activities.Dispatcher.PersistenceProviderDirectory.LoadOrCreateAsyncResult.HandleReserveThrottle(IAsyncResult result)
at System.Runtime.AsyncResult.AsyncCompletionWrapperCallback(IAsyncResult result)
At that point, a request to http://localhost/MyWebService?wsdl also fails with a 404.
If I restart IIS, everything goes back to normal until another 464 messages have been sent.
Where can I find a more detailed log? (I already have System.Diagnostics set to maximum verbosity.)
Is the number 464 significant in any way?
What could be causing this web service to lock up?

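It sounds like you are running into the service throttling limits; these settings apply to WF4 workflow services as much as they do to plain WCF services. The maxConcurrentInstances setting caps the number of workflow instances that can be in memory at a given time. In .NET 4 its default is 116 per processor (the sum of the maxConcurrentCalls default of 16 per processor and the maxConcurrentSessions default of 100 per processor), so on a 4-core machine the limit would be 116 × 4 = 464, which matches the number you are seeing. The HandleReserveThrottle frame in your stack trace points the same way. You can raise the limit in the service behavior: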
<behaviors>
  <serviceBehaviors>
    <behavior name="WorkflowServiceBehavior">
      <!-- Specify throttling behavior -->
      <serviceThrottling maxConcurrentInstances="1000"/>
    </behavior>
  </serviceBehaviors>
</behaviors>
As an aside, you should always use persistence when hosting in IIS. Sooner or later IIS is going to restart the AppDomain, and if the WorkflowServiceHost can't save the state of the workflow instances to a persistence store, they will be lost. Persistence also means that idle workflow instances can be removed from memory, so they no longer count against maxConcurrentInstances, which is an in-memory restriction.
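As a rough sketch of the persistence advice, the service behavior could be extended along the following lines. The connection string, database name, and timeout values are placeholders rather than values from the question, and the persistence database has to be created beforehand with the SQL scripts that ship with .NET 4.

```xml
<behaviors>
  <serviceBehaviors>
    <behavior name="WorkflowServiceBehavior">
      <serviceThrottling maxConcurrentInstances="1000" />
      <!-- Assumed connection string; point it at your own persistence database -->
      <sqlWorkflowInstanceStore
          connectionString="Data Source=.;Initial Catalog=WorkflowInstanceStore;Integrated Security=True" />
      <!-- Assumed timeouts: persist, then unload, instances shortly after they go idle -->
      <workflowIdle timeToPersist="00:00:05" timeToUnload="00:00:30" />
    </behavior>
  </serviceBehaviors>
</behaviors>
```

With something like this in place, an instance that is waiting for its END message gets unloaded after it has been idle for timeToUnload, which is what should free its slot under maxConcurrentInstances.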

Related

The request queue limit of the session is exceeded

I am getting this error in an ASP.NET application targeting .NET 4.7.1:
The request queue limit of the session is exceeded.
Full:
System.Web.HttpException (0x80004005): The request queue limit of the session is exceeded.
at System.Web.SessionState.SessionStateModule.QueueRef()
at System.Web.SessionState.SessionStateModule.PollLockedSession()
at System.Web.SessionState.SessionStateModule.GetSessionStateItem()
at System.Web.SessionState.SessionStateModule.BeginAcquireState(Object source, EventArgs e, AsyncCallback cb, Object extraData)
Any suggestions?
The default behavior changed in .NET 4.7. The retargeting guide suggests:
To restore the old behavior, you can add the following setting to your web.config file to opt-out of the new behavior.
<appSettings>
  <add key="aspnet:RequestQueueLimitPerSession" value="2147483647"/>
</appSettings>
Clarification of changed behavior:
In the .NET Framework 4.6.2 and earlier, ASP.NET executes requests
with the same Sessionid sequentially and ASP.NET always issues the
Sessionid through cookies by default. If a page takes a long time to
respond, it will significantly degrade server performance just by
pressing F5 on the browser. In the fix, we added a counter to track
the queued requests and terminate the requests when they exceed a
specified limit. The default value is 50. If the limit is reached, a
warning will be logged in the event log, and an HTTP 500 response may
be recorded in the IIS log.
Also addressed here: https://knowledgebase.progress.com/articles/Article/The-request-queue-limit-of-the-session-is-exceeded-in-sitefinity-11-2
Sometimes this error is caused by too many server-side redirects. After investigating, I found that the user was in fact being redirected to the same action by ActionsFilter; once I fixed that, the error no longer occurred. If you examine your IIS logs, you are likely to find the same problem.
PS. In this case, setting RequestQueueLimitPerSession will not solve the problem.
To reproduce: open IE 11, navigate to the affected path, and hold F5 for 60 seconds. This generates a lot of requests to that path, and the IIS log will then show some requests with a Win32 status (sc-win32-status) of 64.
A careful analysis of the IIS logs will tell you a lot about the nature of these requests: user agent, accessed paths, request status, and so on.
I was getting the same error in my MVC application (.NET version 4.7.2) on days with unusually high activity. I fixed it by adding the necessary table indexes in the application's database. In my case, the solution was not to adjust the "aspnet:RequestQueueLimitPerSession" setting but to address the underlying problem regarding database performance that caused the session requests to exceed the default limit.

Some requests on IIS hang for minutes and end in a lost connection

I have an awkward issue with IIS 10.0 on Windows Server 2016, ASP.NET 4.5.2, and MVC 5.2.7.
At times, certain requests do not receive a response and run for minutes, maybe 10 or so, before ending in a lost connection (PR_CONNECT_RESET_ERROR in Firefox on Windows, NSURLDomainError in Firefox on iOS). These are mostly POST requests. While this issue is occurring, other GET requests receive a swift response and a correct result. Normally, POST requests do not take long to process, typically less than 3 seconds.
Recycling the associated worker process makes the issue go away for hours or days.
When I inspected the web server today while the issue was occurring, I saw little CPU usage (less than 10%), memory at 56%, and the worker process at a modest 615 MB. These requests were logged neither in the W3C log nor in my custom application logs.
I added the Web-Request-Monitor feature, following "How do I see currently executing web request on IIS 8", but in doing so the worker process probably got recycled, as the issue is no longer occurring.
There are a reverse proxy and an access manager between the internet and my web server. I suppose they could have something to do with this issue, but it is certainly related to IIS, as recycling helps.
All of this is happening on an acceptance web server running a newer version of my application. I am not aware of any big changes to the application's architecture that could be involved. Also, there is very little traffic from other clients, if any at all.
What could be the next steps to investigate this issue further?
Update
This issue was definitely caused by log4net. However, it was not related to the log4net.Internal.Debug setting. It was caused by two application domains writing to the same log file, which occasionally resulted in concurrency issues when accessing the file. It appeared that log4net could not handle this properly and got stuck while writing to the log file.
This log file was written by a RollingFileAppender. Since we also used an AdoNetAppender, we decided to remove file logging altogether.
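For anyone who needs to keep file logging in a similar setup, one option that was not tried here is a plain FileAppender with log4net's inter-process locking model, which serializes writers through a named mutex. The sketch below is an assumption about what such a configuration could look like; the appender name, file path, and layout pattern are illustrative. Rolling is deliberately left out because, as far as I know, the log4net FAQ warns that rolling does not work reliably when several writers share one file.

```xml
<log4net>
  <appender name="SharedFile" type="log4net.Appender.FileAppender">
    <!-- Hypothetical path; both application domains would write to this file -->
    <file value="logs\application.log" />
    <appendToFile value="true" />
    <!-- Writers take a named mutex around each append instead of holding the file open exclusively -->
    <lockingModel type="log4net.Appender.FileAppender+InterProcessLock" />
    <layout type="log4net.Layout.PatternLayout">
      <conversionPattern value="%date [%thread] %-5level %logger - %message%newline" />
    </layout>
  </appender>
  <root>
    <level value="INFO" />
    <appender-ref ref="SharedFile" />
  </root>
</log4net>
```

This trades some write throughput for safety, so it is worth load testing before relying on it.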
Original
I have found a probable cause. I'll report the steps I took to investigate the issue.
I activated the Worker Processes feature in IIS.
When, after a couple of days of waiting, the issue started again, I found long running requests. They all had State ExecuteRequestHandler and Module Name ManagedPipelineHandler. They had Time Elapsed of hundreds of seconds.
I also activated Failed Request Tracing with a rule for long-running requests with a Time Taken of 1 minute.
After a couple of days, I started to receive failed request reports. The failed requests all have a GENERAL_SET_RESPONSE_HEADER event as their last event.
I added additional debug logging events for each request. When debugging in my development environment, at one point, I started to see the hanging behaviour there, on one of the new logging statements(!). The application uses log4net.
I captured a stack trace:
log4net.dll!log4net.Appender.AppenderSkeleton.DoAppend(log4net.Core.LoggingEvent loggingEvent)
log4net.dll!log4net.Util.AppenderAttachedImpl.AppendLoopOnAppenders(log4net.Core.LoggingEvent loggingEvent)
log4net.dll!log4net.Repository.Hierarchy.Logger.CallAppenders(log4net.Core.LoggingEvent loggingEvent)
log4net.dll!log4net.Repository.Hierarchy.Logger.Log(System.Type callerStackBoundaryDeclaringType, log4net.Core.Level level, object message, System.Exception exception)
log4net.dll!log4net.Core.LogImpl.DebugFormat(string format, object arg0)
The DoAppend method uses lock(this), which may very well cause hangs.
I also found out that the config setting log4net.Internal.Debug was set to true, which I do not want under normal circumstances, and this may be related. I did not attempt to understand the log4net code, but I remember that logging initially did not work in the acceptance environment, so the setting may well have been set to true then, causing the issue to start.
Another indication that log4net is involved: when the issue last occurred, I realized that logging at the standard level only occurs in some POST requests. I found a POST request that does not log, and requests to it were handled normally, while the other POST requests still hung.
For now, I have set log4net.Internal.Debug to false and will wait to see what happens.
The fact that an IIS recycle fixes the issue doesn't mean that it is an IIS issue. All ASP.NET applications run in the .NET runtime, so unless it is shown that the request hangs in an IIS module, the cause may just as well be in managed code.
You may need to wait for the issue to happen again and then create a Failed Request Tracing rule on time taken. That will tell you whether the request is stuck in an IIS pipeline module or in the .NET runtime.
If the requests all hang in the .NET runtime, you may have to capture a hang dump and analyze it in depth with WinDbg and the MEX extension, which will show what is happening there.
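A Failed Request Tracing rule on time taken, as suggested above, might look roughly like this in web.config. The one-minute threshold and the wildcard path are assumptions chosen for illustration, and tracing also has to be enabled for the site itself before the rule produces any output.

```xml
<system.webServer>
  <tracing>
    <traceFailedRequests>
      <!-- Assumed scope: trace every URL; narrow the path if only some requests hang -->
      <add path="*">
        <traceAreas>
          <add provider="ASPNET" areas="Infrastructure,Module,Page,AppServices" verbosity="Verbose" />
          <add provider="WWW Server" areas="Authentication,Security,Filter,StaticFile,CGI,Compression,Cache,RequestNotifications,Module" verbosity="Verbose" />
        </traceAreas>
        <!-- Assumed threshold: write a trace for any request that runs longer than one minute -->
        <failureDefinitions timeTaken="00:01:00" />
      </add>
    </traceFailedRequests>
  </tracing>
</system.webServer>
```

The last events in the resulting trace file should indicate whether the request was still inside a native IIS module or had already been handed to the managed handler.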

Thread abort (Timeout) in ASHX handler

Background
I have a batch processing system that needs to send out messages (via SMS/Email) to groups of people.
As our message publishing system is fairly slow, when the user hits the "send" button, the system posts all the message information into a database with a "batch ID" and then makes an asynchronous call (WebRequest.BeginGetRequest) to a "ProcesBatch" ASHX handler, with the batch ID as a URL request parameter.
This releases the front-end page back to the user to do the next batch of messages, as the users don't actually need any feedback; however, the record in the database is subsequently used in a reporting module.
In the meantime, the batch process handler simply loops over the database records for the given batch ID and posts the messages to our (slow) message publisher sequentially.
The Problem
The problem is that during the batch processing, asp.net is throwing a
System.Threading.ThreadAbortException: Thread was being aborted.
halfway through, and the remaining messages are not sent.
I have checked IIS, and the recycling interval is set to the default of 1740 minutes, so is there anything else that would cause this?
Or is there a more appropriate way to approach this?
Have you tried to increase executionTimeout under httpRuntime in web.config?
The default value is 90 or 110 seconds (depending on the .NET version).
Perhaps your ASHX handler needs more time to finish its job:
http://msdn.microsoft.com/en-us/library/vstudio/e1f13641(v=vs.100).aspx
Edit: in general it's not a good idea to set a very long executionTimeout. As other users suggested, consider developing a Windows service to do the long-running jobs.
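If the timeout is raised at all, it can be scoped to the batch handler alone so the rest of the site keeps the default. The following is a minimal sketch; the handler file name and the 600-second value are assumptions, not taken from the question.

```xml
<configuration>
  <!-- Hypothetical handler path; use the real name of the batch handler -->
  <location path="ProcessBatch.ashx">
    <system.web>
      <!-- Assumed value in seconds; applies only to requests for the path above -->
      <httpRuntime executionTimeout="600" />
    </system.web>
  </location>
</configuration>
```

Note that executionTimeout is only enforced when the compilation element has debug="false", so behaviour on a development machine running in debug mode can differ from production.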

IIS 7.5 Application Initialization for ASP.NET web service (warmup) without remapping requests

I'm trying to use the IIS 7.5 Application Initialization extension to configure a warmup process for my web application. This is an approach I am taking to minimize slow downs caused by application pool recycling, which is a problem explained well in other questions on Stack Overflow.
What I would like is to gain the benefits of application initialization without remapping requests anywhere else.
What I've done so far
I followed the IIS 8 instructions for the basic use case, and it works great! I created a splash page called app_starting.htm and by using this code, it gets displayed while the app initializes:
<applicationInitialization remapManagedRequestsTo="app_starting.htm" skipManagedModules="true" >
  <add initializationPage="/" />
</applicationInitialization>
Why this isn't good
I want to use initialization to speed up requests to a REST-based web service written using ASP.NET MVC. This web service is a backend for several applications. When they make a request to a resource (e.g. /client/1/addresses), they can't handle receiving a splash page instead.
What I've tried
I removed the remapManagedRequestsTo attribute. However, now when I request a resource during initialization, I get a 500 error until initialization has completed, after which responses go back to normal. The applications that rely on this service also wouldn't respond well to a 500 error, since initialization should not be an error condition.
What I need
Without performing any remapping, I expect the request behavior to go back to normal. Even if initialization is in progress, other requests to the application should be queued and wait until after initialization has completed.
Is there something I am missing? Can I accomplish this?
Thanks for the help!
I think I answered my own question. I removed the skipManagedModules attribute and it worked. This code accomplishes application initialization, and during warmup, requests seem to wait for it to complete before being processed:
<applicationInitialization>
  <add initializationPage="/" />
</applicationInitialization>
I couldn't find any documentation for why it behaves this way and don't really understand what skipManagedModules means. If anyone can further explain this, I can mark the explanation as an answer. Thanks!

Increase loading and concurrent users on WCF Service

I have an internal website that makes multiple calls to a WCF service hosted in IIS 7 on a Windows 2008 R2 server.
During periods of heavy load, when it is being used by 50-75 users, WCF calls return a FaultException. I doubt that it is caused by the user load alone, because we have been using the same website for almost a year and haven't seen this error before. Some of the calls might take 2 or 3 seconds to execute.
So I added the following lines to the web.config file of the host service. Do I need to add anything anywhere else? Do I need to do anything in the client website too? Thanks.
<serviceBehaviors>
  <behavior>
    <serviceThrottling
      maxConcurrentCalls="100"
      maxConcurrentSessions="100"
      maxConcurrentInstances="100"
    />
  </behavior>
</serviceBehaviors>
Before changing configuration arbitrarily, I would look for the reason why it fails.
Since your calls take 2 to 3 seconds to return, we can safely assume that more than 10 instances (10 is the default) exist at a given time. When that happens, the calls queue up, and if the load doesn't drop enough for every call to be answered, you are going to have problems.
You can diagnose that by looking at the performance counters that WCF provides, especially the instance counters in the ServiceModelService 4.0.0.0 category.
If that is the problem (it looks like it could be), then you can take action and increase the number of instances. However, the ultimate fix should be to decrease response times, because otherwise you will just be postponing the problems that come with scaling.
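The WCF performance counters mentioned above are off by default and have to be switched on in the service's configuration. A minimal sketch, assuming service-level counters are enough for this diagnosis:

```xml
<system.serviceModel>
  <!-- ServiceOnly publishes the per-service counters; All would add endpoint and operation counters at a higher cost -->
  <diagnostics performanceCounters="ServiceOnly" />
</system.serviceModel>
```

After the service has been restarted, the ServiceModelService 4.0.0.0 category in Performance Monitor should list counters such as Instances and Calls Outstanding, along with the "Percent Of Max Concurrent ..." counters, which show how close the service is running to its throttle limits.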
