We're currently having an issue with our BizTalk server running too few concurrent orchestrations, which is causing delays in delivering messages. The system has been running for years and the issue only appeared recently. In a normal state it runs 20-40 concurrent orchestrations, but currently it runs 4 or fewer at the same time.
The same configuration on a test server works properly, so at first we thought clearing the database would help, but unfortunately it hasn't.
Any advice will be greatly appreciated.
The first thing to do is fire up Performance Monitor and see if the system is throttling and, if so, why. See: Host Throttling Performance Counters
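If you prefer to check from code rather than the PerfMon UI, something along these lines should work (a minimal sketch using the standard BizTalk throttling counters; "BizTalkServerApplication" is a placeholder for your host instance name):
using System;
using System.Diagnostics;

class ThrottlingCheck
{
    static void Main()
    {
        // Placeholder - use your in-process host instance name here.
        const string host = "BizTalkServerApplication";

        // 0 means the host is not throttling; any other value is a state code
        // telling you *why* it is throttling (rate, memory, thread count, ...).
        var delivery = new PerformanceCounter(
            "BizTalk:Message Agent", "Message delivery throttling state", host, readOnly: true);
        var publishing = new PerformanceCounter(
            "BizTalk:Message Agent", "Message publishing throttling state", host, readOnly: true);

        Console.WriteLine("Message delivery throttling state:   {0}", delivery.NextValue());
        Console.WriteLine("Message publishing throttling state: {0}", publishing.NextValue());
    }
}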
I have been having a lot of trouble getting SignalR to function reliably: messages are sent from the Hub but inexplicably never arrive, almost as if the groups are being lost, even though the websocket connection is open on the two clients I am trying to communicate between.
I have now noticed that IIS 8.5 is set to a maximum of 10 worker processes for the site, and all of them are running.
Is this possibly the cause of the erratic behavior? Should I implement a backplane even if I have just one server but multiple worker processes?
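If a backplane is indeed the answer here, this is roughly what I understand the wiring would look like with the SQL Server backplane package (Microsoft.AspNet.SignalR.SqlServer); the connection string is just a placeholder:
using Microsoft.AspNet.SignalR;
using Owin;

public class Startup
{
    public void Configuration(IAppBuilder app)
    {
        // Placeholder connection string - the backplane creates its own tables here.
        const string connectionString =
            "Server=.;Database=SignalRBackplane;Integrated Security=True";

        // With 10 worker processes, each process has its own in-memory message bus;
        // the backplane makes them all publish/subscribe through SQL Server instead.
        GlobalHost.DependencyResolver.UseSqlServer(connectionString);

        app.MapSignalR();
    }
}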
Any help will be much appreciated. It's been weeks. :(
In the last 3 weeks, we have been experiencing a huge number of socket timeouts from the Microsoft Band Cloud API. When this issue started, a few retries were sufficient; at this point, however, it seems like it would take a never-ending retry loop.
We integrate with several other cloud APIs, and this is the only one experiencing the issue; it is actually one of our more lightly used APIs, so I don't believe it is load-induced or a problem on our side.
Is anyone else experiencing this issue? Any recommendations? Thanks!
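For context, our retry logic is nothing exotic; it is essentially a capped exponential backoff along these lines (an illustrative sketch, not our exact code; the endpoint URL and limits are placeholders):
using System;
using System.Net.Http;
using System.Threading.Tasks;

static class BandApiClient
{
    static readonly HttpClient http = new HttpClient { Timeout = TimeSpan.FromSeconds(30) };

    // Retries a GET a handful of times with exponential backoff before giving up.
    public static async Task<string> GetWithRetryAsync(string url, int maxAttempts = 5)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                var response = await http.GetAsync(url);
                response.EnsureSuccessStatusCode();
                return await response.Content.ReadAsStringAsync();
            }
            catch (Exception ex) when (ex is HttpRequestException || ex is TaskCanceledException)
            {
                if (attempt >= maxAttempts) throw; // give up after maxAttempts
                await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt))); // 2s, 4s, 8s, ...
            }
        }
    }
}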
The service was having some issues with its DNS failover components, but it should now be back up and running normally. Please let me know if this continues to happen.
Last night one of the websites (.NET 4.0 Web Forms) hosted on my Windows Server 2008 R2 (IIS 7.5) server started to time out, throwing the following error for all connected users.
TYPE System.Web.HttpException
MESSAGE Request timed out.
DETAIL System.Web.HttpException (0x80004005): Request timed out.
The outage was confined to just one website within IIS, the others continued to work fine.
Unfortunately I was unable to identify why the website was timing out. Here are the steps I took:
The first thing I did was look at Task Manager, which showed normal CPU and memory usage. Network activity was also moderate.
I then opened IIS to look at the live connections under 'Worker Processes'. There were about 60 live connections, so it didn't look like anything DDoS-related.
Checked database connectivity (hosted on a separate server), all fine!
I then restarted the website in IIS. That didn't work.
I then tried a complete iisreset... still no luck :(
In the end (and under some duress) the only thing I could think to do to resolve this was to restart the server.
Restarting the server worked, but I am nervous not knowing why this happened in the first place. Can anyone recommend any checks that I failed to carry out? Is there an official checklist for working through these sorts of IIS problems? I have reviewed the IIS logs but don't see anything unusual in the run-up to the outage.
Any pointers or links to useful resources to help me understand and mitigate against this in future will be much appreciated.
EDIT
The only time I logged into the server that day was to add an additional web handler component (for remote deploy) to IIS Web Deploy. I'm doubtful this caused the outage as the server worked fine for 6 hours afterwards.
Because iisreset didn't help and you had to restart the whole machine, I would suspect a global resource shortage, with the most heavily used (or most resource-consuming) website being the one impacted. It could be a lack of available RAM, or network connection congestion due to some malfunctioning calls (for example, a lot of CLOSE_WAIT sockets exhausting the connection pool; we have seen that in production because of a malfunctioning external service). It could also be one specific client causing the problem, which was disconnected after the machine restart, so the problem disappeared.
I would start from:
Historical analysis
review the Event Viewer for any errors/warnings from that period of time,
although you have already looked at the IIS logs, I would go through them once more with the help of Log Parser Lizard to produce statistics such as the number of requests per client, network bandwidth per client, average response time per client, and so on.
Monitoring
continuously monitor these Performance Counters (a small sketch of sampling a few of them from code follows this list):
\Processor(_Total)\% Processor Time,
\.NET CLR Exceptions(_Global_)\# of Exceps Thrown / sec,
\Memory\Available MBytes,
\Web Service(Default Web Site)\Current Connections (one instance per site name),
\ASP.NET v4.0.30319\Request Wait Time,
\ASP.NET v4.0.30319\Requests Current,
\ASP.NET v4.0.30319\Requests Queued,
\Process(XXX)\Working Set,
\Process(XXX)\% Processor Time (XXX per each w3wp process),
\Network Interface(XXX)\Bytes Total/sec
run the Performance Analysis of Logs (PAL) tool over the data from the time of failure to get a very detailed analysis of the performance counter data,
run netstat -ano to analyze network traffic (or, even better, use the TCPView tool).
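As mentioned above, here is a minimal sketch of how a few of those counters could be sampled from code and appended to a CSV (the instance and category names, "Default Web Site" and "ASP.NET v4.0.30319" in particular, are assumptions you would adjust to your server):
using System;
using System.Diagnostics;
using System.IO;
using System.Threading;

class CounterSampler
{
    static void Main()
    {
        // Instance/category names are placeholders - adjust them to your machine.
        var counters = new[]
        {
            new PerformanceCounter("Processor", "% Processor Time", "_Total", true),
            new PerformanceCounter("Memory", "Available MBytes", string.Empty, true),
            new PerformanceCounter("ASP.NET v4.0.30319", "Requests Queued", string.Empty, true),
            new PerformanceCounter("Web Service", "Current Connections", "Default Web Site", true),
        };

        while (true)
        {
            // Note: the first sample of "% Processor Time" always reads 0;
            // meaningful values start from the second iteration.
            var line = DateTime.Now.ToString("o");
            foreach (var c in counters)
                line += ";" + c.NextValue().ToString("F1");

            File.AppendAllText("counters.csv", line + Environment.NewLine);
            Thread.Sleep(TimeSpan.FromSeconds(5)); // sample every 5 seconds
        }
    }
}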
If all of this does not lead you to any conclusion, create a Debug Diagnostic rule to capture a memory dump of the process for long-running requests and analyze it with WinDbg and the PSSCor extension for .NET debugging.
I have an IIS web server that hosts 400 web applications (distributed across 30 application pools). They are a mix of ASP.NET applications and WCF service endpoints. The server has 32GB of RAM and is usually fast, although it is running at 95% memory usage. Worker processes each take between 500MB and 1.5GB of RAM.
I also have another box running SQL Server. That one has plenty of free memory.
Sometimes, the web server starts throwing SQL timeout exceptions: a few per minute at first, rapidly increasing to hundreds per minute, effectively taking the server down. This problem affects applications in all pools. Some requests still complete but most of them don't. While this happens, the CPU usage on the server is around 30% (the normal load on that box).
While this is happening, we can still use SQL Server Management Studio (from the IIS Server) to execute requests successfully (and fast).
The fix is to restart IIS, and then everything goes back to normal until the next time.
Because the server is running with very little free memory, I feel like this is the cause, but I cannot explain the relationship between low memory and sudden bursts of SQL timeout exceptions.
Any idea?
Memory pressure can trigger paging and garbage collection. Both introduce latency which would not be present otherwise.
GC'ing 32GB of data can take seconds. Why would all app processes GC at the same time? Because at about 95% memory utilization Windows raises a "low memory" notification that the CLR listens for; each process will then try to release memory to help the others.
If the applications get into a paging frenzy that would also explain huge delays in normal execution.
This is just guessing, though. You can try to prove it by watching the \Memory\Pages/sec counter (hard page faults) and the Gen 2 collection counter, \.NET CLR Memory\# Gen 2 Collections.
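If wiring up counters is awkward, a crude in-process check (just a sketch, assuming you can drop a small background logger into one of the apps) is to log the Gen 2 collection count and the working set periodically:
using System;
using System.Diagnostics;
using System.Threading;

static class GcLogger
{
    static Timer timer;  // keep a reference so the timer itself isn't collected

    // Call once at startup (e.g. from Application_Start); logs every 30 seconds.
    public static void Start()
    {
        timer = new Timer(_ => Trace.WriteLine(string.Format(
            "{0:o} Gen2 collections={1} working set={2} MB",
            DateTime.UtcNow,
            GC.CollectionCount(2),
            Process.GetCurrentProcess().WorkingSet64 / (1024 * 1024))),
            null, TimeSpan.Zero, TimeSpan.FromSeconds(30));
    }
}
A sudden jump in Gen 2 collections across all worker processes at the moment the timeouts start would support the low-memory theory.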
The fix would be to run with a larger margin below the physical memory limit.
The first problem is to discover where the timeout is happening. Can you tell from the stack trace if the timeout is happening when executing a request against the database, or when connecting to the database? (Or even connecting to the web server?)
Timeouts executing database requests can have a variety of causes. The problem might be in the database: blocking processes, database maintenance (also locking), deadlocks, etc. When the apps are running slowly, do you see a lot of entries in sys.dm_exec_requests, and if so, what are their wait_types?
Even if you can run SQL in the query window while the web server is timing out, that doesn't mean there isn't massive blocking or deadlocking going on.
If it is a timeout connecting to the database, then it is possible the ADO.NET connection pools are exhausted and not getting cleaned up, or the database has a connection limit and the web services are timing out waiting for a connection.
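If pool exhaustion is the suspect, the usual first checks are that every connection is disposed promptly and that the pool size matches the load. Roughly (an illustrative sketch; the connection string, table name, and pool size are placeholders):
using System.Data.SqlClient;

static class Db
{
    // "Max Pool Size" defaults to 100 per distinct connection string per process;
    // raise it deliberately rather than as a reflex - a connection leak will still exhaust it.
    const string ConnectionString =
        "Server=sqlbox;Database=AppDb;Integrated Security=True;Max Pool Size=200";

    public static int GetOrderCount()
    {
        // 'using' returns the connection to the pool even if the query throws;
        // a missing Close/Dispose is the classic way pools get exhausted.
        using (var conn = new SqlConnection(ConnectionString))
        using (var cmd = new SqlCommand("SELECT COUNT(*) FROM dbo.Orders", conn))
        {
            conn.Open();
            return (int)cmd.ExecuteScalar();
        }
    }
}
ADO.NET also exposes connection pool performance counters (under ".NET Data Provider for SqlServer") that show whether the pool is actually saturated.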
One of the best ways to find out what is going on is to capture a memory dump of the w3wp.exe process and analyze it. Even if you aren't adept at a debugger like WinDbg, Microsoft's DebugDiag tool can produce some nice reports with helpful information.
SqlCommand.CommandTimeout
This property is the cumulative time-out for all network reads during command execution or processing of the results. A time-out can still occur after the first row is returned, and does not include user processing time, only network read time.
It is a client-side timeout. If requests are getting queued up due to memory constraints, that could cause a timeout.
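For what it's worth, the timeout is set per command; a small sketch (dbo.BigTable is a hypothetical table, and 30 seconds is the default if you don't set it):
using System;
using System.Data.SqlClient;

static void RunWithLongerTimeout(string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("SELECT COUNT(*) FROM dbo.BigTable", conn))
    {
        cmd.CommandTimeout = 120;  // seconds; 0 means wait indefinitely (use with care)
        conn.Open();
        Console.WriteLine(cmd.ExecuteScalar());  // throws SqlException if the timeout elapses
    }
}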
Are you retrieving a lot of data with these queries?
If some queries return a lot of data, consider breaking them up and giving the user Next and Prior buttons.
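A common way to break a large result up is to page it on the server; here is a sketch using OFFSET/FETCH (which needs SQL Server 2012 or later; the table name and page size are placeholders):
using System.Data.SqlClient;

// Returns one page of rows; pageIndex is zero-based and conn is assumed to be open.
static SqlDataReader GetPage(SqlConnection conn, int pageIndex, int pageSize = 50)
{
    var cmd = new SqlCommand(
        @"SELECT Id, Word
          FROM dbo.FtsIndexWords            -- hypothetical table
          ORDER BY Id
          OFFSET @offset ROWS FETCH NEXT @pageSize ROWS ONLY", conn);
    cmd.Parameters.AddWithValue("@offset", pageIndex * pageSize);
    cmd.Parameters.AddWithValue("@pageSize", pageSize);
    return cmd.ExecuteReader();
}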
Have you considered async execution, like BeginExecuteReader?
The advantage is that there is no timeout.
It does not block the calling thread.
// Set a flag so we don't start the command twice; the callback resets it to false.
isExecutingFTSindexWordOnce = true;
// Start the command asynchronously; the command is passed as the state object so the
// callback can call EndExecuteNonQuery on it. CommandTimeout is ignored for Begin* calls.
sqlCmdFTSindexWordOnce.BeginExecuteNonQuery(callbackFTSindexWordOnce, sqlCmdFTSindexWordOnce);
// Control returns here immediately - the calling thread is not blocked.
Debug.WriteLine("Calling thread active");
But I agree with your comment about how to respond to the request, since the answer does not come back on the calling thread.
Sorry, I am used to WPF, where I just update a public property in the callback.
I'm seeing a really strange error that I'm having a difficult time tracking down. I think it's related to my configuration of Rhino ESB, though I'm not sure if RSB is actually causing it, so I figured I'd ask and see if anyone else has come across this in any other usage of MSMQ.
I'm using RSB as a client in a web app (ASP.NET; the client runs in the background). The client talks to a Windows service via the MSMQ binding for RSB. Restarting the service never appears to have an effect on MSMQ, nor does restarting IIS by hand. However, whenever I actually restart the computer itself, MSMQ always refuses to start back up, claiming that a "queue is in an inconsistent state". Attempting to start MSMQ manually results in the same error, effectively rendering the MSMQ install completely useless. The only way to solve it is to remove and then reinstall MSMQ.
The only information I've found via the almighty Google consists of references to a problem in MSMQ 2.0 (this problem is occurring in MSMQ 4.0). I've verified that Dispose is being called on the bus at shutdown, in both the service and the web site.
Does anyone have any idea why this could be occurring? Thanks!
I faced the same issue on a Windows Server 2008 virtual machine, although the environment was not related to Rhino tools.
The error in the event log:
"The Message Queuing service cannot start because a queue is in an inconsistent state. For more information, see Microsoft Knowledge Base article 827493 at support.microsoft.com."
As Roy pointed out, this was happening every 2-3 days. Each time we would follow the steps below to recover instead of re-installing MSMQ.
1) Stop all applications and services that use MSMQ.
2) Kill mqsvc.exe from Task Manager.
3) Go to C:\Windows\System32\msmq\storage and delete any .mq files.
4) Start the MSMQ service.
5) Start your applications.
In my scenario I was able to fix the "queue is in an inconsistent state" error after an MSMQ service restart.
It turned out the computer name was too long (NetBIOS computer names are limited to 15 characters), so changing the computer name to one with fewer than 15 characters fixed the issue.
My team is experiencing a similar issue, with MSMQ being called by NSB 2.5. The issue came up recently after Infrastructure moved our VM to another physical server and, for some reason, lowered the available RAM. We think the issue may be memory-related.
EDIT
After a week with no further issues, I can confidently say that raising the RAM on the server solved our MSMQ "inconsistent state" issue. Mind you, we did have to re-install MSMQ first, but the issue never came back, whereas before the RAM increase it popped up every 2 days.
Regularly on Windows 2008 R2, MSMQ cannot start after a reboot.
The two regular issues for me are:
"The Message Queuing service cannot start because a queue is in an inconsistent state"
and
"The dependency service does not exist or has been marked for deletion"
Sometimes, the following has helped (although we are seeking a more solid answer)
rename msmq folder to msmq_old
net stop wuauserv
net stop bits
Delete the "%windir%\SoftwareDistribution" directory
Reboot
This has occurred 5 times this year, and each time it took some variety of the above with plenty of reboots.
Sometimes we resort to Remove Feature / Add Feature; however, you may get yourself into a loop: as the machine boots up, a rollback occurs in the Windows Update service, so the feature is never uninstalled and the problem is never repaired.
Following the steps above can help with that.