BizTalk-based web service receive location is slow

I noticed that sometimes a receive location built on the BizTalk WCF-BasicHttp adapter is exactly 15 seconds slower than the processing time of the message inside the orchestration.
I've also found that in rate-based throttling for BizTalkServerIsolatedHost, the sampling window duration is 15 seconds.
Is it possible that BizTalk starts throttling incoming messages for some reason, and that this is why the web service takes 15 seconds longer to respond?
However, there are no messages in the event log indicating that BizTalk is throttling.
How can I find out what is happening with BizTalk?

To see whether BizTalk is throttling, you have to monitor the Host Throttling counters using Perfmon.
From Host Throttling Performance Counters:
To access the performance counters, use the following steps (on Windows 2008):
1. Click Start, point to Administrative Tools, and then click Performance Monitor.
2. In the Performance Monitor dialog box, expand Monitoring Tools, select Performance Monitor, and then click Add.
3. In the Add Counters dialog box, from the Available Counters list, expand the BizTalk:Message Agent performance counter object and select the counters to be monitored.
4. In the Instances of Selected object list, select the specific instances to be monitored for the selected counters, and then click Add. To select all available counter instances, select <All instances>.
5. After adding the counters, click OK.
The selected performance counters appear on the Performance Monitor screen.
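If you would rather sample the same counters from code than watch them in Perfmon, a minimal sketch along these lines works (the host instance name is an assumption - use the host that runs your receive location):

using System;
using System.Diagnostics;
using System.Threading;

class ThrottlingMonitor
{
    static void Main()
    {
        // Host instance name is an assumption; for an isolated-host receive
        // location this would typically be BizTalkServerIsolatedHost.
        const string host = "BizTalkServerIsolatedHost";

        var delivery = new PerformanceCounter(
            "BizTalk:Message Agent", "Message delivery throttling state", host);
        var publishing = new PerformanceCounter(
            "BizTalk:Message Agent", "Message publishing throttling state", host);

        while (true)
        {
            // 0 means not throttling; a non-zero value identifies the condition.
            Console.WriteLine("delivery={0} publishing={1}",
                delivery.NextValue(), publishing.NextValue());
            Thread.Sleep(1000);
        }
    }
}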
However, I don't think that is your issue; it sounds like you want low latency. For this you need to apply the Low-Latency Scenario Optimizations.
Note that the total processing time also includes the time IIS takes to spin up the web service and publish the message into the MessageBox. So you might also need to change the web service's Application Pool to stop it tearing down. Look at the Idle Time-out setting, which defaults to 20 minutes: if the web service is hit less often than that and you want a faster response, increase it. Some people also schedule a task that wakes the web service on a regular basis, so that even if the app pool is recycled or restarted it is spun up again quickly.
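The keep-alive can be as simple as a small console app run from Task Scheduler every few minutes; a sketch, with the URL as a placeholder for your receive location's address:

using System;
using System.Net;

class KeepAlive
{
    static void Main()
    {
        // Placeholder address - point this at the published .svc endpoint.
        const string url = "http://localhost/MyBizTalkService/Service.svc";

        using (var client = new WebClient())
        {
            try
            {
                // Any GET that reaches the application keeps the worker process warm.
                client.DownloadString(url);
            }
            catch (WebException ex)
            {
                // An HTTP error response still wakes the app pool; log it anyway.
                Console.WriteLine("Ping failed: {0}", ex.Status);
            }
        }
    }
}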

Related

Forcing a BizTalk Host to Throttle for Debugging Purposes

We're currently having an issue on our production servers and would like to try to replicate it in dev. I'm currently awaiting access to our performance monitoring tool, and while waiting I would like to play with this a little.
Since I suspect host throttling in prod, I'm thinking of forcing hosts to throttle in dev to see if that recreates the issue.
Is there a way to do this?
As others have mentioned, monitoring the throttling counters and other counters like memory and work-in-progress (WIP) messages is a must to see what is going on in your production server. I would also recommend setting up a SCOM alert on throttling states of 3 or higher (for both the publishing and delivery states), if you have SCOM.
Message throughput can grind to a halt, especially in the memory (states 4 and 5) and queue size (state 6) throttling states. States 1 and 2 are generally short-lived (e.g. the arrival of a large batch of messages), and BizTalk recovers within a few seconds.
Simulating the memory state in your dev environment should be straightforward by tweaking the throttling thresholds (obviously not something to be taken lightly in production!).
For example, to trigger the memory threshold states: AFAIK the lowest memory usage threshold you can set is 101MB. Running a load test in dev should then be able to reproduce the throttle.
There is also apparently a user-based throttling override to set states 10 and 11, although I haven't actually tried this.
Some other experience on avoiding throttling:
(Caveat: I don't have an active BizTalk 2006/R2 setup - this is for 2009/2010.)
If you do a lot of asynchronous processing (e.g. queue receives), ensure that you have split functionality into separate Receive, Processing, and Send hosts. This way you can adjust the throttling for the async receive hosts to trigger much earlier than the processing and send hosts; this should have the effect of constricting new incoming messages to the MessageBox while allowing existing messages to complete processing.
On 64-bit hosts, the default 25% memory host usage throttling level is usually an unnecessary liability - following Yossi Dahan's recommendation, we increased it to 50% on a 4GB server.
Note that suspended messages count toward throttling state 6 - ensure that you have a strategy for dealing with suspended messages (and obviously ensure that the SQL Agent jobs are running!).

Can low memory on IIS server cause SQL Timeouts (SQL Server on separate box)?

I have an IIS web server that hosts 400 web applications (distributed across 30 application pools). They are a mix of ASP.NET applications and WCF service endpoints. The server has 32GB of RAM and usually runs fast, although it sits at 95% memory usage. Worker processes each take between 500MB and 1.5GB of RAM.
I also have another box running SQL Server. That one has plenty of free memory.
Sometimes the web server starts throwing SQL timeout exceptions - a few per minute at first, rapidly increasing to hundreds per minute, effectively taking the server down. This problem affects applications in all pools. Some requests still complete, but most of them don't. While this happens, the CPU usage on the server is around 30% (the normal load on that box).
While this is happening, we can still use SQL Server Management Studio (from the IIS server) to execute queries successfully (and fast).
The fix is to restart IIS, and then everything goes back to normal until the next time.
Because the server is running with very little free memory, I feel like this is the cause, but I cannot explain the relationship between low memory and sudden bursts of SQL timeout exceptions.
Any ideas?
Memory pressure can trigger paging and garbage collection. Both introduce latency that would not otherwise be present.
GC'ing 32GB of data can take seconds. Why would all app processes GC at the same time? Because at about 95% memory utilization, Windows signals a "low memory" event that the CLR listens for; each process then tries to release memory to help the others.
If the applications get into a paging frenzy, that would also explain huge delays in normal execution.
This is just guessing, though. You can try to prove it by watching the Memory\Pages/sec counter (which counts hard page faults) and the ".NET CLR Memory" counters for Gen 2 (full) collections.
The fix would be to run with a bigger margin to the physical memory limit.
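You can also check for full collections from inside one of the affected apps; a quick sketch (send the output wherever suits you):

using System;
using System.Diagnostics;
using System.Threading;

class GcPressureLogger
{
    static void Main()
    {
        int lastGen2 = GC.CollectionCount(2);
        while (true)
        {
            Thread.Sleep(5000);
            int gen2 = GC.CollectionCount(2);
            long workingSetMb = Process.GetCurrentProcess().WorkingSet64 / (1024 * 1024);
            // A Gen 2 count that climbs sharply during the timeout burst
            // supports the low-memory/GC theory.
            Console.WriteLine("Gen 2 collections in last 5s: {0}, working set: {1} MB",
                gen2 - lastGen2, workingSetMb);
            lastGen2 = gen2;
        }
    }
}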
The first problem is to discover where the timeout is happening. Can you tell from the stack trace if the timeout is happening when executing a request against the database, or when connecting to the database? (Or even connecting to the web server?)
Timeouts executing database requests can have a variety of causes. The problem might be in the database: blocking processes, database maintenance (also locking), deadlocks, etc. When the apps are running slowly, do you see a lot of entries in sys.dm_exec_requests, and if so, what are their wait_types?
Even if you can run SQL in the query window while the web server is timing out, that doesn't mean there isn't massive blocking or deadlocking going on.
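You can run that check from a small tool as well as from the query window; a sketch (the connection string is a placeholder):

using System;
using System.Data.SqlClient;

class WaitCheck
{
    static void Main()
    {
        // Placeholder - point this at the SQL Server box.
        const string cs = "Server=MySqlBox;Database=master;Integrated Security=true";
        const string sql =
            "SELECT session_id, command, wait_type, wait_time, blocking_session_id " +
            "FROM sys.dm_exec_requests WHERE session_id > 50"; // skip system sessions

        using (var conn = new SqlConnection(cs))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // A non-zero blocking_session_id means a blocking chain.
                    Console.WriteLine("spid={0} wait={1} blocked by={2}",
                        reader["session_id"], reader["wait_type"],
                        reader["blocking_session_id"]);
                }
            }
        }
    }
}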
If it is a timeout connecting to the database, then it is possible the ADO connection pools are overwhelmed and not getting cleaned up, or the database has a connection limit, and the web services are timing out waiting for a connection.
One of the best ways to find out what is going on is to capture a memory dump of the w3wp.exe process and analyze it. Even if you aren't adept at a debugger like WinDbg, Microsoft's DebugDiag tool can produce some nice reports with helpful information.
SqlCommand.CommandTimeout
This property is the cumulative time-out for all network reads during command execution or processing of the results. A time-out can still occur after the first row is returned, and does not include user processing time, only network read time.
It is a client-based timeout. If work is getting queued up due to memory constraints, that could cause a timeout.
Are you retrieving a lot of data with these queries?
If some queries return a lot of data, consider breaking them up and giving the user next and previous buttons.
Have you considered async, like BeginExecuteReader?
The advantage is that no command timeout applies, and the calling thread is not blocked.
// Assumes the connection string includes "Asynchronous Processing=true".
isExecutingFTSindexWordOnce = true;
// Returns immediately; the callback fires on a thread-pool thread when the query completes.
sqlCmdFTSindexWordOnce.BeginExecuteNonQuery(callbackFTSindexWordOnce, sqlCmdFTSindexWordOnce);
// isExecutingFTSindexWordOnce is set back to false in the callback.
Debug.WriteLine("Calling thread active");
But I agree with your comment about how to respond to the request, as the answer does not come back on the calling thread.
Sorry, I am used to WPF, where I just update a public property in the callback.
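For completeness, a self-contained sketch of the Begin/End pattern (the table name and connection string are placeholders; "Asynchronous Processing=true" is required by the Begin/End API on .NET Framework versions before 4.5):

using System;
using System.Data.SqlClient;
using System.Threading;

class AsyncNonQueryDemo
{
    static void Main()
    {
        var conn = new SqlConnection(
            "Server=.;Database=MyDb;Integrated Security=true;Asynchronous Processing=true");
        var cmd = new SqlCommand("UPDATE dbo.Words SET Counted = 1", conn);
        conn.Open();

        // Begin returns immediately; the callback runs on a thread-pool thread.
        cmd.BeginExecuteNonQuery(ar =>
        {
            var command = (SqlCommand)ar.AsyncState;
            int rows = command.EndExecuteNonQuery(ar); // harvests the result, rethrows errors
            Console.WriteLine("{0} rows affected (thread {1})",
                rows, Thread.CurrentThread.ManagedThreadId);
            command.Connection.Close();
        }, cmd);

        Console.WriteLine("Calling thread is free (thread {0})",
            Thread.CurrentThread.ManagedThreadId);
        Console.ReadLine(); // keep the process alive for the callback
    }
}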

SignalR - in production, stops responding after 8 hours

We are using SignalR in production. At startup everything works fine, but after 8-9 hours the service stops working, without any exception or any entries in the Event Log.
Info:
Online users (over these 8-9 hours): 3000
Max concurrent users (at the same time): 200
Hub count: 1 (with a try/catch in every method for logging)
After the browser times out, it returns "404 not found".
Do you have any ideas?
What version of SignalR are you using? You should be using v0.5.2, as previous versions had issues with zombie connections, which would shut down your app by causing either an OutOfMemoryException or exceeding the allowed number of requests for the app pool.
Essentially what would happen is that the number of requests would get backed up (use Performance Monitor to view the ASP.NET counters Requests Current, Requests Queued, and Requests Rejected - see Performance Tuning) and/or you would max out your IIS requests, and the service would shut down. You can manually override this and increase the Requests Current limit for ASP.NET; I increased it to about 20,000 on our production box.
If you're not maxing out your requests, your app pool may be shutting down due to increased memory usage, or to repeated exceptions taking down the app pool.
In IIS Manager, under Management, double-click Configuration Editor.
Next, at the top, select the system.applicationHost/applicationPools section, then click the right-hand side of the line that says "(Collection)". This opens the application pool collection editor.
Select your SignalR app pool and check the Properties pane at the bottom. Here you can set the periodicRestart/memory threshold to whatever you want.
In our application I'm finding that we're good for about 45 minutes, due to the high-traffic nature of the application.
Hopefully that helps.
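The same threshold can also be set from code via Microsoft.Web.Administration, if you prefer to script it; a sketch, with the pool name and value as assumptions:

using System;
using Microsoft.Web.Administration; // reference Microsoft.Web.Administration.dll

class SetRecycleThreshold
{
    static void Main()
    {
        using (var serverManager = new ServerManager())
        {
            // Pool name and threshold are assumptions - adjust for your site.
            ApplicationPool pool = serverManager.ApplicationPools["SignalRAppPool"];
            pool.Recycling.PeriodicRestart.Memory = 3145728; // in KB, i.e. 3 GB
            serverManager.CommitChanges();
        }
    }
}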

Impact of changing instance count for an ASP.NET web role in Azure

I'm having no luck trying to find out how changing the instance count for an ASP.NET web role affects requests currently being processed.
Here's the scenario:
An ASP.NET site is deployed with 6 instances.
Via the console I reduce the instance count to 4.
Is Azure smart enough not to remove instances from the pool while they are processing requests, or does it just kill them mid-request?
I've been through the Azure documentation, Google, and a number of emails to MS tech support, none of which answered this seemingly simple question. I know about the events that get triggered by a shutdown, etc., but that doesn't really help in a web site scenario with a live person waiting for the response to their request.
You cannot choose which instances to kill off. Primarily this is due to Windows Azure's instance allocation scheme, where your instances are split into different fault domains (meaning different areas of the data center - different rack, etc.). If you were to choose the instances to kill, this could leave you in a state where your remaining instances are in the same fault domain, which would void the SLA.
Having said that: you get an event when your role instance is shutting down (the OnStop() event). If you capture this event, you can do instance cleanup in preparation for VM shutdown. I can't recall if you're taken out of the load balancer at this point, but you could always force yourself out with a simple PowerShell command (Set-RoleInstanceStatus -Busy). This way your ASP.NET instance stops taking requests, and you can more easily shut down in a graceful manner.
EDIT: Sorry - didn't quite address all of your question. Since you get to capture OnStop(), you might have to implement a mechanism to make sure nothing's being processed in that instance. Since you're out of the load balancer, and assuming your requests are processed fairly quickly (2-5 seconds), you shouldn't have to wait long to clear out remaining requests. There's probably a performance counter you can check to see how many active requests are still being handled.
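A sketch of that drain-then-stop pattern in a web role's RoleEntryPoint (the counter name follows the standard ASP.NET set; treat the details as assumptions):

using System;
using System.Diagnostics;
using System.Threading;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WebRole : RoleEntryPoint
{
    public override void OnStop()
    {
        // By the time OnStop runs the instance is out of the load balancer
        // rotation; wait for in-flight ASP.NET requests to drain.
        try
        {
            var requests = new PerformanceCounter("ASP.NET", "Requests Current", true);
            while (requests.NextValue() > 0)
            {
                Thread.Sleep(1000);
            }
        }
        catch (Exception e)
        {
            Trace.WriteLine("OnStop drain check failed: " + e.Message);
        }
    }
}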
Just to add to David's answer: the OnStop event happens once you are off the load balancer. For web apps, that is usually sufficient time to bleed out all requests between being disconnected from the LB and the instance being shut down. However, for long-running or stateful connections (perhaps to a worker role), there would be an abrupt disconnect in some cases. While the OnStop method removes you from the LB, it does not terminate open connections; it simply prevents you from getting new ones. For web apps, this is usually enough time to complete the in-flight requests (and you can delay the shutdown in OnStop as well, if you really want to).

Impact of disabling worker process recycling in an IIS application pool

I have a WCF service hosted in IIS that takes a long time (around 5 hours) to execute. The service generates reports using SSRS (SQL Server Reporting Services) and saves them to a location on the server. The service was stopping after generating a few reports, so I disabled "recycling of worker processes", "shut down worker processes after being idle", and "limit kernel request queue" in the application pool, and that fixed the issue: all the reports were generated regardless of the time taken. But I am not sure this is the right fix, and I would like to know the impact of unchecking these application pool settings for a WCF service in IIS. Is there a better way to get around this problem?
For any long running process it is much better to do it outside of IIS.
In this case I would have a regular Windows service running that monitors a request queue. When a request to generate a report comes in, it would spin off a thread to perform the generation.
The web service would be responsible for three things: first, adding an item to the queue to be handled; second, checking the queue for whether a report is ready; third, sending the completed report back to the calling client.
This would allow the client to essentially fire and forget the report request and call back later to check on its status. It would also mean that if IIS recycles for whatever reason, you are still OK.
For bonus points I would add some error handling code that when the windows service restarted it could restart report jobs that were in the middle of execution. This would make it a bit more robust and allow you to reboot the server at any point.
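A minimal sketch of that shape (all names are placeholders; the in-memory store is only to keep the sketch self-contained - in practice you would use a database table or MSMQ shared between IIS and the Windows service, which is also what lets jobs survive a restart):

using System;
using System.Collections.Concurrent;
using System.Threading;

public enum JobStatus { Queued, Running, Done, Failed }

public class ReportJob
{
    public Guid Id = Guid.NewGuid();
    public JobStatus Status = JobStatus.Queued;
    public string ResultPath;
}

// Shared job store - a stand-in for the durable queue.
public static class JobStore
{
    public static readonly ConcurrentDictionary<Guid, ReportJob> Jobs =
        new ConcurrentDictionary<Guid, ReportJob>();
    public static readonly BlockingCollection<ReportJob> Queue =
        new BlockingCollection<ReportJob>();
}

// 1. The WCF service only enqueues work and reports status - it never
//    runs the hours-long generation itself, so recycling is harmless.
public class ReportService
{
    public Guid RequestReport()
    {
        var job = new ReportJob();
        JobStore.Jobs[job.Id] = job;
        JobStore.Queue.Add(job);
        return job.Id; // the client polls later with this id
    }

    public JobStatus CheckStatus(Guid id)
    {
        return JobStore.Jobs[id].Status;
    }
}

// 2. The Windows service owns the long-running generation.
public class ReportWorker
{
    public void Run(CancellationToken token)
    {
        foreach (var job in JobStore.Queue.GetConsumingEnumerable(token))
        {
            job.Status = JobStatus.Running;
            try
            {
                job.ResultPath = GenerateReport(job.Id); // the hours-long SSRS call
                job.Status = JobStatus.Done;
            }
            catch
            {
                job.Status = JobStatus.Failed;
            }
        }
    }

    string GenerateReport(Guid id)
    {
        // Placeholder for the SSRS render-and-save logic.
        return @"C:\Reports\" + id + ".pdf";
    }
}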
I have also disabled all the automatic shutdown settings in IIS for my application, without any issue. I monitor the memory limits, and of course the program works smoothly without any memory issues.
I think these automatic shutdown triggers are designed mostly for servers that host many web sites together, where some of them may be poorly programmed. But if you are the master of your IIS, and you have checked that your program has no memory issues, then it is better not to shut it down, or at least to control the shutdown process in some way.
OK, it is better to run a long process outside IIS, but it is not so simple to develop, not so simple to install, and not so simple to check.
