Slow BizTalk File Receive - biztalk

I have an application with a file receive location. After the host instance has been running for a few hours the receive location fails to identify new files dropped into the folder that it is monitoring. It doesn't forget about them altogether, it's just that performance grinds to a crawl. The receive location is configured to poll the target folder every 60 seconds but after host instance has been running for an hour or so, then it seems that the target folder is being polled only every thirty minutes. If I restart the host instance then the files waiting in the target folder are collected right away and performance is fine for the next hour or so.
The same application runs fine in a different environment.
There are now obvious entries in the event log related to the problem.
All the BizTalk SQL jobs are running fine except for Backup BizTalk Server (BizTalkMgmtDb).
Any suggestions gratefully received.
Thanks
Rob

Here are some additional tools which may help you identify and diagnose BizTalk database issues.
BizTalk MsgBox Viewer
Here is a tool to repair identified errors:
Terminator
Use at your own risk... read the glogs and docs. Start with the message box viewer and let us know our results.

Without more details, the biggest tell is that your Backup Job is failing. If the backup job is failing, it may not be properly configured. If it is properly configured and still failing, then you've got other issues. Can you give us some more information about your BizTalk install.
What version are you running?
What are our database sizes?
What are your purge and archive settings like?
Is there any long running blocks in your SQL Server DB coming from BizTalk?

Another thing to consider is the user accounts the send, receive and orchestration hosts are running under. Please check the BizTalk Administration Console. If they are all running the same account, sometimes the orchestrations can starve the send and receive processes of CPU time. I believe priority is given to orchestrations then receive, then send. Even if you are just developing, it is useful to use separate accounts for this. This also improves security.
The Wrox BizTalk Server 2006 will also supply tuning advice.

What other things are going on with the server? Is BizTalk pegged otherwise or is it idle?

You mention that the solution does not have any problems in another environment, so it's likely that there is a configuration problem.
Check the following:
** On SQL Server, set some upper memory limit for SQL Server. By default, SQL Server uses whatever it can get and then hangs onto it, so set a reasonable limit so that your system can operate without spending a lot of time paging memory onto and from your hard drive(s).
** Ensure that you have available disk space - maybe you are running low - this can lead to all kinds of strange problems.
** Try to split up the system's paging file among its physical drives (if you have more than one drive on the system). Also consider using a faster drive, or if you have lots of cash laying around, get a SAN.
** In BizTalk, is tracking enabled? If so, are you also tracking message bodies? Disable tacking or message body tracking and see if there is a difference.
** Start performance monitor and monitor the following counters when running your solution
Object: BizTalk Messaging
Instance: (select the receiving host) %%
Counter: Documents Received/Sec
Object: BizTalk Messaging
Instance: (select the transmitting host) %%
Counter: Documents Sent/Sec
Object: XLANG/s Orchestrations
Instance: (select the processing host) %%
Counter: Orchestrations Completed/Sec.
%% You may have only one host, so just use it. Since BizTalk configurations vary, I am using generic names for hosts.
The preceding counters monitor the most basic aspects of your server, but may help to narrow down places to look further. You can, of course, add CPU and Memory too. If you have time (days...maybe weeks) you could monitor for processes that allocate memory and never release it. Use the following counter...
Object: Memory
Counter: Pool Nonpaged Bytes
Slow decline of this counter indicates that a process is not releasing memory, which affects everything on the system.
Let us know how things turn out!

I had the same problem with, when my orchestration was idle for some time it took a long time to process the first msg. A article of EvYoung helped me solve this problem.
"This is caused by application domain unloading within the BizTalk host process. If an AppDomain is shutdown after idle, the next message that comes needs to wait for the Orchestration to compile again. Depending on the complexity of your design, this can be a noticeable wait. To prevent this in low latency requirement scenario, you can modify the BTSNTSVC.EXE.config file and set SecondsIdleBeforeShutdown property to -1. This will prevent AppDomain shutdown due to idle."
You can find the article in here:
http://blogs.msdn.com/b/biztalkcpr/archive/2008/05/08/thoughts-on-orchestration-performance.aspx
It took me to long to respond but i thought i might help someone. cheers :)

Some good suggestions from others. I will add :
Do you have any custom receive pipeline components on the receive location ? If so perhaps one is leaking memory, calling some external component eg database which is taking a long time ?
How big are the files you are receiving ?
On the File transport properties of your receive location, set "file renaming" on, do the files get renamed within 60s.

Related

uWSGI and Flask: keep objects in memory between requests

My stack is uWSGI, flask and nginx currently. I have a need to store data between requests (basically I receive push notifications from another service about events to the server and I want to store those events in the server memory, so client can just query server every n milliseconds, to receive latest update).
Normally this would not work, because of many reasons. One is a good deployment requires you to have several processes in uwsgi in production (and even maybe several machines to scale this out). But my case is very specific: I'm building a web app for a piece of hardware (You can think of your home router configuration page as a good example). This means no need to scale. I also do not have a database (at least not a traditional one) and probably normally 1-2 clients simultaneously.
if I specify --processes 1 --threads 4 in uwsgi, is this enough to ensure the data is kept in the memory as a single instance? Or do I also need to use --threads 1?
I'm also aware that some web servers clear memory randomly from time to time and restart the hosted app. Does nginx/uwsgi do that and where can I read about the rules?
I'd also welcome advises on how to design all of this, if there are better ways to handle this. Please note that I do not consider using any persistant storage for this - this does not worth the effort and may be even impossible due to hardware limitations.
Just to clarify: When I'm talking about one instance of data, I'm thinking of my app.py executing exactly one time and keeping the instances defined there for as long as the server lives.
If you don't need data to persist past a server restart, why not just build a cache object into you application that can do push and pop operations?
A simple array of objects should suffice, one flask route pushes new data to the array and another can pop the data off the array.

Program not closing connections to db2400 / as/400

I've been programming for just a few years, and we have a default dll used for data access. It seems like there has been some data-mining or site scraping going on here lately, and although there are no issues with our SQL database connections, many of the programs that access the as/400 are keeping connections open and idle for long periods of time. I looked through our default data access dll and added code to close the connection after each function, but that didn't help. I have little experience with db2 / as/400 ... how do I close all of these open / idle connections from the code?
If you're using connections pools, that's working as designed.
Are you sure the connection is actually open? How are you determining that?
If you're just seeing locks held by the QZDASOINIT job on the IBM i, then that's also by design. The system will hard close tables (cursors) after the first use. When used again by the same job, the system will only pseudo-close them; in order to provide faster response when they are re-used.
If an operation needing exclusive access is attempted, the system will hard close the pseudo closed cursor.

Some BizTalk Receive Locations are disabled after server reboot

It is found that some BizTalk Receive Locations are disabled after server reboot (BizTalk server and SQL Server are separately installed to different physical servers)
Is there any idea on this scenario? Due to the boot sequence or other issues?
I will assume that, once you enable the receive locations manually, they are working correctly.
Typically, when FILE receive locations fail while pointing to an external server/share, it is because they are no longer available.
Make sure that, during the night, there are no network issues, planned/unplanned downtime of the share (= here your SQL server). A BizTalk receive location will retry to access a share for quite a while before disabling itself. Check the event log(s) for more information. You would want to look for errors/warnings there indicating an issue with connectivity between BizTalk and SQL.
Another issue might be that there are too many connections between your BizTalk server and SQL server. You can provide a maximum number of connections in the FILE share properties.
Also, you could try this link: https://serverfault.com/questions/235032/intermittent-connection-to-windows-7-shared-folder-from-windows-xp-workstations
It describes a potential fix for optimizing throughput for file sharing, although it depends on your operating system.

Can low memory on IIS server cause SQL Timeouts (SQL Server on separate box)?

I have an IIS Web Server that hosts 400 web applications (distributed across 30 application pools). They are both ASP.NET applications and WCF Services end points. The server has 32GB of RAM and is usually running fast; although it's running at 95% memory usage. Worker processes each take between 500MB and 1.5GB of RAM.
I also have another box running SQL Server. That one has plenty of free memory.
Sometimes, the Web Server starts throwing SQL Timeout exceptions. A few per minutes at first and rapidly increasing to hundreds per minute; effectively making the server down. This problem affects applications in all pools. Some requests still complete but most of them don't. While this happens the CPU usage on the server is around 30% (which is the normal load on that box).
While this is happening, we can still use SQL Server Management Studio (from the IIS Server) to execute requests successfully (and fast).
The fix is to restart IIS. And then everything goes back to normal until the next time.
Because the server is running with very low memory, I feel like this is the cause. But I cannot explain the relationship between low memory and sudden bursts of SQL Timeout exceptions.
Any idea?
Memory pressure can trigger paging and garbage collection. Both introduce latency which would not be present otherwise.
GC'ing 32GB of data can take seconds. Why would all app processes GC at the same time? Because at about 95% memory utilization Windows sets a "low memory" event that the CLR listens to. It will try to release memory to help other processes.
If the applications get into a paging frenzy that would also explain huge delays in normal execution.
This is just guessing, though. You can try proving it by looking at the "Hard page faults/sec" counter. There also must be a counter for "full GC" or "Gen 2 GC".
The fix would be running at a higher margin to the physical memory limit.
The first problem is to discover where the timeout is happening. Can you tell from the stack trace if the timeout is happening when executing a request against the database, or when connecting to the database? (Or even connecting to the web server?)
Timeouts executing database requests can be a variety of causes. The problem might be in the database with blocking processes, database maintenance (also locking), deadlocks, etc. When apps are running slowly, do you see a lot of entries in sys.dm_exec_requests, and if so, what are their wait_types?
Even if you can run SQL in the query window while the web server is timing out, that doesn't mean there isn't massive blocking or deadlocking going on.
If it is a timeout connecting to the database, then it is possible the ADO connection pools are overwhelmed and not getting cleaned up, or the database has a connection limit, and the web services are timing out waiting for a connection.
One of the best ways to find out what is going on is to capture a memory dump of the w3wp.exe process and analyze it. Even if you aren't adept at a debugger like WinDbg, Microsoft's DebugDiag tool can produce some nice reports with helpful information.
SqlCommand.CommandTimeout
This property is the cumulative time-out for all network reads during command execution or processing of the results. A time-out can still occur after the first row is returned, and does not include user processing time, only network read time.
It is a client based time out. If stuff is getting queued due to memory constraints then that could cause a timeout.
Are you retrieving a lot of data from these queries?
If some queries return a lot of data consider breaking them up and give the user a next and prior button.
Have you considered asynch like BeginExecuteReader?
The advantage is no timeout.
It does not release the calling thread.
isExecutingFTSindexWordOnce = true;
sqlCmdFTSindexWordOnce.BeginExecuteNonQuery(callbackFTSindexWordOnce, sqlCmdFTSindexWordOnce);
// isExecutingFTSindexWordOnce set to false in the callback
Debug.WriteLine("Calling thread active");
But I agree with your comment how to respond to the request as the answer does not come back to the calling thread.
Sorry I am used to WPF where I just update a public property on the call back.

What Causes "Internal connection fatal errors"

I've got a number of ASP.Net websites (.Net v3.5) running on a server with a SQL 2000 database backend. For several months, I've been receiving seemingly random InvalidOperationExceptions with the message "Internal connection fatal error". Sometimes there's a few days in between, while other times there are multiple errors per day.
The exception is not limited to one site in particular, though they share business and data access assemblies. The error seems to always be thrown from SqlClient.TdsParser.Run(). It sometimes is thrown from old-school direct SqlCommand.Execute() calls, while other times it is thrown from Linq2Sql code.
I've been assured by the network guys that there are no errors or packets lost on their end. Has anyone else experienced this? Could it be a driver problem? We have been unable as of yet to pinpoint a specific trigger for this exception.
We're running II6 on Windows Server 2003.
After a few months of ignoring this issue, it started to reach a critical mass as traffic gradually increased. Under heavy load, including some crawlers, things got crazy and these errors poured in nonstop.
Through trial and error, we eventually tracked down a handful of SqlCommand or LINQ queries whose SqlConnection wasn't closed immediately after use. Instead, through some sloppy programming originating from a misunderstanding of LINQ connections, the DataContext objects were disposed (and connections closed) only at the end of a request rather than immediately.
Once we refactored these methods to immediately close the connection with a C# "using" block (freeing up that pool for the next request), we received no more errors. While we still don't know the underlying reason that a connection pool would get so mixed up, we were able to cease all errors of this type. This problem was resolved in conjunction with another similar error I posted, found here: Why is my SqlCommand returning a string when it should be an int?
Sounds like the database connection is getting dropped or timing out.
We recently had similar issues moving to IIS 6 from IIS 5 connecting to SQL 2000. Our issue was solved by increasing number of ephemeral ports available.
Look at the usage of the ephemeral ports by the IIS server. The default max no. of ports available is normally 4000. You might want to consider increasing this if the sites on your server are particularly busy or your application is making a lot of database calls.
You can monitor these first to see if going over max limit.
Search Microsoft Knowledge base for "MaxUserPort" and "TcpTimedWaitDelay" and make necessary registry changes. Make sure you back up registry or snapshot server before making the changes. Will need to reboot for changes to take effect.
You should double check your database and recordset connection are being closed after use. Not closing will use up this port range unnecessarily.
Check the efficiency of your stored procedures anyway as they might be taking longer than they need too.
"If you rapidly open and close 4000 sockets in less than four minutes, you will reach the default maximum setting for client anonymous ports, and new socket connection attempts fail until the existing set of TIME_WAIT sockets times out." - from http://support.microsoft.com/kb/328476
Check your server's LOG folder (\program files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG or similar) for files named SqlDump*.mdmp and SqlDump*.txt. If you do find any you'll have to take it to Product Support.
I was creating a new EF Core project and was trying to create the database to an external Linux server instead of a Windows Server or local one. After hours of searching I found out that I am using MySQL instead of the Microsoft SQL server.
I found it weird that everyone was using 1433 instead of the usual 3306. So to fix my 'Internal connection fatal error' I had to set up a docker instance of SQL Server bound to its default port of 1433.
It literally was that simple. In the docker repo look for "microsoft-mssql-server" and run the image as described neatly in the description below. Everything works now and I am able to push my database from my EF Core project to an external server.

Resources