MSDTC fails for self-hosted NServiceBus ASP.NET endpoints but not other processes

I have a Windows 2008 R2 server that hosts many back-end NServiceBus endpoints. All of the services that rely on the NServiceBus.Host.exe host (installed as Windows Services) are able to interact with MSDTC perfectly, averaging a small handful of concurrent distributed transactions throughout the day. There are two small Web.API applications, however, that self-host NServiceBus endpoints (as publishers) and constantly receive the following error when trying to process subscription requests:
NServiceBus.Transports.Msmq.MsmqDequeueStrategy Error in receiving messages.
System.Transactions.TransactionAbortedException: The transaction has aborted. ---> System.Transactions.TransactionManagerCommunicationException: Communication with the underlying transaction manager has failed. ---> System.Runtime.InteropServices.COMException: The Transaction Manager is not available. (Exception from HRESULT: 0x8004D01B)
   at System.Transactions.Oletx.IDtcProxyShimFactory.ConnectToProxy(String nodeName, Guid resourceManagerIdentifier, IntPtr managedIdentifier, Boolean& nodeNameMatches, UInt32& whereaboutsSize, CoTaskMemHandle& whereaboutsBuffer, IResourceManagerShim& resourceManagerShim)
   at System.Transactions.Oletx.DtcTransactionManager.Initialize()
   --- End of inner exception stack trace ---
   at System.Transactions.Oletx.OletxTransactionManager.ProxyException(COMException comException)
   at System.Transactions.Oletx.DtcTransactionManager.Initialize()
   at System.Transactions.Oletx.DtcTransactionManager.get_ProxyShimFactory()
   at System.Transactions.Oletx.OletxTransactionManager.CreateTransaction(TransactionOptions properties)
   at System.Transactions.TransactionStatePromoted.EnterState(InternalTransaction tx)
   --- End of inner exception stack trace ---
   at System.Transactions.TransactionStateAborted.CheckForFinishedTransaction(InternalTransaction tx)
   at System.Transactions.Transaction.Promote()
   at System.Transactions.TransactionInterop.ConvertToOletxTransaction(Transaction transaction)
   at System.Transactions.TransactionInterop.GetDtcTransaction(Transaction transaction)
   at System.Messaging.MessageQueue.StaleSafeReceiveMessage(UInt32 timeout, Int32 action, MQPROPS properties, NativeOverlapped* overlapped, ReceiveCallback receiveCallback, CursorHandle cursorHandle, IntPtr transaction)
   at System.Messaging.MessageQueue.ReceiveCurrent(TimeSpan timeout, Int32 action, CursorHandle cursor, MessagePropertyFilter filter, MessageQueueTransaction internalTransaction, MessageQueueTransactionType transactionType)
   at System.Messaging.MessageQueue.Receive(TimeSpan timeout, MessageQueueTransactionType transactionType)
   at NServiceBus.Transports.Msmq.MsmqDequeueStrategy.ReceiveMessage(Func`1 receive) in c:\BuildAgent\work\31f8c64a6e8a2d7c\src\NServiceBus.Core\Transports\Msmq\MsmqDequeueStrategy.cs:line 313
Some other notes:
- Both the erroring application pools' identities and the Windows Services' log-on users are the same.
- This actually worked well before a recent reboot: the Web.API services were able to successfully process subscription requests, and they are still able to publish messages just fine (though publishing does not automatically use MSDTC, and we are not using a TransactionScope explicitly). Since the local reboot, we simply get the above error whenever a subscription request message sits in either of the Web.API publishers' input queues.
- I've used both procmon.exe and MSDTC tracing and have found nothing of interest. The typical Event Viewer logs also do not provide any information.
- All endpoints are running .NET 4.5 and NServiceBus 4.6.
- We cannot recreate this in any other environment.
Additional notes from the conversations below:
- The thread which throws the exception is pure NServiceBus subscription management; none of "my" code is involved. When the application pool starts the w3wp.exe worker process on demand, NSB spawns a worker thread, unbeknownst to the application, to process subscription requests. It should only ever work across the publisher's input queue and the subscription storage, which is also MSMQ, in a queue right beside the input queue (i.e. no other server is involved, to my knowledge).
- The "code" of the website didn't change across the reboot, and the application pool had stopped and restarted several times before the reboot without issue.

Not really an answer, but too long for a comment.
What part of your operation requires DTC? A Distributed Transaction gets enlisted automatically when needed, usually when you are talking to two different DTC-supporting bits of infrastructure (e.g. MSMQ and a database).
You said you tested via DTC tracing; do you mean DTCPing? Did you test by having it run on both machines (or on all machines, if there are more than two involved in the transaction)? The DTCPing tool is pretty esoteric, and its output can be confusing.
Also, if it did work before the reboot, is it possible the reboot reset firewall settings? Firewalls are a common cause of DTC problems.
Also, I assume you checked and rechecked your DTC settings on the local machine? Did you ensure that your MSMQ queues are set up to be transactional?
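To rule out the queue setup, that check can be scripted with System.Messaging. A minimal sketch, assuming a local private queue; the path is illustrative:

```csharp
using System;
using System.Messaging;

// Verify the endpoint's queue is transactional; create it that way if missing.
const string path = @".\private$\myEndpointQueue"; // illustrative path

if (!MessageQueue.Exists(path))
{
    MessageQueue.Create(path, transactional: true);
}
else
{
    using (var queue = new MessageQueue(path))
    {
        if (!queue.Transactional)
            throw new InvalidOperationException(path + " exists but is not transactional.");
    }
}
```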
From your comments:
Note that this particular failure occurs when attempting to dequeue a
message from a local private MSMQ queue [...]
The stack trace makes it appear that that's all it's doing, but I suspect that as it attempts the dequeue it is also trying to enlist a transaction spanning multiple servers. See below.
Why MSDTC? It's the original way to support exactly-once messaging in
NServiceBus (see here).
Right, but what I'm asking is why the particular operation requires a distributed transaction. If all a handler is doing is reading from a queue and (for example) writing output to the console, MSDTC will never be enlisted, even though the handler is wrapped in a transaction scope. It will simply use a local transaction to read from the queue. The escalation to a distributed transaction is automatic, and only happens when it is needed to support multiple bits of infrastructure.
So if you recently deployed code in a handler that writes data to a new database server, you may be getting a failure because you are now enlisting a transaction that includes the new server, which may be where the failure is happening.
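To make the escalation concrete, here is a minimal sketch of how it happens, assuming an MSMQ read followed by a write to a database on another server (queue path, connection string, and table are all illustrative):

```csharp
using System.Data.SqlClient;
using System.Messaging;
using System.Transactions;

using (var scope = new TransactionScope())
{
    using (var queue = new MessageQueue(@".\private$\myQueue")) // illustrative
    {
        // At this point only MSMQ is enlisted: a cheap local transaction.
        Message msg = queue.Receive(MessageQueueTransactionType.Automatic);

        // Opening a second resource manager promotes the transaction to MSDTC,
        // and every machine involved must now be able to reach a transaction manager.
        using (var conn = new SqlConnection("Server=OTHERSERVER;Database=App;Integrated Security=SSPI"))
        {
            conn.Open();
            new SqlCommand("INSERT INTO Audit VALUES ('handled')", conn).ExecuteNonQuery();
        }
    }
    scope.Complete();
}
```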
So determining all the servers involved in the distributed transaction is the first step. The next step would be to check the DTC settings on all involved servers. If DTC settings aren't the problem, I'd recommend testing communication between the servers using DTCPing. The NServiceBus documentation has some good instructions for using DTCPing.

What "fixed" this for us in the production environment was adding the application pool identity user to the local Administrators group on the server. Unfortunately we don't have time to determine what setting required that security setup, as this isn't a required configuration in other similar servers. Also, this isn't the most desirable solution from a security perspective, but in our particular situation, we're willing to live with it.

Related

Biztalk orchestration slows down on QA machine not on DEV

I have a BizTalk application which loops over an XML document and sends data to a SQL Server database. The orchestration works fine on the DEV machine throughout the process and is consistent. But if I process the same file on the QA machine, it starts at the same speed and then performance keeps degrading. There is no issue with the database objects, and the throttling settings are the same as on DEV. I restarted the machine. Not sure why QA is reacting this way for this application.
What are the areas to be checked?
There are various factors which can cause this and affect your overall solution performance:
- Is QA a shared environment, i.e. are there other solutions on it which may cause the slowdown?
- If you are sharing the host on which the orchestration runs, that host might be throttling for various reasons, such as memory pressure. Use performance counters to monitor the host throttling state (see the sketch after this list).
- You may have too many persistence points in the orchestration. Since you are looping and sending messages to the SQL database inside the loop, each Send shape causes a persistence point per iteration, which degrades performance considerably.
- Isolate the issue, i.e. determine whether it is the orchestration running slowly or the send to SQL that is taking the time.
- Tracking is turned on while the DTA jobs are not running.
- Message cleanup jobs are not running as expected in QA.
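For the throttling check, the Message Agent exposes its throttling state as performance counters, which can be read programmatically. A minimal sketch, assuming the standard "BizTalk:Message Agent" category; the host instance name is illustrative:

```csharp
using System;
using System.Diagnostics;

// A non-zero throttling state means the host is being throttled;
// the value indicates the reason (e.g. database size, memory, rate).
var delivery = new PerformanceCounter(
    "BizTalk:Message Agent", "Message delivery throttling state", "MyHostInstance");
var publishing = new PerformanceCounter(
    "BizTalk:Message Agent", "Message publishing throttling state", "MyHostInstance");

Console.WriteLine("Delivery throttling state:   " + delivery.NextValue());
Console.WriteLine("Publishing throttling state: " + publishing.NextValue());
```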
I wrote a blog about how to use SQL Server Profiler to capture the RPC call from BizTalk to SQL Server. You could isolate whether SQL is causing the issue that way; capture the RPC call on DEV or QA, and then try running just the stored procedure on QA. If it doesn't run as quickly as on DEV, that's your problem. If it does, look at your BizTalk artifacts.
Here's the blog: http://blog.tallan.com/2015/01/09/capturing-and-debugging-a-sql-stored-procedure-call-from-biztalk/
BizTalk host throttled because DatabaseSize exceeded the configured throttling limit. Also, the SQL Server Agent was not running on the server, so the purge processes did not run. This looks to have built up the database size over time until BizTalk throttled the application due to the resources being low.

How to deploy a socket server in iis application scope

I am implementing an ASP.NET application that needs to service conventional http requests but the responses require data that I need to acquire from providers that are executables that provide their data over sockets. My plan to implement was:
1) In Application_Start, start a new thread that runs a socket server.
2) In Session_Start, launch the session-specific process that will ultimately connect to the socket server, and from there do a Monitor.Wait on a session-specific lock object which I've stored in Application.Contents by session key.
3) When the socket server sees a new connection, make the data available to the appropriate session's contents and do a Monitor.Pulse on the session-specific lock object (sketched below).
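A minimal sketch of that handshake, assuming Global.asax code-behind; the key names, the timeout, and the variables sessionId and receivedBytes are illustrative:

```csharp
using System;
using System.Threading;

// --- Session_Start, after launching the session-specific process ---
object gate = new object();
Application.Lock();
Application["gate-" + Session.SessionID] = gate;
Application["data-" + Session.SessionID] = null;
Application.UnLock();

lock (gate)
{
    // Wait with a timeout so a crashed provider can't hang the request forever.
    if (!Monitor.Wait(gate, TimeSpan.FromSeconds(30)))
        throw new TimeoutException("Provider never connected.");
}
byte[] data = (byte[])Application["data-" + Session.SessionID];

// --- Socket-server thread, once a provider connection is matched to a session ---
object sessionGate = Application["gate-" + sessionId];
Application.Lock();
Application["data-" + sessionId] = receivedBytes;
Application.UnLock();
lock (sessionGate)
{
    Monitor.Pulse(sessionGate); // wakes the waiting Session_Start thread
}
```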
Is this technically feasible in IIS? Can this concept function as a stable system?
Before answering, please bear in mind I am not asking "is this the recommended approach", I am aware it is not and if I had the option to write this system from scratch I would do this differently. I'm also not able to change the fact that the programs communicate using sockets.
Given the constraints this approach makes sense.
Shutdown and recycling of IIS worker processes are always thorny issues when it comes to keeping state in a web app. Note that your worker process can recycle at pretty much any time, for many reasons. Some of those reasons are unavoidable: a server reboot, an app deployment, a bug leading to a process crash. So you need to think through what happens in those cases: all sessions will be lost while the child processes still run. Suggested solution: add the children to a Windows Job Object and configure the Job to be killed when the parent exits (see the sketch below).
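A minimal sketch of the Job Object approach via P/Invoke (the wrapper class and its names are illustrative; error handling is omitted):

```csharp
using System;
using System.Runtime.InteropServices;

static class ChildProcessJob
{
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr CreateJobObject(IntPtr attributes, string name);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool SetInformationJobObject(IntPtr job, int infoClass,
        ref JOBOBJECT_EXTENDED_LIMIT_INFORMATION info, int size);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool AssignProcessToJobObject(IntPtr job, IntPtr process);

    [StructLayout(LayoutKind.Sequential)]
    struct JOBOBJECT_BASIC_LIMIT_INFORMATION
    {
        public long PerProcessUserTimeLimit, PerJobUserTimeLimit;
        public uint LimitFlags;
        public UIntPtr MinimumWorkingSetSize, MaximumWorkingSetSize;
        public uint ActiveProcessLimit;
        public UIntPtr Affinity;
        public uint PriorityClass, SchedulingClass;
    }

    [StructLayout(LayoutKind.Sequential)]
    struct IO_COUNTERS
    {
        public ulong ReadOperationCount, WriteOperationCount, OtherOperationCount;
        public ulong ReadTransferCount, WriteTransferCount, OtherTransferCount;
    }

    [StructLayout(LayoutKind.Sequential)]
    struct JOBOBJECT_EXTENDED_LIMIT_INFORMATION
    {
        public JOBOBJECT_BASIC_LIMIT_INFORMATION BasicLimitInformation;
        public IO_COUNTERS IoInfo;
        public UIntPtr ProcessMemoryLimit, JobMemoryLimit;
        public UIntPtr PeakProcessMemoryUsed, PeakJobMemoryUsed;
    }

    const int JobObjectExtendedLimitInformation = 9;
    const uint JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE = 0x2000;

    // One job for the whole worker process; when w3wp.exe dies, the OS closes
    // the job handle, which kills every process assigned to the job.
    static readonly IntPtr Job = CreateKillOnCloseJob();

    static IntPtr CreateKillOnCloseJob()
    {
        IntPtr job = CreateJobObject(IntPtr.Zero, null);
        var info = new JOBOBJECT_EXTENDED_LIMIT_INFORMATION();
        info.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE;
        SetInformationJobObject(job, JobObjectExtendedLimitInformation,
            ref info, Marshal.SizeOf(typeof(JOBOBJECT_EXTENDED_LIMIT_INFORMATION)));
        return job;
    }

    // Call right after Process.Start(...) for each child process.
    public static void Track(System.Diagnostics.Process child)
    {
        AssignProcessToJobObject(Job, child.Handle);
    }
}
```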
With overlapped IIS worker recycling you can have two functioning workers running at the same time. You must deal with that possibility.
Consider the possibility that the child process immediately crashes. It will never make a connection. Make sure your app doesn't hang waiting for the connection forever.

Calling SqlConnection.ClearAllPools() in Application_Start & Application_End?

We are trying to diagnose an issue that occurred in our production environment last week. Long story short, the database connection pool seemed to be full of active connections from our ASP.NET 3.5 app that would not clear, even after restarting the application pool and IIS.
The senior DBA said that because the network connections occur at the operating system level, recycling the app and IIS did not sever the actual network connections, so SQL Server left the database connections to continue running, and our app was still unable to reach the database.
In looking up ways to force a database connection pool to reset, I found the static method SqlConnection.ClearAllPools(), with documentation explaining what it does, but little to nothing explaining when to call it. It seems like calling it at the beginning of Application_Start and the end of Application_End in my global.asax.cs is a good safety measure to protect the app from poisoned connection pools, though it would of course incur a performance hit on startup/shutdown times.
Is what I've described a good practice? Is there a better one? The goal is to allow a simple app restart to reset an app's mangled connection pool without having to restart the OS or the SQL Server service, which would affect many other apps.
Any guidance is much appreciated.
When a process dies, all network connections are always, always, always closed immediately. That happens at the TCP level and has nothing to do with ADO.NET; it goes for all applications. Kill the browser, and all downloads stop. Kill the FTP client, and all connections are closed immediately.
Also, the connection pool is per-process. So clearing it when starting the app is useless because the pool is empty, and clearing it at shutdown is not necessary because all connections will (gracefully) shut down moments later anyway.
Probably, your app is not returning connections to the pool. You must dispose of all connections after use in all cases. If you fail to do that, dangling connections will accumulate for an indefinite amount of time.
Clearing the pool does not free up dangling connections because those appear to be in use. How could ADO.NET tell that you'll never use them again? It can't.
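The fix is mechanical: wrap every connection (and command) in a using block so it is returned to the pool even when a query throws. A minimal sketch; the connection string and query are illustrative:

```csharp
using System.Data.SqlClient;

using (var conn = new SqlConnection(connectionString)) // illustrative
using (var cmd = new SqlCommand("SELECT COUNT(*) FROM Orders", conn))
{
    conn.Open();
    int count = (int)cmd.ExecuteScalar();
} // Dispose() runs here and returns the underlying connection to the pool
```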
Look at sys.dm_exec_connections to see who is holding connections open. You might increase the ADO.NET pool size as a stop-gap measure. SQL Server can take over 30k connections per instance. You'll normally never saturate that.

How to make a Windows Service listen for additional request while it is already processing the current request?

I need to build a Windows Service in VB.NET under Visual Studio 2003. This Windows Service should read a flat file (a huge file of about a million records) from a local folder and upload it to the corresponding database table. This should be done in rollback mode (a database transaction). While transferring data to the table, the service should also be listening for additional client requests. So if, in the middle of the upload, a client requests a cancel operation, the service should roll back the transaction and give feedback to the client. This Windows Service also continuously writes to two log files about the status and error records.
My client is ASPX page (A website).
Can somebody explain how to organize and achieve this functionality in a Windows Service (processing while simultaneously listening for additional client requests, e.g. a cancellation request)?
Also, could you suggest the ideal way of achieving this (i.e. whether it is best implemented as a web service, a Windows Service, a remote object, or some other way)?
Thank you all for your help in advance!
You can architect your service to spawn "worker threads" that do the heavy lifting, while it simply listens for additional requests. Because future calls are likely to have to deal with the current worker, this may work better than, say, architecting it as a web service using IIS.
The way I would set it up is: the service's main thread listens on a port or pipe for communication. When it gets a call to process data, it spawns a worker thread, giving it some "status token" (which could be as simple as a reference to a boolean variable) that the worker checks at regular intervals to make sure it should still be running. The thread kicks off and the service goes back to listening (the network classes maintain a buffer of received data, so calls will only fail if they time out).
If the service receives a call to abort, it will set the token to a "cancel" value. The worker thread will read this value on its next poll and get the message, rollback the transaction and die.
This can be set up to have multiple workers processing multiple files at once, belonging to callers keyed by their IP or some unique "session" identifier you pass back and forth.
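A minimal sketch of that pattern. The question targets VS2003/.NET 1.1, but this assumes .NET 4.0+ (as the next answer suggests) so that CancellationTokenSource can serve as the status token; connectionString and InsertRecord are illustrative placeholders:

```csharp
using System.Collections.Concurrent;
using System.Data.SqlClient;
using System.IO;
using System.Threading;

var workers = new ConcurrentDictionary<string, CancellationTokenSource>();

void HandleCommand(string sessionId, string command, string filePath)
{
    if (command == "PROCESS")
    {
        var cts = new CancellationTokenSource();
        workers[sessionId] = cts;
        new Thread(() => ProcessFile(filePath, cts.Token)) { IsBackground = true }.Start();
    }
    else if (command == "CANCEL" && workers.TryGetValue(sessionId, out var cts))
    {
        cts.Cancel(); // the worker sees this at its next poll
    }
}

void ProcessFile(string path, CancellationToken token)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (var tx = conn.BeginTransaction())
        {
            foreach (string record in File.ReadLines(path))
            {
                if (token.IsCancellationRequested)
                {
                    tx.Rollback(); // undo everything uploaded so far
                    return;        // then report "cancelled" back to the client
                }
                InsertRecord(conn, tx, record); // hypothetical per-record insert
            }
            tx.Commit();
        }
    }
}
```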
You can design your service the way FTP does. FTP uses two ports, one for commands and another for data transfer.
You can consider two classes, one for command parsing and another for data transfer, each running on its own thread.
Use a communication channel (such as a synchronized queue) between the threads. You can use System.Collections.Concurrent if you move to .NET 4.0, along with more threading features like CancellationTokens (see the sketch below).
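On .NET 4.0, BlockingCollection provides that channel almost for free. A minimal sketch; Execute is a hypothetical handler:

```csharp
using System.Collections.Concurrent;
using System.Threading;

var commands = new BlockingCollection<string>();

// Transfer thread: blocks on the collection until the command thread adds work.
var transferThread = new Thread(() =>
{
    foreach (string command in commands.GetConsumingEnumerable())
        Execute(command); // hypothetical handler
});
transferThread.IsBackground = true;
transferThread.Start();

// Command-parsing thread:
commands.Add("UPLOAD customers.txt");
commands.CompleteAdding(); // lets the transfer thread exit once drained
```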
WCF has advantages over a plain web service, but comparing it to a Windows Service needs more details of your project. In general, WCF is easier to implement than a Windows Service.

Are there any patterns for monitoring log4net exception logs from across a cluster of web servers?

Are there any patterns or practices for monitoring log4net exception logs across a cluster of web servers? I have considered several options, including the following:
A central database
A log-file retrieval system
A service-based logging architecture
Thanks,
Richard
Message queuing is a great solution. It works well in a distributed environment where one machine (or several) can pop log messages off the queue and persist them somewhere (a central logging database, rolling flat files, ...). And if at any time a machine producing or consuming messages goes offline, the others can continue logging and everything resumes as normal when it is back online (see the sketch below).
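A minimal sketch of the consumer side, assuming a transactional MSMQ queue (the queue path and SaveToDatabase are illustrative):

```csharp
using System.Messaging;

using (var queue = new MessageQueue(@".\private$\logging")) // illustrative path
{
    queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
    while (true)
    {
        using (var tx = new MessageQueueTransaction())
        {
            tx.Begin();
            Message msg = queue.Receive(tx);
            SaveToDatabase((string)msg.Body); // hypothetical persistence call
            tx.Commit(); // the message is only removed once it has been persisted
        }
    }
}
```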
You could configure log4net to write to the event log of each machine and use hypervisor software such as SCOM to monitor each node (a configuration sketch follows).
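A minimal sketch of that configuration done programmatically (the event-log source name is illustrative and must be registered on each machine):

```csharp
using log4net.Appender;
using log4net.Config;
using log4net.Core;
using log4net.Layout;

var layout = new PatternLayout("%date %level %logger - %message%newline");
layout.ActivateOptions();

var appender = new EventLogAppender
{
    ApplicationName = "MyWebApp", // illustrative event-log source
    Layout = layout,
    Threshold = Level.Error       // only ship errors to the event log
};
appender.ActivateOptions();

BasicConfigurator.Configure(appender);
```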
