BizTalk 2013 R2 WCF-SQL adapter having random issues

Ahoy,
We have two BizTalk applications in BizTalk 2013 R2 that seem to be having random issues. Both applications follow the same process:
1. Pull data from a WCF endpoint.
2. Delete the existing data from a database via a stored procedure.
3. Insert the newly pulled data via a WCF-SQL call.
Both applications worked great during our testing for quite a while. But, over time, we've had a few issues crop up with the insert via the WCF-SQL call.
A fatal error occurred while reading the input stream from the network. The session will be terminated (input error: 64, output error: 0).
This error showed up in the SQL Server logs. We had this one for about a day and then it just went away. Everything else continued to work fine against that target SQL Server; it was only BizTalk that had issues.
Our latest error is where the request to the WCF-SQL insert happens (the data is actually inserted), but there is never a response. So the Send Port keeps retrying through its retry count, and the Orchestration just dehydrates.
We tinkered with every setting throughout the application to try to solve this, but only deleting the application and redeploying fixed it (for now, at least).
So I guess my question is whether anyone else has had these sorts of "random" errors with BizTalk, where it works great for a while and then goes downhill like we've seen?
I'd really prefer to have something stable that is minimal maintenance. This is an enterprise product after all.

I've had issues similar to this happen when moving between environments where there were data differences, e.g. a column full of NULLs in QA and a column full of actual data in PROD. There are a few things you can try:
Use SQL Server Profiler to capture the RPC call coming from BizTalk, and try running it directly on the SQL Server BizTalk is calling remotely (wrap it in a transaction you roll back at the end if this is production; see the sketch after this list). Does it take longer than expected to run? Debug the procedure to find the pain points and optimize if possible. I've written a blog about how to do this here: http://blog.tallan.com/2015/01/09/capturing-and-debugging-a-sql-stored-procedure-call-from-biztalk/
Up the timeout settings in the binding configuration for the send port to ensure that it is not timing out before SQL can finish doing its work.
Up the System.Transactions timeout in Machine.config to ensure that MSDTC isn't causing issues: http://blogs.msdn.com/b/madhuponduru/archive/2005/12/16/how-to-change-system-transactions-timeout.aspx and http://blog.brandt-lassen.dk/2012/11/overriding-default-10-minutes.html
If possible, do a data compare between the TEST/QA and PROD databases. Look for significant differences, especially in columns that you are using in JOIN conditions and WHERE clauses.
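For the first suggestion, here is a minimal sketch of replaying the captured call inside a transaction that is always rolled back. The connection string, procedure name, and parameter are placeholders for whatever Profiler actually captured:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Diagnostics;

class ReplayCapturedCall
{
    static void Main()
    {
        // placeholders: use the server, procedure, and parameters Profiler captured
        using (var conn = new SqlConnection("Server=TargetSql;Database=Target;Integrated Security=SSPI"))
        {
            conn.Open();
            using (var tran = conn.BeginTransaction())
            using (var cmd = new SqlCommand("dbo.usp_InsertPulledData", conn, tran))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.CommandTimeout = 300; // generous while measuring
                cmd.Parameters.AddWithValue("@BatchId", 42);

                var sw = Stopwatch.StartNew();
                cmd.ExecuteNonQuery();
                Console.WriteLine("Elapsed: " + sw.Elapsed);

                tran.Rollback(); // never commit the replay against production
            }
        }
    }
}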

Related

ASP.NET Connection loss handling

How would you go about handling lost data from a SQL connection loss in an ASP.NET application?
Let's say you're running an algorithm that adds and removes certain roles. Midway through, the connection to the SQL database is lost. Because the connection is gone, you can't even backtrack the steps already done. The whole state is lost, leaving the database in an erroneous condition.
Would you set IIS Rapid Fail Protection to shut the site down upon one exception and manually force the function to run again (after the connection has been fixed)?
Or what is the professional way of handling it? I am quite new to this; maybe there's something I don't know about (such as IIS retrying it or caching).
(Using Entity Framework)
This is not a coding problem in its own right; it is more a question of best practice for handling data loss with a SQL database on ASP.NET.
You need to do batch SQL operations inside a SQL transaction, so that whatever the error, a rollback happens. This is a built-in SQL feature and nothing special needs to be done.
Once you start a SQL transaction, a commit is issued only when all operations succeed. The default behavior in all other non-success scenarios (including a lost connection) is a rollback.
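A minimal ADO.NET sketch of that pattern (table and role names are made up; the same idea applies with Entity Framework or TransactionScope):

using System.Data.SqlClient;

static void ApplyRoleChanges(string connectionString, int userId)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (var tran = conn.BeginTransaction())
        {
            using (var add = new SqlCommand(
                "INSERT INTO dbo.UserRoles (UserId, RoleName) VALUES (@u, 'Editor')", conn, tran))
            {
                add.Parameters.AddWithValue("@u", userId);
                add.ExecuteNonQuery();
            }
            using (var remove = new SqlCommand(
                "DELETE FROM dbo.UserRoles WHERE UserId = @u AND RoleName = 'Viewer'", conn, tran))
            {
                remove.Parameters.AddWithValue("@u", userId);
                remove.ExecuteNonQuery();
            }
            tran.Commit(); // reached only if every step succeeded
        }
        // if an exception (e.g. a dropped connection) escapes before Commit,
        // Dispose rolls the transaction back, and SQL Server discards
        // uncommitted work on a severed connection anyway
    }
}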
If you're encountering issues with any specific logic, post the code snippet and we're glad to help.

SQL connection timeout

We have been facing weird connection timeouts on one of our websites.
Our environment is composed of an IIS 7 web server (running on Windows Server 2008 R2 Standard Edition) and an SQL Server 2008 database server.
When debugging the website functionality that provokes the timeout, we notice that the connection itself takes milliseconds to complete, but the SqlCommand, which invokes a stored procedure on the database, hangs for several minutes during execution, then raises the timeout exception.
On the other hand, when we run the stored procedure directly on the database, it takes only 2 seconds to correctly finish execution.
We already tried the following:
Modified SqlCommand timeout on the website code
Modified execution timeout on the web.config file
Modified sessionState timeout on the web.config file
Modified authorization cookie timeout on the web.config file
Modified the connection timeout on the website properties on IIS
Modified the application pool shutdown time limit on IIS
Checked the application pool idle timeout on IIS
Checked the execution timeout on the SQL Server properties (it's set to 0, unlimited)
Tested the stored procedure directly on the database with other parameters
We appreciate any help.
Nirav
I've had this same issue with a stored procedure that was a search feature for the users. I tried everything, including ARITHABORT, etc. The SP joined many tables, as the users could search on anything. Many of the parameters for the SP were optional, meaning they had a default value of NULL in the SP. Nothing worked.
I "fixed" it by making sure my ADO.NET code only added parameters where the user selected a value. The SP went from many minutes to seconds in execution time. I'm assuming that SQL Server handled the execution plan better when only parameters with actual values were passed to the SP.
Note that this was for SQL Server 2000.
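A rough sketch of that workaround, with invented proc and parameter names: add only the parameters the user actually filled in, and let the rest fall back to their NULL defaults inside the procedure:

using System;
using System.Data;
using System.Data.SqlClient;

static DataTable SearchOrders(SqlConnection conn, string customerName, DateTime? fromDate)
{
    using (var cmd = new SqlCommand("dbo.usp_SearchOrders", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;

        // pass only what the user actually set; omitted parameters take
        // their NULL defaults inside the procedure
        if (!string.IsNullOrEmpty(customerName))
            cmd.Parameters.AddWithValue("@CustomerName", customerName);
        if (fromDate.HasValue)
            cmd.Parameters.AddWithValue("@FromDate", fromDate.Value);

        var table = new DataTable();
        using (var adapter = new SqlDataAdapter(cmd))
            adapter.Fill(table);
        return table;
    }
}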
A few years ago I had a similar problem when migrating an app from SQL2000 to SQL2008.
I added OPTION (RECOMPILE) to the end of all the stored procs in the database that was having problems. In my case it had to do with parameters that were very different between calls to the stored proc. Forcing the proc to recompile will force SQL to come up with a new execution plan instead of trying to use a cached version that may be sub-optimal for the new params.
And in case you haven't done it already, check your indexes. Nothing can kill db performance like lack of a badly needed index. Here is a good link (http://sqlfool.com/2009/04/a-look-at-missing-indexes/) on a query that will display missing indexes.
Super-super late suggestion, but it might come in handy for others. A typical issue I've seen, particularly applicable to Java clients, is the following:
You have a query which takes a string as a parameter. That string is a search criterion on a varchar(N) column in the database. However, you submit the string parameter in the query as Unicode (nvarchar(N)). This results in a full-table scan and a conversion of every single field value to Unicode for proper comparison, to avoid potential data loss (if SQL Server converted the input param to non-Unicode, it might lose information).
Simple test: run the query twice (for the sake of simplicity, I'm assuming it's an SP):
exec spWhatever 'input'
exec spWhatever N'input'
See how they behave. Also, you may want to take a look at the Recent Expensive Queries section on the Activity Monitor in SSMS and ask for the execution plan, to clarify the situation.
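From the calling side, the same pitfall shows up in ADO.NET when parameters are added with AddWithValue, which sends .NET strings as nvarchar. A small sketch, with table and column names made up:

using System.Data;
using System.Data.SqlClient;

static string LookupWidget(string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("SELECT Name FROM dbo.Widgets WHERE Code = @code", conn))
    {
        // problematic: AddWithValue sends the string as nvarchar, forcing the
        // per-row conversion described above
        // cmd.Parameters.AddWithValue("@code", "input");

        // better: an explicitly typed varchar parameter matches the column,
        // so the index can be used
        cmd.Parameters.Add("@code", SqlDbType.VarChar, 20).Value = "input";

        conn.Open();
        return (string)cmd.ExecuteScalar();
    }
}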
Cheers,
Erik

LINQ to SQL Stored Procedure. Exception says it times out but it is not timing out

Website using .NET Framework v3.5, SQL Server 2008, written in C#
I have a stored procedure which I have added to my DBML by dragging it across from the server explorer.
In its properties, it returns an auto-generated type.
The procedure takes < 1 second to run from within SQL Mgmt Studio for all inputs.
However, from the code, for one particular input (which takes < 1 second in Mgmt Studio), it hangs and then throws:
System.Data.SqlClient.SqlException: Timeout expired.
This didn't always happen for this one input! It used to work fine when called from the code too. The last time it failed, I deleted and re-added the same stored procedure to the DBML. This "fixed" it, and that input ran fine, in the same time as all the others. However, this is not an adequate fix! It has happened again, and I can't keep deleting and re-adding as required.
I made no changes to the data that's being returned during the point at which it was "fixed", so I can't think what the problem could be. Any help on this would be much appreciated!
Exception says it times out but it is not timing out
If it says it's timing out, it's timing out. The only question is "why"?
Run a SQL Server Profiler trace against your database and see what query is actually going to the server. It's possible that another query is being issued too. It's possible there is another transaction interfering in your production scenario.
It turns out that this is parameter sniffing - this is explained in another post: Executing stored proc from DotNet takes very long but in SSMS it is immediate
Also, be sure that the stored procedure is not being held up inside of a transaction, waiting for another process to complete. I just ran across this with a Linq to Sql stored procedure being called multiple times within a transaction. It gave me a timeout expired error and I just realized it was waiting for a previous call to complete, and thus timing out.
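One cheap way to tell blocking from a genuinely slow query while diagnosing: LINQ to SQL's DataContext exposes a CommandTimeout property (default 30 seconds), so you can give the call more time and see whether it eventually completes. A rough sketch, where MyDataContext and MyStoredProc are placeholders for the generated context and procedure:

using (var db = new MyDataContext(connectionString))
{
    db.CommandTimeout = 120; // default is 30 seconds
    var rows = db.MyStoredProc(input).ToList();
    // if it now completes after a long wait, the call is blocked or slow
    // rather than broken; Profiler and the parameter-sniffing link above
    // should tell you which
}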

sporadic ASP.NET data error: "Cannot find table 0"

Having deployed a new build of an ASP.NET site in a production environment, I am logging dozens of data errors every second, almost always with the error "Cannot find table 0." We use datasets and frequently refer to Table[0], and while I understand the defensive coding practice of checking the dataset for tables before accessing Table[0], it's never been a problem in the past. A certain page will load fine one second, and then be missing one of its data-driven components the next. Just seeing if this rings a bell for anyone.
More detail: I used a different build server this time, and while I imagine the compiler settings are the same on both, I have a hard time believing there's a switch that makes 50% of my database calls come back with no tables. I also switched the project to VS 2008, but then reverted all of those changes when I switched back to VS 2005. I notice that the built assembly has a new MyLibrary.XmlSerializers.dll where it didn't use to, but I also can't imagine that's causing all the trouble. (It also doesn't fall down on calls to MyLibrary, or at least no more than at any other time.)
Updated to add: I've discovered that the troublesome build is a "Release" build, where the working build was compiled as "Debug". Could that explain it?
Rolling back to the build before these changes fixed it. (Rebooting the SQL Server, the step we tried before that, did not.)
The trouble also seems to be load-based - this cruised through our integration and QA environments without a problem, and even our smoke test environment - the one that points to production data - is fine under light load.
Does this have the distinguishing characteristics of anything you might have seen in the past?
Bumping this old question because we have encountered the same issue and perhaps our solution would give more insight in what causes this.
Essentially, this problem occurs in a production environment under very heavy load, in a Windows service that uses multiple threads to process several jobs simultaneously (100 users use the same DB via an ASP.NET web app, and there are about 60 transactions/second on older hardware with SQL Server 2000).
No variables are shared; that is, connections are opened anew, a transaction is started, operations are executed, the transaction is committed, and the connection is closed.
Under heavy load sometimes one of the following exceptions occurs:
NullReferenceException: Object reference not set to an instance of an object.
at System.Data.SqlClient.SqlInternalConnectionTds.get_IsLockedForBulkCopy()
or
System.Data.SqlClient.SqlException:
The server failed to resume the transaction. Desc:3400000178
or
New request is not allowed to start because it should come with valid transaction descriptor
or
This SqlTransaction has completed; it is no longer usable
It seems that somehow a connection in the pool becomes corrupted and remains associated with previously used transactions. Furthermore, if such a connection is retrieved from the pool, then sqlAdapter.Fill(dataset) results in an empty dataset, causing "Cannot find table 0". Because our service would retry the operation (reading the job list) on failure, and would always get the same corrupt connection from the pool, it would fail with this error until restarted.
We removed the issue by using SqlConnection.ClearPool(connection) on exception, to make sure such a connection is discarded from the pool, and by restructuring the application so that fewer threads access the same resources simultaneously.
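A minimal sketch of the first half of that fix, with the connection string and proc name as placeholders: on a suspect SqlException, evict the connection from the pool so a retry gets a fresh physical connection:

using System.Data;
using System.Data.SqlClient;

static DataSet ReadJobList(string connectionString)
{
    var conn = new SqlConnection(connectionString);
    try
    {
        conn.Open();
        using (var cmd = new SqlCommand("dbo.usp_GetJobList", conn) { CommandType = CommandType.StoredProcedure })
        using (var adapter = new SqlDataAdapter(cmd))
        {
            var ds = new DataSet();
            adapter.Fill(ds);
            return ds; // an empty dataset here is the "Cannot find table 0" symptom
        }
    }
    catch (SqlException)
    {
        // the pooled connection may be the corrupted one: discard it so the
        // retry gets a fresh physical connection instead of the same bad one
        SqlConnection.ClearPool(conn);
        throw;
    }
    finally
    {
        conn.Dispose();
    }
}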
I have no clue what exactly caused this issue, so I am not sure we have really fixed it; maybe we just made it so rare that it hasn't occurred again yet.
I've fought precisely this error message before. The key is that an underlying data method is swallowing a timeout exception.
You're probably doing something like this:
var table = GetEmployeeDataSet().Tables[0];
GetEmployeeDataSet is swallowing an exception, probably a timeout exception, which is why it only happens sporadically - it happens under load. You need to do the following to fix it:
Modify the underlying code to not swallow the exception, but rather let it bubble up to the next level so you can identify it properly (a sketch of the typical shape of this bug follows below).
Identify the query(s) causing the problem, and then rewrite, reindex, denormalize or throw hardware at the problem. See this for more info: System.Data.SqlClient.SqlException: Timeout expired
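For the first step, the swallowed exception typically looks something like this sketch (names are invented; the empty catch is the culprit):

using System.Data;
using System.Data.SqlClient;

// the bug: a swallowed exception turns a timeout into an empty DataSet, which
// later surfaces as "Cannot find table 0" at the Tables[0] access
static DataSet GetEmployeeDataSet(string connectionString)
{
    var ds = new DataSet();
    try
    {
        using (var conn = new SqlConnection(connectionString))
        using (var adapter = new SqlDataAdapter("dbo.usp_GetEmployees", conn))
        {
            adapter.SelectCommand.CommandType = CommandType.StoredProcedure;
            adapter.Fill(ds);
        }
    }
    catch (SqlException)
    {
        // swallowed! under load this hides the timeout and returns an empty set;
        // the fix is to log and rethrow (or remove the catch entirely)
    }
    return ds;
}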
I've seen something similar. I believe our problem had to do with failed sessions being re-used (once the session object failed, it went into a bad state and could not recover). We fixed it by increasing the memory for the session pool and increasing the frequency of web application recycling.
It was also "caused" by a new version that, at first blush, did not seem to contain any change that could have such an effect. Eventually, however, it became clear that the logic of the program was opening and closing a lot more connections (maybe 20% more) than it used to. This small change pushed the limit of our prior configuration.
You might check the SQL Server logs for errors, or the web server's event log. It sounds like your connection pool could be out of available connections, or your database could be.
Which database calls changed between versions?
The error is obviously telling you one of your database calls isn't returning any data on occasion; I can't think of any cases where a code/assembly issue would cause it.
I have seen something like this when doing something with nHibernate Sessions in a non-thread-safe manner. That would explain why you only see it under load. Would need to see your code to guess at what isn't thread-safe though.

Distributed transaction completed. Either enlist this session in a new transaction or the NULL transaction

Just curious if anyone else has gotten this particular error and knows how to solve it?
The scenario is as follows...
We have an ASP.NET web application using Enterprise Library running on Windows Server 2008 IIS farm connecting to a SQL Server 2008 cluster back end.
MSDTC is turned on. DB connections are pooled.
My suspicion is that somewhere along the line there is a failed MSDTC transaction, the connection got returned to the pool, and the next query on a different page picked up the misbehaving connection and got this particular error. The funny thing is, we got this error on a query that has no need whatsoever for a distributed transaction (committing to two databases, etc.). We were only doing a select query (no transaction) when we got the error.
We did SQL profiling, and the query ran on the SQL Server but never came back (since the MSDTC transaction had already been aborted on the connection).
Some other related errors to accompany this are:
New request is not allowed to start because it should come with valid transaction descriptor.
Internal .Net Framework Data Provider error 60.
MSDTC has a default 90-second timeout; if one query's execution exceeds this limit, you will encounter this error when the transaction tries to commit.
A bounty may help get the answer you seek, but you're probably going to get better answers if you give some code samples and give a better description of when the error occurs.
Does the error only occur intermittently? It sounds like it from your description.
Are you enclosing the code that you want treated as a transaction in a using TransactionScope block, as Microsoft recommends? This should help avoid weird transaction behavior. Recall that a using block makes sure the object is always disposed, regardless of exceptions thrown. See here: http://msdn.microsoft.com/en-us/library/ms172152.aspx
If you're using TransactionScope, there is an option, TransactionScopeOption.RequiresNew (in System.Transactions), that tells the framework to always create a new transaction for this block of code:
Using ts As New Transactions.TransactionScope(Transactions.TransactionScopeOption.RequiresNew)
' Do Stuff
End Using
Also, if you're suspicious that a connection is getting faulted and then put back into the connection pool, the likely solution is to enclose the code that may fault the connection in a Try-Catch block and Dispose the connection in the catch block.
Old question ... but ran into this issue past few days.
Could not find a good answer until now. Just wanted to share what I found out.
My scenario involved multiple sessions being opened by multiple session factories. I had to roll back correctly, and then wait to make sure the other transactions were no longer active, because rolling back just one of them seemed to roll back everything.
After adding a Thread.Sleep() between the rollbacks, each rollback completes without disturbing the others, and subsequent hits that trigger the method no longer result in the "New request is not allowed to start because it should come with valid transaction descriptor." error.
https://gist.github.com/josephvano/5766488
I have seen this before and the cause was exactly what you thought. As Rice suggested, make sure that you are correctly disposing of the db related objects to avoid this problem.
