Database transaction timeout problem - asp.net

I have been experiencing an error that I believe is caused by the database timing out while a large amount of data is being processed and written to the database.
I keep getting this error message:
Distributed transaction completed. Either enlist this session in a new transaction or the NULL transaction.
I timed how long it takes to time out and it is consistently around 60 seconds. Hence, I thought it might have something to do with the transaction timeout limit (default 60 seconds) set in Component Services (Windows XP), so I increased it to 300 seconds.
When that didn't work, I edited the machine.config file by adding:
<system.transactions>
<machineSettings maxTimeout="02:00:00" />
</system.transactions>
This did not work either.
I don't believe it has anything to do with my data. It is read from an Excel spreadsheet, and it runs fine when I cut the spreadsheet into two separate files.
Hopefully, I'm just missing something simple like another max timeout setting somewhere.
Hopefully, somebody has run into this before!
EDIT: I am using SQL Server and Linq2SQL.

Andrew, I don't think this is caused by a timeout. Otherwise you would receive a specific timeout error. More than likely this is a programming error. I've encountered this myself, and almost always it was my poorly crafted code causing the issue.
I think you have another issue. It's not a good idea to have transactions running this long. If your company has DBAs, they will likely, for good reason, throw fits over this. You're locking a lot of resources for a long period of time, and something is going to suffer for it.
BTW, if you're concerned about timeouts, check the timeout setting on your connection string.
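For reference, that setting lives in the connection string itself; here is a minimal sketch (the server, database and connection name are placeholders). Keep in mind that Connect Timeout only governs how long establishing the connection may take; how long a command may run is controlled separately (e.g. by CommandTimeout on the command or data context).
<connectionStrings>
  <add name="MyDb"
       connectionString="Data Source=MyServer;Initial Catalog=MyDatabase;Integrated Security=True;Connect Timeout=60"
       providerName="System.Data.SqlClient" />
</connectionStrings>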
Randy

I'm not sure why, but I fixed the problem by going into web.config and adding:
<system.transactions>
<defaultSettings timeout="02:00:00"/>
</system.transactions>
I thought that this setting would be inherited from machine.config. Perhaps they are two different timeout settings? I don't know.
If anyone has additional clarification, please comment!
EDIT 1: Also, if anyone is using ASP.NET Ajax controls, be sure to increase the script manager's AsyncPostBackTimeout property to accommodate a longer period as well.
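For a two-hour window, the markup would look something like this (the control ID is just an example); note that AsyncPostBackTimeout is specified in seconds:
<asp:ScriptManager ID="ScriptManager1" runat="server" AsyncPostBackTimeout="7200" />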
EDIT 2: I removed the lines I added to the machine.config and reset the distributed transaction timeout setting to its default. This appeared to have no effect, and my program ran fine with just the changes to the web.config file and the script manager.

To change the timeout, you can set the CommandTimeout property in your data context:
var db = new YourDataContext();
db.CommandTimeout = 300;
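// Note: CommandTimeout is in seconds, so 300 gives the query five minutes (the default is 30 seconds).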
Having said that, any time you have a distributed transaction it's worth taking a careful look at why, and trying to avoid it if at all possible -- your issue may well be related to that rather than to a timeout.

Related

Azure SqlException: Database on server is not currently available

Our site had been running for a few weeks in Azure without ever getting this error:
SqlException: Database 'database' on server 'server' is not currently available. Please retry the connection later. If the problem persists, contact customer support, and provide them the session tracing ID of 'guid'.
We finally got it one day when there were a little over 2,000 active (concurrent) users. This is the closest question I could find on SO, though we are using Dapper rather than EF. I'm out of ideas for how to debug our application to find out what caused the issue, and it's even harder now that the issue hasn't come up for the past two days. I definitely need to be on the lookout, so any tip on where I should be looking, what I need to do to determine the cause of the issue, and how I might fix it, would be appreciated.
It sounds like you need to handle transient failures via some sort of transient fault handling mechanism. Here is a post asking a similar question:
SQL Azure Database retry logic. David's answer there is similar to the approach we took to deal with the issue.
Here is another link to some code that is similar to David's and to our solution, to help you get your head around it: http://www.getcodesamples.com/src/4A7E4E66/41D6FAD
We had similar issues when we first moved to SQL Azure, but after implementing back-off retry logic for the transient connection issues, the majority of the time it recovers after a few seconds.
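As a rough sketch of what that back-off retry can look like (this is not our production code; the helper name, retry count and list of error numbers are illustrative, so check the current Azure documentation for which errors you treat as transient):
using System;
using System.Data.SqlClient;
using System.Linq;
using System.Threading;

static class Transient
{
    // Error numbers commonly treated as transient on SQL Azure; 40613 is the
    // "database is not currently available" error from the question.
    static readonly int[] TransientErrors = { 4060, 10928, 10929, 40197, 40501, 40613 };

    public static T Execute<T>(Func<T> operation, int maxRetries = 3)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (SqlException ex) when (attempt < maxRetries && TransientErrors.Contains(ex.Number))
            {
                // Exponential back-off: wait 2, 4, 8... seconds before retrying.
                Thread.Sleep(TimeSpan.FromSeconds(Math.Pow(2, attempt + 1)));
            }
        }
    }
}
A call that currently goes straight to the database is then wrapped, e.g. var rows = Transient.Execute(() => connection.Query<MyRow>(sql).ToList()); if you're using Dapper.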
We went down the path of handling transient errors with the Azure Transient Fault Block, but this caused bigger issues - namely, if you reach the SQL connection limit (easy to do), having retry logic in place only makes things worse.
If it only happens once a month, I'd leave it be, and just handle it gracefully higher up the stack. An alternative is to create a custom retry policy to avoid retrying on certain errors, but it may still do more harm than good.

Mgo-based app code structure dealing with connection pool and tcp timeouts

I'm curious how I should structure a JSON REST API server in Go with the mgo library. I have dozens of collections related to each other. I've created a gist with a sample part of the file structure in my current approach.
It works great, but from time to time I encounter downtime caused by this error: "read tcp 10.168.30.100:37288: i/o timeout". I suspect that I am handling the mgo connection pool inappropriately. Are there any examples showing how big applications based on mgo should be structured?
This error message implies a roundtrip to the database took longer than the timeout period you defined. Just increasing that timeout should get rid of the problem, assuming you don't have any real issues that are causing the application to behave in a sluggish manner.
In general, this error doesn't imply you have any kind of scale issues, other than the fact maybe you have an increasing amount of data in some collections and certain queries may be getting too slow and need re-thinking (indexes, etc).
There's also no need to restart the application. You can either Refresh the problematic session, or Close and re-create the session in case you're using copies of a master session. The state of mgo and the pool of connections is still fine. It's just warning you that this specific session observed an issue on the wire, and so you have to acknowledge it before the session will be valid again.
As usual, also make sure to be using the latest release to avoid problems that have already been fixed, if any.

IIS Worker process hangs forever on first request

I am working on solving a problem that I have had for a couple of days now. Every time one of my sites is rebuilt or the AppPool is recycled, the first page load will hang forever (well, I've only waited up to 30 minutes). It is only happening on one particular site out of ~10 sites. It is an ASP.NET site.
Here are the things I have observed:
In IIS Manager, under Worker Processes, I can see the request. Verb = GET, State = ExecuteRequestHandler, Module Name = ManagedPipelineHandler. Time Elapsed just keeps increasing, of course.
If I close down the browser in which I made the initial request and then open a new one to make another request, the page will load instantly.
In my code the Application_Start of my Global.asax file is not called on the first request. It is called on the second request.
The worker process is causing the memory usage on my machine to go through the roof.
I'm inexperienced in troubleshooting IIS, but hours and hours of searching has led me nowhere.
The only major code change we have made on the site recently is that we have started implementing logging using log4net. I have, though, tried removing all the log4net code, both from my web.config file and from Global.asax - still no luck.
Has anyone else experienced this and if so how did you solve it?
Any and all help will be much appreciated.
ADD:
If I place a .txt file in the root of the site and load that as the first thing after a build, it will load instantly.
However, the worker process still acts exactly as before and the memory usage still goes through the roof.
Final edit:
I feel like such an idiot. I can't explain why, but for some reason my breakpoints in Global.asax suddenly got hit and I was able to identify the problem. It was a call to the database via Entity Framework that was badly written - i.e. the filtering was done after all the rows from the table in question had been fetched. And to make it worse, the filtering was done inside a foreach loop. Anyway, now everything is back to normal and I'm happy.
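To illustrate the kind of mistake I mean (the entity and property names here are made up, not the actual code): the first form pulls the entire table across the wire and filters it in memory on every loop iteration, while the second keeps the filter in the query so SQL Server does the work once.
// Bad: ToList() materializes the whole table before Where runs, inside a loop.
foreach (var customerId in customerIds)
{
    var orders = db.Orders.ToList().Where(o => o.CustomerId == customerId).ToList();
    // ... process orders ...
}

// Better: let the filter execute on the server.
var allOrders = db.Orders.Where(o => customerIds.Contains(o.CustomerId)).ToList();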
Possibly stating the obvious, but you haven't got any silly code in the Application_Start of your Global.asax that could be causing this?
Sounds like an infinite loop or something?
Just a quick note on what happened in my case:
Neither Process Monitor nor Failed Request Tracing was of any help. The website simply loaded (nearly) forever.
Finally, after waiting for several minutes, an error occurred stating that it "cannot locate the network path".
The reason was that I had entered a connection string pointing to a non-existent SQL Server instance, so it kept searching for the server until a timeout finally occurred.
The solution was to simply specify the correct SQL Server in the connection string inside Web.Config.

sporadic ASP.NET data error: "Cannot find table 0"

Having deployed a new build of an ASP.NET site in a production environment, I am logging dozens of data errors every second, almost always with the error "Cannot find table 0." We use datasets and frequently refer to Table[0], and while I understand the defensive coding practice of checking the dataset for tables before accessing Table[0], it's never been a problem in the past. A certain page will load fine one second, and then be missing one of its data-driven components the next. Just seeing if this rings a bell for anyone.
More detail: I used a different build server this time, and while I imagine the compiler settings are the same on both, I have a hard time thinking that there's a switch that makes 50% of my database calls come back with no tables. I also switched the project to VS 2008, but then reverted all of those changes when I switched back to VS 2005. I notice that the built assembly has a new MyLibrary.XmlSerializers.dll, where it didn't use to, but I also can't imagine that that's causing all the trouble. (It also doesn't fall down on calls to MyLibrary, or at least no more than at any other time.)
Updated to add: I've discovered that the troublesome build is a "Release" build, where the working build was compiled as "Debug". Could that explain it?
Rolling back to the build before these changes fixed it. (Rebooting the SQL Server, the step we tried before that, did not.)
The trouble also seems to be load-based - this cruised through our integration and QA environments without a problem, and even our smoke test environment - the one that points to production data - is fine under light load.
Does this have the distinguishing characteristics of anything you might have seen in the past?
Bumping this old question because we encountered the same issue, and perhaps our solution will give more insight into what causes this.
Essentially this problem occurs in a production environment under very heavy load, in a Windows service that uses multiple threads to process several jobs simultaneously (100 users use the same DB via an ASP.NET web app, and there are about 60 transactions/second on older hardware with SQL Server 2000).
No variables are shared; that is, a connection is opened anew, a transaction is started, the operations are executed, the transaction is committed, and the connection is closed.
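Roughly, each job does something like this (a sketch only; the SQL and variable names are placeholders):
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var transaction = connection.BeginTransaction())
    using (var adapter = new SqlDataAdapter("SELECT ... FROM Jobs", connection))
    {
        adapter.SelectCommand.Transaction = transaction;
        var dataset = new DataSet();
        adapter.Fill(dataset);   // "Cannot find table 0" means this came back empty
        // ... execute the job's commands under the same transaction ...
        transaction.Commit();
    }
}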
Under heavy load, sometimes one of the following exceptions occurs:
NullReferenceException: Object reference not set to an instance of an object.
at System.Data.SqlClient.SqlInternalConnectionTds.get_IsLockedForBulkCopy()
or
System.Data.SqlClient.SqlException: The server failed to resume the transaction. Desc:3400000178
or
New request is not allowed to start because it should come with valid transaction descriptor
or
This SqlTransaction has completed; it is no longer usable
It seems that somehow a connection in the pool becomes corrupted and remains associated with previously used transactions. Furthermore, if such a connection is retrieved from the pool, then sqlAdapter.Fill(dataset) results in an empty dataset, causing "Cannot find table 0". Because our service would retry the operation (reading the job list) on failure, and it would always get the same corrupt connection from the pool, it would fail with this error until restarted.
We removed the issue by calling SqlConnection.ClearPool(connection) on exception, to make sure that connection is discarded from the pool, and by restructuring the application so that fewer threads access the same resources simultaneously.
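In outline, the workaround looks like this (variable names are illustrative):
try
{
    sqlAdapter.Fill(dataset);
}
catch (SqlException)
{
    // Discard the suspect connection so the pool cannot hand it back to the next caller.
    SqlConnection.ClearPool(connection);
    throw;
}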
I have no clue what exactly caused this issue, so I am not sure we have really fixed it; maybe we have just made it so rare that it hasn't occurred again yet.
I've fought precisely this error message before. The key is that an underlying data method is swallowing a timeout exception.
You're probably doing something like this:
var table = GetEmployeeDataSet().Tables[0];
GetEmployeeDataSet is swallowing an exception, probably a timeout exception, which is why it only happens sporadically - it happens under load. You need to do the following to fix it:
Modify the underlying code so that it does not swallow the exception, but lets it bubble up to the next level so you can identify it properly.
Identify the query (or queries) causing the problem, and then rewrite, reindex, denormalize or throw hardware at the problem. See this for more info: System.Data.SqlClient.SqlException: Timeout expired
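In other words, the swallowing typically looks something like this (the method name comes from the example above; the body is a guess at the usual shape of such code, with adapter standing in for a SqlDataAdapter built elsewhere):
DataSet GetEmployeeDataSet()
{
    var ds = new DataSet();
    try
    {
        adapter.Fill(ds);   // times out under load
    }
    catch (Exception)
    {
        // Swallowed: the caller gets an empty DataSet and Tables[0] blows up instead.
    }
    return ds;
}
Removing the catch (or logging and rethrowing) lets the real timeout surface where it actually happens.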
I've seen something similar. I believe our problem had to do with failed sessions being re-used (once the session object failed it went into a poor state and could not recover.) We fixed it by increasing the memory for the session pool and increasing the frequency of the web application recycling.
It also was "caused" by a new version that at first blush did not seem to have any change to cause such an effect. However, eventually it became clear that the logic of the program was opening and closing a lot more connections (maybe 20% more) than it used to. This small change pushed the limit of our prior configuration.
You might check the SQL Server logs for errors, or the web server's event log. It sounds like your connection pool could be out of open connections, or your database could be.
Which database calls changed between versions?
The error is obviously telling you one of your database calls isn't returning any data on occasion; I can't think of any cases where a code/assembly issue would cause it.
I have seen something like this when doing something with nHibernate Sessions in a non-thread-safe manner. That would explain why you only see it under load. Would need to see your code to guess at what isn't thread-safe though.

SQL Server requests time out as TempGetStateItemExclusive is getting called continuously

I run a site with decent traffic (~100,000 page views per day) and sporadically the site has been brought to its knees due to SQL Server timeout errors.
When I run SQL Profiler, I see a command getting called hundreds of times a second like this:
...
exec dbo.TempGetStateItemExclusive3 @id=N'ilooyuja4bnzodienj3idpni4ed2081b',...
...
We use SQL Server to store ASP.NET session state. The above is the stored procedure called to grab the session state for a given session. It seems to be looping, asking for the same 2 or 3 sessions over and over.
I found a promising-looking hotfix that seems to address this exact situation, but it doesn't seem to have solved the problem for us. (I'm assuming this hotfix is included in the most recent .NET service pack, because it doesn't look like you can install it directly anymore.) I added that registry key manually, but we still see the looping stored procedure calls shown above (requesting the same session much more often than every 500 ms).
I haven't been able to recreate this on a development machine. When two requests are made for the same session ID, it seems to block correctly, and doesn't even hit SQL until the first page releases the session.
Any ideas? Thank you in advance!!!
This may be one of those cases where I needed an answer to a different question. The question should have been "Why am I using SQL to store session state information?" SQL is much slower, and much more disconnected from the web server, both of which may have contributed to this problem. I looked up the size of our ASPStateTempSessions table and realized it was only about 1MB. We moved back to <sessionState mode="InProc" ... /> and the problem is fixed (And the site runs faster)
The next step, when traffic dictates, would be to add another server and use the "StateServer" mode so we can spread out the memory usage.
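If we do go that route, the change is again just the sessionState element; something like this (the state server address is a placeholder, and 42424 is the default port):
<sessionState mode="StateServer" stateConnectionString="tcpip=stateserver:42424" timeout="20" />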
I think I originally made this move to deal with a memory bottleneck which is no longer an issue. (This is not a good way of dealing with a memory bottleneck, FYI!)
IMPORTANT EDIT: Ok, so it turns out that the whole "TempGetStateItemExclusive" thing was not the problem, it was just a symptom of another problem. We had some queries that were causing blocking issues, so every SQL request would just get kicked out. The actual fix was to identify and fix the blocking issues. (I still believe that "InProc" is the way to go, though) This link helped a lot to identify our issues:
http://www.simple-talk.com/sql/sql-tools/how-to-identify-blocking-problems-with-sql-profiler/
It's been some time, but isn't there a cleanup job that runs to remove stale sessions? Is it enabled?
This old KB article mentions it. Like I said, it's been a while.
Just out of curiosity, have you opened up that proc to see what it does?
If it's just making a SELECT statement, you might look to see whether it is using NOLOCK or not. If not, add NOLOCK to it and see what happens.
