Azure SqlException: Database on server is not currently available - asp.net

Our site has been running for a few weeks in Azure without getting this error:
SqlException: Database 'database' on server 'server' is not currently
available. Please retry the connection later. If the problem
persists, contact customer support, and provide them the session
tracing ID of 'guid'.
We finally hit it one day when there were a little over 2K active (concurrent) users. This is the closest question that I can find on SO. We are not using EF, though; we're using Dapper. I'm out of ideas on how to debug our application to find out what caused the issue, and it's even harder now that the issue has not come up for the past 2 days. I definitely need to be on the lookout, and I'd appreciate any tips on where I should be looking, what I need to do to determine the cause of the issue, and how to possibly fix it.

It sounds like you need to handle transient failures via some sort of transient fault handling mechanism. Here is a post asking a similar question:
SQL Azure Database retry logic. David's answer is similar to the approach we took to deal with the issue.
Here is another link to some code that is similar to David's and our solution, to help you get your head around it: http://www.getcodesamples.com/src/4A7E4E66/41D6FAD
We had similar issues when we first moved to SQL Azure, but since we implemented back-off retry logic for the transient connection issues, the majority of the time it recovers after a few seconds.

We went down the path of handling transient errors with the Transient Fault Handling Application Block, but this caused bigger issues: if you reach the SQL connection limit (easy to do), having retry logic in place only makes things worse.
If it only happens once a month, I'd leave it be, and just handle it gracefully higher up the stack. An alternative is to create a custom retry policy to avoid retrying on certain errors, but it may still do more harm than good.
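To make the trade-off between the two answers above concrete, here is a minimal, language-neutral sketch in Python (the thread itself is .NET). It retries with exponential back-off only for errors classified as transient and lets everything else, such as a connection-limit error, fail immediately. The exception classes and the `retry_with_backoff` helper are hypothetical names invented for this illustration; they are not part of Dapper, ADO.NET, or any Azure library.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a failure worth retrying,
    e.g. 'database is not currently available'."""


class ConnectionLimitError(Exception):
    """Hypothetical stand-in for a failure that retrying would
    aggravate, e.g. the connection limit has been reached."""


def retry_with_backoff(operation, max_attempts=4, base_delay=0.5,
                       max_delay=8.0, sleep=time.sleep):
    """Run operation(), retrying only TransientError with
    exponential back-off plus random jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller handle it gracefully
            # 0.5s, 1s, 2s, ... capped at max_delay
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # jitter keeps many clients from retrying in lockstep
            sleep(delay + random.uniform(0, delay / 2))
        # Any other exception (including ConnectionLimitError) is not
        # caught here, so it propagates at once without a retry.
```

In a real .NET application the classification would have to be based on the actual error the driver reports; which errors are safe to retry is an assumption you need to verify for your own workload.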

Related

ORA-22337: the type of accessed object has been evolved - in application

Setting: ASP.Net application with Oracle backend, we utilize User Defined Types (UDTs) and use ODP.Net to communicate them between the front and back-ends.
Problem: I had to alter the attribute length of one of my UDTs. Once I did that and tested in the backend it worked fine, but when I run my site I keep getting the ORA-22337 error (in the subject line)!
You will not find much if you research this problem online; other than the useless Oracle error documentation, you will not find anything helpful. The Oracle documentation says to close and re-open the connection, but that does not apply to my scenario.
I already solved the problem by dropping and recreating the UDTs and NTs, but this is inefficient to have to do every time I need to modify one of my core UDTs, any ideas how to solve this without dropping and recreating everything?
If the error info says "Close and reopen the connection" as the solution and you are using an OracleConnection which has a connection pool behind it, then simply Close()ing the connection is not good enough. It will just go back to the pool still open, and when you "reconnect" you will just get it back again. You'll need to Close all open connections and then call ClearPool() to make sure that all old connections in the pool are removed.

Mgo-based app code structure dealing with connection pool and tcp timeouts

I'm curious how I should structure a JSON REST API server in Go with the Mgo library. I have dozens of collections related to each other. I've created a gist with a sample of the file structure in my current approach.
It works great, but from time to time I encounter downtime caused by this error: "read tcp 10.168.30.100:37288: i/o timeout". I suppose that I handle the mgo connection pool inappropriately. Are there any examples showing how to build big applications based on mgo?
This error message implies a roundtrip to the database took longer than the timeout period you defined. Just increasing that timeout should get rid of the problem, assuming you don't have any real issues that are causing the application to behave in a sluggish manner.
In general, this error doesn't imply you have any kind of scale issues, other than the fact maybe you have an increasing amount of data in some collections and certain queries may be getting too slow and need re-thinking (indexes, etc).
There's also no need to restart the application. You can either Refresh the problematic session, or Close and re-create the session in case you're using copies of a master session. The state of mgo and the pool of connections is still fine. It's just warning you that this specific session observed an issue on the wire, and so you have to acknowledge it before the session will be valid again.
As usual, also make sure to be using the latest release to avoid problems that have already been fixed, if any.

T-Sql Error Handling and Logging

I am trying to maximize the benefits from an experience.
Also, I usually use the Enterprise Library Logging Block. I log errors and some statistical information into the database, because it is a centralized place to track errors; if database logging fails, it normally falls back to the Event Log.
Tracing messages should go into a file.
Which choice do you believe we should go with?
1- Only some tracing messages can be left in code, if there is a complex algorithm or unstable module.
OR
2- We should not keep any tracing messages in code, and clean them up as soon as the bug is resolved.
For the database:
I think that errors raised from SPs and functions should be logged into another table in the database, and that is exactly what is done by the AdventureWorksLT2008 database.
Is it a bad idea to log database events directly to the Enterprise Library log table without raising these errors to the next tier? I think it is more flexible, because I can put more custom information in the message. Of course, some errors will not be handled and will reach the next tier.
Any ideas, or comments, something else you do. something you want to clarify.
Thanks
Are you talking about catching errors and logging directly in T-SQL and not then doing RAISERROR to get it to the caller?
I think that's a viable strategy for certain kinds of issues - for instance, if an SP wants to find a problem and correct it silently and simply issue a warning.
But the kind of issues it would apply to might not be terribly frequent.
The kind of things I would think about are things like unusual cases where unexpected UPDATEs are done instead of INSERTs? Or where data already exists so is not generated. Or in a deployment or build script which skips an existing table, etc.
What if your database has performance issues and SP/functions start timing out - logging the error to the database may not work?

sporadic ASP.NET data error: "Cannot find table 0"

Having deployed a new build of an ASP.NET site in a production environment, I am logging dozens of data errors every second, almost always with the error "Cannot find table 0." We use datasets and frequently refer to Table[0], and while I understand the defensive coding practice of checking the dataset for tables before accessing Table[0], it's never been a problem in the past. A certain page will load fine one second, and then be missing one of its data-driven components the next. Just seeing if this rings a bell for anyone.
More detail: I used a different build server this time, and while I imagine the compiler settings are the same on both, I have a hard time thinking that there's a switch that makes 50% of my database calls come back with no tables. I also switched the project to VS 2008, but then reverted all of those changes when I switched back to VS 2005. I notice that the built assembly has a new MyLibrary.XmlSerializers.dll, where it didn't used to, but I also can't imagine that that's causing all the trouble. (It also doesn't fall down on calls to MyLibrary, or at least no more than any other time.)
Updated to add: I've discovered that the troublesome build is a "Release" build, where the working build was compiled as "Debug". Could that explain it?
Rolling back to the build before these changes fixed it. (Rebooting the SQL Server, the step we tried before that, did not.)
The trouble also seems to be load-based - this cruised through our integration and QA environments without a problem, and even our smoke test environment - the one that points to production data - is fine under light load.
Does this have the distinguishing characteristics of anything you might have seen in the past?
Bumping this old question because we have encountered the same issue and perhaps our solution would give more insight in what causes this.
Essentially this problem occurs in a production environment that is under very heavy load in a Windows service that uses multiple threads to process several jobs simultaneously (100 users use the same DB via ASP.NET web app and there are about 60 transactions/second on older hardware with SQL Server 2000).
No variables are shared; that is, for each job a connection is opened anew, a transaction is started, operations are executed, the transaction is committed and the connection is closed.
Under heavy load sometimes one of the following exceptions occurs:
NullReferenceException: Object reference not set to an instance of an
object.
at System.Data.SqlClient.SqlInternalConnectionTds.get_IsLockedForBulkCopy()
or
System.Data.SqlClient.SqlException:
The server failed to resume the transaction. Desc:3400000178
or
New request is not allowed to start because it should come with valid transaction descriptor
or
This SqlTransaction has completed; it is no longer usable
It seems somehow the connection that is within the pool becomes corrupted and remains associated with previously used transactions. Furthermore, if such connection is retrieved from pool then sqlAdapter.Fill(dataset) results in an empty dataset, causing "Cannot find table 0". Because our service would retry the operation (reading job list) on failure and it would always get the same corrupt connection from the pool it would fail with this error until restarted.
We removed the issue by using SqlConnection.ClearPool(connection) on exception to make sure this connection is discarded from the pool, and by restructuring the application so fewer threads access the same resources simultaneously.
I have no clue what exactly caused this issue, so I am not sure we have really fixed it; maybe we just made it so rare that it has not occurred again yet.
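The reasoning in this answer (a retry keeps drawing the same broken connection from the pool unless that connection is evicted) can be illustrated with a toy pool. This is a language-neutral Python sketch, not a model of how ADO.NET pooling works internally; `Pool`, `run_with_eviction` and the `factory` callable are hypothetical names made up for the example.

```python
class Pool:
    """Toy LIFO connection pool. `factory` is a hypothetical
    callable that opens a brand-new connection."""

    def __init__(self, factory):
        self._factory = factory
        self._idle = []

    def acquire(self):
        # Reuse the most recently released connection if there is one.
        return self._idle.pop() if self._idle else self._factory()

    def release(self, conn):
        # A healthy connection goes back for reuse.
        self._idle.append(conn)

    def discard(self, conn):
        # A suspect connection is closed and never pooled again.
        conn.close()


def run_with_eviction(pool, work):
    """Run work(conn); on any failure, discard the connection
    instead of returning it to the pool, then re-raise."""
    conn = pool.acquire()
    try:
        result = work(conn)
    except Exception:
        pool.discard(conn)
        raise
    pool.release(conn)
    return result
```

Without the `discard` step, the failed connection would be released back to the idle list and handed out again on the very next `acquire`, which matches the "fails until restarted" behaviour described above.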
I've fought precisely this error message before. The key is that an underlying data method is swallowing a timeout exception.
You're probably doing something like this:
var table = GetEmployeeDataSet().Tables[0];
GetEmployeeDataSet is swallowing an exception, probably a timeout exception, which is why it only happens sporadically - it happens under load. You need to do the following to fix it:
Modify the underlying code to not swallow the exception, but rather let it bubble up to the next level so you can identify it properly.
Identify the query(s) causing the problem, and then rewrite, reindex, denormalize or throw hardware at the problem. See this for more info: System.Data.SqlClient.SqlException: Timeout expired
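As a language-neutral illustration of the first point, the Python sketch below contrasts a data method that swallows a timeout with one that lets it propagate. All names (`QueryTimeout`, `run_query`, `get_tables_swallowing`, `get_tables`) are hypothetical and exist only for this example.

```python
class QueryTimeout(Exception):
    """Hypothetical stand-in for a database timeout."""


def run_query(fail):
    """Hypothetical query helper: returns a list of result
    tables, or raises QueryTimeout when `fail` is true."""
    if fail:
        raise QueryTimeout("Timeout expired")
    return [["alice", "bob"]]


def get_tables_swallowing(fail):
    """Anti-pattern: hides the timeout and hands back an empty result."""
    try:
        return run_query(fail)
    except QueryTimeout:
        return []


def get_tables(fail):
    """Preferred: no handler here, so the timeout reaches the caller."""
    return run_query(fail)
```

With the swallowing version, `get_tables_swallowing(True)[0]` fails with an `IndexError`, the analogue of "Cannot find table 0", and the real cause is lost. With `get_tables(True)[0]` the caller sees `QueryTimeout` directly and can log the query that timed out.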
I've seen something similar. I believe our problem had to do with failed sessions being re-used (once the session object failed it went into a poor state and could not recover.) We fixed it by increasing the memory for the session pool and increasing the frequency of the web application recycling.
It also was "caused" by a new version that at first blush did not seem to have any change to cause such an effect. However, eventually it became clear that the logic of the program was opening and closing a lot more connections (maybe 20% more) than it used to. This small change pushed the limit of our prior configuration.
You might check the SQL Server logs for errors. Or, the Web server event log. It sounds like your connection pool could be out of open connections or your db could be out.
Which database calls changed between versions?
The error is obviously telling you one of your database calls isn't returning any data on occasion; I can't think of any cases where a code/assembly issue would cause it.
I have seen something like this when doing something with nHibernate Sessions in a non-thread-safe manner. That would explain why you only see it under load. Would need to see your code to guess at what isn't thread-safe though.

Distributed transaction completed. Either enlist this session in a new transaction or the NULL transaction

Just curious if anyone else has gotten this particular error and knows how to solve it?
The scenario is as follows...
We have an ASP.NET web application using Enterprise Library running on Windows Server 2008 IIS farm connecting to a SQL Server 2008 cluster back end.
MSDTC is turned on. DB connections are pooled.
My suspicion is that somewhere along the line there is a failed MSDTC transaction, the connection got returned to the pool, and the next query on a different page picked up the misbehaving connection and got this particular error. Funny thing is, we got this error on a query that has no need whatsoever for a distributed transaction (committing to two databases, etc.). We were only doing a select query (no transaction) when we got the error.
We did SQL profiling and the query ran on the SQL Server, but never came back (since the MSDTC transaction was already aborted on the connection).
Some other related errors to accompany this are:
New request is not allowed to start
because it should come with valid
transaction descriptor.
Internal .Net Framework Data Provider error 60.
MSDTC has a default timeout of 90 seconds; if a query's execution exceeds this time limit, you will encounter this error when the transaction tries to commit.
A bounty may help get the answer you seek, but you're probably going to get better answers if you give some code samples and give a better description of when the error occurs.
Does the error only intermittently occur? It sounds like it from your description.
Are you enclosing the code that you want to run as a transaction in a using TransactionScope block, as Microsoft recommends? This should help avoid weird transaction behavior. Recall that a using block makes sure that the object is always disposed regardless of exceptions thrown. See here: http://msdn.microsoft.com/en-us/library/ms172152.aspx
If you're using TransactionScope, there is an argument System.Transactions.TransactionScopeOption.RequiresNew that tells the framework to always create a new transaction for this block of code:
Using ts As New Transactions.TransactionScope(Transactions.TransactionScopeOption.RequiresNew)
' Do Stuff
End Using
Also, if you're suspicious that a connection is getting faulted and then put back into the connection pool, the likely solution is to enclose the code that may fault the connection in a Try-Catch block and Dispose the connection in the catch block.
Old question ... but ran into this issue past few days.
Could not find a good answer until now. Just wanted to share what I found out.
My scenario involves multiple sessions being opened by multiple session factories. I had to correctly roll back, wait, and make sure the other transactions were no longer active. It seems that just rolling back one of them will roll back everything.
But after adding the Thread.Sleep() between rollbacks, it doesn't do the other and continues fine with the rollback. Subsequent hits that trigger the method don't result in the "New request is not allowed to start because it should come with valid transaction descriptor." error.
https://gist.github.com/josephvano/5766488
I have seen this before and the cause was exactly what you thought. As Rice suggested, make sure that you are correctly disposing of the db related objects to avoid this problem.

Resources