I run a site with decent traffic (~100,000 page views per day) and sporadically the site has been brought to its knees due to SQL Server timeout errors.
When I run SQL Profiler, I see a command getting called hundreds of times a second like this:
...
exec dbo.TempGetStateItemExclusive3 @id=N'ilooyuja4bnzodienj3idpni4ed2081b',...
...
We use SQL Server to store ASP.NET session state. The above is the stored procedure called to grab the session state for a given session. It seems to be looping, asking for the same 2 or 3 sessions over and over.
I found a promising-looking hotfix that seems to address this exact situation, but it doesn't seem to have solved the problem for us. (I'm assuming the hotfix is included in the most recent .NET service pack, since it doesn't look like you can install it directly anymore.) I added the registry key manually, but we still see looping stored procedure calls like the above (requesting the same session much more often than every 500ms).
I haven't been able to recreate this on a development machine. When two requests are made for the same session ID, it seems to block correctly and doesn't even try to hit SQL until the first page releases the session.
Any ideas? Thank you in advance!!!
This may be one of those cases where I needed an answer to a different question. The question should have been "Why am I using SQL to store session state information?" SQL is much slower, and much more disconnected from the web server, both of which may have contributed to this problem. I looked up the size of our ASPStateTempSessions table and realized it was only about 1MB. We moved back to <sessionState mode="InProc" ... /> and the problem is fixed (And the site runs faster)
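(For anyone wanting to run the same check, something along these lines should report the table size - this assumes the standard ASPState database and table that aspnet_regsql creates, so adjust the names if yours differ:)

-- Rough check of how much space the session table is actually using
USE ASPState;
EXEC sp_spaceused N'dbo.ASPStateTempSessions';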
The next step, when traffic dictates, would be to add another server and use the "StateServer" mode so we can spread out the memory usage.
I think I originally made this move to deal with a memory bottleneck which is no longer an issue. (This is not a good solution for dealing with a memory bottleneck, FYI!)
IMPORTANT EDIT: OK, so it turns out that the whole "TempGetStateItemExclusive" thing was not the problem; it was just a symptom of another problem. We had some queries that were causing blocking issues, so every SQL request would just get kicked out. The actual fix was to identify and fix the blocking issues. (I still believe that "InProc" is the way to go, though.) This link helped a lot in identifying our issues:
http://www.simple-talk.com/sql/sql-tools/how-to-identify-blocking-problems-with-sql-profiler/
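For anyone else chasing the same thing, a query along these lines (standard DMVs, but treat it as a sketch rather than our exact script) shows which sessions are blocked, who is blocking them, and what they were running:

-- Currently blocked sessions, their blockers, and the blocked statement
SELECT
    r.session_id,
    r.blocking_session_id,
    r.wait_type,
    r.wait_time,
    t.text AS blocked_sql
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;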
It's been some time, but isn't there a cleanup job that runs to remove stale sessions? Is it enabled?
This old KB mentions it. Like I said, it's been a while.
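If I remember right, the setup creates a SQL Agent job that periodically calls a cleanup proc in the session database. Something like this should confirm the job exists and let you run the cleanup by hand (the job and proc names below are the usual defaults, so verify them against your own ASPState database):

-- Is the expired-session cleanup job there and enabled?
SELECT name, enabled
FROM msdb.dbo.sysjobs
WHERE name LIKE N'%DeleteExpiredSessions%';

-- The job normally just calls this proc
EXEC ASPState.dbo.DeleteExpiredSessions;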
Just out of curiosity. Have you opened up that proc to see what it does?
If it's just making a select statement, you might look to see if it is using NOLOCK or not. If not, add NOLOCK to it and see what happens.
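For example, if the proc turned out to be doing a plain select against the session table, the hint would look roughly like this (purely illustrative - the real proc does more than a bare select, and the session id below is just the one from your trace):

-- Illustrative only: a dirty read against the session table using NOLOCK,
-- which reads uncommitted data but does not wait on other sessions' locks
DECLARE @id nvarchar(88);
SET @id = N'ilooyuja4bnzodienj3idpni4ed2081b';

SELECT SessionItemLong, Timeout
FROM dbo.ASPStateTempSessions WITH (NOLOCK)
WHERE SessionId = @id;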
An old ASP.NET Web Forms application, developed in Visual Studio 2010, suddenly does not run anymore, and this message appears in the log file:
Session state has created a session id,
but cannot save it because the response was already flushed by the application.
No new deployment has been made, and no code modifications have taken place.
So far I haven't found any solution to this.
What should I check?
Note that the source code is no longer available, so it would be very difficult to change the code and proceed with a new deployment.
Thanks in advance.
Luis
This would suggest that someone might be hitting the site and jumping directly to some URL (and thus code) that, say, does a response redirect to another page or some such.
Remember, when code-behind runs and, say, redirects to another page, in most cases the running code for the current page is terminated, and that is normal behavior.
However, the idea that you're going to debug code and debug a web site when you don't have the code to debug? Gee, I don't see how that's going to work at all. As noted, if this just started, then it sounds like incoming requests are hitting pages that don't expect to be hit "first", but pages that expect to be called ONLY from other pages in the site, after some session() and other important values have been set up BEFORE such pages are hit.
It's also not clear if the site is using SQL-based sessions or just in-memory sessions. In-memory can be (and is) faster, but it's also not particularly reliable. Now, if you deployed to a new web server or new hosting, then session errors can often start to appear, and this is due to the MASSIVE difference between cloud-based hosting and older hosting solutions that run on a single server.
Cloud computing is real utility computing, and thus when you host a web site on such systems, in-memory session() cannot be used anymore, since multiple servers can and will be used to "dish out" web pages. Since more than one server might be used, obviously in-memory session() can't work: a few web pages might be served out by one server, and then a few more pages might be served out by another server. And using shared memory for a session is limited to ONE server, since multiple servers don't and can't transfer their memory to other servers.
So, this suggests that you want to be sure that SQL Server based sessions are being used here - and for any kind of server farm, or any kind of system that load balances between more than one server, you of course HAVE to use SQL Server based sessions, since in-memory can't work in that kind of environment.
The error could also be due to excessive server load - often the session database is "locked" for a short period of time, and can thus often be a bottleneck. So, for years you might not see an issue, but then as load and use of the web site increase, this can become noticeable where in the past it was not. I suppose the database used for storing sessions could be checked, or looked at, since as you note, you don't have the ability to test and debug the code. I would REALLY, REALLY work towards solving this lack of source code for the web site, since without it you have really no means to manage, maintain, and fix issues for that web site.
But, abrupt terminations of web pages? As noted, this could be an error triggered in code, and thus the page never finished what it was supposed to do. And, as noted, perhaps a page that expects some session() values but does not have them, as explained above, could be the problem. It's not clear if your errors also show what URL this was occurring on.
While nothing seems to have changed - something obviously did.
Ultimately, you need to get that source code, or deal with the people or vendor that supplies the code for that site. If you don't have a vendor, and you don't have source code, you are pretty much attempting to work on a car whose hood you can't even open to check what's going on underneath.
So, one suggestion here? Someone is hitting a page that expected some value(s) in session to exist. Often the simple solution is to shove ANY simple or dummy value into session so session REALLY does get created, and then when the page attempts to save the session(), there is one to save!
In other words, this error often occurs when the session is being saved but no session exists. For such pages, as noted, a tiny code change of doing Session("zoozoo") = "my useless text" will fix this error. So, it sounds like session is being lost.
As noted, an error on a web page can also trigger an app-pool restart. If the app-pool restarts, then session is lost (with in-memory session). Now, with session being lost, any page that decides to terminate early AND that has also used session() will trigger this error.
So, this sounds like the app-pool is being restarted and session is being lost. (You can google the many reasons why an app-pool restarts.) However, critical to this issue is whether you are using SQL-based sessions or in-memory (server) sessions. So, this sounds like some code is triggering an error, and with an error triggered, the app-pool restarts. And with the app-pool being restarted, the in-memory session is blown away. And now, without ANY session at all, attempts to save the session trigger the exact error message you see. (And this is why shoving a dummy value into the session can fix some pages - since you can't save a "nothing" session, and if you try, you get that exact error message.)
But, as noted, you can't make these simple changes to the code anyway, right?
But first, on this issue - are you using memory-based sessions or not? That feature can be set up and configured in IIS, without changes to the code base. So, one quick fix might be to turn on SQL Server based sessions. It will cost some web site performance (around 10%), but the increased reliability is more than worth the performance hit.
Another area to look at? Are AJAX calls being made to a page, again without any previous session having been created? So, once again, we're down to a change in end-user behavior, possibly people hitting a page first before having logged in or done other things - and again one would see this error crop up.
Yesterday, my customer played with the IIS settings and changed the number of worker processes to 2, which made my web application behave very strangely: the session state was sometimes lost, sometimes recovered, and it took me a day to find out what had happened. So, the question is: in what situations are multiple worker processes useful?
It can be useful for scaling a web app vertically, especially poorly designed ones that do too much work in the web processes, or ones where processes crash frequently so you always have a hot one. It isn't an option that should be exercised lightly, as you have found out, but it is good to have when you need it.
The reason your user sessions started to fail was because you are using the default in-process session state module. This is fairly easy to fix as well -- just run the session state out of process using either the session state service or a database. Note that some behavior of the session state changes when you do this as well, so you will need to test carefully that you don't break something else.
Here's the scenario...
We have an internal website that is running the latest version of the ODAC (Oracle Client). It opens database connections, runs a stored procedure or packaged method, then disconnects. Connection pooling is turned on, and we are currently under version 11g in both our development and test environments, but under 10gR2 in our production environment. This happens on Production.
A few days ago, a process began firing off an ORA-2020 error. The process is called from a webpage on our internal website. The user simply sets a date, hits a button, and a job is started on another system that is separate from the website. The call itself, however, uses a database link to run a function.
We've scoured the SQL to find that it only uses that one database link. And since these links are on a per-session basis and the user isn't exceeding the default limit of 4, how is it possible that we are getting an ORA-2020 error?
We have run a number of tests to try to push over the default limit of 4. ODAC, from what I recall, runs a commit after each connection, and I can't seem to produce any errors by running 4 DB links and then a piece of SQL with 1 DB link directly after. The only way I can bring up this error is if I run a query with 4 DB links, then a function or piece of dynamic SQL with a database link within it. That isn't our situation, though, as this issue is sporadic. It isn't ALWAYS happening.
Questions
Is it possible that connection pooling is allowing User B to use User A's connection after the initial process was run, thus adding to the open links number if User B runs a SQL statement with more database links?
Is this a scenario where we should up our limit past 4? What are the disadvantages of increasing the number?
Do I need to explicitly close open database links before disconnecting from the database? Oracle documentation seems to suggest it should automatically happen, but "on occasion"... doesn't.
Firstly, the simple solution: I'd double check that in the production database the number of default links is actually 4.
select *
from v$system_parameter
where name = 'OPEN_LINKS'
Assuming you're not going to get off that lightly:
Is it possible that connection pooling is allowing User B to use User
A's connection after the initial process was run, thus adding to the
open links number if User B runs a SQL statement with more database
links?
You say that you explicitly close the session, which, according to the documentation, should mean that all links associated with that session are closed. Other than that I confess complete ignorance on this point.
Is this a scenario where we should up our limit past 4? What are the
disadvantages of increasing the number?
There aren't any disadvantages that I can think of. Tom Kyte suggests, albeit a long time ago, that each open database link uses about 500k of PGA memory. If you don't have any to spare then this will obviously cause a problem, but it should be more than fine for most situations.
There are, however, unintended consequences: imagine that you up this number to 100. Somebody codes something that continually opens links and draws a lot of data through all of them (select * from my_massive_table or similar). Instead of 4 sessions doing this you have 100, which is attempting to transfer hundreds of gigabytes simultaneously. Your network dies under the strain...
There's probably more but you get the picture.
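For completeness, if you do decide to raise it: OPEN_LINKS is a static parameter, so the change has to go into the spfile and only takes effect after an instance restart. A sketch (the value 10 is just an example):

-- Static parameter: requires an instance restart to take effect
ALTER SYSTEM SET open_links = 10 SCOPE = SPFILE;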
Do I need to explicitly close open database links before disconnecting
from the database? Oracle documentation seems to suggest it should
automatically happen, but "on occasion"... doesn't.
As you've noted the best answer is "probably not", which isn't much help. You don't mention exactly how you're terminating the session but if you're killing it rather than closing gracefully then definitely.
Using a database link spawns a child process on the remote server. Because your server is no longer in absolute charge of this process there's a myriad of things that could cause it to become orphaned or otherwise not close on termination of the parent process. By no means does this happen the whole time but it can and does.
I would do two things.
In your process, if an exception is encountered, e-mail the results of the following query to yourself.
select *
from v$dblink
At a minimum you will know what database links are open in the session, which gives you some way of tracing them.
Follow the documentation's advice; specifically the following:
"You may have occasion to close the link manually. For example, close
links when:
The network connection established by a link is used infrequently in an application.
The user session must be terminated."
The first seems to exactly fit your situation. Unless your process is time-sensitive, which doesn't seem to be the case, then what have you got to lose? The syntax is:
alter session close database link <linkname>
We ended up increasing the link amount, but we never did find the root cause.
I'm looking to create a webpage that will reflect the status of one of my company's servers automatically. Frequently there will be a minor error that only lasts 2-3 minutes, and it would be great to have this reflected on a self-generated page, which might prevent 50-60 unhappy clients from calling in simultaneously and asking what's wrong.
I'm not quite sure where to begin - would anyone have suggestions for good resources to study? Programming examples? I'm not referring to the basics of writing an ASP.NET page, of course, but rather process interaction in Windows.
Thanks.
To pull this off, you'd need a separate page that essentially runs server diagnostics; otherwise the page wouldn't know whether the server was up or down. Also, the page would need to be isolated from the sorts of problems that kill other people's requests, such as cache hit problems, memory starvation, high CPU usage, or insufficient bandwidth. So ideally the diagnostics would run in a separate app pool, separate virtual directory, separate machine.
Many of the interesting diagnostics would require a WMI call, but some you can get from the My.Computer namespace.
Also, are you going to do this on every server, or do you want one web server to display the status of several different servers?
It also depends on the type of errors your servers are encountering.
If they are going down completely, or are losing internet connection, then pinging them after an interval of time will let you know if they are up or not.
If you have a specific process running on a server that becomes unavailable, that can be a little more tricky.
Your best bet is to find a way to make a simple request to the services/applications that are important and see if you get a response; if you do, the server is likely up, and if not, it likely isn't.
Anything you can do to reduce the number of support calls you get is a good idea, but I'd also focus some time and try to figure out why your servers are going down so often.
Also, telling your users that the server is down, but not giving a reason why, may not have the effect you are looking for. Users will still be confused and frustrated when they can't get their work done.
I know you were looking to build a webpage to display the server diagnostics, but there are plenty of server monitoring tools that produce webpages for an easy dashboard view of the history.
A quick google returned the following link:
http://www.webdesignbooth.com/10-really-useful-server-monitoring-tools/
I have a query from a web site that takes 15-30 seconds while the same query runs in .5 seconds from SQL Server Management studio. I cannot see any locking issues using SQL Profiler, nor can I reproduce the delay manually from SSMS. A week ago, I detached and reattached the database which seemed to miraculously fix the problem. Today when the problem reared its ugly head again, I tried merely rebuilding the indexes. This also fixed the problem. However, I don't think it's necessarily an index problem since the indexes wouldn't be automatically rebuilt on a simple detach/attach, to my knowledge.
Any idea what could be causing the delay? My first thought was that perhaps some parameter sniffing on the stored procedure being called (said stored proc runs a CTE, if that matters) was causing a bad query plan, which would explain the intermittent nature of the problem. Since both detaching / reattaching and an index rebuild should theoretically invalidate the cached query plan, this makes sense, but I'm unsure how to verify this. Additionally, why wouldn't the same query (copied directly from SQL Profiler with the exact same parameters) exhibit the same delay when run manually through SSMS?
Any thoughts?
I know I am weighing in on this topic very late, but I wanted to post a solution that I found when having a similar issue. In brief, adding the SET ARITHABORT ON command at the outset of my procedures brought website query performance in line with the performance seen from SQL Server tools. This option is typically set on the connection when you run a query from QA or SSMS (you can change that option, but it is the default).
In my case, I had about 15 different stored procs doing mathematical aggregates (SUMs, COUNTs, AVGs, STDEVs) across a fairly sizeable set of data (10s to 100s of thousands of rows) - adding the SET ARITHABORT ON option moved them all from running in 3-5 seconds each to 20-30ms.
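As a sketch of what that change looks like (the proc, table, and column names here are just placeholders, not my real ones):

-- Placeholder proc: SET ARITHABORT ON at the top so the web connection
-- gets the same setting (and plan behaviour) that SSMS uses by default
CREATE PROCEDURE dbo.usp_GetDailyAggregates
    @StartDate datetime,
    @EndDate   datetime
AS
BEGIN
    SET ARITHABORT ON;
    SET NOCOUNT ON;

    SELECT COUNT(*)    AS RowsCounted,
           SUM(Amount) AS TotalAmount,
           AVG(Amount) AS AverageAmount
    FROM dbo.Transactions          -- placeholder table
    WHERE TransactionDate >= @StartDate
      AND TransactionDate <  @EndDate;
END;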
So, hopefully that helps someone else out there.
If a bad plan is cached then the same bad plan should be used from SSMS too, if you run the very same query with identical arguments.
There cannot be a better solution than finding the root cause. Trying to peek and poke at various settings in the hope that it fixes the problem will never give you the confidence that it is actually fixed. Besides, next time the system may have a different problem and you'll believe this same problem has re-surfaced and apply a bad solution.
The best thing to try is to capture the bad execution plan. The Showplan XML event class in Profiler is your friend; with it you can get the plan of the ADO.NET call. This is a very heavy event, so you should attach Profiler and capture it only when the problem manifests itself, in a short session.
Query IO statistics can also be of help. The RPC:Completed and SQL:BatchCompleted events both include Reads and Writes, so you can compare the amount of logical IO performed by the ADO.NET invocation vs. the SSMS one. A large difference (for exactly the same query and params) indicates different plans. sys.dm_exec_query_stats is another avenue of investigation. You can find your query plan(s) in there and inspect the execution stats.
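As a sketch of that last avenue (the LIKE filter is a placeholder - narrow it to text from your own query):

-- Cached plans matching a query of interest, with their execution stats
SELECT TOP (20)
    qs.execution_count,
    qs.total_logical_reads,
    qs.total_worker_time,
    qs.creation_time,
    st.text    AS query_text,
    qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE st.text LIKE N'%your_query_text_here%'
ORDER BY qs.total_logical_reads DESC;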
All these should help establish with certitude if the problem is a bad plan or something else, to start with.
I have been having the same problem.
The only way I can fix this is by setting ARITHABORT ON.
But unfortunately, when it occurs again, I have to set ARITHABORT OFF.
I have no clue what ARITHABORT has to do with this, but it works. I have been having this problem for over 2 years now with still no solution. The databases I am working with are over 300GB, so maybe it is a size issue...
The closest I got to resolving this problem was from an earlier post:
Google Groups post
Let me know if you have managed to completely solve this problem as it is very frustrating!
Is it possible that your ADO.NET query is running after the system has been busy doing other things, so that the data it needs is no longer in RAM? And when you test on SSMS, it is?
You can check for that by running the following two commands from SSMS before you run the query:
CHECKPOINT
DBCC DROPCLEANBUFFERS
If that causes the SSMS query to run slowly, then there are some tricks you can play on the ADO.NET side to help it run faster.
Simon Sabin has a great session on "when a query plan goes wrong" ( http://sqlbits.com/Sessions/Event5/When_a_query_plan_goes_wrong ) that discusses how to address this issue within procs by using various "optimize for" hints and such to help a proc generate a consistent plan and not use the default parameter sniffing.
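For reference, the kind of hint that talk covers looks roughly like this (the table, columns, and values below are placeholders):

-- Pin the plan to a representative parameter value rather than whatever
-- value happened to be sniffed on the first execution
DECLARE @CustomerId int;
SET @CustomerId = 42;

SELECT o.OrderId, o.Total          -- placeholder table and columns
FROM dbo.Orders AS o
WHERE o.CustomerId = @CustomerId
OPTION (OPTIMIZE FOR (@CustomerId = 1000));

-- Or, at the cost of a compile on every call, avoid sniffing altogether:
-- OPTION (RECOMPILE);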
However, I've got an issue with an ad-hoc query (not in a proc) where the SSMS plan and the ASP plan are exactly the same - clustered index / table scan - and yet the ASP query takes 3+ minutes instead of 1 second. (In this case a table scan happens to be a decent answer for fetching the results.)
Anyone care to explain that one?