Suddenly getting "Unable to make the session state request to the session state server" - asp.net

The setup: 2 web servers and a separate state server
I have two production web servers in a load balanced configuration. The ASP.NET web app they host shares state (like a web farm) using this line in their web.configs:
<sessionState mode="StateServer" stateConnectionString="tcpip=9.9.9.9:42424" cookieless="false" timeout="60"/>
9.9.9.9 is the IP of the machine the ASP.NET session state service is running on (OK, it's not really 9.9.9.9; changed to protect the innocent). It's a third machine (the database server, actually).
It worked fine until...
The error: website down!
Suddenly the site went down, just showing a generic asp.net error page ('turn custom errors off to see this error' or whatever).
The app's log recorded the actual error message:
An unhandled exception occurred Unable to make the session state request to the session state server. Please ensure that the ASP.NET State service is started and that the client and server ports are the same. If the server is on a remote machine, please ensure that it accepts remote requests by checking the value of HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\aspnet_state\Parameters\AllowRemoteConnection. If the server is on the local machine, and if the before mentioned registry value does not exist or is set to 0, then the state server connection string must use either 'localhost' or '127.0.0.1' as the server name.
So it appears that the web app was unable to contact the state server (9.9.9.9).
I "tried turning if auf and then onnegen" - restarting the state server fixed the problem.
Why?
I really want to know what happened and why, so I can prevent it from happening again.
So far all I have are two theories:
A windows update, to .net framework 4, was applied around that time on the state server. So maybe the update did something to the asp.net state service? The windows event viewer showed that .net 4 had logged a warning around then:
Updates to the IIS metabase were aborted because IIS is either not installed or is disabled on this machine. To configure ASP.NET to run in IIS, please install or enable IIS and re-register ASP.NET using aspnet_regiis.exe /i.
Some kind of temporary network problem between the prod web sites and the state server? They do sit right next to each other in the same physical rack though.
??? Any other ideas, anyone?
Anyone seen this before, or able to correct me on anything?
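One way to tell the "service stopped" theory apart from the "network problem" theory is to probe the state service's TCP port from each web server while the error is occurring. The following is only a minimal sketch: the function name can_reach_state_server is made up for illustration, 42424 is the port from the stateConnectionString above, and a successful TCP connect only shows that something is listening, not that the state service is healthy.

```python
import socket


def can_reach_state_server(host: str, port: int = 42424, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: it only checks TCP reachability of the port used in
    stateConnectionString; it does not speak the state service protocol.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False


# Example use from a web server (9.9.9.9 is the placeholder address from the question):
#   print(can_reach_state_server("9.9.9.9"))
```

If the probe fails from both web servers at once, the state server machine or its service is the more likely suspect; if it fails from only one, the network path from that server is worth a look.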

Has this happened since? The easy answer is that the problem was with the db server, not the web app. Are there any relevant errors in the log on the db server?
The fact that both apps threw an error indicates that a common resource was the problem. We chased a similar issue for a good solid week a while back, and eventually found a faulty fiber channel gadget (that's below my OSI level, so I'm not sure about the details).

Start -> Administrative Tools -> Services
Right-click the ASP.NET State Service and click "Start".
After following these steps, it is working fine for us.

Had a similar issue before when our Infrastructure team tried sneaking in an install of 3.5 when they forgot to install it on our Production box. Not bouncing a server after a framework update is just going to cause all kinds of weird problems.

Related

ASP.NET deployed application stops responding

We have an ASP.NET application deployed on IIS 7. Lately we have been having problems with the website, usually at high-traffic times: pages stop loading without showing an error. The browser just keeps spinning and the page never loads. An IIS reset usually fixes the issue, but we have tried everything to resolve it permanently, with no success. Below is additional information about what we have already tried. I can intentionally put the website into this state by running 25 concurrent users against the landing page, after which I have to reset IIS because it stops responding. I am thinking this might have to do with a setting in IIS. Maximum Concurrent Connections is set to the default, which is 4294967295. Kind of at a loss here.
We turned on Failed Request Tracing in IIS. The data in the error log did not provide anything conclusive. This might be partly because the requests do not fail completely, so no log entry was created. Most errors were collected based on the time the page took to respond.
I have also looked at the app pool and host log files and found nothing out of place
25 concurrent users is nothing. What's the back end stack? There's not too many details listed here but I'd start with looking at each stage the request is in and in addition enable failed request tracing. Mike (formerly from the IIS team) has a great write up on this, in a nutshell though:
Troubleshoot hanging requests on IIS in 3 steps
View requests
%windir%\system32\inetsrv\appcmd list requests /elapsed:30000
Enable Failed Request Tracing (modify if not using Default Web Site, of course)
%windir%\system32\inetsrv\appcmd configure trace "Default Web Site" /enablesite
%windir%\system32\inetsrv\appcmd configure trace "Default Web Site" /enable /path:test.aspx /timeTaken:00:00:30
Then hopefully you can find some details via
appcmd list traces | findstr "yourpage.aspx"

Validation of viewstate MAC failed caused due to Application Pool Idle Timeout

I bought web hosting online, where I host ASP.NET websites and web applications.
I frequently get this error:
Validation of viewstate MAC failed. If this application is hosted by a Web Farm or cluster......
After a lot of research I found that the error is caused by the "Application Pool Idle Timeout".
On my host, the app pool shuts down after 5 minutes of idle time. If that happens while a user is still on the site and then sends a postback, the server no longer recognizes the session/viewstate and rejects what is being posted back.
My "Application Pool Idle Timeout" value is around 5 minutes, which is too short.
I contacted the hosting provider to change the timeout period, but they refused, saying it is the same for all customers and can't be changed.
I googled for other solutions and found the following:
Setting the EnableViewStateMAC property to false (not good for security reasons).
Providing my own validation and decryption keys (doesn't work).
Is there a better solution?
Or should I change hosting provider (to something like godaddy.com)?
I have seen and resolved this issue in the past. It mostly comes up when you host an application on a web farm or web cluster.
When a page is rendered, its view state is signed (and optionally encrypted) on the server and sent to the client. When the page is posted back, the server validates and decodes this view state to restore the state of the page. For this the server uses keys which, if not provided in the Machine.config file, are generated on the fly by the server.
If you are on a single-server hosting environment, these keys might get regenerated. But on a web farm or web cluster, if the keys are generated at random then they are different on every server, so a page rendered by one server can be posted back to another server that has a different set of keys, where validation fails.
The solution is to add machineKey entries to every server's Machine.config file, or to your application's web.config files, so that each server uses the same keys for validating and decrypting view state.
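As a rough illustration of what such an entry could look like, the sketch below generates random keys and prints a machineKey element to paste inside system.web. The function name make_machine_key and the key sizes are my own assumptions (a 64-byte validation key with validation="SHA1" and a 32-byte key with decryption="AES"); check which algorithms your framework version and host support before using it.

```python
import secrets


def make_machine_key(validation_bytes: int = 64, decryption_bytes: int = 32) -> str:
    """Build a <machineKey> element with freshly generated random hex keys.

    Hypothetical helper. Assumed sizes: 64 bytes (128 hex chars) for the
    validation key and 32 bytes (64 hex chars) for the AES decryption key.
    """
    # secrets.token_hex(n) returns 2*n hex characters from a CSPRNG
    validation_key = secrets.token_hex(validation_bytes).upper()
    decryption_key = secrets.token_hex(decryption_bytes).upper()
    return (
        f'<machineKey validationKey="{validation_key}" '
        f'decryptionKey="{decryption_key}" '
        'validation="SHA1" decryption="AES" />'
    )


# Example: generate the element once, then use the SAME element on every server.
#   print(make_machine_key())
```

The important part is that the element is generated once and copied unchanged to every server (or into the application's web.config); generating a fresh one per server would reproduce the original problem.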

Connection issue in ASP.NET MVC + NHibernate+Oracle application after a couple of minutes

First of all, we have a MVC web application which uses NHibernate (version 3) and an Oracle 11g database.
The application works, but when we publish it to the production server a curious scenario happens:
The user accesses the application and performs a task, for example, selecting a link in the menu.
The user waits a couple of minutes (2-3 minutes).
The user performs another task, for example, reloading the same page or selecting another link in the menu.
The application fails with an ORA-12571: TNS:packet writer failure exception.
The user just refreshes the error page, and the application works.
The first thing we tried was to isolate the problem, so we published the application to another server with exactly the same configuration:
Same binaries, of course.
Same Oracle x64 client version, even the minor version.
Same Windows Server 2008 version with IIS 7.5.
Same IIS configuration (we compared the windows/system32/inetsrv/config files using WinMerge).
Accessing the same production database.
To our surprise, we couldn't reproduce the problem.
Does anyone have a clue what is going on?
The problem is related to the connection pool of the server's Oracle client. It seems to be handing invalid connections to the web application, which does not happen on the other servers.
The solution is not very elegant, but putting Validate Connection=True in the connection string resolved the issue. I am aware of the performance penalty of this, but I am out of options.
PS: with this flag, each connection is validated by the connection pool service before it is handed to the client application. This is not very nice, since a database round-trip will happen for every connection request.

ASP.net web app calling asp.net web service returning error code 401 even with System.Net.CredentialCache.DefaultCredentials

We have a .net web application that is running in IIS7.5 on an application pool that is set to run with a domain level AD account instead of the default account.
It has been configured according to these instructions:
http://support.microsoft.com/kb/813834
to use
myProxy.Credentials = System.Net.CredentialCache.DefaultCredentials;
so that the credentials the pool is running on are passed to the web service.
This works in my test VM (which may have had other settings modified in the past)
Deployed on our Dev Server, the same code does not work.
I know the web service isn't the culprit: the IIS log shows no account info passed to the web service call from the Dev Server, but if I point my test VM at the web service on the server, it works and the account info does come through.
Is there a configuration/permission thing I'm missing somewhere?
Any pointers?
Edit: Learned some more. Event Viewer is showing audit failures with NULL SID for this account, even though from the VM the SID comes through correctly.
Thanks!
Got it! So the NULL SID led me to the right place:
This is because of a "working as designed" feature in Windows.
Read what MS has to say about it here:
http://support.microsoft.com/kb/896861
Registry change option #1 fixed it.

IIS application using application pool identity loses primary token?

(This is a question about a vague problem. I try to present all relevant data, in the hope that someone has helpful information; apologies for the long description.)
Our web app
We have a .NET 4 web application running in IIS 7.5 accessing Active Directory and a SQL Server database.
This web application is running under a virtual 'app pool identity', by setting the Identity of the application's application pool to ApplicationPoolIdentity. A concise description of virtual identities can be found in a StackOverflow answer, and the blog post to which it refers: an app pool identity is just an additional group which is added to the web application's worker processes, which are running as 'network service'. However, one source vaguely suggests that "Network Service and ApplicationPoolIdentity do have differences that IIS.net site documents do not publish." So a virtual identity might be more than just an additional group.
We chose to use ApplicationPoolIdentity, as opposed to NetworkService, because it became the default in IIS 7.5 (see, e.g., here), and per Microsoft's recommendation: "This identity allows administrators to specify permissions that pertain only to the identity under which the application pool is running, thereby increasing server security." (from processModel Element for add for applicationPools [IIS 7 Settings Schema]). "Application Pool Identities are a powerful new isolation feature" which "make running IIS applications even more secure and reliable." (from IIS.net article "Application Pool Identities")
The application uses Integrated Windows Authentication, but with <identity impersonate="false"/>, so that not the end user's identity but the virtual app pool identity is used to run our code.
This application queries Active Directory using the System.DirectoryServices classes, i.e., the ADSI API. In most places this is done without specifying an additional username/password or other credentials.
This application also connects to a SQL Server database using Integrated Security=true in the connection string. If the database is local, then we see that IIS APPPOOL\OurAppPoolName is used to connect to the database; if the database is remote, then the machine account OURDOMAIN\ourwebserver$ is used.
Our problems
We regularly have issues where a working installation starts to fail in one of the following ways.
When the database is on a remote system, then the database connection starts to fail: "Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'. Reason: Token-based server access validation failed with an infrastructure error. Check for previous errors." The previous error is "Error: 18456, Severity: 14, State: 11." So it seems that now OURDOMAIN\ourwebserver$ is not used anymore, but instead anonymous access is attempted. (We have anecdotal evidence that this problem occurred when UAC was switched off, and that it went away after switching on UAC. But note that changing UAC requires a reboot...) A similar problem is reported in IIS.net thread "use ApplicationPoolIdentity to connect to SQL", specifically in one reply.
Active Directory operations through ADSI (System.DirectoryServices) start to fail with error 0x8000500C ("Unknown Error"), 0x80072020 ("An operations error occurred."), or 0x200B ("The specified directory service attribute or value does not exist").
Signing in to the application from Internet Explorer starts to fail, with HTTP 401 errors. But if in IIS we then put NTLM before Negotiate then it works again. (Note that access to AD is needed for Kerberos but not for NTLM.) A similar problem is reported in IIS.net thread "Window Authentication Failing with AppPool Identity".
Our hypothesis and workaround
At least the AD and sign-in problems always seem to go away when switching the application pool from ApplicationPoolIdentity to NetworkService. (We found one report confirming this.)
Page "Troubleshooting Authentication Problems on ASP Pages" has some suggestions related to primary vs. secondary tokens, and what I find encouraging is that it links the first two of our errors: it mentions NT AUTHORITY\ANONYMOUS LOGON access, and AD errors 0x8000500C and "The specified directory service attribute or value does not exist".
(The same page also mentions ADSI schema cache problems, but everything we can find on that topic is old. For now we consider this to be unrelated.)
Based on the above, our current working hypothesis is that, only when running under a virtual app pool identity, our web application (IIS? worker process?) suddenly loses its primary token, so that IIS only has a secondary token, so that all access to Active Directory and SQL Server is done anonymously, leading to all of the above errors.
For now we intend to switch from ApplicationPoolIdentity to NetworkService. Hopefully this makes all of the above problems go away. But we are not sure; and we would like to switch back if possible.
Our question
Is the above hypothesis correct, and if so, is this a bug in IIS/Windows/.NET? Under which circumstances does this primary token loss occur?
Through Microsoft Support I found out that we ran into the issue described in Microsoft Knowledge Base article KB2545850. This only occurs when ApplicationPoolIdentity is used. It occurs very easily, namely, after the machine account password is changed (which by default happens automatically every 30 days), and then IIS is restarted (e.g., through iisreset). Note that the problem goes away after a reboot, according to Microsoft and our observations.
According to Microsoft it is not possible to check if your Windows/IIS has gotten into this state.
Microsoft has a hotfix attached to this KB article. There is no indication when that hotfix will be rolled into an official delivery, and the hotfix is already 10 months old. In our specific case, we decided to switch to NetworkService instead.
See https://serverfault.com/a/403534/126432 for my comments on the same problem/solution.
Using the hotfix you linked to allowed me to get ApplicationPoolIdentity working as the docs say it should. This hotfix doesn't specifically describe a solution for accessing network resources as NT AUTHORITY\ANONYMOUS LOGON, but it's related to the computer password changing. Bottom line is that it worked for me, at least so far.
This is also relevant for Umbraco using Active Directory authentication.
From time to time you may get this exception:
Configuration Error
The specified directory service attribute or value does not exist
This is apparently caused by the problem outlined here. A reboot invariably fixes it.

Resources