IIS application using application pool identity loses primary token? - asp.net

(This is a question about a vague problem. I try to present all relevant data, in the hope that someone has helpful information; apologies for the long description.)
Our web app
We have a .NET 4 web application running in IIS 7.5 accessing Active Directory and a SQL Server database.
This web application runs under a virtual 'app pool identity': the Identity of the application's application pool is set to ApplicationPoolIdentity. A concise description of virtual identities can be found in a StackOverflow answer, and the blog post to which it refers: an app pool identity is just an additional group that is added to the web application's worker process, which runs as 'network service'. However, one source vaguely suggests that "Network Service and ApplicationPoolIdentity do have differences that IIS.net site documents do not publish." So a virtual identity might be more than just an additional group.
We chose to use ApplicationPoolIdentity, as opposed to NetworkService, because it became the default in IIS 7.5 (see, e.g., here), and per Microsoft's recommendation: "This identity allows administrators to specify permissions that pertain only to the identity under which the application pool is running, thereby increasing server security." (from processModel Element for add for applicationPools [IIS 7 Settings Schema]) "Application Pool Identities are a powerful new isolation feature" which "make running IIS applications even more secure and reliable." (from IIS.net article "Application Pool Identities")
The application uses Integrated Windows Authentication, but with <identity impersonate="false"/>, so our code runs under the virtual app pool identity rather than the end user's identity.
This application queries Active Directory using the System.DirectoryServices classes, i.e., the ADSI API. In most places this is done without specifying an additional username/password or other credentials.
This application also connects to a SQL Server database using Integrated Security=true in the connection string. If the database is local, then we see that IIS APPPOOL\OurAppPoolName is used to connect to the database; if the database is remote, then the machine account OURDOMAIN\ourwebserver$ is used.
Our problems
We regularly have issues where a working installation starts to fail in one of the following ways.
When the database is on a remote system, then the database connection starts to fail: "Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'. Reason: Token-based server access validation failed with an infrastructure error. Check for previous errors." The previous error is "Error: 18456, Severity: 14, State: 11." So it seems that now OURDOMAIN\ourwebserver$ is not used anymore, but instead anonymous access is attempted. (We have anecdotal evidence that this problem occurred when UAC was switched off, and that it went away after switching on UAC. But note that changing UAC requires a reboot...) A similar problem is reported in IIS.net thread "use ApplicationPoolIdentity to connect to SQL", specifically in one reply.
Active Directory operations through ADSI (System.DirectoryServices) start to fail with error 0x8000500C ("Unknown Error"), 0x80072020 ("An operations error occurred."), or 0x200B ("The specified directory service attribute or value does not exist").
Signing in to the application from Internet Explorer starts to fail, with HTTP 401 errors. But if in IIS we then put NTLM before Negotiate then it works again. (Note that access to AD is needed for Kerberos but not for NTLM.) A similar problem is reported in IIS.net thread "Window Authentication Failing with AppPool Identity".
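When comparing the ADSI error numbers listed above against documentation, it can help to split the 32-bit values into their HRESULT fields. The following is a minimal Python sketch (the function names are made up for illustration); it relies only on the standard HRESULT layout, where bit 31 is the failure flag, bits 16-26 are the facility, and the low 16 bits are the code. Facility 7 means the value wraps a Win32 error number.

```python
FACILITY_WIN32 = 7  # HRESULTs with this facility wrap a Win32 error number


def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT into failure flag, facility and code."""
    value &= 0xFFFFFFFF
    return {
        "failure": bool(value >> 31),       # top bit set means failure
        "facility": (value >> 16) & 0x7FF,  # 11-bit facility field
        "code": value & 0xFFFF,             # low 16 bits
    }


def win32_code(value: int):
    """Return the wrapped Win32 error number, or None if there is none."""
    fields = decode_hresult(value)
    if fields["failure"] and fields["facility"] == FACILITY_WIN32:
        return fields["code"]
    return None


# The two full HRESULTs quoted in the symptom list.
for hr in (0x8000500C, 0x80072020):
    print(hex(hr), decode_hresult(hr), win32_code(hr))
```

With this, 0x80072020 turns out to wrap Win32 error 8224 (0x2020), which can be looked up with `net helpmsg 8224`, while 0x8000500C is not a wrapped Win32 error at all.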
Our hypothesis and workaround
At least the AD and sign-in problems always seem to go away when switching the application pool from ApplicationPoolIdentity to NetworkService. (We found one report confirming this.)
Page "Troubleshooting Authentication Problems on ASP Pages" has some suggestions related to primary vs. secondary tokens, and what I find encouraging is that it links the first two of our errors: it mentions NT AUTHORITY\ANONYMOUS LOGON access, and AD errors 0x8000500C and "The specified directory service attribute or value does not exist".
(The same page also mentions ADSI schema cache problems, but everything we can find on that topic is old. For now we consider this to be unrelated.)
Based on the above, our current working hypothesis is that, only when running under a virtual app pool identity, our web application (IIS? the worker process?) suddenly loses its primary token and is left with only a secondary token. All access to Active Directory and SQL Server is then done anonymously, which leads to all of the above errors.
For now we intend to switch from ApplicationPoolIdentity to NetworkService. Hopefully this makes all of the above problems go away. But we are not sure; and we would like to switch back if possible.
Our question
Is the above hypothesis correct, and if so, is this a bug in IIS/Windows/.NET? Under which circumstances does this primary token loss occur?

Through Microsoft Support I found out that we ran into the issue described in Microsoft Knowledge Base article KB2545850. This only occurs when ApplicationPoolIdentity is used. It occurs very easily, namely, after the machine account password is changed (which by default happens automatically every 30 days), and then IIS is restarted (e.g., through iisreset). Note that the problem goes away after a reboot, according to Microsoft and our observations.
According to Microsoft it is not possible to check if your Windows/IIS has gotten into this state.
Microsoft has a hotfix attached to this KB article. There is no indication when that hotfix will be rolled into an official delivery, and the hotfix is already 10 months old. In our specific case, we decided to switch to NetworkService instead.

See https://serverfault.com/a/403534/126432 for my comments on the same problem/solution.
Using the hotfix you linked to allowed me to get ApplicationPoolIdentity working as the docs say it should. This hotfix doesn't specifically describe a solution for accessing network resources as NT AUTHORITY\ANONYMOUS LOGON, but it's related to the computer password changing. Bottom line is that it worked for me, at least so far.

This is also relevant for Umbraco using Active Directory authentication.
From time to time you may get this exception:
Configuration Error
The specified directory service attribute or value does not exist
This is apparently caused by the problem outlined here. A reboot invariably fixes it.

Related

Is allowing the AppPool local activation permission System Wide in dcomcnfg a big security risk?

I've recently been trying to use the IIS AppPool identity instead of Network Service or Local System.
As such I came across the ugly error
The machine-default permission settings do not grant Local Activation
permission for the COM Server application with CLSID
{6E46607A-7347-471B-A98C-BC9E49B07248} and APPID Unavailable to the
user IIS APPPOOL\MyAppPool SID
(S-1-5-82-476059244-1685105758-59475158-1390954050-72429515) from
address LocalHost (Using LRPC) running in the application container
Unavailable SID (Unavailable). This security permission can be
modified using the Component Services administrative tool.
As you may notice my APPID was missing from this error, I searched the registry and found out which component it was (also by debugging).
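The SID in the event text is worth checking when confirming which account DCOM is actually complaining about. A small Python sketch (the helper names are hypothetical) that parses a SID string and checks for the S-1-5-82 prefix, which is the well-known prefix for IIS application pool virtual accounts:

```python
from typing import NamedTuple, Tuple


class Sid(NamedTuple):
    revision: int
    authority: int
    sub_authorities: Tuple[int, ...]


def parse_sid(text: str) -> Sid:
    """Parse a SID string such as 'S-1-5-82-...' into its numeric parts."""
    parts = text.strip().split("-")
    if len(parts) < 3 or parts[0].upper() != "S":
        raise ValueError(f"not a SID string: {text!r}")
    revision, authority, *subs = (int(p) for p in parts[1:])
    return Sid(revision, authority, tuple(subs))


def is_app_pool_sid(sid: Sid) -> bool:
    """True if the SID sits under S-1-5-82 (app pool virtual accounts)."""
    return (
        sid.revision == 1
        and sid.authority == 5
        and sid.sub_authorities[:1] == (82,)
    )


event_sid = parse_sid("S-1-5-82-476059244-1685105758-59475158-1390954050-72429515")
print(event_sid, is_app_pool_sid(event_sid))
```

This only inspects the string; it does not resolve the SID to an account name on the machine.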
It's a VC++ out-of-process OLE/COM server which processes requests from our web server. (Yay, 1990's called). I'm not entirely sure why this involves DCOM, there's nothing 'distributed' about it by design, maybe more by accident or an artefact of VS2008's default MFC/OLE server templates?
On using the power of Google, I followed the typical route of changing the dcomcnfg setting for this component to allow my IIS AppPool\MyAppPool user the local activation permission (I tried them all actually!), and confirmed that w3wp.exe is running as the same identity.
I also made sure that this exe was readable/executable by that user.
However the error still persisted.
Only by setting the same permissions machine-wide (via the My Computer node, instead of the individual component node), did the component load properly. This feels like a big security risk. Is it?
In the failing case, I tried using process monitor to spot any registry keys or file access problems, or to identify what other components might require access. But nothing reared its head.
Given that setting the DCOM permission system-wide fixes the problem, it feels to me as if another DCOM component or service also needs permissions set, but I can't find out which.
So
a) Is there a way to further diagnose this problem? Sniff out decisions being made by DCOM? Is there a central DCOM broker that needed the permissions set also? Debugging/Process Monitor doesn't seem to help.
b) Is it ok to set the AppPool local activation policy machine wide?
Many thanks to anyone who helps me make the right decision.
Q1. Is it bad practice to give your App Pool account local DCOM activation permissions, computer-wide?
A1: Yes, it's bad. According to the book Secure DCOM Best Practices:
"It isn't a good idea to loosen these permissions from the default values."
Q2: Why is the component still failing?
A2: This was a combination of problems:
Process monitor DID pick up an issue that my AppPool identity was not able to read the registry key HKEY_CLASSES_ROOT\WOW6432Node\AppID\{C33D7656-D310-4684-9482-A486787E4E3B}. Enabling read permission for my AppPool identity got me one step further.
The event log message about an Unavailable AppID was a clue. There was no AppID REG_SZ entry for the class being requested. So the security settings were not being picked up. I needed to ensure the following key existed: HKEY_CLASSES_ROOT\WOW6432Node\CLSID\{6E46607A-7347-471B-A98C-BC9E49B07248} with String Value AppID={C33D7656-D310-4684-9482-A486787E4E3B}
As per MSDN documentation AppID, and What is AppID
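For anyone who has to repeat the AppID fix on several machines, here is a hedged Python sketch that only builds the text of a .reg file for the missing value described above. The function name is made up, the GUIDs are the ones from this answer, and the output should be reviewed (and the registry backed up) before importing anything.

```python
def appid_reg_fragment(clsid: str, appid: str, wow64: bool = True) -> str:
    """Build .reg file text that links a CLSID to its AppID.

    wow64=True targets the 32-bit view (WOW6432Node), as in the answer above.
    """
    root = "HKEY_CLASSES_ROOT\\WOW6432Node" if wow64 else "HKEY_CLASSES_ROOT"
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{root}\\CLSID\\{clsid}]",
        f'"AppID"="{appid}"',
        "",
    ]
    return "\n".join(lines)


print(appid_reg_fragment(
    "{6E46607A-7347-471B-A98C-BC9E49B07248}",
    "{C33D7656-D310-4684-9482-A486787E4E3B}",
))
```

The sketch does not touch the read permission on the AppID key; that still has to be granted to the app pool identity separately.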

Validation of viewstate MAC failed caused due to Application Pool Idle Timeout

I bought a web domain online, where I host ASP.NET websites and web applications.
I frequently get this error:
Validation of viewstate MAC failed. If this application is hosted by a Web Farm or cluster......
After a lot of research I found that the error is caused by the "Application Pool Idle Timeout".
When no requests arrive for the length of the idle timeout, the app pool's worker process is shut down (the IIS default is 20 minutes). If that happens while a user still has a page open and then sends a postback to the server, the server no longer recognizes the session/view state and rejects what is being posted back.
My "Application Pool Idle Timeout" value is around 5 minutes, which is too short.
I contacted the provider to have the timeout changed, but they refused, saying it is the same for all customers and can't be changed.
I googled for other solutions and found these:
Setting the EnableViewStateMAC property to false (not good for security reasons).
Provide your own validation and decryption keys "" (doesn't work).
Please suggest a better solution ASAP.
Or should I change provider (to something like godaddy.com)?
I have seen and resolved this issue in the past. It mostly occurs when you host an application on a web farm or web cluster.
When a page is rendered, its view state is signed with a MAC (and optionally encrypted) on the server and sent to the client. When the page is posted back, the server validates (and decrypts) the view state to recover the state of the page. For this the server uses keys which, if they are not provided in the Machine.config file, are generated on the fly by the server.
If you are in a single-server hosting environment, these auto-generated keys might get regenerated (for example, when the application pool recycles). On a web farm or web cluster, randomly generated keys differ from server to server, so a page rendered by one server can be posted back to another server that has a different set of keys, where validation fails.
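To illustrate why differing keys break the postback, here is a minimal Python sketch of MAC validation in general. It is not ASP.NET's actual view state format; the function names, the SHA-256 choice and the key sizes are assumptions made purely for the example.

```python
import hashlib
import hmac
import secrets

MAC_LEN = hashlib.sha256().digest_size  # 32 bytes


def protect(state: bytes, key: bytes) -> bytes:
    """Append an HMAC over the state, as a server would when rendering."""
    return state + hmac.new(key, state, hashlib.sha256).digest()


def unprotect(blob: bytes, key: bytes) -> bytes:
    """Check the HMAC on postback; reject the state if it does not match."""
    state, mac = blob[:-MAC_LEN], blob[-MAC_LEN:]
    expected = hmac.new(key, state, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        raise ValueError("validation of MAC failed")
    return state


# Two servers (or one server before and after a restart) with random keys.
server_a_key = secrets.token_bytes(64)
server_b_key = secrets.token_bytes(64)

blob = protect(b"page state", server_a_key)
print(unprotect(blob, server_a_key))  # same key: accepted

try:
    unprotect(blob, server_b_key)     # different key: rejected
except ValueError as exc:
    print("rejected:", exc)
```

The same reasoning applies whether the second key belongs to another server in the farm or to the same server after its keys were regenerated.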
The solution is to add machineKey entries to every server's Machine.config file, or to your application's web.config file, so that each server uses the same keys for validating and decrypting view state.
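A fixed key pair has to come from somewhere. The sketch below is one possible way to generate random keys and format them as a machineKey element, written in Python for convenience. The helper name is made up, and the algorithm names and key lengths (64 random bytes for validation, 32 for AES decryption) are assumptions to check against the documentation for your .NET version.

```python
import secrets


def machine_key_element(validation: str = "SHA1", decryption: str = "AES") -> str:
    """Return a <machineKey> element with freshly generated random keys."""
    validation_key = secrets.token_hex(64).upper()  # 128 hex characters
    decryption_key = secrets.token_hex(32).upper()  # 64 hex characters
    return (
        f'<machineKey validationKey="{validation_key}" '
        f'decryptionKey="{decryption_key}" '
        f'validation="{validation}" decryption="{decryption}" />'
    )


print(machine_key_element())
```

Generate the element once and paste the same value into the config of every server; generating it separately per server would reproduce the original problem. Treat the keys as secrets.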

ASP.net web app calling asp.net web service returning error code 401 even with System.Net.CredentialCache.DefaultCredentials

We have a .net web application that is running in IIS7.5 on an application pool that is set to run with a domain level AD account instead of the default account.
It has been configured according to these instructions:
http://support.microsoft.com/kb/813834
to use
myProxy.Credentials = System.Net.CredentialCache.DefaultCredentials;
so that the credentials the pool is running on are passed to the web service.
This works in my test VM (which may have had other settings modified in the past)
Deployed on our Dev Server, the same code does not work.
I know the web service itself isn't the culprit: from the Dev Server the IIS log shows no account info being passed with the web service call, but if I point my test VM at the web service on the server, the call works and the account info does show up.
Is there a configuration/permission thing I'm missing somewhere?
Any pointers?
Edit: Learned some more. Event Viewer is showing audit failures with NULL SID for this account, even though from the VM the SID comes through correctly.
Thanks!
Got it! So the NULL SID led me to the right place:
This is because of a "working as designed" feature of Windows.
Read what MS has to say about it here:
http://support.microsoft.com/kb/896861
Registry change option #1 fixed it.

ASP.NET MVC intermittent 401 authorization errors

I have an ASP.NET MVC intranet site that uses Windows Authentication (Kerberos) exclusively with pass-through authentication. It is set up to use an app pool (v4/integrated) that uses the Network Service identity. The web site provides a pretty UI on top of a network share that is hosted on another machine (SAMBA NAS box). Occasionally (and usually when someone hasn't accessed the site for a while), clients get a 401 authorization error at the point where the MVC code tries to get directory info (System.IO.Directory.GetLastWriteTime) on the remote UNC share. The event log on the IIS machine captures a security audit failure at the same point in time:
+ System
- Provider
[ Name] Microsoft-Windows-Security-Auditing
[ Guid] {54849625-5478-4994-a5ba-3e3b0328c30d}
EventID 4625
Version 0
Level 0
Task 12544
Opcode 0
Keywords 0x8010000000000000
- TimeCreated
[ SystemTime] 2012-03-17T00:43:50.522Z
EventRecordID 398873
Correlation
- Execution
[ ProcessID] 696
[ ThreadID] 792
Channel Security
Computer lvtloweb1.acme.com
Security
- EventData
SubjectUserSid S-1-0-0
SubjectUserName -
SubjectDomainName -
SubjectLogonId 0x0
TargetUserSid S-1-0-0
TargetUserName
TargetDomainName
Status 0xc000006d
FailureReason %%2304
SubStatus 0xc0000133
LogonType 3
LogonProcessName Kerberos
AuthenticationPackageName Kerberos
WorkstationName -
TransmittedServices -
LmPackageName -
KeyLength 0
ProcessId 0x0
ProcessName -
IpAddress -
IpPort -
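The Status and SubStatus values in the event above are NTSTATUS codes, and decoding them can narrow the search. The Python sketch below uses a deliberately tiny lookup table containing only the two codes that appear in this event (names as defined in the Windows headers); the helper names are made up for illustration.

```python
# Only the codes seen in the event above; extend the table as needed.
NTSTATUS_NAMES = {
    0xC000006D: "STATUS_LOGON_FAILURE",
    0xC0000133: "STATUS_TIME_DIFFERENCE_AT_DC",
}
LOGON_TYPES = {3: "Network"}


def parse_event_data(text: str) -> dict:
    """Turn 'Name value' lines from the event's friendly view into a dict."""
    fields = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if parts:
            fields[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return fields


def describe_failure(fields: dict) -> str:
    """Summarize Status, SubStatus and LogonType in readable form."""
    status = int(fields["Status"], 16)
    sub_status = int(fields["SubStatus"], 16)
    logon_type = int(fields["LogonType"])
    return "{} / {} (logon type {}: {})".format(
        NTSTATUS_NAMES.get(status, hex(status)),
        NTSTATUS_NAMES.get(sub_status, hex(sub_status)),
        logon_type,
        LOGON_TYPES.get(logon_type, "unknown"),
    )


sample = """Status 0xc000006d
FailureReason %%2304
SubStatus 0xc0000133
LogonType 3"""
print(describe_failure(parse_event_data(sample)))
```

STATUS_TIME_DIFFERENCE_AT_DC indicates a clock difference that is too large for authentication, so comparing the clocks of the IIS guest, the domain controller and the NAS box may be worth ruling out, particularly since the server is a VM.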
The weird thing is that if you sit and refresh the page over and over, it usually gets past the 401 error in about a minute. Anybody got any ideas on how to troubleshoot such a problem? Oh yeah, the IIS machine is hosted in a VM. The guest OS is Windows Server 2008 Enterprise 6.0.6002 Service Pack 2.
Keith,
Being that it appears to happen most after the user has been idle, I'm leaning towards some kind of event (i.e. a session timeout) that might invalidate the credentials for your server. I'm assuming one server since you didn't mention any web farm.
A part of me thinks you've likely hammered the session timeout angle. Unless you felt safe trusting that the users would 'auto authenticate' and timing out mid operation wouldn't cause it to fail. That said, I'm not sure I'd completely trust that is the case. To at least eliminate this possibility I would add a routine that logs information about the current session/credentials before that operation is started. Even though the user is "always logged in" assuming they are on their computer I've seen weird issues with VPNs, proxy servers, server double hops, IE configurations for 'trusted zones' and 'intranet' settings. Even a computer suddenly having its route to the server changed could cause issues. I'm not sure the network emulation on a VM would play a part, but who knows.
Here is a 'starter' article regarding IE and its complex approach to authentication and the role it plays as the client application accessing your Intranet MVC app. (It pays to dig deep on how IE auto-magically authenticates in a Windows AD environment.)
http://support.microsoft.com/kb/258063/en-us
Here is a related problem someone had due to a double-hop causing 401s. I've also included a link to another good 'starter page' on investigating these types of issues.
http://social.msdn.microsoft.com/Forums/en/sqlreportingservices/thread/6d1604e5-e739-41e4-89a5-c6681bff2e61
http://blogs.technet.com/b/askds/archive/2008/06/13/understanding-kerberos-double-hop.aspx
Sorry if anything above you already knew or for the general nature of my response. It's tough without actually having network access or being able to add logging to your code. I hope I've pointed you in the right direction.

Suddenly getting "Unable to make the session state request to the session state server"

The setup: 2 web servers and a separate state server
I have two production web servers in a load balanced configuration. The ASP.NET web app they host shares state (like a web farm) using this line in their web.configs:
<sessionState mode="StateServer" stateConnectionString="tcpip=9.9.9.9:42424" cookieless="false" timeout="60"/>
9.9.9.9 is the IP of the machine the asp.net session state service is running on (ok it's not 9.9.9.9 really, changed to protect the innocent). It's a third machine (the database server, actually).
It worked fine until...
The error: website down!
Suddenly the site went down, just showing a generic asp.net error page ('turn custom errors off to see this error' or whatever).
The app's log recorded the actual error message:
An unhandled exception occurred Unable to make the session state request to the session state server. Please ensure that the ASP.NET State service is started and that the client and server ports are the same. If the server is on a remote machine, please ensure that it accepts remote requests by checking the value of HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\aspnet_state\Parameters\AllowRemoteConnection. If the server is on the local machine, and if the before mentioned registry value does not exist or is set to 0, then the state server connection string must use either 'localhost' or '127.0.0.1' as the server name.
So it appears that the web app was unable to contact the state server (9.9.9.9).
I "tried turning if auf and then onnegen" - restarting the state server fixed the problem.
Why?
I really want to know what happened and why so I can prevent it happening again.
So far all I have are two theories:
A windows update, to .net framework 4, was applied around that time on the state server. So maybe the update did something to the asp.net state service? The windows event viewer showed that .net 4 had logged a warning around then:
Updates to the IIS metabase were aborted because IIS is either not installed or is disabled on this machine. To configure ASP.NET to run in IIS, please install or enable IIS and re-register ASP.NET using aspnet_regiis.exe /i.
Some kind of temporary network problem between the prod web sites and the state server? They do sit right next to each other in the same physical rack though.
??? Any other ideas, anyone?
Anyone seen this before, or able to correct me on anything?
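One way to tell the two theories apart next time is to probe the state server's TCP port from each web server while the error is occurring. A minimal Python sketch, assuming the stateConnectionString format shown in the web.config line above; the function names are made up for illustration.

```python
import socket


def parse_state_connection(value: str):
    """Split a 'tcpip=host:port' stateConnectionString into (host, port)."""
    scheme, _, endpoint = value.partition("=")
    if scheme.strip().lower() != "tcpip" or ":" not in endpoint:
        raise ValueError(f"unexpected stateConnectionString: {value!r}")
    host, _, port = endpoint.rpartition(":")
    return host.strip(), int(port)


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example use, run on a web server:
#   host, port = parse_state_connection("tcpip=9.9.9.9:42424")
#   print(can_connect(host, port))
```

If the port is reachable while the app still reports the error, the network theory becomes less likely and the state service itself is the better suspect. A successful connection only shows that something is listening, not that the service is healthy.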
Has this happened since? The easy answer is that the problem was with the db server, not the web app. Are there any relevant errors in the log on the db server?
The fact that both apps threw an error indicates that a common resource was the problem. We chased a similar issue for a good solid week awhile back, and eventually found a faulty fiber channel gadget. (that's below my OSI level, not sure about the details).
Following these steps got it working for me:
Start -> Administrative Tools -> Services
Right-click the ASP.NET State Service and click "Start"
Had a similar issue before when our Infrastructure team tried sneaking in an install of 3.5 when they forgot to install it on our Production box. Not bouncing a server after a framework update is just going to cause all kinds of weird problems.
