I have set up LDAP Active Directory authentication for a Spring MVC application that I am configuring. I have been able to log in, and the majority of the time authentication succeeds. However, every so often I get a Connection Refused error. The timing seems to be sporadic, and the error resolves itself within ten to fifteen minutes each time. I have done some research and found that others have also had this problem, but I have not been able to find a solution or a hint as to what may be causing it. If anyone could point me in the right direction on this, it would be greatly appreciated.
I was able to get an answer to this question from a coworker. He had me switch to another server, and the problem seems to be resolved. His explanation was that the server I had been hitting was overloaded.
I have an awkward issue with IIS 10.0 on Windows Server 2016, ASP.NET 4.5.2, and MVC 5.2.7.
At times, certain requests do not receive a response and run for minutes, maybe 10 or so, before ending in a lost connection (PR_CONNECT_RESET_ERROR in Firefox on Windows, NSURLDomainError in Firefox on iOS). These are mostly POST requests. When this issue occurs, other GET requests receive a swift response and a correct result. Normally, POST requests do not take long to process, typically less than 3 seconds.
Recycling the associated worker process will make the issue go away, for hours or days.
When I inspected the web server today while the issue was occurring, I saw little CPU usage (less than 10%), memory at 56%, and the worker process at a modest 615 MB. These requests appeared neither in the W3C log nor in my custom application logs.
I added the Web-Request-Monitor as described in How do I see currently executing web request on IIS 8, but in doing so the worker process probably got recycled, as the issue is not currently occurring.
There are a reverse proxy and an access manager between the internet and my web server. I suppose they could have something to do with this issue, but it certainly is related to IIS, as recycling helps.
All of this is happening on an acceptance web server running a newer version of my application. I am not aware of any big changes to the application's architecture that could be involved. Also, there is very little traffic from other clients, if any at all.
What could be next steps to investigate this issue further?
Update
This issue was definitely caused by log4net. However, it was not related to the log4net.Internal.Debug setting. It was caused by two application domains writing to the same log file, which occasionally resulted in concurrency issues when accessing the file. log4net apparently could not handle this properly and got stuck while writing to the log file.
This log file was configured with a RollingFileAppender. Since we also used an AdoNetAppender, we decided to remove file logging altogether.
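As an aside, if file logging had to be kept, log4net's file appenders support a locking model that tolerates multiple writers. The following is a minimal sketch of configuring a RollingFileAppender programmatically with MinimalLock; the appender name and file path are made up for illustration, and this is an alternative we did not pursue, not the fix we applied.

    using log4net.Appender;
    using log4net.Config;
    using log4net.Layout;

    // Minimal sketch: a RollingFileAppender that acquires the file lock only for
    // the duration of each write (MinimalLock), so two application domains writing
    // to the same file are less likely to block each other. Names and paths are
    // illustrative, not taken from the original configuration.
    public static class LoggingSetup
    {
        public static void Configure()
        {
            var layout = new PatternLayout("%date [%thread] %-5level %logger - %message%newline");
            layout.ActivateOptions();

            var appender = new RollingFileAppender
            {
                Name = "SharedRollingFile",                    // hypothetical name
                File = @"C:\Logs\app.log",                     // hypothetical path
                AppendToFile = true,
                RollingStyle = RollingFileAppender.RollingMode.Size,
                MaxSizeRollBackups = 5,
                MaximumFileSize = "10MB",
                LockingModel = new FileAppender.MinimalLock(), // release the lock between writes
                Layout = layout
            };
            appender.ActivateOptions();

            BasicConfigurator.Configure(appender);
        }
    }

Even with MinimalLock, interleaved writes from separate processes are not guaranteed to be safe, which is part of why dropping file logging in favour of the AdoNetAppender was the simpler option.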
Original
I have found a probable cause. I'll report the steps I took to investigate the issue.
I activated the Worker Processes feature in IIS.
When, after a couple of days of waiting, the issue started again, I found long-running requests. They all had State ExecuteRequestHandler and Module Name ManagedPipelineHandler, with a Time Elapsed of hundreds of seconds.
I also activated Failed Request Tracing with a rule for long-running requests with a Time Taken of 1 minute.
After a couple of days, I started to receive failed request reports. The failed requests all have a GENERAL_SET_RESPONSE_HEADER event as their last event.
I added additional debug logging events for each request. When debugging in my development environment, at one point I started to see the hanging behaviour there, on one of the new logging statements(!). The application uses log4net.
I captured a stack trace:
    log4net.dll!log4net.Appender.AppenderSkeleton.DoAppend(log4net.Core.LoggingEvent loggingEvent)
    log4net.dll!log4net.Util.AppenderAttachedImpl.AppendLoopOnAppenders(log4net.Core.LoggingEvent loggingEvent)
    log4net.dll!log4net.Repository.Hierarchy.Logger.CallAppenders(log4net.Core.LoggingEvent loggingEvent)
    log4net.dll!log4net.Repository.Hierarchy.Logger.Log(System.Type callerStackBoundaryDeclaringType, log4net.Core.Level level, object message, System.Exception exception)
    log4net.dll!log4net.Core.LogImpl.DebugFormat(string format, object arg0)
The DoAppend method uses lock(this), which may very well cause hangs.
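To illustrate why that lock matters (a simplified sketch, not the actual log4net source): AppenderSkeleton serializes all appends through a single per-appender lock, so if the underlying write blocks, for example on a file handle held by another application domain, every thread that logs queues up behind it.

    // Simplified illustration of the locking pattern, not the real log4net code.
    // All logging threads funnel through the same lock; if Append() blocks
    // (e.g. waiting on a file locked by another application domain), every
    // other thread that tries to log hangs on entering the lock below.
    public abstract class AppenderSkeletonSketch
    {
        public void DoAppend(string loggingEvent)
        {
            lock (this)                 // one appender instance, one lock
            {
                Append(loggingEvent);   // a blocking write stalls all callers
            }
        }

        protected abstract void Append(string loggingEvent);
    }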
I also found out that the config setting log4net.Internal.Debug was set to true, which I do not want under normal circumstances, and this may be related. I did not attempt to understand the log4net code, but I remember that logging initially did not work in the acceptance environment, so the setting may well have been set to true then, causing the issue to start.
Another indication that this is happening in log4net: when the issue last occurred, I realized that standard-level logging only occurs in some POST requests. I found a POST request that does not log, and requests to it were handled normally, while the other POST requests still hung.
For now, I have set log4net.Internal.Debug to false and will wait to see what happens.
The fact that an IIS recycle fixes this issue doesn't mean it is an IIS issue, because ASP.NET applications run in the .NET runtime; it is only an IIS issue if it is proven that the request hangs in an IIS module.
So you may need to wait until the issue happens again, then create a Failed Request Tracing rule based on time-taken. That will tell us whether the issue is happening in an IIS pipeline module or in the .NET runtime.
If all requests hang in the .NET runtime, then you may have to capture a hang dump and do a deep analysis with WinDbg and the MEX extension. That will tell us what is happening there.
Please help!
We have two domain controllers. For some reason, they stopped replicating in July 2016. Now, when we try to manually initiate replication, we get the following error:
"The directory service cannot replicate with this server because the time since the last replication with this server has exceeded the tombstone lifetime."
Of course this is producing "Trust relationship has been lost with domain controller" issues all over our network as computers and servers can't connect with each other.
One of the suggestions to resolve this has been to demote the domain controllers and bring them back up...which is apparently very complicated.
Is there anything else that can be done to get these two domain controllers to replicate again since it has been so long?
Thanks!
We demoted one of the domain controllers, then re-promoted it, and now everything works fine. Replication is successful.
In case of TL;DR - I basically need guidance regarding what tools are available to debug requests which are issued to IIS and which stall inside a module.
I have a problem with an old ASP 2.0 app at the moment whereby it will periodically become unavailable and recycling the app pool (horrible as that may be) doesn't bring it back up 100% of the time.
So first of all it presents itself as requests entering the app pool and being trapped in state 'BeginRequest' in RewriteModule.
No specific request is always the first to experience this issue, and the issue cannot easily be recreated.
Eventually more requests join this backlog, and when it becomes 70+ deep the app pool fails to respond to pings from WAS and is forcibly recycled. Predictably it doesn't stop on time, and the old app pool is forced to stop. When the new app pool comes up, it either works just fine or instantly experiences the same issue as the outgoing one, and requests begin to queue.
In issues like this, all the official guidance is understandably focused on why the RewriteModule may choke.
I have validated my redirections and, though they are complex, there are no obvious issues with the syntax (the XML validates).
Likewise, loading the URL Rewrite Module in inetmgr seems to parse the configs fine and displays them visually.
Basic stuff like permissions is all fine.
When the app is working normally, I also used Failed Request Tracing/Logging to look at the request pipeline for a sample URL which had stalled, and I can confirm that there is no circular logic or weird errors presenting; the request seems to be handled just fine. This also showed me how high up the pipeline the RewriteModule is invoked, and from this I really don't see how the issue could be app-related, as .NET isn't invoked at that point.
Annoyingly, when an app pool is experiencing this issue and I can throw in requests which just stall, Failed Request Tracing is no good, because a request actually needs to get to the end of its journey and fail; otherwise it refuses to log anything.
I resorted to taking process dumps of affected w3wp.exe processes and running them through DebugDiag. Unfortunately, the only thing I see is that threads are open accessing the RewriteModule, but precious little about what they are stuck on.
As anyone else would, I've tried to trace the start of the issue back to recently installed patches or code changes, but nothing matches. Likewise, this is happening on 3x servers, otherwise I would try reinstalling the RewriteModule. Other sites on the same server which invoke the RewriteModule are unaffected.
Has anyone else experienced issues like this? The net seems to have relatively little info on this case. Perhaps you can recommend further debugging tools or approaches for IIS which I can adapt to this scenario? This is sort of a cry for help from someone more used to Apache/Nginx - sorry for the long post.
Our site has been running for a few weeks in Azure without getting this error:
SqlException: Database 'database' on server 'server' is not currently
available. Please retry the connection later. If the problem
persists, contact customer support, and provide them the session
tracing ID of 'guid'.
It finally got that error one day when there were a little over 2K active (concurrent) users. This is the closest question I could find on SO. We are not using EF, though; we're using Dapper. I'm out of ideas on how to debug our application to find out what caused the issue, and it's even harder now that the issue has not come up for the past 2 days. I definitely need to be on the lookout, and I could use any tip on where I should be looking, what I need to do to determine the cause of the issue, and possibly how to fix it.
It sounds like you need to handle transient failures via some sort of transient fault handling mechanism. Here is a post asking a similar question:
SQL Azure Database retry logic. David's answer is similar to the approach we took to deal with the issue.
Here is another link to some code that is similar to David's and our solution, to help you get your head around it: http://www.getcodesamples.com/src/4A7E4E66/41D6FAD
We had similar issues when we first moved to SQL Azure, but after implementing back-off retry logic for the transient connection issues, the majority of the time it recovers after a few seconds.
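For what it's worth, the sketch below shows the kind of back-off retry meant here, wrapped around a Dapper query. The error numbers checked and the delays are illustrative assumptions, not a definitive list of transient SQL Azure errors.

    using System;
    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.Threading;
    using Dapper;

    // Minimal sketch of back-off retry around a Dapper query against SQL Azure.
    // The transient error numbers and delays below are illustrative assumptions.
    public static class SqlRetry
    {
        private static readonly HashSet<int> TransientErrors =
            new HashSet<int> { 40613, 40501, 40197, 10928, 10929, 4060 };

        public static IEnumerable<T> QueryWithRetry<T>(
            string connectionString, string sql, object param = null, int maxAttempts = 3)
        {
            for (int attempt = 1; ; attempt++)
            {
                try
                {
                    using (var connection = new SqlConnection(connectionString))
                    {
                        return connection.Query<T>(sql, param);
                    }
                }
                catch (SqlException ex) when (attempt < maxAttempts && TransientErrors.Contains(ex.Number))
                {
                    // Exponential back-off: 1s, 2s, 4s, ...
                    Thread.Sleep(TimeSpan.FromSeconds(Math.Pow(2, attempt - 1)));
                }
            }
        }
    }

Limiting the retries to a known transient subset matters for the reason given in the next answer: retrying every error, including connection-limit errors, can make things worse.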
We went down the path of handling transient errors with the Azure Transient Fault Block, but this caused bigger issues - namely, if you reach the SQL connection limit (easy to do), having retry logic in place only makes things worse.
If it only happens once a month, I'd leave it be, and just handle it gracefully higher up the stack. An alternative is to create a custom retry policy to avoid retrying on certain errors, but it may still do more harm than good.
I am working on solving a problem that I have had for a couple of days now. Every time one of my sites is rebuilt or the AppPool is recycled, the first page load will hang forever (well, I've only waited up to 30 minutes). It is only happening on one particular site out of ~10 sites. It is an ASP.NET site.
Here are the things I have observed:
In IIS Manager under worker processes I can see the request. Verb = GET, State = ExecuteRequestHandler, Module Name = ManagedPipelineHandler. Time Elapsed just keeps increasing, of course.
If I close down the browser in which I made the initial request and then open a new one to make another request, the page will load instantly.
In my code the Application_Start of my Global.asax file is not called on the first request. It is called on the second request.
The worker process is causing the memory usage on my machine to go through the roof.
I'm inexperienced in troubleshooting IIS, but hours and hours of searching has led me nowhere.
The only major code change we have made on the site recently is that we have started implementing logging using log4net. I have, though, tried to remove all log4net code, both from my web.config file and from Global.asax - still no luck.
Has anyone else experienced this and if so how did you solve it?
Any and all help will be much appreciated.
ADD:
If I place a .txt file in the root of the site and load that as the first thing after a build it will load instantly.
However, the worker process still acts exactly as before and the memory usage still goes through the roof.
Final edit:
I feel like such an idiot. I can't explain why, but for some reason my breakpoints in Global.asax suddenly got hit and I was able to identify the problem. It was a badly written database call via Entity Framework: the filtering was done in memory, after all the rows of the table in question had already been fetched. And to make it worse, the filtering was done inside a foreach loop. Anyway, now everything is back to normal and I'm happy.
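For anyone hitting something similar, the anti-pattern looked roughly like the first method below and the fix like the second. The entity, context, and property names are made up for illustration; they are not from the actual code.

    using System.Data.Entity;
    using System.Linq;

    // Illustrative sketch only; the entity and context are hypothetical.
    public class Order
    {
        public int Id { get; set; }
        public int CustomerId { get; set; }
    }

    public class MyDbContext : DbContext
    {
        public DbSet<Order> Orders { get; set; }
    }

    public class OrderService
    {
        private readonly MyDbContext _db = new MyDbContext();

        // Anti-pattern: ToList() materializes the entire table on every loop
        // iteration, and the filtering then happens in memory.
        public void Slow(int[] customerIds)
        {
            foreach (var id in customerIds)
            {
                var orders = _db.Orders.ToList().Where(o => o.CustomerId == id).ToList();
                // ... process orders ...
            }
        }

        // Better: push the filter into the query so only matching rows are
        // fetched, and avoid issuing one query per loop iteration.
        public void Fast(int[] customerIds)
        {
            var orders = _db.Orders
                            .Where(o => customerIds.Contains(o.CustomerId))
                            .ToList();
            // ... process orders ...
        }
    }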
Possibly stating the obvious, but you haven't got any silly code in Application_Start in your Global.asax that could be causing this?
Sounds like an infinite loop or something?
Just a quick note on what happened in my case:
Neither Process Monitor nor Failed Request Tracing was of any help. The website simply loaded (nearly) forever.
Finally, after waiting for several minutes, an error occurred stating that it "cannot locate the network path".
The reason was that I had entered a connection string pointing to a non-existing SQL Server instance, so it somehow kept searching for the server. Finally, a timeout occurred.
The solution was to simply specify the correct SQL Server in the connection string inside Web.Config.
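One way to make this kind of misconfiguration surface quickly, instead of as a near-endless page load, is to verify the connection at application start with a short connect timeout. This is a sketch under the assumption that the connection string is stored in Web.Config under a name like "Default"; the name and the timeout value are illustrative.

    using System.Configuration;
    using System.Data.SqlClient;

    // Sketch: fail fast at application start if the configured SQL Server cannot
    // be reached, instead of letting the first page load hang until the default
    // network timeout expires. The connection string name "Default" is an
    // assumption for illustration.
    public static class StartupChecks
    {
        public static void VerifyDatabaseConnection()
        {
            var configured = ConfigurationManager.ConnectionStrings["Default"].ConnectionString;

            var builder = new SqlConnectionStringBuilder(configured)
            {
                ConnectTimeout = 5   // seconds; surfaces a wrong server name quickly
            };

            using (var connection = new SqlConnection(builder.ConnectionString))
            {
                connection.Open();   // throws promptly if the instance is unreachable
            }
        }
    }

Calling something like this from Application_Start in Global.asax turns a silent hang into an explicit, logged startup error.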