Here is my problem. I am working on a batch process that spawns multiple tasks. Each task does journal postings, and the tasks run in parallel. The journal is a counting journal with close to 10k lines. The process runs for hours, as there are around a hundred journals to be posted. It runs fine on physical dev boxes where the AOS and SQL Server are on the same box, but on a virtual server its behavior is different. The AOS starts consuming all the memory while the lines are being added, and at some point memory hits 100%. Sometimes the AOS throws an out-of-memory exception and dies. Other times the process just hangs and waits for memory to be released, which takes a long time. The journal posting is the standard AX process and is not customised. The AX environment is 2012 R1, and the latest kernel hotfixes are applied (KB2962510). I explored the MaxMemLoad property, which lets you restrict the memory an AOS can consume on a server, but it did not help at all.
The AX environment is composed of three AOSs in a cluster.
How can I restrict this crazy memory consumption?
EDIT:
Thanks to Matej, I made some progress. The SQL Server version was 2008 R2 SP1, and I applied the latest SP3. Interestingly, out of the three AOSs in the cluster, two now have a much better memory graph, staying below 45%. But the third one still shows weird memory usage. All three AOSs run the same version of AX and have similar system configurations (Windows 2008 R2, 24 GB RAM, 4 cores), and I applied the latest kernel hotfix on all of them. At the moment I am doing a full CIL on this particular server, and I will run the batch again to see if that helps. I am attaching three graphs of CPU and memory, generated using Performance Monitor. As you can see, memory on Server 01 is very erratic and is not released on time, while the other two are more stable. Any ideas?
I have an interesting problem with how Windows and .NET manage memory for ASP.NET applications that I can't explain. I have a big ASP.NET application that, after it starts up, can take about 1 GB of memory according to Resource Manager. We tried to test how many instances of the application we can run at the same time on a single machine with 14-16 GB of memory.
First test is with an Azure Windows 2016 server with 8 vCPUs, 14 GB RAM, HDD.
After a few instances:
After 30 instances:
As you can see, the private bytes and working set of some instances were reduced a lot. Based on what I have read about how memory is managed (working set, physical memory, virtual memory, page files...), I can understand how the OS can take physical memory away from some idle processes and give it to others that need it. So far so good.
Then we tested the same scenario with another Azure Windows 2016 server with 4 vCPUs, 16 GB RAM, but this one uses SSD.
After about 20 instances, we got OutOfMemoryException:
The key difference I could see is that the memory of all those w3wp processes was still high. In other words, it was not reduced as in the test above.
My question is: why were the behaviors different? What prevented the second case from saving memory to the page file (my guess!) and thus caused the OutOfMemoryException?
Checking the paging file settings showed that it was still enabled in "System managed size" mode, but somehow Windows refused to use it for the w3wp processes. We changed it to a custom size of 20 GB, and everything started working again as expected. I must admit that I still don't know why Windows Server 2016 behaves like that when an SSD is used, though.
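A back-of-the-envelope way to reason about this, assuming Windows' usual commit-limit accounting (the commit limit is roughly physical RAM plus the total paging file size): committed private bytes count against that limit whether or not the pages are currently in a process's working set, so a paging file that is too small can make allocations fail while physical memory still looks available. The sketch below is only arithmetic. The ~1 GB per instance comes from the question; the 2 GB reserved for the OS and other processes is an assumed figure, and the function names are hypothetical.

```python
# Rough commit-limit arithmetic for the instance test described above.
# Assumption: commit limit ~= physical RAM + total paging file size.


def commit_limit_gb(ram_gb: float, pagefile_gb: float) -> float:
    """Approximate system commit limit in GB."""
    return ram_gb + pagefile_gb


def max_instances(ram_gb: float, pagefile_gb: float,
                  per_instance_gb: float, reserved_gb: float) -> int:
    """How many processes committing ~per_instance_gb fit under the limit.

    reserved_gb is a hypothetical allowance for the OS and everything else.
    """
    usable = commit_limit_gb(ram_gb, pagefile_gb) - reserved_gb
    return max(0, int(usable // per_instance_gb))


if __name__ == "__main__":
    # 16 GB RAM, ~1 GB per w3wp instance, 2 GB assumed for everything else.
    for pagefile in (0, 4, 20):
        estimate = max_instances(16, pagefile, 1.0, 2.0)
        print(f"pagefile={pagefile:>2} GB -> ~{estimate} instances")
```

Under these assumptions the estimate goes from about 14 instances with no paging file to about 34 with a 20 GB one; real commit usage per w3wp process will differ from the Resource Manager figure.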
Server Specs
Microsoft Windows Server 2003 Enterprise Edition SP2
IIS 6
.net4
Intel(R) Xeon(R) CPU X5680 @ 3.33GHz
2.00 GB of RAM
Physical Address Extension
I am having trouble finding the cause of our server's random downtime. Our clients inform us that their website goes down for hours at a time. Sometimes users are able to log in, but the site is extremely slow, unstable and unusable. Sometimes users are not able to log in at all. When users are able to log in, not all images are displayed (they get the "image not found" image).
We upgraded their website from .net1 to .net4 because we thought the cause of their downtime and random user log out was due to them running their website on .net1. The website was running fine with no issues for a few months.
The first time the server started to go down after that, it was because the drive the website resides on ran out of disk space. The drive had 40 GB allocated to it, and we added another 20 GB. This didn't resolve the issue for very long.
The second time the server started going down randomly, I noticed in the Event Viewer that the worker process associated with the website's app pool was periodically being recycled. That is, in the Security tab of the Event Viewer I would periodically see an event with ID 1074 reading 'A worker process with process id of '1540' serving application pool 'Net4' has requested a recycle because the worker process reached its allowed processing time limit.' I then went into this app pool's properties and saw that it was set to recycle every 29 hours, which is the default. I changed it to recycle every day at 3:00am, and since then we have not seen this event in the Event Viewer. Before this change, we were able to catch the website during one of its downtimes and recycle the app pool manually, which resolved the issue in that one instance.
This did not permanently fix the issue however, as we are still receiving emails from our client informing us that the website is down for hours at a time.
I then set up a Performance Monitor counter log, and we have managed to monitor the server's performance during many of these downtimes. It does not appear to be a problem with memory as there is plenty of space on the drive. It does not appear to be a memory leak or related to excessive paging, as no running process takes up an excessive amount of % Processor Time, and the Memory Pages/Second counter does not peak at an excessive level during most of the downtime (I'll explain why excessive paging occurs later). The total IO Data Bytes/sec and IO Other Data Bytes/sec Process counters do not appear to be unusually high or low during downtime. The total Thread Count and Handle Count Process counters do not exhibit any abnormal spikes or drops during this time. The total thread count at a given time seems to be between 600 and 900, give or take, and the total handle count between 15,000 and 23,000, give or take. The % Time in Jit .NET CLR Jit counter for the w3wp instance is 0 about half of the time and randomly peaks at almost 100 the other half; most of the time it peaks for just a moment, but occasionally it stays high for about 10 minutes, unrelated to downtime.
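One way to make the "unrelated to downtime" comparisons above less manual is to compare each counter's average inside the reported outage windows against its average outside them. The sketch below assumes the counter samples have already been exported to (timestamp, value) pairs, for example by converting the Performance Monitor log to CSV with Windows' relog tool; the function names and the sample values are hypothetical.

```python
from datetime import datetime
from statistics import mean


def in_any_window(ts, windows):
    """True if timestamp ts falls inside any (start, end) downtime window."""
    return any(start <= ts <= end for start, end in windows)


def compare_counter(samples, windows):
    """Mean counter value inside vs. outside the downtime windows.

    samples: list of (datetime, float) pairs for one counter.
    windows: list of (start, end) datetimes when the site was reported down.
    Returns (mean_inside, mean_outside); None where there are no samples.
    """
    inside = [v for ts, v in samples if in_any_window(ts, windows)]
    outside = [v for ts, v in samples if not in_any_window(ts, windows)]
    return (mean(inside) if inside else None,
            mean(outside) if outside else None)


def at_hour(hour):
    """Helper for the made-up example below."""
    return datetime(2014, 1, 1, hour)


if __name__ == "__main__":
    # Hypothetical Pages/sec samples and one reported outage from 03:00 to 04:00.
    samples = [(at_hour(1), 10.0), (at_hour(2), 12.0),
               (at_hour(3), 50.0), (at_hour(4), 54.0)]
    windows = [(at_hour(3), at_hour(4))]
    print(compare_counter(samples, windows))  # (52.0, 11.0)
```

A large gap between the two means for one counter is only a lead to investigate, not proof of cause.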
There are random times throughout the day when the process dsmcsvc takes up most, if not all, of the % Processor Time. This process is run by the Symantec Antivirus software. When it does, there is a corresponding event in the Event Viewer signifying that a new virus definition file has been loaded, that is, an Application event with ID 7 reading 'New virus definition file loaded. Version: #version number#'. When this event occurs, the Pages/Sec counter spikes, sometimes only to 200-300 but at times to over 10,000. This event seems to be completely unrelated to the website downtime. I have researched the Symantec Antivirus software and found that old versions of it have a known memory leak. This software is also known to cause high memory usage when the link to a process called NavLogon.exe is broken or does not exist. This process does not appear to exist on the server, so I currently have no way of restoring the link to it. I also found that this software uses Crypt32.dll and that old versions of Crypt32.dll have a known memory leak. The Crypt32.dll on the server was last updated in 2007.
The Performance Monitor log tracks the total Sessions Active ASP.NET Applications counter. During downtime, the total number of sessions does not exhibit any abnormal behavior; there is a normal number of active sessions. Active sessions at a given time range between 0 and 200. I was told that most users are active during 1st shift; however, this number peaks every day between about 10pm and 2am.
The site runs JavaScript client side and VB.NET server side. All users have about 10-15 session variables almost all of the time.
When the site goes down there are no events which seem to correspond to its downtime in the Event Viewer.
I have also set up a W3C Extended Log File Format log for this site. During downtime there seems to be an excessive number of GET requests for Telerik.RadUploadProgressHandler.ashx.
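To quantify "an excessive number of GET requests", the W3C extended log can be tallied per URL, optionally per minute. In this format, lines starting with `#` are directives, and a `#Fields:` directive names the space-separated columns that follow (for example `date`, `time`, `cs-method`, `cs-uri-stem`). The sketch below assumes those standard field names are present; the function name and the sample log lines are made up for illustration.

```python
from collections import Counter


def count_uri_hits(lines, per_minute=False):
    """Count requests per cs-uri-stem (optionally per date/minute) in a W3C log."""
    fields = []
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directives; #Fields may change mid-file, so always take the latest.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # malformed line, or data before any #Fields directive
        row = dict(zip(fields, values))
        uri = row.get("cs-uri-stem")
        if uri is None:
            continue
        if per_minute:
            # "22:00:01"[:5] -> "22:00", i.e. bucket by hour:minute
            key = (row.get("date", ""), row.get("time", "")[:5], uri)
        else:
            key = uri
        counts[key] += 1
    return counts


if __name__ == "__main__":
    sample_log = [
        "#Fields: date time cs-method cs-uri-stem sc-status",
        "2014-01-01 22:00:01 GET /Telerik.RadUploadProgressHandler.ashx 200",
        "2014-01-01 22:00:02 GET /Telerik.RadUploadProgressHandler.ashx 200",
        "2014-01-01 22:00:05 GET /default.aspx 200",
    ]
    for uri, hits in count_uri_hits(sample_log).most_common():
        print(hits, uri)
```

Running it over the real log files for an outage window versus a normal period would show whether the handler's request rate actually jumps during downtime.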
I have seriously run out of ideas at this point and have extensively searched the web for solutions and come up empty. Any feedback as to why this may be occurring would be great.
It does not appear to be a problem with memory as there is plenty of space on the drive.
Really? Memory and hard drive space are two completely different things. 2GB of RAM was okay a decade ago, when that server was new, but today it's laughably small.
But don't bother upgrading or adding RAM. This server is old enough that the problem is probably just that the hardware is reaching the end of its useful life. Additionally, the operating system is also nearing its end of life. Server 2003 is scheduled for end of life on July 14, 2015. After that date, there will be no new patches of any kind produced for Server 2003... not even critical security patches. That will make Server 2003 completely unsuitable as a web server.
This seems like a good time to execute a transition to a completely new server.
We're running into a strange problem. Our ASP.NET application is running on a 64-bit Windows 2008/IIS 7 machine with 16 GB of RAM. When the w3wp.exe process reaches 4 GB (we track it simply via Task Manager on the server), an OutOfMemoryException is thrown even though there's plenty of memory still available.
Is there a known issue where the ASP.NET process is limited to 4 GB of memory on a 64-bit system (using a 64-bit app pool)?
Is there any way to lift that limit?
It kind of sounds like you have an undisposed resource somewhere that ends up getting garbage collected eventually, but not quickly enough for your needs. Do you reuse any SqlConnection objects? Or MailClient objects? Or unmanaged Image objects?
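The fix for that pattern in .NET is to release such objects deterministically with `using` blocks (or explicit `IDisposable.Dispose()` calls) instead of waiting for the garbage collector. The sketch below shows the same idea with Python's `with` statement as an analogue, not .NET code; `PooledResource` is a hypothetical stand-in for something like a database connection, with a counter so you can see that nothing is left open.

```python
class PooledResource:
    """Hypothetical stand-in for a connection-like resource."""

    open_count = 0  # how many instances are currently open

    def __init__(self, name):
        self.name = name
        self.closed = False
        PooledResource.open_count += 1

    def close(self):
        # Idempotent: closing twice must not double-decrement the counter.
        if not self.closed:
            self.closed = True
            PooledResource.open_count -= 1

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


def handle_request(i):
    # The resource is closed when the block exits, even if an exception is
    # raised, rather than whenever the garbage collector gets around to it.
    with PooledResource(f"conn-{i}") as conn:
        return conn.name


if __name__ == "__main__":
    for i in range(1000):
        handle_request(i)
    print("still open:", PooledResource.open_count)
```

The design point is the same in either language: tie the resource's lifetime to a scope, so high request volume cannot outrun the collector.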
As for the lower-than-expected memory limit, there are two types of memory used by an ASP.NET app: reserved memory and memory that is actually used. I believe Task Manager tracks actual memory use, but reserved memory probably also has a limit. To find out how much reserved memory your process is taking up, open IIS 7 Manager, click on the server (the top level, above the Application Pools and Sites folders), then click the Worker Processes option and select your app's process. It should show you CPU use, the number of requests and memory usage (both reserved and actual).
The problem is with memory management, because I keep receiving an "Out of Memory exception".
Here are the scenarios where we face the problem:
Please note:
1. The site/application is developed in ASP.Net and uploaded on a server with the following specs:
- Windows Server 2008 (R2) Standard
- Intel Xeon L5520 @ 2.27GHz 2.27GHz
- RAM = 8GB
- System Type = 64bit
The application is an event-management web application whose requirements include saving a huge amount of data in session state, etc. (mentioning this in case it is relevant).
The applications/site works fine until we:
Edit a file directly on the server
Update a file from repository
Copy/Paste a file (we don’t usually edit code using this technique)
Please note, all of the above hold true ONLY when the traffic to the site is high that is,
The issue/error “Out of Memory” is not produced when the traffic/visits is low
Details of:
System Properties > Advanced > Performance Settings > Advanced tab
Total paging file size for all drives: 16362 MB
In web.config
Is there any way we can debug this problem to its root cause and find a solution? Can you please provide links or help where we can investigate this problem further?
Best regards,
Farrukh
Out of Memory Exceptions are common with applications that see periodic transaction surges while keeping larger volumes of data in memory. This problem does, however, depend on your application and architecture. Below are a few pointers:
Hardware - you have Xeon 5500 (Intel Nehalem chips). These are very good at handling memory. You should be good here.
OS - Windows Server 2008 R2 - As an OS this system will handle more than enough memory for you (you are good here, see link for capabilities: Memory Limits for Windows)
Physical Memory - Did you say you have 8 GB on the server? Note that your configuration allows 16 GB. There is one issue: if your app requests more memory than is physically available, you will see your error. But this is not your only concern ...
CLR / GC limitations - Your server has a paging file size of 16+ GB. This is probably your issue.
GC is the heart of your problem. In terms of why, it is the same reason Java and the JVM have issues whenever an application exceeds 2-4 GB. That requires a look at the actual process of GC.
You have "old generation" and "young generation" garbage collection processes. As your app runs, the CLR tries to keep your memory space organized. These processes force all threads to pause (phase changes) when the GC mark and sweep phases occur. The problem here is that, depending on how your code is written and the amount of memory you keep around for long periods, you can run into memory issues.
Any time you push a runtime environment past the 4 GB threshold, you will see sharp increases in collection times. When you hit the "stop the world" pause (the old-generation GC where everything gets cleaned up), the CLR has to go through the entire heap and de-allocate memory. Depending on your app, 16 GB may give you issues even with more physical memory (Windows Server 2008 R2 Enterprise or Datacenter can support 2 TB). Even if you feed it more physical memory, you may see LONG collection times when your full GC hits.
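To illustrate why a full collection gets slower as more data is kept alive, here is a toy mark-and-sweep collector over a dictionary "heap" of object names and their references. It is a teaching sketch, not the CLR's actual algorithm (which is generational and also compacts); `mark`, `full_collection` and `build_heap` are hypothetical names. In this toy, the mark work equals the number of reachable objects, so the cost of each full pass grows with how much the application holds on to.

```python
def mark(heap, roots):
    """Mark phase: return the set of reachable objects and the work done."""
    marked = set()
    stack = list(roots)
    visited = 0
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        visited += 1
        stack.extend(heap.get(obj, ()))  # follow outgoing references
    return marked, visited


def full_collection(heap, roots):
    """Toy full GC: mark from the roots, then sweep everything unmarked."""
    marked, visited = mark(heap, roots)
    swept = [obj for obj in heap if obj not in marked]
    for obj in swept:
        del heap[obj]
    return visited, len(swept)


def build_heap(live, garbage):
    """A chain of `live` reachable objects plus `garbage` unreachable ones."""
    heap = {f"live{i}": ([f"live{i + 1}"] if i + 1 < live else [])
            for i in range(live)}
    heap.update({f"dead{i}": [] for i in range(garbage)})
    return heap


if __name__ == "__main__":
    for live in (1_000, 10_000, 100_000):
        heap = build_heap(live, garbage=1_000)
        visited, swept = full_collection(heap, ["live0"])
        print(f"live={live:>7}  marked={visited:>7}  swept={swept}")
```

The garbage count is held constant here, so the growing mark work comes entirely from the larger set of long-lived objects.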
Ideally I would do the following:
Get more physical memory. You never want to come within 600MB of the total physical memory allocated to your application, to avoid out-of-memory errors, but the size of your buffer depends on your load and the application's ability to handle it ... you may want a larger safety net.
Once you have the physical memory you need, run GC logs while stressing the app. This will show you where performance starts to degrade sharply and what heap size (memory) your app can support. You may want to find a way to get your 16GB paging file down to a smaller size. I do know that with .NET 4.0 Microsoft made some solid improvements to the GC process, including allowing a background thread to maintain GC. This should give you the ability to support larger heaps (in theory) ... but nothing beats real tests on the app. Check out this link for more info:
Garbage Collection Performance (ASP.NET 4.0). Since I am limited in the number of links I can post, navigate to the Fundamentals page for some great explanations of the new GC features in ASP.NET 4.0:
(http://msdn.microsoft.com/en-us/library/ee787088.aspx#concurrent_garbage_collection)
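The 600MB rule of thumb from the first step above is simple arithmetic. The sketch below checks it, assuming you know the application's peak memory and can estimate everything else on the box (OS, SQL Server, other services). The function names and the example figures are hypothetical.

```python
def headroom_mb(physical_mb, app_peak_mb, other_mb):
    """Physical memory left after the app's peak and everything else."""
    return physical_mb - app_peak_mb - other_mb


def has_safe_headroom(physical_mb, app_peak_mb, other_mb, buffer_mb=600):
    """Rule of thumb: keep at least buffer_mb of physical memory free."""
    return headroom_mb(physical_mb, app_peak_mb, other_mb) >= buffer_mb


if __name__ == "__main__":
    # Hypothetical figures for an 8 GB server.
    print(has_safe_headroom(8192, 6000, 1500))  # True  (692 MB left)
    print(has_safe_headroom(8192, 6200, 1500))  # False (492 MB left)
```

Raising `buffer_mb` is the "larger safety net" mentioned above for spiky loads.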
Hope this helps!
PS - Anyone out there on lesser hardware will need to be aware of the ASP.NET use of the GC thread. If you are running something in development like a Core Duo you have to consider that 50% of your compute power will go to GC optimization. This means that Hardware (number of cores) is important to consider. If you have more than you need this process should theoretically help performance. If you are constrained on cores either get better hardware or use an older version of ASP.Net or consider turning the feature off (if possible). Second, if latency is a concern, using "hyper-threading" does have an impact on performance as well. You always get better performance on "physical" cores ... but that will not be a concern for 99.9% of the applications out there.
2 GB by default. If the application is large address space aware (linked with /LARGEADDRESSAWARE), it gets 4 GB (see http://msdn.microsoft.com/en-us/library/aa366778.aspx)
They're still limited to 2 GB, since many applications depend on the top bit of pointers being zero.
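The numbers above follow from 32-bit pointer arithmetic: 2^32 bytes is 4 GiB in total, and by default Windows gives a 32-bit process only the lower half for user mode. The sketch below encodes the cases from the Memory Limits page linked above, assuming the question is about a 32-bit process; `user_va_gib` is a hypothetical helper, and the 3 GiB figure assumes a 32-bit OS booted with the /3GB option.

```python
GIB = 1024 ** 3


def user_va_gib(os_bits, large_address_aware, boot_3gb=False):
    """User-mode virtual address space (GiB) for a 32-bit process.

    os_bits: 32 or 64, the Windows edition the 32-bit process runs on.
    large_address_aware: whether the EXE was linked with /LARGEADDRESSAWARE.
    boot_3gb: whether a 32-bit OS was booted with the /3GB option.
    """
    if not large_address_aware:
        return 2  # user addresses stay below 2 GiB, so the top pointer bit is 0
    if os_bits == 64:
        return 4  # under WOW64 the whole 32-bit range is available to user mode
    return 3 if boot_3gb else 2


if __name__ == "__main__":
    assert 2 ** 32 == 4 * GIB  # a 32-bit pointer addresses 4 GiB in total
    for os_bits, laa in ((32, False), (32, True), (64, False), (64, True)):
        print(f"{os_bits}-bit OS, LAA={laa}: {user_va_gib(os_bits, laa)} GiB")
```

The opt-in flag exists precisely because of the pointer-bit assumption described above: only executables that declare themselves safe get addresses with the top bit set.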
I'm running a Windows 2008 server (a VPS with 1GB of RAM) with SQL Server Express and IIS 7 installed. On it I'm hosting a NopCommerce 1.7 website with a database of around 26,000 products.
Right now I'm the only user of the website (it's in development), and I'm getting rather bad performance from it. To be more specific, every time I make a request, the worker process goes to 90-100% CPU usage for a few seconds. Is it just me, or is this a lot for a one-user NopCommerce website? Any ideas why this happens and what I can do to rectify it or investigate further?
PS: the worker process uses between 100MB and 400MB of memory (private working set), and SQL Server with this database around 160MB. Do you have any suggestions other than the obvious one of getting more RAM? I intend to get one more GB, but I fear this will not solve the CPU usage problem.
You've already stated you're going to get more RAM, but don't be surprised by how much a lack of RAM can impact the CPU. If your RAM is not able to hold large objects efficiently because of a lack of space (and I'd say using 40% of available RAM qualifies), then the CPU has to work harder to page things in and out of virtual memory. 90% is a little extreme for this, but with the server specs you give it's not impossible.
The most likely problem is that there is a hole in your code somewhere. My guess is that you have either an infinite loop or a direct memory leak (resources open during requests that aren't closed perhaps?). Your best bet would be to get the IIS Debug Diagnostics tool, install it and set up reports to find out what is going on directly on the server.