I'm running a simple CRUD app built with ASP.NET Core and EF Core 3.1 in a docker swarm cluster on ubuntu. I'm only using managed code.
The container has a 10GB memory limit specified. I can inspect a running container and verify that this limit is actually set, I also see that DOTNET_RUNNING_IN_CONTAINER is set to true. When the app is started the memory consumption is about 700MB and it slowly builds up. Once it reaches 7GB (I see it in container generic stats) I start getting OutOfMemoryExceptions and it stays at this level for days. So the first question is
Why doesn't it go up to 10 GB?
Anyway I expect memory leaks so I have a dotnet-gcdump tool installed in this same container so I go ahead and collect the dump for future analysis with dotnet-gcdump collect. Once I execute this command I see the memory consumption of the running container drops from 7GB to 3GB and stays at this level. The resulting .gcdump file itself size though is only ~200MB with nothing suspicious in it. So next questions are
How does the collection of a dump reduce memory consumption? I'd assume it's doing GC with LOH compaction but it doesn't mention it in the docs.
Why isn't this memory freed automatically if the tool is able to do it?
Why is a resulting dump only 200 MB in size?
As the gcdump documentations explains: "GC dumps are created by triggering a GC in the target process, turning on special events, and regenerating the graph of object roots from the event stream".
Thus, it directly answers your question 2 - it triggers full GC, which may or may not be compacting, but it collects gen2 for sure. It also answers question 4 - it is not a "memory dump" but a special kind of diagnostics data about the objects graph (depndencies and typenames), without the data itself.
And regards to the questions 1 and 3 - it is an example of the GC being "not aggressive" enough. It is kind of the "living on the edge" problem when the process almost meets the containers limits and GC sometimes is not able to interpret it. In other words, it thinks it has enough space, but it doesn't. Please, be warned that this is a super-oversimplification. In such a case full GCs may not happen or happen too late. I would confirm that by observing the process by the dotnet-trace with gc-collect profile.
As a solution, consider setting the limit manually, by using GCHeapHardLimit, to some clearly smaller value like 8GB.
Related
I Just want to understand how we should plan for the capacity of a NiFi instance.
We have a NiFi instance which is having around 500 flows. So, the total number of processors enabled on NiFi canvas is around 4000. We do run 2-5 flows simultaneously which does not take more than half an hour i.e. we do process data in MBs.
It was working fine till now but we are seeing outofMemory error very often. So we increased xms and xmx parameters from 4g to 8g which has resolved the problem for now. But going forward we will have more flows and we may face outofmemory issue again.
So, can anyone help with matrix of capacity planning or any suggestion to avoid such issues before happening? eg:- If we have 3000 processors enabled with/without any processing then Xg amount memory required.
Any input on NiFi capacity planning would be appreciated.
Thanks in Advance.
OOM errors can occur due to specific memory consuming processors. For example: SplitXML is loading your whole record to memory, so it could load a 1GiB file for instance.
Each processors can document what resource considerations should be taken. All of the Apache processors(as far as I can tell) are documented in that matter so you can rely on them.
In our example, by the way, SplitXML can be replaced with SplitRecord which doesn't load all of the record to memory.
So even if you use 1000 processors simultaneously, they might not consume as much memory as one processor that loads your whole FlowFile's content to memory.
Check which processors you are using and make sure you don't use one like that(there are more like this one that load the whole document to memory).
We got a high traffic website which generates a lot of I/O. Within 10 minutes it has been reading over 10 gb of data (w3wp in question seen in task manager). For memory and application hangs I have been using WinDbg with success. But I don't know how I can find the object(s) / method(s) within a process which are responsible for the highest I/O.
Is this even possible?
Edit
The question is: Is there a way to profile I/O operations in a .NET assembly, say: list of threads sorted by highest disk I/O (or something similar that would help me where to look)
ANTS Performance Profiler
I have used this tool to great success - dealing with finding the specific instructions which are causing ~512GB of memory on a high-volume web farm getting chewed up within 5-10 minutes. Sounds like a very similar situation as yours.
Now, to be realistic - it's not going to magically solve your problem. It still requires a lot of setup, thorough analysis and detective work. But this tool definitely took the problem from "practically unsolvable" to "solvable within days".
Update:
As I mentioned in the comments (and Ben Emmett echoed), we can use ANTS to monitor memory, file system handles - pretty much any resource consumption and drill down the call stack to see the effects of specific routines.
I came up with this tool AppDynamics Lite which displays your application calls costs and performance in a visual way. It might help you to find out which functions are making the most costy IO operations.
Quoting;
Understand the health of your CLR with key metrics like response time, throughput, exception rate, and garbage collection time as well as key system resource like CPU, memory and disk I/O.
Worth giving a shot as it is trial/free for 30 days. Hope it helps.
Ps: I'm not affiliated with AppDynamics in any way.
You can use the (free) Windows Performance Toolkit from Windows 8 which does run also on Windows Vista and later. There you can turn on system wide profiling to see what was going on in all processes at once. No instrumentation necessary. Only one reboot is required to set an arcane registry key which is done by WPRUI.exe automatically.
With XPerf you could enable IO Init stack walking so that a call stack is taken for every IO which is started. The only issue is that the stacks will be broken for 64 bit processes which means that you will see only the first method above the BCL methods of your code because there is a Windows 7 bug in the stackwalking capabilities of the OS.
A workaround is to Ngen your assemblies or move to Server 2012 or switch to x86 for profiling to see deeper call stacks.
You will see all file IO and CPU activity even without any call stacks and the file names along how long the hard disc was used. That should give you good information which part of your app is causing the disc IO. From the partial call stacks you should be able to pinpoint your issue even without full stacks.
The tool will give you much more insight than any commercially available profiler at the expense that you need to learn how to use it. Since the call stacks do not end at your code or in user mode but in the kernel you can also determine if e.g. the virus scanner is causing significant IO delays. But you need to know how your processor does work. This toolset was originally aimed at kernel devs which explains why you see so many useless columns.
In the picture below you see file IO and CPU consumption stacked. When you select your high IO file in the disc IO graph it will highlight in the CPU consumption all related call stacks which were taken at the same time while the IO was active. This way you can diretly navigate from the IO to your potentially blocked threads.
The problem is with Memory management because I keep receiving “Out of Memory exception”.
Here are the scenarios where we face the problem:
Please note:
1. The site/application is developed in ASP.Net and uploaded on a server with the following specs:
- Windows Server 2008 (R2) Standard
- Intel Xeon L5520#2.27GHz 2.27GHz
- RAM = 8GB
- System Type = 64bit
The application is event management based web application where the requirements include saving huge amount of data in Sessions etc (mentioning this in case it is relevant)
The applications/site works fine until we:
Edit a file directly on the server
Update a file from repository
Copy/Paste a file (we don’t usually edit code using this technique)
Please note, all of the above hold true ONLY when the traffic to the site is high that is,
The issue/error “Out of Memory” is not produced when the traffic/visits is low
Details of:
System Properties > Advanced > Performance Settings > Advanced tab
Total paging file size for all drives: 16362 MB
In web.config
Is there any way we can debug this problem to the core and find out a solution. Can you please provide links/help where we can further investigate this problem?
Best regards,
Farrukh
Out of Memory Exceptions are common with applications that see periodic transaction surges while keeping larger volumes of data in memory. This problem does, however, depend on your application and architecture. Below are a few pointers:
Hardware - you have Xeon 5500 (Intel Nehalem chips). These are very good at handling memory. You should be good here.
OS - Windows Server 2008 R2 - As an OS this system will handle more than enough memory for you (you are good here, see link for capabilities: Memory Limits for Windows)
Physical Memory - Did you say you have 8 GB on the server? Note you app is allowing 16 GB. There is one issue. If your app requests more memory than physically available you will see your error. But this is not your only concern ...
CLR / GC limitations - Your application has a "paging file size" of 16+ GB. This is probably your issue.
GC is the heart of your problem for you. In terms of why, it is the same reason Java and the JVM have issues whenever an application exceeds 2-4 GB. That requires a look at the actual process of GC.
You have "old generation" and "young generation" Garbage Collection processes. As you app runs the CLR tries to keep your memory space organized. These processes force all threads to pause (phase changes) when GC mark and swap processes occur. The problem here is, depending on how your code is written and the amount of memory you keep around for long periods, you can run into memory issues.
Any time you press a runtime environment to exceed the 4 GB threshold you will see exponential increases in collection times. When you hit the "stop the world" pause (the old gen GC where everything gets cleaned up) the CLR has to go through the entire heap and de-allocate memory. Based on your app, 16 GB may give you issues even with more physical memory (Windows Server 2008 R2 - Enterprise or DataCenter can support 2 TB). Even if you feed it more physical memory you may see LONG collection times when your full GC hits.
Ideally I would do the following:
Get more physical memory (you never want to come withing 600MB of your total physical memory allocated to your application to avoid out of memory errors, but your buffer does depend on your load and the application's ability to handle it ... you may want a larger safety net to be safe).
Once you have the physical memory you need run GC logs while stressing the app. This will give you an idea where you see exponential degradation in performance and what level your app can support when considering Heap size (Memory). You may want to find a way to get your 16GB page down to a smaller size. I do know with .Net 4.0 Microsoft has made some solid improvements to the GC process, including allowing a background thread to maintain GC. This should give you the ability to support larger heaps (in theory) ... but nothing beats real tests on the app. Check out this link for more info:
Garbage Collection Performance (Asp.net 4.0) - Also, as I am limited on links. Navigate to the Fundamentals page for some great explanations on new GC features of ASP.Net 4.0
(http://msdn.microsoft.com/en-us/library/ee787088.aspx#concurrent_garbage_collection)
Hope this helps!
PS - Anyone out there on lesser hardware will need to be aware of the ASP.NET use of the GC thread. If you are running something in development like a Core Duo you have to consider that 50% of your compute power will go to GC optimization. This means that Hardware (number of cores) is important to consider. If you have more than you need this process should theoretically help performance. If you are constrained on cores either get better hardware or use an older version of ASP.Net or consider turning the feature off (if possible). Second, if latency is a concern, using "hyper-threading" does have an impact on performance as well. You always get better performance on "physical" cores ... but that will not be a concern for 99.9% of the applications out there.
2 GB by default. If the application is large address space aware (linked with /LARGEADDRESSAWARE), it gets 4 GB (see http://msdn.microsoft.com/en-us/library/aa366778.aspx)
They're still limited to 2 GB since many application depends on the top bit of pointers to be zero.
I have a small WCF service which is executed on an XP box with 256 megs of RAM running in VM.
When I make a request (with a request size of approximately 5mbs) to that service I always get the following message in the event log:
aspnet_wp.exe was recycled because memory consumption exceeded the 153 MB (60 percent of available RAM).
and the call fails with error 500.
I've tried to increase memory limit to 95% but it still takes up all the available memory and fails in the same manner.
It looks like something is wrong with my app (I do not reuse byte[] buffers and maybe something else) but I cannot find root cause of such memory overuse.
Profiling showed that all CLR objects that I have in memory together do not take up that much space.
Doing a dump analysis with windbg showed same situation - nothing that big in object heap.
How can I find out what is contributing to such memory overuse?
Is there any way to make a dump right before process is recycled (during peak mem usage)?
Tess Ferrandez's blog "If broken it is, fix it you should" has lots of hints, tips and recommendations for sorting out exactly this sort of problem.
Of particular use to you would be Lab 3: Memory, where she walks you through working out what has caused all the memory on your machine to disappear.
Could be a lot of things, hard to diagnose this one. Have you watched perfmon to see if the memory usage does peak on aspnet process or on the server itself? 256MB is pretty low, but it should still be able to handle it. Do you have a SWAP file on this machine? AT what point do you take the memory dump? Have you stepped though the code, and does it work on other machines? Perhaps it is getting stuck in a loop and leaking memory until it crashes?
Environment:
Windows 2003 Server (32 bit); IIS6, ASP.NET 2.0 (3.5); 4Gb Ram; 1 Worker Process
We have a situation where we have a very large System.XmlDocument is being loaded into memory, and then it heads into a complied XSL transform.
What is happening is when a web request comes in the server is sitting in an idle state with 2500Mb of available system memory.
As the XML DOM is populated, the available memory drops approx 500Mb at which point we get a System.OutOfMemoryException event. At this point the system should theoretically still have 2000Mb of available memory available to service the request (according to Perfmon).
The related questions I have are:
1) At what level in the stack is this out of memory limitation being met? OS? IIS? ASP.NET? worker process? Is this a per individual web request limit?
2) Is this limit configurable somewhere?
3) Why can’t this web request access the full available system memory?
1) I would guess at the worker process but this should be configurable within IIS to the limit of memory that a worker process can use. Another factor is what level of bits does your software use, e.g. 32 bit has a physical limit of 4 GB since this is the total address space.
2) Probably but don't forget that memory fragmentation may play a role in getting to out of memory faster than you think, e.g. if there is a memory request for a contiguous 1000 Mb piece of memory then this may not necessarily be found in the current memory.
3) Have you examined dump data to see what is in the memory when the exception gets thrown? If not, there are ways to get a snapshot of the memory to see what it looks like as this may give you more clues about what is going on.
You are running in a process. A process can only access 2 gigs of memory. This task is sharing memory with everything else running in this process, so this bit of code does not get the full 2 gig -- even if it is available.
There is a 3 gig switch on the os as well. I believe it is a registry setting. But you will have to search MSDN to find that info.
But realistically, you need to do this another way. Possibly by switching to a SAX style xml parser.
I'm sure there are some bright heads here that can answer your specific questions, but have you asked yourself if there is another way to do what you want? I specifically mean that you probably do not want to process a very large XML document, but you probably more specifically want to return something back to the client. Could you rewrite the code to avoid this XML document altogether, or perhaps not load it all into memory at the same time, and still produce the same end-result?
1) Dunno. Check your logs.
2) IIS limits memory divvied out to websites/application pools. Check your settings.
3) Servers are all about uptime; if an single app hogs all the resources everybody else suffers. Thats why enterprise apps like IIS limit memory to prevent runaways from taking down the entire server.