Does an H2O cloud require a large amount of memory? (R)

I'm trying to set up an H2O cloud on a 4-data-node Hadoop/Spark cluster using R in a Zeppelin notebook. I found that I have to give each executor at least 20 GB of memory before my R paragraph stops complaining about running out of memory (a Java GC out-of-memory error).
Is it expected that I need 20 GB of memory per executor to run an H2O cloud? Or are there configuration settings I can change to reduce the memory requirement?

There isn't enough information in this post to give specifics, but I will say that the presence of Java GC messages is not necessarily a problem, especially at startup. It's normal to see a flurry of GC messages at the beginning of a Java program's life as the heap expands from nothing to its steady-state working size.
The sign that Java GC really is becoming a major problem is back-to-back full GC cycles with wall-clock times of seconds or more.
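If you want to experiment with smaller heaps, here is a minimal sketch of setting per-executor memory from R with sparklyr/rsparkling. The options are standard Spark properties, but h2o_context() comes from rsparkling's older API, so the exact call may differ in your version:

    library(sparklyr)
    library(rsparkling)  # R bindings for Sparkling Water (assumed installed)

    conf <- spark_config()
    conf$spark.executor.memory    <- "8g"  # per-executor JVM heap; lower this to test
    conf$spark.executor.instances <- 4     # one H2O node per executor on the 4 data nodes

    sc <- spark_connect(master = "yarn-client", config = conf)
    hc <- h2o_context(sc)  # starts the H2O cloud inside the Spark executors

If the cloud forms at 8 GB but later drowns in back-to-back full GCs, the working set genuinely needs more heap.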

Related

.NET Core app memory leak - high Overhead|Unused memory

Working on a .NET Core app that reads data from a source, transforms it, stores it in an in-memory queue, batches the transformed data and writes it to a sink. As the process runs for a longer time, we observe that the VM's free memory keeps decreasing until it is exhausted and I start getting out-of-memory exceptions. We monitored the in-memory queue in the program; there is no data piling up there. I created a memory dump of the process from Task Manager.
Most of the memory seems to be in Overhead|Unused. What does this mean, and how can I fix it?
Something similar happened to me when the GC mode was server. In that mode the GC collects memory in much bigger chunks, so it can take a while before it starts collecting. In my case the process reached around 20 GB; when I changed to workstation mode it stayed around 1 GB.
https://learn.microsoft.com/en-us/dotnet/core/run-time-config/garbage-collector#workstation-vs-server
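For reference, the switch to workstation GC can be made in runtimeconfig.json, as below, or with the DOTNET_gcServer=0 environment variable (COMPlus_gcServer=0 on older runtimes):

    {
      "runtimeOptions": {
        "configProperties": {
          "System.GC.Server": false
        }
      }
    }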

dotnet-gcdump unexpected dump size and impact

I'm running a simple CRUD app built with ASP.NET Core and EF Core 3.1 in a Docker Swarm cluster on Ubuntu. I'm only using managed code.
The container has a 10 GB memory limit specified. I can inspect a running container and verify that this limit is actually set, and I also see that DOTNET_RUNNING_IN_CONTAINER is set to true. When the app is started, memory consumption is about 700 MB and it slowly builds up. Once it reaches 7 GB (I see it in the container stats) I start getting OutOfMemoryExceptions, and it stays at this level for days. So the first question is:
Why doesn't it go up to 10 GB?
Anyway, I expect memory leaks, so I have the dotnet-gcdump tool installed in this same container, and I go ahead and collect a dump for future analysis with dotnet-gcdump collect. Once I execute this command, I see the memory consumption of the running container drop from 7 GB to 3 GB and stay at this level. The resulting .gcdump file itself, though, is only ~200 MB, with nothing suspicious in it. So the next questions are:
How does collecting a dump reduce memory consumption? I'd assume it does a GC with LOH compaction, but the docs don't mention it.
Why isn't this memory freed automatically if the tool is able to do it?
Why is the resulting dump only 200 MB in size?
As the gcdump documentation explains: "GC dumps are created by triggering a GC in the target process, turning on special events, and regenerating the graph of object roots from the event stream".
Thus, it directly answers your question 2 - it triggers a full GC, which may or may not be compacting, but it collects gen2 for sure. It also answers question 4 - it is not a "memory dump" but a special kind of diagnostic data about the object graph (dependencies and type names), without the data itself.
As regards questions 1 and 3 - it is an example of the GC being "not aggressive" enough. It is a kind of "living on the edge" problem: the process runs close to the container's limits and the GC sometimes misjudges how much room it actually has - in other words, it thinks it has enough space, but it doesn't. Please be warned that this is a super-oversimplification. In such a case full GCs may not happen, or happen too late. I would confirm that by observing the process with dotnet-trace using the gc-collect profile.
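For reference, that trace can be collected with something like the following (the PID is a placeholder):

    dotnet-trace collect --process-id <pid> --profile gc-collect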
As a solution, consider setting the limit manually with GCHeapHardLimit to some clearly smaller value, like 8 GB.
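A minimal runtimeconfig.json sketch of that hard limit (8 GB expressed in bytes; the equivalent environment variable COMPlus_GCHeapHardLimit takes a hex value instead):

    {
      "runtimeOptions": {
        "configProperties": {
          "System.GC.HeapHardLimit": 8589934592
        }
      }
    }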

ArrayFire CPU, will it run out of memory due to late GC?

I am not entirely sure how ArrayFire manages memory in RAM when using CPU mode. Based on Task Manager observations, it appears that device memory in RAM is not released right away; it looks like there is a GC stage.
Is this true?
What happens if I want to allocate a lot of RAM while the GC hasn't released the device memory (RAM) yet? Will I run out of RAM, or will that trigger the GC somehow?
I am running into memory issues when allocating host memory (not device memory) and I am still trying to figure out what's wrong. In the meantime: does a GC really exist in CPU mode, and will it cause an out-of-memory error if it is triggered too late? And how should I fix this?
Thank you
ArrayFire caches allocations and reuses them for later operations. Based on some heuristics, or when an allocation fails, ArrayFire will run the garbage collector itself. You can also run it manually by calling deviceGC, which releases unlocked (unused) memory.
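A minimal C++ sketch of that behavior (af::deviceMemInfo and af::deviceGC are part of ArrayFire's public API; the array size is illustrative):

    #include <arrayfire.h>
    #include <cstdio>

    int main() {
        // Select the CPU backend (needs the unified backend;
        // otherwise link against libafcpu directly).
        af::setBackend(AF_BACKEND_CPU);

        {
            af::array a = af::randu(4096, 4096);  // ~64 MB of f32 values
            af::eval(a);
        }   // 'a' is out of scope, but its buffer stays in ArrayFire's cache

        size_t alloc_bytes, alloc_buffers, lock_bytes, lock_buffers;
        af::deviceMemInfo(&alloc_bytes, &alloc_buffers, &lock_bytes, &lock_buffers);
        std::printf("cached: %zu bytes in %zu buffers (locked: %zu bytes)\n",
                    alloc_bytes, alloc_buffers, lock_bytes);

        af::deviceGC();  // release unlocked (unused) cached buffers
        return 0;
    }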

Can I tell if an application has a memory leak based only on its memory consumption?

I was told that in one of our environments an ASP.NET application consumes up to 64 GB of RAM. I don't know how long it takes to consume it, and I have not tried to monitor this app with any kind of tool yet, but I suspect a memory leak. My colleague said that maybe it is not a leak, and that it's possible the GC simply decides not to collect because the machine still has RAM to spare.
From what I understand, it's not possible to use that much RAM without some extensive caching built in, and I have not seen any in this application's source code. I know the GC can decide to grow generation 0 when it sees that it needs more space, but in order to consume 64 GB that memory must be held by either Gen2 or the LOH, right? This is a Business Intelligence app and it does store some data in Session between postbacks so that it does not hit the data warehouse every time, but 64 GB of RAM consumed still seems suspicious to me.
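One quick way to see where that memory sits is the .NET CLR Memory performance counters; a sketch with typeperf, assuming the app runs in a w3wp worker process (adjust the instance name):

    typeperf "\.NET CLR Memory(w3wp)\# Bytes in all Heaps" ^
             "\.NET CLR Memory(w3wp)\Gen 2 heap size" ^
             "\.NET CLR Memory(w3wp)\Large Object Heap size"

If Gen2 plus the LOH accounts for most of the bytes in all heaps, the data is surviving collections and a leak (or unbounded Session caching) is the likely story; if the process is huge but the managed heap is small, the memory lives outside the GC heap.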

aspnet_wp keeps recycling because of high memory consumption. How can I fix it?

I have a small WCF service which runs on an XP box with 256 MB of RAM in a VM.
When I make a request (roughly 5 MB in size) to that service, I always get the following message in the event log:
aspnet_wp.exe was recycled because memory consumption exceeded the 153 MB (60 percent of available RAM).
and the call fails with error 500.
I've tried increasing the memory limit to 95%, but it still eats all the available memory and fails in the same manner.
It looks like something is wrong with my app (I do not reuse byte[] buffers, and maybe something else), but I cannot find the root cause of the memory overuse.
Profiling showed that all the CLR objects I have in memory together do not take up that much space.
A dump analysis with WinDbg showed the same situation - nothing that big in the object heap.
How can I find out what is contributing to the memory overuse?
Is there any way to take a dump right before the process is recycled (during peak memory usage)?
Tess Ferrandez's blog "If broken it is, fix it you should" has lots of hints, tips and recommendations for sorting out exactly this sort of problem.
Of particular use to you would be Lab 3: Memory, where she walks you through working out what has caused all the memory on your machine to disappear.
Could be a lot of things; this one is hard to diagnose. Have you watched perfmon to see whether the memory usage peaks in the ASP.NET process or on the server itself? 256 MB is pretty low, but it should still be able to handle it. Do you have a swap file on this machine? At what point do you take the memory dump? Have you stepped through the code, and does it work on other machines? Perhaps it is getting stuck in a loop and leaking memory until it crashes?
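To answer the dump-timing question: a dump at peak usage can be grabbed with ADPlus from the Debugging Tools for Windows, run while memory is climbing and before the recycle threshold trips (the output path is an assumption):

    adplus -hang -pn aspnet_wp.exe -o c:\dumps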
