I create API that just return string("OK").
And I test it with following configuration.
I monitor it with visualVM, and notice that heap usage increase continuously.
Eventually jvm perform gc.
Question.
why heap usage increase continuously?
Hello,I guess there is other classes in SpringMVC is created。such as threadlocal
Better than guessing you can actually use jvisualvm to help you find what objects are there in heap by memory sampling. You can see the number of instances of objects and total size in the profiling section as shown in below screenshot. You can take snapshots while your load test runs and you can then later analyze them.
You can even take a heap dump and analyze that with tools like Eclipse Memory Analyzer
Related
I'm trying to get something to work but I run out of ideas so I figured I would ask here.
I have a kernel that has a large global size (usually 5 Million)
Each of the threads can require up to 1Mb of global memory (exact size not known in advance)
So i figured... ok, on my typical target GPU I have 6Gb and I can run 2880 threads in parrallel, more than enough right ?
My idea is to create a big buffer (well actually 2 because of the max buffer size limitation...)
Each thread pointing to a specific global memory area (with the coalescence and stuff, but you get the idea...)
My problem is, How do I know which thread is currenctly being run (in the kernel code) to point to the right memory area ?
I did find the cl_arm_get_core_id extension but this only gives me the workgroup, not the acutal thread being used, plus this does not seem to be available on all GPUs, since it's an extension.
I have the option to have work_group_size = nb_compute_units / nb_cores and have the offset to be arm_get_core_id() * work_group_size + global_id() % work_group_size
But maybe this group size is not optimal, and the portability issue still exists.
I can also enqueue a lot of kernels calls with global size 2880, and there I obviously know where to point to with the global Id.
But won't this lead to a lot of overhead because of the 5Million / 2880 kernel calls ? Plus any work group that finishes before the others will be idle until all workgroups for this call have finished their job.
Any ideas to do this properly are very welcome !
Well, you are storing 1MB per WI for temporal computations (because you are not saving them, otherwise your wouldn't have memory).
Then, why not simply let it spill to global memory? Does the compiler complain? If it does complain, then you need other approaches:
One possibility is to create a queue (just a boolean array), of the memory zones empty for usage by the WorkGroups. And every time a new workgroup is launched it takes an empty slot and sets the boolean to "used" state. You can do this with atomic_cmpxchg() atomic operation.
It may introduce a small overhead to launch each WG, but it would be probably negligible if each WI is needing 1MB of global memory.
Here you have a small example of how to do atomic_cmpxchg() LINK
I have a pretty small dataset, but large enough that it won't fit in the workspace or private memories in any GPU currently on the market. What this means is that each kernel must access the data in the global memory on the GPU. If I replicate this data to multiple copies in the global memory, can it increase performance/reduce latency, or is the memory controller restrictive and will only allow one core to access the global memory at a time? If this is device specific, are there any models which have this feature?
This is very much bound by the memory controller of the video card, and multiple copies of the same data won't help you. I am unaware of a gpu having more than one memory controller for global access.
Your access pattern of the memory will greatly effect the overall throughput of your kernel. Do you have a specific example/kernel that you need optimized?
I am new to NUMA-aware multithreaded programming. I am writing my code such that all the threads and their memory allocation are restricted to one node. At the beginning of the program, I make the following calls:
struct bitmask *bm = numa_parse_nodestring("0");
if (bm == 0) {
exit(1);
}
numa_bind(bm);
My understanding is that a call to numa_bind in this way would bind all threads and all memory allocation to node 0.
Furthermore, when I start pthreads from this code, I bind them to specific CPUs using:
pthread_setaffinity_n
However, when I look at /proc//numa_maps, I can still see that certain libraries (e.g., libc) are bound to the memory on node 1. How can I make sure that all the memory required by the process is bound to node 0?
Shared libraries like libc can't be bound to a memory bank specified by your process/application. Please see shared-library-numa
Code would tend to get cached in the local processor's L3 cache. Since it's read-only it's unlikely to generate any traffic once it's been loaded into cache. I wouldn't bother with it too much, unless you have profiling information showing it does pose a problem.
Running into a prickly problem with our web app here. (Asp.net 2.0 Win server 2008)
Our memory usage for the website, grows and grows even though I would expect it to remain at a fairly static level. (We have a small amount of data that gets stored in state).
Wanting to find out what the problem is, I've run a System.GC.Collect(); a few times, taken a memory dump and then loaded this memory dump into WinDbg.
When I do a DumpHeap -Stat I get an inordinately large number on particular type hanging around in memory.
0000064280580b40 713471 79908752 PaymentOption
so, doing a DumpHeap -MT for this type, I get a stack of object references. Picking a random number of these, I do a !gcroot and the command comes back reporting that no references are held to it.
To me, this is exactly when the GC should collect these items, but for some reason they have been left outstanding.
Can anybody offer an explanation as to what might be happening?
You could try using sosex.dll in Windbg, which is an extension written to help with .NET debugging. There is a command named !refs which is similar to !gcroot, in that it will show you all the objects referencing an object, plus it will show all the objects that it too is referencing.
In the example on the author's website, !refs is used against an object and the output looks like this:
0:000> !refs 0000000080000db8
Objects referenced by 0000000080000db8 (System.Threading.Mutex):
0000000080000ef0 32 Microsoft.Win32.SafeHandles.SafeWaitHandle
Objects referencing 0000000080000db8 (System.Threading.Mutex):
0000000080000e08 72 System.Threading.Mutex+<>c__DisplayClass3
0000000080000e50 64 System.Runtime.CompilerServices.RuntimeHelpers+CleanupCode
Few things:
GC.Collect won't help you do any debugging. The garbage collector is already being called: if any objects were available for collection it would have happened already.
Idle memory on a server is wasted memory. Are you sure memory is being 'leaked', or is it just that the framework is deciding it can keep more things in memroy or keep more memory around for faster access? In this case I suspect you are leaking memory, but it's something to double check for.
It sounds like something you don't expect is keeping a reference to PaymentOption objects. Perhaps a static collection somewhere? Or separate thread?
Does PaymentObject implement a finalizer by any chance? Does it call a STA COM object?
I'd be curious to see the output of !finalizequeue to see if the count of objects that are showing up on the heap are roughly the amount of any that might waiting to be finalized. Output should probably look something like this:
generation 0 has 57 finalizable objects (0409b5cc->0409b6b0)
generation 1 has 55 finalizable objects (0409b4f0->0409b5cc)
generation 2 has 0 finalizable objects (0409b4f0->0409b4f0)
Ready for finalization 0 objects (0409b6b0->0409b6b0)
If the number of Ready for finalization objects continues to grow, and your certain garbage collections are occuring (confirm via perfmon counters), then it might be a blocked finalizer thread. You might need to take several snapshots over the lifetime of the process (before a recycle) to confirm. I usually rely on the magic number of three, as long as the site is under some sort of load.
A bug in a finalizer can block the finalizer thread and prevent the objects from ever being collected.
If the PaymentOption object calls a legacy STA COM object, then this article ASP.NET Hang and OutOfMemory exceptions caused by STA components might point in the right direction.
Not without more info on your application. But we ran into some nasty memory problems a long time ago. Do you use ASP.NET caching? As Raymond Chen likes to say, "poor caching strategy is indisitinguishable from a memory leak."
Check out another tool - CLRProfiler.exe - it will help you traverse object reference trees to see where your objects are rooted. This is also good: link text
You've heard this before - if you have to GC.Collect, something is wrong.
Is the PaymentOption object created in an asynchronous process, by any chance? I remember something about, if you don't call EndInvoke, you can get problems like this.
I've been investigating the same issue myself and was asking why objects that had no references were not being collected.
Objects larger than 85,000 bytes are stored on the Large Object Heap, from which memory is freed up less frequently.
http://msdn.microsoft.com/en-us/magazine/cc534993.aspx
A single PaymentOption may not be that big, but are they contained within collections, or are they based on something like a DataSet? You should pick on few instances of the PaymentOption / collection / DataSet and then use the sos !objsize command to see big they are.
Unfortunately this doesn't really answer the question. I like to think I can trust the .net framework to take care of releasing unused memory whenever it needs to. However I see a lot of memory being used by the worker process running the app I am looking at, even when memory looks quite tight on the server.
FYI, SOS in .NET 4 supports a few new commands that might be of assistance, namely !gcwhere (locate the generation of an objection; sosex's gcgen) and !findroots (does what it says on the tin; sosex's !refs)
Both are documented on the SOS documentation and mentioned on Tess Ferrandez's blog.
Can I control the memory limit (i.e. when GC has to run) in my Flex application?
Check out the flash.system.System class. The "totalMemory" property will show you (in bytes) how much memory the current application is using. Calling System.gc() will run a GC. You could use a Timer to periodically check totalMemory and then preform gc if it exceeds a threshold. More info:
http://livedocs.adobe.com/flex/3/langref/flash/system/System.html
I am not 100% sure, but I believe the answer is no. Read this article.
I don't think so. That is probably a parameter of the flash player based at the client, and I assume it is also dependant on the exact resources the client machine has, i.e. more RAM means less frequent gc, etc.