Efficient memory management in R

I have 6 GB memory in my machine (Windows 7 Pro 64 bit) and in R, I get
> memory.limit()
6141
Of course, when dealing with big data, memory allocation errors occur. So, to make R use virtual memory, I use
> memory.limit(50000)
Now, when running my script, I don't get memory allocation errors any more, but R hogs all the memory in my computer and I can't use the machine until the script is finished. I wonder if there is a better way to make R manage the machine's memory. For example, it could switch to virtual memory once it uses more physical memory than the user specifies. Is there an option like that?

Look at the ff and bigmemory packages. These use functions that know about R objects to keep them on disk, rather than leaving the job to the OS (which only knows about chunks of memory, not what they represent).
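For example, a minimal sketch with ff (assuming the package is installed; the vector length is arbitrary):
library(ff)
# An ff vector lives in a file on disk; only the chunks you index are pulled into RAM.
x <- ff(vmode = "double", length = 1e8)  # roughly 800 MB on disk, only a little RAM
x[1:5] <- rnorm(5)
x[1:5]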

R doesn't manage the memory of the machine. That is the responsibility of the operating system. The only reason memory.size and memory.limit exist on Windows is that (from help("Memory-limits")):
Under Windows, R imposes limits on the total memory allocation
available to a single session as the OS provides no way to do so:
see 'memory.size' and 'memory.limit'.
R objects also have to occupy contiguous space in RAM, so you can run into memory allocation issues with only a few large objects. You could probably be more careful with the number/size of objects you create and avoid using so much memory.
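For instance (an illustrative sketch, not your actual code), release large intermediates as soon as you have what you need from them:
x <- matrix(rnorm(1e7), ncol = 100)  # hypothetical large intermediate object
y <- colMeans(x)                     # keep only the summary you need
rm(x)                                # drop the big object...
gc()                                 # ...and let R reclaim the space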

This is not a solution but a suggestion: use memory-efficient objects wherever possible. For instance, use a matrix instead of a data.frame where you can.
Here is an example:
m = matrix(rnorm(4), 2, 2)
d = as.data.frame(m)
object.size(m)
232 bytes
object.size(d)
808 bytes

Related

Bnlearn out of memory

I'm running into memory issues using the bnlearn package's structure learning algorithms. Specifically, I notice that the score-based methods (e.g. hc and tabu) use LOTS of memory, especially when given a non-empty starting network.
Memory usage wouldn't be an issue except that it continually brings down both my laptop (16GB RAM) and a VM I'm using (128 GB RAM), yet the data set in question is a discrete BN with 41 nodes and ~250 rows (69KB in memory). The issue occurs both when running sequentially with 16GB of RAM and in parallel on a VM (32GB/core).
One last bit of detail: Occasionally I can get 100-200 nets with a random start to run successfully, but then one net will randomly get too big and bring the system down.
My question: I'm new to BNs, so is this just inherent to the method, or is it a memory management issue with the package?
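For reference, roughly the kind of call being described (an illustrative sketch only; the built-in data set and variable names are assumptions, not the poster's code):
library(bnlearn)
data(learning.test)                          # small discrete data set shipped with bnlearn
start <- random.graph(names(learning.test))  # a random (non-empty) starting network
fit   <- hc(learning.test, start = start)    # score-based structure learning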

When running RStan jobs in R on a cluster, is it possible using too many cores will result in insufficient memory?

I am currently trying to run a parallelized RStan job on a computing cluster in R. I am doing this by specifying the following two options:
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)
The above allocates all 48 available cores, and the total RAM I have is 180 GB. I always thought that, in theory, more cores and more RAM was better. I am running very long jobs and I am getting insufficient-memory errors on my cluster. I am wondering if I am perhaps not giving enough memory to each core. Is it possible that the 48 cores are splitting the 180 GB and each core is then maxed out?
If I were to keep the 180 GB of RAM but use only 3 cores, would this get around the memory errors? Or will the total memory always be used up at some point on a long job, no matter how many cores I have? Thanks!
RStan will only use as many cores as there are chains (four by default). And if you are using a shell on a Linux cluster, it will fork rather than make copies of the data. So, if you have idle cores, you are better off using them to parallelize computations within the Stan program via the map_rect function, if possible.
But all of that is probably unrelated to your memory problems. It should not require even 1 GB in most cases.
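In code, that means something like the following (a sketch; the model file and data object are placeholders, not taken from the question):
library(rstan)
chains <- 4                                               # RStan runs one core per chain
options(mc.cores = min(chains, parallel::detectCores()))  # don't grab every core on the node
rstan_options(auto_write = TRUE)
# fit <- stan(file = "model.stan", data = stan_data, chains = chains)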

Force Vagrant to use Swap memory

I have one of those first aluminum iMacs with 2+2 GB of RAM. I use Vagrant to emulate advanced development environments, separate ones for different jobs.
When I have just one Vagrant process running in the background, the computer becomes slow as hell, because it is always out of memory.
The question is: can I make Vagrant (or any app) run only in swap memory, so it leaves all the physical memory for the OS and other apps?
If there is any solution, how can I do that?
The short answer is: no, a process cannot run entirely in swap.
Processes must have their data in RAM for the CPU to be able to operate on it; infrequently used data is moved out to swap space when there is no longer room in memory for everything that's loaded.
You could create a larger swap space and use ulimit to limit the amount of memory used by processes (i.e. force them into swap earlier), but this doesn't really address the root of your problem - that you're pretty much at the limit of your 4GB of memory.
Keep in mind that using swap space will always produce performance problems as (even with SSDs) reading from disk is far slower than reading from memory.
Short of upgrading to more memory, you could:
Reduce the amount of memory allocated by your vagrant box;
Use OS X's Activity Monitor to identify and close any programs/processes that are not in use but are still using memory.
but, again, these are just stop-gap solutions.
Simple answer is no.
Controlling swappiness has to be done within the VM; on Linux, for example, echo 100 > /proc/sys/vm/swappiness sets the most aggressive swap strategy. Remember, you have no control over where a process's pages end up (physical memory vs. swap).
However, by doing this, your host/guest will still be slow as hell, because you simply don't have enough physical memory.
The ultimate solution is to add more RAM to your iMac ;-D

How to modify code according to a function?

I will try to explain my problem. There are 365 (global map) files in two directories, dir1 and dir2, which have the same format, byte type, extent, etc. I computed the bias between the two datasets using the function and code given below:
How can I solve this problem?
I suspect this is due to memory limitations on a 32-bit system. You want to allocate an array of 933 million doubles, which requires 7.6 GB of contiguous memory. I suggest you read ?Memory and ?"Memory-limits" for more details. In particular, the latter says:
Error messages beginning ‘cannot allocate vector of size’ indicate
a failure to obtain memory, either because the size exceeded the
address-space limit for a process or, more likely, because the
system was unable to provide the memory. Note that on a 32-bit
build there may well be enough free memory available, but not a
large enough contiguous block of address space into which to map
it.
If this is indeed your problem, you may look into the bigmemory package (http://cran.r-project.org/web/packages/bigmemory/index.html), which lets you manage massive matrices with shared and file-backed memory. There are also other strategies (e.g. using an SQLite database) for managing data that doesn't fit in memory all at once.
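A rough sketch of the file-backed approach (the dimensions and file names are invented for illustration, not taken from your data):
library(bigmemory)
# A file-backed big.matrix keeps the data on disk and maps slices into RAM as needed.
x <- filebacked.big.matrix(nrow = 360 * 720, ncol = 365, type = "double",
                           backingfile = "maps.bin", descriptorfile = "maps.desc")
x[, 1] <- rnorm(360 * 720)   # fill one day's map at a time
mean(x[, 1])                 # operate on one slice without loading the whole array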
Update: here is an excerpt from Memory-limits for Windows:
The address-space limit is 2Gb under 32-bit Windows unless the OS's default has been changed to allow more (up to 3Gb). See http://www.microsoft.com/whdc/system/platform/server/PAE/PAEmem.mspx and http://msdn.microsoft.com/en-us/library/bb613473(VS.85).aspx. Under most 64-bit versions of Windows the limit for a 32-bit build of R is 4Gb: for the oldest ones it is 2Gb. The limit for a 64-bit build of R (imposed by the OS) is 8Tb.
It is not normally possible to allocate as much as 2Gb to a single vector in a 32-bit build of R even on 64-bit Windows because of preallocations by Windows in the middle of the address space.
Under Windows, R imposes limits on the total memory allocation available to a single session as the OS provides no way to do so: see memory.size and memory.limit.

OpenCL shared memory optimisation

I am solving a 2d Laplace equation using OpenCL.
The global memory access version runs faster than the one using shared memory.
The algorithm used for shared memory is the same as that in the OpenCL Game of Life code:
https://www.olcf.ornl.gov/tutorials/opencl-game-of-life/
If anyone has faced the same problem please help. If anyone wants to see the kernel I can post it.
If your global-memory version really runs faster than your local-memory version (assuming both are equally well optimized for the memory space they use), maybe this paper could answer your question.
Here's a summary of what it says:
Using local memory in a kernel adds another constraint on the number of concurrent workgroups that can run on the same compute unit.
Thus, in certain cases, it may be more efficient to remove this constraint and live with the higher latency of global memory accesses. More wavefronts (warps in NVIDIA parlance; each workgroup is divided into wavefronts/warps) running on the same compute unit allow your GPU to hide latency better: while one is waiting for a memory access to complete, another can compute.
In the end, each kernel will take more wall-time to proceed, but your GPU will be completely busy because it is running more of them concurrently.
No, it doesn't. It only says that ALL OTHER THINGS BEING EQUAL, an access from local memory is faster than an access from global memory. It seems to me that the global accesses in your kernel are being coalesced, which yields better performance.
Using shared memory (memory shared with the CPU) isn't always going to be faster. With a modern graphics card, it would only be faster when the GPU and CPU are both performing operations on the same data and need to share information with each other, since memory wouldn't have to be copied from the card to the system and vice versa.
However, if your program runs entirely on the GPU, it could very well execute faster using the card's local memory (GDDR5) exclusively, since the GPU's memory is not only likely to be much faster than your system's, but there is also no latency from reading memory over the PCI-E lane.
Think of the graphics card's memory as a kind of "L3 cache" and your system's memory as a resource shared by the entire system; you only use it when multiple devices need to share information (or if your cache is full). I'm not a CUDA or OpenCL programmer; I've never even written Hello World in them. I've only read a few white papers, so it's just common sense (or maybe my Computer Science degree is useful after all).