Measuring memory usage of distributed arrays - julia

I've been having trouble measuring the memory usage of distributed arrays and could not find a solution on Julia Discourse. Consider the following example.
using Distributed
addprocs(2)
@everywhere using DistributedArrays
m = [rand(100,100) for i in 1:1000]
d = distribute(m)
How much memory does the array d use? I tried the following code, but it's not quite right.
memory = sizeof(d)
for a in d
    memory += sizeof(a)
end
It does not agree with the total memory usage indicated by my system. Any help would be greatly appreciated.
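For reference, here is a sketch of an alternative I have been experimenting with (untested beyond this small example): Base.summarysize walks an object recursively, and DistributedArrays.localpart gives each worker's chunk, so the per-worker footprints can be summed explicitly.
using Distributed
# Sketch: sizeof on a Vector of Matrix counts only the pointers, whereas
# Base.summarysize recursively includes the matrices themselves, so ask each
# worker for the size of its local chunk and add the small wrapper on top.
chunks = sum(@fetchfrom(p, Base.summarysize(DistributedArrays.localpart(d))) for p in workers())
total = sizeof(d) + chunks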

Related

Rf_allocVector only allocates and does not zero out memory

The original motivation behind this is that I have a dynamically sized array of floats that I want to pass to R through Rcpp without incurring either the cost of zeroing it out or the cost of a deep copy.
Originally I had thought that there might be some way to take a heap-allocated array, make it known to R's GC system and then wrap it with other data to create an Rcpp::NumericVector, but it seems that that's not possible - or at least not doable with my current knowledge.
However, and correct me if I'm wrong, it looks like simply constructing a NumericVector with a size N and then using it as an N-sized allocation will call R's Rf_allocVector, and that by itself does not zero out the allocated array - I tested it with a small C program that gets dyn.load-ed into R, and I see garbage values. I also took a peek at the assembly and there doesn't seem to be any zeroing.
Can anyone confirm this or offer any alternate solution?
Welcome to StackOverflow.
You tagged this rcpp, but Rf_allocVector is a function from the C API of R -- whereas the Rcpp API offers you its constructors, which do in fact set the memory to zero:
> Rcpp::cppFunction("NumericVector goodVec(int n) { return NumericVector(n); }")
> sum(goodVec(1e7))
[1] 0
>
This creates a dynamically allocated vector using R's memory functions. The vector is indistinguishable from R's own, and it has its memory set to zero because we use R_Calloc, which 'Writing R Extensions' documents as setting the memory to zero. (We may also use memcpy() explicitly in places; you can check the sources.)
So in short, you have just confused yourself over what the C API of R and the Rcpp API each offer, and what is easiest to use when. Keep reading documentation, running and writing examples, and studying existing code. It's all out there!
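As an aside, if the goal really is to skip the zero-fill, my understanding (worth double-checking against the Rcpp documentation) is that Rcpp also offers a no_init helper for exactly that case; a minimal sketch:
Rcpp::cppFunction("NumericVector fastVec(int n) {
  NumericVector out = no_init(n);          // skip the zero-fill; contents start as garbage
  for (int i = 0; i < n; ++i) out[i] = i;  // so the caller must write every element
  return out;
}")
fastVec(5)
The point remains the same as above: let R's own allocator manage the memory, and only opt out of the initialization, not of the ownership.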

Max Array Length in Julia

I can create an array of a million elements like this:
Array(1:1_000_000)
Vector{Int64} with 1000000 elements
but if I try to create an array of a billion elements I get this:
Array(1:1_000_000_000)
Julia has exited.
Press Enter to start a new session.
Is Julia not able to handle a billion elements in an array or what am I doing wrong here?
You are creating an Array of Int64, each of which needs to be stored in memory:
julia> sizeof(3)
8
So at some point you're bound to run out of memory - this is not due to some inherent limit on the number of elements in an array, but rather the size of the overall array, which in turn depends on the size of each element. Consider:
julia> sizeof(Int8(3))
1
julia> [Int8(1) for _ in 1:1_000_000_000]
1000000000-element Array{Int8,1}:
1
1
1
⋮
1
1
1
So filling the array with a smaller data type (8-bit rather than 64-bit integers) allows me to create an array with more elements before running out of memory.
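To put a number on the original example: a billion Int64 values need roughly 8 GB just for the data, which is more than many machines have free.
julia> 1_000_000_000 * sizeof(Int64) / 2^30   # GiB needed just for the data
7.450580596923828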
While there is no limit on how big an Array can be in Julia, there is obviously the limit of available RAM (mentioned in the other answer). Basically, you can assume that all of your available system memory can be allocated to a Julia process. sizeof is a good way to calculate how much RAM you will need.
However, if you actually do big array computing in Julia the above limit can be circumvented in many ways:
Use massive-memory machines from a major cloud computing provider. I use Julia on AWS Linux and it works like a charm - you can get a virtual machine with up to 4 TB of RAM and a bare-metal machine with 24 TB of RAM. While this is not a Julia-specific solution, sometimes it is the easiest and cheapest way to go.
Sometimes your data is sparse - you do not actually use all of those memory cells. In such cases consider SparseArrays. In other cases your sparse data has a specific structure (e.g. non-zero values only on the diagonal); in that case use BandedMatrices.jl. It is worth noting that there is even a Julia package for algebra with infinite-size arrays. Basically, whatever you find in the JuliaMatrices project is worth looking at.
You can use memory mapping - that means that most of your array is on disk and only some part of it is held in RAM. In this way you are limited by your disk space rather than by RAM (see the sketch at the end of this answer).
You can use DistributedArrays.jl and have a single huge Array hosted on several machines.
Hope this is useful for you or other people trying to do big-data algebra in Julia.
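As an illustration of the memory-mapping route, here is a minimal sketch using Julia's Mmap standard library (the file name big.bin is just an example):
using Mmap

# Back a billion-element Float64 vector by a file (~8 GB on disk, not in RAM);
# pages are brought into memory lazily as they are touched.
io = open("big.bin", "w+")
A = Mmap.mmap(io, Vector{Float64}, (1_000_000_000,))
A[1] = 1.0
A[end] = 2.0
Mmap.sync!(A)   # flush dirty pages back to the file
close(io)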

In R, how can I know the maximum amount of memory that a function uses during execution?

Or is there a way to calculate the maximum amount of memory from a memory profiling method like profvis?
R has a built-in profiler, Rprof, capable of tracking this:
Rprof(filename = "Rprof.out", memory.profiling = TRUE)
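A slightly fuller sketch of the workflow (my_function is just a placeholder for the code you want to measure); summaryRprof with memory = "both" reports memory use alongside the timings:
Rprof("Rprof.out", memory.profiling = TRUE)
result <- my_function()   # placeholder for the code being profiled
Rprof(NULL)               # stop profiling
summaryRprof("Rprof.out", memory = "both")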

Creating distance matrix in R for a matrix in a higher dimensional space

I have created a Euclidean distance matrix using the dist() function in R.
Below is my R script. As the dimensions of the matrix would be 16809 x 16809, running this script in R gave me the error message:
Error: cannot allocate vector of size 1.1 Gb
So is there any way to get rid of this error?
I haven't used parallelization in R previously. Can it be done using parallelization?
rnd.points = matrix(runif(3 * 16809), ncol = 3)
rnd.points <- rnd.points[1:5,]
ds <- dist(rnd.points)
as.matrix(ds) -> nt
nt
As @Gopola said: dist(.) computes all pairwise distances, and hence needs O(n^2) memory. Indeed, dist() is efficient and stores only half of the symmetric n x n matrix: n(n-1)/2 values, which for n = 16809 is about 141 million doubles - roughly the 1.1 Gb in your error message.
If I compute dist() on a computer with enough RAM, it works nicely, and indeed creates an object ds of size 1.1 Gb ... which is not so large for today's computers.
rnd.points <- matrix(runif(3 * 16809), ncol = 3)
ds <- dist(rnd.points)
object.size(ds)
Note however that your
as.matrix(ds) -> nt
is not such a good idea, as the resulting matrix nt is (slightly more than) twice the size of ds: nt is of course a full n x n matrix, storing both halves plus the zero diagonal.
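If you only need individual entries, you can avoid as.matrix() entirely and index the dist object directly; a small sketch (dist_entry is just a helper name introduced here):
# For i < j, the (i, j) distance sits at position n*(i-1) - i*(i-1)/2 + (j - i)
# in the packed lower-triangle storage used by 'dist' objects.
dist_entry <- function(ds, i, j) {
  n <- attr(ds, "Size")
  ds[n * (i - 1) - i * (i - 1) / 2 + (j - i)]
}
dist_entry(ds, 10, 20)   # distance between points 10 and 20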
The O/S has a principal limit on RAM addressing (smaller for a 32-bit system, larger for a 64-bit system).
The O/S also has a design-based limit on the maximum RAM a single process may allocate (and it kills the process afterwards).
I had the same in-RAM constraints in Python and went beyond them. Sure, at some cost, but it was a worthwhile piece of experience.
Python's numpy has a wonderful feature built in seamlessly for this very scenario: memmap(). The word seamlessly is emphasised intentionally, because this is of core importance for your problem re-formulation / re-design costs. There are tools available, but it will be your time spent mastering them and re-designing your algorithm (libraries et al.) so that they can use the new tools - guess what - seamlessly. This is the hidden part of the iceberg.
Handy R tools available:
filebacked.big.matrix (from the bigmemory package), which also supports HPC cluster-wide sharing for distributed processing (thus addressing both the PSPACE and PTIME dimensions of the HPC processing challenge, unless you unfortunately hit the filesystem's file-size ceiling),
ff, which allows
library(ff)
pt_coords <- ff( vmode = "double", dim = c(16809, 3), initdata = 0 )
pt_dists  <- ff( vmode = "double", dim = c(16809, 16809), initdata = -1 )
and lets you work with these in a simple matrix-like [row, column] mode to fill in the points and process their pairwise distances (a blockwise sketch follows this list),
?ffsave for further details on saving your resulting distance data,
and, last but not least,
mmap + indexing
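To make the ff route above concrete, here is a rough blockwise sketch (the cross_dist helper and the block size of 1000 are choices made here for illustration, assuming the ff package is installed); it fills the file-backed distance matrix one band of rows at a time, so nothing close to 1.1 Gb has to sit in RAM at once:
library(ff)

n <- 16809
pts <- matrix(runif(3 * n), ncol = 3)

# File-backed n x n matrix: it lives on disk, only the touched parts enter RAM.
pt_dists <- ff(vmode = "double", dim = c(n, n), initdata = -1)

# Euclidean cross-distances between the rows of A (k x 3) and the rows of B (n x 3).
cross_dist <- function(A, B) {
  sq <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * tcrossprod(A, B)
  sqrt(pmax(sq, 0))   # clamp tiny negative values caused by rounding
}

block <- 1000
for (start in seq(1, n, by = block)) {
  idx <- start:min(start + block - 1, n)
  pt_dists[idx, ] <- cross_dist(pts[idx, , drop = FALSE], pts)
}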
Parallel? No. Distributed? Yes, it might help with PTIME:
As noted for filebacked.big.matrix, there are chances to segment the computational PSPACE into smaller segments for distributed processing and a reduction of PTIME, but the concept is in principle just a concurrent (re)use of available resources, not [ PARALLEL ] system behaviour (and it has to be admitted that a lot of marketing text - sadly even technology marketing has joined this unfair and knowingly incorrect practice - misuses the word parallel / parallelism in places where merely concurrent system behaviour is observed; there are not many real, true-PARALLEL systems).
Conclusion:
Big matrices are doable in R well beyond the in-RAM limits; select the tools most suitable for your problem domain and harness all the HPC resources you can.
Error: cannot allocate vector of size 1.1 Gb is solved.
It is nothing but resources that impose limits and delays on our computing-ready tasks, so do not hesitate to make your move while computing resources are still available for your project; otherwise you will find yourself with all the re-engineered software ready, but waiting in a queue for the computing resources.

tracking memory usage and garbage collection in R

I am running functions which are deeply nested and consume quite a bit of memory as reported by the Windows task manager. The output variables are relatively small (1-2 orders of magnitude smaller than the amount of memory consumed), so I am assuming that the difference can be attributed to intermediate variables assigned somewhere in the function (or within sub-functions being called) and a delay in garbage collection. So, my questions are:
1) Is my assumption correct? Why or why not?
2) Is there any sense in simply nesting calls to functions more deeply rather than assigning intermediate variables? Will this reduce memory usage?
3) Suppose a scenario in which R is using 3GB of memory on a system with 4GB of RAM. After running gc(), it's now using only 2GB. In such a situation, is R smart enough to run garbage collection on its own if I had, say, called another function which used up 1.5GB of memory?
There are certain datasets I am working with which are able to crash the system as it runs out of memory when they are processed, and I'm trying to alleviate this. Thanks in advance for any answers!
Josh
1) Memory used to represent objects in R and memory marked by the OS as in-use are separated by several layers (R's own memory handling, when and how the OS reclaims memory from applications, etc.). I'd say (a) I don't know for sure but (b) at times the task manager's notion of memory use might not accurately reflect the memory actually in use by R, but that (c) yes, probably the discrepancy you describe reflects memory allocated by R to objects in your current session.
2) In a function like
f = function() { a = 1; g = function() a; g() }
invoking f() prints 1, implying that memory used by a is still being marked as in use when g is invoked. So nesting functions doesn't help with memory management, probably the reverse.
Your best bet is to clean up or re-use variables representing large allocations before making more large allocations. Appropriately designed functions can help with this, e.g.,
f = function() { m = matrix(0, 10000, 10000); 1 }
g = function() { m = matrix(0, 10000, 10000); 1 }
h = function() { f(); g() }
The large matrix allocated in f is no longer needed by the time f returns, and so that memory is available for garbage collection if the large allocation in g necessitates it.
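The same idea works within a single function: drop the reference to a large object with rm() before the next big allocation, so the collector can reclaim it if needed. A small sketch:
f = function() {
  m = matrix(0, 10000, 10000)    # ~800 MB of doubles
  s = sum(m)
  rm(m)                          # drop the reference before the next big allocation
  m2 = matrix(1, 10000, 10000)   # the first matrix can now be collected if memory is tight
  s + sum(m2)
}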
3) If R tries to allocate memory for a variable and can't, it'll run its garbage collector and try again. So you don't gain anything by running gc() yourself.
I'd make sure that you've written memory-efficient code, and if there are still issues I'd move to a 64-bit platform where memory is less of an issue.
R has facilities for memory profiling, but it needs to be built with them enabled (the --enable-memory-profiling configure option). While we enable that for Debian / Ubuntu, I do not know what the default for Windows is.
Usage of memory profiling is discussed (briefly) in the 'Writing R Extensions' manual.
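For reference, the allocation profiler looks roughly like this (it records nothing unless R was built with memory profiling enabled, as noted above; the file name allocations.out is just an example):
Rprofmem("allocations.out", threshold = 10000)   # log allocations of 10 kB or more
x <- numeric(1e6)
Rprofmem(NULL)                                   # stop logging
readLines("allocations.out", n = 5)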
Coping with (limited) memory on a 32-bit system (and particularly Windows) has its challenges. Most people will recommend that you switch to a system with as much RAM as possible running a 64-bit OS.
