I am trying to initialize a two-dimensional array and then fill it in gradually. However, whenever I try to initialize it, I get an Out of Memory error.
D = zeros(1000000, 1000000);
Is there any way to resolve this error, or a workaround for it?
The problem is that an array of this size would take almost 8 TB of RAM. If you want an array this big where almost all of the elements are 0, you can use spzeros(1000000, 1000000) (defined in SparseArrays).
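For example, a minimal sketch of creating the sparse matrix and then filling it in gradually (only the nonzero entries are stored):

```julia
using SparseArrays

D = spzeros(1_000_000, 1_000_000)  # stores only the nonzero entries
D[3, 7] = 1.5                      # fill it in gradually
```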
The original motivation behind this is that I have a dynamically sized array of floats that I want to pass to R through Rcpp without incurring either the cost of zeroing it out or the cost of a deep copy.
Originally I had thought that there might be some way to take a heap-allocated array, make it known to R's GC system, and then wrap it with other data to create an "Rcpp::NumericVector", but it seems that that's not possible - or at least not doable with my current knowledge.
However (correct me if I'm wrong), it looks like simply constructing a NumericVector of size N and then using it as an N-sized allocation calls R.h's Rf_allocVector, which does not itself zero out the allocated array. I tested this with a small C program that gets dyn.load-ed into R, and I see garbage values; I also took a peek at the assembly, and there doesn't seem to be any zeroing out.
Can anyone confirm this or offer any alternate solution?
Welcome to StackOverflow.
You tagged this rcpp, but Rf_allocVector is a function from the C API of R -- whereas the Rcpp API offers you its constructors, which do in fact set the memory to zero:
> Rcpp::cppFunction("NumericVector goodVec(int n) { return NumericVector(n); }")
> sum(goodVec(1e7))
[1] 0
>
This creates a dynamically allocated vector using R's memory functions. The vector is indistinguishable from R's own, and it has the memory set to zero, as we use R_Calloc, which is documented in Writing R Extensions as setting the memory to zero. (We may also use memset() explicitly; you can check the sources.)
So in short, you have simply confused what the C API of R and the Rcpp API each offer, and which is easiest to use when. Keep reading the documentation, running and writing examples, and studying existing code. It's all out there!
I want to use the plotting functionality of Plots.jl with an image loaded using the load_image() function of ArrayFire.
What I have is :
AFArray: 1000×300×3 Array{Float32,3}
What I want is :
300×1000 Array{RGB{Any},2} with eltype RGB
I couldn't find a direct conversion in the documentation. Is there an efficient way to do this?
I don't know specifically about ArrayFire arrays, but in general you can use reinterpret for operations like this. If you want the new array to reside on the CPU, then copy it over first.
Then, ideally, you could just do
rgb = reinterpret(RGB{Float32}, A)
Unfortunately, MxNx3 is not the optimal layout for RGB arrays, since you want the three color values to be located sequentially in memory. So you should either make sure that the array has 3xMxN layout, or use permutedims(A, (3, 1, 2)).
Finally, to get a matrix, you must drop the leading singleton dimension; otherwise you get a 1xMxN array.
So,
rgb = dropdims(reinterpret(RGB{Float32}, permutedims(A, (3, 1, 2))); dims=1)
I assumed that you actually want RGB{Float32} instead of RGB{Any}.
BTW, I'm not sure how this will work if you want to keep the final array on the GPU.
Edit: You might consider reshape instead of dropdims; it seems slightly faster on my PC.
Suppose I have an array z given by:
z = array(runif(100*50*200),c(100,50,200))
Is there a faster way to do :
dim(z) = c(100,50,1,200)
z = z[,,rep(1,300),]
Note that this is an example; the new dimension along which I want to repeat the array is not always the 3rd, and the number of dimensions of the starting array is not always 3.
profvis::profvis() only shows that the garbage collector takes up some of the time; it does not show the other internals.
It might be an allocation issue, although I'm not sure why it takes that much time. I have several of these very basic calls in my code, and 95% of my runtime is spent there. So even if it's unavoidable, can you explain to me why it is so slow?
After compacting an array (putting the required elements from an input array into an output array) via a scan operation, there may be some empty spaces left in the output (compacted) array, in a contiguous block after the required elements. Is there a way to free these empty spaces in the OpenCL kernel code itself, without going back to the host just for the sake of deleting them?
For example, I have an input array of 100 elements, some greater than 50 and some less, and I want to store the numbers greater than 50 in a different array and do further processing only on those elements. I don't know the size of this output array in advance, since I don't know how many numbers are actually greater than 50 (so I declare its size to be 100). After performing a scan I get the output array with all the elements greater than 50, but there may be some contiguous empty spaces left in it after those elements are stored. How do we delete these spaces? Is there a way of doing this in the kernel code itself, or do we have to come back to the host code for this?
How do we deal with such compacted arrays for further processing if we can't delete the remaining spaces in the kernel code itself, and we also don't want to go back to the host code?
There is no simple solution to your problem I'm afraid.
What I think you might do is keep a counter of the elements in each array. You can increment the counter first locally with atomic_inc() and then globally with atomic_add().
This way, at the end of your kernel execution, the total number of elements in each array will be known.
You can also use the value returned by the atomic operation as an index into the array. This way you can write the output without any "holes" in your array. However, you will probably lose some speed due to the heavy use of atomic operations, I'm afraid.
I have a code which has a 2D local array (cval). This local array is calculated by every processor, and at the end I call MPI_ALLREDUCE to sum the local arrays into a global array (gns).
This local array has different sizes on different processors. The way I do the all-reduce is as follows:
k = n2spmax - n2spmin + 1   ! an arbitrary big value
do i = nmin, nmax
   call MPI_ALLREDUCE(cval(i,:), gns(i,:), k, MPI_DOUBLE_PRECISION, MPI_SUM, MPI_COMM_WORLD, ierr)
end do
Is this the correct way of writing it? I am not sure about it.
No, you can't do it this way. MPI_Allreduce requires that all of the processes in the communicator are contributing the same amount of data. That's why there's a single count argument.
To give more guidance on what is the right way to do this, we'll need a bit more clarity on what you're trying to do. Is the idea that you're calculating gns(i,j) = the sum over all ranks of cval(i,j), but not all ranks have all the cval(i,j)s?
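If that is indeed the idea, one common fix (a sketch, assuming every rank allocates cval with the same global bounds and zero-fills the entries it does not compute) is to pad the arrays to identical sizes and do a single reduction over the whole block, so every rank contributes the same count:

```fortran
! Pad cval to identical dimensions on every rank, with 0.0d0 in the
! entries a rank does not own, then reduce everything in one call:
count = (nmax - nmin + 1) * k
call MPI_ALLREDUCE(cval(nmin:nmax,:), gns(nmin:nmax,:), count, &
                   MPI_DOUBLE_PRECISION, MPI_SUM, MPI_COMM_WORLD, ierr)
```

The zero padding is harmless under MPI_SUM, and a single collective call is also usually faster than one call per row.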