How do I create a multidimensional matrix (ArrayD) using ndarray?

I would like to create a multidimensional matrix in Rust (the product of whose dimensions exceeds 1e6). I found the ndarray crate; however, the documentation does not explain how to use the ArrayD type, which seems to correspond to my needs.

To create an N-dimensional array, pass a &[usize] (or a Vec<usize>) holding the N axis lengths to any constructor that accepts a shape, such as Array::zeros. For example, the following code creates an array with 9 dimensions of shape 4 * 7 * 6 * 5 * 2 * 10 * 9 * 3 * 8:
//! ```cargo
//! [dependencies]
//! ndarray = "*"
//! ```
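extern crate ndarray;
use ndarray::ArrayD;
fn main() {
    // A shape given as a slice produces a dynamic-dimensional array (ArrayD).
    let mut array = ArrayD::zeros([4, 7, 6, 5, 2, 10, 9, 3, 8].as_ref());
    // Index with a slice holding one index per axis (9 here).
    array[[1; 9].as_ref()] = 123;
    println!("{:?}", array[[0; 9].as_ref()]);
    println!("{:?}", array[[1; 9].as_ref()]);
}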
Output:
0
123
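To show why this shape needs more than 1e6 elements and how a 9-dimensional index maps to a position in the underlying buffer, here is a small Python sketch of row-major (C-order) strides, which is ndarray's default layout for a plain shape. The helpers `row_major_strides` and `flat_offset` are hypothetical names for illustration only; ndarray does this bookkeeping for you.

```python
from math import prod

# Shape from the Rust example above (9 dimensions).
SHAPE = [4, 7, 6, 5, 2, 10, 9, 3, 8]

def row_major_strides(shape):
    """Strides (in elements) for a row-major layout: the last axis is contiguous."""
    strides = [1] * len(shape)
    for axis in range(len(shape) - 2, -1, -1):
        strides[axis] = strides[axis + 1] * shape[axis + 1]
    return strides

def flat_offset(index, shape):
    """Position of an N-dimensional index in the flat buffer, with bounds checking."""
    if len(index) != len(shape):
        raise ValueError("index has the wrong number of dimensions")
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of bounds for axis of length {n}")
    return sum(i * s for i, s in zip(index, row_major_strides(shape)))

total = prod(SHAPE)
print(total)                        # 3628800 elements in total
print(flat_offset([1] * 9, SHAPE))  # where array[[1; 9]] lives in the buffer
```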

Related

Compute product of large 3-D arrays in R

I am working on an optimization problem, and to supply the analytic gradient to the routine, I need to compute the gradient of large 3D arrays with respect to parameters. The largest of these arrays, s, has dimensions [L,N,J], where L, J ~ 2000 and N = 15. L and N index nodes over which the arrays are then aggregated with some fixed weights w into vectors of length J. Computing the gradient naively generates an [L,N,J,J] array x whose elements are x(l,n,j,k) = -s(l,n,j)s(l,n,k) if j ≠ k and x(l,n,j,j) = s(l,n,j)(1-s(l,n,j)).
Several functions in the procedure would use x as input, but right now I cannot keep x in memory due to its size. My approach so far has been to compute x and aggregate it directly over L and N so that I only ever store JxJ matrices; the downside is that I cannot reuse x in other functions. This is what the following code does:
arma::mat agg_dsnode_ddelta_v3(arma::cube s_lnj,
                               arma::mat w_ln,
                               arma::vec w_l){
    // Normal Matrix dimensions
    unsigned int L = s_lnj.n_rows;
    unsigned int N = s_lnj.n_cols;
    unsigned int J = s_lnj.n_slices;
    //resulting matrix
    arma::mat ds_ddelta_jj = arma::mat(J,J, arma::fill::zeros);
    for (unsigned int l = 0; l < L; l++) {
        for (unsigned int n = 0; n < N; n++) {
            arma::vec s_j = s_lnj.subcube(arma::span(l), arma::span(n), arma::span());
            ds_ddelta_jj += - arma::kron(w_l(l) * w_ln(l,n) * s_j, s_j.as_row()) + arma::diagmat(w_l(l) * w_ln(l,n) * s_j);
        }
    }
    return ds_ddelta_jj;
}
Alternatively, the 4-D array x could, for instance, be computed with sparseMatrix, but this approach does not scale as L and J increase:
library(Matrix)
L = 2
N = 3
J = 4
s_lnj <- array(rnorm(L*N*J), dim=c(L,N,J))
## create sparse Matrix with s(l,n,:) vertically on the diagonal
As_lnj = sparseMatrix(i=c(1:(L*N*J)),j=rep(1:(L*N), each=J),x= as.vector(aperm(s_lnj, c(3, 1, 2))))
## create sparse Matrix with s(l,n,:) horizontally on the diagonal
Bs_lnj = sparseMatrix(i=rep(1:(L*N), each=J),j=c(1:(L*N*J)),x= as.vector(aperm(s_lnj, c(3, 1, 2))))
## create sparse Matrix with s(l,n,:) diagonally
Cs_lnj = sparseMatrix(i=c(1:(L*N*J)),j=c(1:(L*N*J)),x= as.vector(aperm(s_lnj, c(3, 1, 2))))
## compute 4-D array with sparseMatrix product
x = -(As_lnj %*% Bs_lnj) + Cs_lnj
I was wondering if you knew of a faster way to implement the first code, or alternatively of an approach that would make the second one scalable.
Thank you in advance
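One way to avoid the L*N separate outer products in the first function: with c_ln = w_l(l) * w_ln(l,n) and S the (L*N) x J matrix whose rows are the vectors s(l,n,:), the aggregated matrix equals diag(S^T c) - S^T diag(c) S. In a BLAS-backed library such as Armadillo this should become one dense product over an (L*N) x J matrix instead of many small J x J updates; that speed-up is an expectation I have not benchmarked, and it still never materializes x. The Python sketch below (hypothetical helper names) only checks the identity against a direct translation of the loop on small random data:

```python
import random

def aggregate_naive(s, w_ln, w_l):
    """Loop version: sum over (l, n) of c_ln * (diag(s_j) - s_j s_j^T), c_ln = w_l[l] * w_ln[l][n]."""
    L, N, J = len(s), len(s[0]), len(s[0][0])
    out = [[0.0] * J for _ in range(J)]
    for l in range(L):
        for n in range(N):
            c = w_l[l] * w_ln[l][n]
            s_j = s[l][n]
            for j in range(J):
                for k in range(J):
                    out[j][k] -= c * s_j[j] * s_j[k]   # -c * outer(s_j, s_j)
                out[j][j] += c * s_j[j]                # +c * diag(s_j)
    return out

def aggregate_stacked(s, w_ln, w_l):
    """Same J x J result computed as diag(S^T c) - S^T diag(c) S, where S stacks the rows s[l][n]."""
    S = [s_j for row in s for s_j in row]                          # (L*N) x J, rows in (l, n) order
    c = [w_l[l] * w for l, row in enumerate(w_ln) for w in row]    # weights in the same (l, n) order
    R, J = len(S), len(S[0])
    CS = [[c[r] * v for v in S[r]] for r in range(R)]              # diag(c) S
    out = [[-sum(S[r][j] * CS[r][k] for r in range(R)) for k in range(J)]
           for j in range(J)]                                      # -S^T diag(c) S
    for j in range(J):
        out[j][j] += sum(CS[r][j] for r in range(R))               # + diag(S^T c)
    return out

# Small random check that both versions agree.
random.seed(0)
L, N, J = 3, 2, 4
s = [[[random.random() for _ in range(J)] for _ in range(N)] for _ in range(L)]
w_ln = [[random.random() for _ in range(N)] for _ in range(L)]
w_l = [random.random() for _ in range(L)]
naive = aggregate_naive(s, w_ln, w_l)
stacked = aggregate_stacked(s, w_ln, w_l)
print(max(abs(naive[j][k] - stacked[j][k]) for j in range(J) for k in range(J)))
```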

Learning Binary Search in python 3.7

I found this code on https://www.geeksforgeeks.org/binary-search/
# Python Program for recursive binary search.
# Returns index of x in arr if present, else -1
def binarySearch (arr, l, r, x):
    # Check base case
    if r >= l:
        mid = l + (r - l)/2;
        # If element is present at the middle itself
        if arr[mid] == x:
            return mid
        # If element is smaller than mid, then it
        # can only be present in left subarray
        elif arr[mid] > x:
            return binarySearch(arr, l, mid-1, x)
        # Else the element can only be present
        # in right subarray
        else:
            return binarySearch(arr, mid+1, r, x)
    else:
        # Element is not present in the array
        return -1
# Test array
arr = [ 2, 3, 4, 10, 40, 50, 80, 140, 200, 2000, 100]
x = 50
# Function call
result = binarySearch(arr, 0, len(arr)-1, int)
if result != -1:
    print ("Element is present at index %d" % result)
else:
    print ("Element is not present in array")
However, when I run it I get this error: TypeError: list indices must be integers or slices, not float
I'm not sure how to convert it. I tried converting the entire array to int, and I also tried replacing x with int, but neither worked.
Any suggestions?
The issue is on this line:
mid = l + (r - l)/2;
In Python 3, / performs true (floating-point) division, and since mid is used as a list index it must be an int. To do integer (floor) division, use //:
mid = l + (r - l) // 2
There is also another issue with the call to the function:
result = binarySearch(arr, 0, len(arr) - 1, int)
The last parameter should not be int but x (the variable you are searching for):
result = binarySearch(arr, 0, len(arr) - 1, x)
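When you pass in int as the last parameter, Python 3.7 raises TypeError: '>' not supported between instances of 'int' and 'type' (older Python 3 versions word this as "unorderable types").
Also note that binary search only works on sorted input, and the test array is not sorted (100 comes after 2000). It happens to find 50, but a search for 100 in this array returns -1.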
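Putting both fixes together, here is a minimal corrected version. Sorting the test data first is my addition (it is not in the original snippet), so that the sorted-input requirement holds:

```python
def binarySearch(arr, l, r, x):
    """Recursive binary search on a sorted list; returns the index of x in arr[l..r], or -1."""
    if r >= l:
        mid = l + (r - l) // 2          # integer division keeps mid a valid list index
        if arr[mid] == x:
            return mid
        elif arr[mid] > x:              # x can only be in the left half
            return binarySearch(arr, l, mid - 1, x)
        else:                           # x can only be in the right half
            return binarySearch(arr, mid + 1, r, x)
    return -1                           # empty range: x is not present

# Binary search assumes sorted input, so sort the test data first.
arr = sorted([2, 3, 4, 10, 40, 50, 80, 140, 200, 2000, 100])
x = 50
result = binarySearch(arr, 0, len(arr) - 1, x)   # pass x, not int
if result != -1:
    print("Element is present at index %d" % result)
else:
    print("Element is not present in array")
```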

Understanding XQuery position(): inclusive or exclusive end position

I have read a lot about XQuery's position(), but all the examples use >, <, or =. You can also use a range ($x to $y), though, and I am confused about what is inclusive and what is not.
[position() = $startPosition to $endPosition]
Let's say $startPosition is 1 (I have read that position() starts at 1, not 0). To return only the first hit, should $endPosition also be 1, or 2?
In other words, if I want n items, what is the formula for both variables? To make things clearer, we can add an incrementing loop counter ($iteration). Basically, we are generating a search that pages through all subsequent hits using position() (as an example).
$endPosition = 1 + ($iteration * n);
$startPosition = $endPosition - n;
This is what I came up with. It gives the following results for $iteration starting at 1 and incrementing, with n = 3:
1:
$endPosition = 1 + (1 * 3); // 4
$startPosition = 4 - 3; // 1
2:
$endPosition = 1 + (2 * 3); // 7
$startPosition = 7 - 3; // 4
3:
$endPosition = 1 + (3 * 3); // 10
$startPosition = 10 - 3; // 7
But is this correct? I am not sure. Is $endPosition included? If it is not, my code is correct; if it is, my code is wrong, and then I am interested in the correct formula.
The expression $seq[position() = $a to $b] is equivalent to the following FLWOR expression (where $position starts at 1), so both bounds are inclusive:
for $x at $position in $seq
where $position >= $a and $position <= $b
return $x
So to skip the first two items and then return the following five, you need $seq[position() = 3 to 7].
Here is how you can go from this to a subsequence function that uses a 0-based offset and the number of items to return:
declare function local:subsequence(
  $seq as item()*,
  $offset as xs:integer,
  $length as xs:integer
) as item()* {
  let $start := $offset + 1,
      $end := $offset + $length
  return $seq[position() = $start to $end]
};
local:subsequence(1 to 100, 0, 5), (: returns (1, 2, 3, 4, 5) :)
local:subsequence(1 to 100, 13, 3) (: returns (14, 15, 16) :)
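To make the inclusive bounds concrete for the paging question, here is a Python sketch that mimics the predicate with 1-based positions. Python is used only for illustration, and `select_range` and `page_bounds` are hypothetical names. It shows that for n items per page the bounds are start = ($iteration - 1) * n + 1 and end = $iteration * n, whereas the formula from the question returns n + 1 items and overlaps consecutive pages:

```python
def select_range(seq, start, end):
    """Mimic seq[position() = start to end]: positions are 1-based and both ends are inclusive."""
    return [item for position, item in enumerate(seq, start=1) if start <= position <= end]

def page_bounds(iteration, n):
    """1-based, inclusive bounds of page `iteration` (1, 2, 3, ...) holding n items each."""
    start = (iteration - 1) * n + 1
    end = iteration * n
    return start, end

seq = list(range(1, 101))        # stands in for the XQuery sequence (1 to 100)
print(select_range(seq, 3, 7))   # skip two items, return the next five
for it in (1, 2, 3):
    print(it, select_range(seq, *page_bounds(it, 3)))

# The formula from the question (end = 1 + iteration * n, start = end - n)
# returns n + 1 items, and page 2 would start with the last item of page 1:
end = 1 + 1 * 3
print(select_range(seq, end - 3, end))
```

In XQuery the same bounds would read $seq[position() = ($iteration - 1) * $n + 1 to $iteration * $n].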
It's unclear what specific problem you're trying to solve, so I don't know whether this answers your question, but let's unpack that expression first:
[position() = $startPosition to $endPosition]
Say $startPosition is 1 and $endPosition is 3. That will evaluate to:
[position() = (1, 2, 3)]
That predicate will return true any time position() equals any of the values in the sequence on the right.
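The "equals any of the values" behaviour is XQuery's general comparison, and a range whose start is greater than its end is the empty sequence, so nothing matches. A Python sketch of both rules (hypothetical helper names, for illustration only):

```python
def general_equals(left, right):
    """Mimic XQuery's general comparison `=`: true if ANY pair of items compares equal."""
    return any(a == b for a in left for b in right)

def xquery_to(a, b):
    """Mimic the XQuery range expression `a to b`: the empty sequence when a > b."""
    return list(range(a, b + 1))

seq = ["a", "b", "c", "d", "e"]
# seq[position() = 1 to 3]
first_three = [x for pos, x in enumerate(seq, start=1) if general_equals([pos], xquery_to(1, 3))]
# seq[position() = 3 to 1]: the range is empty, so no position matches
reversed_range = [x for pos, x in enumerate(seq, start=1) if general_equals([pos], xquery_to(3, 1))]
print(first_three)
print(reversed_range)
```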

OpenCL out of bounds errors

This kernel works fine:
__kernel void test(__global float* a_Direction, __global float* a_Output, const unsigned int a_Count)
{
    int index = get_global_id(0);
    if (index < a_Count)
    {
        a_Output[index * 3 + 0] = a_Direction[index * 3 + 0] * 0.5f + 0.5f;
        a_Output[index * 3 + 1] = a_Direction[index * 3 + 1] * 0.5f + 0.5f;
        a_Output[index * 3 + 2] = a_Direction[index * 3 + 2] * 0.5f + 0.5f;
    }
}
This kernel produces out of bounds errors:
__kernel void test(__global float3* a_Direction, __global float3* a_Output, const unsigned int a_Count)
{
    int index = get_global_id(0);
    if (index < a_Count)
    {
        a_Output[index].x = a_Direction[index].x * 0.5f + 0.5f;
        a_Output[index].y = a_Direction[index].y * 0.5f + 0.5f;
        a_Output[index].z = a_Direction[index].z * 0.5f + 0.5f;
    }
}
To me it seems like they should both do the exact same thing.
But for some reason only one of the two works.
Am I missing something obvious?
The exact error is: "CL_OUT_OF_RESOURCES error executing CL_COMMAND_READ_BUFFER on GeForce GTX580M (Device 0)."
@arsenm in their answer, as well as @Darkzeros, gave the proper explanation, but I think it is worth developing a bit. The problem is that in the second kernel there is a "hidden" alignment. As the standard states in section 6.1.5:
For 3-component vector data types, the size of the data type is 4 *
sizeof(component). This means that a 3-component vector data type will
be aligned to a 4 * sizeof(component) boundary.
Let's illustrate that with an example:
Assume that a_Direction holds 9 floats and that you use 3 threads (work-items) to process them. In the first kernel there is no problem: thread 0 handles the elements with indexes 0, 1, 2, thread 1 handles 3, 4, 5 and, finally, thread 2 handles 6, 7, 8. Everything is fine.
However, in the second kernel, assuming the data layout on the host side stays the same (i.e. an array of floats indexed 0 to 8), thread 0 handles elements 0, 1, 2, and element 3 is also part of its float3, because a float3 is laid out like a float4 whose fourth component is unused. Thread 1 does not access elements 3, 4, 5 but elements 4, 5, 6 (with 7 as the unused fourth slot).
Therefore, and this is where the problem arises, thread 2 tries to access elements 8, 9, 10 (and 11), hence the out-of-bounds access.
To summarize, a 3-component vector behaves like a 4-component vector in terms of size and alignment.
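The index arithmetic of this example (9 host floats, 3 work-items) can be checked with a quick Python sketch, assuming, as the quoted rule says, that each float3 occupies 4 float slots; `occupied_float_slots` is a hypothetical helper name:

```python
def occupied_float_slots(work_item, floats_per_element):
    """Float slots (0-based) occupied by the element that one work-item processes."""
    first = work_item * floats_per_element
    return list(range(first, first + floats_per_element))

buffer_floats = 9   # host buffer: 3 tightly packed x, y, z triples
for wi in range(3):
    packed = occupied_float_slots(wi, 3)   # float* kernel: stride of 3 floats
    padded = occupied_float_slots(wi, 4)   # float3* kernel: each float3 spans 4 floats
    out_of_bounds = [i for i in padded if i >= buffer_floats]
    print(wi, packed, padded, "out of bounds:", out_of_bounds)
```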
Now, if you want to use vectors without changing the data layout on the host side, you can use the vload3 and vstore3 functions described in the "Vector Data Load and Store Functions" section of the standard, keeping a_Direction and a_Output declared as __global float* as in the first kernel:
vstore3(vload3(index, a_Direction) * 0.5f + 0.5f, index, a_Output);
By the way, assuming proper alignment, you don't have to bother with statements like:
a_Output[index].x = a_Direction[index].x * 0.5f + 0.5f;
a_Output[index].y = a_Direction[index].y * 0.5f + 0.5f;
a_Output[index].z = a_Direction[index].z * 0.5f + 0.5f;
This single statement is enough (no need to write one line per component):
a_Output[index] = a_Direction[index] * 0.5f + 0.5f;
The problem you're probably having is that you've allocated a buffer of n * 3 * sizeof(float) bytes for your float3s, but the size and alignment of float3 is 16 bytes, not 12.
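On the host side that means sizing the buffer as n * 16 bytes and, if the source data is tightly packed x, y, z triples, inserting one padding float per element before uploading. A Python sketch of that sizing and repacking (the helper names are hypothetical, native byte order is assumed to match the device, and the actual OpenCL buffer creation is not shown):

```python
import struct

SIZEOF_FLOAT = 4
SIZEOF_FLOAT3 = 16   # a 3-component vector has the size of a 4-component one

def buffer_bytes(n):
    """Bytes needed for n float3 elements (not n * 3 * sizeof(float))."""
    return n * SIZEOF_FLOAT3

def pad_xyz_to_float3(xyz):
    """Repack tightly packed x, y, z floats into 16-byte float3 slots (4th float is padding)."""
    if len(xyz) % 3:
        raise ValueError("expected a multiple of 3 floats")
    padded = []
    for i in range(0, len(xyz), 3):
        padded.extend(xyz[i:i + 3])
        padded.append(0.0)   # padding slot, ignored by the kernel
    return struct.pack(f"{len(padded)}f", *padded)

print(buffer_bytes(3), 3 * 3 * SIZEOF_FLOAT)   # bytes for 3 float3s vs. 9 packed floats
```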

Summing elems of array using binary recursion

I was starting to understand linear recursion, so I thought I would practice on sorting algorithms, but quicksort is where I had trouble with recursion. So I decided to work with a simpler example, a binary sum that I found online. I understand that recursive calls, like all function calls, are executed one at a time and not simultaneously (which is what multithreading does, but that is not my concern when tracing). So all of recursive call A must be executed BEFORE recursive call B, but I get lost in the mix. Would anyone mind tracing it completely? The example I used has size n = 9, with all elements equal to 1 to keep it simple.
/**
 * Sums an integer array using binary recursion.
 * @param arr an integer array
 * @param i starting index
 * @param n number of elements to sum, starting at index i
 * floor(x) is largest integer <= x
 * ceil(x) is smallest integer >= x
 */
public int binarySum(int arr[], int i, int n) {
    if (n == 1)
        return arr[i];
    // For non-negative ints, (n + 1) / 2 == ceil(n/2) and n / 2 == floor(n/2)
    return binarySum(arr, i, (n + 1) / 2) + binarySum(arr, i + (n + 1) / 2, n / 2);
}
What I personally do is start with an array of size 2, which has two elements.
return binarySum(arr, i, ceil(n/2)) + binarySum(arr, i + ceil(n/2), floor(n/2)) then does nothing but split the array into two single elements and add them. Call this case 1.
This trivial starting point is the lowest level of the recursion for all larger cases.
Now increase to n = 4. The array is split into two halves: indices 0 to 1 and indices 2 to 3.
The 2 elements at indices 0 to 1 are added as in case 1, and so are the 2 elements at indices 2 to 3.
Then these two results are added at this level.
Looking at the recursion bottom up like this is sometimes easier to understand, as in this case.
Now, to your question: consider an array of 9 elements: 1 2 3 4 5 6 7 8 9
n = 9 => ceil(9/2) = 5, floor(9/2) = 4
The first (top) call is binarySum(array, 0, 9).
Since n is not 1, it makes the recursive calls:
return binarySum(array, 0, 5) + binarySum(array, 5, 4)
The first call, binarySum(array, 0, 5), operates on the first 5 elements of the array, and the second, binarySum(array, 5, 4), operates on the last 4 elements.
Hence the array is divided like this: 1 2 3 4 5 | 6 7 8 9
The first call finds the sum of the elements 1 2 3 4 5,
the second call finds the sum of the elements 6 7 8 9,
and these two results are added together and returned as the answer to the top call.
Now, how are 1+2+3+4+5 and 6+7+8+9 computed? We recurse again.
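So the splits look like this (each line is one level of recursion):
1 2 3 4 5 | 6 7 8 9
1 2 3 | 4 5 | 6 7 | 8 9
1 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
1 | 2
This shows the splits level by level; in the actual execution, the whole left branch (call A) finishes before the right branch (call B) starts. So far, we are just calling the function recursively.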
But now, we hit the base case!
if (n == 1)
return arr[i];
[1 + 2]____[3]____[4 + 5]____[6 + 7]____[8 + 9]
[3 + 3] ____ [9] ____[13 + 17]
[6 + 9] [30]
[15 + 30]
[45]
which is the sum.
So, to understand it, look at what is done to the largest instance of the problem; you can be sure that the same thing happens to each smaller instance.
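To see the actual execution order (call A runs to completion before call B starts), here is a Python port of binarySum that records every call, indented by recursion depth. The trace list and depth parameter are additions for illustration; the splitting is the same ceil(n/2) / floor(n/2) rule:

```python
def binary_sum(arr, i, n, depth=0, trace=None):
    """Sum arr[i : i + n] by splitting into ceil(n/2) and floor(n/2) elements, recording each call."""
    if trace is not None:
        trace.append("  " * depth + f"binarySum(i={i}, n={n}) -> {arr[i:i + n]}")
    if n == 1:
        return arr[i]                   # base case: a single element
    left_n = (n + 1) // 2               # ceil(n/2)
    right_n = n // 2                    # floor(n/2)
    left = binary_sum(arr, i, left_n, depth + 1, trace)             # call A runs to completion...
    right = binary_sum(arr, i + left_n, right_n, depth + 1, trace)  # ...before call B starts
    return left + right

trace = []
total = binary_sum(list(range(1, 10)), 0, 9, trace=trace)
print("\n".join(trace))
print(total)
```

With the all-ones array from the question, binary_sum([1] * 9, 0, 9) returns 9, and the trace has the same shape.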
The following variant computes the same binary sum with inclusive index bounds (it is C-style code rather than Java): p is the starting index and k is the last index, so for a 9-element array the top call is sum(arr, 0, 8).
int sum(int* arr, int p, int k) {
    if (p == k)
        return arr[k];
    int s = (p + k) / 2;
    return sum(arr, p, s) + sum(arr, s + 1, k);
}
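The two versions split the array in exactly the same places: for an inclusive range p..k with n = k - p + 1 elements, the midpoint s = (p + k) / 2 leaves ceil(n/2) elements on the left. A Python sketch (hypothetical helper names) that records the index range of every call in both versions and compares them:

```python
def ranges_ceil_floor(i, n, out):
    """Inclusive index range of every call made by binarySum(arr, i, n), in call order."""
    out.append((i, i + n - 1))
    if n > 1:
        half = (n + 1) // 2                   # ceil(n/2)
        ranges_ceil_floor(i, half, out)
        ranges_ceil_floor(i + half, n // 2, out)
    return out

def ranges_midpoint(p, k, out):
    """Inclusive index range of every call made by sum(arr, p, k), in call order."""
    out.append((p, k))
    if p != k:
        s = (p + k) // 2                      # same as C's (p + k) / 2 for non-negative ints
        ranges_midpoint(p, s, out)
        ranges_midpoint(s + 1, k, out)
    return out

def sum_inclusive(arr, p, k):
    """Python port of the C-style sum(arr, p, k); p and k are inclusive indices."""
    if p == k:
        return arr[k]
    s = (p + k) // 2
    return sum_inclusive(arr, p, s) + sum_inclusive(arr, s + 1, k)

print(ranges_midpoint(0, 8, [])[:3])
print(sum_inclusive(list(range(1, 10)), 0, 8))
```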
