How does one compute the sum of a 1D array with BLAS? - vector

In BLAS level 1 there are *ASUM and *NRM2 that compute the L1 and L2 norms of vectors, but how does one compute the (signed) sum of a vector? There's got to be something better than filling another vector full of ones and doing a *DOT...

BLAS does not provide a horizontal sum operation like you are seeking, because it's not an operation that is frequently needed by linear algebra libraries.
Many DSP libraries do provide this operation; for example, on OS X and iOS you would use the vDSP_sve( ) function provided by the Accelerate framework. Unfortunately, available DSP libraries tend to vary a lot from platform to platform, so we would need to know more about what platform[s] you're targeting.

You can do a dot product where the second vector has an increment of zero. Using C it would be like this:
int n;
int ix = 1;
int iy = 0;
double y = 1.0;
ddot_(&n, x, &ix, &y, &iy);

One way is to use a dot product with a vector of ones, more specifically to use the cblas_caxpy function.

As seen in http://www.netlib.org/blas/blasqr.pdf, xAXPY supports vector summation.

Related

Simple function to generate random number sequence without knowing previous number but know current index (no variable assignment)?

Is there any (simple) random generation function that can work without variable assignment? Most functions I read look like this current = next(current). However currently I have a restriction (from SQLite) that I cannot use any variable at all.
Is there a way to generate a number sequence (for example, from 1 to max) with only n (current number index in the sequence) and seed?
Currently I am using this:
cast(((1103515245 * Seed * ROWID + 12345) % 2147483648) / 2147483648.0 * Max as int) + 1
with max being 47, ROWID being n. However for some seed, the repeat rate is too high (3 unique out of 47).
In my requirements, repetition is ok as long as it's not too much (<50%). Is there any better function that meets my need?
The question has sqlite tag but any language/pseudo-code is ok.
P.s: I have tried using Linear congruential generators with some a/c/m triplets and Seed * ROWID as Seed, but it does not work well, it's even worse.
EDIT: I currently use this one, but I do not know where it's from. The rate looks better than mine:
((((Seed * ROWID) % 79) * 53) % "Max") + 1
I am not sure if you still have the same problem but I might have a solution for you.
What you could do is use Pseudo Random M-sequence generators based on shifting registers. Where you just have to take high enough order of you primitive polynomial and you don't need to store any variables really.
For more info you can check the wiki page
What you would need to code is just the primitive polynomial shifting equation and I have checked in an online editor it should be very easy to do. I think the easiest way for you would be to use Binary base and use PRBS sequences and depending on how many elements you will have you can choose your sequence length. For example this is the implementation for length of 2^15 = 32768 (PRBS15), the primitive polynomial I took from the wiki page (There youcan find the primitive polynomials all the way to PRBS31 what would be 2^31=2.1475e+09)
Basically what you need to do is:
SELECT (((ROWID << 1) | (((ROWID >> 14) <> (ROWID >> 13)) & 1)) & 0x7fff)
The beauty of this approach is if you take the sequence of the PRBS with longer period than your ROWID largest value you will have unique random index. Very simple. :)
If you need help with searching for primitive polynomials you can see my github repo which deals exactly with finding primitive polynomials and unique m-sequences. It is currently written in Matlab, but I plan to write it in python in next few days.
Cheers!
What about using good hash function and map result into [1...max] range?
Along the lines (in pseudocode). sha1 was added to SQLite 3.17.
sha1(ROWID) % Max + 1
Or use any external C code for hash (murmur, chacha, ...) as shown here
A linear congruential generator with appropriately-chosen parameters (a, c, and modulus m) will be a full-period generator, such that it cycles pseudorandomly through every integer in its period before repeating. Although you may have tried this idea before, have you considered that m is equivalent to max in your case? For a list of parameter choices for such generators, see L'Ecuyer, P., "Tables of Linear Congruential Generators of Different Sizes and Good Lattice Structure", Mathematics of Computation 68(225), January 1999.
Note that there are some practical issues to implementing this in SQLite, especially if your SQLite version supports only 32-bit integers and 64-bit floating-point numbers (with 52 bits of precision). Namely, there may be a risk of—
overflow if an intermediate multiplication exceeds 32 bits for integers, and
precision loss if an intermediate multiplication results in a greater-than-52-bit number.
Also, consider why you are creating the random number sequence:
Is the sequence intended to be unpredictable? In that case, a linear congruential generator alone is not enough, and you should generate unique identifiers by other means, such as by combining unique numbers with cryptographically random numbers.
Will the numbers generated this way be exposed in any way to end users? If not, there is no need to obfuscate them by "shuffling" them.
Also, depending on the SQLite API you're using (for your programming language), there may be a way to write a custom function to convert the seed and ROWID to a random unique number. The details, however, depend heavily on the specific SQLite API. Another answer shows an example for Perl.

Rust - Vector with size defined at runtime

How do you create an array, in rust, whose size is defined at run time?
Basically, how do you convert in rust the following code:
void f(int n){ return std::vector<int>(n); }
?
This is not possible in rust:
let n = 15;
let board: [int, ..n];
Note: I saw that it was impossible to do this in a simple manner, here, but I refuse to accept that such a simple thing is impossible :p
Thanks a lot!
Never-mind, I found it the way:
let n = 15; // number of items
let val = 17; // value to replicate
let v = std::vec::from_elem(val, n);
The proper way in modern Rust is vec![value; size].
Values are cloned, which is quite a relief compared to other languages that casually hand back a vector of references to the same object. E.g. vec![vec![]; 2] creates a vector where both elements are independent vectors, 3 vectors in total. Python's [[]] * 2 creates a vector of length 2 where both elements are (references to) the same nested vector.

New to OCaml: How would I go about implementing Gaussian Elimination?

I'm new to OCaml, and I'd like to implement Gaussian Elimination as an exercise. I can easily do it with a stateful algorithm, meaning keep a matrix in memory and recursively operating on it by passing around a reference to it.
This statefulness, however, smacks of imperative programming. I know there are capabilities in OCaml to do this, but I'd like to ask if there is some clever functional way I haven't thought of first.
OCaml arrays are mutable, and it's hard to avoid treating them just like arrays in an imperative language.
Haskell has immutable arrays, but from my (limited) experience with Haskell, you end up switching to monadic, mutable arrays in most cases. Immutable arrays are probably amazing for certain specific purposes. I've always imagined you could write a beautiful implementation of dynamic programming in Haskell, where the dependencies among array entries are defined entirely by the expressions in them. The key is that you really only need to specify the contents of each array entry one time. I don't think Gaussian elimination follows this pattern, and so it seems it might not be a good fit for immutable arrays. It would be interesting to see how it works out, however.
You can use a Map to emulate a matrix. The key would be a pair of integers referencing the row and column. You'll want to use your own get x y function to ensure x < n and y < n though, instead of accessing the Map directly. (edit) You can use the compare function in Pervasives directly.
module OrderedPairs = struct
type t = int * int
let compare = Pervasives.compare
end
module Pairs = Map.Make (OrderedPairs)
let get_ n set x y =
assert( x < n && y < n );
Pairs.find (x,y) set
let set_ n set x y v =
assert( x < n && y < n );
Pairs.add (x,y) set v
Actually, having a general set of functions (get x y and set x y at a minimum), without specifying the implementation, would be an even better option. The functions then can be passed to the function, or be implemented in a module through a functor (a better solution, but having a set of functions just doing what you need would be a first step since you're new to OCaml). In this way you can use a Map, Array, Hashtbl, or a set of functions to access a file on the hard-drive to implement the matrix if you wanted. This is the really important aspect of functional programming; that you trust the interface over exploiting the side-effects, and not worry about the underlying implementation --since it's presumed to be pure.
The answers so far are using/emulating mutable data-types, but what does a functional approach look like?
To see, let's decompose the problem into some functional components:
Gaussian elimination involves a sequence of row operations, so it is useful first to define a function taking 2 rows and scaling factors, and returning the resultant row operation result.
The row operations we want should eliminate a variable (column) from a particular row, so lets define a function which takes a pair of rows and a column index and uses the previously defined row operation to return the modified row with that column entry zero.
Then we define two functions, one to convert a matrix into triangular form, and another to back-substitute a triangular matrix to the diagonal form (using the previously defined functions) by eliminating each column in turn. We could iterate or recurse over the columns, and the matrix could be defined as a list, vector or array of lists, vectors or arrays. The input is not changed, but a modified matrix is returned, so we can finally do:
let out_matrix = to_diagonal (to_triangular in_matrix);
What makes it functional is not whether the data-types (array or list) are mutable, but how they they are used. This approach may not be particularly 'clever' or be the most efficient way to do Gaussian eliminations in OCaml, but using pure functions lets you express the algorithm cleanly.

Performing operations on CUDA matrices while reading from a global Point

Hey there,
I have a mathematical function (multidimensional which means that there's an index which I pass to the C++-function on which single mathematical function I want to return. E.g. let's say I have a mathematical function like that:
f = Vector(x^2*y^2 / y^2 / x^2*z^2)
I would implement it like that:
double myFunc(int function_index)
{
switch(function_index)
{
case 1:
return PNT[0]*PNT[0]*PNT[1]*PNT[1];
case 2:
return PNT[1]*PNT[1];
case 3:
return PNT[2]*PNT[2]*PNT[1]*PNT[1];
}
}
whereas PNT is defined globally like that: double PNT[ NUM_COORDINATES ]. Now I want to implement the derivatives of each function for each coordinate thus generating the derivative matrix (columns = coordinates; rows = single functions). I wrote my kernel already which works so far and which call's myFunc().
The Problem is: For calculating the derivative of the mathematical sub-function i concerning coordinate j, I would use in sequential mode (on CPUs e.g.) the following code (whereas this is simplified because usually you would decrease h until you reach a certain precision of your derivative):
f0 = myFunc(i);
PNT[ j ] += h;
derivative = (myFunc(j)-f0)/h;
PNT[ j ] -= h;
now as I want to do this on the GPU in parallel, the problem is coming up: What to do with PNT? As I have to increase certain coordinates by h, calculate the value and than decrease it again, there's a problem coming up: How to do it without 'disturbing' the other threads? I can't modify PNT because other threads need the 'original' point to modify their own coordinate.
The second idea I had was to save one modified point for each thread but I discarded this idea quite fast because when using some thousand threads in parallel, this is a quite bad and probably slow (perhaps not realizable at all because of memory limits) idea.
'FINAL' SOLUTION
So how I do it currently is the following, which adds the value 'add' on runtime (without storing it somewhere) via preprocessor macro to the coordinate identified by coord_index.
#define X(n) ((coordinate_index == n) ? (PNT[n]+add) : PNT[n])
__device__ double myFunc(int function_index, int coordinate_index, double add)
{
//*// Example: f[i] = x[i]^3
return (X(function_index)*X(function_index)*X(function_index));
// */
}
That works quite nicely and fast. When using a derivative matrix with 10000 functions and 10000 coordinates, it just takes like 0.5seks. PNT is defined either globally or as constant memory like __constant__ double PNT[ NUM_COORDINATES ];, depending on the preprocessor variable USE_CONST.
The line return (X(function_index)*X(function_index)*X(function_index)); is just an example where every sub-function looks the same scheme, mathematically spoken:
f = Vector(x0^3 / x1^3 / ... / xN^3)
NOW THE BIG PROBLEM ARISES:
myFunc is a mathematical function which the user should be able to implement as he likes to. E.g. he could also implement the following mathematical function:
f = Vector(x0^2*x1^2*...*xN^2 / x0^2*x1^2*...*xN^2 / ... / x0^2*x1^2*...*xN^2)
thus every function looking the same. You as a programmer should only code once and not depending on the implemented mathematical function. So when the above function is being implemented in C++, it looks like the following:
__device__ double myFunc(int function_index, int coordinate_index, double add)
{
double ret = 1.0;
for(int i = 0; i < NUM_COORDINATES; i++)
ret *= X(i)*X(i);
return ret;
}
And now the memory accesses are very 'weird' and bad for performance issues because each thread needs access to each element of PNT twice. Surely, in such a case where each function looks the same, I could rewrite the complete algorithm which surrounds the calls to myFunc, but as I stated already: I don't want to code depending on the user-implemented function myFunc...
Could anybody come up with an idea how to solve this problem??
Thanks!
Rewinding back to the beginning and starting with a clean sheet, it seems you want to be able to do two things
compute an arbitrary scalar valued
function over an input array
approximate the partial derivative of an arbitrary scalar
valued function over the input array
using first order accurate finite differencing
While the function is scalar valued and arbitrary, it seems that there are, in fact, two clear forms which this function can take:
A scalar valued function with scalar arguments
A scalar valued function with vector arguments
You appeared to have started with the first type of function and have put together code to deal with computing both the function and the approximate derivative, and are now wrestling with the problem of how to deal with the second case using the same code.
If this is a reasonable summary of the problem, then please indicate so in a comment and I will continue to expand it with some code samples and concepts. If it isn't, I will delete it in a few days.
In comments, I have been trying to suggest that conflating the first type of function with the second is not a good approach. The requirements for correctness in parallel execution, and the best way of extracting parallelism and performance on the GPU are very different. You would be better served by treating both types of functions separately in two different code frameworks with different usage models. When a given mathematical expression needs to be implemented, the "user" should make a basic classification as to whether that expression is like the model of the first type of function, or the second. The act of classification is what drives algorithmic selection in your code. This type of "classification by algorithm" is almost universal in well designed libraries - you can find it in C++ template libraries like Boost and the STL, and you can find it in legacy Fortran codes like the BLAS.

maple 13 tridiagonal matrix help

i am trying to make a 100 x 100 tridiagonal matrix with 2's going down the diagonal and -1's surrounding the 2's. i can make a tridiagonal matrix with only 1's in the three diagonals and preform matrix addition to get what i want, but i want to know if there is a way to customize the three diagonals to what ever you want. maplehelp doesn't list anything useful.
The Matrix function in the LinearAlgebra package can be called with a parameter (init) that is a function that can assign a value to each entry of the matrix depending on its position.
This would work:
f := (i, j) -> if i = j then 2 elif abs(i - j) = 1 then -1 else 0; end if;
Matrix(100, f);
LinearAlgebra[BandMatrix] works too (and will be WAY faster), especially if you use storage=band[1]. You should probably use shape=symmetric as well.
The answers involving an initializer function f will do O(n^2) work for square nxn Matrix. Ideally, this task should be O(n), since there are just less than 3*n entries to be filled.
Suppose also that you want a resulting Matrix without any special (eg. band) storage or indexing function (so that you can later write to any part of it arbitrarily). And suppose also that you don't want to get around such an issue by wrapping the band structure Matrix with another generic Matrix() call which would double the temp memory used and produce collectible garbage.
Here are two ways to do it (without applying f to each entry in an O(n^2) manner, or using a separate do-loop). The first one involves creation of the three bands as temps (which is garbage to be collected, but at least not n^2 size of it).
M:=Matrix(100,[[-1$99],[2$100],[-1$99]],scan=band[1,1]);
This second way uses a routine which walks M and populates it with just the three scalar values (hence not needing the 3 band lists explicitly).
M:=Matrix(100):
ArrayTools:-Fill(100,2,M,0,100+1);
ArrayTools:-Fill(99,-1,M,1,100+1);
ArrayTools:-Fill(99,-1,M,100,100+1);
Note that ArrayTools:-Fill is a compiled external routine, and so in principal might well be faster than an interpreted Maple language (proper) method. It would be especially fast for a Matrix M with a hardware datatype such as 'float[8]'.
By the way, the reason that the arrow procedure above failed with error "invalid arrow procedure" is likely that it was entered in 2D Math mode. The 2D Math parser of Maple 13 does not understand the if...then...end syntax as the body of an arrow operator. Alternatives (apart from writing f as a proc like someone else answered) is to enter f (unedited) in 1D Maple notation mode, or to edit f to use the operator form of if. Perhaps the operator form of if here requires a nested if to handle the elif. For example,
f := (i,j) -> `if`(i=j,2,`if`(abs(i-j)=1,-1,0));
Matrix(100,f);
jmbr's proposed solutions can be adapted to work:
f := proc(i, j)
if i = j then 2
elif abs(i - j) = 1 then -1
else 0
end if
end proc;
Matrix(100, f);
Also, I understand your comment as saying you later need to destroy the band matrix nature, which prevents you from using BandMatrix - is that right? The easiest solution to that is to wrap the BandMatrix call in a regular Matrix call, which will give you a Matrix you can change however you'd like:
Matrix(LinearAlgebra:-BandMatrix([1,2,1], 1, 100));

Resources