How is a fast fma() implemented?

FMA is the fused multiply-add instruction. The fmaf(float x, float y, float z) function in glibc calls the vfmadd213ss instruction, and fma(double x, double y, double z) is the double-precision version. The linked page gives a software implementation of fma, but the code is too complicated. I want to implement a fast approximate fma function whose error is not too large. I hope to call it when I implement the powf() function.
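One cheap approximation, offered as a sketch rather than a definitive answer, is to do the arithmetic in double. The product of two floats is exact in double (24 + 24 significand bits fit in 53), so the only error comes from rounding the sum to double and then again to float. That double rounding can differ from a true fmaf in rare halfway cases. The name fmaf_approx is made up for this example.

```c
/* Hypothetical helper: approximate fmaf(x, y, z) using double arithmetic.
   Assumes float is IEEE binary32 and double is IEEE binary64. */
float fmaf_approx(float x, float y, float z)
{
    /* Exact: a 24-bit by 24-bit significand product needs at most 48 bits. */
    double p = (double)x * (double)y;

    /* Rounded once to double, then once more to float (double rounding). */
    return (float)(p + (double)z);
}
```

For example, with x = y = 1 + 2^-12 and z = -(1 + 2^-11), the plain float expression x * y + z rounds the product first and returns 0 under round-to-nearest float arithmetic, while fmaf_approx keeps the low bits and returns 2^-24.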

Related

Fast multi-dimensional Walsh-Hadamard transforms in Julia?

I was looking for a fast implementation of FWHT(Fast Walsh-Hadamard transformation) to understand it and implement it in python (implementation should be able to handle an n-dimensional array and should be able to apply the transformation on any specific set of dimensions).
I came across the Julia implementation (https://github.com/stevengj/Hadamard.jl) which seems to be pretty good but as I am new to Julia I am not able to understand a part of the code:
for (Tr,Tc,fftw,lib) in ((:Float64,:Complex128,"fftw",FFTW.libfftw),
                         (:Float32,:Complex64,"fftwf",FFTW.libfftwf))
    @eval function Plan_Hadamard{N}(X::StridedArray{$Tc,N}, Y::StridedArray{$Tc,N},
                                    region, flags::Unsigned, timelimit::Real,
                                    bitreverse::Bool)
        set_timelimit($Tr, timelimit)
        dims, howmany = dims_howmany(X, Y, [size(X)...], region)
        dims = hadamardize(dims, bitreverse)
        plan = ccall(($(string(fftw,"_plan_guru64_dft")),$lib),
                     PlanPtr,
                     (Int32, Ptr{Int}, Int32, Ptr{Int},
                      Ptr{$Tc}, Ptr{$Tc}, Int32, UInt32),
                     size(dims,2), dims, size(howmany,2), howmany,
                     X, Y, FFTW.FORWARD, flags)
        set_timelimit($Tr, NO_TIMELIMIT)
        if plan == C_NULL
            error("FFTW could not create plan") # shouldn't normally happen
        end
        return cFFTWPlan{$Tc,FFTW.FORWARD,X===Y,N}(plan, flags, region, X, Y)
    end
    @eval function Plan_Hadamard{N}(X::StridedArray{$Tr,N}, Y::StridedArray{$Tr,N},
                                    region, flags::Unsigned, timelimit::Real,
                                    bitreverse::Bool)
        set_timelimit($Tr, timelimit)
        dims, howmany = dims_howmany(X, Y, [size(X)...], region)
        dims = hadamardize(dims, bitreverse)
        kind = Array{Int32}(size(dims,2))
        kind[:] = R2HC
        plan = ccall(($(string(fftw,"_plan_guru64_r2r")),$lib),
                     PlanPtr,
                     (Int32, Ptr{Int}, Int32, Ptr{Int},
                      Ptr{$Tr}, Ptr{$Tr}, Ptr{Int32}, UInt32),
                     size(dims,2), dims, size(howmany,2), howmany,
                     X, Y, kind, flags)
        set_timelimit($Tr, NO_TIMELIMIT)
        if plan == C_NULL
            error("FFTW could not create plan") # shouldn't normally happen
        end
        return r2rFFTWPlan{$Tr,(map(Int,kind)...),X===Y,N}(plan, flags, region, X, Y)
    end
end
In the above code what is the plan variable, how is it used, and where can I find its implementation?
What are the inputs in the curly braces for the below line?
cFFTWPlan{$Tc,FFTW.FORWARD,X===Y,N}
This is constructing an FFTW "plan" to perform a multidimensional FFT. The cFFTWPlan type is a wrapper around the C fftw_plan pointer, and is implemented in the FFTW.jl module. The arguments in curly braces are Julia type parameters: in this case, the number type (Tc), the FFTW transform direction (FORWARD), whether the transform is in-place (X===Y), and the dimensionality of the transform (N). There are two methods here, one for an FWHT of complex-number data that creates a cFFTWPlan (which calls fftw_plan_guru64_dft) and one for real-number data that creates an r2rFFTWPlan (which calls fftw_plan_guru64_r2r). (These internal types of FFTW.jl are undocumented. The low-level C calls directly to the FFTW library are documented in the FFTW manual.)
It should, in principle, be possible to make similar calls to FFTW for NumPy arrays. However, the existing pyFFTW wrappers don't seem to support FFTW's r2r transforms (needed for FWHTs of real data), so you'd have to add that.
Or you could call the Julia Hadamard.jl module from Python via the pyjulia package. Or you could use some other Python FWHT package, like https://github.com/FALCONN-LIB/FFHT
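If the goal is mainly to understand the transform before porting it, a plain butterfly version may be easier to read than the FFTW-based plan above. The following is a minimal sketch, not the method Hadamard.jl uses: it works in place on a 1D array whose length is a power of two, is unnormalized, and produces the natural (Hadamard) ordering, which may differ from the ordering a library routine returns by default. The function name fwht is chosen for illustration.

```c
#include <stddef.h>

/* In-place, unnormalized fast Walsh-Hadamard transform in natural order.
   Assumes n is a power of two; no check is performed. */
void fwht(double *a, size_t n)
{
    /* h is the half-width of the current butterfly stage: 1, 2, 4, ... */
    for (size_t h = 1; h < n; h *= 2) {
        /* Each block of 2*h elements is processed independently. */
        for (size_t i = 0; i < n; i += 2 * h) {
            for (size_t j = i; j < i + h; ++j) {
                double u = a[j];
                double v = a[j + h];
                a[j]     = u + v;  /* sum goes to the lower half */
                a[j + h] = u - v;  /* difference goes to the upper half */
            }
        }
    }
}
```

For an n-dimensional array, the same routine could be applied along each selected dimension in turn, since the multidimensional transform is separable.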

Algebra Solving and Derivatives?

I have no idea if this is even remotely possible (I looked up "computing algebra" etc with discouraging results). How can one compute Algebra and find Derivatives with Unity?
For example, simplifying the distance formula with one variable (x unknown, some function f(x) known):
d = sqrt( (int-x)^2 + (int-f(x))^2 );
and then finding the derivative of this simplified expression?:
d=>d'
Thank you for your time and any light you can shed on this question. And once again, I have no idea if algebraic operations are even commonplace among most programs, let alone Unity-script specifically.
I have also noticed a few systems claiming algebra manipulation (e.g. http://coffeequate.readthedocs.org/en/latest/), but even if this is so how would one go about applying these systems to unity?
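If you are writing in C#, you can approximate derivatives numerically with delegates and the definition of a derivative, like this:
delegate double MathFunc(double d);
MathFunc derive(MathFunc f, double h) {
    // forward difference: (f(x+h) - f(x)) / h approximates f'(x)
    return (x) => (f(x+h) - f(x)) / h;
}
where f is the function you are taking the derivative of, and h determines how accurate your derivative is: a smaller h reduces the approximation error until floating-point round-off starts to dominate.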
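A common refinement of the same idea is the central difference, which samples on both sides of x and usually has a smaller error for the same h. The sketch below is in C with a function pointer instead of a delegate; central_diff and square are names made up for this example, and the result is still a numerical estimate, not a symbolic derivative.

```c
/* Hypothetical helper: central-difference estimate of f'(x).
   The truncation error shrinks like h*h, versus h for the forward difference. */
double central_diff(double (*f)(double), double x, double h)
{
    return (f(x + h) - f(x - h)) / (2.0 * h);
}

/* Example function to differentiate: f(x) = x^2, so f'(x) = 2x. */
double square(double x)
{
    return x * x;
}
```

Calling central_diff(square, 3.0, 1e-5) gives a value very close to 6, the exact derivative of x^2 at x = 3.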

GNU GMP mpz_powm() without mod

GNU GMP provides a function called mpz_powm(rop, base, exp, mod) which allows me to raise a very big integer value to the power of another very big integer value. The function also forces me to reduce the result modulo the 4th parameter. That's what the "m" stands for in mpz_powm. The reason why there isn't a function without a mod parameter could be to avoid very big results which may fill up your whole memory, like 2^(2^64). I'd like to know if there is a way to use that function without specifying a mod parameter anyway, accepting the risk of reaching the memory limit.
You are looking for mpz_pow_ui (). If the argument you wish to pass does not fit in a single word then the result wouldn't fit in memory anyway (except for the trivial cases):
void mpz_pow_ui (mpz_t ROP, mpz_t BASE, unsigned long int EXP)
If you don't want to reduce your answer modulo anything, you'll need to use mpz_pow_ui. However, because raising to a large mpz_t exponent would create an integer that won't fit into memory, the exponent has to be an unsigned long int.
So just convert your exponent, and then use the function:
mpz_pow_ui(rop, base, mpz_get_ui(exp));
However, if your exponent is larger than ULONG_MAX (2^32-1 or 2^64-1, depending on the platform), mpz_get_ui silently returns only the low-order bits that fit, so check mpz_fits_ulong_p(exp) first.

How does one compute the sum of a 1D array with BLAS?

In BLAS level 1 there are *ASUM and *NRM2 that compute the L1 and L2 norms of vectors, but how does one compute the (signed) sum of a vector? There's got to be something better than filling another vector full of ones and doing a *DOT...
BLAS does not provide a horizontal sum operation like you are seeking, because it's not an operation that is frequently needed by linear algebra libraries.
Many DSP libraries do provide this operation; for example, on OS X and iOS you would use the vDSP_sve( ) function provided by the Accelerate framework. Unfortunately, available DSP libraries tend to vary a lot from platform to platform, so we would need to know more about what platform[s] you're targeting.
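You can do a dot product where the second vector has an increment of zero. Using C it would be like this:
/* Fortran BLAS prototype; x is the input array */
double ddot_(const int *n, const double *x, const int *incx, const double *y, const int *incy);
int n = N;         /* N: number of elements in x */
int ix = 1;        /* step through x one element at a time */
int iy = 0;        /* zero increment: reuse the single value y = 1.0 */
double y = 1.0;
double sum = ddot_(&n, x, &ix, &y, &iy);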
One way is to use a dot product with a vector of ones, using an xDOT routine such as cblas_ddot.
Note that xAXPY, listed in http://www.netlib.org/blas/blasqr.pdf, computes y := a*x + y elementwise, so it does not by itself reduce a vector to a single sum.

Why are the arguments to atan2 Y,X rather than X,Y?

In C the atan2 function has the following signature:
double atan2( double y, double x );
Other languages do this as well. This is the only function I know of that takes its arguments in Y,X order rather than X,Y order, and it screws me up regularly because when I think coordinates, I think (X,Y).
Does anyone know why atan2's argument order convention is this way?
Because I believe it is related to arctan(y/x), so y appears on top.
Here's a nice link talking about it a bit: Angles and Directions
My assumption has always been that this is because of the trig definition, ie that
tan(theta) = opposite / adjacent
When working with the canonical angle from the origin, opposite is always Y and adjacent is always X, so:
atan2(opposite, adjacent) = theta
Ie, it was done that way so there's no ordering confusion with respect to the mathematical definition.
Suppose a right triangle with its opposite side called y and its adjacent side called x:
tan(angle) = y/x
arctan(tan(angle)) = arctan(y/x)
It's because in school, the mnemonic for calculating the gradient is rise over run, or in other words dy/dx, or more briefly y/x. And this order has snuck into the arguments of arctangent functions. So it's a historical artefact. For me it depends on what I'm thinking about when I use atan2. If I'm thinking about differentials, I get it right and if I'm thinking about coordinate pairs, I get it wrong.
The order is ATAN2(X,Y) in Excel, so I think the reverse order is a programming thing. atan(Y/X) can easily be changed to atan2(Y,X) by putting a '2' between the 'n' and the '(', and replacing the '/' with a ',': only 2 operations. The opposite order would take 4 operations, and some of the operations would be more complex (cut and paste).
I often work out my math in Excel then port it to .NET, so will get hung up on atan2 sometimes. It would be best if atan2 could be standardized one way or the other.
It would be more convenient if atan2 had its arguments reversed. Then you wouldn't need to worry about flipping the arguments when computing polar angles. The Mathematica equivalent does just that: https://reference.wolfram.com/language/ref/ArcTan.html
Way back in the dawn of time, FORTRAN had an ATAN2 function with the less convenient argument order that, in this reference manual, is (somewhat inaccurately) described as arctan(arg1 / arg2).
It is plausible that the initial creator was fixated on atan2(arg1, arg2) being (more or less) arctan(arg1 / arg2), and that the decision was blindly copied from FORTRAN to C to C++ and Python and Java and JavaScript.

Resources