I'm attempting to create a fixed-point square root function for a Xilinx FPGA (hence real types are out, and David Bishop's ieee_proposed library is also unsupported for XST synthesis).
I've settled on a Newton-Raphson method to calculate the reciprocal square root (as it involves fewer divisions).
One of the remaining dilemmas I have is how to generate the initial seed. I looked at the Fast Inverse Square Root, but it only appears to work for floating point arithmetic.
My best idea at the moment is to take the length of the input value (i.e. find the index of the most significant non-zero bit), halve it crudely and use that as the power for a seed.
I wrote a short test script to quickly check the accuracy (it's in Matlab, but that's just so I could plot a graph...)
x = 1:2^24;
gen_result = zeros(1,length(x));
seed_vals = zeros(1,length(x));
for i = 1:length(x)
    result = 2^-ceil(log2(x(i))/2); %effectively creates seed value from top bit index
    seed_vals(i) = 1/result; %Store seed value
    for j = 1:6
        result = result*(1.5-0.5*x(i)*result^2); %reciprocal root
    end
    gen_result(i) = 1/result; %single division at the end
end
Unsurprisingly, the seed becomes wildly inaccurate each time the input crosses into the next power-of-two range, and the error grows as the magnitude of the input increases. As a graph this can be seen as:
The red line is the value of the seed, and as can be seen, it increases in powers of 2.
My question is very simple: are there any other simple methods I could use to generate a seed value for fixed-point square roots in VHDL, ideally ones which don't cause ever increasing amounts of inaccuracy (and hence require more iterations each time the input increases in size)?
Any other incidental advice on how to approach finding fixed-point square roots in VHDL would be gratefully received!
I realize this is an old question but I did end up here and this was kind of useful so I want to add my bit.
Assuming your Xilinx chip has an embedded multiplier, you could consider this approach to help get a better starting seed. The basic premise is to convert the input integer to fixed point with all fraction bits, and then use the embedded multiplier to scale half of your initial seed value by 0.X (which in hindsight is probably what people mean when they say "normalize to the region [0.5..1)", now that I think about it). It's basically piecewise linear interpolation of your existing seed method. The steps below should translate relatively easily to RTL, as they're just bit-shifts, adds, and one unsigned multiply.
1) Begin with your existing seed value (e.g. for x=9e6, you would generate s=4096 as the seed for your first guess with your "crude halving" method)
2) Right-shift the existing seed value by 1 to get the previous seed value (s_half = s >> 1 = 2048)
3) Left-shift the input until the most significant bit is a 1. In the event you are sqrting 32-bit ints, x_scale would then be 2304000000 = 0x89544000
4) Slice the upper e.g. 18 bits off of x_scale and multiply by an 18-bit version of s_half (I suggest 18 because I happen to know some Xilinx chips have embedded 18x18 multipliers). For this case, the result, x_scale(31 downto 14) = 140625 = 0x22551.
At least, that's what the multiplier thinks - we're going to use fixed point so that it's actually 0b0.100010010101010001 = 0.53644 instead of 140625.
The result of this multiplication will be s_scale = s_half * x_scale(31 downto 14) = 2048 * 140625 = 288000000, but this output is in 18.18 format (18 integer bits, 18 fraction bits). Take the upper 18 bits, and you get s_scale(35 downto 18) = 1098
5) Add the upper 18 bits of s_scale to s_half to get your improved seed, in this case s_improved = 1098+2048 = 3146
Now you can do a few iterations of Newton-Raphson with this seed. For x=9e6, your crude halving approach would give an initial seed of 4096, the fixed-point scale outlined above gives you 3146, and the actual sqrt(9e6) is 3000. This value is half-way between your seed steps, and my napkin math suggests it saved about 3 iterations of Newton-Raphson.
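Steps 1 to 5 above can be modelled bit-for-bit in software before committing to RTL. The following is only a sketch in Python: the name `improved_seed` and the parameters `width` and `mult_bits` are my own labels, and deriving the crude seed from the bit length is an assumption that matches the "crude halving" seed for inputs that are not exact powers of two.

```python
def improved_seed(x: int, width: int = 32, mult_bits: int = 18) -> int:
    """Software model of the piecewise-linear seed (steps 1 to 5 above)."""
    if x <= 0 or x >= (1 << width):
        raise ValueError("x must be a positive integer that fits in `width` bits")

    msb_len = x.bit_length()                 # index of the top set bit, plus one
    s = 1 << ((msb_len + 1) // 2)            # step 1: crude power-of-two seed (assumed form)
    s_half = s >> 1                          # step 2: the previous seed step
    x_scale = x << (width - msb_len)         # step 3: shift left until the MSB is 1
    frac = x_scale >> (width - mult_bits)    # step 4: top 18 bits, read as 0.xxx fixed point
    s_scale = (s_half * frac) >> mult_bits   # step 4: keep the upper bits of the product
    return s_half + s_scale                  # step 5: improved seed


print(improved_seed(9_000_000))  # reproduces the worked example: 3146
```

Only shifts, one add and one unsigned multiply are used, so each line should map onto the hardware operations described in the steps.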
I'm trying to apply the Fourier phase shift theorem to a complex signal in R. However, only the magnitude of my signal shifts as I expect it to. I think it should be possible to apply this theorem to complex signals, so I am probably making an error somewhere. My guess is that there is an error in the frequency axis I calculate.
How do I correctly apply the Fourier shift theorem to a complex signal (using R)?
i = complex(0,0,1)
t.in = (1+i)*matrix(c(1,0,0,0,0,0,0,0,0,0))
n.shift = 5
#the output of fft() has the mean / 0 frequency at the first element
#it then increases to the highest frequency, flips to negative frequencies
#and then increases again to the negative frequency closest to 0
N = length(t.in)
if (N%%2){#odd
  kmin = -(N-1)/2
  kmax = (N-1)/2
} else {#even
  kmin = -N/2
  kmax = N/2-1
  #center frequency negative, is that correct?
}
#create frequency axis for fft() output, no sampling frequency or sample duration needed
k = (kmin:kmax)
kflip = floor(N/2)
k = k[c((kflip+1):N,1:kflip)]
f = 2*pi*k/N
shiftterm = exp( -i*n.shift*f )
T.in = fft(t.in)
T.out = T.in*shiftterm
t.out = fft(T.out, inverse=T)/N
par(mfrow=c(2,2))
plot(Mod(t.in),col="green");
plot(Mod(t.out), col="red");
plot(Arg(t.in),col="green");
plot(Arg(t.out),col="red");
As you can see, the magnitude of the signal is nicely shifted, but the phase is scrambled. I think the negative frequencies are where my error is, but I can't see it.
What am I doing wrong?
The questions about fourier phase shift theorem I could find:
real 2d signal in python
real 2d signal in matlab
real 1d signal in python
math question about what fourier shift does
But these were not about complex signals.
Answer
As Steve suggested in the comments, I checked the phase on the 6th element.
> Arg(t.out)[6]
[1] 0.7853982
> Arg(t.in)[1]
[1] 0.7853982
So the only element that has a magnitude (at least one order of magnitude higher than the EPS) does have the phase that I expected.
TL;DR The result from the original approach in the question was already correct, we see the Gibbs Phenomenon sliding by.
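To make that check reproducible outside R, here is a minimal sketch in Python using a naive O(N^2) DFT from the standard library only. The function names `dft` and `fourier_shift` are hypothetical, and the signed frequency index is built to follow the same ordering as the frequency axis in the question (0 up to the highest positive frequency, then the negative frequencies).

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform; the inverse is scaled by 1/N."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def fourier_shift(x, shift):
    """Delay x by `shift` samples by multiplying its spectrum with a linear phase."""
    n = len(x)
    # Signed frequency indices in DFT output order, e.g. 0..4, -5..-1 for n = 10.
    ks = [k if k < (n + 1) // 2 else k - n for k in range(n)]
    spectrum = dft(x)
    shifted = [X * cmath.exp(-2j * math.pi * k * shift / n) for X, k in zip(spectrum, ks)]
    return dft(shifted, inverse=True)


t_in = [1 + 1j] + [0j] * 9          # complex impulse, as in the question
t_out = fourier_shift(t_in, 5)
print(abs(t_out[5]), cmath.phase(t_out[5]))  # magnitude and phase of the shifted impulse
```

For an integer shift the impulse moves to index 5 (the 6th element in R's 1-based indexing) and keeps its phase of pi/4; every other element is only rounding noise, so its phase carries no information.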
Just discard low magnitude elements?
If the phase of elements that should be zero ever becomes a problem, I can run t.out[Mod(t.out)<epsfactor*.Machine$double.eps] = 0, where in this case epsfactor has to be 10 to get rid of the '0' magnitude elements.
Adding that line before plotting gives the following result, which is what I expected to get beforehand. However, the 'scrambled' phase might actually be accurate in most cases as I'll explain below.
The original result really was correct
Just setting low magnitude elements to 0 does not make the phase of the shifted signal more intuitive however. This is a plot where I apply a 4.5 sample shift, the phase is still 'scrambled'.
Applying the Fourier shift is equivalent to downsampling a shifted Fourier interpolation
It occurred to me that applying a phase shift of a non-integer number of elements is equivalent to Fourier interpolating the signal and then downsampling the interpolated signal at points between the original elements. Since the vector I used as input is an impulse function, the Fourier interpolated signal is just not well behaved. The signal after applying the Fourier phase shift theorem can then be expected to have exactly the phase that the Fourier interpolated signal has, as seen below.
Gibbs Ringing
It's just at the discontinuities that the phase is not well behaved and that small rounding errors might cause large errors in the reconstructed phase. So this is not really related to low magnitude but to the Fourier transform of the input vector not being well defined. This is called Gibbs ringing; I could use low-pass filtering with a Gaussian filter to decrease it.
Questions related to fourier interpolation and phase shift
symbolic approach in R to estimate fourier transform error
non integer signal shift by use of linear interpolation
downsampling complex signal
fourier interpolation application
estimating sub-sample shift between two signals using fourier transforms
estimating sub-sample shift between two signals without interpolation
I want to calculate and plot the probability density of a wave function in Julia. I wrote a small snippet of Julia code for evaluating the following function:
The Julia (incomplete) code is:
set_bigfloat_precision(100)
A = 10
C = 5
m = BigFloat(9.10938356e-31)
ℏ = BigFloat(1.054571800e-34)
t = exp(-(sqrt(C * m) / ℏ))
The last line where I evaluate t gives 0.000000000000.... I tried to set the precision of the BigFloat as well. No luck! What am I doing wrong? Help appreciated.
In the comments, Chris Rackauckas has pointed out that you entered the formula wrong, but I figured the question was interesting enough to answer anyway.
Let's break it down so we can see what we are raising:
A = 10
C = 5
m = BigFloat(9.10938356e-31)
h = BigFloat(1.054571800e-34)
z = -sqrt(C * m)/h
t = exp(z)
So
z = -2.0237336022083455711032042949257e+19
so very roughly z = -2e19,
and roughly t = exp(-2e19) (i.e. t = 1/(e^(2*10^19))).
That is a very small number.
Consider that
exp(big"-1e+10") = 9.278...e-4342944820
and
exp(big"-1e+18") = 2.233...e-434294481903251828
and yes, julia says:
exp(big"-2e+19") = 0.0000
exp(big"-2e+19") is a very small number.
That puts us in context I hope. Very small number.
So julia depends on MPFR for BigFloats
You can try MPFR online. At precision 8192, exp(-2e10)=0
So same result.
Now, it is not the precision that we care about.
But rather the range of the exponent.
MPFR uses something kinda like IEEE style floats, where precision is the length of the mantissa, and then you have an exponent: 2^exponent * mantissa
So there is a limit on the range of the exponent.
See: MPFR docs:
Function: mpfr_exp_t mpfr_get_emin (void)
Function: mpfr_exp_t mpfr_get_emax (void)
Return the (current) smallest and largest exponents allowed for a floating-point variable. The smallest positive value of a floating-point variable is one half times 2 raised to the smallest exponent and the largest value has the form (1 - epsilon) times 2 raised to the largest exponent, where epsilon depends on the precision of the considered variable.
Now Julia does set these to the maximum range that a fairly default MPFR build will allow. I've been digging around the MPFR source trying to find where this is set, but can't find it. I believe it is related to the max value an Int64 can hold.
Base.MPFR.get_emin() = -4611686018427387903 =typemin(Int64)>>1 + 1
You can adjust this but only up.
So anyway
0.5*big"2.0"^(Base.MPFR.get_emin()) = 8.5096913117408361391297879096205e-1388255822130839284
but
0.5*big"2.0"^(Base.MPFR.get_emin()-1) = 0.00000000000...
Now we know that
exp(x) = 2^(log(2,e)*x)
So we can exp(z) = 2^(log(2,e)*z)
log(2,e)*z = -29196304319863382016
Base.MPFR.get_emin() = -4611686018427387903
So the exponent (roughly -2.9e19) is less than the minimum allowed exponent (roughly -4.6e18),
and an underflow occurs.
That is why you get zero.
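The comparison can be reproduced with ordinary double-precision arithmetic, since only the size of the exponent matters and not the digits of the result. This is a rough sketch in Python rather than Julia; `MPFR_EMIN` is simply the value of Base.MPFR.get_emin() quoted above, and the helper names are my own.

```python
import math

# Minimum exponent reported by Base.MPFR.get_emin() above (assumed unchanged).
MPFR_EMIN = -(2**62 - 1)  # -4611686018427387903


def binary_exponent_of_exp(z: float) -> float:
    """exp(z) == 2**(z * log2(e)), so this is the power of two exp(z) needs."""
    return z * math.log2(math.e)


def underflows(z: float, emin: int = MPFR_EMIN) -> bool:
    """Approximate check: ignores the off-by-one detail right at the boundary."""
    return binary_exponent_of_exp(z) < emin


C = 5
m = 9.10938356e-31
hbar = 1.054571800e-34
z = -math.sqrt(C * m) / hbar

print(z)                          # about -2.02e19
print(binary_exponent_of_exp(z))  # about -2.92e19
print(underflows(z))              # True: the exponent is below emin
print(underflows(-1e18))          # False: exp(-1e18) is still representable
```

This agrees with the examples above: exp(big"-1e+18") still gives a non-zero result, while exp(big"-2e+19") falls below the smallest representable exponent.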
It may (or may not) be possible to recompile MPFR with Int128 exponents, but Julia hasn't.
Perhaps Julia should throw an underflow exception.
Feel encouraged to report that as an issue on the Julia bug tracker.
I'm looking for a 2 or 3 parameters math formula with the following characteristics:
Simple (the fewest amount of operations the better)
Random output (non-periodic)
Normalized (Meaning the output will never be outside a given range; doesn't matter the range since once I know the range I can just divide and add/subtract to get it into the 0 to 1 range I'm looking for)
White noise (the more samples you get the more evenly distributed the outputs get across the range of possible output values, with no gaps or hotspots, to the extent permitted by the floating-point standard)
Random all the way down (no gradual changes between output values even if the inputs are changed by the smallest amount the float standard will allow. I understand that given the nature of randomness, it is possible two output values might be close together once in a while, but that must only happen by coincidence, and not because of smoothness or periodicity)
Uses only the operations listed below (but of course, any operations that can be done by a combination of the ones listed below are also allowed)
I need this because I need a good source of controllable randomness for some experiments I'm doing with Cycles material nodes in Blender. And since that is where the formula will be implemented, the only operations I have available are:
Addition
Subtraction
Multiplication
Division
Power (X to the power of Y)
Logarithm (I think it's X Log Y; I'm not very familiar with the logarithm operation, so I'm not 100% sure if that is enough to specify which type of logarithm it is; let me know if you need more information about it)
Sine
Cosine
Tangent
Arcsine
Arccosine
Arctangent (not Atan2, but that can be created by combining operations if necessary)
Minimum (Returns the lowest of 2 numbers)
Maximum (Returns the highest of 2 numbers)
Round (Returns the closest round number to the input)
Less-than (Returns 1 if X is less than Y, zero otherwise)
Greater-than (Returns 1 if X is more than Y, zero otherwise)
Modulo (Produces a sawtooth pattern of period Y; for positive X values it's in the 0 to Y range, and for negative values of X it's in the -Y to zero range)
Absolute (strips the sign of the input value, makes it positive if it was negative, doesn't do anything if it's already positive)
There is no iteration nor looping functionality available (and of course, branching can only be done by calculating all the branches and then doing something like multiplying the results of the branches not meant to be taken by zero and then adding the results of all of them together).
How can I round an excessively precise fraction to a less precise format that is more humanly readable?
I'm working with JPEG EXIF exposure time data extracted by MS' Windows Imaging Component. WIC returns exposure times in fractional form with separate ints for numerator and denominator.
WIC usually works as expected, but with some JPEGs, WIC returns exposure times in millionths of a second, meaning that instead of reporting e.g. a 1/135 second exposure time, it reports an exposure time of 7391/1000000 seconds. The difference between 1/135 and 7391/1000000 is quite small but the latter is not intuitive to most users. As such, I'd like to round overly precise exposure times to the nearest standard exposure times used in photography.
Is there a better way to do this other than using a lookup table of known-reasonable exposure times and finding the nearest match?
You can compute the continued fraction expansion of the large fraction. Then take one of the first convergents as your approximate fraction.
In your case, you get
7391/1000000 = [ 0; 135, 3, 2, ...]
so the first convergent is 1/135=0.0074074..., the next
1/(135+1/3) = 3/406 = 0.00738916256...
and the third
1/(135+1/(3+1/2)) = 1/(135+2/7) = 7/947 = 0.00739176346...
To compute the (first) coefficients of a continued fraction expansion, you start with xk=x0. Then iteratively apply the procedure
Separate xk=n+r into integer n and fractional part r.
The integer is the next coefficient ak. With the inverse of the fractional part you start this procedure anew, x(k+1) = 1/r.
Applied to the given number, this produces exactly the start of the sequence as above. Then reconstruct the rational expressions, and continue until the inverse of the square of the denominator is smaller than a given tolerance.
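The procedure above can be sketched in Python with exact integer arithmetic, so no floating-point rounding enters the coefficients. The function names `continued_fraction` and `convergents` are hypothetical, and `max_terms` is an assumed cut-off used in place of the tolerance test.

```python
from fractions import Fraction


def continued_fraction(num: int, den: int, max_terms: int = 8) -> list:
    """Return the first coefficients [a0; a1, a2, ...] of num/den."""
    coeffs = []
    while den != 0 and len(coeffs) < max_terms:
        a, r = divmod(num, den)   # integer part and remainder
        coeffs.append(a)
        num, den = den, r         # invert the fractional part and repeat
    return coeffs


def convergents(coeffs: list) -> list:
    """Rebuild the successive rational approximations from the coefficients."""
    h_prev, h = 1, coeffs[0]
    k_prev, k = 0, 1
    result = [Fraction(h, k)]
    for a in coeffs[1:]:
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        result.append(Fraction(h, k))
    return result


coeffs = continued_fraction(7391, 1_000_000, max_terms=4)
print(coeffs)               # [0, 135, 3, 2]
print(convergents(coeffs))  # 0, 1/135, 3/406, 7/947
```

For exposure times, stopping at the first convergent whose numerator is 1 (here 1/135) gives the familiar photographic form.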
Try this:
human_readable_denominator = int(0.5 + 1 / precise_exposure_time)
With the example you gave:
human_readable_denominator = int(0.5 + 1 / (7391/1000000))
= 135
This works well for exposure times less than 1/2 second. For longer exposure times, converting to a 1/X format doesn't make sense.
Phil
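As a small illustration of the rounding rule above, here is a sketch in Python that takes the separate numerator and denominator as WIC reports them. The function name `readable_exposure` is hypothetical, and the plain-seconds format for exposures of 1/2 second or longer is my own assumption, not part of the answer.

```python
def readable_exposure(numerator: int, denominator: int) -> str:
    """Format an exposure time as 1/X for short exposures, seconds otherwise."""
    if numerator <= 0 or denominator <= 0:
        raise ValueError("exposure time must be positive")
    seconds = numerator / denominator
    if seconds < 0.5:
        # Round 1/seconds to the nearest integer denominator.
        return f"1/{int(0.5 + 1 / seconds)}"
    # Assumed format for long exposures, where 1/X stops making sense.
    return f"{seconds:g} s"


print(readable_exposure(7391, 1_000_000))  # 1/135
```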
Take a look at approxRational in Haskell's Data.Ratio. You give it a number and an epsilon value, and it gives the nicest rational number within epsilon of that number. I imagine other languages have similar library functions, or you can translate the Haskell source for approxRational.
I'm writing a vertex shader at the moment, and I need some random numbers. Vertex shader hardware doesn't have logical/bit operations, so I cannot implement any of the standard random number generators.
Is it possible to make a random number generator using only standard arithmetic? The randomness doesn't have to be particularly good!
If you don't mind crappy randomness, a classic method is
x[n+1] = (x[n] * x[n] + C) mod N
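where C and N are constants, C != 0 and C != -2, and N is prime. This is a typical pseudorandom generator for Pollard Rho factoring. Try C = 1 and N = 8051, those work ok (note that 8051 = 83 * 97 is actually composite, so pick a prime N instead if primality matters to you).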
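Since the question rules out bit operations, here is a sketch of that recurrence in Python where the modulo is built from divide, floor, multiply and subtract only. The helper names are my own, and the use of floats is an assumption meant to mimic shader arithmetic; note that with N = 8051 the square can reach about 6.5e7, which is more than a 24-bit single-precision significand holds exactly, so a smaller N may be needed on real hardware.

```python
import math


def mod_arith(x: float, n: float) -> float:
    """Modulo expressed with plain arithmetic: x - n * floor(x / n)."""
    return x - n * math.floor(x / n)


def quadratic_sequence(seed: float, count: int, c: float = 1.0, n: float = 8051.0) -> list:
    """Iterate x <- (x*x + c) mod n and collect the outputs."""
    x = seed
    out = []
    for _ in range(count):
        x = mod_arith(x * x + c, n)
        out.append(x)
    return out


print(quadratic_sequence(2.0, 4))  # [5.0, 26.0, 677.0, 7474.0]
```

Dividing each output by N maps it into the range 0 to 1 if a normalised value is needed.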
Vertex shaders sometimes have built-in noise generators for you to use, such as Cg's noise() function.
Use a linear congruential generator:
X_(n+1) = (a * X_n + c) mod m
Those aren't that strong, but at least they are well known and can have long periods. The Wikipedia page also has good recommendations:
The period of a general LCG is at most
m, and for some choices of a much less
than that. The LCG will have a full
period if and only if:
1. c and m are relatively prime,
2. a - 1 is divisible by all prime factors of m,
3. a - 1 is a multiple of 4 if m is a multiple of 4
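The three conditions quoted above are easy to check mechanically. The following Python sketch tests them and, for small moduli, measures the actual period by brute force; all function names are hypothetical.

```python
from math import gcd


def prime_factors(n: int) -> set:
    """Distinct prime factors of n by trial division (fine for small n)."""
    factors = set()
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors


def has_full_period(a: int, c: int, m: int) -> bool:
    """Check the three full-period conditions for X_(n+1) = (a*X_n + c) mod m."""
    if gcd(c, m) != 1:
        return False
    if any((a - 1) % p for p in prime_factors(m)):
        return False
    if m % 4 == 0 and (a - 1) % 4 != 0:
        return False
    return True


def lcg_period(a: int, c: int, m: int, seed: int = 0):
    """Steps until the sequence returns to the seed, or None if it never does."""
    x = seed
    for steps in range(1, m + 1):
        x = (a * x + c) % m
        if x == seed:
            return steps
    return None


print(has_full_period(5, 1, 16), lcg_period(5, 1, 16))  # True 16
print(has_full_period(3, 1, 16), lcg_period(3, 1, 16))  # False 8
```

With a = 5 and c = 1 every power-of-two modulus passes the check, since a - 1 = 4 is divisible by 2 and is a multiple of 4.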
Believe it or not, I used newx = oldx * 5 + 1 (or a slight variation of it) in several videogames. The randomness is horrible--it's more of a scrambled sequence than a random generator. But sometimes that's all you need. If I recall correctly, it goes through all numbers before it repeats.
It has some terrible characteristics. It doesn't ever give you the same number twice in a row. A few of us did a bunch of tests on variations of it and we used some variations in other games.
We used it when there was no good modulo available to us. It's just a shift by two and two adds (or a multiply by 5 and one add). I would never use it nowadays for random numbers--I'd use an LCG--but maybe it would work OK for a shader where speed is crucial and your instruction set may be limited.