right now i have a bottle neck in my program I'm trying to write. i am trying to use a pinch gesture to control the scale of a UIImage. its the calculation of the scale that is causing the program to slow down and become choppy. below is the equation.
currentScale = (currentDistance / initialDistance) * scaleMod;
scaleMod is what ever the current scale was the user took their fingers off the screen. so the next time the user does a pinch the old scale is essentially the starting point of the new scaling action.
1) Can't you calculate scaleMod / initialDistance once while currentDistance is changing. That way you only have do that value times currentDistance, which removes a divide.
2) Make sure that this is actually the bottleneck. It most likely isn't, unless your doing something really wrong.
For any type of the three vars, this calculation can easily be done millions of times per second with little performance impact. Your problem is elsewhere.
If you fix the scaleMod and initialDistance to powers of 2 you could use shifts for faster multiplication and division.
See here for reference.
You could store scaleMod / initialDistance. When scaling is active (the user's fingers are still on the screen), multiply that value by currentDistance as needed.
Once the user has finished pinching, store the new scaleMod / initialDistance value for the next time pinching happens.
if you are doing computation with int (or other integer), see if he can do it using float precision. Floating-point divide is faster than integer (fewer bits to divide, assuming your CPU) has floating-point unit).
also, try to factor out division as multiplication by a reciprocal.
Check that InitialDistance != 0 first! :)
Related
Has anyone experiences replacing floating point operations on ATMega (2560) based systems? There are a couple of very common situations which happen every day.
For example:
Are comparisons faster than divisions/multiplications?
Are float to int type cast with followed multiplication/division faster than pure floating point operations without type cast?
I hope I don't have to make a benchmark just for me.
Example one:
int iPartialRes = (int)fArg1 * (int)fArg2;
iPartialRes *= iFoo;
faster as?:
float fPartialRes = fArg1 * fArg2;
fPartialRes *= iFoo;
And example two:
iSign = fVal < 0 ? -1 : 1;
faster as?:
iSign = fVal / fabs(fVal);
the questions could be solved just by thinking a moment about it.
AVRs does not have a FPU so all floating point related stuff is done in software --> fp multiplication involves much more than a simple int multiplication
since AVRs also does not have a integer division unit a simple branch is also much faster than a software division. if dividing floating points this is the worst worst case :)
but please note, that your first 2 examples produce very different results.
This is an old answer but I will submit this elaborated answer for the curious.
Just typecasting a float will truncate it ie; 3.7 will become 3, there is no rounding.
Fastest math on a 2560 will be (+,-,*) with divide being the slowest due to no hardware divide. Typecasting to an unsigned long int after multiplying all operands by a pseudo decimal point that suits your fractal number(1) range that your floats are expected to see and tracking the sign as a bool will give the best range/accuracy compromise.
If your loop needs to be as fast as possible, avoid even integer division, instead multiplying by a pseudo fraction instead and then doing your typecast back into a float with myFloat(defined elsewhere) = float(myPseudoFloat) / myPseudoDecimalConstant;
Not sure if you came across the Show info page in the playground. It's basically a sketch that runs a benchmark on your (insert Arduino model here) Shows the actual compute times for various things and systems. The Mega 2560 will be very close to an At Mega 328 as far as FLOPs goes, up to 12.5K/s (80uS per divide float). Typecasting would likely handicap the CPU more as it introduces more overhead and might even give erroneous results due to rounding errors and lack of precision.
(1)ie: 543.509,291 * 100000 = 543,509,291 will move the decimal 6 places to the maximum precision of a float on an 8-bit AVR. If you first multiply all values by the same constant like 1000, or 100000, etc, then the decimal point is preserved and then you cast it back to a float number by dividing by your decimal constant when you are ready to print or store it.
float f = 3.1428;
int x;
x = f * 10000;
x now contains 31428
I have been researching the log-sum-exp problem. I have a list of numbers stored as logarithms which I would like to sum and store in a logarithm.
the naive algorithm is
def naive(listOfLogs):
return math.log10(sum(10**x for x in listOfLogs))
many websites including:
logsumexp implementation in C?
and
http://machineintelligence.tumblr.com/post/4998477107/
recommend using
def recommend(listOfLogs):
maxLog = max(listOfLogs)
return maxLog + math.log10(sum(10**(x-maxLog) for x in listOfLogs))
aka
def recommend(listOfLogs):
maxLog = max(listOfLogs)
return maxLog + naive((x-maxLog) for x in listOfLogs)
what I don't understand is if recommended algorithm is better why should we call it recursively?
would that provide even more benefit?
def recursive(listOfLogs):
maxLog = max(listOfLogs)
return maxLog + recursive((x-maxLog) for x in listOfLogs)
while I'm asking are there other tricks to make this calculation more numerically stable?
Some background for others: when you're computing an expression of the following type directly
ln( exp(x_1) + exp(x_2) + ... )
you can run into two kinds of problems:
exp(x_i) can overflow (x_i is too big), resulting in numbers that you can't add together
exp(x_i) can underflow (x_i is too small), resulting in a bunch of zeroes
If all the values are big, or all are small, we can divide by some exp(const) and add const to the outside of the ln to get the same value. Thus if we can pick the right const, we can shift the values into some range to prevent overflow/underflow.
The OP's question is, why do we pick max(x_i) for this const instead of any other value? Why don't we recursively do this calculation, picking the max out of each subset and computing the logarithm repeatedly?
The answer: because it doesn't matter.
The reason? Let's say x_1 = 10 is big, and x_2 = -10 is small. (These numbers aren't even very large in magnitude, right?) The expression
ln( exp(10) + exp(-10) )
will give you a value very close to 10. If you don't believe me, go try it. In fact, in general, ln( exp(x_1) + exp(x_2) + ... ) will give be very close to max(x_i) if some particular x_i is much bigger than all the others. (As an aside, this functional form, asymptotically, actually lets you mathematically pick the maximum from a set of numbers.)
Hence, the reason we pick the max instead of any other value is because the smaller values will hardly affect the result. If they underflow, they would have been too small to affect the sum anyway, because it would be dominated by the largest number and anything close to it. In computing terms, the contribution of the small numbers will be less than an ulp after computing the ln. So there's no reason to waste time computing the expression for the smaller values recursively if they will be lost in your final result anyway.
If you wanted to be really persnickety about implementing this, you'd divide by exp(max(x_i) - some_constant) or so to 'center' the resulting values around 1 to avoid both overflow and underflow, and that might give you a few extra digits of precision in the result. But avoiding overflow is much more important about avoiding underflow, because the former determines the result and the latter doesn't, so it's much simpler just to do it this way.
Not really any better to do it recursively. The problem's just that you want to make sure your finite-precision arithmetic doesn't swamp the answer in noise. By dealing with the max on its own, you ensure that any junk is kept small in the final answer because the most significant component of it is guaranteed to get through.
Apologies for the waffly explanation. Try it with some numbers yourself (a sensible list to start with might be [1E-5,1E25,1E-5]) and see what happens to get a feel for it.
As you have defined it, your recursive function will never terminate. That's because ((x-maxlog) for x in listOfLogs) still has the same number of elements as listOfLogs.
I don't think that this is easily fixable either, without significantly impacting either the performance or the precision (compared to the non-recursive version).
How big should epsilon be when checking if dot product is close to 0?
I am working on raytracing project, and i need to check if the dot product is 0. But
that will probably never happen, so I want to take it as 0 if its value is in a small
area [-eps, +eps], but I am not sure how big should eps be?
Thanks
Since you describe this as part of a ray-tracing project, your accuracy needed is likely dictated by the "world coordinates" of a scene, or perhaps even the screen coordinates to which those are translated. Either of these would give an acceptable target for absolute error in your calculation.
It might be possible to backtrack from that to the accuracy required in the intermediate calculations you are doing, such as forming an inner product which is supposed "theoretically" to be zero. For example, you might be trying to find a shortest path (reflected light) between two smooth bodies, and the disappearance of the inner product (perpendicularity) gives the location of a point.
In a case like that the inner product may be a quadratic in the unknowns (location of a point) that you seek. It's possible that the unknowns form a "double root" (zero of multiplicity 2), making the location of that root extra sensitive to the computation of the inner product being zero.
For such cases you would want to get roughly twice the number of digits "zero" in the inner product as needed in the accuracy of the location. Essentially the inner product changes very slowly with location in the neighborhood of a double root.
But your application might not be so sensitive; analysis of the algorithm involved is necessary to give you a good answer. As a general rule I do the inner product in double precision to get answers that may be reliable as far as single precision, but this may be too costly if the ray-tracing is to be done in real time.
There is no definitive answer. I use two approaches.
If you are only concerned by floating point error then you can use a pretty small value, comparable to the smallest floating number that the compiler can handle. In c/c++ you can use the definitions provided in float.h, such as DBL_MIN to check for these numbers. I'd use a small multiple of the number, e.g., 10. * DBL_MIN as the value for eps.
If the problem is not floating point math rounding error, then I use a value that is small (say 1%) compared to the modulus of the smallest vector.
we have a particle detector hard-wired to use 16-bit and 8-bit buffers. Every now and then, there are certain [predicted] peaks of particle fluxes passing through it; that's okay. What is not okay is that these fluxes usually reach magnitudes above the capacity of the buffers to store them; thus, overflows occur. On a chart, they look like the flux suddenly drops and begins growing again. Can you propose a [mostly] accurate method of detecting points of data suffering from an overflow?
P.S. The detector is physically inaccessible, so fixing it the 'right way' by replacing the buffers doesn't seem to be an option.
Update: Some clarifications as requested. We use python at the data processing facility; the technology used in the detector itself is pretty obscure (treat it as if it was developed by a completely unrelated third party), but it is definitely unsophisticated, i.e. not running a 'real' OS, just some low-level stuff to record the detector readings and to respond to remote commands like power cycle. Memory corruption and other problems are not an issue right now. The overflows occur simply because the designer of the detector used 16-bit buffers for counting the particle flux, and sometimes the flux exceeds 65535 particles per second.
Update 2: As several readers have pointed out, the intended solution would have something to do with analyzing the flux profile to detect sharp declines (e.g. by an order of magnitude) in an attempt to separate them from normal fluctuations. Another problem arises: can restorations (points where the original flux drops below the overflowing level) be detected by simply running the correction program against the reverted (by the x axis) flux profile?
int32[] unwrap(int16[] x)
{
// this is pseudocode
int32[] y = new int32[x.length];
y[0] = x[0];
for (i = 1:x.length-1)
{
y[i] = y[i-1] + sign_extend(x[i]-x[i-1]);
// works fine as long as the "real" value of x[i] and x[i-1]
// differ by less than 1/2 of the span of allowable values
// of x's storage type (=32768 in the case of int16)
// Otherwise there is ambiguity.
}
return y;
}
int32 sign_extend(int16 x)
{
return (int32)x; // works properly in Java and in most C compilers
}
// exercise for the reader to write similar code to unwrap 8-bit arrays
// to a 16-bit or 32-bit array
Of course, ideally you'd fix the detector software to max out at 65535 to prevent wraparound of the sort that is causing your grief. I understand that this isn't always possible, or at least isn't always possible to do quickly.
When the particle flux exceeds 65535, does it do so quickly, or does the flux gradually increase and then gradually decrease? This makes a difference in what algorithm you might use to detect this. For example, if the flux goes up slowly enough:
true flux measurement
5000 5000
10000 10000
30000 30000
50000 50000
70000 4465
90000 24465
60000 60000
30000 30000
10000 10000
then you'll tend to have a large negative drop at times when you have overflowed. A much larger negative drop than you'll have at any other time. This can serve as a signal that you've overflowed. To find the end of the overflow time period, you could look for a large jump to a value not too far from 65535.
All of this depends on the maximum true flux that is possible and on how rapidly the flux rises and falls. For example, is it possible to get more than 128k counts in one measurement period? Is it possible for one measurement to be 5000 and the next measurement to be 50000? If the data is not well-behaved enough, you may be able to make only statistical judgment about when you have overflowed.
Your question needs to provide more information about your implementation - what language/framework are you using?
Data overflows in software (which is what I think you're talking about) are bad practice and should be avoided. While you are seeing (strange data output) is only one side effect that is possible when experiencing data overflows, but it is merely the tip of the iceberg of the sorts of issues you can see.
You could quite easily experience more serious issues like memory corruption, which can cause programs to crash loudly, or worse, obscurely.
Is there any validation you can do to prevent the overflows from occurring in the first place?
I really don't think you can fix it without fixing the underlying buffers. How are you supposed to tell the difference between the sequences of values (0, 1, 2, 1, 0) and (0, 1, 65538, 1, 0)? You can't.
How about using an HMM where the hidden state is whether you are in an overflow and the emissions are observed particle flux?
The tricky part would be coming up with the probability models for the transitions (which will basically encode the time-scale of peaks) and for the emissions (which you can build if you know how the flux behaves and how overflow affects measurement). These are domain-specific questions, so there probably aren't ready-made solutions out there.
But one you have the model, everything else---fitting your data, quantifying uncertainty, simulation, etc.---is routine.
You can only do this if the actual jumps between successive values are much smaller than 65536. Otherwise, an overflow-induced valley artifact is indistinguishable from a real valley, you can only guess. You can try to match overflows to corresponding restorations, by simultaneously analysing a signal from the right and the left (assuming that there is a recognizable base line).
Other than that, all you can do is to adjust your experiment by repeating it with different original particle flows, so that real valleys will not move, but artifact ones move to the point of overflow.
I have balls bouncing around and each time they collide their speed vector is reduced by the Coefficient of Restitution.
Right now my balls CoR for my balls is .80 . So after many bounces my balls have "stopped" rolling because their speed has becoming some ridiculously small number.
In what stage is it appropriate to check if a speed value is small enough to simply call it zero (so I don't have the crazy jittering of the balls reacting to their micro-velocities). I've read on some forums before that people will sometimes use an epsilon constant, some small number and check against that.
Should I define an epsilon constant and do something like:
if Math.abs(velocity.x) < epsilon then velocity.x = 0
Each time I update the balls velocity and position? Is this what is generally done? Would it be reasonable to place that in my Vector classes setters for x and y? Or should I do it outside of my vector class when I'm calculating the velocities.
Also, what would be a reasonable epsilon value if I was using floats for my speed vector?
A reasonable value for epsilon is going to depend on the constraints of your system. If you are representing the ball graphically, then your epsilon might correspond to, say, a velocity of .1 pixels a second (ensuring that your notion of stopping matches the user's experience of the screen objects stopping). If you're doing a physics simulation, you'll want to tune it to the accuracy to which you're trying to measure your system.
As for how often you check - that depends as well. If you're simulating something in real time, the extra check might be costly, and you'll want to check every 10 updates or once per second or something. Or performance might not be an issue, and you can check with every update.
Instead of an epsilon for an IsStillMoving function, maybe you could use an UpdatePosition function, scheduled on an object-by-object basis based on its velocity.
I'd do something like this (in my own make-it-up-as-you-go pseudocode):
void UpdatePosition(Ball b) {
TimeStamp now = Clock.GetTime();
float secondsSinceLastUpdate = now.TimeSince(b.LastUpdate).InSeconds;
Point3D oldPosition = b.Position;
Point3D newPosition = CalculatePosition(b.Position, b.Velocity, interval);
b.MoveTo(newPosition);
float epsilonOfAccuracy = 0.5; // Accurate to one half-pixel
float pixelDistance = Camera.PixelDistance(oldPosition, newPosition);
float fps = System.CurrentFramesPerSecond;
float secondsToMoveOnePixel = (pixelDistance * secondsSinceLastUpdate) / fps;
float nextUpdateInterval = secondsToMoveOnePixel / epsilonOfAccuracy;
b.SetNextUpdateAt(now + nextUpdateInterval);
}
Balls moving very quickly would get updated on every frame. Balls moving more slowly might update every five or ten frames. And balls that have stopped (or nearly stopped) would update only very very rarely.
IMO your epsilon approach is fine. I would just experiment to see what looks or feels natural to the animation in the game.
Epsilon by nature is the smallest possible increment. Unfortunately, computers have different "minimal" increments of their own depending on the floating point representation. I would be very careful (and might even go higher than what I would calculate just for safety) playing around with that, especially if I want a code to be portable.
You may want to write a function that figures out the minimal increment on your floats rather than use a magic value.