Related
I have been working for a couple of months with OpenMDAO and I find myself struggling with my code when I want to impose conditions for trying to replicate a physical/engineering behaviour.
I have tried using sigmoid functions, but I am still not convinced with that, due to the difficulty about trading off sensibility and numerical stabilization. Most of times I found overflows in exp so I end up including other conditionals (like np.where) so loosing linearity.
outputs['sigmoid'] = 1 / (1 + np.exp(-x))
I was looking for another kind of step function or something like that, able to keep linearity and derivability to the ease of the optimization. I don't know if something like that exists or if there is any strategy that can help me. If it helps, I am working with an OpenConcept benchmark, which uses vectorized computations ans Simpson's rule numerical integration.
Thank you very much.
PD: This is my first ever question in stackoverflow, so I would like to apologyze in advance for any error or bad practice commited. Hope to eventually collaborate and become active in the community.
Update after Justin answer:
I will take the opportunity to define a little bit more my problem and the strategy I tried. I am trying to monitorize and control thermodynamics conditions inside a tank. One of the things is to take actions when pressure P1 reaches certein threshold P2, for defining this:
eval= (inputs['P1'] - inputs['P2']) / (inputs['P1'] + inputs['P2'])
# P2 = threshold [Pa]
# P1 = calculated pressure [Pa]
k=100 #steepness control
outputs['sigmoid'] = (1 / (1 + np.exp(-eval * k)))
eval was defined in order avoid overflows normalizing the values, so when the threshold is recahed, corrections are taken. In a very similar way, I defined a function to check if there is still mass (so flowing can continue between systems):
eval= inputs['mass']/inputs['max']
k=50
outputs['sigmoid'] = (1 / (1 + np.exp(-eval*k)))**3
maxis also used for normalizing the value and the exponent is added for reaching zero before entering in the negative domain.
PLot (sorry it seems I cannot post images yet for my reputation)
It may be important to highlight that both mass and pressure are calculated from coupled ODE integration, in which this activation functions take part. I guess OpenConcept nature 'explore' a lot of possible values before arriving the solution, so most of the times giving negative infeasible values for massand pressure and creating overflows. For that sometimes I try to include:
eval[np.where(eval > 1.5)] = 1.5
eval[np.where(eval < -1.5)] = -1.5
That is not a beautiful but sometimes effective solution. I try to avoid using it since I taste that this bounds difficult solver and optimizer work.
I could give you a more complete answer if you distilled your question down to a specific code example of the function you're wrestling with and its expected input range. If you provide that code-sample, I'll update my answer.
Broadly, this is a common challenge when using gradient based optimization. You want some kind of behavior like an if-condition to turn something on/off and in many cases thats a fundamentally discontinuous function.
To work around that we often use sigmoid functions, but these do have some of the numerical challenges you pointed out. You could try a hyberbolic tangent as an alternative, though it may suffer the same kinds of problems.
I will give you two broad options:
Option 1
sometimes its ok (even if not ideal) to leave the purely discrete conditional in the code. Lets say you wanted to represent a kind of simple piecewise function:
y = 2x; x>=0
y = 0; x < 0
There is a sharp corner in that function right at 0. That corner is not differentiable, but the function is fine everywhere else. This is very much like the absolute value function in practice, though you might not draw the analogy looking at the piecewise definition of the function because the piecewise nature of abs is often hidden from you.
If you know (or at least can check after the fact) that your final answer will no lie right on or very near to that C1 discontinuity, then its probably fine to leave the code the way is is. Your derivatives will be well defined everywhere but right at 0 and you can simply pick the left or the right answer for 0.
Its not strictly mathematically correct, but it works fine as long as you're not ending up stuck right there.
Option 2
Apply a smoothing function. This can be a sigmoid, or a simple polynomial. The exact nature of the smoothing function is highly specific to the kind of discontinuity you are trying to approximate.
In the case of the piecewise function above, you might be tempted to define that function as:
2x*sig(x)
That would give you roughly the correct behavior, and would be differentiable everywhere. But wolfram alpha shows that it actually undershoots a little. Thats probably undesirable, so you can increase the exponent to mitigate that. This however, is where you start to get underflow and overflow problems.
So to work around that, and make a better behaved function all around, you could instead defined a three part piecewise polynomial:
y = 2x; x>=a
y = c0 + c1*x + c2*x**2; -a <= x < a
y = 0 x < -a
you can solve for the coefficients as a function of a (please double check my algebra before using this!):
c0 = 1.5a
c1 = 2
c2 = 1/(2a)
The nice thing about this approach is that it will never overshoot and go negative. You can also make a reasonably small and still get decent numerics. But if you try to make it too small, c2 will obviously blow up.
In general, I consider the sigmoid function to be a bit of a blunt instrument. It works fine in many cases, but if you try to make it approximate a step function too closely, its a nightmare. If you want to represent physical processes, I find polynomial fillet functions work more nicely.
It takes a little effort to derive that polynomial, because you want it to be c1 continuous on both sides of the curve. So you have to construct the system of equations to solve for it as a function of the polynomial order and the specific relaxation you want (0.1 here).
My goto has generally been to consult the table of activation functions on wikipedia: https://en.wikipedia.org/wiki/Activation_function
I've had good luck with sigmoid and the hyperbolic tangent, scaling them such that we can choose the lower and upper values as well as choosing the location of the activation on the x-axis and the steepness.
Dymos uses a vectorization that I think is similar to OpenConcept and I've had success with numpy.where there as well, providing derivatives for each possible "branch" taken. It is true that you may have issues with derivative mismatches if you have an analysis point right on the transition, but often I've had success despite that. If the derivative at the transition becomes a hinderance then implementing a sigmoid or relu are more appropriate.
If x is of a magnitude such that it can cause overflows, consider applying units or using scaling to put it within reasonable limits if you cannot bound it directly.
I'm trying to calculate rewards for liquidity providers and I found this equation that Uniswap apparently uses:
Basic Formula (L = liquidity): (L_you / L_others) * (24h_swap_volume * pool_fee_rate)
And I'm trying to implement this in my smart contract but I can't seem to be able to because the liquidity held by others will always be larger than the liquidity you hold which requires a decimal value so my question is: How do I use this equation in a Solidity smart contract without falling into floating-point hell?
After doing a little more digging I found that you can use ABDKMath to achieve this you can find more information using the links below:
Library: https://github.com/abdk-consulting/abdk-libraries-solidity
Article: https://www.bitcoininsider.org/article/68630/10x-better-fixed-point-math-solidity
As a code example here is a quick snippet of how I implemented this:
uint256 fee = ABDKMath64x64.mulu(
ABDKMath64x64.divu(
tokensProvided,
others
),
taxableValue
);
This is probably not the best solution but it worked for me. Also, take a look at https://github.com/paulrberg/prb-math for a higher precision fixed-point math library.
Regarding gas efficiency here is a quote from the PRB math lib:
The typeless PRBMath library is faster than ABDKMath for abs, exp,
exp2, gm, inv, ln, log2. Conversely, it is slower than ABDKMath for
avg, div, mul, powu and sqrt. There are two technical reasons why
PRBMath lags behind ABDKMath's mul and div functions
So for this use case, I think it is a better idea to use ABDK Math instead of PRB math unless you plan on transferring more than 2 to the 128th power of Wei.
This is a continuation of my questions:
Declaring a functional recursive sequence in Matlab
Is there a more efficient way of nesting logarithms?
Nesting a specific recursion in Pari-GP
But I'll keep this question self contained. I have made a coding project for myself; which is to program a working simple calculator for a tetration function I've constructed. This tetration function is holomorphic, and stated not to be Kneser's solution (as to all the jargon, ignore); long story short, I need to run the numbers; to win over the nay-sayers.
As to this, I have to use Pari-GP; as this is a fantastic language for handling large numbers and algebraic expressions. As we are dealing with tetration (think numbers of the order e^e^e^e^e^e); this language is, of the few that exist, the best for such affairs. It is the favourite when doing iterated exponential computations.
Now, the trouble I am facing is odd. It is not so much that my code doesn't work; it's that it's overflowing because it should over flow (think, we're getting inputs like e^e^e^e^e^e; and no computer can handle it properly). I'll post the first batch of code, before I dive deeper.
The following code works perfectly; and does everything I want. The trouble is with the next batch of code. This produces all the numbers I want.
\\This is the asymptotic solution to tetration. z is the variable, l is the multiplier, and n is the depth of recursion
\\Warning: z with large real part looks like tetration; and therefore overflows very fast. Additionally there are singularities which occur where l*(z-j) = (2k+1)*Pi*I.
\\j,k are integers
beta_function(z,l,n) =
{
my(out = 0);
for(i=0,n-1,
out = exp(out)/(exp(l*(n-i-z)) +1));
out;
}
\\This is the error between the asymptotic tetration and the tetration. This is pretty much good for 200 digit accuracy if you need.
\\modify the 0.000000001 to a bigger number to make this go faster and receive less precision. When graphing 0.0001 is enough
\\Warning: This will blow up at some points. This is part of the math; these functions have singularities/branch cuts.
tau(z,l,n)={
if(1/real(beta_function(z,l,n)) <= 0.000000001, //this is where we'll have problems; if I try to grab a taylor series with this condition we error out
-log(1+exp(-l*z)),
log(1 + tau(z+1,l,n)/beta_function(z+1,l,n)) - log(1+exp(-l*z))
)
}
\\This is the sum function. I occasionally modify it; to make better graphs, but the basis is this.
Abl(z,l,n) = {
beta_function(z,l,n) + tau(z,l,n)
}
Plugging this in, you get the following expressions:
Abl(1,log(2),100)
realprecision = 28 significant digits (20 digits displayed)
%109 = 0.15201551563214167060
exp(Abl(0,log(2),100))
%110 = 0.15201551563214167060
Abl(1+I,2+0.5*I,100)
%111 = 0.28416643148885326261 + 0.80115283113944703984*I
exp(Abl(0+I,2+0.5*I,100))
%112 = 0.28416643148885326261 + 0.80115283113944703984*I
And so on and so forth; where Abl(z,l,n) = exp(Abl(z-1,l,n)). There's no problem with this code. Absolutely none at all; we can set this to 200 precision and it'll still produce correct results. The graphs behave exactly as the math says they should behave. The problem is, in my construction of tetration (the one we actually want); we have to sort of paste together the solutions of Abl(z,l,n) across the value l. Now, you don't have to worry about any of that at all; but, mathematically, this is what we're doing.
This is the second batch of code; which is designed to "paste together" all these Abl(z,l,n) into one function.
//This is the modified asymptotic solution to the Tetration equation.
beta(z,n) = {
beta_function(z,1/sqrt(1+z),n);
}
//This is the Tetration function.
Tet(z,n) ={
if(1/abs(beta_function(z,1/sqrt(1+z),n)) <= 0.00000001,//Again, we see here this if statement; and we can't have this.
beta_function(z,1/sqrt(1+z),n),
log(Tet(z+1,n))
)
}
This code works perfectly for real-values; and for complex values. Some sample values,
Tet(1+I,100)
%113 = 0.12572857262453957030 - 0.96147559586703141524*I
exp(Tet(0+I,100))
%114 = 0.12572857262453957030 - 0.96147559586703141524*I
Tet(0.5,100)
%115 = -0.64593666417664607364
exp(Tet(0.5,100))
%116 = 0.52417133958039107545
Tet(1.5,100)
%117 = 0.52417133958039107545
We can also effectively graph this object on the real-line. Which just looks like the following,
ploth(X=0,4,Tet(X,100))
Now, you may be asking; What's the problem then?
If you try and plot this function in the complex plane, it's doomed to fail. The nested logarithms produce too many singularities near the real line. For imaginary arguments away from the real-line, there's no problem. And I've produced some nice graphs; but the closer you get to the real line; the more it misbehaves and just short circuits. You may be thinking; well then, the math is wrong! But, no, the reason this is happening is because Kneser's tetration is the only tetration that is stable about the principal branch of the logarithm. Since this tetration IS NOT Kneser's tetration, it's inherently unstable about the principal branch of the logarithm. Of course, Pari just chooses the principal branch. So when I do log(log(log(log(log(beta(z+5,100)))))); the math already says this will diverge. But on the real line; it's perfectly adequate. And for values of z with an imaginary argument away from zero, we're fine too.
So, how I want to solve this, is to grab the Taylor series at Tet(1+z,100); which Pari-GP is perfect for. The trouble?
Tet(1+z,100)
*** at top-level: Tet(1+z,100)
*** ^------------
*** in function Tet: ...unction(z,1/sqrt(1+z),n))<=0.00000001,beta_fun
*** ^---------------------
*** _<=_: forbidden comparison t_SER , t_REAL.
The numerical comparison I've done doesn't translate to a comparison between t_SER and t_REAL.
So, my question, at long last: what is an effective strategy at getting the Taylor series of Tet(1+z,100) using only real inputs. The complex inputs near z=0 are erroneous; the real values are not. And if my math is right; we can take the derivatives along the real-line and get the right result. Then, we can construct a Tet_taylor(z,n) which is just the Taylor Series expansion. Which; will most definitely have no errors when trying to graph.
Any help, questions, comments, suggestions--anything, is greatly appreciated! I really need some outside eyes on this.
Thanks so much if you got to the bottom of this post. This one is bugging me.
Regards, James
EDIT:
I should add that a Tet(z+c,100) for some number c is the actual tetration function we want. There is a shifting constant I haven't talked about yet. Nonetheless; this is spurious to the question, and is more a mathematical point.
This is definitely not an answer - I have absolutely no clue what you are trying to do. However, I see no harm in offering suggestions. PARI has a built in type for power series (essentially Taylor series) - and is very good at working with them (many operations are supported). I was originally going to offer some suggestions on how to get a Taylor series out of a recursive definition using your functions as an example - but in this case, I'm thinking that you are trying to expand around a singularity which might be doomed to failure. (On your plot it seems as x->0, the result goes to -infinity???)
In particular if I compute:
log(beta(z+1, 100))
log(log(beta(z+2, 100)))
log(log(log(beta(z+3, 100))))
log(log(log(log(beta(z+4, 100)))))
...
The different series are not converging to anything. Even the constant term of the series is getting smaller with each iteration, so I am not entirely sure there is even a Taylor series expansion about x = 0.
Questions/suggestions:
Should you be expanding about a different point? (say where the curve
crosses the x-axis).
Does the Taylor series satisfy some recursive relation? For example: A(z) = log(A(z+1)). [This doesn't work, but perhaps there is another way to write it].
I suspect my answer is unlikely to be satisfactory - but then again your question is more mathematical than a practical programming problem.
So I've successfully answered my question. I haven't programmed in so long; I'm kind of shoddy. But I figured it out after enough coffee. I created 3 new functions, which allow me to grab the Taylor series.
\\This function attempts to find the number of iterations we need.
Tet_GRAB_k(A,n) ={
my(k=0);
while( 1/real(beta(A+k,n)) >= 0.0001, k++);
return(k);
}
\\This function will run and produce the same results as Tet; but it's slower; but it let's us estimate Taylor coefficients.
\\You have to guess which k to use for whatever accuracy before overflowing; which is what the last function is good for.
Tet_taylor(z,n,k) = {
my(val = beta(z+k,n));
for(i=1,k,val = log(val));
return(val);
}
\\This function produces an array of all the coefficients about a value A.
TAYLOR_SERIES(A,n) = {
my(ser = vector(40,i,0));
for(i=1,40, ser[i] = polcoeff(Tet_taylor(A+z,n,Tet_GRAB_k(A,n)),i-1,z));
return(ser);
}
After running the numbers, I'm confident this works. The Taylor series is converging; albeit rather slowly and slightly less accurately than desired; but this will have to do.
Thanks to anyone who read this. I'm just answering this question for completeness.
I am suddenly in a recursive language class (sml) and recursion is not yet physically sensible for me. I'm thinking about the way a floor of square tiles is sometimes a model or metaphor for integer multiplication, or Cuisenaire Rods are a model or analogue for addition and subtraction. Does anyone have any such models you could share?
Imagine you're a real life magician, and can make a copy of yourself. You create your double a step closer to the goal and give him (or her) the same orders as you were given.
Your double does the same to his copy. He's a magician too, you see.
When the final copy finds itself created at the goal, it has nowhere more to go, so it reports back to its creator. Which does the same.
Eventually, you get your answer back – without having moved an inch – and can now create the final result from it, easily. You get to pretend not knowing about all those doubles doing the actual hard work for you. "Hmm," you're saying to yourself, "what if I were one step closer to the goal and already knew the result? Wouldn't it be easy to find the final answer then ?" (*)
Of course, if you were a double, you'd have to report your findings to your creator.
More here.
(also, I think I saw this "doubles" creation chain event here, though I'm not entirely sure).
(*) and that is the essence of the recursion method of problem solving.
How do I know my procedure is right? If my simple little combination step produces a valid solution, under assumption it produced the correct solution for the smaller case, all I need is to make sure it works for the smallest case – the base case – and then by induction the validity is proven!
Another possibility is divide-and-conquer, where we split our problem in two halves, so will get to the base case much much faster. As long as the combination step is simple (and preserves validity of solution of course), it works. In our magician metaphor, I get to create two copies of myself, and combine their two answers into one when they are finished. Each of them creates two copies of themselves as well, so this creates a branching tree of magicians, instead of a simple line as before.
A good example is the Sierpinski triangle which is a figure that is built from three quarter-sized Sierpinski triangles simply, by stacking them up at their corners.
Each of the three component triangles is built according to the same recipe.
Although it doesn't have the base case, and so the recursion is unbounded (bottomless; infinite), any finite representation of S.T. will presumably draw just a dot in place of the S.T. which is too small (serving as the base case, stopping the recursion).
There's a nice picture of it in the linked Wikipedia article.
Recursively drawing an S.T. without the size limit will never draw anything on screen! For mathematicians recursion may be great, engineers though should be more cautious about it. :)
Switching to corecursion ⁄ iteration (see the linked answer for that), we would first draw the outlines, and the interiors after that; so even without the size limit the picture would appear pretty quickly. The program would then be busy without any noticeable effect, but that's better than the empty screen.
I came across this piece from Edsger W. Dijkstra; he tells how his child grabbed recursions:
A few years later a five-year old son would show me how smoothly the idea of recursion comes to the unspoilt mind. Walking with me in the middle of town he suddenly remarked to me, Daddy, not every boat has a lifeboat, has it? I said How come? Well, the lifeboat could have a smaller lifeboat, but then that would be without one.
I love this question and couldn't resist to add an answer...
Recursion is the russian doll of programming. The first example that come to my mind is closer to an example of mutual recursion :
Mutual recursion everyday example
Mutual recursion is a particular case of recursion (but sometimes it's easier to understand from a particular case than from a generic one) when we have two function A and B defined like A calls B and B calls A. You can experiment this very easily using a webcam (it also works with 2 mirrors):
display the webcam output on your screen with VLC, or any software that can do it.
Point your webcam to the screen.
The screen will progressively display an infinite "vortex" of screen.
What happens ?
The webcam (A) capture the screen (B)
The screen display the image captured by the webcam (the screen itself).
The webcam capture the screen with a screen displayed on it.
The screen display that image (now there are two screens displayed)
And so on.
You finally end up with such an image (yes, my webcam is total crap):
"Simple" recursion is more or less the same except that there is only one actor (function) that calls itself (A calls A)
"Simple" Recursion
That's more or less the same answer as #WillNess but with a little code and some interactivity (using the js snippets of SO)
Let's say you are a very motivated gold-miner looking for gold, with a very tiny mine, so tiny that you can only look for gold vertically. And so you dig, and you check for gold. If you find some, you don't have to dig anymore, just take the gold and go. But if you don't, that means you have to dig deeper. So there are only two things that can stop you:
Finding some gold nugget.
The Earth's boiling kernel of melted iron.
So if you want to write this programmatically -using recursion-, that could be something like this :
// This function only generates a probability of 1/10
function checkForGold() {
let rnd = Math.round(Math.random() * 10);
return rnd === 1;
}
function digUntilYouFind() {
if (checkForGold()) {
return 1; // he found something, no need to dig deeper
}
// gold not found, digging deeper
return digUntilYouFind();
}
let gold = digUntilYouFind();
console.log(`${gold} nugget found`);
Or with a little more interactivity :
// This function only generates a probability of 1/10
function checkForGold() {
console.log("checking...");
let rnd = Math.round(Math.random() * 10);
return rnd === 1;
}
function digUntilYouFind() {
if (checkForGold()) {
console.log("OMG, I found something !")
return 1;
}
try {
console.log("digging...");
return digUntilYouFind();
} finally {
console.log("climbing back...");
}
}
let gold = digUntilYouFind();
console.log(`${gold} nugget found`);
If we don't find some gold, the digUntilYouFind function calls itself. When the miner "climbs back" from his mine it's actually the deepest child call to the function returning the gold nugget through all its parents (the call stack) until the value can be assigned to the gold variable.
Here the probability is high enough to avoid the miner to dig to the earth kernel. The earth kernel is to the miner what the stack size is to a program. When the miner comes to the kernel he dies in terrible pain, when the program exceed the stack size (causes a stack overflow), it crashes.
There are optimization that can be made by the compiler/interpreter to allow infinite level of recursion like tail-call optimization.
Take fractals as being recursive: the same pattern get applied each time, yet each figure differs from another.
As natural phenomena with fractal features, Wikipedia presents:
Moutain ranges
Frost crystals
DNA
and, even, proteins.
This is odd, and not quite a physical example except insofar as dance-movement is physical. It occurred to me the other morning. I call it "Written in Latin, solved in Hebrew." Huh? Surely you are saying "Huh?"
By it I mean that encoding a recursion is usually done left-to-right, in the Latin alphabet style: "Def fac(n) = n*(fac(n-1))." The movement style is "outermost case to base case."
But (please check me on this) at least in this simple case, it seems the easiest way to evaluate it is right-to-left, in the Hebrew alphabet style: Start from the base case and move outward to the outermost case:
(fac(0) = 1)
(fac(1) = 1)*(fac(0) = 1)
(fac(2))*(fac(1) = 1)*(fac(0) = 1)
(fac(n)*(fac(n-1)*...*(fac(2))*(fac(1) = 1)*(fac(0) = 1)
(* Easier order to calculate <<<<<<<<<<< is leftwards,
base outwards to outermost case;
more difficult order to calculate >>>>>> is rightwards,
outermost case to base *)
Then you do not have to suspend items on the left while awaiting the results of calculations further right. "Dance Leftwards" instead of "Dance rightwards"?
The typical FFT for audio looks pretty similar to this, with most of the action happening on the far left side
http://www.flight404.com/blog/images/fft.jpg
He multiplied it by a partial sine wave to get it to the bottom, but the article isn't too specific on this part of it. It also seems like a "good enough" modification of the dataset, rather than one based on some property. I understand that human hearing is better suited to the higher frequencies, thus, most music will have amplified bass and attenuated treble so that both sound to us as being of relatively equal strength.
My question is what modification needs to be done to the FFT to compensate for this standard falloff?
for(i = 0; i < fft.length; i++){
fft[i] = fft[i] * Math.log(i + 1); // does, eh, ok but the high
// end is still not really "loud"
// enough
}
EDIT ::
http://en.wikipedia.org/wiki/Equal-loudness_contour
I came across this article, I think it might be the direction to head in, but there still might be some property of an FFT that needs to be counteracte.
First, are you sure you want to do this? It makes sense to compensate for some things, like the microphone response not being flat, but not human perception. People are used to hearing sounds with the spectral content that the sounds have in the real world, not along perceptual equal loudness curves. If you play a sound that you've modified in the way you suggest it would sound strange. Maybe some people like the music to have enhanced low frequencies, but this is a matter of taste, not psychophysics.
Or maybe you are compensating for some other reason, for example, taking into account the poorer sensitivity to lower frequencies might enhance a compression algorithm. Is this the idea?
If you do want to normalize by the equal loudness curves, one should note that most of the curves and equations are in terms of sound pressure level (SPL). SPL is the log of the square of the waveform amplitude, so when you work with the FFTs, it's probably easiest to work with their square (the power specta). (Or, of course, you could compensate in other ways by, say, multiplying by sqrt(log(i+1)) in your equation above -- assuming that the log was an approximation of the inverse equal-loudness curve.)
I think the equal loudness contour is exactly the right direction.
However, its shape depends on the absolute pressure level.
In other words the sensitivity curve of our hearing changes with sound pressure.
There is no "correct normalization" if you have no information about absolute levels.
If this is a problem depends on what you want to do with the data.
The loudness contour is standardized in ISO 226 but this document is not freely available for download. It should be in a decent university library though.
Here is another source for
loudness contours
So you are trying to raise the level of the high end frequencies? Sounds like a high pass filter with a minimum multiplier might work, so that you don't attenuate the low frequency signals too much. Pick up a good book on filter design, maybe monkey around with this applet
In the old days of first samplers, this is before MOTU Boost people :) it wasn't FFT but simple (Fairlight or Roland it first I think) Normalisation done on the original or resulting time-domain signal (if you are doing beat slicing, recycle-style); can't you do that? Or only go for the FFT after you compensate to counteract for it?
Seems like a two phase procedure otherwise, I'd personally leave FFT as is for the task..