What algorithms could I use for audio volume level? [closed] - apache-flex

Let's say I have a slider that can go between 0 and 1. The SoundTransform.volume also ranges between 0 (silent) and 1 (full volume), but if I use a linear function, let's say SoundTransform.volume = slider.volume, the result is rather not pleasing - the perception is that the volume dramatically changes in the lower half and does almost nothing in the upper half of the slider.
I really haven't studied the human ear, but I overheard once that human perception is logarithmic, or something similar. What algorithms should I use for setting the SoundTransform.volume?

human perception in general is logarithmic, also when it comes to things such as luminosity, etc. ... this enables us to register small changes to small "input signals" of our environment, or to put it another way: to always perceive a change of a perceivable physical quantity in relation to its value ...
thus, you should modify the volume to grow exponentially, like this:
y = (Math.exp(x)-1)/(Math.E-1)
you can try other bases as well:
y = (Math.pow(base,x)-1)/(base-1)
the bigger the value of base is, the stronger the effect, the slower volume starts growing in the beginning and the faster it grows in the end ...
a slightly simpler approach, giving you similar results (you are only in the interval between 0 and 1, so approximations are quite simple, actually), is to exponentiate the original value, as
y = Math.pow(x, exp);
for exp bigger than 1, the effect is that the output (i.e. the volume in your case) first goes up slower, and then faster towards the end ... this is very similar to exponential functions ... the bigger exp, the stronger the effect ...
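As a rough illustration of the two mappings above, here is a minimal Python sketch. The function names `exp_curve` and `power_curve` and the sample values for `base` and `exponent` are assumptions chosen for the example, not part of the original answer; `base` must be greater than 1.

```python
import math

def exp_curve(x, base=math.e):
    """Exponential mapping of a slider position x in [0, 1] to a volume in [0, 1]."""
    return (base ** x - 1.0) / (base - 1.0)

def power_curve(x, exponent=2.0):
    """Power-law mapping; exponents above 1 flatten the low end of the slider."""
    return x ** exponent

# Compare both curves at a few slider positions.
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"x={x:.2f}  exp(base=100)={exp_curve(x, 100):.3f}  power(3)={power_curve(x, 3):.3f}")
```

Both functions return 0 at x = 0 and 1 at x = 1, so either result can be assigned directly to a volume that expects the 0..1 range.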

Human hearing is logarithmic, so you want an exponential function (the inverse) to apply to the linear output of your slider. I don't know if human hearing is closer to ln or log:
For Ln:
e^x
For Log:
10^x
You could experiment with other bases too. You will then need to scale your output so that it covers the available range of values.
Update
After a bit of research it seems that base 2 would be appropriate since the power is related to the square of the pressure. If anyone knows better, please correct me.
I think what you want is:
v' = 2^v * a^v - 1
a = ( 2^(log2(m+1)/n) )/2
v is your linear input value ranging from 0..n
v' is your logarithmic value ranging from 0..m
The -1 in the first equation is to give you an output range from 0 instead of 1 (since k^0=1).
The m+1 is to compensate for this so you get 0..m not 0..m+1
You can of course tweak this to suit your requirements.
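Since 2^v times a^v equals (2a)^v, and 2a = (m+1)^(1/n), the two equations collapse to v' = (m+1)^(v/n) - 1. The Python sketch below checks that numerically; the function names are made up for the illustration.

```python
import math

def original_form(v, n, m):
    """The formula exactly as written: v' = 2^v * a^v - 1."""
    a = (2.0 ** (math.log2(m + 1.0) / n)) / 2.0
    return (2.0 ** v) * (a ** v) - 1.0

def scaled_exponential(v, n, m):
    """Equivalent closed form: v' = (m + 1)^(v / n) - 1."""
    return (m + 1.0) ** (v / n) - 1.0

# Input range 0..10 mapped onto output range 0..100.
for v in (0, 5, 10):
    print(v, round(original_form(v, 10, 100), 4), round(scaled_exponential(v, 10, 100), 4))
```

The endpoints come out as 0 and m, which is what the -1 and m+1 corrections are there for.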

Hearing is complicated: the perceived loudness varies with frequency, with the duration of the sample, and from person to person. So this cannot be solved purely mathematically, only by trying a variety of functions for the control and picking the one that 'feels' best.
Do you find at the moment that varying the control at the low end of the range has little effect on the apparent volume, but that the volume increases rapidly at the upper end of the range? Or do you hear the reverse, the volume varies too quickly at the low end and not enough at the high end? Or would you like finer control over the volume at medium levels?
Increased low-volume sensitivity:
SoundTransform.volume = Math.sin(x * Math.PI / 2);
Increased high-volume sensitivity:
SoundTransform.volume = (Math.pow(base,x) - 1)/(base-1);
or
SoundTransform.volume = Math.pow(x, base);
Where base > 1, try different values and see how it feels. Or more drastically, a 90 degree circular arc:
SoundTransform.volume = 1 - Math.sqrt(1-(x * x));
Where x is slider.volume and is between 0 and 1.
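To get a feel for how differently these curves behave, a small Python sketch can evaluate them at the slider midpoint. The function names are illustrative only.

```python
import math

def low_end_sensitive(x):
    """Quarter sine wave: rises quickly near 0, flattens near 1."""
    return math.sin(x * math.pi / 2)

def circular_arc(x):
    """90 degree circular arc: very flat near 0, very steep near 1."""
    return 1 - math.sqrt(1 - x * x)

x = 0.5
print(f"sine: {low_end_sensitive(x):.3f}  arc: {circular_arc(x):.3f}")
```

At x = 0.5 the sine curve is already at about 0.71 of full volume, while the arc is only at about 0.13.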
Please do let us know how you get on!

Yes, human perception is logarithmic. Considering this, you should adjust the volume exponentially, so that the perceived increase becomes linear. See decibel on Wikipedia

Android already does this in its audio framework. It uses decibels to adjust the volume. The user can pick steps, such as 1 to 7 for the ringtone or 1 to 15 for music.
The formula is:
The user calls the set-volume API linearly but gets the amplitude exponentially. Graph as below:

A 3 dB increase means you are doubling the signal power, but the human ear requires a larger increase (commonly cited as roughly 6 to 10 dB) to perceive a doubling in volume.
However, a strictly logarithmic curve, while accurately modeling the human perception of volume, has a usability problem.
When people want a loud volume, the knob becomes too sensitive at the upper end, making it difficult to find the "right" volume.
You've probably had this problem before... 7 is too soft, 8 is too loud, meanwhile 1-3 are inaudible over background noise.
So, I recommend a logarithmic scale, but with a floor at the low end and a soft knee at the top to allow a more linear response, especially in the "loud" part of the knob.
Oh, and make sure the knob goes up to 11. ;)

The human ear indeed perceives sounds on a logarithmic scale of increasing intensity, and because of that, the unit generally used to measure acoustic intensity is the decibel (which is actually used for all sorts of intensities and powers, not just those of sound, and also happens to be a dimensionless unit). The reference level, 0 dB, is usually set to the lower bound of human hearing, and every ten-decibel increase above that is equivalent to an increase in power by a factor of 10.
Note, however, that you should first check with other people and see what they think, just in case; what sounds odd to you may not sound odd to others. If they agree with you, then go right ahead and do it exponentially, but if you're in the minority, then it might just be your own ears that are the problem.
EDIT: Ignore my previous third paragraph. Refer to back2dos's answer if you decide to do it exponentially.

This is a JavaScript function I have for a logarithmic scale for dbm.
The input is a percentage (0.00 to 1.00) and the max value (my implementation uses 12db)
The mid point is set to 0.5 and that will be 0db.
When the percentage is zero, the output is negative Infinity.
function percentageToDb(p, max) {
    return max * (1 - (Math.log(p) / Math.log(0.5)));
}

Related

Recommendations (functions/solution) to apply in OpenMDAO instead of boolean conditions (if/else)

I have been working for a couple of months with OpenMDAO and I find myself struggling with my code when I want to impose conditions for trying to replicate a physical/engineering behaviour.
I have tried using sigmoid functions, but I am still not convinced by them, due to the difficulty of trading off sensitivity against numerical stability. Most of the time I hit overflows in exp, so I end up including other conditionals (like np.where) and thus losing linearity.
outputs['sigmoid'] = 1 / (1 + np.exp(-x))
I was looking for another kind of step function or something like that, able to keep linearity and differentiability to ease the optimization. I don't know if something like that exists or if there is any strategy that can help me. If it helps, I am working with an OpenConcept benchmark, which uses vectorized computations and Simpson's rule numerical integration.
Thank you very much.
PS: This is my first ever question on Stack Overflow, so I would like to apologize in advance for any error or bad practice committed. I hope to eventually collaborate and become active in the community.
Update after Justin's answer:
I will take the opportunity to define my problem and the strategy I tried a little more. I am trying to monitor and control thermodynamic conditions inside a tank. One of the things is to take action when pressure P1 reaches a certain threshold P2. To define this:
eval= (inputs['P1'] - inputs['P2']) / (inputs['P1'] + inputs['P2'])
# P2 = threshold [Pa]
# P1 = calculated pressure [Pa]
k=100 #steepness control
outputs['sigmoid'] = (1 / (1 + np.exp(-eval * k)))
eval was defined to avoid overflows by normalizing the values, so that corrections are taken when the threshold is reached. In a very similar way, I defined a function to check if there is still mass (so flow can continue between systems):
eval= inputs['mass']/inputs['max']
k=50
outputs['sigmoid'] = (1 / (1 + np.exp(-eval*k)))**3
max is also used for normalizing the value, and the exponent is added so that the output reaches zero before entering the negative domain.
Plot (sorry, it seems I cannot post images yet because of my reputation)
It may be important to highlight that both mass and pressure are calculated from coupled ODE integration, in which these activation functions take part. I guess that OpenConcept, by its nature, explores a lot of possible values before arriving at the solution, so most of the time it produces negative, infeasible values for mass and pressure and creates overflows. For that reason I sometimes try to include:
eval[np.where(eval > 1.5)] = 1.5
eval[np.where(eval < -1.5)] = -1.5
That is not a beautiful solution, but it is sometimes effective. I try to avoid using it since I sense that these bounds make the solver's and optimizer's work harder.
I could give you a more complete answer if you distilled your question down to a specific code example of the function you're wrestling with and its expected input range. If you provide that code-sample, I'll update my answer.
Broadly, this is a common challenge when using gradient based optimization. You want some kind of behavior like an if-condition to turn something on/off and in many cases that's a fundamentally discontinuous function.
To work around that we often use sigmoid functions, but these do have some of the numerical challenges you pointed out. You could try a hyperbolic tangent as an alternative, though it may suffer the same kinds of problems.
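On the overflow point specifically, one well-known identity is 1/(1 + exp(-z)) = 0.5 * (1 + tanh(z/2)); since tanh saturates at -1 and 1, this form does not overflow for large |z|. The sketch below is a minimal NumPy illustration of that identity, not OpenMDAO code; the names `stable_sigmoid` and `stable_sigmoid_deriv` and the steepness parameter `k` are assumptions for the example.

```python
import numpy as np

def stable_sigmoid(x, k=1.0):
    """Logistic function 1 / (1 + exp(-k*x)), evaluated through tanh to avoid overflow."""
    x = np.asarray(x, dtype=float)
    return 0.5 * (1.0 + np.tanh(0.5 * k * x))

def stable_sigmoid_deriv(x, k=1.0):
    """Analytic derivative with respect to x: k * s * (1 - s)."""
    s = stable_sigmoid(x, k)
    return k * s * (1.0 - s)

# Extreme inputs saturate cleanly instead of raising overflow warnings.
print(stable_sigmoid(np.array([-1e6, 0.0, 1e6]), k=100.0))
```

This only removes the overflow; a very steep k still gives near-zero gradients away from the transition, so the scaling concerns discussed here remain.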
I will give you two broad options:
Option 1
Sometimes it's OK (even if not ideal) to leave the purely discrete conditional in the code. Let's say you wanted to represent a kind of simple piecewise function:
y = 2x; x>=0
y = 0; x < 0
There is a sharp corner in that function right at 0. That corner is not differentiable, but the function is fine everywhere else. This is very much like the absolute value function in practice, though you might not draw the analogy looking at the piecewise definition of the function because the piecewise nature of abs is often hidden from you.
If you know (or at least can check after the fact) that your final answer will not lie right on or very near to that C1 discontinuity, then it's probably fine to leave the code the way it is. Your derivatives will be well defined everywhere but right at 0 and you can simply pick the left or the right answer for 0.
It's not strictly mathematically correct, but it works fine as long as you're not ending up stuck right there.
Option 2
Apply a smoothing function. This can be a sigmoid, or a simple polynomial. The exact nature of the smoothing function is highly specific to the kind of discontinuity you are trying to approximate.
In the case of the piecewise function above, you might be tempted to define that function as:
2x*sig(x)
That would give you roughly the correct behavior, and would be differentiable everywhere. But Wolfram Alpha shows that it actually undershoots a little. That's probably undesirable, so you can increase the exponent to mitigate that. This, however, is where you start to get underflow and overflow problems.
So to work around that, and make a better behaved function all around, you could instead define a three-part piecewise polynomial:
y = 2x; x>=a
y = c0 + c1*x + c2*x**2; -a <= x < a
y = 0 x < -a
you can solve for the coefficients as a function of a by requiring the value and slope to match at x = -a and x = a (please double check my algebra before using this!):
c0 = a/2
c1 = 1
c2 = 1/(2a)
The nice thing about this approach is that it will never overshoot and go negative. You can also make the half-width a reasonably small and still get decent numerics. But if you try to make it too small, c2 will obviously blow up.
In general, I consider the sigmoid function to be a bit of a blunt instrument. It works fine in many cases, but if you try to make it approximate a step function too closely, it's a nightmare. If you want to represent physical processes, I find polynomial fillet functions work more nicely.
It takes a little effort to derive that polynomial, because you want it to be C1 continuous on both sides of the curve. So you have to construct the system of equations to solve for it as a function of the polynomial order and the specific relaxation you want (0.1 here).
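As a quick numerical check of the fillet, here is a NumPy sketch. Imposing value and slope continuity at x = -a and x = a yields c0 = a/2, c1 = 1, c2 = 1/(2a), which is what the code uses; the function names and the default a = 0.1 are assumptions for the illustration.

```python
import numpy as np

def fillet_ramp(x, a=0.1):
    """Smoothed version of y = 2x (x >= 0), y = 0 (x < 0) with a quadratic blend on [-a, a)."""
    x = np.asarray(x, dtype=float)
    c0, c1, c2 = a / 2.0, 1.0, 1.0 / (2.0 * a)
    blend = c0 + c1 * x + c2 * x**2
    return np.where(x >= a, 2.0 * x, np.where(x < -a, 0.0, blend))

def fillet_ramp_deriv(x, a=0.1):
    """Analytic derivative dy/dx, continuous across both joints."""
    x = np.asarray(x, dtype=float)
    blend = 1.0 + x / a  # c1 + 2*c2*x
    return np.where(x >= a, 2.0, np.where(x < -a, 0.0, blend))

xs = np.array([-0.2, -0.1, 0.0, 0.1, 0.2])
print(fillet_ramp(xs))        # values at the joints match the outer pieces
print(fillet_ramp_deriv(xs))  # slope goes 0 -> 1 -> 2 across the blend
```

The blend is just (x + a)^2 / (2a), so its minimum sits exactly at x = -a with value 0, which is why it never dips negative.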
My go-to has generally been to consult the table of activation functions on Wikipedia: https://en.wikipedia.org/wiki/Activation_function
I've had good luck with sigmoid and the hyperbolic tangent, scaling them such that we can choose the lower and upper values as well as choosing the location of the activation on the x-axis and the steepness.
Dymos uses a vectorization that I think is similar to OpenConcept and I've had success with numpy.where there as well, providing derivatives for each possible "branch" taken. It is true that you may have issues with derivative mismatches if you have an analysis point right on the transition, but often I've had success despite that. If the derivative at the transition becomes a hindrance, then implementing a sigmoid or relu is more appropriate.
If x is of a magnitude such that it can cause overflows, consider applying units or using scaling to put it within reasonable limits if you cannot bound it directly.

Statistical best fit for gesture detection

I have a linear regression equation from school, which gives a value between 1 and -1 indicating whether or not a set of data points is close enough to a linear function,
and the equation given here
http://people.hofstra.edu/stefan_waner/realworld/calctopic1/regression.html
under best fit of a line. I would like to use these to do simple gesture detection based on a point in 3-space (x,y,z) - forward, back, left, right, up, down. First I would see if they fall on a line in 2 of the 3 dimensions, then I would see if that line's slope approached zero or infinity.
Is this fast enough for functional gesture recognition? If not, could someone propose an alternative algorithm?
If I've understood your question correctly then (1) the calculation you describe here would probably be plenty fast enough, (2) it may not actually do what you want, and (3) the stuff that'll be slow in an actual implementation would lie elsewhere.
So, I think you're proposing to do this. (1) Identify the positions of ... something ... (the user's hand, perhaps) in three-dimensional space, at several successive times. (2) For (say) each of {x,y} and {x,z}, look at those two coordinates of each point, compute the correlation coefficient (which is what your formula describes) and see whether it's close to +-1. (3) If both correlation coefficients are close to +-1 then the points lie approximately on a straight line; calculate the gradient of that line (using a formula similar to that of the correlation coefficient). (4) If the gradients are both very close to 0 or +- infinity, then your line is approximately parallel to one axis, which is the case you're trying to recognize.
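Steps (2) and (3) can be sketched in a few lines of plain Python. The helper name `line_fit` is hypothetical. Note one edge case the sketch has to handle: if the hand moves exactly along one axis, the other coordinate has zero variance and the correlation coefficient is undefined, so it is reported as `None` here.

```python
import math

def line_fit(us, vs):
    """Return (r, slope) for paired samples.

    r is None when either coordinate does not vary;
    slope is None for a vertical line (no variation in us).
    """
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    suu = sum((u - mean_u) ** 2 for u in us)
    svv = sum((v - mean_v) ** 2 for v in vs)
    suv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    r = suv / math.sqrt(suu * svv) if suu > 0 and svv > 0 else None
    slope = suv / suu if suu > 0 else None
    return r, slope

# Points lying exactly on v = 2u.
print(line_fit([0, 1, 2, 3, 4], [0, 2, 4, 6, 8]))
```

Each call is a handful of sums over the sampled positions, which is consistent with the estimate of a few hundred arithmetic operations per gesture.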
1: Is it fast enough? You might perhaps be sampling at 50 frames per second or thereabouts, and your gestures might take a second to execute. So you'll have somewhere on the order of 50 positions. So, the total number of arithmetic operations you'll need is maybe a few hundred (including a modest number of square roots). In the worst case, you might be doing this in emulated floating-point on a slow ARM processor or something; in that case, each arithmetic operation might take a couple of hundred cycles, so the whole thing might be 100k cycles, which for a really slow processor running at 100MHz would be about a millisecond. You're not going to have any problem with the time taken to do this calculation.
2: Is it the right thing? It's not clear that it's the right calculation. For instance, suppose your user's hand moves back and forth rapidly several times along the x-axis; that will give you a positive result; is that what you want? Suppose the user attempts the gesture you want but moves at slightly the wrong angle; you may get a negative result. Suppose they move exactly along the x-axis for a bit and then along the y-axis for a bit; then the projections onto the {x,y}, {x,z} and {y,z} planes will all pass your test. These all seem like results you might not want.
3: Is it where the real cost will lie? This all assumes you've already got (x,y,z) coordinates. Getting those is probably going to be more expensive than processing them. For instance, if you have a camera-based system of some kind then there'll be some nontrivial image processing for every frame. Or perhaps you're integrating up data from accelerometers (which, by the way, is likely to give nasty inaccurate position results); the chances are that you're doing some filtering and other calculations to get position data. I bet that the cost of performing a calculation like this one will be substantially less than the cost of getting the coordinates in the first place.

Finding area of straight line with graph (Math question but needed for flot)

Okay, so this is a straight math question and I read up on meta that those need to be written to sound like programming questions. I'll do my best...
So I have a graph made in flot that shows the network usage (in bytes/sec) for the user. The data is 4 minutes apart when there is activity, and otherwise set at the start of the usage range (let's say day 1) and the end of the range (day 7). The data is coming from a CGI script I have no control over, so I'm fairly limited in what I can provide the user.
I never took trig or calculus, so I'm pretty much in over my head. What I want is for the user to have the option to click any point on the graph and see their bandwidth usage for that moment. Since the lines between real data points are drawn straight, this can be done by getting the points before and after where the user has clicked and finding the y-interval.
It took me weeks to finally get a helpful math person to explain this to me. Everyone else has insisted on trying to teach me Riemann sum techniques and all sorts of other heavy stuff that not only is confusing to me but also doesn't seem necessary for the problem.
But I also want the user to be able to highlight the graph from two arbitrary points on the y-axis (time) to get the amount of network usage total during that range. I know this would be inaccurate, but I need it to be the right inaccurate using a solid equation.
I thought this was the area under the line, but experiments with much simpler graphs make this seem just far too high. I figured out I could take the distance from y2 - y1 and multiply it by x2 - x1 and then divide by two to get the area of the graph below the line like a triangle, but again, the numbers seemed too high. (maybe they are just big numbers and I don't get this math stuff at all).
So what I need, if anyone would be really awesome enough to provide it before this question is closed down for being too pure-math, is either the name of the concept I should be researching or the equation itself. Or the bad news that I do need advanced math to get an accurate result.
I am not bad at math, just as a last note, I just am not familiar with math beyond 10th grade and so I need some place to start. All the math sites seem to keep it too simple or way over my paygrade.
If I understood correctly what you're asking (and that is somewhat doubtful), you should find what you seek in these links:
Linear interpolation
(calculating the value of the point in between)
Trapezoidal rule
(calculating the area below the "curve")
*****Edit, so we can get this over :) without much ado:*****
So I have a graph made in flot that shows the network usage (in bytes/sec) for the user. The data is 4 minutes apart when there is activity, and otherwise set at the start of the usage range (let's say day 1) and the end of the range (day 7). The data is coming from a CGI script I have no control over, so I'm fairly limited in what I can provide the user.
What is a "flot" ?
Okay, so you have speed on the y axis [in bytes/sec] and time on the x axis [in sec], right?
That means that if you're flotting (I'm bored, yes :) speed over time in linear segments, interpolating at some particular point in time gives you the speed at that particular point in time.
If you wish to calculate how much bandwidth you've spent, you need to determine the area beneath that curve. The area from point "a" to point "b" will determine the bandwidth spent in [bytes] in that time period.
It took me weeks to finally get a helpful math person to explain this to me. Everyone else has insisted on trying to teach me Riemann sum techniques and all sorts of other heavy stuff that not only is confusing to me but also doesn't seem necessary for the problem.
In the immortal words of Snoopy: "Good grief !"
But I also want the user to be able to highlight the graph from two arbitrary points on the y-axis (time) to get the amount of network usage total during that range. I know this would be inaccurate, but I need it to be the right inaccurate using a solid equation.
It would not be inaccurate.
It would be actually perfectly accurate (well, apart from roundoff error in bytes :), since you're using linear interpolation on linear segments.
I thought this was the area under the line, but experiments with much simpler graphs make this seem just far too high. I figured out I could take the distance from y2 - y1 and multiply it by x2 - x1 and then divide by two to get the area of the graph below the line like a triangle, but again, the numbers seemed too high. (maybe they are just big numbers and I don't get this math stuff at all).
"like a triangle" --> should be "like a trapezoid"
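If you do deltax*(y1+y2)/2 you will get the area under each segment (this works only for linear segments); deltax*(y2-y1)/2 covers only the triangular part above the lower of the two values. This is the basic principle of the trapezoidal rule.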
If you're uncertain about what you're calculating use dimensional analysis: speed is in bytes/sec, time is in sec, bandwidth is in bytes. Multiplying speed*time=bandwidth, and so on.
What I want is for the user to have the option to click any point on the graph and see their bandwidth usage for that moment. Since the lines between real data points are drawn straight, this can be done by getting the points before and after where the user has clicked and finding the y-interval.
Yes, that's a good way to find that instantaneous value. When you report that value back, it's in the same units as the y-axis, so that means bytes/sec, right?
I don't know how rapidly the rate changes between points, but it's even simpler if you simply pick the closest point and report its value. You simplify your problem without sacrificing too much accuracy.
I thought this was the area under the line, but experiments with much simpler graphs make this seem just far too high. I figured out I could take the distance from y2 - y1 and multiply it by x2 - x1 and then divide by two to get the area of the graph below the line like a triangle, but again, the numbers seemed too high. (maybe they are just big numbers and I don't get this math stuff at all).
To calculate the total bytes over a given time interval, you should find the index closest to the starting and ending point and multiply the value of y by the spacing of your x-points and add them all together. That will give you the total # of bytes consumed during that time interval, but there's one more wrinkle you might have forgotten.
You said that the points come in "4 minutes apart", and your y-axis is in bytes/second. Remember that units matter. Your area is the sum of bytes/second times a spacing in minutes. To make the units come out right you have to multiply by 60 seconds/minute to get the final value of bytes that you want.
If that "too high" value is still off, consider units again. It's 1024 bytes per kbyte, and 1024*1024 bytes per MB. Check the units of the values you're checking the calculation against.
UPDATE:
No wonder you're having problems. Your original question CLEARLY stated bytes/sec. Even this question is imprecise and confusing. How did you arrive at "amount of data" at a given time stamp? Are those the total bits transferred since the last time stamp? If yes, simply add the values between the start and end of the interval you want and convert to the units convenient for you.
The network usage total is not in bytes (kilo-, mega-, whatever) per second. It would be in just straight bytes (or kilo-, or whatever).
For example, 2 megabytes per second over an interval of 10 seconds would be 20 megabytes total. It would not be 20 megabytes per second.
Or do you perhaps want average bytes per second over an interval?
This would be a lot easier for you if you would accept that there is well-established terminology for the concepts that you are having trouble expressing concisely or accurately, and that these mathematical terms have been around far longer than you. Since you've clearly gone through most of the trouble of understanding the concepts, you might as well break down and start calling them by their proper names.
That said:
There are 2 obvious ways to graph bandwidth, and two ways you might be getting the bandwidth data from the server. First, there's the cumulative usage function, which for any time is simply the total amount of data transferred since the start of the measurement. If you plot this function, you get a graph that never decreases (since you can't un-download something). The units of the values of this function will be bytes or kB or something like that.
What users are typically interested in is the instantaneous usage function, which is an indicator of how much bandwidth you are using right now. This is what users typically want to see. In mathematical terms, this is the derivative of the cumulative function. This derivative can take on any value from 0 (you aren't downloading) to the rated speed of your network link (indicating that you're pushing as much data as possible through your connection). The units of this function are bytes per second, or something related like Mbps (megabits per second).
You can approximate the instantaneous bandwidth with the average data usage over the past few seconds. This is computed as
(number of bytes transferred)
-----------------------------------------------------------------
(number of seconds that elapsed while transferring those bytes)
Generally speaking, the smaller the time interval, the more accurate the approximation. For simplicity's sake, you usually want to compute this as "number of bytes transferred since last report" divided by "number of seconds since last report".
As an example, if the server is giving you a report every 4 minutes of "total number of bytes transferred today", then it is giving you the cumulative function and you need to approximate the derivative. The instantaneous bandwidth usage rate you can report to users is:
(total transferred as of now) - (total as of 4 minutes ago) bytes
-----------------------------------------------------------
4*60 seconds
If the server is giving you reports of the form "number of bytes transferred since last report", then you can directly report this to users and plot that data relative to time. On the other hand, if the user (or you) is concerned about a quota on total bytes transferred per day, then you will need to transform the (approximately) instantaneous data you have into the cumulative data. This process, known as computing the integral, is the opposite of computing the derivative, and is in some ways conceptually simpler. If you've kept track of each of the reports from the server and the timestamp, then for each time, the value you plot is the total of all the reports that came in before that time. If you're doing this in realtime, then every time you get a new report, the graph jumps up by the amount in that report.
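Both directions described above can be sketched in Python as follows. The function names and the sample numbers are invented for the illustration; timestamps are assumed to be in seconds and strictly increasing.

```python
def rates_from_cumulative(times_s, totals_bytes):
    """Approximate the rate (bytes/sec) between consecutive cumulative reports."""
    rates = []
    for i in range(1, len(times_s)):
        elapsed = times_s[i] - times_s[i - 1]
        rates.append((totals_bytes[i] - totals_bytes[i - 1]) / elapsed)
    return rates

def cumulative_from_increments(increments_bytes):
    """Running total of 'bytes since last report' values."""
    total = 0
    totals = []
    for inc in increments_bytes:
        total += inc
        totals.append(total)
    return totals

# Reports every 4 minutes (240 s).
print(rates_from_cumulative([0, 240, 480], [0, 240000, 360000]))
print(cumulative_from_increments([240000, 120000]))
```

The first call turns cumulative totals into average rates per interval; the second rebuilds the running total from per-interval amounts.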
I am not bad at math, ... I just am not familiar with math beyond 10th grade
This is like saying "I'm not bad at programming, I have no trouble with ifs and loops but I never got around to writing more than one function."
I would suggest you enrol in a maths class of some kind. An understanding of matrices and the basics of calculus gives you an appreciation of many things, and can be useful in all sorts of areas. You'll be able to understand more of Wikipedia articles and SO answers - and questions!
If you can't afford that, try to find some lecture videos or something.
Everyone else has insisted on trying to teach me Riemann sum techniques
I can't see why. You don't need them for this - though if you had learned them, I expect you would find it easier to come up with a solution. You see, Riemann sums attempt to give you a "familiar" notion of area. The sort of area you (hopefully) learned years ago.
Getting the area below your usage graph between two points will tell you (approximately) how much was used over that period.
How do you find the area of a floor plan? You break it up into rectangles and triangles, find the area of each, and add them together. You can do the same thing with your graph, basically. Someone has worked out a simple way of doing this called the trapezoidal rule. It's just a matter of choosing how to divide your graph into strips, and in your case this is easy: just use the data points themselves as dividers. (You'll also need to work out the value of the graph at the left and right ends of the region selected by the user, using linear interpolation.)
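Put together, linear interpolation at the selection edges plus the trapezoidal rule between data points might look like the Python sketch below. The names `interpolate` and `bytes_between` are hypothetical; the sketch assumes times in seconds, rates in bytes/sec, strictly increasing timestamps, and a selection that lies inside the data range.

```python
from bisect import bisect_right

def interpolate(ts, ys, t):
    """Linearly interpolate the rate at time t, for ts[0] <= t <= ts[-1]."""
    i = min(max(bisect_right(ts, t), 1), len(ts) - 1)
    t0, t1 = ts[i - 1], ts[i]
    return ys[i - 1] + (ys[i] - ys[i - 1]) * (t - t0) / (t1 - t0)

def bytes_between(ts, ys, start, end):
    """Trapezoidal-rule area under the rate graph between start and end."""
    # Strip boundaries: the two selection edges plus every data point between them.
    knots = [start] + [t for t in ts if start < t < end] + [end]
    vals = [interpolate(ts, ys, t) for t in knots]
    return sum(
        (t1 - t0) * (v0 + v1) / 2.0
        for t0, t1, v0, v1 in zip(knots, knots[1:], vals, vals[1:])
    )

ts = [0, 240, 480]    # seconds
ys = [100, 300, 100]  # bytes/sec
print(interpolate(ts, ys, 120))         # rate at the clicked point
print(bytes_between(ts, ys, 120, 360))  # total bytes over the highlighted range
```

Because the graph is drawn with straight segments, each strip is an exact trapezoid, so the total is exact for the plotted data rather than an approximation.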
If there's anything I've said that isn't clear to you (as there may well be), please leave a comment.

Normalizing FFT Data for Human Hearing

The typical FFT for audio looks pretty similar to this, with most of the action happening on the far left side
http://www.flight404.com/blog/images/fft.jpg
He multiplied it by a partial sine wave to get it to the bottom, but the article isn't too specific on this part of it. It also seems like a "good enough" modification of the dataset, rather than one based on some property. I understand that human hearing is better suited to the higher frequencies, thus, most music will have amplified bass and attenuated treble so that both sound to us as being of relatively equal strength.
My question is what modification needs to be done to the FFT to compensate for this standard falloff?
for (i = 0; i < fft.length; i++) {
    fft[i] = fft[i] * Math.log(i + 1); // does, eh, ok but the high end is still not really "loud" enough
}
EDIT ::
http://en.wikipedia.org/wiki/Equal-loudness_contour
I came across this article, I think it might be the direction to head in, but there still might be some property of an FFT that needs to be counteracted.
First, are you sure you want to do this? It makes sense to compensate for some things, like the microphone response not being flat, but not human perception. People are used to hearing sounds with the spectral content that the sounds have in the real world, not along perceptual equal loudness curves. If you play a sound that you've modified in the way you suggest it would sound strange. Maybe some people like the music to have enhanced low frequencies, but this is a matter of taste, not psychophysics.
Or maybe you are compensating for some other reason, for example, taking into account the poorer sensitivity to lower frequencies might enhance a compression algorithm. Is this the idea?
If you do want to normalize by the equal loudness curves, one should note that most of the curves and equations are in terms of sound pressure level (SPL). SPL is the log of the square of the waveform amplitude, so when you work with the FFTs, it's probably easiest to work with their square (the power spectra). (Or, of course, you could compensate in other ways by, say, multiplying by sqrt(log(i+1)) in your equation above -- assuming that the log was an approximation of the inverse equal-loudness curve.)
I think the equal loudness contour is exactly the right direction.
However, its shape depends on the absolute pressure level.
In other words the sensitivity curve of our hearing changes with sound pressure.
There is no "correct normalization" if you have no information about absolute levels.
Whether this is a problem depends on what you want to do with the data.
The loudness contour is standardized in ISO 226 but this document is not freely available for download. It should be in a decent university library though.
Here is another source for loudness contours.
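If a single fixed curve is acceptable, one commonly used stand-in is the A-weighting curve, which roughly follows the inverse of an equal-loudness contour at a moderate listening level. It is not the ISO 226 data and it ignores the level dependence described above, so treat this Python sketch as an approximation only. The function name `a_weighting_db` is my own; the constants are those of the standard analytic A-weighting formula.

```python
import math

def a_weighting_db(freq_hz):
    """Approximate A-weighting gain in dB at a given frequency in Hz."""
    if freq_hz <= 0:
        return float("-inf")  # DC carries no weight under this curve
    f2 = freq_hz * freq_hz
    num = (12194.0 ** 2) * f2 * f2
    den = ((f2 + 20.6 ** 2)
           * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    # The +2.00 dB offset normalises the curve to about 0 dB at 1 kHz.
    return 20.0 * math.log10(num / den) + 2.00

for f in (100.0, 1000.0, 10000.0):
    print(f, round(a_weighting_db(f), 1))
```

To apply it to an FFT you would evaluate the function at each bin's centre frequency (bin index times sample rate divided by FFT size) and add the result to that bin's level in dB.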
So you are trying to raise the level of the high end frequencies? Sounds like a high pass filter with a minimum multiplier might work, so that you don't attenuate the low frequency signals too much. Pick up a good book on filter design, maybe monkey around with this applet
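As a rough illustration of the "minimum multiplier" idea, the Python sketch below scales FFT bin magnitudes by a gain that ramps linearly from a floor up to 1.0. It is a plain gain ramp standing in for a properly designed filter, and both the name `high_emphasis` and the default `min_gain` are assumptions for the example.

```python
def high_emphasis(magnitudes, min_gain=0.25):
    """Scale bin magnitudes by a gain rising linearly from min_gain to 1.0.

    The lowest bin is attenuated to min_gain at most, so low frequencies
    are reduced but never removed.
    """
    n = len(magnitudes)
    if n < 2:
        return list(magnitudes)
    return [m * (min_gain + (1.0 - min_gain) * i / (n - 1))
            for i, m in enumerate(magnitudes)]

print(high_emphasis([1.0, 1.0, 1.0], min_gain=0.5))
```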
In the old days of the first samplers (this is before MOTU Boost, people :) it wasn't FFT but simple normalisation (Fairlight or Roland did it first, I think), done on the original or resulting time-domain signal (if you are doing beat slicing, ReCycle-style). Can't you do that? Or go for the FFT only after you have compensated in the time domain?
Seems like a two phase procedure otherwise, I'd personally leave FFT as is for the task..
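For reference, simple peak normalisation on the time-domain signal can be sketched in a few lines of Python. The function name `normalize_peak` and the `target` parameter are illustrative, and samples are assumed to be floats centred on zero.

```python
def normalize_peak(samples, target=1.0):
    """Scale samples so that the largest absolute value equals target."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence (or empty input): nothing to scale
    scale = target / peak
    return [s * scale for s in samples]

print(normalize_peak([0.1, -0.5, 0.25]))
```

This changes overall level only; the spectral balance of the signal is untouched.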

Detecting and fixing overflows

We have a particle detector hard-wired to use 16-bit and 8-bit buffers. Every now and then, there are certain [predicted] peaks of particle fluxes passing through it; that's okay. What is not okay is that these fluxes usually reach magnitudes above the capacity of the buffers to store them; thus, overflows occur. On a chart, they look like the flux suddenly drops and begins growing again. Can you propose a [mostly] accurate method of detecting points of data suffering from an overflow?
P.S. The detector is physically inaccessible, so fixing it the 'right way' by replacing the buffers doesn't seem to be an option.
Update: Some clarifications as requested. We use Python at the data processing facility; the technology used in the detector itself is pretty obscure (treat it as if it was developed by a completely unrelated third party), but it is definitely unsophisticated, i.e. not running a 'real' OS, just some low-level stuff to record the detector readings and to respond to remote commands like power cycle. Memory corruption and other problems are not an issue right now. The overflows occur simply because the designer of the detector used 16-bit buffers for counting the particle flux, and sometimes the flux exceeds 65535 particles per second.
Update 2: As several readers have pointed out, the intended solution would have something to do with analyzing the flux profile to detect sharp declines (e.g. by an order of magnitude) in an attempt to separate them from normal fluctuations. Another problem arises: can restorations (points where the original flux drops below the overflowing level) be detected by simply running the correction program against the reverted (by the x axis) flux profile?
// Originally pseudocode; written here as Java (short = int16, int = int32).
static int[] unwrap(short[] x)
{
    int[] y = new int[x.length];
    if (x.length == 0) return y;
    y[0] = x[0];
    for (int i = 1; i < x.length; i++)
    {
        // The subtraction is done in 32 bits, so truncate the difference
        // back to 16 bits before sign-extending it.
        y[i] = y[i-1] + signExtend((short)(x[i] - x[i-1]));
        // works fine as long as the "real" values of x[i] and x[i-1]
        // differ by less than 1/2 of the span of allowable values
        // of x's storage type (=32768 in the case of int16)
        // Otherwise there is ambiguity.
    }
    return y;
}
static int signExtend(short x)
{
    return (int) x; // works properly in Java and in most C compilers
}
// exercise for the reader to write similar code to unwrap 8-bit arrays
// to a 16-bit or 32-bit array
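Since the data processing is done in Python, here is a sketch of the same unwrapping idea for an unsigned counter, under the same assumption that consecutive true values differ by less than half the counter's span. The function name `unwrap` and the `bits` parameter are illustrative; the sample readings are made up to show one wrap of a 16-bit counter.

```python
def unwrap(samples, bits=16):
    """Rebuild a series from readings of a counter that wraps at 2**bits."""
    span = 1 << bits
    half = span // 2
    if not samples:
        return []
    out = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        # Interpret each step modulo `span` as a signed value in [-half, half).
        delta = (cur - prev + half) % span - half
        out.append(out[-1] + delta)
    return out

# 4464 and 24464 are what a 16-bit counter shows for 70000 and 90000.
measured = [5000, 10000, 30000, 50000, 4464, 24464, 60000, 30000, 10000]
print(unwrap(measured))
```

Passing `bits=8` handles the 8-bit buffers the same way, with a correspondingly tighter limit of 128 on the step between samples.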
Of course, ideally you'd fix the detector software to max out at 65535 to prevent wraparound of the sort that is causing your grief. I understand that this isn't always possible, or at least isn't always possible to do quickly.
When the particle flux exceeds 65535, does it do so quickly, or does the flux gradually increase and then gradually decrease? This makes a difference in what algorithm you might use to detect this. For example, if the flux goes up slowly enough:
true flux   measurement
     5000          5000
    10000         10000
    30000         30000
    50000         50000
    70000          4464
    90000         24464
    60000         60000
    30000         30000
    10000         10000
then you'll tend to have a large negative drop at times when you have overflowed. A much larger negative drop than you'll have at any other time. This can serve as a signal that you've overflowed. To find the end of the overflow time period, you could look for a large jump to a value not too far from 65535.
All of this depends on the maximum true flux that is possible and on how rapidly the flux rises and falls. For example, is it possible to get more than 128k counts in one measurement period? Is it possible for one measurement to be 5000 and the next measurement to be 50000? If the data is not well-behaved enough, you may be able to make only statistical judgment about when you have overflowed.
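For the well-behaved case, the drop-and-jump heuristic can be sketched in Python as below. The function name `flag_overflows` and the default threshold of 32768 (half the 16-bit span) are assumptions; a threshold that suits your data depends on how fast the real flux can change.

```python
def flag_overflows(samples, threshold=32768):
    """Return one bool per sample: True where the reading is assumed wrapped."""
    flags = []
    wraps = 0    # number of wraps currently assumed to be in effect
    prev = None
    for value in samples:
        if prev is not None:
            step = value - prev
            if step <= -threshold:
                wraps += 1    # implausibly large drop: overflow began
            elif step >= threshold and wraps > 0:
                wraps -= 1    # large jump back up: overflow ended
        flags.append(wraps > 0)
        prev = value
    return flags

# Readings of a 16-bit counter while the true flux peaks at 90000.
measured = [5000, 10000, 30000, 50000, 4464, 24464, 60000, 30000, 10000]
print(flag_overflows(measured))
```

If a real change between two samples can exceed the threshold, this misfires, which is the ambiguity described above.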
Your question needs to provide more information about your implementation - what language/framework are you using?
Data overflows in software (which is what I think you're talking about) are bad practice and should be avoided. What you are seeing (strange data output) is only one of the possible side effects of data overflows; it is merely the tip of the iceberg of the sorts of issues you can see.
You could quite easily experience more serious issues like memory corruption, which can cause programs to crash loudly, or worse, obscurely.
Is there any validation you can do to prevent the overflows from occurring in the first place?
I really don't think you can fix it without fixing the underlying buffers. How are you supposed to tell the difference between the sequences of values (0, 1, 2, 1, 0) and (0, 1, 65538, 1, 0)? You can't.
How about using an HMM where the hidden state is whether you are in an overflow and the emissions are observed particle flux?
The tricky part would be coming up with the probability models for the transitions (which will basically encode the time-scale of peaks) and for the emissions (which you can build if you know how the flux behaves and how overflow affects measurement). These are domain-specific questions, so there probably aren't ready-made solutions out there.
But once you have the model, everything else---fitting your data, quantifying uncertainty, simulation, etc.---is routine.
You can only do this if the actual jumps between successive values are much smaller than 65536. Otherwise, an overflow-induced valley artifact is indistinguishable from a real valley; you can only guess. You can try to match overflows to corresponding restorations by simultaneously analysing the signal from the right and the left (assuming that there is a recognizable baseline).
Other than that, all you can do is to adjust your experiment by repeating it with different original particle flows, so that real valleys will not move, but artifact ones move to the point of overflow.

Resources