How to convert a spectrogram matrix into a wav file - r

Is there a way to convert a matrix representing a grayscale spectrogram (values non-complex and between 0 and 1) like the one shown in the image below back into a sound file, e.g. a wav file? This post explains how to do it with a seewave spectrogram using the istft function. However, in my case I see two problems which need to be solved:
The original spectrogram (obtained by signal::specgram) is lost and the matrix dimensions differ from those of the original spectrogram (i.e. both frequency and time are up- or downsampled), although the exact frequency and time values for each row and each column are known
The matrix values range between 0 and 1 and are not complex, as required by istft
Furthermore, the dimensions of the original spectrogram, the sample frequency of the original wave object and the window length and overlap used to obtain the original spectrogram are known.
Thank you!

audio is just a curve which wobbles over time where this wobble mirrors your eardrum or microphone pickup membrane ... this signal is in the time domain where the axes are time on X and curve height on Y ... typical CD quality audio has 44,100 samples per second meaning you capture that number of points on this audio curve per second ... what gets captured is the audio curve height whereas time is implied knowing each sample is captured at a known sample rate ... so sample rate is one of the two critical attributes of digital audio ... bit depth is the other attribute ... if you devote two bytes ( 16 bits ) to record CD quality curve height you get 2 raised to the 16th power ( 2^16 == 65536 ) distinct possible values to store the curve height
it's critical to emphasize a raw audio signal is in the time domain (X is time, Y is curve height) ... when you send a set of these samples into a fft call the data gets transformed into the frequency domain (X is frequency, Y is magnitude [energy]) so the direct dimension of time is gone yet is baked into the notion of that entire body of frequency domain data ... there are trade-offs when deciding the number of samples you feed into the fft call ( the sample window size ): to increase the frequency resolution of the freq domain signal (to lower incr_freq) you need more audio samples fed into the fft call, however to gain temporal specificity in the freq domain you need as few samples as possible, which you pay for with a coarser frequency resolution
to generate a spectrogram you feed a memory buffer of say 4096 samples of this curve height array ( time domain ) into a Fourier Transform ( fft ) which will return an array ( freq domain ) with the same number of elements, yet this time each element stores a complex number from which you can calculate the magnitude ( energy level ) and phase ... array element zero is the DC bias which can be ignored ... each array element represents a distinct frequency where the freq increment can be calculated
with a sample_rate of 44100 samples per second, and one second's worth of samples ( 44100 )
this gives you a frequency increment resolution of 1 hertz ... i.e. each freq bin is 1 Hertz apart
incr_freq := sample_rate / number_of_samples
nyquist_limit_index := int(number_of_samples / 2)
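as a quick sanity check, here are those two formulas evaluated in R with the example numbers above (the variable names simply mirror the pseudocode):
sample_rate       <- 44100                         # samples per second
number_of_samples <- 44100                         # one second's worth of samples
incr_freq <- sample_rate / number_of_samples       # 1 Hz between neighbouring freq bins
nyquist_limit_index <- number_of_samples %/% 2     # last usable bin (22050, i.e. 22.05 kHz)
bin_freqs <- (0:nyquist_limit_index) * incr_freq   # frequency represented by each bin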
here is how you can iterate across the array complex_fft (in Go, not R); note that curr_freq is just the bin index multiplied by incr_freq
for index_fft, curr_complex := range complex_fft { // we really only use half this range + 1
    curr_freq := float64(index_fft) * incr_freq // frequency represented by this bin
    if index_fft <= nyquist_limit_index && curr_freq >= min_freq && curr_freq < max_freq {
        curr_real := real(curr_complex) // pluck out real portion of complex number
        curr_imag := imag(curr_complex) // ditto for imaginary portion
        curr_mag := 2.0 * math.Sqrt(curr_real*curr_real+curr_imag*curr_imag) / number_of_samples
        curr_theta := math.Atan2(curr_imag, curr_real)
        curr_dftt := discrete_fft{
            real:      2.0 * curr_real,
            imaginary: 2.0 * curr_imag,
            magnitude: curr_mag,
            theta:     curr_theta,
        }
        _ = curr_dftt // append curr_dftt to your frequency-domain collection here
    }
}
as time marches along you repeat the above process of feeding the next set of 4096 samples into the fft api call, so you collect a set of pairs of time domain arrays and their corresponding freq domain representations
the process which created your plot has done this repeatedly, which is why time is shown on the X axis ... on your plot each vertical bar of data represents the output from a single fft call where the resultant magnitude is shown as the dark portions of that vertical bar and the lighter dots on the plot show the lower energy frequencies ... only after the process which generated that plot progressed over time was the data available to plot the next vertical bar, as the plot progressed from left to right, hence the time axis along the X axis on the bottom
another critical insight is to be aware that you can start with audio (time domain) ... populate a window of samples ( 4096 for example ) and send this array into a fft call to obtain a new array (freq domain) of frequencies each with its magnitude and phase ... here is the pure magic: you can then perform an inverse Fourier Transform ( ifft ) on this freq domain array to get an array in the time domain which will match (to a 1st approx ) your original input audio signal
so in your case walk across your data from left to right on the plot and for each set of vertical magnitude values ( indicated by grayscale ), which is a single frequency domain array, perform this inverse Fourier Transform, which will give you the raw audio signal ( time domain ) for a very quick segment of time ( as defined by the 4096 audio samples or similar ) ... this raw audio is the payload portion of a wav file ... repeat this process for the next vertical column of data until you have walked across the entire plot from left to right ... stitch together this sequence of payload buffers into a wav file
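To make that recipe concrete, here is a minimal R sketch of the column-by-column inverse FFT with overlap-add. The names S (magnitude matrix, frequency bins in rows from 0 Hz to Nyquist, time frames in columns), wl, hop, fs and file are mine, not the asker's, and because a 0-1 grayscale matrix carries no phase the sketch simply assumes zero phase for every bin, so the result is only a rough approximation of the original audio:
library(tuneR)   # Wave() and writeWave() for the output file

magspec_to_wav <- function(S, wl = 2 * (nrow(S) - 1), hop = wl / 2,
                           fs = 44100, file = "reconstructed.wav") {
  n_frames <- ncol(S)
  win  <- 0.5 - 0.5 * cos(2 * pi * (0:(wl - 1)) / wl)   # Hann window for overlap-add
  out  <- numeric((n_frames - 1) * hop + wl)
  norm <- numeric(length(out))
  for (j in seq_len(n_frames)) {
    half  <- S[, j]                                      # one-sided magnitudes, phase unknown (assumed 0)
    full  <- c(half, rev(half[2:(length(half) - 1)]))    # rebuild the conjugate-symmetric spectrum
    frame <- Re(fft(full, inverse = TRUE)) / wl          # inverse FFT -> short time-domain segment
    idx   <- ((j - 1) * hop + 1):((j - 1) * hop + wl)
    out[idx]  <- out[idx]  + frame * win                 # stitch segments together (overlap-add)
    norm[idx] <- norm[idx] + win^2
  }
  out <- out / pmax(norm, 1e-8)                          # undo the window weighting
  out <- out / max(abs(out))                             # normalise to [-1, 1]
  writeWave(Wave(left = round(out * 32767), samp.rate = fs, bit = 16), file)
}
A magnitude-only reconstruction like this will sound rather buzzy; an iterative phase-recovery scheme (e.g. Griffin-Lim) or seewave's istft on a genuinely complex spectrogram does noticeably better, and per the asker's point 1 the matrix would first have to be resampled back onto the original frequency-bin/time-frame grid of signal::specgram.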

Related

Why the need for a mask when performing Fast Fourier Transform?

I'm trying to find out the peak frequencies hidden in my data using the fft() method in R. While preparing the data, a more experienced user recommended creating a "mask" (more on that after the details), which does give me the exact diagram I'm looking for. The problem is, I don't understand what it does or why it's needed.
To give some context, I'm working with .txt files with around 12000 entries each. It's voltage vs. time information, and the expected result is just a sinusoidal wave with a clear peak frequency that should be close to 1-2 Hz. This is an example of what one of those files looks like:
I've been trying to use the Fast Fourier Transform method fft() implemented in R to find the peak frequencies and get a diagram that reflects them clearly. At first, I calculate some things that I understand are going to be useful, like the Nyquist frequency and the range of frequencies I'll show in the final graph:
n = length(variable)
dt = time[5]-time[4]
df = 1/(max(time)) #Find out the "unit" frequency
fnyquist = 1/(2*dt) #The Nyquist frequency
f = seq(-fnyquist, fnyquist-df, by=df) #These are the frequencies I'll plot
But when I plot the absolute value of what fft(data) calculates vs. the range of frequencies, I get this:
The peak frequency seems to be close to 50 Hz, but I know that's not the case. It should be close to 1 Hz. I'm a complete newbie in R and in Fourier analysis, so after researching a little, I found on a Swiss page that this can be solved by creating a "mask", which is actually just a vector with a repeating pattern (1, -1, 1, -1...) with the same length as my data vector itself:
mask=rep(c(1, -1),length.out=n)
Then if I multiply my data vector by this mask and plot the results:
results = mask*data
plot(f,abs(fft(results)),type="h")
I get what I was looking for. (This is the graph after limiting the x-axis to a reasonable scale).
So, what's the mask actually doing? I understand it's changing my data point signs in an alternating manner, but I don't get why it would take the inferred peak frequencies from ~50 Hz to the correct result of ~1 Hz.
Thanks in advance!
Your "mask" is one of two methods of performing an fftshift, which is commonly done to center the 0 Hz output of an FFT in the middle of a graph or plot (instead of at the left edge, with the negative frequencies wrapping around to the right edge).
To perform an fftshift, you can heterodyne or modulate your data (by Fs/2) before the FFT, or simply do a circular shift by 50% after the FFT. Both produce the same result. They are the same due to the shift property of the DFT.
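If it helps to see that equivalence numerically, here is a small R sketch with made-up data (a 1.5 Hz sine sampled at an assumed 100 Hz; none of these numbers come from the question): the masked FFT and the circularly shifted FFT agree to floating-point precision.
n    <- 12000
fs   <- 100                                   # assumed sample rate [Hz]
t    <- (0:(n - 1)) / fs
data <- sin(2 * pi * 1.5 * t)                 # toy stand-in for the voltage trace

mask    <- rep(c(1, -1), length.out = n)      # modulate by Fs/2 before the FFT ...
masked  <- fft(mask * data)
X       <- fft(data)
shifted <- c(X[(n/2 + 1):n], X[1:(n/2)])      # ... or circularly shift by 50% after it

max(abs(masked - shifted))                    # essentially zero (floating-point rounding)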

Find start point (time) of each cycle in a sine wave

I am trying to achieve a sine wave gradually changing from 8 Hz to 2 Hz over 5 seconds:
This waveform was produced in Cool Edit. I gave it a start frequency of 8Hz, an end frequency of 2Hz and a duration of 5 seconds. The sine wave gradually changes from one frequency to the other over the given time.
My question is, how can I accurately find the start time of each cycle (highlighted with a red dot), using a FOR loop?
Pseudo code:
time = 5 //Duration
freq1 = 8 //Start frequency
freq2 = 2 //End frequency
cycles = ( (freq1 + freq2) / 2 ) * time //Total number of cycles
for(i = 0; i < cycles; i++) {
/* Formula to find start time of each cycle */
}
That is backward thinking for this problem and it leads to madness in the program. Not to mention that the individual waves will not be pure sine waves because the frequency is changing (they will be slightly distorted), which you will not achieve with your generator, and there is also only a very slight chance that the ending signal will stop at zero after 5 s. Instead, generate a continuous sine wave with variable frequency:
First compute the actual frequency
linear interpolation will suffice (unless you need a different kind of change)
f=f0+(f1-f0)*t/T
where:
f0=8 [Hz] start frequency
f1=2 [Hz] stop frequency
T =5 [s] change time
t =<0,T> is actual time in [s]
compute the sine wave data
for (t=0.0,angle=0.0;t<=T;t+=dt)
 {
 f=f0+((f1-f0)*t/T);                            // actual frequency
 signal=Amplitude*sin(angle);                   // your signal, put it in an array or output it somewhere ...
 angle+=6.283185307179586476925286766559*dt*f;  // update phase
 while (angle>6.283185307179586476925286766559) // wrap just to avoid floating point rounding problems
  angle-=6.283185307179586476925286766559;
 }
Where dt [s] is the time step you want to sample your signal with. If you are generating this in real time and outputting to real HW, you can use a timer or measure the time directly (with performance counters on Windows, by RDTSC, or with whatever you have at your disposal).
If you have a predefined number of samples n for this, then
dt=T/double(n-1);
Here is a sample output (n = image width):
If you also need the number of periods, then add a counter increment inside the angle-wrapping while loop. That is also where your zero point is (but if the sample rate is too small, or you need high precision, you will need to interpolate the real zero position).
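The same idea translates directly to R if that is more convenient; the sketch below (the variable names and the 1 kHz sample rate are my own choices) builds the sweep by accumulating phase and then reads off a cycle start each time the accumulated phase passes a multiple of 2*pi:
fs <- 1000                            # assumed sample rate [Hz]
T  <- 5;  f0 <- 8;  f1 <- 2           # duration, start frequency, end frequency
t  <- seq(0, T, by = 1 / fs)
f  <- f0 + (f1 - f0) * t / T          # instantaneous frequency (linear interpolation)
phase  <- 2 * pi * cumsum(f) / fs     # accumulated phase, the discrete analogue of angle above
signal <- sin(phase)                  # the sweep itself

cycle_starts <- t[which(diff(floor(phase / (2 * pi))) == 1) + 1]  # times where a new cycle begins
length(cycle_starts)                  # about (f0 + f1)/2 * T = 25 cycles, matching the asker's formula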

Can anyone please provide me the algorithm to find the information entropy and correlation analysis of an encrypted color image

I have the mathematical formula. But I have doubts whether one can calculate entropy for color images (normally I have only seen it done for black and white images).
H(m) = -∑ P(m) log P(m)
How should I calculate the value of P(m)?
Secondly, I also have the formula for correlation analysis but don't really know how to work it out. I have the correlation formula
r(xy) = cov(x, y) / sqrt(D(x) * D(y))
It is said that x and y are greyscale values. What do you mean by greyscale values and how do I calculate them?
Question 1:
To calculate the entropy, you need to estimate the probability of each RGB value - i.e. the count of pixels with that value divided by the number of pixels in the image. Then you apply the formula over all positive probabilities. There are two possible implementations for holding the RGB counts:
Simple histogram - a 3-dimensional 256x256x256 array; slow and memory consuming
Sparse histogram - hold a hash table with the RGB value as key and the count as value.
The first choice may be better for very large images.
Note however that for rather small images the probability estimate might be very inaccurate, as small noise may spread a single color over multiple bins. In the worst case all counts will be 1. To handle that problem, you may consider using larger bins - e.g. divide the RGB values by 4, so you only have 64^3 bins.
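As an R illustration of the sparse-histogram idea (img is a hypothetical H x W x 3 array of 0-255 values, e.g. png::readPNG output multiplied by 255, and the bin width of 4 follows the suggestion above):
rgb_entropy <- function(img, bin = 4) {
  r <- floor(img[, , 1] / bin)          # coarser bins make the estimate less noise-sensitive
  g <- floor(img[, , 2] / bin)
  b <- floor(img[, , 3] / bin)
  counts <- table(paste(r, g, b))       # sparse histogram: one entry per RGB triple that occurs
  p <- counts / sum(counts)             # P(m): relative frequency of each binned triple
  -sum(p * log2(p))                     # entropy in bits
}
With bin = 1 this reduces to the exact per-RGB-value estimate of P(m).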
Question 2: You're talking about the correlation coefficient, AKA normalized correlation: http://en.wikipedia.org/wiki/Cross-correlation#Normalized_cross-correlation
Gray-scale is just the common term for the value of a pixel in a gray-scale image, usually in the 0-255 range.
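For the correlation coefficient, R's built-in cor() already computes cov(x, y) / sqrt(D(x) * D(y)); in image-encryption analyses x and y are typically the gray values of adjacent pixel pairs, so a minimal sketch (gray is an assumed H x W matrix of gray-scale values) is:
x <- as.vector(gray[, -ncol(gray)])   # every pixel ...
y <- as.vector(gray[, -1])            # ... paired with its right-hand neighbour
cor(x, y)                             # close to 1 for a natural image, close to 0 for a well-encrypted one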

heat transfer for spherical coordinates boundary conditions implementation

I want to apply heat transfer (heat conduction and convection) to a hemisphere. It is a transient, homogeneous heat transfer problem in spherical coordinates. There is no heat generation. At the beginning the hemisphere is at Tinitial = 20 degrees (room temperature). The external environmental temperature is -22 degrees. You can imagine the hemisphere as a solid material. It is also a non-linear model, because the thermal conductivity changes after the material freezes, and this changes the temperature profile.
I want to find the temperature profile of this solid over a certain time, until the center temperature reaches -22 degrees.
In this case, temperature depends on 3 parameters: T(r,theta,t) - radius, angle, and time.
(1/α) ∂T(r,θ,t)/∂t = (1/r²) ∂/∂r( r² ∂T(r,θ,t)/∂r ) + (1/(r² sinθ)) ∂/∂θ( sinθ ∂T(r,θ,t)/∂θ )
I applied a finite difference method using MATLAB; however, the boundary conditions have issues. There is convection on the surface of the hemisphere and conduction at the inner nodes, and the bottom of the hemisphere has a constant temperature equal to the air temperature (-22). You can see the scripts which I am using for the BCs in the MATLAB code below.
% Temperature at surface of hemisphere solid boundary node
for i=nodes
    for j=1:1:(nodes-1)
        Qcd_ot(i,j)= ((k(i,j)+ k(i-1,j))/2)*A(i-1,j)*(( Told(i,j)-Told(i-1,j))/dr); % heat conduction out of node
        Qcv(i,j) = h*(Tair-Told(i,j))*A(i,j); % heat transfer through convection on surface
        Tnew(i,j) = ((Qcv(i,j)-Qcd_ot(i,j))/(mass(i,j)*cp(i,j))/2)*dt + Told(i,j);
    end % end of for loop
end
% Temperature at inner nodes
for i=2:1:(nodes-1)
    for j=2:1:(nodes-1)
        Qcd_in(i,j)= ((k(i,j)+ k(i+1,j))/2)*A(i,j) *((2/R)*(( Told(i+1,j)-Told(i,j))/(2*dr)) + ((Told(i+1,j)-2*Told(i,j)+Told(i-1,j))/(dr^2)) + ((cot(y)/(R^2))*((Told(i,j+1)-Told(i,j-1))/(2*dy))) + (1/(R^2))*(Told(i,j+1)-2*Told(i,j)+ Told(i,j-1))/(dy^2));
        Qcd_out(i,j)= ((k(i,j)+ k(i-1,j))/2)*A(i-1,j)*((2/R)*(( Told(i,j)-Told(i-1,j))/(2*dr)) +((Told(i+1,j)-2*Told(i,j)+Told(i-1,j))/(dr^2)) + ((cot(y)/(R^2))*((Told(i,j+1)-Told(i,j-1))/(2*dy))) + (1/(R^2))*(Told(i,j+1)-2*Told(i,j)+ Told(i,j-1))/(dy^2));
        Tnew(i,j) = ((Qcd_in(i,j)-Qcd_out(i,j))/(mass(i,j)*cp(i,j)))*dt + Told(i,j);
    end % end for loop
end % end for loop
% Temperature at center line nodes
for i=2:1:(nodes-1)
    for j=1
        Qcd_line(i,j)=((k(i,j)+ k(i+1,j))/2)*A(i,j)*(Told(i+1,j)-Told(i,j))/dr;
        Qcd_lineout(i,j)=((k(i,j)+ k(i-1,j))/2)*A(i-1,j)*(Told(i,j)-Told(i-1,j))/dr;
        Tnew(i,j)= ((Qcd_line(i,j)-Qcd_lineout(i,j))/(mass(i,j)*cp(i,j)))*dt + Told(i,j);
    end
end
% Temperature at bottom point (center) of the hemisphere solid
for i=1
    for j=1:1:(nodes-1)
        Qcd_center(i,j)=(((k(i,j)+k(i+1,j))/2)*A(i,j)*(Told(i+1,j)-Tair)/dr);
        Tnew(i,j)= ((Qcd_center(i,j))/(mass(i,j)*cp(i,j)))*dt + Told(i,j);
    end
end
% Temperature at all bottom points of the hemisphere
Tnew(:,nodes)=-22;
Told=Tnew;
t=t+dt;
The Tnew temperature values grow exponentially after the program runs and then become NaN. It is supposed to show me the cooling and freezing temperature profile of the solid until it reaches the air temperature. I could not figure out why it behaves like that.
I would like to hear your suggestions for the BC implementation in this program, or how I should change the BCs for these conditions. Thanks in advance!!
Your code is too long to read and understand completely, but it looks like you are using a simple forward Euler scheme, is that correct? If so, try reducing the time-step dt, maybe by a lot, since this method can become numerically unstable if dt is too big. This might slow down the computation (again by a lot), but that is the price you pay for such a simple algorithm. There are alternative methods that do not suffer from this instability, but they are much harder to implement, since you need to solve a system of equations.
I did some thermal simulations using this simple scheme a long time ago. I found that the stability criterion was dt < (dx)^2 * c_p * rho / (6 * k), which should be valid for a simulation on a 3D Cartesian grid, where dx is the spatial step, c_p is the specific heat, rho the density and k the thermal conductivity of the material. I don't know how to convert this to your case with spherical coordinates. The thing I learned then was to choose small time-steps, but more importantly to choose as large a dx as possible: when you reduce dx by a factor of 2, you also need to reduce dt by a factor of 4 to keep things stable. At the same time, for a 3D problem, the number of elements increases by a factor of 8. So the total simulation time scales with 1/(dx)^5!!!
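To give a feel for the numbers, here is that stability limit evaluated in R with made-up material properties (none of these values come from the question); if the dt used in the MATLAB script is larger than this kind of bound, exponential blow-up followed by NaN is exactly the expected symptom:
dx  <- 0.005                          # assumed spatial step [m]
c_p <- 2000                           # assumed specific heat [J/(kg K)]
rho <- 1000                           # assumed density [kg/m^3]
k   <- 2                              # assumed thermal conductivity [W/(m K)]
dt_max <- dx^2 * c_p * rho / (6 * k)  # explicit-scheme limit quoted above
dt_max                                # about 4.2 s for these numbers; halving dx cuts it to about 1 s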

Adaptive time/position series filter in R...?

I'm trying to filter a time/position data series to produce a smoothed plot. I am measuring depth vs time (mechanical system) where the velocity is changing. I calculate velocity from the measured depth/time values and can plot velocity vs. depth, but at low speeds, the noise is excessive (for various reasons). The trend at low speeds is correct, but I'd like to be able to apply a filter that will use an adaptive smoothing routine, i.e. for low speeds (where I have many data points) I need to use a larger smoothing window, and for high speeds (few data points) I need to use a smaller window.
I've looked a bit and have figured out a solution using rollapply() but was wondering if there are other approaches. In particular, I'm not clear on how to "vectorise" an operation. I'm a relatively new coder so I'm sorry if my code is a bit amateurish. My solution is below:
library(zoo) # provides rollapply()

adapt <- function(x, wmin, wmax) {
  # adapt takes a vector of calculated velocities (x), a minimum window size (wmin),
  # and a maximum window size (wmax). It returns a vector of filtered velocities.
  x <- ifelse(is.na(x), 0, x)                   # replace NA values with 0
  x <- ifelse(is.infinite(1/x), 1/wmax, x)      # replace zero velocities (infinite 1/x)
  x <- runmed(x, 11)                            # pre-smooth raw velocities with an 11-point median
  wins <- ceiling(ifelse(is.infinite(1/x), wmin, 1 + wmax/(1 + x)^15)) # set window widths
  wins <- ifelse(wins <= wmin, wmin, wins)      # enforce minimum window
  wins <- ifelse(wins > wmax, wmax, wins)       # enforce maximum window
  out <- rollapply(x, width = wins, median)     # apply a median filter with per-element width
  out[length(x)] <- 0                           # set last value to zero
  return(out)
}
