R: Inverse fft() to confirm my manual DFT algorithm inaccurate? - r

Using R, before assessing some metric of accuracy on my own manual implementation of DFT, I wanted to do a sanity check on how well stats::fft() performs by doing the following:
sig.ts = ts( sin(2*pi*freq1*t) + sin(2*pi*freq2*t) );
sig.rt = fft(fft(sig.ts)/N, inverse="true");
#the two plots so perfectly align that you can't see them both
max(abs(sig.ts - sig.rt)) / max(sig.ts);
#arbitrary crude accuracy metric=1.230e-15 - EXCELLENT!
But I wanted to write the code for DFT myself, to ensure I understand it, then invert it in the hopes that it would be the same:
##The following is the slow DFT for now, not the FFT...
sR = 102.4; #the number of Hz at which we sample
freq1=3; freq2=12; #frequency(ies) of the wave
t = seq(1/sR,10, 1/sR);
sig.ts = ts( sin(2*pi*freq1*t) + sin(2*pi*freq2*t) );
N=length(t); kk=seq(0,N/2-1, 1); nn=seq(0,N-1, 1);
for(k in kk){
sig.freqd[k]=0;
for(n in nn){
sig.freqd[k] = sig.freqd[k] + sig.ts[n+1]*exp(-j*2*pi*n*k/N); } }
sig.freqd = (1/N)*sig.freqd; #for Normalization
#Checking the "accuracy" of my manual implementation of DFT...
sig.freqd_inv=Re(fft(sig.freqd, inverse="true"));
plot(t[1:100], window(sig.ts,end=100), col="black", type="l",lty=1,lwd=1, xaxt="n");
lines(t[1:100],window(sig.freqd_inv,end=100), col="red", type="l",lty=1,lwd=1, xaxt="n");
axis(1, at=seq(round(t[1],1),round(t[length(t)],1), by=0.1), las=2);
max(abs(sig.ts[1:(N/2-1)] - sig.freqd_inv)) / max(sig.ts[1:(N/2-1)]); #the metric here =1.482 unfortunately
Even without the metric, the plot makes it obvious that something's off here: it's lower amplitude, maybe out of phase, and more jagged. In all of my self-studying, I will say that I am a bit confused about how sensitive this all is to vector length, as well as how to ensure that the imaginary component's phase information is taken into account when plotting.
Bottom line, any insight into what's wrong with my DFT algorithm would be helpful. I don't want to just blackbox my use of functions - I want to understand these things more deeply before moving on to more complicated functions.
Thanks,
Christian

The main issues arise from the signal indexing. First to get a full transform usable by R's fft(..., inverse = TRUE), you would need to compute all N coefficients (even if the coefficients above N/2-1 could be obtained by symmetry).
Then you should realize that array indexing in R is 1-based. So, while indexing sig.freqd[k], the index k should start at 1 instead of 0. Since the argument to exp(-1i*2*pi*n*k/N) should start with n=0 and k=0, you'll need to adjust the indices:
kk=seq(1,N, 1); nn=seq(1,N, 1);
for(k in kk){
sig.freqd[k]=0i;
for(n in nn){
sig.freqd[k] = sig.freqd[k] + sig.ts[n]*exp(-1i*2*pi*(n-1)*(k-1)/N);
}
}
I've also changed your usage of j to represent the imaginary number to 1i, since that's the usual notation recognized by R (and R was complaining about it when trying your posted sample as-is). If you had defined j=1i, that shouldn't affect the results.
Note also that R's fft is unnormalized. So to obtain the same result for the forward transform, your DFT implementation should not include the 1/N normalization. On the other hand, you will need to add this factor as a final step in order to get the full-circle forward+backward transform to match the original signal.
With these changes you should have the following code:
##The following is the slow DFT for now, not the FFT...
sR = 102.4; #the number of Hz at which we sample
freq1=3; freq2=12; #frequency(ies) of the wave
t = seq(1/sR,10, 1/sR);
sig.ts = ts( sin(2*pi*freq1*t) + sin(2*pi*freq2*t) );
N=length(t); kk=seq(1,N, 1); nn=seq(1,N, 1);
sig.freqd=complex(length.out=N); #preallocate the output so that sig.freqd[k] can be assigned
for(k in kk){
sig.freqd[k]=0i;
for(n in nn){
sig.freqd[k] = sig.freqd[k] + sig.ts[n]*exp(-1i*2*pi*(n-1)*(k-1)/N);
}
}
#Checking the "accuracy" of my manual implementation of DFT...
sig.freqd_inv=(1/N)*Re(fft(sig.freqd, inverse="true"));
plot(t[1:100], window(sig.ts,end=100), col="black", type="l",lty=1,lwd=2, xaxt="n");
lines(t[1:100],window(sig.freqd_inv,end=100), col="red", type="l",lty=2,lwd=1, xaxt="n");
axis(1, at=seq(round(t[1],1),round(t[length(t)],1), by=0.1), las=2);
max(abs(sig.ts - sig.freqd_inv)) / max(sig.ts)
This should yield a metric around 1.814886e-13, which is probably more in line with what you were expecting. The corresponding plot should also show the original signal and the roundtrip signal overlapping.
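As a cross-check on the indexing and normalization points above, here is a minimal sketch that builds the DFT as a matrix product and compares it with fft(). The helper name dft_matrix and the random test signal are illustrative assumptions, not part of the original post.

```r
# Hypothetical helper: O(N^2) DFT written as a matrix-vector product.
# Indices run from 0 to N-1 inside the exponent, as in the DFT definition.
dft_matrix <- function(x) {
  N <- length(x)
  idx <- 0:(N - 1)
  W <- exp(-2i * pi * outer(idx, idx) / N)  # N x N matrix of complex exponentials
  as.vector(W %*% x)
}

set.seed(42)
x <- rnorm(64)                     # arbitrary test signal (assumption)

X_manual <- dft_matrix(x)
X_fft <- fft(x)                    # R's fft() is unnormalized

# Forward transforms should agree to rounding error
err_forward <- max(Mod(X_manual - X_fft))

# The inverse is also unnormalized, so divide by N to recover the signal
x_back <- Re(fft(X_manual, inverse = TRUE)) / length(x)
err_roundtrip <- max(abs(x - x_back))
```

Both error values should be tiny relative to the signal amplitude; a large value would point to an indexing or normalization mistake.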

Related

Find root using uniroot

I'm trying to find a root of the following function (based on the Gamma (gamma()) function) using the uniroot() function:
cv = 0.056924/1.024987^2
fx2 = function(theta, eta){
p1 = 1 - 2/(theta*(1-eta))
p2 = 1 - 1/(theta*(1-eta))
return(( gamma(p1)/(gamma(p2))^2 ) - (cv+1) )
}
This function gives me the following plot:
v = seq(0, 1, 0.01)
plot(v, fx2(3.0, v), type='l' )
It seems to me that the root of this function is close to 0.33, but the uniroot() function doesn't find the root, returning the following result:
uniroot(fx2, interval = c(0,0.3), theta=3 )
Error in uniroot(fx2, interval = c(0, 0.3), theta = 3) :
f() values at end points not of opposite sign
How do I find the root of this function? Are there any other packages with a more accurate algorithm?
I first rewrote your function to (optionally) express gamma(p1)/gamma(p2)^2 in terms of a computation that's first done on the log scale (via lgamma()) and then exponentiated. This is more numerically stable, and the consequences will become clear below ... (It's possible that I screwed up the log-scale computation; you should double-check it.) Update/warning: reading the documentation more carefully (!!), lgamma() evaluates to the log of the absolute value of the gamma function. So there may be some weird sign stuff going on in the answer below. The fact remains that if you are evaluating ratios of gamma functions for x<0 (i.e. in the regime where the value can go negative), Bad Stuff is very likely going to happen.
cv = 0.056924/1.024987^2
fx3 <- function(theta, eta, lgamma = FALSE) {
p1 <- 1 - 2/(theta*(1-eta))
p2 <- 1 - 1/(theta*(1-eta))
if (lgamma) {
val <- exp(lgamma(p1) - 2*lgamma(p2)) - (cv+1)
} else {
val <- ( gamma(p1)/(gamma(p2))^2 ) - (cv+1)
}
val  # return the value visibly
}
Compute the function with and without log-scaling:
x <- seq(0, 1, length.out = 20001)
v <- sapply(x, fx3, theta = 3.0, lgamma = TRUE)
v2 <- sapply(x, fx3, theta = 3.0, lgamma = FALSE)
Find root (more explanation below):
uu <- uniroot(function(eta) fx3(3.0, eta, lgamma = TRUE),
c(0.4, 0.5))
Plot it:
par(las=1, bty="l")
plot(x, abs(v), col = as.numeric(v<0) + 1, type="p", log="y",
pch=".", cex=3)
abline(v = uu$root, lty=2)
cvec <- sapply(c("blue","magenta"), adjustcolor, alpha.f = 0.2)
points(x, abs(v2), col=cvec[as.numeric(v2<0) + 1], pch=".", cex=3)
Here I'm plotting the absolute value on a log scale, with sign indicated by colour (black/blue >0, red/magenta <0). Black/red is the log-scale calculation, blue/magenta is the original calculation. I also plotted the function at very high resolution to try to avoid missing or mischaracterizing features.
There's a lot of weird stuff going on here.
Both versions of the function do something interesting near x=1/3; the original version looks like a pole (value diverges to +∞, "returns" from -∞), while the log-scale computation goes up to +∞ and returns without changing sign.
The log-scale computation has a root near x=0.45 (absolute value becomes small while the sign flips), but the original computation doesn't, presumably because of some kind of catastrophic loss of precision? If we give uniroot bounds that don't include the pole, it can find this root.
There are further poles and/or roots at larger values of x that I didn't explore.
All of this basically says that it's pretty dangerous to mess around with this function without knowing what its mathematical properties are. I discovered some stuff by numerical exploration, but it would be best to analyze the function so that you really know what's happening; any numerical exploration can be fooled if the function is sufficiently strangely behaved.
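One way to automate the kind of exploration described above is to scan a grid for sign changes, run uniroot() on each bracket, and discard brackets where the function value at the reported "root" is large (which suggests a pole rather than a root). The sketch below uses a hypothetical test function with a known pole at 0.3 and a root at 0.5; find_roots, f_demo, and the tolerances are assumptions for illustration, not the poster's function.

```r
# Hypothetical helper: locate sign-change brackets on a grid, then refine
# each with uniroot() and keep only those where |f| is actually small.
find_roots <- function(f, lower, upper, n = 200, ftol = 1e-6) {
  x <- seq(lower, upper, length.out = n)
  fx <- vapply(x, f, numeric(1))
  ok <- is.finite(fx[-n]) & is.finite(fx[-1])
  idx <- which(ok & sign(fx[-n]) * sign(fx[-1]) < 0)
  roots <- numeric(0)
  for (i in idx) {
    r <- tryCatch(uniroot(f, c(x[i], x[i + 1]), tol = 1e-12),
                  error = function(e) NULL)
    # A sign change across a pole gives a huge |f.root|, so filter it out
    if (!is.null(r) && abs(r$f.root) < ftol) roots <- c(roots, r$root)
  }
  roots
}

# Demo function (assumption): pole at x = 0.3, genuine root at x = 0.5
f_demo <- function(x) 1 / (x - 0.3) - 5
roots <- find_roots(f_demo, 0, 1)
```

A grid scan like this can still miss closely spaced features, so it complements rather than replaces an analysis of the function.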

How do I find level sets for a function on R^d, in R?

I am looking for an efficient way to find level sets of an arbitrary function from [0,1]^d to R.
To be clear: with a level set I mean the set of points in [0,1]^d that are mapped to the same value.
In all of my applications, the level sets are connected. They are lines, planes, or some higher dimensional hyperplane, but apart from the connectedness, they do not satisfy any general criterion.
I am looking for a subset of the level set that has a high density everywhere.
When I limit my functions to 2d, I can use the function contourLines from the package grDevices, which does exactly what I am looking for:
test <- function(x,y){
y-(x^2-6*x+9)
}
Mat = matrix(0,100,100)
x <- seq(-10,10,length.out = 100)
y <- seq(-10,10,length.out = 100)
for(i in (1:100)){
for(j in (1:100)){
Mat[i,j] = test(x[i],y[j])
}
}
cont <- contourLines(x, y, Mat, levels = 0)
Unfortunately I have not been able to find a function that does the same trick in higher dimensions.
To give a bit more context to the problem:
I have a 'wild' function, of which I hardly know anything, but I can easily evaluate it at any point in R^d. This function divides the R^d (or [0,1]^d, to make it a bit simpler), into a positive part (level sets larger than 0), and a negative part (level sets smaller than 0). I am looking for the boundary separating the two, which is the level set for 0.
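One possible approach for the zero level set, sketched here under the assumption that the function is continuous: sample random pairs of points in [0,1]^d, keep pairs whose function values have opposite signs, and locate the crossing on the connecting segment with uniroot(). The names zero_level_points and f_demo are hypothetical, and the demo function (a sphere) is only a stand-in for the 'wild' function.

```r
# Hypothetical helper: collect points on {f = 0} inside [0,1]^d by
# one-dimensional root finding along random segments.
zero_level_points <- function(f, d, n_pairs = 500, tol = 1e-8) {
  pts <- matrix(NA_real_, nrow = 0, ncol = d)
  for (i in seq_len(n_pairs)) {
    a <- runif(d)
    b <- runif(d)
    fa <- f(a)
    fb <- f(b)
    if (is.finite(fa) && is.finite(fb) && fa * fb < 0) {
      g <- function(s) f(a + s * (b - a))   # f restricted to the segment a -> b
      s0 <- uniroot(g, c(0, 1), tol = tol)$root
      pts <- rbind(pts, a + s0 * (b - a))
    }
  }
  pts
}

set.seed(1)
f_demo <- function(x) sum(x^2) - 1          # demo only: unit sphere in d = 3
pts <- zero_level_points(f_demo, d = 3)
resid <- max(abs(apply(pts, 1, f_demo)))    # how close the points are to f = 0
```

The density of the resulting point cloud depends on how the random segments happen to cross the boundary, so more pairs (or a different sampling scheme) may be needed to cover it evenly.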

How to fix a straight line plot of a logistic map in R?

The logistic map (a map is a function that takes its value at any time step to its value at the next time step) is a model that has its roots in the prediction of animal population sizes. It has become famous, in part, due to special cases of its parameterization that exhibit surprising chaotic behavior. The logistic map equation is
x_{i+1} = r x_i (1 - x_i)
where x_i ∈ [0,1] is the ratio of current population size to maximum possible size at time i, x_{i+1} is the ratio at the next generation, and r is the driving rate, representing animal reproduction and death. For r < 3.5 the population eventually reaches a stable size or will oscillate between a set of fixed values. However, if r > 3.5 then the system destabilizes and exhibits chaotic behavior!
That is background or context for the following problem statement:
Generate a set of points S = {r, x} where, for each r ∈ [1.0, 4.1] by increments of 0.001025 there will be a sequence of x_i values for i = 0,...,16. So, for each r value there will be 17 x_i values. Use x_0 = 0.01. Depending on your implementation, you may find the rbind function useful. It may take a few seconds for the code to run since it will generate a lot of points in S. No more than 10 lines of R code.
Admittedly, this is a lab assignment; however, I am not a student in the class. I am learning R, and I am trying to work through the online assignments and come up with a solution myself. I have tried to create the set of points to plot, and based on manual verification of a few points, the set looks accurate.
for(j in c(0:3024)) {
rm(x)
x <- 1:17
x[1] <- 0.01
r <- 1 + (j * 0.001025)
for(i in c(1:(17-1))) {
x[i+1] <- r *x[i] * (1 - x[i])
}
if (j==0) {
binded <- cbind(r,x)
} else {
binded <- rbind(binded, cbind(r,x))
}
}
When I invoke plot(binded, pch='.') RStudio displays the result as a straight line. So I am unsure if I am using plot correctly, or even if I am generating all the points correctly. If I decrease the maximum value of j to something less than 2000, you will see a plot; it is just when the j value iterates up to 3024 that you only plot a straight line.
I believe your code is correct. What happens is that when r exceeds 4, the iterations become wildly unstable and diverge toward -infinity. This large variation in the y value compresses the scale and makes the plot look like a flat line.
Cutting off the tail end of the matrix makes a very interesting plot:
plot(binded[-which(binded[,2]<0),], pch=".")
If you do want to plot the entire matrix, consider manually setting your y-axis limits to [0,1]. This way, the plot won't be stretched down to -1e24.
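For reference, here is a minimal sketch of the same computation done column-wise over all r values at once, followed by a plot with the y-axis limited to [0,1]. The variable names (r_vals, X, S) are assumptions, and the plot is only drawn in an interactive session.

```r
# All r values from the assignment: 1 + j * 0.001025 for j = 0..3024
r_vals <- 1 + (0:3024) * 0.001025
n_iter <- 17

# One row per r, one column per iterate x_0 .. x_16
X <- matrix(NA_real_, nrow = length(r_vals), ncol = n_iter)
X[, 1] <- 0.01
for (i in 1:(n_iter - 1)) {
  X[, i + 1] <- r_vals * X[, i] * (1 - X[, i])
}

# Long format: as.vector() unrolls X column by column, so r repeats per column
S <- cbind(r = rep(r_vals, times = n_iter), x = as.vector(X))

# Fixing ylim keeps the diverging values for r > 4 from stretching the axis
if (interactive()) {
  plot(S, pch = ".", ylim = c(0, 1))
}
```

Iterating over columns avoids growing the result with rbind() inside a loop, which is what makes the original version slow.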
As an added bonus, here's a version in a different plotting library that has points colored by i.

Plot decision boundary from weight vector

How do I plot decision boundary from weight vector?
My original data is 2-dimensional but non-linearly separable so I used a polynomial transformation of order 2 and therefore I ended up with a 6-dimensional weight vector.
Here's the code I used to generate my data:
polar2cart <- function(theta,R,x,y){
x = x+cos(theta) * R
y = y+sin(theta) * R
c=matrix(x,ncol=1000)
c=rbind(c,y)
}
cart2polar <- function(x, y)
{
r <- sqrt(x^2 + y^2)
t <- atan(y/x)
c(r,t)
}
R=5
eps=5
sep=-5
c1<-polar2cart(pi*runif(1000,0,1),runif(1000,0,eps)+R,0,0)
c2<-polar2cart(-pi*runif(1000,0,1),runif(1000,0,eps)+R,R+eps/2,-sep)
data <- data.frame("x" = append(c1[1,], c2[1,]), "y" = append(c1[2,], c2[2,]))
labels <- append(rep(1,1000), rep(-1, 1000))
and here's how it is displayed (using ggplot2):
Thank you in advance.
EDIT: I'm sorry if I didn't provide enough information about the weight vector. The algorithm I'm using is pocket which is a variation of perceptron, which means that the output weight vector is the perpendicular vector that determines the hyper-plane in the feature space plus the bias . Therefore, the hyper-plane equation is , where are the variables. Now, since I used a polynomial transformation of order 2 to go from a 2-dimensional space to a 5-dimensional space, my variables are : and thus the equation for my decision boundary is:
So basically, my question is how do I go about drawing my decision boundary given
PS: I've found a solution while waiting, it might not be the best approach but, it gives the expected results. I'll share it as soon as I finish my project if anyone is interested. Meanwhile, I'd love to hear a better alternative.
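One common way to draw such a boundary is to evaluate the decision function on a grid and extract its zero contour. The sketch below assumes a feature ordering of (1, x, y, x^2, xy, y^2) and uses made-up weights that describe a unit circle; both are illustrative assumptions, since the post does not state the actual ordering or weights.

```r
# Assumed order-2 polynomial feature map: (1, x, y, x^2, x*y, y^2)
phi <- function(x, y) cbind(1, x, y, x^2, x * y, y^2)

# Hypothetical weight vector (bias first): x^2 + y^2 - 1 = 0, i.e. a unit circle
w <- c(-1, 0, 0, 1, 0, 1)

gx <- seq(-2, 2, length.out = 200)
gy <- seq(-2, 2, length.out = 200)
grid_xy <- expand.grid(x = gx, y = gy)      # x varies fastest

# score[i, j] is the decision function at (gx[i], gy[j])
score <- matrix(phi(grid_xy$x, grid_xy$y) %*% w, nrow = length(gx))

# The decision boundary is the zero level set of the score
boundary <- contourLines(gx, gy, score, levels = 0)

if (interactive()) {
  plot(NA, xlim = range(gx), ylim = range(gy), xlab = "x", ylab = "y")
  for (b in boundary) lines(b$x, b$y, col = "red")
}
```

With real data, the grid limits would be taken from the range of the data, and the contour could be overlaid on the scatter plot of the two classes.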

Implementing a different Kernel for 2D Kernel Density Estimation in R

I'm looking for some help understanding how to implement a 2-dimensional kernel density method with an isotropic variance and a bivariate normal kernel, kind of. But instead of using the typical distance, because the data is on the surface of the earth, I need to use a great-circle distance.
I'd like to replicate this in R, but I can't figure out how to use a distance metric other than the simple Euclidean distance for any of the built-in estimators, since they use a complex method with convolutions to add the kernels. Does anyone have a way to program an arbitrary kernel?
I ended up modifying the kde2d function from the MASS library. Some significant revision was needed, as is shown below. That said, the code is very flexible, allowing an arbitrary 2-d kernel to be used. (rdist.earth() was used for the great circle distance, h is the chosen bandwidth, in this case, in km, and n is the number of grid points in each direction to be used. rdist.earth requires the "fields" library)
The function could be modified to perform calculations in more than 2d, but the grid gets large very fast in higher dimensions. (Not that it's small now.)
Comments and suggestions on elegance or performance are welcome!
kde2d_mod <- function (data, h, n = 200, lims = c(range(data$lat), range(data$lon))) {
#Data is a matrix: lon,lat for each source. (lon,lat to match rdist.earth format.)
print(Sys.time()) #for timing
nx <- dim(data)[1]
if (dim(data)[2] != 2)
stop("data vectors have only lat-long data")
if (any(!is.finite(data)))
stop("missing or infinite values in the data are not allowed")
if (any(!is.finite(lims)))
stop("only finite values are allowed in 'lims'")
#Grid:
g<-grid(n,lims) #Function to create grid.
#The distance matrix gets large... Can we work around it? YES WE CAN!
sets<-ceiling(dim(g)[1]/10000)
#Allocate our output:
z<-rep(as.double(0),dim(g)[1])
for (i in (1:sets)-1) {
g_subset=g[(i*10000+1):(min((i+1)*10000,dim(g)[1])),]
a_matrix<-rdist.earth(g_subset,data,miles=FALSE)
z[(i*10000+1):(min((i+1)*10000,dim(g)[1]))]<- apply( #Here is my kernel...
a_matrix,1,FUN=function(X)
{sum(exp(-X^2/(2*(h^2))))/(2*pi*nx)}
)
rm(a_matrix)
}
print(Sys.time())
#Un-transpose the final data.
z<-t(matrix(z,n,n))
dim(z)<-c(n^2,1)
z<-as.vector(z)
return(z)
}
The key point here is that any kernel can be used in that inner loop; the downside is that this is evaluated at grid points, so a high-res grid is needed to run this; FFT would be great, but I didn't attempt it.
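To illustrate the idea of swapping in an arbitrary distance inside the kernel, here is a small self-contained sketch that uses a hand-written haversine distance instead of rdist.earth() from fields. The function names, the Earth radius of 6371 km, and the 1/(2*pi*h^2) normalization are assumptions for this example and differ slightly from the function above.

```r
# Great-circle (haversine) distance in km; inputs in degrees.
# radius = 6371 km is an assumed mean Earth radius.
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical estimator: Gaussian kernel of bandwidth h (km) evaluated at
# each grid point; pts and grid are two-column matrices of (lon, lat).
kde_great_circle <- function(pts, grid, h) {
  vapply(seq_len(nrow(grid)), function(i) {
    d <- haversine_km(grid[i, 1], grid[i, 2], pts[, 1], pts[, 2])
    sum(exp(-d^2 / (2 * h^2))) / (2 * pi * h^2 * nrow(pts))
  }, numeric(1))
}

pts <- cbind(lon = c(0, 0.1), lat = c(0, 0.1))     # toy data (assumption)
grid_pts <- rbind(c(0, 0), c(50, 50))              # one near, one far point
dens <- kde_great_circle(pts, grid_pts, h = 100)
quarter <- haversine_km(0, 0, 90, 0)               # a quarter of the equator
```

Any other kernel could be substituted by changing the expression inside the sum, which is the flexibility the answer describes.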
Grid Function:
grid<- function(n,lims) {
num <- rep(n, length.out = 2L)
gx <- seq.int(lims[1L], lims[2L], length.out = num[1L])
gy <- seq.int(lims[3L], lims[4L], length.out = num[2L])
v1=rep(gy,length(gx))
v2=rep(gx,length(gy))
v1<-matrix(v1, nrow=length(gy), ncol=length(gx))
v2<-t(matrix(v2, nrow=length(gx), ncol=length(gy)))
grid_out<-c(unlist(v1),unlist(v2))
grid_out<-aperm(array(grid_out,dim=c(n,n,2)),c(3,2,1) ) #reshape
grid_out<-unlist(as.list(grid_out))
dim(grid_out)<-c(2,n^2)
grid_out<-t(grid_out)
return(grid_out)
}
You can plot the values using image.plot, with the v1 and v2 matrices for your x,y points:
kde2d_mod_plot<-function(kde2d_mod_output,n,lims){
num <- rep(n, length.out = 2L)
gx <- seq.int(lims[1L], lims[2L], length.out = num[1L])
gy <- seq.int(lims[3L], lims[4L], length.out = num[2L])
v1=rep(gy,length(gx))
v2=rep(gx,length(gy))
v1<-matrix(v1, nrow=length(gy), ncol=length(gx))
v2<-t(matrix(v2, nrow=length(gx), ncol=length(gy)))
image.plot(v1,v2,matrix(kde2d_mod_output,n,n))
map('world', fill = FALSE,add=TRUE)
}
