Equal frequency binning for circular data in R

I am trying to divide my wind direction data into four equal frequency bins, without having a fixed break at 0° / 360°.
I am aware of the equal_freq() function from the funModeling package, but this function does not take the circular nature of the wind data into account and calculates the breaks by starting at 0° and ending at 360°.
Is there a way to calculate dynamic equal frequencies that can span over the null jump between 0 and 360 degrees?
Here is a minimal reproducible example:
library(funModeling) #provides equal_freq()
wind_dirs<-runif(n=2000, min=0, max=360) #create a uniform wind direction distribution
equal_freq(wind_dirs, 4) #this has a fixed break at 0° / 360°
I tried the circular package in R, but there is no function for equal frequency binning.
I also considered manually defining a breakpoint, e.g. the most prevailing direction, but this creates about the same problem as a break at 0°.
Any ideas are greatly appreciated.
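For reference, the manual-breakpoint idea mentioned above can be sketched in base R by rotating the data so that the chosen start direction becomes 0, taking ordinary quantiles, and rotating the breaks back. This is only a sketch: the helper name circular_equal_freq and its start argument are hypothetical, and it does not choose the start direction automatically.

```r
set.seed(1)
wind_dirs <- runif(n = 2000, min = 0, max = 360)

# Hypothetical helper: equal-frequency bins on a circle, with the first
# break placed at `start` degrees instead of at 0 degrees.
circular_equal_freq <- function(x, n_bins = 4, start = 0) {
  rotated <- (x - start) %% 360                     # measure angles from `start`
  inner <- quantile(rotated,
                    probs = seq_len(n_bins - 1) / n_bins,
                    names = FALSE)                  # interior breaks (rotated frame)
  bin <- cut(rotated, breaks = c(0, inner, 360),
             include.lowest = TRUE, labels = FALSE) # bin index 1..n_bins
  list(breaks = (c(0, inner) + start) %% 360,       # breaks in original degrees
       bin = bin)
}

res <- circular_equal_freq(wind_dirs, n_bins = 4, start = 300)
res$breaks      # first break is 300; later breaks may wrap past 360
table(res$bin)  # number of observations per bin
```

Because the bins are defined in the rotated frame, one of them can span the 0° / 360° jump without being split.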

Related

Average height of a point based on nearby points

I have a situation in my game. I am experimenting with terrain generation.
I have a bunch of peaks, whose positions and elevations I know.
I have a point which is surrounded by all these peaks. I know its position. I am trying to calculate the elevation of this point.
I would like to calculate the height of this point, based on how close/far it is to each of these peaks, and the elevation of each of these peaks.
Example:
Peak 1 is at (0,0), with an elevation of 500
Peak 2 is at (100,100), with an elevation of 1000
Peak 3 is at (0,100), with an elevation of 750
If my point is at (99,99), I want the elevation of this point to be close to 1000.
What is the name of this problem?
If you already have a solution to this, that too will be much appreciated.
Note: In addition, it would be helpful if the formula/equation also allowed me to generate negative elevations. For example, a point midway between all the peaks could just as well be below sea level. Any formula I can think of usually gives me only positive results. I assume some kind of 'slope' must be considered to allow this.
One equation I have thought of so far is
P1.height * (Sum of all distances - distance from P1)/(Sum of all distances) +
P2.height * (Sum of all distances - distance from P2)/(Sum of all distances) +
... Pn.height * (Sum of all distances - distance from Pn)/(Sum of all distances)
Thank you.
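A minimal sketch of the equation above in R (the language used elsewhere on this page), using the three example peaks. Each weight is (S - d_i)/S, where S is the sum of all distances, so the n weights add up to n - 1 rather than 1. The function name weighted_height and the optional normalise argument, which rescales the weights to sum to 1, are assumptions added for illustration.

```r
# Example peaks from the question
peaks <- data.frame(x = c(0, 100, 0),
                    y = c(0, 100, 100),
                    height = c(500, 1000, 750))

# Hypothetical helper implementing the weighting from the question.
weighted_height <- function(px, py, peaks, normalise = FALSE) {
  d <- sqrt((peaks$x - px)^2 + (peaks$y - py)^2)  # distance to every peak
  w <- (sum(d) - d) / sum(d)                      # weights as written above
  if (normalise) w <- w / sum(w)                  # assumption: force sum(w) == 1
  sum(peaks$height * w)
}

raw  <- weighted_height(99, 99, peaks)                    # weights sum to 2 here
norm <- weighted_height(99, 99, peaks, normalise = TRUE)  # weights sum to 1
c(raw = raw, normalised = norm)
```

With three peaks the unnormalised result is exactly twice the normalised one, which is why it exceeds the highest peak. The normalised value is a weighted average, so it always lies between the lowest and highest peak and cannot go below sea level.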
To draw the peaks, your game needs to convert the coordinates of the peaks to screen coordinates.
Such a calculation is usually done by multiplying a matrix with the vector containing the coordinates (in Java AWT such a matrix would be called a transform).
What you need is the inverse of that matrix so that you can apply it to your screen coordinates.
So the solution is:
get the matrix that is used for rendering the terrain
calculate the inverse matrix
apply it to your screen coordinates
And it might even be more efficient not to use the original matrix to calculate the inverse matrix but use the parameters (zero point, scale factors and rotation angle) which were used to calculate the original matrix. The same parameters can be used to calculate the inverse matrix.

Why the need for a mask when performing Fast Fourier Transform?

I'm trying to find the peak frequencies hidden in my data using the fft() function in R. While I was preparing the data, a more experienced user recommended creating a "mask" (more on this after the details), which does give me the exact diagram I'm looking for. The problem is, I don't understand what it does or why it's needed.
To give some context, I'm working with .txt files with around 12000 entries each. It's voltage vs. time information, and the expected result is just a sinusoidal wave with a clear peak frequency that should be close to 1-2 Hz. This is an example of what one of those files looks like:
I've been trying to use the Fast Fourier Transform method fft() implemented in R to find the peak frequencies and get a diagram that reflected them clearly. At first, I calculate some things that I understand are going to be useful, like the Nyquist frequency and the range of frequencies I'll show in the final graph:
n = length(variable)
dt = time[5]-time[4]
df = 1/(max(time)) #Find out the "unit" frequency
fnyquist = 1/(2*dt) #The Nyquist frequency
f = seq(-fnyquist, fnyquist-df, by=df) #These are the frequencies I'll plot
But when I plot the absolute value of what fft(data) calculates vs. the range of frequencies, I get this:
The peak frequency seems to be close to 50 Hz, but I know that's not the case. It should be close to 1 Hz. I'm a complete newbie in R and in Fourier analysis, so after researching a little, I found on a Swiss page that this can be solved by creating a "mask", which is actually just a vector with a repeating pattern (1, -1, 1, -1...) of the same length as my data vector:
mask=rep(c(1, -1),length.out=n)
Then if I multiply my data vector by this mask and plot the results:
results = mask*data
plot(f,abs(fft(results)),type="h")
I get what I was looking for. (This is the graph after limiting the x-axis to a reasonable scale).
So, what's the mask actually doing? I understand it's flipping the sign of every other data point, but I don't get why it would move the inferred peak frequencies from ~50 Hz to the correct result of ~1 Hz.
Thanks in advance!
Your "mask" is one of two methods of performing an fftshift, which is commonly done to center the 0 Hz output of an FFT in the middle of a graph or plot (instead of at the left edge, with the negative frequencies wrapping around to the right edge).
To perform an fftshift, you can heterodyne or modulate your data (by Fs/2) before the FFT, or simply do a circular shift by 50% after the FFT. Both produce the same result. They are the same due to the shift property of the DFT.
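The equivalence can be checked numerically. The sketch below uses a synthetic 2 Hz sine sampled at an assumed 100 Hz (not the asker's data) and compares the masked FFT with a plain FFT rotated by half its length. It assumes an even number of samples.

```r
fs   <- 100                          # assumed sampling rate in Hz
n    <- 1000                         # number of samples (must be even here)
time <- (0:(n - 1)) / fs
x    <- sin(2 * pi * 2 * time)       # synthetic 2 Hz test signal

# Method 1: multiply by (1, -1, 1, -1, ...) before the FFT
mask   <- rep(c(1, -1), length.out = n)
Y_mask <- fft(x * mask)

# Method 2: plain FFT, then circular shift by n/2 bins
X       <- fft(x)
Y_shift <- c(X[(n / 2 + 1):n], X[1:(n / 2)])

max_diff <- max(Mod(Y_mask - Y_shift))   # difference is at rounding-error level

# Frequency axis that matches the shifted output: -fs/2 ... fs/2 - fs/n
f      <- ((0:(n - 1)) - n / 2) * fs / n
peak_f <- abs(f[which.max(Mod(Y_mask))]) # location of the largest peak, in Hz
```

Without the mask (or the shift), the output of fft() starts at 0 Hz, so plotting it against an axis that starts at -fs/2 puts the low-frequency peak at the edges of the plot, near ±fs/2.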

Calculate total absolute curvature from coordinates in R

Given a set of coordinates corresponding to a closed shape, I want to calculate the total absolute curvature, which requires calculating the curvature for each point, taking the absolute value, and summing them. Simple enough.
I used the answer to this question to calculate the curvature from a matrix of x y coordinates (xymat) and get what I thought would be the total absolute curvature:
sum(abs(predict(smooth.spline(xymat), deriv = 2)$y))
The problem is that total absolute curvature has a minimum value of 2*pi and is exactly that for circles, but this code is evaluating to values less than 2*pi:
library(purrr)
xymat <- map_df(data.frame(degrees=seq(0:360)),
function(theta) data.frame(x = sin(theta), y = cos(theta)))
sum(abs(predict(smooth.spline(xymat), deriv = 2)$y))
This returns 1.311098 instead of the expected value of 6.283185.
If I change the df parameter of smooth.spline to 3 as in the previous answer, the returned value is 3.944053, still shy of 2*pi (the df value smooth.spline calculated for itself was 2.472213).
Is there a better way to calculate curvature? Is smooth.spline parameterized by arc length or will incorporating it (somehow) rescue this calculation?
Okay, a few things before we begin. You're using degrees in your seq, which will give you incorrect results (0 to 360 degrees). You can check that this is wrong by taking cos(360) in R, which isn't 1. This is explained in the documentation for the trig functions under Details.
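A quick check of both points in base R: seq() applied to the vector 0:360 returns its indices rather than the angles, and the trigonometric functions interpret their argument as radians.

```r
idx <- seq(0:360)
range(idx)            # 1 to 361, not 0 to 360
length(idx)           # 361

cos(360)              # cosine of 360 radians, which is not 1
cos(360 * pi / 180)   # cosine of 360 degrees, converted to radians
```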
So let's change your function to this
xymat <- map_df(data.frame(degrees=seq(0,2*pi,length=360)),
function(theta) data.frame(x = sin(theta), y = cos(theta)))
If you plot this, this indeed looks like a circle.
Let's actually restrict this to the lower half of the circle. If you put a spline through this without understanding the symmetry and looking at the plot, chances are that you'll get a horizontal line through the circle.
Why? because the spline doesn't know that it's symmetric above and below y = 0. The spline is trying to fit a function that explains the "data", not trace an arc. It splits the difference between two symmetric sets of points around y = 0.
If we restrict the spline to the lower half of the circle, we can use y values between 1 and -1, like this:
lower.semicircle <- data.frame(predict(smooth.spline(xymat[91:270,], all.knots = T)))
And let's fit a spline through it.
lower.semicircle.pred<-data.frame(predict(smooth.spline(lower.semicircle, all.knots = T)))
Note that I'm not using the deriv function here. That is for a different problem in the cars example to which you linked. You want total absolute curvature and they are looking at rate of change of curvature.
What we have now is an approximation to a lower semicircle using splines. Now you want the distances between consecutive points, as in the integral from the Wikipedia page.
Let's calculate all of the little arc distances using a distance matrix. This literally calculates the Euclidean distance between every pair of points.
all.pairwise.distances.in.the.spline.approx<-dist(lower.semicircle.pred, diag=F)
dist.matrix<-as.matrix(all.pairwise.distances.in.the.spline.approx)
seq.of.distances.you.want<-dist.matrix[row(dist.matrix) == col(dist.matrix) + 1]
This last object is what you need to sum across.
sum(seq.of.distances.you.want)
This evaluates to 3.079 for the lower semicircle, around half of your expected value of 2*pi.
It's not perfect, but splines have problems with edge effects.
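As a cross-check that does not rely on splines, here is a different technique from the one in this answer: treat the coordinates as a closed polygon and sum the absolute turning angles at its vertices. This is only a sketch. The function name total_abs_curvature is hypothetical, and it assumes the points are ordered along the outline with no repeated points (including no duplicate of the first point at the end).

```r
# Hypothetical helper: discrete total absolute curvature of a closed polygon.
total_abs_curvature <- function(x, y) {
  # Edge vectors; the last point is connected back to the first
  dx <- c(x[-1], x[1]) - x
  dy <- c(y[-1], y[1]) - y
  heading <- atan2(dy, dx)                        # direction of each edge
  turn <- c(heading[-1], heading[1]) - heading    # change of direction per vertex
  turn <- atan2(sin(turn), cos(turn))             # wrap into (-pi, pi]
  sum(abs(turn))
}

# Unit circle sampled at 360 distinct points (endpoint dropped to avoid a duplicate)
theta <- seq(0, 2 * pi, length.out = 361)[-361]
circle_curvature <- total_abs_curvature(sin(theta), cos(theta))
circle_curvature   # compare with 2 * pi
```

For noisy coordinates every small wiggle adds to the sum, so some smoothing beforehand would probably still be needed.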

How to compute something like the angle between two non-unit vectors

I need to sort a set of vectors in circular order. The simplest approach would be to use the angle between the vectors and a fixed axis. To get the angle, one would have to normalize the vectors which includes performing an expensive square root calculation.
As I want to avoid the costs and I don't need the particular angle - just some value that gives me the same order - I was wondering if there is a way to calculate a value for each vector that does not require the vector to be normalized and yields a similar value like the angle (i.e. if angle(x) > angle(y) then f(x) > f(y)).
The ratio of the y component to the x component should be enough to order the vectors without normalizing them. If the y:x ratio is larger, then the angle will be steeper. That'll work at least for the 1st quadrant (0 to 90 degrees), but the general idea should be enough to get you started.
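One common way to extend the ratio idea to all four quadrants, which is not part of the answer above, is a "pseudo-angle" built from y / (|x| + |y|) and a per-quadrant offset. It needs no square root or trigonometric call and increases monotonically with the true angle. The sketch below is written in R for consistency with the rest of this page. The function name pseudo_angle is hypothetical, and the zero vector is assumed not to occur.

```r
# Hypothetical helper: maps a vector (x, y) to a value in [0, 4) that
# increases with its counter-clockwise angle from the positive x axis.
pseudo_angle <- function(x, y) {
  s <- abs(x) + abs(y)                    # assumed non-zero
  ifelse(y >= 0,
         ifelse(x >= 0, y / s,            # quadrant 1: 0..1
                        1 - x / s),       # quadrant 2: 1..2
         ifelse(x < 0,  2 - y / s,        # quadrant 3: 2..3
                        3 + x / s))       # quadrant 4: 3..4
}

set.seed(42)
vx <- rnorm(200)
vy <- rnorm(200)

order_pseudo <- order(pseudo_angle(vx, vy))
order_true   <- order(atan2(vy, vx) %% (2 * pi))   # reference ordering
identical(order_pseudo, order_true)
```

The vectors do not need to be normalized, because scaling (x, y) by a positive factor leaves every ratio unchanged.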

Calculating the volume under a surface

I have created a 3D plot (a surface) using the wireframe function. I wonder if there are any functions with which I can calculate the volume under the surface in a 3D plot?
Here is a sample of my data plus the wireframe syntax I used to create my 3D (surface) plot:
x1<-c(13,27,41,55,69,83,97,111,125,139)
x2<-c(27,55,83,111,139,166,194,222,250,278)
x3<-c(41,83,125,166,208,250,292,333,375,417)
x4<-c(55,111,166,222,278,333,389,445,500,556)
x5<-c(69,139,208,278,347,417,487,556,626,695)
x6<-c(83,166,250,333,417,500,584,667,751,834)
x7<-c(97,194,292,389,487,584,681,779,876,974)
x8<-c(111,222,333,445,556,667,779,890,1001,1113)
x9<-c(125,250,375,500,626,751,876,1001,1127,1252)
x10<-c(139,278,417,556,695,834,974,1113,1252,1391)
df<-data.frame(x1,x2,x3,x4,x5,x6,x7,x8,x9,x10)
df.matrix<-as.matrix(df)
library(lattice) #wireframe() is part of the lattice package
wireframe(df.matrix,
aspect = c(61/87, 0.4),scales=list(arrows=FALSE,cex=.5,tick.number=10,z=list(arrows=TRUE)),ylim=c(1:10),xlab=expression(phi1),ylab="Percentile",zlab=" Loss",main="Random Classifier",
light.source = c(10,10,10),drape=TRUE,col.regions = rainbow(100, s = 1, v = 1, start = 0, end = max(1,100 - 1)/100, alpha = 1),screen=list(z=-60,x=-60))
Note: my real data is a 100X100 matrix
Thanks
The data you are feeding to wireframe is a grid of values. Hence one estimate of the volume of whatever underlying surface this is approximating is the sum of the grid values multiplied by the grid cell areas. This is just like adding up the heights of histogram bars to get the number of values in your histogram.
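A minimal sketch of that estimate, assuming a regular grid whose cell width dx and cell height dy are known. The function name block_volume and the small example matrix are made up for illustration.

```r
# Hypothetical helper: treat every grid value as the height of a block
# with base area dx * dy and add the block volumes up.
block_volume <- function(z, dx = 1, dy = 1) {
  sum(z) * dx * dy
}

z_small <- matrix(1:4, nrow = 2)          # toy 2x2 grid of heights
v_block <- block_volume(z_small, dx = 2, dy = 3)
v_block                                   # (1 + 2 + 3 + 4) * 2 * 3
```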
The problem I see with you doing this on your data is that the cell areas are going to be in odd units - percentiles on one axis, phi on the other has unknown units, so your volume is going to have units of loss times units of percentile times units of phi.
This isn't a problem if you want to compare volumes of similar things on exactly the same grid, but if you have surfaces on different grids (different values of phi, or different percentiles) then you need to be careful.
Now, noting that wireframe doesn't draw like a 3D histogram would (looking like square tower blocks), this gives us another way to estimate the volume. Your 10x10 matrix is plotted as 9x9 squares. Divide each of those squares into two triangles and then compute the volume of the 162 truncated right triangular prisms (I think this is what they are: triangular prisms with a right-angled base and one sloping end). The formula for that should be out there somewhere. Probably base area times height to the centroid of the triangle or something.
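The guess about the formula holds for a flat sloping top: the height at the centroid of a triangle is the mean of its three corner heights, so each prism has volume base area times that mean. Below is a sketch under the assumption of a regular grid with spacings dx and dy, splitting every square along the same diagonal. The function name prism_volume is hypothetical.

```r
# Hypothetical helper: volume under a gridded surface using two
# triangular prisms per grid square.
prism_volume <- function(z, dx = 1, dy = 1) {
  nr <- nrow(z)
  nc <- ncol(z)
  # Corner heights of every grid square, as (nr-1) x (nc-1) matrices
  a <- z[-nr, -nc, drop = FALSE]   # corner (i,   j)
  b <- z[-1,  -nc, drop = FALSE]   # corner (i+1, j)
  d <- z[-nr, -1,  drop = FALSE]   # corner (i,   j+1)
  e <- z[-1,  -1,  drop = FALSE]   # corner (i+1, j+1)
  tri_area <- dx * dy / 2
  # Triangles (a, b, d) and (b, d, e): area times mean corner height
  sum(tri_area * ((a + b + d) / 3 + (b + d + e) / 3))
}

# Check on a plane z = x + y over a 10x10 grid, where the estimate is exact
z_plane <- outer(1:10, 1:10, "+")
v_plane <- prism_volume(z_plane)
v_plane   # integral of (x + y) over [1, 10] x [1, 10] is 891
```

Choosing the other diagonal gives a slightly different estimate on curved surfaces; the two agree exactly only where the surface is planar.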
I thought maybe this would be in the raster package, but it isn't. There's code for computing the surface area but not the volume! I'm sure the raster maintainer would be happy to have some code for this!
If the points are arbitrary (i.e., they don't follow a smooth function), it seems like you're looking for the volume of the convex hull (minimum surface) surrounding these points. One package to help you calculate this is alphashape3d.
You'll need a 3-column matrix of the coordinates to form the right type of object to make the calculation, but it seems rather straightforward.
