Calculate total absolute curvature from coordinates in R

Given a set of coordinates corresponding to a closed shape, I want to calculate the total absolute curvature, which requires calculating the curvature for each point, taking the absolute value, and summing them. Simple enough.
I used the answer to this question to calculate the curvature from a matrix of x y coordinates (xymat) and get what I thought would be the total absolute curvature:
sum(abs(predict(smooth.spline(xymat), deriv = 2)$y))
The problem is that total absolute curvature has a minimum value of 2*pi and is exactly that for circles, but this code is evaluating to values less than 2*pi:
library(purrr)
xymat <- map_df(data.frame(degrees=seq(0:360)),
function(theta) data.frame(x = sin(theta), y = cos(theta)))
sum(abs(predict(smooth.spline(xymat), deriv = 2)$y))
This returns 1.311098 instead of the expected value of 6.283185.
If I change the df parameter of smooth.spline to 3 as in the previous answer, the returned value is 3.944053, still shy of 2*pi (the df value smooth.spline calculated for itself was 2.472213).
Is there a better way to calculate curvature? Is smooth.spline parameterized by arc length or will incorporating it (somehow) rescue this calculation?
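For reference, here is what a sum like the one above would need to approximate if the curve is treated as a graph y = f(x), which is what smooth.spline fits. The second derivative on its own is not the curvature. It has to be divided by (1 + f'^2)^(3/2), and each term has to be weighted by the arc-length element ds. A closed curve cannot be written as a single graph y = f(x), so this only applies piecewise. The lower half of the unit circle is worked as a check and should contribute pi.

```latex
\kappa(x) = \frac{|f''(x)|}{\left(1 + f'(x)^2\right)^{3/2}}, \qquad
ds = \sqrt{1 + f'(x)^2}\,dx
\quad\Longrightarrow\quad
\int |\kappa|\,ds = \int \frac{|f''(x)|}{1 + f'(x)^2}\,dx
\approx \sum_i \frac{|f''(x_i)|}{1 + f'(x_i)^2}\,\Delta x_i

% Check on the lower unit semicircle f(x) = -\sqrt{1 - x^2}:
% f'(x) = x (1 - x^2)^{-1/2}, \quad 1 + f'(x)^2 = (1 - x^2)^{-1}, \quad f''(x) = (1 - x^2)^{-3/2}
\int_{-1}^{1} \frac{(1 - x^2)^{-3/2}}{(1 - x^2)^{-1}}\,dx
= \int_{-1}^{1} \frac{dx}{\sqrt{1 - x^2}} = \pi
```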

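Okay, a few things before we begin. R's trig functions work in radians, not degrees (this is explained under Details in the documentation for the trig functions), but your code treats the angles as degrees. On top of that, seq(0:360) does not give 0 to 360. With a single vector argument, seq returns seq_along of it, i.e. the integers 1 to 361. You can check that something is wrong by evaluating cos(360) in R, which isn't 1.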
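The same radians-versus-degrees pitfall exists in most languages. As a quick illustration, Python's math.cos also expects radians:

```python
import math

raw = math.cos(360)                     # 360 is read as radians: about -0.284, not 1
converted = math.cos(math.radians(360)) # convert degrees to radians first: 1.0
print(raw, converted)
```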
So let's change your code to use radians:
xymat <- map_df(data.frame(radians = seq(0, 2*pi, length.out = 360)),
  function(theta) data.frame(x = sin(theta), y = cos(theta)))
If you plot this, it does indeed look like a circle.
Let's actually restrict this to the lower half of the circle. If you put a spline through the whole thing without considering the symmetry (or looking at the plot), chances are you'll get a horizontal line through the circle.
Why? Because the spline doesn't know that the points are symmetric above and below y = 0. The spline is trying to fit a function that explains the "data", not to trace an arc, so it splits the difference between the two symmetric sets of points around y = 0.
If we restrict the spline to the lower half of the circle, each x value between 1 and -1 has exactly one y value, like this:
lower.semicircle <- data.frame(predict(smooth.spline(xymat[91:270, ], all.knots = TRUE)))
And let's fit a spline through it.
lower.semicircle.pred <- data.frame(predict(smooth.spline(lower.semicircle, all.knots = TRUE)))
Note that I'm not using the deriv argument of predict here. That is for a different problem, in the cars example you linked to: you want total absolute curvature, whereas they are looking at the second derivative of the fitted curve.
What we have now is a spline approximation to a lower semicircle. Now you want the sum of the distances between consecutive points, as in the integral on the Wikipedia page.
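The sum of sequential distances is just the length of the polyline through the points. It can therefore be computed directly from consecutive pairs, without building a full n × n distance matrix. Below is a minimal Python sketch. The function name is hypothetical, and it uses an exact lower semicircle rather than the spline fit, sampled at 180 angles from pi/2 to 3*pi/2 with x = sin(theta), y = cos(theta):

```python
import math

def polyline_length(points):
    """Sum of the Euclidean distances between consecutive points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

# Lower unit semicircle: theta from pi/2 to 3*pi/2
thetas = [math.pi / 2 + math.pi * k / 179 for k in range(180)]
lower_semicircle = [(math.sin(t), math.cos(t)) for t in thetas]

# Slightly below pi, because each chord is shorter than its arc
length = polyline_length(lower_semicircle)
print(length)
```

Because chords always cut corners, the polyline length approaches the true arc length from below as the sampling gets finer.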
Let's calculate all of the little arc distances using a distance matrix. This literally calculates the Euclidean distance from each point to every other point. The distances between consecutive points are the entries just below the diagonal.
all.pairwise.distances.in.the.spline.approx <- dist(lower.semicircle.pred, diag = FALSE)
dist.matrix <- as.matrix(all.pairwise.distances.in.the.spline.approx)
seq.of.distances.you.want <- dist.matrix[row(dist.matrix) == col(dist.matrix) + 1]
This last object is what you need to sum across.
sum(seq.of.distances.you.want)
...which evaluates to [1] 3.079 for the lower semicircle, roughly half of your expected value of 2*pi.
It's not perfect, but splines have problems with edge effects.
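The sum above is an arc length. It equals 2*pi for the full circle only because a unit circle has curvature 1 everywhere. As a cross-check on the question's definition, here is a different technique from the spline approach, sketched in Python (the function name is hypothetical). For a closed polygon, the discrete analogue of total absolute curvature is the sum of the absolute turning angles at the vertices. It equals 2*pi for any convex polygon and exceeds 2*pi for non-convex ones:

```python
import math

def total_absolute_curvature(points):
    """Discrete total absolute curvature of a closed polygon:
    the sum of |turning angle| over all vertices (points in order)."""
    # Drop consecutive duplicates, including a closing point equal to the first,
    # since a zero-length edge has no direction
    pts = [p for i, p in enumerate(points) if p != points[i - 1]]
    n = len(pts)
    total = 0.0
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = pts[i - 1], pts[i], pts[(i + 1) % n]
        ax, ay = x1 - x0, y1 - y0  # incoming edge
        bx, by = x2 - x1, y2 - y1  # outgoing edge
        # Signed angle from the incoming to the outgoing edge, in (-pi, pi]
        total += abs(math.atan2(ax * by - ay * bx, ax * bx + ay * by))
    return total

circle = [(math.sin(2 * math.pi * k / 360), math.cos(2 * math.pi * k / 360))
          for k in range(360)]
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]  # non-convex

print(total_absolute_curvature(circle))   # 2*pi
print(total_absolute_curvature(square))   # 2*pi
print(total_absolute_curvature(l_shape))  # 3*pi: five left turns and one right turn
```

Unlike summing |y''| from a spline, this does not depend on the curve being a graph y = f(x), so it works directly on the closed shape.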

Related

How to simulate distances from a fixed point to random points within a given radius in R?

I am trying to simulate N distances between a fixed point and other points randomly distributed around it within a given radius.
One way I've thought of is to simulate coordinates for the random points, then calculate the distances, then exclude distances greater than the given radius (say r = 250m):
X <- runif(N, -250, 250) # simulate random X coordinate
Y <- runif(N, -250, 250) # simulate random Y coordinate
distance <- sqrt(X^2 + Y^2) # calculate distance from random points to center
distance <- distance[distance < 250] # only include values within given radius
However, I am wondering if there is a way to simulate these distances without simulating the coordinates themselves. My end goal is to be able to do this in JAGS, so solutions that work in JAGS are preferred. Is there a probability distribution that could be used to describe the probability of these distances to random points? An ideal solution would look something like this:
distance ~ pDistribution(N, 250)
or alternatively in JAGS:
for (i in 1:N) {
distance[i] ~ pDistribution(250)
}
@jlhoward had a good idea with thinking in polar coordinates; that's what got me going in the right direction. However, by using r = runif(250), you would end up with points clustered around the center. To get points uniformly distributed over the whole circle, there must be more points at greater distances from the center (because circumference and area increase with radius). It turns out you can do this with r <- 250 * sqrt(runif(N, 0, 1)). For my problem, all I needed was to generate these distances (i.e., radii), not the actual points, so this code is an adequate solution. This great video on finding random points in a circle is what helped me finally figure it out.
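The reasoning can be made exact. For points uniform over a disc of radius R, P(D <= d) = (pi d^2) / (pi R^2) = d^2 / R^2. Inverting that CDF gives D = R * sqrt(U) with U uniform on [0, 1], which is the formula above. Equivalently, D / R follows a Beta(2, 1) distribution (density 2u on [0, 1]). That should translate to JAGS as u[i] ~ dbeta(2, 1) with distance[i] <- 250 * u[i]. The Python sketch below (function names hypothetical) compares the question's rejection approach with both forms. Note that it keeps sampling until it has n accepted distances, rather than filtering a fixed N. All three samplers should have a mean close to 2R/3:

```python
import math
import random

R = 250.0  # radius in metres, as in the question

def distances_by_rejection(n, r=R, rng=random):
    """Question's approach: uniform points in the bounding square, keep those inside the circle."""
    out = []
    while len(out) < n:
        d = math.hypot(rng.uniform(-r, r), rng.uniform(-r, r))
        if d < r:
            out.append(d)
    return out

def distances_by_inversion(n, r=R, rng=random):
    """Inverse-CDF sampling: P(D <= d) = d^2 / r^2, so D = r * sqrt(U)."""
    return [r * math.sqrt(rng.random()) for _ in range(n)]

def distances_by_beta(n, r=R, rng=random):
    """Same distribution: D / r ~ Beta(2, 1)."""
    return [r * rng.betavariate(2, 1) for _ in range(n)]

rng = random.Random(0)
samples = {
    "rejection": distances_by_rejection(50_000, rng=rng),
    "inversion": distances_by_inversion(50_000, rng=rng),
    "beta": distances_by_beta(50_000, rng=rng),
}
for name, ds in samples.items():
    print(name, sum(ds) / len(ds))  # each close to 2 * 250 / 3 = 166.7
```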

Understanding mean curvature of a 3D surface

I am currently trying to understand the calculation of the mean curvature for a 3D surface, where one coordinate is a function of the other two coordinates.
Looking at Wikipedia (https://en.wikipedia.org/wiki/Mean_curvature#Surfaces_in_3D_space), under "For the special case of a surface defined as a function of two coordinates, e.g. z = S(x,y)", they give this formula:
[Image: Wikipedia's formula for the mean curvature of a surface z = S(x,y)]
What I don't understand here is the div(z - S). If z = S(x,y), then I would think that z is the same as S, and thus z - S equals 0.
I tried to follow the cited literature, but I didn't find what I was looking for.
Apparently I misunderstand something here, and z is not the same as S?
Any help would be appreciated.
z - S(x,y) is a function of 3 variables, whose gradient is (-S_x, -S_y, 1); see the second line. Then you normalize this gradient vector and compute the divergence of the normalized vector field.
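To see concretely that z - S is not identically zero as a function of three variables, here is a small numerical sketch using finite differences (all names hypothetical). It uses the upper hemisphere S(x, y) = sqrt(R^2 - x^2 - y^2). The gradient of F(x, y, z) = z - S(x, y) comes out as (-S_x, -S_y, 1). The divergence of the normalized gradient should be 2/R, the value for the outward unit normal of a sphere. Whether the mean curvature is +1/2 or -1/2 of this divergence depends on the sign convention and the choice of normal:

```python
import math

R = 2.0

def S(x, y):
    """Height of the upper hemisphere of radius R: z = S(x, y)."""
    return math.sqrt(R * R - x * x - y * y)

def F(x, y, z):
    """z - S(x, y): a function of three variables, zero only on the surface."""
    return z - S(x, y)

def gradient(f, p, h=1e-5):
    """Central-difference gradient of a scalar function of (x, y, z)."""
    g = []
    for i in range(3):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        g.append((f(*plus) - f(*minus)) / (2 * h))
    return g

def unit_normal(p):
    """Normalized gradient of F at point p."""
    g = gradient(F, p)
    norm = math.sqrt(sum(c * c for c in g))
    return [c / norm for c in g]

def divergence(field, p, h=1e-4):
    """Central-difference divergence of a vector field given as a function of a point."""
    total = 0.0
    for i in range(3):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        total += (field(plus)[i] - field(minus)[i]) / (2 * h)
    return total

x, y = 0.3, 0.2
p = [x, y, S(x, y)]
grad_F = gradient(F, p)            # approximately (-S_x, -S_y, 1)
div_n = divergence(unit_normal, p) # approximately 2 / R = 1.0
print(grad_F, div_n)
```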

Position(t) on cubic bezier curve

The only equation to calculate this that I can find involves t in the range [0, 1], but I have no idea how long it will take to travel the entire path, so I can't calculate (1 - t).
I know the speed at which I'm traveling, but calculating the total time beforehand seems expensive (nor do I actually know how to do that calculation). What is an equation to figure out the position without knowing the total time?
Edit: To clarify the cubic Bezier curve: I have four control points (P0 to P3), and to get a point on the curve for a given t, I need to use the four points as follows:
B(t) = (1-t)^3 P0 + 3t(1-t)^2 P1 + 3t^2(1-t) P2 + t^3 P3
I am not using a parametric equation to define the curve; the control points are what define the curve. What I need is an equation that does not require knowing the range of t.
I think there is a misunderstanding here. The 't' in the cubic Bezier curve's definition does not refer to 'time'. It is the parameter on which the x, y (or even z) functions are based. Unlike the traditional way of representing y as a function of x, such as y=f(x), an alternative way of representing a curve is the parametric form, which represents x, y and z as functions of an additional parameter t: C(t)=(x(t), y(t), z(t)). Typically t ranges from 0 to 1, but this is not a requirement. The common representation of a circle as x=cos(t) and y=sin(t) is an example of a parametric representation. So, if you have the parametric representation of a curve, you can evaluate the position on the curve for any given t value. It has nothing to do with the time it takes to travel the entire path.
You have the given curve and you have your speed. To get what you're asking for, divide your speed by the total length of the curve: that tells you how much the parameter t advances per unit of time. So if the curve has a total length of 72.2 units and you travel 1 unit per time step, t advances by 1/72.2 per step. (Strictly, this assumes t is proportional to arc length, which is only approximately true for a Bezier curve.)
Your only missing piece is calculating the length of a given curve. This is typically done by subdividing it into line segments small enough that the error doesn't matter, and then adding up the lengths of those segments. You could also combine the two steps if you were so inclined. Given your speed and elapsed time, you know the distance you need to travel. Step along the curve in small increments of t (say 1/1000), subtract the length of each line segment from the distance remaining, and stop once you've gone as far as you need to go.
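A minimal Python sketch of that combined approach, assuming constant speed so that the distance to travel is speed × elapsed time. The names bezier_point and point_at_distance are hypothetical, and steps controls how finely the curve is subdivided:

```python
import math

def bezier_point(p0, p1, p2, p3, t):
    """Cubic Bezier B(t) = (1-t)^3 P0 + 3t(1-t)^2 P1 + 3t^2(1-t) P2 + t^3 P3."""
    u = 1 - t
    return tuple(u**3 * a + 3 * t * u**2 * b + 3 * t**2 * u * c + t**3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))

def point_at_distance(ctrl, distance, steps=1000):
    """Walk along the curve in small steps of t, accumulating segment lengths
    until `distance` is covered; interpolate within the final segment."""
    prev = bezier_point(*ctrl, 0.0)
    travelled = 0.0
    for i in range(1, steps + 1):
        cur = bezier_point(*ctrl, i / steps)
        seg = math.dist(prev, cur)
        if travelled + seg >= distance:
            f = (distance - travelled) / seg if seg > 0 else 0.0
            return tuple(a + f * (b - a) for a, b in zip(prev, cur))
        travelled += seg
        prev = cur
    return prev  # distance exceeds the curve length: clamp to the end point P3

# Position after 2.5 time units at a speed of 0.6 units per time unit
ctrl = ((0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0))
pos = point_at_distance(ctrl, 0.6 * 2.5)
print(pos)
```

The control points in the usage example are collinear and evenly spaced, so the curve is the straight segment from (0, 0) to (3, 0). That makes the expected answer (1.5, 0) easy to check by hand.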
The range for t is between 0 and 1.
x = (1-t)*(1-t)*(1-t)*p0x + 3*(1-t)*(1-t)*t*p1x + 3*(1-t)*t*t*p2x + t*t*t*p3x;
y = (1-t)*(1-t)*(1-t)*p0y + 3*(1-t)*(1-t)*t*p1y + 3*(1-t)*t*t*p2y + t*t*t*p3y;

How to find the smallest ellipse covering a given fraction of a set of points in R?

I'm wondering: is there some function or clever way to find the smallest ellipse covering a given fraction of a set of 2d points in R? By smallest I mean the ellipse with the smallest area.
Clarification: I'm fine with an approximately correct solution if the number of points is large (as I guess an exact solution would have to try all combinations of subsets of points).
This question might sound like a duplicate of the question Ellipse containing percentage of given points in R, but as that question is phrased, its answer does not give the smallest ellipse. For example, using the solution given to Ellipse containing percentage of given points in R:
require(car)
x <- runif(6)
y <- runif(6)
dataEllipse(x,y, levels=0.5)
The resulting ellipse is clearly not the smallest ellipse containing half of the points. That, I guess, would be a small ellipse covering the three points up in the top-left corner.
I think I have a solution which requires two functions: cov.rob from the MASS package and ellipsoidhull from the cluster package. cov.rob(xy, quantile.used = 50, method = "mve") finds, approximately, the "best" 50 points of xy: those contained in the minimum volume ellipse. However, cov.rob does not directly return this ellipse but rather some other ellipse estimated from the best points (the goal being to robustly estimate the covariance matrix). To find the actual minimum ellipse, we can give the best points to ellipsoidhull, which finds the minimum ellipse, and we can use predict.ellipsoid to get the coordinates of the path defining the hull of the ellipse.
I'm not 100% certain this method is the easiest or that it works 100% of the time (it feels like it should be possible to avoid the second step of using ellipsoidhull, but I haven't figured out how). It seems to work on my toy example, at least...
Enough talking, here is the code:
library(MASS)
library(cluster)
# Using the same six points as in the question
xy <- cbind(x, y)
# Finding the 3 points in the smallest ellipse (not finding
# the actual ellipse though...)
fit <- cov.rob(xy, quantile.used = 3, method = "mve")
# Finding the minimum volume ellipse that contains these three points
best_ellipse <- ellipsoidhull( xy[fit$best,] )
plot(xy)
# The predict() function returns a 2d matrix defining the coordinates of
# the hull of the ellipse
lines(predict(best_ellipse), col="blue")
Looks pretty good! You can also inspect the ellipse object for more info
best_ellipse
## 'ellipsoid' in 2 dimensions:
## center = ( 0.36 0.65 ); squared ave.radius d^2 = 2
## and shape matrix =
## x y
## x 0.00042 0.0065
## y 0.00654 0.1229
## hence, area = 0.018
Here is a handy function which adds an ellipse to an existing base graphics plot:
plot_min_ellipse <- function(xy, points_in_ellipse, color = "blue") {
  # Requires library(MASS) for cov.rob and library(cluster) for ellipsoidhull
  fit <- cov.rob(xy, quantile.used = points_in_ellipse, method = "mve")
  best_ellipse <- ellipsoidhull(xy[fit$best, ])
  lines(predict(best_ellipse), col = color)
}
Let's use it on a larger number of points:
x <- runif(100)
y <- runif(100)
xy <- cbind(x, y)
plot(xy)
plot_min_ellipse(xy, points_in_ellipse = 50)
This sounds very much like a 2D confidence region. Try http://stat.ethz.ch/R-manual/R-devel/library/cluster/html/ellipsoidhull.html. You will probably need to run it on each combination of N points, then choose the smallest result.
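A minimal Python sketch of that brute-force idea, which is a different technique from the cov.rob/ellipsoidhull answer. It computes the minimum-area enclosing ellipse of every k-point subset with Khachiyan's algorithm and keeps the smallest. The function names are hypothetical. Because the number of subsets grows combinatorially, this is only practical for small point sets, which is where the question says an exact answer matters:

```python
import itertools
import math

def inv3(m):
    """Inverse of a 3x3 matrix given as nested lists (adjugate / determinant)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("singular matrix: points are (nearly) collinear")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]

def min_ellipse_area(points, tol=1e-7, max_iter=20000):
    """Approximate area of the minimum-area ellipse enclosing 2D points
    (Khachiyan's algorithm). Raises ValueError for collinear points."""
    n = len(points)
    q = [(x, y, 1.0) for x, y in points]  # lifted points (x, y, 1)
    u = [1.0 / n] * n                     # weights, start uniform
    for _ in range(max_iter):
        # X = sum_k u_k q_k q_k^T (3x3)
        X = [[sum(u[k] * q[k][r] * q[k][s] for k in range(n)) for s in range(3)]
             for r in range(3)]
        Xi = inv3(X)
        # M_k = q_k^T X^-1 q_k
        M = [sum(q[k][r] * Xi[r][s] * q[k][s] for r in range(3) for s in range(3))
             for k in range(n)]
        j = max(range(n), key=lambda k: M[k])
        step = (M[j] - 3) / (3 * (M[j] - 1))  # (M_j - d - 1) / ((d + 1)(M_j - 1)), d = 2
        new_u = [(1 - step) * w for w in u]
        new_u[j] += step
        change = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_u, u)))
        u = new_u
        if change < tol:
            break
    # Weighted centre c and covariance S; the ellipse is (p - c)^T (S^-1 / 2) (p - c) <= 1
    cx = sum(w * x for w, (x, _) in zip(u, points))
    cy = sum(w * y for w, (_, y) in zip(u, points))
    sxx = sum(w * x * x for w, (x, _) in zip(u, points)) - cx * cx
    syy = sum(w * y * y for w, (_, y) in zip(u, points)) - cy * cy
    sxy = sum(w * x * y for w, (x, y) in zip(u, points)) - cx * cy
    return 2 * math.pi * math.sqrt(sxx * syy - sxy * sxy)

def smallest_ellipse_subset(points, k):
    """Try every k-point subset; return (area, subset) with the smallest enclosing ellipse."""
    best = None
    for subset in itertools.combinations(points, k):
        try:
            area = min_ellipse_area(list(subset))
        except ValueError:  # collinear subset: degenerate ellipse, skip it
            continue
        if best is None or area < best[0]:
            best = (area, subset)
    return best

pts = [(-1, -1), (1, -1), (1, 1), (-1, 1), (10, 10)]
area, subset = smallest_ellipse_subset(pts, 4)
print(area, subset)  # the square's circumscribed circle, area 2*pi
```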

How do I calculate the "difference" between two sequences of points?

I have two sequences of length n and m. Each is a sequence of points of the form (x,y), and each represents a curve in an image. I need to find how different (or similar) these sequences are, given the fact that:
one sequence is likely longer than the other (i.e., one can be half or a quarter as long as the other, but if they trace approximately the same curve, they are the same)
these sequences could be in opposite directions (i.e., sequence 1 goes from left to right, while sequence 2 goes from right to left)
I looked into some difference estimates like Levenshtein as well as edit-distances in structural similarity matching for protein folding, but none of them seem to do the trick. I could write my own brute-force method but I want to know if there is a better way.
Thanks.
Do you mean that you are trying to match curves that have been translated in x,y coordinates? One technique from image processing is to use chain codes [I'm looking for a decent reference, but all I can find right now is this] to encode each sequence and then compare those chain codes. You could take the sum of the differences (modulo 8) and if the result is 0, the curves are identical. Since the sequences are of different lengths and don't necessarily start at the same relative location, you would have to shift one sequence and do this again and again, but you only have to create the chain codes once. The only way to detect if one of the sequences is reversed is to try both the forward and reverse of one of the sequences. If the curves aren't exactly alike, the sum will be greater than zero but it is not straightforward to tell how different the curves are simply from the sum.
This method will not be rotationally invariant. If you need a method that is rotationally invariant, you should look at Boundary-Centered Polar Encoding. I can't find a free reference for that, but if you need me to describe it, let me know.
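A minimal sketch of the chain-code comparison in Python. It assumes the curves are given as 8-connected pixel coordinates (consecutive points differ by at most one pixel in x and y) with the y axis pointing up. The direction numbering and the function names are illustrative choices, not a standard API. Reversing a chain code means reversing its order and adding 4 (mod 8) to every direction, which is how the reversed case is handled here:

```python
# Freeman 8-direction codes: 0=E, 1=NE, 2=N, 3=NW, 4=W, 5=SW, 6=S, 7=SE (y axis up)
DIRECTIONS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
              (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def chain_code(pixels):
    """Direction code for each step between consecutive 8-connected pixels."""
    return [DIRECTIONS[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(pixels, pixels[1:])]

def reverse_code(code):
    """Chain code of the same curve traversed backwards."""
    return [(c + 4) % 8 for c in reversed(code)]

def mismatch(short, long_):
    """Smallest sum of per-step differences (mod 8) over all shifts of `short` along `long_`."""
    return min(sum((a - b) % 8 for a, b in zip(short, long_[offset:]))
               for offset in range(len(long_) - len(short) + 1))

def compare(path_a, path_b):
    """0 means one curve matches part of the other (up to translation), in either direction."""
    a, b = chain_code(path_a), chain_code(path_b)
    if len(a) > len(b):
        a, b = b, a
    return min(mismatch(a, b), mismatch(reverse_code(a), b))

path = [(0, 0), (1, 0), (2, 1), (2, 2), (1, 3)]
moved = [(x + 5, y + 7) for x, y in path]
print(compare(path, moved), compare(path, path[::-1]))  # 0 0
```

As the answer notes, a nonzero score only says the curves differ. It is not a calibrated measure of how different they are.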
A method along these lines might work:
For both sequences:
Fit a curve through the sequence. Make sure that you have a continuous one-to-one function from [0,1] to points on this curve. That is, for each (real) number between 0 and 1, this function returns a point on the curve belonging to it. By tracing the function for all numbers from 0 to 1, you get the entire curve.
One way to fit a curve would be to draw a straight line between each pair of consecutive points (it is not a nice curve, because it has sharp bends, but it might be fine for your purpose). In that case, the function can be obtained by calculating the total length of all the line segments (Pythagoras). The point corresponding to a number Y (between 0 and 1) is the point on the curve at distance Y * (total length of all line segments) from the first point of the sequence, measured by traveling along the line segments (!!).
Now, after we have obtained such a function F(double) for the first sequence, and G(double) for the second sequence, we can calculate the similarity as follows:
double epsilon = 0.01;
double curveDistanceSquared = 0.0;
for (double d = 0.0; d < 1.0; d = d + epsilon)
{
    Point pointOnCurve1 = F(d);
    Point pointOnCurve2 = G(d);
    // alternatively, use G(1.0 - d) to check whether the second sequence is reversed
    double distanceOfPoints = pointOnCurve1.EuclideanDistance(pointOnCurve2);
    curveDistanceSquared = curveDistanceSquared + distanceOfPoints * distanceOfPoints;
}
// identical curves give curveDistanceSquared == 0, so similarity becomes infinite
double similarity = 1.0 / curveDistanceSquared;
Possible improvements:
- Find an improved way to fit the curves. Note that you still need the function that traces the curve for the above method to work.
- When calculating the distance, consider reparametrizing the function G so that the distance is minimized. That is, take an increasing function R with R(0) = 0 and R(1) = 1,
but which is otherwise general, and when calculating the distance use
Point pointOnCurve1 = F(d);
Point pointOnCurve2 = G(R(d));
Then choose R so that the distance is minimized. (To see why this is allowed, note that G(R(d)) also traces the curve.)
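A minimal Python rendering of the pseudocode above, with the reversed check folded in. F and G are built by linear interpolation along the line segments, as the answer describes. The function names are hypothetical, each path needs at least two points, and the loop samples d from 0 to 1 inclusive:

```python
import math

def arc_length_param(path):
    """Return F(s), s in [0, 1]: the point at fraction s of the polyline's total length."""
    cumulative = [0.0]
    for a, b in zip(path, path[1:]):
        cumulative.append(cumulative[-1] + math.dist(a, b))
    total = cumulative[-1]

    def F(s):
        target = s * total
        for i in range(len(path) - 1):
            # Use the first segment that reaches the target (or the last segment)
            if cumulative[i + 1] >= target or i == len(path) - 2:
                seg = cumulative[i + 1] - cumulative[i]
                f = (target - cumulative[i]) / seg if seg > 0 else 0.0
                (x0, y0), (x1, y1) = path[i], path[i + 1]
                return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))
    return F

def curve_distance_squared(path_a, path_b, steps=100):
    """Sum of squared distances between F(d) and G(d), or G(1 - d) if that is smaller."""
    F, G = arc_length_param(path_a), arc_length_param(path_b)
    ds = [k / steps for k in range(steps + 1)]
    forward = sum(math.dist(F(d), G(d)) ** 2 for d in ds)
    backward = sum(math.dist(F(d), G(1 - d)) ** 2 for d in ds)
    return min(forward, backward)

A = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
B = [(2.0, 0.0), (1.5, 0.0), (1.0, 0.0), (0.5, 0.0), (0.0, 0.0)]  # same segment, reversed, finer
C = [(0.0, 1.0), (2.0, 1.0)]                                      # parallel segment, offset by 1
print(curve_distance_squared(A, B), curve_distance_squared(A, C))
```

Sampling by arc length is what makes sequences with different numbers of points comparable. Trying both G(d) and G(1 - d) handles the opposite-direction case from the question.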
Why not use some sort of curve-fitting procedure (least squares, whether ordinary or non-linear) and see whether the coefficients on the shape parameters are the same? If you run it as a panel-data sort of model, there are explicit statistical tests of whether sets of parameters are significantly different from one another. That would solve the problem of the same curve sampled at different resolutions.
Step 1: Canonicalize the orientation. For example, let's say that all curves start at the endpoint with the lowest lexicographic order.
def inCanonicalOrientation(path):
    # compare endpoints as (x, y) tuples; slicing keeps the result a list (reversed() would not)
    return path if path[0] < path[-1] else path[::-1]
Step 2: You can either be roughly accurate, or very accurate. If you wish to be very accurate, calculate a spline, or fit both curves to a polynomial of appropriate degree, and compare coefficients. If you'd like just a rough estimate, do as follows:
import math

def resample(path, numPoints):
    """Yield numPoints points spaced evenly by arc length along the polyline `path`."""
    # cumulative arc length at each vertex of the path
    cumulative = [0.0]
    for a, b in zip(path, path[1:]):
        cumulative.append(cumulative[-1] + math.dist(a, b))
    total = cumulative[-1]
    seg = 0
    for i in range(numPoints):
        samplePosition = i / (numPoints - 1) * total
        # advance to the segment that contains samplePosition
        while seg < len(path) - 2 and cumulative[seg + 1] < samplePosition:
            seg += 1
        start, end = path[seg], path[seg + 1]
        segLength = cumulative[seg + 1] - cumulative[seg]
        howFar = (samplePosition - cumulative[seg]) / segLength if segLength > 0 else 0.0
        yield tuple((1 - howFar) * s + howFar * e for s, e in zip(start, end))
This can be modified from a linear resampling to something better.
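import math

def Z(path):
    # one possible "diameter": the largest distance between any two points
    return max(math.dist(p, q) for p in path for q in path)

def error(pathA, pathB):
    pathA = inCanonicalOrientation(pathA)
    pathB = inCanonicalOrientation(pathB)
    higherResolution = max(len(pathA), len(pathB))
    resampledA = list(resample(pathA, higherResolution))
    resampledB = list(resample(pathB, higherResolution))
    totalError = sum(
        math.dist(pointInA, pointInB)
        for pointInA, pointInB in zip(resampledA, resampledB)
    )
    averageError = totalError / higherResolution
    normalizedError = averageError / Z(resampledA + resampledB)
    return normalizedError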
Where Z is something like the "diameter" of your path, perhaps the maximum Euclidean distance between any two points in a path.
I would use a curve-fitting procedure, but also throw in a constant term, i.e. 0 = B0 + B1*X + B2*Y + B3*X*Y + B4*X^2, etc. This would catch the translational variance, and then you can do a statistical comparison of the estimated coefficients of the curves formed by the two sets of points as a way of classifying them. I'm assuming you'll have to do bivariate interpolation if the data form arbitrary curves in the x-y plane.
