I am trying to plot large amounts of points using some library. The points are ordered by time and their values can be considered unpredictable.
My problem at the moment is that the sheer number of points makes the library take too long to render. Many of the points are redundant (that is, they lie on the same line as defined by a function y = ax + b). Is there a way to detect and remove redundant points in order to speed up rendering?
Thank you for your time.
The following is a variation on the Ramer-Douglas-Peucker algorithm for 1.5d graphs:
Compute the line equation between first and last point
Check all other points to find the one most distant from the line
If the worst point is below the tolerance you want then output a single segment
Otherwise call recursively considering two sub-arrays, using the worst point as splitter
In Python this could be:
def simplify(pts, eps):
    # Base case: with fewer than 3 points there is nothing to remove.
    if len(pts) < 3:
        return pts
    # Line through the first and last point: y = m*x + q.
    x0, y0 = pts[0]
    x1, y1 = pts[-1]
    m = (y1 - y0) / (x1 - x0)
    q = y0 - m * x0
    # Find the interior point farthest (vertically) from that line.
    worst_err = -1
    worst_index = -1
    for i in range(1, len(pts) - 1):
        x, y = pts[i]
        err = abs(m * x + q - y)
        if err > worst_err:
            worst_err = err
            worst_index = i
    if worst_err < eps:
        # Every interior point is within tolerance: one segment suffices.
        return [(x0, y0), (x1, y1)]
    else:
        # Split at the worst point and simplify both halves recursively.
        first = simplify(pts[:worst_index + 1], eps)
        second = simplify(pts[worst_index:], eps)
        return first + second[1:]

print(simplify([(0, 0), (10, 10), (20, 20), (30, 30), (50, 0)], 0.1))
The output is [(0, 0), (30, 30), (50, 0)].
Some notes about Python list syntax that may not be obvious:
x[a:b] is the part of the array from index a up to index b (excluded)
x[n:] is the array made using elements of x from index n to the end
x[:n] is the array made using first n elements of x
a+b when a and b are arrays means concatenation
x[-1] is the last element of an array
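For instance:

x = [10, 20, 30, 40]
x[1:3]          # [20, 30]
x[2:]           # [30, 40]
x[:2]           # [10, 20]
x[-1]           # 40
[1, 2] + [3, 4] # [1, 2, 3, 4]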
An example of the results of running this implementation on a graph with 100,000 points with increasing values of eps can be seen here.
I came across this question after I had this very idea: skip redundant points on plots. I believe I came up with a far better and simpler solution, and I'm happy to share it as my first proposed solution on SO. I've coded it and it works well for me. It also takes the screen scale into account. There may be 100 points in value between two plot points, but if the user has the chart sized small, they won't see them.
So, iterating through your data/plot loop, before you draw/add your next data point, look at the next value ahead and calculate the change in screen scale (or in value, but I think screen scale is better, for the above-mentioned reason). Now do the same for the value after that (getting these values is just a matter of peeking ahead in your array/collection/list, adding the loop's step increment, probably 1 or 2, to the current loop index). If the two changes are the same (or differ only very slightly, per your own preference), you can skip this one point in your chart by simply adding 'continue' in the loop, skipping adding the data point, as it lies exactly on the slope between the points before and after it.
Using this method, I reduced a chart from 963 points to 427, for example, with absolutely zero visual change.
You might need to read this a couple of times to follow it, but it's far simpler than the other best solution mentioned here, much lighter weight, and has zero visual effect on your plot.
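Here is a minimal Python sketch of the idea just described, assuming points ordered by x with distinct x values (the names and the tolerance parameter are mine; in practice you would compare slopes in screen coordinates as described above):

def thin_collinear(pts, tol=0.0):
    # Keep a point only if the slope changes there; assumes points are
    # ordered by x with distinct x values (use screen coordinates in practice).
    if len(pts) < 3:
        return list(pts)
    out = [pts[0]]
    for i in range(1, len(pts) - 1):
        x0, y0 = out[-1]
        x1, y1 = pts[i]
        x2, y2 = pts[i + 1]
        slope_in = (y1 - y0) / (x1 - x0)
        slope_out = (y2 - y1) / (x2 - x1)
        if abs(slope_in - slope_out) > tol:
            out.append(pts[i])  # the line bends here: keep the point
        # otherwise skip it: it lies on the segment between its neighbours
    out.append(pts[-1])
    return out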
I would probably apply a "least squares" algorithm to obtain a line of best fit. You can then go through your points and downfilter consecutive points that lie close to the line. You only need to plot the outliers, and the points that take the curve back to the line of best fit.
Edit: You may not need to employ "least squares"; if your input is expected to hover around "y=ax+b" as you say, then that's already your line of best fit and you can just use that. :)
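A rough NumPy sketch of this downfiltering idea, given arrays x and y (the function name, the tolerance handling and the choice to keep the outliers' neighbours are my own):

import numpy as np

def filter_near_line(x, y, tol):
    # Fit y ~ a*x + b by least squares and keep only points whose
    # residual exceeds tol, plus their neighbours so the plotted line
    # returns to the fit instead of cutting corners.
    a, b = np.polyfit(x, y, 1)
    resid = np.abs(y - (a * x + b))
    keep = resid > tol
    keep = keep | np.roll(keep, 1) | np.roll(keep, -1)
    keep[0] = keep[-1] = True   # always keep the endpoints
    return x[keep], y[keep]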
I have about 50 datasets that include all trades within a timeframe of 30 days for about 10 pairs on 5 exchanges. All pairs are of the same asset class, meaning they are strongly correlated and expected to have similar properties, but are on different scales. An example of this data would be
set.seed(1)
n <- 1000
dates <- seq(as.POSIXct("2019-08-05 00:00:00", tz="UTC"), as.POSIXct("2019-08-05 23:59:00", tz="UTC"), by="1 min")
x <- data.frame("t" = sort(sample(dates, 1000)),"p" = cumsum(sample(c(-1, 1), n, TRUE)))
Roughly, I need to identify the relevant local minima and maxima, which happen daily. The yellow marks are my points of interest. Unlike this example, there is usually only one such point per day and I consider each day separately. However, it is hard to filter out noise from my actual points of interest.
My actual goal is to find the exact point, at which the pair started to make a jump and the exact point, at which the jump is over. This needs to be as accurate as possible, as I want to observe which asset moved first and which asset followed at which point in time (as said, they are highly correlated).
Between two extreme values, I want to minimize the distance and maximize the relative/absolute change, as my points of interest are usually close to each other and their difference is quite large.
I already looked at other questions like
Finding local maxima and minima and Algorithm to locate local maxima and also this algorithm that has the same goal. However, my dataset is extremely noisy. I already reduced the dataset to 5-minute intervals, but this led to omitting the relevant points in the functions that identify local minima & maxima. Therefore, this was not a good solution for my goal.
How can I achieve my goal with a fairly accurate algorithm? Manually skimming through all the time series is not an option, since this would require me to evaluate 50 * 30 time series manually, which is too time-consuming. I'm really puzzled and have been trying to find a suitable solution for a week.
If more code snippets are wanted, I'm happy to share them; however, they didn't give me meaningful results, which would run against the idea of providing a minimal working example, so I decided to leave them out for now.
EDIT:
First off, I updated the plot and added timestamps to the dataset to give you an idea (the actual resolution). Ideally, the algorithm would detect both jumps on the left: the inner two dots because they're closer together and jump without interruption, and the outer dots because they're more extreme in value. In fact, this maybe answers the question of whether the algorithm is allowed to look into the future: yes, if there's another local extremum in the range of, say, 30 observations (or 30 minutes), then ignore the intermediate local extrema.
In my data, jumps have ranged from 2% to roughly 15%, such that a jump needs to be at least 2% to be considered, and only if a threshold of 15 (this might be adjustable) consecutive steps in the same direction before/after the peaks and valleys is reached.
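To make these criteria concrete, a small Python sketch of the stated rule (thresholds as above; it assumes positive price levels and only flags candidate runs):

import numpy as np

def candidate_jumps(p, min_change=0.02, min_run=15):
    # Flag runs of >= min_run consecutive same-direction steps whose
    # cumulative relative move is at least min_change.
    steps = np.sign(np.diff(p))
    runs = []
    start = 0
    for i in range(1, len(steps) + 1):
        if i == len(steps) or steps[i] != steps[start]:
            change = abs(p[i] - p[start]) / abs(p[start])
            if i - start >= min_run and change >= min_change:
                runs.append((start, i))  # indices of run start and end
            start = i
    return runs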
A very naive approach was to subset the data around the global minimum and maximum of a day. In most cases, this denoised the data and worked as an indicator. However, it is not robust when the global extrema are not in the range of the jump.
Hope this clarifies why this isn't a statistical question (there are some tests to determine whether a jump has happened, but not for jump arrival time afaik).
In case anyone wants a real example:
this is a corresponding graph, this is the raw data of the relevant period and this is the reduced dataset.
Perhaps as a starting point, look at function streaks in package PMwR (which I maintain). A streak is defined as a move of a specified size that is uninterrupted by a countermove of the same size. The function works with returns, not differences, so I add 100 to your data.
For instance:
set.seed(1)
n <- 1000
x <- 100 + cumsum(sample(c(-1, 1), n, TRUE))
plot(x, type = "l")
s <- streaks(x, up = 0.12, down = -0.12)
abline(v = s[, 1])
abline(v = s[, 2])
The vertical lines show the starts and ends of streaks.
Perhaps you can then filter the identified streaks by required criteria such as length. Or you may play around with different thresholds for up and down moves (this is not really recommended in the current implementation, but perhaps the results are good enough). For instance, up streaks might look as follows. A green vertical line shows the start of a streak; a red line shows its end.
plot(x, type = "l")
s <- streaks(x, up = 0.12, down = -0.05)
s <- s[!is.na(s$state) & s$state == "up", ]
abline(v = s[, 1], col = "green")
abline(v = s[, 2], col = "red")
Summary of Question:
Are there any easy-to-implement algorithms for reducing the number of points needed to represent a time series without altering how it appears in a plot?
Motivating Problem:
I'm trying to interactively visualize 10 to 15 channels of data logged from an embedded system at ~20 kHz. Logs can cover upwards of an hour of time which means that I'm dealing with between 1e8 and 1e9 points. Further, I care about potentially small anomalies that last for very short periods of time (i.e. less than 1 ms) such that simple decimation isn't an option.
Not surprisingly, most plotting libraries get a little sad if you do the naive thing and try to hand them arrays of data larger than the dedicated GPU memory. It's actually a bit worse than this on my system; using a vector of random floats as a test case, I'm only getting about 5e7 points out of the stock Matlab plotting function and Python + matplotlib before my refresh rate drops below 1 FPS.
Existing Questions and Solutions:
This problem is somewhat similar to a number of existing questions such as:
How to plot large data vectors accurately at all zoom levels in real time?
How to plot large time series (thousands of administration times/doses of a medication)?
[Several Cross Validated questions]
but deals with larger data sets and/or is more stringent about fidelity at the cost of interactivity (it would be great to get 60 FPS silky smooth panning and zooming, but realistically, I would be happy with 1 FPS).
Clearly, some form of data reduction is needed. There are two paradigms that I have found while searching for existing tools that solve my problem:
Decimate but track outliers: A good example of this is Matlab + dsplot (i.e. the tool suggested in the accepted answer of the first question I linked above). dsplot decimates down to a fixed number of evenly spaced points, but then adds back in outliers identified using the standard deviation of a high pass FIR filter. While this is probably a viable solution for several classes of data, it potentially has difficulties if there is substantial frequency content past the filter cutoff frequency and may require tuning.
Plot min and max: With this approach, you divide the time series up into intervals corresponding to each horizontal pixel and plot just the minimum and maximum values in each interval (see the sketch after this list). Matlab + Plot (Big) is a good example of this, but uses an O(n) calculation of min and max, making it a bit slow by the time you get to 1e8 or 1e9 points. A binary search tree in a mex function or Python would solve this problem, but is complicated to implement.
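For reference, a minimal NumPy sketch of the min/max paradigm (the simple O(n) version; the function name is mine, and it assumes len(y) >= n_bins):

import numpy as np

def minmax_indices(y, n_bins):
    # Keep the indices of the min and max of each of n_bins equal-width
    # intervals (one interval per horizontal pixel), preserving order.
    n = (len(y) // n_bins) * n_bins     # drop the ragged tail for clarity
    chunks = y[:n].reshape(n_bins, -1)
    offsets = np.arange(n_bins) * chunks.shape[1]
    lo = chunks.argmin(axis=1) + offsets
    hi = chunks.argmax(axis=1) + offsets
    return np.sort(np.concatenate([lo, hi]))

Plotting x[idx], y[idx] with idx = minmax_indices(y, plot_width_px) then covers essentially the same pixels as plotting the full series.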
Are there any simpler solutions that do what I want?
Edit (2018-02-18): Question refactored to focus on algorithms instead of tools implementing algorithms.
I had the very same problem displaying pressure time series from hundreds of sensors, with samples every minute for several years. In some cases (like when cleaning the data) I wanted to see all the outliers; in others I was more interested in the trend. So I wrote a function that can reduce the number of data points using two methods: Visvalingam and Douglas-Peucker. The first tends to remove outliers, and the second keeps them. I've optimized the function to work over large datasets.
I did that after realizing that none of the plotting methods were capable of handling that many points, and the ones that did were decimating the dataset in a way that I couldn't control. The function is the following:
function [X, Y, indices, relevance] = lineSimplificationI(X,Y,N,method,option)
%lineSimplificationI Reduce the number of points of the line described by X
%and Y to N, preserving the most relevant ones.
% Uses adapted versions of the Visvalingam and Douglas-Peucker algorithms.
% The number of points of the line is reduced iteratively until reaching
% N non-NaN points. Repeated NaN points in original data are deleted but
% non-repeated NaNs are preserved to keep line breaks.
% The two available methods are
%
% Visvalingam: The relevance of a point is proportional to the area of
% the triangle defined by the point and its two neighbors.
%
% Douglas-Peucker: The relevance of a point is proportional to the
% distance between it and the straight line defined by its two neighbors.
% Note that the implementation here is iterative but NOT recursive as in
% the original algorithm. This allows better handling of large data sets.
%
% DIFFERENCES: Visvalingam tends to remove outliers while Douglas-Peucker
% keeps them.
%
% INPUTS:
% X: X coordinates of the line points
% Y: Y coordinates of the line points
% N: Target number of non-NaN points to keep
% method: Either 'Visvalingam' or 'DouglasPeucker' (default)
% option: Either 'silent' (default) or 'verbose' if additional outputs
% of the calculations are desired.
%
% OUTPUTS:
% X: X coordinates of the simplified line points
% Y: Y coordinates of the simplified line points
% indices: Indices to the positions of the points preserved in the
% original X and Y. Therefore Output X is equal to the input
% X(indices).
% relevance: Relevance of the returned points. It can be used to further
% simplify the line dynamically by keeping only points with
% higher relevance. But this will produce bigger distortions of
% the line shape than calling again lineSimplification with a
% smaller value for N, as removing a point changes the relevance
% of its neighbors.
%
% Implementation by Camilo Rada - camilo#rada.cl
%
if nargin < 3
error('Line points positions X, Y and target point count N MUST be specified');
end
if nargin < 4
method='DouglasPeucker';
end
if nargin < 5
option='silent';
end
doDisplay=strcmp(option,'verbose');
X=double(X(:));
Y=double(Y(:));
indices=1:length(Y);
if length(X)~=length(Y)
error('Vectors X and Y MUST have the same number of elements');
end
if N>=length(Y)
relevance=ones(length(Y),1);
if doDisplay
disp('N is greater than or equal to the number of points in the line. Original X,Y were returned. Relevances were not computed.')
end
return
end
% Removing repeated NaN from Y
% We find all the NaNs with another NaN to the left
repeatedNaNs= isnan(Y(2:end)) & isnan(Y(1:end-1));
%We also consider a repeated NaN the first element if NaN
repeatedNaNs=[isnan(Y(1)); repeatedNaNs(:)];
Y=Y(~repeatedNaNs);
X=X(~repeatedNaNs);
indices=indices(~repeatedNaNs);
%Removing trailing NaN if any
if isnan(Y(end))
Y=Y(1:end-1);
X=X(1:end-1);
indices=indices(1:end-1);
end
pCount=length(X);
if doDisplay
disp(['Initial point count = ' num2str(pCount)])
disp(['Non repeated NaN count in data = ' num2str(sum(isnan(Y)))])
end
iterCount=0;
while pCount>N
iterCount=iterCount+1;
% If the vertices of a triangle are at the points (x1,y1), (x2,y2) and
% (x3,y3), the area of such a triangle is
% area = abs((x1*(y2-y3)+x2*(y3-y1)+x3*(y1-y2))/2)
% now the areas of the triangles defined by each point of X,Y and its two
% neighbors are
twiceTriangleArea =abs((X(1:end-2).*(Y(2:end-1)-Y(3:end))+X(2:end-1).*(Y(3:end)-Y(1:end-2))+X(3:end).*(Y(1:end-2)-Y(2:end-1))));
switch method
case 'Visvalingam'
% In this case the relevance is given by the area of the
% triangle formed by each point and its two neighbors
relevance=twiceTriangleArea/2;
case 'DouglasPeucker'
% In this case the relevance is given by the minimum distance
% from the point to the line formed by its two neighbors
neighborDistances=ppDistance([X(1:end-2) Y(1:end-2)],[X(3:end) Y(3:end)]);
relevance=twiceTriangleArea./neighborDistances;
otherwise
error(['Unknown method: ' method]);
end
relevance=[Inf; relevance; Inf];
%We remove the pCount-N least relevant points as long as they are not contiguous
[srelevance, sortorder]= sort(relevance,'descend');
firstFinite=find(isfinite(srelevance),1,'first');
startPos=uint32(firstFinite+N+1);
toRemove=sort(sortorder(startPos:end));
if isempty(toRemove)
break;
end
%Now we have to deal with contiguous elements, as removing one will
%change the relevance of its neighbors. Therefore we have to
%identify pairs of contiguous points and only remove the one with
%lesser relevance
%Contiguous will be true for an element if the next or the previous
%element is also flagged for removal
contiguousToKeep=[diff(toRemove(:))==1; false] | [false; (toRemove(1:end-1)-toRemove(2:end))==-1];
notContiguous=~contiguousToKeep;
%And the relevances associated to the elements flagged for removal
contRel=relevance(toRemove);
% Now we rearrange the contiguous flags into two rows, so that
% if both rows are true in a given column, we have a case of two
% contiguous points that are both flagged for removal.
% This process depends on the rearrangement, as contiguous
% elements can end up in different columns, so it has to be done
% twice to make sure no contiguous elements are removed
nContiguous=length(contiguousToKeep);
for paddingMode=1:2
%The rearrangement is only possible if we have an even number of
%elements, so we add one dummy zero at the end if needed
if paddingMode==1
if mod(nContiguous,2)
pcontiguous=[contiguousToKeep; false];
pcontRel=[contRel; -Inf];
else
pcontiguous=contiguousToKeep;
pcontRel=contRel;
end
else
if mod(nContiguous,2)
pcontiguous=[false; contiguousToKeep];
pcontRel=[-Inf; contRel];
else
pcontiguous=[false; contiguousToKeep(1:end-1)];
pcontRel=[-Inf; contRel(1:end-1)];
end
end
contiguousPairs=reshape(pcontiguous,2,[]);
pcontRel=reshape(pcontRel,2,[]);
%finding columns with contiguous elements
contCols=all(contiguousPairs);
if ~any(contCols) && paddingMode==2
break;
end
%finding the row of the more relevant element of each column
[~, lesserElementRow]=max(pcontRel);
%The index in contigous of the first element of each pair is
if paddingMode==1
firstElementIdx=((1:size(contiguousPairs,2))*2)-1;
else
firstElementIdx=((1:size(contiguousPairs,2))*2)-2;
end
% and the index in contiguous of the more relevant element of each
% pair is
lesserElementIdx=firstElementIdx+lesserElementRow-1;
%now we clear the removal flag of the more relevant element, so only
%the less relevant one of each contiguous pair is removed
contiguousToKeep(lesserElementIdx(contCols))=false;
end
%and now we drop from the toRemove list the contiguous points we
%decided to keep
toRemove=toRemove(contiguousToKeep | notContiguous);
if any(diff(toRemove(:))==1) && doDisplay
warning([num2str(sum(diff(toRemove(:))==1)) ' contiguous elements removed in one iteration.'])
end
toRemoveLogical=false(pCount,1);
toRemoveLogical(toRemove)=true;
X=X(~toRemoveLogical);
Y=Y(~toRemoveLogical);
indices=indices(~toRemoveLogical);
pCount=length(X);
nRemoved=sum(toRemoveLogical);
if doDisplay
disp(['Iteration ' num2str(iterCount) ', Point count = ' num2str(pCount) ' (' num2str(nRemoved) ' removed)'])
end
if nRemoved==0
break;
end
end
end
function d = ppDistance(p1,p2)
d=sqrt((p1(:,1)-p2(:,1)).^2+(p1(:,2)-p2(:,2)).^2);
end
I am trying to calculate two volumes which are related to each other. In this case, as one volume increases, more of the other volume is possible.
My code is as follows:
Plot[{(6.78966*10^22)(b)},{((9.0226522*10^22)(x))}, {(b, 0, 5.5*10^6),(x, 0, 5.5*10^6)}]
I want this to be plotted on one graph, so it can show the relationship of increasing one volume while the other decreases. However, I can't get this to display in wolfram alpha, a graphing calculator, or mathematica. It seems extremely simple and I am probably just making a dumb error.
The error that is being thrown by Mathematica is: "(" cannot be followed by "b,0,5.5*10^6)"
But when I try it without the parenthesis it says I do not have enough rules to define my function. Is there a better way to do this?
What I am trying to do is find how many cm^3 of plutonium are needed to convert cm^3 of cadmium. I have done the relationships, but now I am trying to plot it. The maximum volume that can be utilized is 5.5*10^6. So I want one line to end when all of the cm^3 of the volume are cadmium and the other to end when all of the cm^3 are plutonium. This will allow me to find the point at which they intersect, optimizing my problem.
Taking the maximum volume, m
m = 5.5*10^6;
Plot[{6.78966*10^22 (m - x), 9.0226522*10^22 x}, {x, 0, m}]
Solve[6.78966*10^22 (m - x) == 9.0226522*10^22 x, x]
{{x -> 2.36165*10^6}}
I have two sequences of length n and m. Each is a sequence of points of the form (x,y) and represents a curve in an image. I need to find how different (or similar) these sequences are, given the fact that
one sequence is likely longer than the other (i.e., one can be half or a quarter as long as the other, but if they trace approximately the same curve, they are the same)
these sequences could be in opposite directions (i.e., sequence 1 goes from left to right, while sequence 2 goes from right to left)
I looked into some difference estimates like Levenshtein as well as edit-distances in structural similarity matching for protein folding, but none of them seem to do the trick. I could write my own brute-force method but I want to know if there is a better way.
Thanks.
Do you mean that you are trying to match curves that have been translated in x,y coordinates? One technique from image processing is to use chain codes [I'm looking for a decent reference, but all I can find right now is this] to encode each sequence and then compare those chain codes. You could take the sum of the differences (modulo 8) and if the result is 0, the curves are identical. Since the sequences are of different lengths and don't necessarily start at the same relative location, you would have to shift one sequence and do this again and again, but you only have to create the chain codes once. The only way to detect if one of the sequences is reversed is to try both the forward and reverse of one of the sequences. If the curves aren't exactly alike, the sum will be greater than zero but it is not straightforward to tell how different the curves are simply from the sum.
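A small Python sketch of the encoding and comparison steps, assuming the standard 8-direction chain code (the helper names are mine; for sequences of different lengths you would slide one code list along the other as described):

import math

def chain_code(points):
    # Map each step between consecutive points to the nearest of the
    # 8 compass directions, numbered 0..7 counter-clockwise from east.
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        angle = math.atan2(y1 - y0, x1 - x0)   # in (-pi, pi]
        codes.append(round(angle / (math.pi / 4)) % 8)
    return codes

def code_difference(a, b):
    # Sum of element-wise differences modulo 8; 0 means identical codes.
    # Assumes equal-length code lists.
    return sum((ca - cb) % 8 for ca, cb in zip(a, b))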
This method will not be rotationally invariant. If you need a method that is rotationally invariant, you should look at Boundary-Centered Polar Encoding. I can't find a free reference for that, but if you need me to describe it, let me know.
A method along these lines might work:
For both sequences:
Fit a curve through the sequence. Make sure that you have a continuous one-to-one function from [0,1] to points on this curve. That is, for each (real) number between 0 and 1, this function returns a point on the curve belonging to it. By tracing the function for all numbers from 0 to 1, you get the entire curve.
One way to fit a curve would be to draw a straight line between each pair of consecutive points (it is not a nice curve, because it has sharp bends, but it might be fine for your purpose). In that case, the function can be obtained by calculating the total length of all the line segments (Pythagoras). The point corresponding to a number Y (between 0 and 1) is then the point on the curve at distance Y * (total length of all line segments) from the first point of the sequence, measured by traveling over the line segments (!!).
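A sketch of this straight-segment parametrization in Python (all names are my own; make_arclength_param returns the function described above):

import math
from bisect import bisect_right

def make_arclength_param(points):
    # Cumulative arc length up to each vertex of the polyline.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]

    def F(t):
        # Point at fraction t (0..1) of the total length, measured
        # by traveling over the line segments.
        target = max(0.0, min(1.0, t)) * total
        i = min(bisect_right(cum, target), len(points) - 1)
        (x0, y0), (x1, y1) = points[i - 1], points[i]
        seg = cum[i] - cum[i - 1]
        u = 0.0 if seg == 0.0 else (target - cum[i - 1]) / seg
        return (x0 + u * (x1 - x0), y0 + u * (y1 - y0))

    return F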
Now, after we have obtained such a function F(double) for the first sequence, and G(double) for the second sequence, we can calculate the similarity as follows:
double epsilon = 0.01;
double curveDistanceSquared = 0.0;
for (double d = 0.0; d < 1.0; d += epsilon)
{
    Point pointOnCurve1 = F(d);
    Point pointOnCurve2 = G(d);
    // alternatively, use G(1.0 - d) to check whether the second sequence is reversed
    double distanceOfPoints = pointOnCurve1.EuclideanDistance(pointOnCurve2);
    curveDistanceSquared += distanceOfPoints * distanceOfPoints;
}
double similarity = 1.0 / curveDistanceSquared;
Possible improvements:
-Find an improved way to fit the curves. Note that you still need the function that traces the curve for the above method to work.
-When calculating the distance, consider reparametrizing the function G in such a way that the distance is minimized. (This means you have an increasing function R such that R(0) = 0 and R(1) = 1, but which is otherwise general.) When calculating the distance you use
Point pointOnCurve1 = F(d);
Point pointOnCurve2 = G(R(d));
Subsequently, you try to choose R in such a way that the distance is minimized. (To see what happens, note that G(R(d)) also traces the curve.)
Why not do some sort of curve-fitting procedure (least squares, whether ordinary or non-linear) and see if the coefficients on the shape parameters are the same? If you run it as a panel-data sort of model, there are explicit statistical tests of whether sets of parameters are significantly different from one another. That would solve the problem of the same curve being sampled at different resolutions.
Step 1: Canonicalize the orientation. For example, let's say that all curves start at the endpoint with the lowest lexicographic order.
def inCanonicalOrientation(path):
    # Reverse the path if needed so it starts at its smaller endpoint.
    return path if path[0] < path[-1] else path[::-1]
Step 2: You can either be roughly accurate, or very accurate. If you wish to be very accurate, calculate a spline, or fit both curves to a polynomial of appropriate degree, and compare coefficients. If you'd like just a rough estimate, do as follows:
def resample(path, numPoints):
    totalLength = pathLength(path)       # write this function
    segments = generateSegments(path)    # iterator over consecutive segments
    currentSegment = next(segments)
    lengthSoFar = 0.0                    # length of segments fully passed
    for i in range(numPoints):
        samplePosition = i / (numPoints - 1) * totalLength
        # advance to the segment containing samplePosition
        while samplePosition > lengthSoFar + currentSegment.length:
            lengthSoFar += currentSegment.length
            currentSegment = next(segments)
        howFar = (samplePosition - lengthSoFar) / currentSegment.length
        yield Point((1 - howFar) * currentSegment.start
                    + howFar * currentSegment.end)
This can be modified from a linear resampling to something better.
def error(pathA, pathB):
    pathA = inCanonicalOrientation(pathA)
    pathB = inCanonicalOrientation(pathB)
    higherResolution = max(len(pathA), len(pathB))
    resampledA = list(resample(pathA, higherResolution))
    resampledB = list(resample(pathB, higherResolution))
    totalError = sum(
        abs(pointInA - pointInB)
        for pointInA, pointInB in zip(resampledA, resampledB)
    )
    averageError = totalError / higherResolution
    normalizedError = totalError / Z(pathA)   # Z: see note below
    return normalizedError
Where Z is something like the "diameter" of your path, perhaps the maximum Euclidean distance between any two points in a path.
I would use a curve-fitting procedure, but also throw in a constant term, i.e. 0 =B0 + B1*X + B2*Y + B3*X*Y + B4*X^2 etc. This would catch the translational variance and then you can do a statistical comparison of the estimated coefficients of the curves formed by the two sets of points as a way of classifying them. I'm assuming you'll have to do bi-variate interpolation if the data form arbitrary curves in the x-y plane.
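A rough NumPy sketch of such an implicit fit (the monomial set and the helper name are my own; the coefficient vector is recovered as the singular vector with the smallest singular value, determined only up to scale):

import numpy as np

def fit_implicit_quadratic(x, y):
    # Least-squares fit of 0 = B0 + B1*x + B2*y + B3*x*y + B4*x^2 + B5*y^2
    # over arrays x, y; returns the coefficients B0..B5.
    A = np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    return vt[-1]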
I am writing a physics engine/simulator which incorporates 3D space flight, planetary/stellar gravitation, ship thrust and relativistic effects. So far, it is going very well, however, one thing that I need help with is the math of the collision detection algorithm.
The iterative simulation of movement that I am using is basically as follows:
(Note: 3D Vectors are ALL CAPS.)
For each obj
    obj.ACC = Sum(all acceleration influences)
    obj.POS = obj.POS + (obj.VEL * dT) + (obj.ACC * dT^2)/2    (*EQ.2*)
    obj.VEL = obj.VEL + (obj.ACC * dT)
Next
Where:
obj.ACC is the acceleration vector of the object
obj.POS is the position or location vector of the object
obj.VEL is the velocity vector of the object
obj.Radius is the radius (scalar) of the object
dT is the time delta or increment
What I basically need to do is to find some efficient formula that derives from (EQ.2) above for two objects (obj1, obj2) and tells whether they ever collide, and if so, at what time. I need the exact time both so that I can determine whether it is in this particular time increment (because acceleration will be different at different time increments) and also so that I can locate the exact position (which I know how to do, given the time).
For this engine, I am modelling all objects as spheres, all this formula/algorithm needs to do is to figure out at what points:
(obj1.POS - obj2.POS).Distance = (obj1.Radius + obj2.Radius)
where .Distance is a positive scalar value. (You can also square both sides if this is easier, to avoid the square root function implicit in the .Distance calculation).
(yes, I am aware of many, many other collision detection questions, however, their solutions all seem to be very particular to their engine and assumptions, and none appear to match my conditions: 3D, spheres, and acceleration applied within the simulation increments. Let me know if I am wrong.)
Some Clarifications:
1) It is not sufficient for me to check for Intersection of the two spheres before and after the time increment. In many cases their velocities and position changes will far exceed their radii.
2) RE: efficiency, I do not need help (at this point anyway) with respect to determine likely candidates for collisions, I think that I have that covered.
Another clarification, which seems to be coming up a lot:
3) My equation (EQ.2) of incremental movement is a quadratic equation that applies both Velocity and Acceleration:
obj.POS = obj.POS + (obj.VEL * dT) + (obj.ACC * dT^2)/2
In the physics engines that I have seen (and certainly every game engine I have ever heard of), only linear equations of incremental movement are used, applying only velocity:
obj.POS = obj.POS + (obj.VEL * dT)
This is why I cannot use the commonly published solutions for collision detection found on StackOverflow, on Wikipedia and all over the Web, such as finding the intersection/closest approach of two line segments. My simulation deals with variable accelerations that are fundamental to the results, so what I need is the intersection/closest approach of two parabolic segments.
On the webpage AShelley referred to, the Closest Point of Approach method is developed for the case of two objects moving at constant velocity. However, I believe the same vector-calculus method can be used to derive a result in the case of two objects both moving with constant non-zero acceleration (quadratic time dependence).
In this case, the time derivative of the distance-squared function is 3rd order (cubic) instead of 1st order. Therefore there will be 3 solutions to the Time of Closest Approach, which is not surprising since the path of both objects is curved so multiple intersections are possible. For this application, you would probably want to use the earliest value of t which is within the interval defined by the current simulation step (if such a time exists).
I worked out the derivative equation which should give the times of closest approach:
0 = |D_ACC|^2 * t^3 + 3 * dot(D_ACC, D_VEL) * t^2 + 2 * [ |D_VEL|^2 + dot(D_POS, D_ACC) ] * t + 2 * dot(D_POS, D_VEL)
where:
D_ACC = obj1.ACC - obj2.ACC
D_VEL = obj1.VEL - obj2.VEL (before update)
D_POS = obj1.POS - obj2.POS (also before update)
and dot(A, B) = A.x*B.x + A.y*B.y + A.z*B.z
(Note that the square of the magnitude |A|^2 can be computed using dot(A, A))
To solve this for t, you'll probably need to use formulas like the ones found on Wikipedia.
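For illustration, here is a short Python sketch that finds the real roots of this cubic numerically with numpy.roots instead of the closed-form formulas (numpy.roots also handles a zero leading coefficient, i.e. no relative acceleration):

import numpy as np

def closest_approach_times(d_pos, d_vel, d_acc):
    # Coefficients of the cubic above, highest degree first.
    coeffs = [
        np.dot(d_acc, d_acc),
        3.0 * np.dot(d_acc, d_vel),
        2.0 * (np.dot(d_vel, d_vel) + np.dot(d_pos, d_acc)),
        2.0 * np.dot(d_pos, d_vel),
    ]
    roots = np.roots(coeffs)
    # Keep only (numerically) real roots, sorted in time order.
    return sorted(r.real for r in roots if abs(r.imag) < 1e-9)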
Of course, this will only give you the moment of closest approach. You will need to test the distance at this moment (using something like Eq. 2). If it is greater than (obj1.Radius + obj2.Radius), it can be disregarded (i.e. no collision). However, if the distance is less, that means the spheres collide before this moment. You could then use an iterative search to test the distance at earlier times. It might also be possible to come up with another (even more complicated) derivation which takes the size into account, or possible to find some other analytic solution, without resorting to iterative solving.
Edit: because of the higher order, some of the solutions to the equation are actually moments of farthest separation. I believe in all cases either 1 of the 3 solutions or 2 of the 3 solutions will be a time of farthest separation. You can test analytically whether you're at a min or a max by evaluating the second derivative with respect to time (at the values of t which you found by setting the first derivative to zero):
D''(t) = 3 * |D_ACC|^2 * t^2 + 6 * dot(D_ACC, D_VEL) * t + 2 * [ |D_VEL|^2 + dot(D_POS, D_ACC) ]
If the second derivative evaluates to a positive number, then you know the distance is at a minimum, not a maximum, for the given time t.
Draw a line between the start location and end location of each sphere. If the resulting line segments intersect, the spheres definitely collided at some point, and some clever math can find at what time the collision occurred. Also make sure to check whether the minimum distance between the segments (if they don't intersect) is ever less than 2*radius. This will also indicate a collision.
From there you can backstep your delta time to happen exactly at collision so you can correctly calculate the forces.
Have you considered using a physics library which already does this work? Many libraries use far more advanced and more stable (better integrators) systems for solving the systems of equations you're working with. Bullet Physics comes to mind.
The OP asked for the time of collision. A slightly different approach will compute it exactly...
Remember that the position projection equation is:
NEW_POS=POS+VEL*t+(ACC*t^2)/2
If we replace POS with D_POS=POS_A-POS_B, VEL with D_VEL=VEL_A-VEL_B, and ACC=ACC_A-ACC_B for objects A and B we get:
D_NEW_POS = D_POS + D_VEL*t + (D_ACC*t^2)/2
This is the formula for vectored distance between the objects. In order to get the squared scalar distance between them, we can take the square of this equation, which after expansion looks like:
distsq(t) = D_POS^2+2*dot(D_POS,D_VEL)*t + (dot(D_POS, D_ACC)+D_VEL^2)*t^2 + dot(D_VEL,D_ACC)*t^3 + D_ACC^2*t^4/4
In order to find the time where collision occurs, we can set the equation equal to the square of the sum of radii and solve for t:
0 = D_POS^2-(r_A+r_B)^2 + 2*dot(D_POS,D_VEL)*t + (dot(D_POS, D_ACC)+D_VEL^2)*t^2 + dot(D_VEL,D_ACC)*t^3 + D_ACC^2*t^4/4
Now, we can solve for the equation using the quartic formula.
The quartic formula will yield 4 roots, but we are only interested in real roots. If there is a double real root, then the two objects touch edges at exactly one point in time. If there are two real roots, then the objects continuously overlap between root 1 and root 2 (i.e. root 1 is the time when collision starts and root 2 is the time when collision stops). Four real roots means that the objects collide twice, continuously between root pairs 1,2 and 3,4.
In R, I used polyroot() to solve as follows:
# initial positions
POS_A=matrix(c(0,0),2,1)
POS_B=matrix(c(2,0),2,1)
# initial velocities
VEL_A=matrix(c(sqrt(2)/2,sqrt(2)/2),2,1)
VEL_B=matrix(c(-sqrt(2)/2,sqrt(2)/2),2,1)
# acceleration
ACC_A=matrix(c(sqrt(2)/2,sqrt(2)/2),2,1)
ACC_B=matrix(c(0,0),2,1)
# radii
r_A=.25
r_B=.25
# deltas
D_POS=POS_B-POS_A
D_VEL=VEL_B-VEL_A
D_ACC=ACC_B-ACC_A
# quartic coefficients
z=c(t(D_POS)%*%D_POS-(r_A+r_B)^2, 2*t(D_POS)%*%D_VEL, t(D_VEL)%*%D_VEL+t(D_POS)%*%D_ACC, t(D_ACC)%*%D_VEL, .25*(t(D_ACC)%*%D_ACC))
# get roots
roots=polyroot(z)
# In this case there are only two real roots...
root1=Re(roots[1])
root2=Re(roots[2])
# trajectory over time
pos=function(p,v,a,t){
T=t(matrix(t,length(t),2))
return(t(matrix(p,2,length(t))+matrix(v,2,length(t))*T+.5*matrix(a,2,length(t))*T*T))
}
# plot A in red and B in blue
t=seq(0,2,by=.1) # from 0 to 2 seconds.
a1=pos(POS_A,VEL_A,ACC_A,t)
a2=pos(POS_B,VEL_B,ACC_B,t)
plot(a1,type='o',col='red')
lines(a2,type='o',col='blue')
# points of a circle with center 'p' and radius 'r'
circle=function(p,r,s=36){
e=matrix(0,s+1,2)
for(i in 1:s){
e[i,1]=cos(2*pi*(1/s)*i)*r+p[1]
e[i,2]=sin(2*pi*(1/s)*i)*r+p[2]
}
e[s+1,]=e[1,]
return(e)
}
# plot circles with radius r_A and r_B at time of collision start in black
lines(circle(pos(POS_A,VEL_A,ACC_A,root1),r_A))
lines(circle(pos(POS_B,VEL_B,ACC_B,root1),r_B))
# plot circles with radius r_A and r_B at time of collision stop in gray
lines(circle(pos(POS_A,VEL_A,ACC_A,root2),r_A),col='gray')
lines(circle(pos(POS_B,VEL_B,ACC_B,root2),r_B),col='gray')
Object A follows the red trajectory from the lower left to the upper right. Object B follows the blue trajectory from the lower right to the upper left. The two objects collide continuously between time 0.9194381 and time 1.167549. The two black circles just touch, showing the beginning of overlap - and overlap continues in time until the objects reach the location of the gray circles.
Seems like you want the Closest Point of Approach (CPA). If it is less than the sum of the radii, you have a collision. There is example code in the link. You can calculate it each frame with the current velocity, and check if the CPA time is less than your tick size. You could even cache the CPA time, and only update it when acceleration is applied to either item.
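For reference, with constant velocity the CPA time reduces to a one-liner; a minimal Python sketch (minimizing |D_POS + D_VEL*t|^2 over t):

import numpy as np

def cpa_time(d_pos, d_vel):
    # Time at which the squared relative distance is smallest.
    v2 = np.dot(d_vel, d_vel)
    if v2 == 0.0:
        return 0.0   # no relative motion: distance is constant
    return -np.dot(d_pos, d_vel) / v2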