Pointwise envelopes not including the theoretical line (Foxall's J)

I am computing pointwise envelopes for Foxall's J function to investigate whether some point patterns of interest are clustered around, avoid, or are independent of other point patterns or polygons.
I am doing this using spatstat with a syntax similar to this:
envelope(my_pattern_of_interest, fun=Jfox, funargs=list(Y=my_other_pattern), ...)
I am calculating the envelopes for several replicated patterns (i.e., different transects) and then plotting the pooled envelopes (using pool()).
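A minimal sketch of that workflow, with hypothetical pattern names (note that pool() needs the individual envelopes to have been computed with savefuns=TRUE):

library(spatstat)
# stand-ins for two real transects: patterns of interest X1, X2
# and the other patterns Y1, Y2
X1 <- rpoispp(100); Y1 <- rpoispp(50)
X2 <- rpoispp(100); Y2 <- rpoispp(50)
# pointwise envelopes of Foxall's J for each transect;
# savefuns=TRUE keeps the simulated curves so they can be pooled
E1 <- envelope(X1, fun=Jfox, funargs=list(Y=Y1), nsim=39, savefuns=TRUE)
E2 <- envelope(X2, fun=Jfox, funargs=list(Y=Y2), nsim=39, savefuns=TRUE)
plot(pool(E1, E2))   # pooled pointwise envelope across transects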
So far, whenever Jfox is calculated between two point patterns, the shaded area representing the simulation envelope includes the theoretical line (the red dashed line), as in this example:
Figure 1
When Jfox is calculated between a point pattern and polygons, however, the envelope area frequently does not include the theoretical line (at least in some regions), as in this example:
Figure 2
What does this mean?
From my understanding so far, if - or rather "where", since these are pointwise envelopes - the observed line (solid black) is within the shaded area, the observed pattern (say clustering, as in Figure 2) is not significantly different from what could be observed at random. Conversely, if/where the observed line is outside the shaded area, there is a significant difference.
What does it mean when the theoretical line lies outside the shaded area? Is it giving me another piece of information? How should I interpret this?
Thank you.

Related

Spatstat, using the Matérn cluster process to generate homogeneous landscapes, how do I interpret the Ripley K function?

I am looking to develop a point process that ranges from homogeneous (i.e., no correlation between points) to a clustered process in which points are correlated. From experimentation I can see that using the Matérn cluster process I can generate landscapes that are clustered.
library(spatstat)
plot(rMatClust(kappa=3,r=0.1,mu=50))
I want to use the simplest code that increases the level of homogeneity, i.e. decreases the dependence of points on each other. I do not want a binary model where the pattern is either homogeneous or not, i.e. just a Poisson process, which can be generated with:
plot(rpoispp(150))
From experimentation I noticed that if I increase the radius of the clusters in the Matérn cluster process, I seem to create a pseudo-homogeneous pattern.
plot(rMatClust(kappa=3,r=0.3,mu=50))
plot(rMatClust(kappa=3,r=0.7,mu=50))
Is this a good way of generating degrees of homogeneity? I understand that I can use statistical tests to measure the degree of clustering compared to a complete Poisson process, such as the Ripley K test. For example, if I assign the Matérn cluster process data to variables, such as:
a<-rMatClust(kappa=3,r=0.1,mu=50)
b<-rMatClust(kappa=3,r=0.3,mu=50)
c<-rMatClust(kappa=3,r=0.7,mu=50)
Then use the Ripley K test and plot the results:
plot(Kest(a))
plot(Kest(b))
plot(Kest(c))
I can see that the difference between a homogeneous Poisson process and the clustered point process decreases. I still do not fully understand the significance of the various K values with respect to edge effects and so forth, but I think this is the right direction to be heading in? How do I interpret the Ripley K function? Another problem is the number of points: I do not have a consistent number of points in each plot, as can be seen from:
summary(a)
summary(b)
summary(c)
Any knowledgeable feedback on this is greatly appreciated.
The standard terminology is that you want to generate a clustered point pattern.
The function rMatClust generates a clustered point pattern at random, in a two-stage process. The first stage is to generate "parent" points completely at random. The second stage is to generate, for each "parent", a random number of "offspring" points, and to place the "offspring" points inside a circle of radius R around their "parent". The final result is the collection of all "offspring" points. From this description (and help(rMatClust)) you can figure out what happens for different parameter values.
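For intuition, here is a rough hand-rolled version of that two-stage construction (a sketch only; rMatClust itself also takes care of edge effects by generating parents in a slightly larger window):

library(spatstat)
kappa <- 3; R <- 0.1; mu <- 50
win <- owin(c(0,1), c(0,1))
parents <- rpoispp(kappa, win=win)            # stage 1: completely random parents
xy <- do.call(rbind, lapply(seq_len(npoints(parents)), function(i) {
  n <- rpois(1, mu)                           # stage 2: Poisson number of offspring
  theta <- runif(n, 0, 2*pi)
  rad <- R * sqrt(runif(n))                   # sqrt() makes the density uniform in the disc
  cbind(parents$x[i] + rad*cos(theta), parents$y[i] + rad*sin(theta))
}))
# offspring falling outside the window are discarded (with a warning)
plot(ppp(xy[,1], xy[,2], window=win))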
The K function (not the "K test") is a summary of the spacing between points in a point pattern. At a distance r, the value of K(r) is the normalised average number of points observed to fall within distance r of a typical point in the pattern. It is normalised so that it does not depend on the number of points, making it possible to compare patterns with different numbers of points.
When you plot the K function, one of the curves is the theoretical curve that would be expected if the points are completely random, and the other curves are computed from the data point pattern. This allows you to assess whether the point pattern appears to be clustered.
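For reference, under complete spatial randomness the theoretical curve is $$K(r)=\pi r^2$$ so an empirical curve lying above it at small distances r indicates clustering at those scales, and one lying below it indicates inhibition.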
I strongly suggest you do some reading in Chapter 7 of the spatstat book. You can download this chapter for free.

Approximate a list of points with a short list of Bézier curves

I have a list of (x, y) points. I know how to make a list of Bézier curves which pass through all of those points and have a continuous first (and second, though less important) derivative. However, the list that I end up with is far too long. I would prefer to approximate the points I have if it lets me cut down on the number of curves I have. I would like to be able to pass a parameter of either how close an approximation I get or a maximum number of curves, preferably the former.
The reason I want this is that the end result will have a graphical UI where users can edit the Bézier curves, and it isn't critical that the curves pass through each point exactly, as long as they are close. More curves makes it harder to edit.
EDIT:
Some more information about the purpose of this. I'm trying to make image editing software. When someone loads a bitmap, I want to be able to trace a center line. Potrace is what I would use to trace the outline of a shape, but it won't work for tracing strokes. I've been able to identify lots of points along the center line, and I want to turn this data into a list of connected Bézier curves. The reason I don't want to make a Bézier spline is that there will be too many control points for this to be easy to edit. "Too many" is not an easy-to-define term, but I would like to be able to pass a parameter to limit the number of curves. Either a function that minimizes how far the curves are from the points based on a maximum number of curves or a function that minimizes the number of curves based on a maximum deviation from the points.
Several approaches exist for achieving what you want to do:
1) Use the RDP (Ramer-Douglas-Peucker) algorithm to reduce the number of points, then create a list of Bézier curves passing through the remaining points (see the sketch after this list).
2) Use curve-fitting algorithms (for example, the Schneider algorithm) to produce multiple Bézier curves connected with G1 (tangent) continuity. Check out the Schneider algorithm implementation in this link.
3) Use least-squares fitting with a B-spline to produce a single B-spline curve.
From an implementation point of view, approach 1 is probably the easiest for you, as you already know how to create Bézier curves interpolating a list of points. Approach 3 would be much more difficult to implement, and you would probably have to convert the B-spline curve into Bézier curves to use them at the UI level. Please refer to this SO article for a detailed discussion.
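A minimal R sketch of the point-reduction step in approach 1 (illustrative only; assumes an open polyline, with pts an n x 2 matrix of coordinates and eps the distance tolerance):

rdp <- function(pts, eps) {
  n <- nrow(pts)
  if (n < 3) return(pts)
  a <- pts[1, ]; b <- pts[n, ]; ab <- b - a
  # perpendicular distance of every point from the chord a-b
  d <- abs(ab[2]*(pts[,1] - a[1]) - ab[1]*(pts[,2] - a[2])) / sqrt(sum(ab^2))
  i <- which.max(d)
  if (d[i] <= eps) return(rbind(a, b))        # everything close enough: keep endpoints
  left <- rdp(pts[1:i, ], eps)                # otherwise split at the farthest point
  right <- rdp(pts[i:n, ], eps)
  rbind(left[-nrow(left), ], right)           # drop the duplicated split point
}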

How to smooth hand drawn lines?

So I am using Kinect with Unity.
With the Kinect, we detect a hand gesture, and while it is active we draw a line on the screen that follows wherever the hand goes. On every update, the location is stored as the newest (and last) point in the line. However, the lines can often look very choppy.
Here is a general picture that shows what I want to achieve:
The red is the original line, and the purple is the new smoothed line. If the user suddenly stops and changes direction, we think we want the line not to do exactly that but instead make a rapid turn or a loop.
My current solution uses cubic Béziers, keeping only points that are at least X apart (with Y points placed between each pair using the cubic Bézier). However, there are two problems with this, amongst others:
1) It often doesn't preserve the curve out to the full distance the user drew it; for example, if the user suddenly stops a line and reverses direction, there is a pretty good chance the line won't extend to the point where the user reversed direction.
2) There is also a chance that the selected "good" point is actually a "bad" random jump point.
So I've thought about other solutions, one being to limit the maximum angle between points (with 0 degrees being a straight line). However, if a point has an angle beyond the limit, the maths behind lowering the angle while still following the drawn line as closely as possible seems complicated. But maybe it's not. Either way, I'm not sure what to do and am looking for help.
Keep in mind this needs to be done in real time as the user is drawing the line.
You can try the Ramer-Douglas-Peucker algorithm to simplify your curve:
https://en.wikipedia.org/wiki/Ramer%E2%80%93Douglas%E2%80%93Peucker_algorithm
It's a simple algorithm, and parameterization is reasonably intuitive. You may use it as a preprocessing step or maybe after one or more other algorithms. In any case it's a good algorithm to have in your toolbox.
Using angles to reject "jump" points may be tricky, as you've seen. One option is to compare the total length of N line segments to the straight-line distance between the extreme end points of that chain of N line segments. You can threshold the ratio of (totalLength/straightLineLength) to identify line segments to be rejected. This would be a quick calculation, and it's easy to understand.
If you want to take segment lengths and segment-to-segment angles into consideration, you could treat the line segments as vectors and compute the cross product. If you imagine the two vectors as defining a parallelogram, and knowing the area of that parallelogram would be a way to accept/reject a point, then the cross product is another simple and quick calculation.
https://www.math.ucdavis.edu/~daddel/linear_algebra_appl/Applications/Determinant/Determinant/node4.html
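A small R sketch of both heuristics (toy data; the threshold values would be yours to tune):

wiggle_ratio <- function(pts) {               # heuristic 1: path length vs straight-line distance
  steps <- diff(pts)                          # successive segment vectors
  total <- sum(sqrt(rowSums(steps^2)))        # total length of the chain
  straight <- sqrt(sum((pts[nrow(pts),] - pts[1,])^2))
  total / straight                            # ratio >> 1 suggests jittery "jump" points
}
cross2d <- function(u, v) u[1]*v[2] - u[2]*v[1]   # heuristic 2: signed parallelogram area

pts <- cbind(c(0, 1, 1.1, 2), c(0, 0, 0.5, 0.1))  # toy chain of points
wiggle_ratio(pts)
cross2d(pts[2,] - pts[1,], pts[3,] - pts[2,])     # area spanned by the first two segments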
If you only have a few dozen points, you could randomly eliminate one point at a time, generate your spline fits, and then calculate the point-to-spline distances for all the original points. Given all those distances you can compute a metric (e.g. mean distance) that you'd like to minimize, and keep the eliminations that improve it. This technique wouldn't scale well to more points, but it might be worth a try if you break each chain of line segments into groups of maybe half a dozen segments each.
Although it's overkill for this problem, I'll mention that Euler curves can be good fits to "natural" curves. What's nice about them is that you can generate an Euler curve fit from two points in space and the tangents at those two points. The code gets hairy, but Euler curves (a.k.a. aesthetic curves, if I remember correctly) can generate better and/or more useful fits to natural curves than nth-degree Bézier splines.
https://en.wikipedia.org/wiki/Euler_spiral

Control Charts in R

I am trying to create a control chart for metrics that are essentially increasing over time. If I attempt to create a Shewhart chart, there will be many points that are above the upper specification limit.
So for example,
My metric is revenue. Since it is a fast-growing company, revenue will keep rising above the upper specification limit over time. The main thing I want to track is when it falls below the lower specification limit.
I know this is very vague but essentially I want to create a control chart that has data increasing over time.
Thanks
Besterfield answers this question in his book Quality Control (sixth edition), where he discusses it as a "Chart for Trends".
Chart for Trends created in Excel
The process involves regression to determine the slope of your center line. The equation is $$\overline{X}=a+bG$$ where $\overline{X}$ is the subgroup average, G is the subgroup number, a is the intercept, and b is the slope.
$$a=\frac{(\sum \overline{X})(\sum G^2)-(\sum G)(\sum G\overline{X})}{g\sum G^2-(\sum G)^2}$$
$$b=\frac{g\sum G \overline{X}-(\sum G)(\sum \overline{X})}{g\sum G^2-(\sum G)^2}$$
where g is the number of subgroups.
The coefficients $a$ and $b$ are obtained by setting up columns for $G$, $\overline{X}$, $G\overline{X}$, and $G^2$ …; determining their sums; and inserting the sums into the equations.
Once the trend-line equation is known, it can be plotted on the chart by assuming values of G and calculating $\overline{X}$. When two points are plotted, the trend line is drawn between them. The control limits are drawn on each side of the trend line a distance (in the perpendicular direction) equal to $$A_2\overline{R}$$ …
The R chart will generally have the typical appearance… however, the dispersion may also be increasing.
Besterfield also suggests adding a URL and LRL (Upper Rejection Limit and Lower Rejection Limit) as lines parallel to the horizontal axis to indicate times when the process would be unacceptable.
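A minimal R sketch of such a trend chart, with made-up data and assuming subgroups of size 5 (so $A_2 \approx 0.577$); for simplicity the limits here are offset vertically rather than perpendicular to the trend line:

set.seed(1)
g <- 20; G <- 1:g                             # 20 subgroups
xbar <- 100 + 3*G + rnorm(g, sd = 2)          # rising subgroup means (fake data)
Rbar <- 4                                     # average subgroup range
A2 <- 0.577                                   # control chart constant for n = 5
fit <- lm(xbar ~ G)                           # least squares gives the same a and b as above
plot(G, xbar, type = "b", ylab = "Subgroup mean")
lines(G, fitted(fit))                         # trend centre line
lines(G, fitted(fit) + A2*Rbar, lty = 2)      # upper control limit
lines(G, fitted(fit) - A2*Rbar, lty = 2)      # lower control limit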
Dropped by here today; it's an old question that has probably been answered already. If not, try a seasonality analysis and plot your deseasonalised series (trend removed) with a control chart tool like qicharts2 (in R). If you need more details, just let me know and I can elaborate a more detailed answer.
Sounds like plot() is your game here: time on the x-axis, revenue on the y-axis, with abline(h = ...) to draw a horizontal line at your specification limit, and you are good to go.

Minimising interpolation error between two data sets

In the top of the diagrams below we can see some value (y-axis) changing over time (x-axis).
As this happens we are sampling the value at different and unpredictable times, also we are alternating the sampling between two data sets, indicated by red and blue.
When computing the value at any time, we expect that both red and blue data sets will return similar values. However as shown in the three smaller boxes this is not the case. Viewed over time the values from each data set (red and blue) will appear to diverge and then converge about the original value.
Initially I used linear interpolation to obtain a value; next I tried Catmull-Rom interpolation. The former results in values that come close together at each data point and then drift apart between points; the latter keeps the values closer together, but the average error is greater.
Can anyone suggest another strategy or interpolation method which will provide greater smoothing (perhaps by using a greater number of sample points from each data set)?
I believe the question you ask does not have a straight answer without further knowledge of the underlying sampled process. By its nature, the value of the function between samples can be almost anything, so I think there is no way to guarantee the convergence of the interpolations of two sample arrays.
That said, if you have prior knowledge of the underlying process, then you can choose among several interpolation methods to minimize the errors. For example, if you measure the drag force as a function of wing velocity, you know the relation is quadratic (a*V^2). Then you can choose a polynomial fit of 2nd order and get a pretty good match between the interpolations of the two series.
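A quick R sketch of that idea, fitting the same quadratic model to two staggered sample sets (made-up data) and comparing the fitted curves:

set.seed(42)
f <- function(v) 0.3 * v^2                    # hypothetical true drag law
v1 <- sort(runif(15, 0, 10)); d1 <- f(v1) + rnorm(15, sd = 0.5)
v2 <- sort(runif(15, 0, 10)); d2 <- f(v2) + rnorm(15, sd = 0.5)
fit1 <- lm(d1 ~ I(v1^2))                      # fit a + b*V^2 to each sample set
fit2 <- lm(d2 ~ I(v2^2))
v <- seq(0, 10, 0.1)
plot(v, predict(fit1, list(v1 = v)), type = "l")   # the two fitted curves stay close
lines(v, predict(fit2, list(v2 = v)), lty = 2)     # everywhere, unlike point-to-point
points(v1, d1); points(v2, d2, pch = 2)            # interpolation of the raw samples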
Try B-splines: Catmull-Rom interpolates (goes through the data points), B-spline does smoothing.
For example, for uniformly-spaced data (not your case)
Bspline(t) = (data(t-1) + 4*data(t) + data(t+1)) / 6
Of course the interpolated red/blue curves depend on the spacing of the red/blue data points, so they cannot match perfectly.
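For what it's worth, that uniform mask can be applied directly as a moving filter, e.g. in R:

y <- c(0, 1, 0.9, 2.2, 1.8, 3.1, 2.7, 4.0)   # noisy samples (toy data)
stats::filter(y, c(1, 4, 1) / 6)             # (y[t-1] + 4*y[t] + y[t+1]) / 6; NA at the ends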
I'd like to quote Introduction to Catmull-Rom Splines to suggest not using Catmull-Rom for this interpolation task.
One of the features of the Catmull-Rom spline is that the specified curve will pass through all of the control points - this is not true of all types of splines.
By definition your red interpolated curve will pass through all red data points and your blue interpolated curve will pass through all blue points. Therefore you won't get a best fit for both data sets.
You might change your boundary conditions and use data points from both data sets for a piecewise approximation as shown in these slides.
I agree with ysap that this question cannot be answered as you may be expecting. There may be better interpolation methods, depending on your model dynamics - as with ysap, I recommend methods that utilize the underlying dynamics, if known.
Regarding the red/blue samples, I think you have made a good observation about sampled and interpolated data sets and I would challenge your original expectation that:
When computing the value at any time, we expect that both red and blue data sets will return similar values.
I do not expect this. If you assume that you cannot interpolate perfectly - and particularly if the interpolation error is large compared to the errors in the samples - then you are certain to have a continuous error function that exhibits its largest errors farthest (in time) from your sample points. Two data sets with differing sample points should therefore exhibit the behaviour you see, because points that are far (in time) from red sample points may be near (in time) to blue sample points and vice versa; with staggered points like yours, this is sure to be true. Thus I would expect what you show, that:
Viewed over time the values from each data set (red and blue) will appear to diverge and then converge about the original value.
(If you do not have information about underlying dynamics (except frequency content), then Giacomo's points on sampling are key - however, you need not interpolate if looking at info below Nyquist.)
When sampling the original continuous function, the sampling frequency should comply with the Nyquist-Shannon sampling theorem; otherwise the sampling process introduces an error (known as aliasing). The error, being different in the two datasets, results in different values when you interpolate.
Therefore, you need to know the highest frequency B of the original function and then collect samples with a frequency at least 2B. If your function has very high frequencies and you cannot sample that fast, you should at least try to filter them away before sampling.
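A tiny R illustration of such aliasing, assuming a 9 Hz sine sampled at only 10 Hz (below the required 18 Hz): the samples are indistinguishable from a phase-reversed 1 Hz sine.

t_fine <- seq(0, 1, by = 0.001)
t_samp <- seq(0, 1, by = 0.1)                 # 10 Hz sampling of a 9 Hz signal
plot(t_fine, sin(2*pi*9*t_fine), type = "l", col = "grey")
points(t_samp, sin(2*pi*9*t_samp), pch = 16)  # the samples...
lines(t_fine, -sin(2*pi*1*t_fine), lty = 2)   # ...fall exactly on the 1 Hz alias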
