I'm using the sf package in R to simulate a sample of agents moving between different nodes in a network across space-time.
I'm currently puzzled though by some behavior from st_intersects: I have the agents moving between nodes between each corner of the coordinate unit square as well as through the center at (.5,.5). However when I try to detect an agent at st_point(c(.1,.9)) intersecting with the geometry st_linestring(c(st_point(c(0,0)),st_point(c(0.5,0.5)))) I get an empty predicate return.
In contrast if I detect an agent moving along the x-axis or y-axis only, I am able to detect the point correctly. Why is this?
Minimum reproducible example in R v4.0.2:
library(sf)
l1 <- st_linestring(c(st_point(c(0,1)),st_point(c(0.5,0.5))))
p1 <- st_point(c(.1,.9)) ## on the line between (0,1) and (.5,.5); y=1-x x = f(t)
st_intersects(p1,l1) ## empty
#Sparse geometry binary predicate list of length 1, where the predicate was `intersects'
# 1: (empty)
## in contrast
l2 <- st_linestring(st_point(c(0,0)),st_point(c(1,0)))
p2 <- st_point(c(.1,0)) ## on the line between (0,0) and (1,0) ; y = 0; x = f(t)
st_intersects(p2,l2) ## returns 1 as I would expect
#Sparse geometry binary predicate list of length 1, where the predicate was `intersects'
# 1: 1
To elaborate a bit on the ege-rubak's answer: The point is off by tiny difference, most likely due to floating point math (which is inherently inaccurate).
As a workaround I suggest using sf::st_is_within_distance() with a sufficiently small dist value to eliminate rounding differences without introducing false positives / may require some tuning depending on the data used.
Consider this code, originally posted on the RStudio Community forum (where this question seems to have been cross posted): https://community.rstudio.com/t/simple-st-intersect-gives-unexpected-result/108214/3?u=jlacko
library(sf)
l1 <- st_linestring(c(st_point(c(0,1)),
st_point(c(0.5,0.5))))
p1 <- st_point(c(.1,.9)) ## on the line between (0,1) and (.5,.5); y=1-x x = f(t)
st_distance(l1, p1)[1,1]
# [1] 1.962616e-17
st_is_within_distance(p1,l1, 1/1000)
# Sparse geometry binary predicate list of length 1, where the
# predicate was `is_within_distance'
# 1: 1
Floating point geometri is inherently flawed. So your result is most likely due to the lack of precision in the computer representation of your real numbers. A possible workaround is to find the distance between the point and the line and accept the point as being on the line if the distance is smaller than some threshold.
Related
Is it possible to use spatstat to estimate the intensity function for a give ppp object and calculate its value considering a new point? For example, can I evaluate D at new_point:
# packages
library(spatstat)
# define a random point within Window(swedishpines)
new_point <- ppp(x = 45, y = 45, window = Window(swedishpines))
# estimate density
(D <- density(swedishpines))
#> real-valued pixel image
#> 128 x 128 pixel array (ny, nx)
#> enclosing rectangle: [0, 96] x [0, 100] units (one unit = 0.1 metres)
Created on 2021-03-30 by the reprex package (v1.0.0)
I was thinking that maybe I can superimpose() the two ppp objects (i.e. swedishpines and new_point) and then run density setting at = "points" and weights = c(rep(1, points(swedishpines)), 0) but I'm not sure if that's the suggested approach (and I'm not sure if the appended point is ignored during the estimation process).
I know that it may sound like a trivial question, but I read some docs and didn't find an answer or a solution.
There are two ways to do this.
The first is simply to take the pixel image of intensity, and extract the pixel values at the desired locations using [:
D <- density(swedishpines)
v <- D[new_points]
See the help for density.ppp and [.im.
The other way is to use densityfun:
f <- densityfun(swedishpines)
v <- f(new_points)
See the help for densityfun.ppp
The first route is more efficient and the second way is more accurate.
Technical issue: if some of the new_points could lie outside the window of swedishpines then the value at these points is (mathematically) undefined. Both of the methods described above will simply ignore such points, and the resulting vector v will be shorter than the number of new points. If you need to handle this continengcy, the easiest way is to use D[new_points, drop=FALSE] which returns NA values for such locations.
I would like to calculate dimensions of fractal written as a n-dimensional array of 0s and 1s. It includes boxing count, hausdorff and packing dimension.
I have only idea how to code boxing count dimensions (just counting 1's in n-dimensional matrix and then use this formula:
boxing_count=-log(v)/log(n);
where n-number of 1's and n-space dimension (R^n)
This approach simulate counting minimal resolution boxes 1 x 1 x ... x 1 so numerical it is like limit eps->0. What do you think about this solution?
Do you have any idea (or maybe code) for calculating hausdorff or packing dimension?
The Hausdorff and packing dimension are purely mathematical tools based in measure theory. They have wonderful properties in that context but are not well suited for experimentation. In short, there is no reason to expect that you can estimate their values based on a single matrix approximation to some set.
Box counting dimension, by contrast, is well suited for numerical investigation. Specifically, let N(e) denote the number of squares of side length e required to cover your fractal set. As you seem to know, the box counting dimension of your set is the limit as e->0 of
log(N(e))/log(1/e)
However, I don't think that just choosing the smallest available value of e is generally a good idea. The standard interpretation in the physics literature, as I understand it, is to presume that the relationship between N(e) and e should be maintained over a broad range of values. A standard way to compute the box-counting dimension is compute N(e) for some choices of e chosen from a sequence that tends geometrically to zero. We then fit a line to the points in a log-log plot of N(e) versus 1/e The box-counting dimension should be approximately the slope of that line.
Example
As a concrete example, the following Python code generates a binary matrix that describes a fractal structure.
import numpy as np
size = 1024
first_row = np.zeros(size, dtype=int)
first_row[int(size/2)-1] = 1
rows = np.zeros((int(size/2),size),dtype=int)
rows[0] = first_row
for i in range(1,int(size/2)):
rows[i] = (np.roll(rows[i-1],-1) + rows[i-1] + np.roll(rows[i-1],1)) % 2
m = int(np.log(size)/np.log(2))
rows = rows[0:2**(m-1),0:2**m]
We can view the fractal structure by simply interpreting each 1 as a black pixel and each zero as white pixel.
import matplotlib.pyplot as plt
plt.matshow(rows, cmap = plt.cm.binary)
This matrix makes a nice test since it can be shown that there is an actual limiting object whose fractal dimension is log(1+sqrt(5))/log(2) or approximately 1.694, yet it's complicated enough to make the box counting estimate a little tricky.
Now, this matrix is 512 rows by 1024 columns; it decomposes naturally into 2 matrices that are 512 by 512. Each of those decomposes naturally into 4 matrices that are 256 by 256, etc. For each such decomposition, we need to count the number of sub matrices that have at least one non-zero element. We can perform this analysis as follows:
cnts = []
for lev in range(m):
block_size = 2**lev
cnt = 0
for j in range(int(size/(2*block_size))):
for i in range(int(size/block_size)):
cnt = cnt + rows[j*block_size:(j+1)*block_size, i*block_size:(i+1)*block_size].any()
cnts.append(cnt)
data = np.array([(2**(m-(k+1)),cnts[k]) for k in range(m)])
data
# Out:
# array([[ 512, 45568],
# [ 256, 22784],
# [ 128, 7040],
# [ 64, 2176],
# [ 32, 672],
# [ 16, 208],
# [ 8, 64],
# [ 4, 20],
# [ 2, 6],
# [ 1, 2]])
Now, your idea is to simply compute log(45568)/log(512) or approximately 1.7195, which is not too bad. I'm recommending that we examine a log-log plot of this data.
xs = np.log(data[:,0])
ys = np.log(data[:,1])
plt.plot(xs,ys, 'o')
This indeed looks close to linear, indicating that we might expect our box-counting technique to work reasonably well. First, though, it might be reasonable to exclude the one point that appears to be an outlier. In fact, that's one of the desirable characteristics of this approach. Here's how to do so:
plt.plot(xs,ys, 'o')
xs = xs[1:]
ys = ys[1:]
A = np.vstack([xs, np.ones(len(xs))]).T
m,b = np.linalg.lstsq(A, ys)[0]
def line(x): return m*x+b
ys = line(xs)
plt.plot(xs,ys)
m
# Out: 1.6902585379630133
Well, the result looks pretty good. In particular, this is a definitive example that this approach can work better than the simple idea of using just one data point. In fairness, though, it's not hard to find examples where the simple approach works better. Also, this set is regular enough that we get some nice results. Generally, one can't really expect box-counting computations to be too reliable.
I have a problem I wish to solve in R with example data below. I know this must have been solved many times but I have not been able to find a solution that works for me in R.
The core of what I want to do is to find how to translate a set of 2D coordinates to best fit into an other, larger, set of 2D coordinates. Imagine for example having a Polaroid photo of a small piece of the starry sky with you out at night, and you want to hold it up in a position so they match the stars' current positions.
Here is how to generate data similar to my real problem:
# create reference points (the "starry sky")
set.seed(99)
ref_coords = data.frame(x = runif(50,0,100), y = runif(50,0,100))
# generate points take subset of coordinates to serve as points we
# are looking for ("the Polaroid")
my_coords_final = ref_coords[c(5,12,15,24,31,34,48,49),]
# add a little bit of variation as compared to reference points
# (data should very similar, but have a little bit of noise)
set.seed(100)
my_coords_final$x = my_coords_final$x+rnorm(8,0,.1)
set.seed(101)
my_coords_final$y = my_coords_final$y+rnorm(8,0,.1)
# create "start values" by, e.g., translating the points we are
# looking for to start at (0,0)
my_coords_start =apply(my_coords_final,2,function(x) x-min(x))
# Plot of example data, goal is to find the dotted vector that
# corresponds to the translation needed
plot(ref_coords, cex = 1.2) # "Starry sky"
points(my_coords_start,pch=20, col = "red") # start position of "Polaroid"
points(my_coords_final,pch=20, col = "blue") # corrected position of "Polaroid"
segments(my_coords_start[1,1],my_coords_start[1,2],
my_coords_final[1,1],my_coords_final[1,2],lty="dotted")
Plotting the data as above should yield:
The result I want is basically what the dotted line in the plot above represents, i.e. a delta in x and y that I could apply to the start coordinates to move them to their correct position in the reference grid.
Details about the real data
There should be close to no rotational or scaling difference between my points and the reference points.
My real data is around 1000 reference points and up to a few hundred points to search (could use less if more efficient)
I expect to have to search about 10 to 20 sets of reference points to find my match, as many of the reference sets will not contain my points.
Thank you for your time, I'd really appreciate any input!
EDIT: To clarify, the right plot represent the reference data. The left plot represents the points that I want to translate across the reference data in order to find a position where they best match the reference. That position, in this case, is represented by the blue dots in the previous figure.
Finally, any working strategy must not use the data in my_coords_final, but rather reproduce that set of coordinates starting from my_coords_start using ref_coords.
So, the previous approach I posted (see edit history) using optim() to minimize the sum of distances between points will only work in the limited circumstance where the point distribution used as reference data is in the middle of the point field. The solution that satisfies the question and seems to still be workable for a few thousand points, would be a brute-force delta and comparison algorithm that calculates the differences between each point in the field against a single point of the reference data and then determines how many of the rest of the reference data are within a minimum threshold (which is needed to account for the noise in the data):
## A brute-force approach where min_dist can be used to
## ameliorate some random noise:
min_dist <- 5
win_thresh <- 0
win_thresh_old <- 0
for(i in 1:nrow(ref_coords)) {
x2 <- my_coords_start[,1]
y2 <- my_coords_start[,2]
x1 <- ref_coords[,1] + (x2[1] - ref_coords[i,1])
y1 <- ref_coords[,2] + (y2[1] - ref_coords[i,2])
## Calculate all pairwise distances between reference and field data:
dists <- dist( cbind( c(x1, x2), c(y1, y2) ), "euclidean")
## Only take distances for the sampled data:
dists <- as.matrix(dists)[-1*1:length(x1),]
## Calculate the number of distances within the minimum
## distance threshold minus the diagonal portion:
win_thresh <- sum(rowSums(dists < min_dist) > 1)
## If we have more "matches" than our best then calculate a new
## dx and dy:
if (win_thresh > win_thresh_old) {
win_thresh_old <- win_thresh
dx <- (x2[1] - ref_coords[i,1])
dy <- (y2[1] - ref_coords[i,2])
}
}
## Plot estimated correction (your delta x and delta y) calculated
## from the brute force calculation of shifts:
points(
x=ref_coords[,1] + dx,
y=ref_coords[,2] + dy,
cex=1.5, col = "red"
)
I'm very interested to know if there's anyone that solves this in a more efficient manner for the number of points in the test data, possibly using a statistical or optimization algorithm.
Here is my problem.
I have an hypercube I built using the following codes:
X <- seq (-1/sqrt(2),1/sqrt(2),length.out=100)
Y <- seq (-sqrt(2)/(2*sqrt(3)),sqrt(2)/sqrt(3),length.out=100)
Z <- seq (-1/(2*sqrt(3)),sqrt(3)/2,length.out=100)
grid <- data.frame (expand.grid(X=X,Y=Y,Z=Z))
Then, I would delete from the grid data.frame all the points that are not located within the tetrahedron defined by the following coordinates:
w : (0,0,sqrt(3)/2)
x : (0,sqrt(2)/sqrt(3),-1/(2*sqrt(3)))
y : (-1/sqrt(2),-sqrt(2)/(2*sqrt(3)),-1/(2*sqrt(3)))
z : (1/sqrt(2),-sqrt(2)/(2*sqrt(3)),-1/(2*sqrt(3)))
I do not find a away to do this without howfully long codes. Can anyone help me please
Thanks !!!
Package ptinpoly has a function pip3d to find wether a point is in a polyhedron or not.
library(ptinpoly)
X <- seq(-1/sqrt(2),1/sqrt(2),length.out=10) #I used a smaller dataset here
Y <- seq(-sqrt(2)/(2*sqrt(3)),sqrt(2)/sqrt(3),length.out=10)
Z <- seq(-1/(2*sqrt(3)),sqrt(3)/2,length.out=10)
# The query points has to be inputted as a matrix.
grid <- as.matrix(expand.grid(X=X,Y=Y,Z=Z))
w <- c(0,0,sqrt(3)/2)
x <- c(0,sqrt(2)/sqrt(3),-1/(2*sqrt(3)))
y <- c(-1/sqrt(2),-sqrt(2)/(2*sqrt(3)),-1/(2*sqrt(3)))
z <- c(1/sqrt(2),-sqrt(2)/(2*sqrt(3)),-1/(2*sqrt(3)))
# The matrix of vertices
tetra_vert <- matrix(c(w,x,y,z),byrow=TRUE,nrow=4)
# The matrix of faces (each row correspond to a vector of vertices linked by a face.
tetra_faces <- matrix(c(1,2,3,
1,2,4,
1,3,4,
2,3,4),byrow=TRUE,nrow=4)
inout <- pip3d(tetra_vert, tetra_faces, grid)
The result is a vector of integers, 0 means the point fall on a face, 1 that it is inside the polyhedron, -1 outside.
The solution of your problem is therefore:
grid[inout%in%c(0,1),]
make planes which form the tetrahedron and compare if a point is on the right side of each of the planes.
pointers: think of calculating dot products with the plane normal and such. One option is to draw a vector from tetrahedron point to each corner, 4 in total and 1 vector from point to point and then use dotproducts and whatnot to see if the point-point vector is within the 4 others.
the point is probably within the tetrahedron if vector to it can be expressed as a sum of non negative multiples of the corner vectors and the vector short enough.
I have a set of points (with unknow coordinates) and the distance matrix. I need to find the coordinates of these points in order to plot them and show the solution of my algorithm.
I can set one of these points in the coordinate (0,0) to simpify, and find the others. Can anyone tell me if it's possible to find the coordinates of the other points, and if yes, how?
Thanks in advance!
EDIT
Forgot to say that I need the coordinates on x-y only
The answers based on angles are cumbersome to implement and can't be easily generalized to data in higher dimensions. A better approach is that mentioned in my and WimC's answers here: given the distance matrix D(i, j), define
M(i, j) = 0.5*(D(1, j)^2 + D(i, 1)^2 - D(i, j)^2)
which should be a positive semi-definite matrix with rank equal to the minimal Euclidean dimension k in which the points can be embedded. The coordinates of the points can then be obtained from the k eigenvectors v(i) of M corresponding to non-zero eigenvalues q(i): place the vectors sqrt(q(i))*v(i) as columns in an n x k matrix X; then each row of X is a point. In other words, sqrt(q(i))*v(i) gives the ith component of all of the points.
The eigenvalues and eigenvectors of a matrix can be obtained easily in most programming languages (e.g., using GSL in C/C++, using the built-in function eig in Matlab, using Numpy in Python, etc.)
Note that this particular method always places the first point at the origin, but any rotation, reflection, or translation of the points will also satisfy the original distance matrix.
Step 1, arbitrarily assign one point P1 as (0,0).
Step 2, arbitrarily assign one point P2 along the positive x axis. (0, Dp1p2)
Step 3, find a point P3 such that
Dp1p2 ~= Dp1p3+Dp2p3
Dp1p3 ~= Dp1p2+Dp2p3
Dp2p3 ~= Dp1p3+Dp1p2
and set that point in the "positive" y domain (if it meets any of these criteria, the point should be placed on the P1P2 axis).
Use the cosine law to determine the distance:
cos (A) = (Dp1p2^2 + Dp1p3^2 - Dp2p3^2)/(2*Dp1p2* Dp1p3)
P3 = (Dp1p3 * cos (A), Dp1p3 * sin(A))
You have now successfully built an orthonormal space and placed three points in that space.
Step 4: To determine all the other points, repeat step 3, to give you a tentative y coordinate.
(Xn, Yn).
Compare the distance {(Xn, Yn), (X3, Y3)} to Dp3pn in your matrix. If it is identical, you have successfully identified the coordinate for point n. Otherwise, the point n is at (Xn, -Yn).
Note there is an alternative to step 4, but it is too much math for a Saturday afternoon
If for points p, q, and r you have pq, qr, and rp in your matrix, you have a triangle.
Wherever you have a triangle in your matrix you can compute one of two solutions for that triangle (independent of a euclidean transform of the triangle on the plane). That is, for each triangle you compute, it's mirror image is also a triangle that satisfies the distance constraints on p, q, and r. The fact that there are two solutions even for a triangle leads to the chirality problem: You have to choose the chirality (orientation) of each triangle, and not all choices may lead to a feasible solution to the problem.
Nevertheless, I have some suggestions. If the number entries is small, consider using simulated annealing. You could incorporate chirality into the annealing step. This will be slow for large systems, and it may not converge to a perfect solution, but for some problems it's the best you and do.
The second suggestion will not give you a perfect solution, but it will distribute the error: the method of least squares. In your case the objective function will be the error between the distances in your matrix, and actual distances between your points.
This is a math problem. To derive coordinate matrix X only given by its distance matrix.
However there is an efficient solution to this -- Multidimensional Scaling, that do some linear algebra. Simply put, it requires a pairwise Euclidean distance matrix D, and the output is the estimated coordinate Y (perhaps rotated), which is a proximation to X. For programming reason, just use SciKit.manifold.MDS in Python.
The "eigenvector" method given by the favourite replies above is very general and automatically outputs a set of coordinates as the OP requested, however I noticed that that algorithm does not even ask for a desired orientation (rotation angle) for the frame of the output points, the algorithm chooses that orientation all by itself!
People who use it might want to know at what angle the frame will be tipped before hand so I found an equation which gives the answer for the case of up to three input points, however I have not had time to generalize it to n-points and hope someone will do that and add it to this discussion. Here are the three angles the output sides will form with the x-axis as a function of the input side lengths:
angle side a = arcsin(sqrt(((c+b+a)*(c+b-a)*(c-b+a)*(-c+b+a)*(c^2-b^2)^2)/(a^4*((c^2+b^2-a^2)^2+(c^2-b^2)^2))))*180/Pi/2
angle side b = arcsin(sqrt(((c+b+a)*(c+b-a)*(c-b+a)*(-c+b+a)*(c^2+b^2-a^2)^2)/(4*b^4*((c^2+b^2-a^2)^2+(c^2-b^2)^2))))*180/Pi/2
angle side c = arcsin(sqrt(((c+b+a)*(c+b-a)*(c-b+a)*(-c+b+a)*(c^2+b^2-a^2)^2)/(4*c^4*((c^2+b^2-a^2)^2+(c^2-b^2)^2))))*180/Pi/2
Those equations also lead directly to a solution to the OP's problem of finding the coordinates for each point because: the side lengths are already given from the OP as the input, and my equations give the slope of each side versus the x-axis of the solution, thus revealing the vector for each side of the polygon answer, and summing those sides through vector addition up to a desired vertex will produce the coordinate of that vertex. So if anyone can extend my angle equations to handling beyond three input lengths (but I note: that might be impossible?), it might be a very fast way to the general solution of the OP's question, since slow parts of the algorithms that people gave above like "least square fitting" or "matrix equation solving" might be avoidable.