Cluster by distance between a lot of points in R

I need to create clusters from the distances between clients: if the distance between two points is less than a certain threshold, group them together.
I thought about using Delaunay triangulation, but I'm not having success.

Related

Finding the nearest zipcode from a list of zipcodes

I have a list of locations with zipcodes. I have another list of distribution centers (DCs) that serve these locations. Is there any way to map the nearest DC to each of these locations? I am an extremely green coder, but I have some experience with R.
I'd need more information to give you code that solves your problem; however, here is one approach.
Convert your zipcodes to longitudes and latitudes.
Not sure what location data you have on your distribution centers, but you should be able to find a way to retrieve the long/lat of each of them.
For each zipcode, compute the distance to each DC (using their respective longs/lats). To compute the distance, use the haversine formula. Find the minimum of these distances. This is your solution.
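A hedged sketch of those steps, using a hand-rolled haversine so the snippet stays self-contained (the data frames, column names, and coordinates below are invented stand-ins for the asker's real tables):

```r
# Sketch only: `locations` and `dcs` are made-up stand-ins for the
# asker's zipcode and distribution-center tables.
haversine_km <- function(lon1, lat1, lon2, lat2) {
  r <- 6371                      # mean Earth radius in km
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(pmin(1, a)))
}

locations <- data.frame(zip = c("10001", "60601"),
                        lon = c(-73.997, -87.622),
                        lat = c(40.750, 41.886))
dcs <- data.frame(dc  = c("NJ", "IL"),
                  lon = c(-74.17, -87.90),
                  lat = c(40.69, 41.98))

# For each location, compute the distance to every DC and keep the closest.
locations$nearest_dc <- sapply(seq_len(nrow(locations)), function(i) {
  d <- haversine_km(locations$lon[i], locations$lat[i], dcs$lon, dcs$lat)
  dcs$dc[which.min(d)]
})
```

With real zipcodes you would first geocode them to long/lat (step 1); the loop itself is unchanged.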

Merge neighbouring areas together, if tm_fill for one area is too small

I have made a map with the zip codes of a town. The fill is the ratio of cases to non-cases. But some zip codes have very few observations in total, so outliers distort the map.
Is there a way to merge the polygons and data of two neighbouring areas based on their n automatically?
And if that is not possible, how can I merge rows of my sf/df without losing ID?
And I guess simplest would be just to set the zip codes to NA.
Depends on what you mean by "automatically". Here's a simple algorithm.
repeat:
Find the region with the smallest population.
If that's more than your threshold, stop.
Find that region's neighbours and pick one (at random, or the one with the smallest population).
Merge that neighbour with that region.
Finding neighbours and merging can all be done with either the sf package or the sp package and friends (like spdep and rgeos).
Equally, this can be considered a clustering algorithm using a distance metric based on adjacency. You could do a full hierarchical clustering and then cut the tree at a point such that all clusters had N>threshold.
Now, whether this is a good idea statistically is another question, and depends on what your goal is. If you are worried about whether an underlying risk exceeds, say, 0.5, and you are getting positives "by chance" because you have a small sample (a population of 3 with 2 positives), then you need to model your data and work out the probability of exceeding 0.5 given the data. Then map that probability, which will take the small sample size into account.
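The merge loop above can be sketched on a toy adjacency matrix (the regions, populations, and threshold here are invented; with real data you would build the adjacency from something like sf::st_touches() and dissolve the geometries as you merge):

```r
# Toy data: four regions in a ring (a-b, b-c, c-d, d-a), two of them tiny.
pop <- c(a = 3, b = 50, c = 40, d = 5)
adj <- matrix(c(0, 1, 0, 1,
                1, 0, 1, 0,
                0, 1, 0, 1,
                1, 0, 1, 0),
              4, 4, dimnames = list(names(pop), names(pop)))
threshold <- 10
member <- setNames(names(pop), names(pop))  # original region -> cluster

repeat {
  i <- which.min(pop)
  if (pop[i] >= threshold) break            # everything big enough: stop
  nbrs <- setdiff(which(adj[i, ] == 1), i)
  j <- nbrs[which.min(pop[nbrs])]           # smallest-population neighbour
  member[member == names(pop)[i]] <- names(pop)[j]
  pop[j] <- pop[j] + pop[i]                 # fold region i into region j
  adj[j, ] <- pmax(adj[j, ], adj[i, ])      # j inherits i's neighbours
  adj[, j] <- pmax(adj[, j], adj[, i])
  diag(adj) <- 0
  pop <- pop[-i]
  adj <- adj[-i, -i, drop = FALSE]
}
# member now maps every original region to its merged cluster,
# and every remaining cluster population is >= threshold.
```

Here regions a and d get absorbed into c, leaving clusters b (50) and c (48).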

How to compute the propinquity of a graph in R?

I would like to test whether the geographical location of my vertices (i.e. I have a matrix giving the distance between each pair of actors in my network) has an influence on the presence/absence of edges. If I have understood correctly, this feature is called propinquity.
In other words, I would like to know if two vertices are more likely (or less likely) to be connected if their distance is small.
Do you have any idea of how to do that in R? I usually use igraph but if another library does that I will use it of course :-).
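One base-R way to probe this (a sketch with simulated data, not an established igraph routine): treat each dyad as an observation and regress edge presence on distance with logistic regression. For proper inference on network data you would want a QAP permutation test or an ERGM with an edge covariate, since dyads are not independent observations.

```r
# Simulated example: actors placed in the unit square, edges more likely
# between nearby actors, then a logistic regression over the dyads.
set.seed(1)
n <- 30
xy <- matrix(runif(2 * n), ncol = 2)
dist_mat <- as.matrix(dist(xy))      # stand-in for the real distance matrix

p <- exp(-6 * dist_mat)              # closer pairs connect more often
adj <- matrix(0, n, n)
adj[upper.tri(adj)] <- runif(sum(upper.tri(adj))) < p[upper.tri(p)]

ut <- upper.tri(dist_mat)
fit <- glm(adj[ut] ~ dist_mat[ut], family = binomial)
coef(fit)[2]                         # negative => propinquity effect
```

With an observed igraph object, `adj` would come from as_adjacency_matrix(); the regression step is the same.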

How to compute Voronoi tessellation based on Manhattan distance in R

I am trying to compute a Voronoi tessellation in 2D with the Manhattan distance in R.
Ideally this would be a function that takes a set of two-dimensional points and outputs a list of polygons that partition the space. I am not certain what representations of Voronoi tessellations are standard.
There are of course many ways to do this with the Euclidean metric (packages like deldir and qhull make this very easy), but I haven't found a way to do it for the Manhattan distance. A search using sos's findFn('voronoi') also yielded no results.
Info: taxicabgeometry.net
Interactive: Manhattan-metric Voronoi diagram (Click version)
I've been rolling my own in python, and can sum up the basics here:
Between neighbouring centroids lies a perpendicular bisector which, in the Manhattan metric, is most likely two rays joined by a 45-degree diagonal (if the centroids are randomly generated), though a straight horizontal, vertical, or 45-degree line may also occur. Given the set of such bisectors for every centroid pair, the edges separating the regions are among them. Collect the intersection points of each pair of bisectors which are equidistant (within an epsilon), in the Manhattan metric, to their three nearest centroids. Also collect the two midpoints of each 45-degree diagonal which are similarly equidistant to their nearest two centroids. The outer polygons won't be closed; how to deal with them depends on what you need. The polygon borders and vertices will need sorting so your polygons aren't a zigzagged mess, and the winding order can be fixed if they need to be clockwise. More can be done; it just depends on what you need.
Unfortunately, this slows down dramatically as more points are involved: intersecting every bisector with every other bisector is the bottleneck. I've been attempting an insertion method, with some success. Now I'm thinking of first creating a nearest-neighbour linkage between the centroids; if the neighbours are known, the bisectors to intersect will be minimal, and many centroids can be processed quickly.
Anyway, the brute-force approach does work:
The point near the cursor is actually two points of a tiny diagonal. It's a precise method, but more complicated than it first seems. The Java code from the interactive link above may be faster, but it was difficult to get solid, precise geometry out of it.
Sorry, I don't know R.
Maybe the question is about finding the maximum-area square that fits inside a circumcircle (of a triangle). The equation for such a square is abs(x) + abs(y) = r (www.mathematische-basteleien.de/taxicabgeometry.htm). When you have a mesh of triangles, the Voronoi diagram is its dual.
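In R, a much simpler (if approximate) brute-force route than the geometric construction above is to rasterize the plane and label each grid cell with its Manhattan-nearest seed; the seeds and grid resolution below are arbitrary:

```r
# Label every cell of a 200 x 200 grid with the index of the seed that
# is nearest in the Manhattan (L1) metric; the labelled regions form
# the taxicab Voronoi diagram.
set.seed(42)
seeds <- data.frame(x = runif(8), y = runif(8))
gx <- seq(0, 1, length.out = 200)
gy <- seq(0, 1, length.out = 200)
grid <- expand.grid(x = gx, y = gy)

# 40000 x 8 matrix of L1 distances from every cell to every seed.
d <- outer(grid$x, seeds$x, function(a, b) abs(a - b)) +
     outer(grid$y, seeds$y, function(a, b) abs(a - b))
region <- max.col(-d)                    # index of the nearest seed per cell

image(gx, gy, matrix(region, 200, 200))  # plot the regions
points(seeds$x, seeds$y, pch = 19)
```

This gives a raster rather than explicit polygons, but it sidesteps all the bisector bookkeeping; polygon outlines could then be traced from the raster if needed.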

Clustering GPS data using DBSCAN but clusters are not meaningful (in terms of size)

I am working with GPS data (latitude, longitude). For density based clustering I have used DBSCAN in R.
Advantages of DBSCAN in my case:
I don't have to predefine numbers of clusters
I can calculate a distance matrix (using the haversine distance formula) and use that as input to dbscan
library(fossil)
dist <- earth.dist(df, dist = TRUE)  # df is the dataset containing lat/long values
library(fpc)
dens <- dbscan(dist, MinPts = 25, eps = 0.43, method = "dist")
Now, when I look at the clusters, they are not meaningful. Some clusters have points which are more than 1km apart. I want dense clusters but not that big in size.
I have tried different values of MinPts and eps, and I have also used the k-nearest-neighbour distance plot to find an optimal value of eps for MinPts = 25.
What dbscan does is visit every point in my dataset: if a point p has at least MinPts points in its eps-neighbourhood, it forms a cluster, but it also joins clusters that are density-reachable (which, I guess, is what's creating the problem for me).
It really is a big question, particularly "how to reduce the size of a cluster without losing too much of its information", but I will break it down into the following points:
How do I remove border points from a cluster? I know which points are in which cluster using dens$cluster, but how would I know whether a particular point is core or border?
Is cluster 0 always noise?
I was under the impression that the size of a cluster would be comparable to eps. But that's not the case, because density-reachable clusters are combined.
Is there any other clustering method which has the advantages of dbscan but can give me more meaningful clusters?
OPTICS is another alternative, but will it solve my issue?
Note: By "meaningful" I mean that closer points should be in a cluster, but points which are 1 km or more apart should not be in the same cluster.
DBSCAN doesn't claim the radius is the maximum cluster size.
Have you read the article? It's looking for arbitrarily shaped clusters; eps is just the core size of a point, roughly the radius used for density estimation; any point within this radius of a core point will be part of the cluster.
This makes it essentially the maximum step size for connecting dense points. But those points may still form a chain of density-connected points, of arbitrary shape or size.
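That chaining behaviour can be demonstrated in a few lines of base R (single linkage cut at eps mimics DBSCAN's density connectivity in the simple case where every point is core):

```r
# Points spaced 0.4 apart chain into one cluster under eps = 0.5, even
# though the cluster ends up spanning 4 units -- far more than eps.
x <- seq(0, 4, by = 0.4)
eps <- 0.5
cl <- cutree(hclust(dist(x), method = "single"), h = eps)
length(unique(cl))       # one cluster: everything is density-connected
diff(range(x[cl == 1]))  # the cluster's extent is 4, much larger than eps
```

Every consecutive gap is below eps, so DBSCAN (and single linkage) has no reason to break the chain; that is exactly why eps caps the step size, not the cluster size.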
I don't know what cluster 0 is in your R implementation. I've experimented with the R implementation, but it was way slower than all the others. I don't recommend using R; there are much better tools for cluster analysis available, such as ELKI. Try running DBSCAN with your settings in ELKI, with LatLngDistanceFunction and a sort-tile-recursive bulk-loaded R-tree index. You'll be surprised how fast it can be compared to R.
OPTICS is looking for the same density connected type of clusters. Are you sure this arbitrarily-shaped type of clusters is what you are looking for?
IMHO, you are using the wrong method for your goals (and you aren't really explaining what you are trying to achieve).
If you want a hard limit on the cluster diameter, use complete-linkage hierarchical clustering.
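A sketch of that suggestion (the coordinates here are simulated stand-ins; in the question, d would be the haversine matrix from earth.dist()): cutting a complete-linkage tree at height h guarantees that no two points in the same cluster are more than h apart, because the merge height in complete linkage equals the merged cluster's diameter.

```r
# Complete linkage with cutree(h = 1) caps every cluster's diameter at 1,
# which is exactly the "no points > 1 km apart" requirement.
set.seed(7)
pts <- matrix(rnorm(100, sd = 0.6), ncol = 2)  # fake coordinates "in km"
d <- dist(pts)                # stand-in for earth.dist(df, dist = TRUE)
hc <- hclust(d, method = "complete")
cl <- cutree(hc, h = 1)      # hard 1 km diameter limit

# Verify: the maximum within-cluster distance never exceeds the cut height.
dm <- as.matrix(d)
max_diam <- max(sapply(unique(cl), function(k) {
  idx <- which(cl == k)
  if (length(idx) < 2) 0 else max(dm[idx, idx])
}))
max_diam  # <= 1 by construction
```

The trade-off versus DBSCAN: you lose the noise concept and the automatic number of clusters, but you gain the hard diameter bound.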
