sf::st_intersection: virtually random error action - r

I have a relatively simple task to accomplish in R: I have two polygon layers, a "patches" layer containing 39 focal polygons and a "landcover" layer containing one multipolygon of the focal landcover type. I need to clip those two layers so that I keep only the extent of the focal landcover type within the focal polygons. Sometimes this works fine with sf::st_intersection, sometimes it works using sf::st_difference and a "negative" landcover layer (containing the extent of all non-focal landcover types), and sometimes neither approach works. At first, I thought that these different behaviors depended on the topological complexity of the result, but this does not seem to be the case.
The errors I get are of the form
Error in CPL_geos_op2(op, st_geometry(x), st_geometry(y)) :
Evaluation error: TopologyException: Input geom 1 is invalid: Ring Self-intersection at or near point
4372482.6526834015 5297568.4303682083 at 4372482.6526834015 5297568.4303682083.
so I checked the land cover polygon and each of the focal polygons using sf::st_is_simple(patch), which in all cases yielded TRUE.
Consider these three cases:
The "simple" case, where sf::st_intersection works. An example (the patch in blue, the land cover in green):
sf::st_intersection(focal_patch, focal_landcover)
The "intermediate" case, where sf::st_intersection does not work but sf::st_difference can be used as a workaround when the focal landcover is replaced by the non-focal landcover. An example (the patch in blue, the non-focal land cover in red):
sf::st_difference(patch, non_focal_landcover)
The "difficult" case, where neither keeping the focal land cover type (green) using sf::st_intersection nor excluding the non-focal land cover type (red) using sf::st_difference works. I get similar errors for both approaches.
I was unable to make a reproducible example, so I hope that it is possible to figure out what happens here from the example images. I could not see any pattern in them, so perhaps only someone with deep insight into st_intersection and st_difference can indicate a solution to this...

The error you are describing is not random; a ring self-intersection means the geometry is invalid. You should be able to test for it via sf::st_is_valid(). Note that sf::st_is_simple() tests a different property, so a TRUE result there does not guarantee a valid geometry.
This error is known to occur with spatial objects that originate from ESRI products, which use slightly different validity criteria than the OGC standard.
To overcome the issue you have several options:
filter the offending geometries out (by subsetting your spatial object on the result of sf::st_is_valid(), leaving only valid geometries in place)
try to correct the geometries via sf::st_make_valid() - note that this may alter the geometry, and may require installing the {lwgeom} package
apply the "magic dust" of a zero-width buffer to your invalid spatial object via sf::st_buffer(your_object, 0). This hack forces the creation of a new geometry, possibly overcoming the errors in the original one.
For more information consider the sf package documentation: https://r-spatial.github.io/sf/reference/valid.html
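The three options above can be sketched roughly as follows. This is a minimal illustration on made-up data, not the asker's layers: the `patches` object, its `id` column and the self-crossing "bow-tie" polygon are all hypothetical, and it assumes an sf installation where st_make_valid() is available.

```r
library(sf)

# Hypothetical data: one valid square and one self-intersecting "bow-tie"
square <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
bowtie <- st_polygon(list(rbind(c(0, 0), c(2, 2), c(2, 0), c(0, 2), c(0, 0))))
patches <- st_sf(id = c("ok", "broken"), geometry = st_sfc(square, bowtie))

valid <- st_is_valid(patches)          # logical vector, one entry per feature
st_is_valid(patches, reason = TRUE)    # textual explanation of each problem

# Option 1: keep only the valid features
patches_valid <- patches[valid, ]

# Option 2: repair the geometries instead of dropping them
patches_fixed <- st_make_valid(patches)

# Option 3: zero-width buffer, which rebuilds each geometry
patches_buffered <- st_buffer(patches, 0)
```

Whichever option is used, it is probably worth comparing areas before and after, since both the repair and the zero-width buffer may change the shape (the buffer in particular may drop part of a self-crossing ring).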

Related

Road Length within Polygons in R

I have a shapefile of a road network and another shapefile containing area boundaries. Is there any better code that I can use to get the length of the roads that lie inside each polygon?
This Question was asked earlier with the difference that I want to use R instead of QGIS.
I tried:
intersec=intersect(roads,Polygon)
road_length=tapply(intersec$length, intersec$polygon, sum)
This works, but the problem is that the intersection does not split roads that cross two polygons. Instead, it duplicates them in the intersec object and assigns the full length of those roads to both polygons.
How I found out about the problem: there is no error message, but the following check tells me that something is wrong:
a=sum(roads$length) and b=sum(intersec$length)
a and b are not equal: a is smaller than b.
I actually did this for a project about 8 months ago.
I had been getting into the sf way of dealing with spatial data, and so my solution uses Classes, Methods, and functions from that package.
First, I made sure both my roads and shapes had the same coordinate-reference-system (CRS) by using sf::st_transform on one of them. Then I used sf::st_intersection() to find the intersections, and used sf::st_length() on the result to get the lengths. You may need to aggregate the lengths at this point, depending on whether your roads were combined into one super-multi-line or if each road is its own object. The following gives the gist of what I think ought to work:
library(dplyr) # provides the %>% pipe used below
sf::st_intersection(road, shape) %>% # Find the intersections, which should all be points or multilines
  dplyr::mutate(len_m = sf::st_length(geom)) %>% # Find the length of each line; `geom` must match the name of your geometry column
  dplyr::group_by(SHAPE_COLUMNS) %>% # Here you need to insert all the columns from your shapes
  dplyr::summarize(len_m = sum(len_m)) # Total road length per shape
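To see that this approach splits a road at polygon boundaries rather than counting it twice, here is a small sketch on made-up data. The two adjacent unit squares, the single road and the column names `id` and `road` are assumptions for illustration only; no CRS is set, so lengths are in plain coordinate units.

```r
library(sf)

# Hypothetical polygons: two unit squares sharing the edge x = 1
polys <- st_sf(
  id = c("A", "B"),
  geometry = st_sfc(
    st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))),
    st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0))))
  )
)

# Hypothetical road of length 1 that crosses from square A into square B
roads <- st_sf(
  road = "r1",
  geometry = st_sfc(st_linestring(rbind(c(0.5, 0.5), c(1.5, 0.5))))
)

# st_intersection() cuts the road at the polygon boundary: one piece per polygon
pieces <- st_intersection(roads, polys)
pieces$len <- as.numeric(st_length(pieces))

# Sum the piece lengths per polygon
per_poly <- tapply(pieces$len, pieces$id, sum)
```

The check from the question can then be repeated: sum(pieces$len) should match as.numeric(st_length(roads)), because each part of the road is assigned to only one polygon.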

How to use the function r.cost to get the least-cost path between two polygons?

I am a beginner in GRASS, but I would like to get the least-cost path between two polygons. More exactly, I would like to get the smallest cost from any point on the edge of one polygon (polygon A) to any point on the edge of another polygon (polygon B).
Until now, I used the CostDistance and CostPath functions of ArcGIS with a cost raster where each cell had a cost value, a shapefile for the first polygon, and a shapefile for the second polygon. I would like to do the same thing with GRASS. I think that the function r.cost can do this, but I don't know how to specify the two polygons as parameters in GRASS.
Have you got an example of how to use r.cost with two polygons in R with package spgrass6?
Thanks very much for your help.
If the use of GRASS is not mandatory and sticking with R is sufficient, you should check the marmap package. Section 2.4 of the vignette (vignette("marmap")) is entitled:
2.4 Using bathymetric data for least-cost path analysis
The marmap package allows for computation of least-cost paths constrained within a range of depth/altitude between any number of points. The two key functions here are trans.mat(), which creates a transition matrix similar to the cost raster you mention, and lc.dist(), which computes the least-cost distance and lets you plot the path between points.
Detailed examples are provided in the marmap vignette.

Detecting floor under an object in PCL

I'm very new to PCL.
I am trying to detect the floor under an object to check whether the object has toppled over or is positioned horizontally.
I've checked API and found the method: pcl::PointCloud< T >::at.
It seems I could get the Z value of a point using at. Is that correct?
If yes, I'm confused about how it should work. Mathematically, a point is infinitely small. In my scans, the point density gets smaller the more distant the points are in the Z direction.
Will at always return a point? Is the value the mean of nearest physical points?
As referenced in the documentation, pcl::PointCloud< T >::at returns the information of a single point (the coordinates plus other data depending on the point format) given column and row information (roughly the X,Y in the depth image). For this reason, this method only works on organized clouds.
Unfortunately, not every point is a valid point. Unless you filter the point cloud, you could find invalid measurements (points which have NaN components). This is pretty normal, just discard those points using a filter. Your intuition is right, the point density is smaller the further away you go from the sensor.
As for what you're trying to achieve, you should take a look at the planar segmentation tutorial on the PCL website and at the Table Object Detector software by Nicolas Burrus. The latter extracts a plane, and the clusters of objects on top of it.

What's the point in QPainter::drawConvexPolygon

From the docs:
QPainter offers two methods of painting QPolygons: drawPolygon and drawConvexPolygon.
Nowhere in the documentation is it made clear what the difference between them is. Additionally, the drawConvexPolygon docs state
If the supplied polygon is not convex, i.e. it contains at least one angle larger than 180 degrees, the results are undefined.
So... what is it for? I hoped the method would somehow find the convex hull of my polygon and paint that, however that doesn't seem to be the case.
The QPainter::drawConvexPolygon() documentation says:
On some platforms (e.g. X11), the drawConvexPolygon() function can be faster than the drawPolygon() function.
So,
drawPolygon() is more generic, as it can also paint non-convex polygons (but drawing might be slower)
drawConvexPolygon() can only be used to draw convex polygons, but might be faster on specific platforms
For example, when doing 3D-rendering, you can use a Polygon Mesh which consists of convex polygons only to make rendering simpler, in which case the faster drawConvexPolygon() would be the better choice (since you need to paint a large number of convex polygons).
Determining which part of the polygon is the inside and which is the outside (for filling purposes) works differently depending on whether the polygon is convex. Think about how to determine the inside of a star shape vs. the inside of a rectangle.

Cluster assignments differ sometimes in two DBSCAN implementations

I have implemented the DBSCAN algorithm in R, and I am comparing its cluster assignments with those of the DBSCAN implementation in the fpc library. Testing is done on synthetic data generated as in the fpc library's dbscan example:
n <- 600
x <- cbind(runif(10, 0, 10)+rnorm(n, sd=0.2), runif(10, 0, 10)+rnorm(n, sd=0.3))
Clustering is done with parameters as below:
eps = 0.2
MinPts = 5
I am comparing the cluster assignments of fpc::dbscan with those of my implementation of DBSCAN. In most runs, every point was classified identically by both implementations.
But there are some cases where 1 or 2 points, and on rare occasions 5 or 6 points, are assigned to different clusters in my implementation than in the fpc implementation. I have noticed that only the classification of border points differs. After plotting, I have seen that the points whose cluster membership does not match are positioned such that they could be assigned to any of the surrounding clusters, depending on which cluster's seed point they were discovered from first.
I am showing an image with 150 points (to avoid clutter), where the classification of 1 point differs. Note that the cluster number of the mismatched point is always greater in my implementation than in the fpc implementation.
Plot of clusters.
Top inset is fpc::dbscan, bottom inset is my dbscan implementation
Note: the point which differs in my implementation is marked with an exclamation mark (!)
I am also uploading zoomed images of the mismatch section:
My dbscan implementation output
+ are core points
o are border points
- are noise points
! highlights the differing point
fpc::dbscan implementation output
triangles are core points
coloured circles are border points
black circles are noise points
Another example:
My dbscan implementation output
fpc::dbscan implementation output
EDIT
Equal x-y scaled example
As requested by Anony-Mousse
In different cases, sometimes my implementation seems to have classified the mismatched point correctly, and sometimes the fpc implementation seems to have done so. See below:
fpc::dbscan (with the triangle plot ones) seems to have classified the mismatch point correctly
my dbscan implementation (with + plot ones) seems to have classified the mismatch point correctly
Question
I am new to cluster analysis, so I have another question: is this type of difference acceptable?
In my implementation, I scan from the first point to the last point in the order supplied, and fpc::dbscan scans the points in the same order. In that case, both implementations should have discovered the mismatched point (marked by !) from the same cluster center. I have also generated some cases in which fpc::dbscan marks a point as noise, but my implementation assigns it to a cluster. Why does this difference occur?
Code segments on request.
DBSCAN is known to be order dependent for border points. They will be assigned to the cluster they are first discovered from. If a border point is not dense itself, but lies in the vicinity of two dense points from different clusters, it can be assigned to either.
This is why DBSCAN is often described as "order independent, except for border points".
Try shuffling the data (or reversing!), then rerunning your algorithm. The results should change.
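The reversing experiment can be sketched in base R. This is a deliberately minimal one-dimensional DBSCAN written only for illustration; it is neither your implementation nor the fpc one. The function name `dbscan_simple`, the toy data and the parameters are all made up, and it uses the "keep the first assignment" rule for border points.

```r
# Minimal 1-D DBSCAN sketch (hypothetical helper, not fpc::dbscan).
# Labels: 0 = noise, 1, 2, ... = cluster ids in order of discovery.
dbscan_simple <- function(x, eps, min_pts) {
  n <- length(x)
  label <- rep(NA_integer_, n)            # NA = not visited yet
  cluster <- 0L
  for (i in seq_len(n)) {
    if (!is.na(label[i])) next
    nb <- which(abs(x - x[i]) <= eps)     # neighbourhood includes the point itself
    if (length(nb) < min_pts) { label[i] <- 0L; next }
    cluster <- cluster + 1L
    label[i] <- cluster
    queue <- setdiff(nb, i)
    while (length(queue) > 0) {
      j <- queue[1]; queue <- queue[-1]
      if (!is.na(label[j]) && label[j] == 0L) label[j] <- cluster  # noise becomes a border point
      if (!is.na(label[j])) next          # already assigned: the first assignment is kept
      label[j] <- cluster
      nb_j <- which(abs(x - x[j]) <= eps)
      if (length(nb_j) >= min_pts) queue <- union(queue, nb_j)     # expand from core points only
    }
  }
  label
}

# Two dense groups; the point at 20 is within eps of exactly one core point of each group
x <- c(0, 3, 6, 10, 20, 30, 34, 37, 40)

forward  <- dbscan_simple(x, eps = 10, min_pts = 4)
backward <- rev(dbscan_simple(rev(x), eps = 10, min_pts = 4))  # mapped back to the original order

forward   # 1 1 1 1 1 2 2 2 2: the point at 20 joins the group around 0..10
backward  # 2 2 2 2 1 1 1 1 1: the point at 20 joins the group around 30..40
```

The cluster numbers themselves differ between the two runs, so compare memberships rather than ids: only the border point at 20 changes groups, while every core point keeps the same companions.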
As I assume neither your implementation nor the fpc one has index support (to speed up range queries and make the algorithm run in O(n log n)), I'd guess that one of the implementations processes the points in forward order and the other in backward order. Update: indexes should not play much of a role, as they don't change the order across clusters, only within one cluster.
Another option for "generating" this difference is to
keep the first (non-noise) cluster assignment of each point (IIRC official DBSCAN pseudocode)
keep the last cluster assignment of each point (fpc::dbscan seems to do this)
These will also generate different results on objects that are border points to more than one cluster. There is also the possibility of assigning these points to both clusters, which will yield a non-strict partitioning of the data set. Usually, the benefits of having a strict partitioning are more important than having a fully deterministic result.
Don't get me wrong: the "overwrite" strategy of fpc::dbscan doesn't substantially change the results. I would probably even implement it that way myself.
Are any non-border points affected?
