I'm wanting to find the nearest polygons in a simple features data frame in R to a set of points in another simple features data frame using the sf package in R. I've been using 'st_is_within_distance' in 'st_join' statements, but this returns everything within a given distance, not simply the closest features.
Previously I used 'gDistance' from the 'rgeos' package with 'sp' features like this:
m = gDistance(a, b, byid = TRUE)
row = apply(m, 2, function(x) which(x == min(x)))
labels = unlist(b@data[row, ]$NAME)
a$NAME <- labels
I'm wanting to translate this approach of finding nearest features for a set of points using rgeos and sp to using sf. Any advice or suggestions greatly appreciated.
It looks like the solution to my question was already posted -- https://gis.stackexchange.com/questions/243994/how-to-calculate-distance-from-point-to-linestring-in-r-using-sf-library-and-g -- this approach gets just what I need given an sf point feature 'a' and sf polygon feature 'b':
closest <- list()
for(i in seq_len(nrow(a))){
closest[[i]] <- b[which.min(
st_distance(b, a[i,])),]
}
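For what it's worth, sf also ships st_nearest_feature(), which returns, for each feature in its first argument, the row index of the closest feature in the second. The sketch below is only an illustration under assumptions: the toy objects 'a' and 'b' and the NAME values are made up here, and no CRS is set, so distances are planar.

```r
library(sf)

# Hypothetical toy data: two square polygons and two points (no CRS, planar)
b <- st_sf(
  NAME = c("west", "east"),
  geometry = st_sfc(
    st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))),
    st_polygon(list(rbind(c(5, 0), c(6, 0), c(6, 1), c(5, 1), c(5, 0))))
  )
)
a <- st_sf(
  id = 1:2,
  geometry = st_sfc(st_point(c(2, 0.5)), st_point(c(4.5, 0.5)))
)

# Row index in b of the polygon nearest to each point in a
idx <- st_nearest_feature(a, b)

# Carry the polygon attribute over to the points
a$NAME <- b$NAME[idx]
```

The same thing can be written as a join, st_join(a, b, join = st_nearest_feature), which avoids the explicit loop over rows.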
I have a lat/lon combination and want to check whether the point is inside a polygon (sp::Polygon class)
Consider this example:
UKJ32 <- sp::Polygon(cbind(c(-1.477037449999955, -1.366895449999959, -1.365159449999965, -1.477037449999955),
c(50.923958250000027, 50.94686525000003, 50.880069750000018, 50.923958250000027))) %>%
list() %>%
sp::Polygons(ID="UKJ32 - Southampton")
I would now like to test whether the points in df are in this polygon (and if so, return the Polygon ID).
tibble(lon = c(-1.4, 10), lat = c(50.9, 10))
Can someone tell me how I get to the result
tibble(lon = c(-1.4, 10), lat = c(50.9, 10), polyg_ID = 'UKJ32')
If you wish to stick to sp, there is a point.in.polygon() function in sp package:
UKJ32 <- sp::Polygon(cbind(c(-1.477037449999955, -1.366895449999959, -1.365159449999965, -1.477037449999955),
c(50.923958250000027, 50.94686525000003, 50.880069750000018, 50.923958250000027))) |>
list() |>
sp::Polygons(ID="UKJ32 - Southampton")
a <- tibble::tibble(lon = c(-1.4, 10), lat = c(50.9, 10))
sp::point.in.polygon(a$lon, a$lat, UKJ32@Polygons[[1]]@coords[,1], UKJ32@Polygons[[1]]@coords[,2])
#> [1] 1 0
Created on 2022-10-16 with reprex v2.0.2
The {sp} package is by now somewhat dated - after having lived a long & fruitful life - and most of current action happens in context of its successor, the {sf} package.
Assigning some kind of a polygon feature - either an id or a metric - to a points dataset is a frequent use case. It is at present often done via a sf::st_join() call. For an example in action consider this earlier answer: https://stackoverflow.com/a/64704624/7756889
I suggest that you try to move your workflow to the more current {sf} package; you will find it easier to keep up with recent development.
And even if this were not possible for whatever reason - use sp::Polygons() with utmost caution. It carries no information about coordinate reference system - which is a fancy way of saying it has no way of interpreting the coordinate numbers. Are they decimal degrees, or meters? Could be feet or fathoms for all that I know.
Strictly speaking you should not be allowed to proceed with a point-in-polygon calculation without this information.
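A minimal sketch of the sf::st_join() route for the example above. It assumes the coordinates are WGS84 longitude/latitude (EPSG:4326), which the question never states; the object names and the short 'UKJ32' label are illustrative only.

```r
library(sf)

# Polygon ring from the question (coordinates rounded); CRS 4326 is an assumption
ring <- cbind(
  c(-1.477037, -1.366895, -1.365159, -1.477037),
  c(50.923958, 50.946865, 50.880070, 50.923958)
)
poly <- st_sf(
  polyg_ID = "UKJ32",
  geometry = st_sfc(st_polygon(list(ring)), crs = 4326)
)

# Points from the question, kept as columns and also turned into geometry
pts <- st_as_sf(
  data.frame(lon = c(-1.4, 10), lat = c(50.9, 10)),
  coords = c("lon", "lat"), crs = 4326, remove = FALSE
)

# Left join: points outside every polygon get NA in polyg_ID
res <- st_join(pts, poly)
res$polyg_ID
```

Because the join is a left join by default, every input point is kept, and the polygon id is simply missing where no polygon contains the point.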
Trying to get it done via mapply or something like this without iterations - I have a spatial dataframe in R and would like to subset all more complicated shapes - ie shapes with 10 or more coordinates. The shapefile is substantial (10k shapes) and the method that is fine for a small sample is very slow for a big one. The iterative method is
Street$cc <-0
i <- 1
while(i <= nrow(Street)){
Street$cc[i] <-length(coordinates(Street)[[i]][[1]])/2
i<-i+1
}
How can i get the same effect in any array way? I have a problem with accessing few levels down from the top (Shapefile/lines/Lines/coords)
I tried:
Street$cc <- lapply(slot(Street, "lines"),
function(x) lapply(slot(x, "Lines"),
function(y) length(slot(y, "coords"))/2))
/division by 2 as each coordinate is a pair of 2 values/
but it still returns a list with the number of items per row, not an integer telling me how many items there are. How can I get the number of coordinates per shape in a spatial dataframe? Sorry I do not have a reproducible example but you can check on any spatial file - it is more about accessing a low-level property than a very specific issue.
EDIT:
I resolved the issue - using function
tail()
Here is a reproducible example. Slightly different to yours, because you did not provide data, but the principle is the same. The 'principle' when drilling down into complex S4 structures is to pay attention to whether each level is a list or a slot, using [[]] to access lists, and @ for slots.
First let's get a spatial polygon. I'll use the US state boundaries:
library(maps)
library(maptools)  # provides map2SpatialPolygons()
local.map = map(database = "state", fill = TRUE, plot = FALSE)
IDs = sapply(strsplit(local.map$names, ":"), function(x) x[1])
states = map2SpatialPolygons(map = local.map, ID = IDs)
Now we can subset the polygons with fewer than 200 vertices like this:
# Note: next line assumes that only interested in one Polygon per top level polygon.
# I.e. assumes that we have only single part polygons
# If you need to extend this to work with multipart polygons, it will be
# necessary to also loop over values of lower level Polygons
lengths = sapply(1:length(states), function(i)
NROW(states@polygons[[i]]@Polygons[[1]]@coords))
simple.states = states[which(lengths < 200)]
plot(simple.states)
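For the multipart case mentioned in the comments above, one possible extension is to sum the vertex counts over every lower level Polygon of each top level polygon. This is only a sketch on made-up toy shapes (the names 'sq', 'tri', 'shapes' and the threshold of 5 are invented for illustration), using sp alone:

```r
library(sp)

# Hypothetical toy shapes: a closed square (5 coords) and a closed triangle (4 coords)
sq  <- Polygon(cbind(c(0, 1, 1, 0, 0), c(0, 0, 1, 1, 0)))
tri <- Polygon(cbind(c(2, 3, 2, 2), c(0, 0, 1, 0)))

shapes <- SpatialPolygons(list(
  Polygons(list(sq, tri), ID = "multi"),   # two parts
  Polygons(list(tri), ID = "single")       # one part
))

# For each top level polygon, add up the coordinate rows of all its parts
n_vertices <- vapply(
  shapes@polygons,
  function(p) sum(vapply(p@Polygons, function(part) nrow(part@coords), integer(1))),
  integer(1)
)

# Keep only the more complicated shapes
complex.shapes <- shapes[which(n_vertices >= 5)]
```

vapply() returns a plain integer vector rather than a nested list, which is what the lapply() attempt in the question was missing.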
I followed How do I extract raster values from polygon data then join into spatial data frame? (which was helpful) to create a matrix (then data frame) of mean raster values to a polygon. The problem now is that I want to know which polygon is which. My SpatialPolygonsDataFrame has an ID value in p$Block_ID. Is there a way to bring that over in the extract() code?
Alternatively, does the extract() function report output in the order it was input (that would make sense)? i.e. the order of p$Block_ID will be preserved in the output? I looked through the documentation and it was not clear one way or the other. If so it is easy enough to add an ID column to the extract() output.
Here is my generalized code for reference. NOTE: not reproducible, because I don't think it really needs to be at this point. Here r is a raster and p is the polygons.
extract(r, p, small = TRUE, fun = mean, na.rm = TRUE, df = TRUE, nl = 1)
Thoughts?
The values are returned in order, as one would expect in R, and as stated in the manual (?extract): The order of the returned values corresponds to the order of object y
Thus you can do (reproducible example from ?extract)
e <- extract(r, p)
ee <- data.frame(ID=p$Block_ID, e)
I could not get R. Hijmans's answer working for me. I found that this works.
e = extract(r, p)
e$ID = as.factor(e$ID)
levels(e$ID) = levels(p$Block_ID)
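Another way to attach the ids, sketched here on made-up data, relies on the ID column that extract() adds when df = TRUE: it holds the position of each polygon in p, so it can be used to index p$Block_ID directly. The raster, the two squares and the Block_ID values below are all hypothetical.

```r
library(raster)  # attaches sp as well

# Hypothetical 10 x 10 raster with values 1..100
r <- raster(nrows = 10, ncols = 10, xmn = 0, xmx = 10, ymn = 0, ymx = 10)
values(r) <- 1:100

# Two hypothetical square blocks whose ids are deliberately not in sorted order
p1 <- Polygons(list(Polygon(cbind(c(0, 5, 5, 0, 0), c(0, 0, 5, 5, 0)))), "a")
p2 <- Polygons(list(Polygon(cbind(c(5, 10, 10, 5, 5), c(5, 5, 10, 10, 5)))), "b")
p <- SpatialPolygonsDataFrame(
  SpatialPolygons(list(p1, p2)),
  data.frame(Block_ID = c("B7", "B2"), row.names = c("a", "b"))
)

# One row per polygon; column ID is the polygon's position in p
e <- raster::extract(r, p, fun = mean, na.rm = TRUE, df = TRUE)

# Look the block id up by position instead of relabelling factor levels
e$Block_ID <- p$Block_ID[e$ID]
```

Indexing by position does not depend on how the factor levels of Block_ID happen to sort, which is the situation where relabelling levels could pair values with the wrong block.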
I have one table containing +500k rows with coordinates x, y grouped by shapeid (289 ids in total) and forming a polygon.
shapeid x y
1 679400.3 6600354
1 679367.9 6600348
1 679313.3 6600340
1 679259.5 6600331
1 679087.5 6600201
0 661116.3 6606615
0 661171.5 6606604
0 661182.7 6606605
0 661198.9 6606606
0 661205.9 6606605
... ... ...
I want to find the coordinates which intersects or lies closest to each other, in essence finding the physical neighbours for each shapeid.
The results should look something like:
shapeid shapeid_neighbour1 shapeid_neighbour2
So I tried using sp and rgeos like so:
library(sp)
library(rgeos)
mydata <- read.delim('d:/temp/testfile.txt', header=T, sep=",")
sp.mydata <- mydata
coordinates(sp.mydata) <- ~x+y
When I run class, everything looks fine:
class(sp.mydata)
[1] "SpatialPointsDataFrame"
attr(,"package")
[1] "sp"
I now try calculating the distance by each point:
d <- gDistance(sp.mydata, byid=T)
R Studio encounters fatal error. Any ideas? My plan is then to use:
min.d <- apply(d, 1, function(x) order(x, decreasing=F)[2])
To find the second shortest distance, i.e. the closest point. But maybe this isn't the best approach to do what I want - finding the physical neighbours for each shapeid?
Assuming that each shapeid of your dataframe identifies the vertices of a polygon, you need first to create a SpatialPolygons object from the coordinates and then apply the function gDistance to know the distance between any pair of polygons (assuming that is what you are looking for). In order to create a SpatialPolygons you need a Polygons and in turn a Polygon object. You can find details in the help page of the sp package under Polygon.
You might soon hit a problem: the coordinates of each polygon must close, i.e. the last vertex must be the same as the first for each shapeid. As far as I can see from your data, that seems not to be the case for you. So you should "manually" add a closing row to each subset of your data.
You can try this (assuming that df is your starting dataframe):
require(rgeos)
#split the dataframe for each shapeid and coerce to matrix
coordlist<-lapply(split(df[,2:3],df$shapeid),as.matrix)
#apply the following command only if the polygons don't close
#coordlist<-lapply(coordlist, function(x) rbind(x,x[1,]))
#create a SpatialPolygons for each shapeid
SPList<-lapply(coordlist,function(x) SpatialPolygons(list(Polygons(list(Polygon(x)),1))))
#initialize a matrix of distances
distances<-matrix(0,ncol=length(SPList),nrow=length(SPList))
#calculate the distances
for (i in 1:(length(SPList)-1))
for (j in (i+1):length(SPList))
distances[i,j]<-gDistance(SPList[[i]],SPList[[j]])
This may require some time, since you are calculating 289*288/2 polygon distances. Eventually, you'll obtain a matrix of distances.
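Since the loop only fills the upper triangle, one way to get from that matrix to a nearest neighbour per shapeid is sketched below in base R. The 3 x 3 matrix and its values are invented stand-ins for the real 289 x 289 result.

```r
# Hypothetical upper-triangular distance matrix for three shapes
distances <- matrix(0, nrow = 3, ncol = 3)
distances[1, 2] <- 5
distances[1, 3] <- 2
distances[2, 3] <- 4

# Mirror the upper triangle so every row holds all distances for that shape
full <- distances + t(distances)

# A shape is not its own neighbour
diag(full) <- Inf

# Index of the closest other shape for each row
nearest <- apply(full, 1, which.min)
```

Shapes that touch or overlap have distance 0, so rows with several zeros have several physical neighbours; which(full[i, ] == 0) would list them for shape i.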
I am trying to use the interp1 function in R for linearly interpolating a matrix without using a for loop. So far I have tried:
bthD <- c(0,2,3,4,5) # original depth vector
bthA <- c(4000,3500,3200,3000,2800) # original array of area
Temp <- c(4.5,4.2,4.2,4,5,5,4.5,4.2,4.2,4)
Temp <- matrix(Temp,2) # matrix for temperature measurements
# -- interpolating bathymetry data --
depthTemp <- c(0.5,1,2,3,4)
layerZ <- seq(depthTemp[1],depthTemp[5],0.1)
library(signal)
layerA <- interp1(bthD,bthA,layerZ);
# -- interpolate matrix --
layerT <- list()
for (i in 1:2){
t <- Temp[i,]
layerT[[i]] <- interp1(depthTemp,t,layerZ)
}
layerT <- do.call(rbind,layerT)
So, here I have used interp1 on each row of the matrix in a for loop. I would like to know how I could do this without using a for loop. I can do this in matlab by transposing the matrix as follows:
layerT = interp1(depthTemp,Temp',layerZ)'; % matlab code
but when I attempt to do this in R
layerT <- interp1(depthTemp,t(Temp),layerZ)
it does not return a matrix of interpolated results, but a numeric array. How can I ensure that R returns a matrix of the interpolated values?
There is nothing wrong with your approach; I probably would avoid the intermediate t <- assignment, since t is also the name of R's transpose function.
If you want to feel R-ish, try
apply(Temp,1,function(t) interp1(depthTemp,t,layerZ))
You may have to add a t(ranspose) in front of all if you really need it that way.
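To make the orientation concrete, here is a small sketch of the same row-wise idea with the transpose applied. Note that it swaps in base R's approx() for linear interpolation instead of signal::interp1, purely so that it runs without extra packages; the data are those from the question.

```r
depthTemp <- c(0.5, 1, 2, 3, 4)
Temp <- matrix(c(4.5, 4.2, 4.2, 4, 5, 5, 4.5, 4.2, 4.2, 4), 2)
layerZ <- seq(depthTemp[1], depthTemp[5], 0.1)

# apply() returns one column per input row, so transpose to get one row per profile
layerT <- t(apply(Temp, 1, function(profile) {
  approx(depthTemp, profile, xout = layerZ)$y  # linear interpolation
}))

dim(layerT)
```

The result has as many rows as Temp and one column per value in layerZ, which matches the shape produced by the for loop followed by do.call(rbind, ...).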
Since this is a 3d-field, per-row interpolation might not be optimal. My favorite is interp.loess in package tgp, but for regular spacings other options might be available. The method does not work for your mini-example (which is fine for the question), but requires a larger grid.