How do I find the overlap between two shapefiles? - r

I have two shapefiles (sf), one with polygons and one with points. As output I want a df showing which points fall within which polygons, something like this:
polygon overlap geometry
polygon1 point34 c(3478,234872)
polygon1 point56 c(23423,234982)
polygon2 point23 c(23498,2334)
polygon3 point45 c(872348,23847)
polygon3 point87 c(234982,1237)
polygon3 point88 c(234873,2873)
I assume I'll have to do something with st_intersection() but up to now I did not manage to get the desired output.

After fiddling around I came up with this solution, but I'm pretty sure it is not the most elegant. x and y are shapefiles, x with points and y with polygons.
count_overlap <- function(x, y){
f1 <- function(z){
r <- st_intersects(x,y[z,])
return(r)
}
l1 <- c(1:nrow(y))
l2 <- lapply(l1, f1)
l3 <- lapply(l2, unlist)
r <- sapply(l3, sum)
y$overlap <- r
return(y)
}
The result is the original y sf/dataframe with an added column called 'overlap' that shows the counts of points from x that fall within the polygon. Not exactly what I asked for in the question but a good outcome for me personally.

Try using over in sp:
library(sp)
out = over(pnt,plgn)
from ?over:
x = "SpatialPoints", y = "SpatialPolygons"
returns a numeric vector of length equal to the number of points; the number is the index (number) of the polygon of y in which a point falls; NA denotes the point does not fall in a polygon; if a point falls in multiple polygons, the last polygon is recorded.
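For sf objects, as in the question, a spatial join may get closer to the table that was asked for. This is a minimal sketch on made-up toy data: the objects `polys` and `pts` and the column names `polygon`, `overlap`, and `n_points` are assumptions for illustration, not taken from the original data.

```r
library(sf)

# Hypothetical toy data: two square polygons and three points
polys <- st_sf(
  polygon = c("polygon1", "polygon2"),
  geometry = st_sfc(
    st_polygon(list(rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0)))),
    st_polygon(list(rbind(c(3, 0), c(5, 0), c(5, 2), c(3, 2), c(3, 0))))
  )
)
pts <- st_sf(
  overlap = c("point1", "point2", "point3"),
  geometry = st_sfc(st_point(c(1, 1)), st_point(c(4, 1)), st_point(c(10, 10)))
)

# One row per point that lies within a polygon (inner join drops unmatched points)
joined <- st_join(pts, polys, join = st_within, left = FALSE)

# Count of points per polygon, without an explicit loop
polys$n_points <- lengths(st_intersects(polys, pts))
```

Here `joined` keeps the point geometry alongside the name of the polygon it falls in, and `n_points` corresponds to the `overlap` count column produced by the function above.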

Related

Repeat for loop for all rows of a spatial points data frame

I want to calculate the shortestPath distance (using the gdistance package) between a set of geographic coordinates, using a transition layer of the ocean to prevent 'movement' across land.
Here is how I created the transition layer:
library(raster); library(gdistance); library(maptools); library(rgdal); library(sp)
mapcrs <- "+proj=longlat +datum=WGS84 +no_defs"
data(wrld_simpl)
world <- wrld_simpl
worldshp <- spTransform(world, mapcrs)
ras <- raster(nrow=300,ncol=300)
crs(ras) <- crs(worldshp) # oceans.shp is not defined in this example; use the world layer
extent(ras) <- extent(worldshp)
landmask <- rasterize(worldshp, ras)
landras <- is.na(landmask)
tr <- transition(landras, transitionFunction = mean, directions = 8, symm = FALSE)
tr = geoCorrection(tr, scl=FALSE)
I then want to calculate the shortestPath distance between every coordinate in my dataset i.e. location 1 to location n, location 2 to location n etc.
Let's produce some hypothetical geographic coordinates and convert to spatial points
x <- rnorm(10, mean = -40, sd=5)
y <- rnorm(10, mean = 20, sd=5)
xy <- cbind(x,y); colnames(xy) <- c("lon","lat")
xy <- SpatialPoints(xy); projection(xy) <- projection(mapcrs)
Using the shortestPath function in gdistance, I can calculate the distance from the first coordinate (i.e. xy[1]) to all other xy coordinates, like so.
dist <- shortestPath(tr, origin = xy, goal = xy, output="SpatialLines")
I then tried to apply a for loop to sequentially calculate distance from location 1 to all other locations, and then calculating distance from location 2 to all other locations etc., which I wrote as follows:
for(i in seq_along(xy)){
AtoB <- shortestPath(tr, origin = xy[i,], goal=xy, output="SpatialLines")
i <- i+1
}
This, however, still only calculates the distances relative to the first xy spatial point and does not 'loop' for all subsequent rows. I don't know what I'm doing wrong. It's probably super-easy, but I'm struggling. Any help would be appreciated.
Thanks in advance,
Tony
---- UPDATE ----
We have come up with a bit of a work around (thanks Charley Clubley) but it still won't produce outputs for every spatial line. This will generate a matrix of distances.
The work around is as follows:
Using xy as a matrix, not spatial points
distances <- matrix(ncol=nrow(xy), nrow=nrow(xy))
xy_b <- xy ## Coords needs to be as a matrix (not spatial points)
## This generates an error indicating there are no more rows to delete once complete, but the computation works
for (i in 1:nrow(xy_b)) {
AtoB <-shortestPath(tr, xy_b, xy, output="SpatialLines")
length <- SpatialLinesLengths(AtoB)
distances[i, ] <- length
xy_b <- xy_b[-1,]
}
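One possible reason the first loop appears to keep only a single result is that AtoB is reassigned on every pass, so only the last assignment survives once the loop ends; the `i <- i + 1` line has no effect inside a `for` loop. A minimal base-R sketch of storing one row of results per origin follows. The function `path_lengths()` is a hypothetical stand-in (plain straight-line distance) for the `shortestPath()` plus `SpatialLinesLengths()` step, so this shows the loop structure only, not the gdistance computation.

```r
set.seed(1)
# Hypothetical coordinates as a plain matrix
xy <- cbind(lon = rnorm(5, mean = -40, sd = 5),
            lat = rnorm(5, mean = 20, sd = 5))

# Hypothetical stand-in for shortestPath() + SpatialLinesLengths():
# straight-line distance from one origin to every goal
path_lengths <- function(origin, goals) {
  sqrt((goals[, 1] - origin[1])^2 + (goals[, 2] - origin[2])^2)
}

n <- nrow(xy)
distances <- matrix(NA_real_, nrow = n, ncol = n)
for (i in seq_len(n)) {
  # Index the origin by i and write into row i, so nothing is overwritten
  distances[i, ] <- path_lengths(xy[i, ], xy)
}
```

Indexing the origin by `i`, rather than deleting rows from a copy of the coordinates, also avoids the "no more rows to delete" error mentioned in the workaround.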

Calculating the distance between points in R

I looked through the existing questions dealing with coordinates but couldn't find anything that helps with my problem.
I have a dataset that contains ID, Speed, Time, and lists of Latitude & Longitude (the dataset can be found at the link):
https://drive.google.com/file/d/1MJUvM5WEhua7Rt0lufCyugBdGSKaHMGZ/view?usp=sharing
I want to measure the distance between each point of Latitude & Longitude.
For example;
Latitude has: x1, x2, x3, ..., x1000
Longitude has: y1, y2, y3, ..., y1000
I want to measure the distance between (x1,y1) to all the points , and (x2,y2) to all the points, and so on.
The reason I'm doing this is to know which point is close to which, and to assign an index to each location based on the distance.
If (x1, y1) is close to (x4, y4), then (x1, y1) will get the index A, for example, and (x4, y4) will be labeled B; that is, sort the points in order based on distance.
I tried the gDistance function, but it showed the error message: "package ‘gDistance’ is not available (for R version 3.4.3)"
and if I change the version to 3.3, library(rgeos) won't work!
Any suggestions?
Here's what I tried:
#requiring necessary packages:
library(sp) # vector data
library(rgeos) # geometry ops
#Read the data and transform them to spatial objects
d <- read.csv("ReadyData.csv")
sp.ReadyData <- d
coordinates(sp.ReadyData) <- ~Longitude + Latitude
d <- gDistance(sp.ReadyData, byid= TRUE)
Here's an update to my solution. I created a spatial object and made a spatial data frame as follows:
#Create spatial object:
lonlat <- cbind(spatial$Longitude, spatial$Latitude)
#Create a SpatialPoints object:
library(sp)
pts <- SpatialPoints(lonlat)
crdref <- CRS('+proj=longlat +datum=WGS84')
pts <- SpatialPoints(lonlat, proj4string=crdref)
# make spatial data frame
ptsdf <- SpatialPointsDataFrame(pts, data=spatial)
Now I'm trying to measure the distance for longitude/latitude coordinates. I tried the dist method, but it does not seem to work for me, so I tried the pointDistance method:
gdis <- pointDistance(pts, lonlat=TRUE)
It is still not clear to me how this function measures the distance. I need to figure out the distances so I can locate the point in the middle and assign a number to each point based on its location relative to the middle point.
You can use raster::pointDistance or geosphere::distm, among other functions.
Part of your example data (please avoid files in your questions):
d <- read.table(sep=",", text='
"OBU ID","Time Received","Speed","Latitude","Longitude"
"1",20,1479171686325,0,38.929596,-77.2478813
"2",20,1479171686341,0,38.929596,-77.2478813
"3",20,1479171698485,1.5,38.9295887,-77.2478945
"4",20,1479171704373,1,38.9295048,-77.247922
"5",20,1479171710373,0,38.9294865,-77.2479055
"6",20,1479171710373,0,38.9294865,-77.2479055
"7",20,1479171710373,0,38.9294865,-77.2479055
"8",20,1479171716373,2,38.9294773,-77.2478712
"9",20,1479171716374,2,38.9294773,-77.2478712
"10",20,1479171722373,1.32,38.9294773,-77.2477417')
Solution:
library(raster)
m <- pointDistance(d[, c("Longitude", "Latitude")], lonlat=TRUE)
To get the nearest point to each point, you can do
mm <- as.matrix(as.dist(m))
diag(mm) <- NA
i <- apply(mm, 1, which.min)
The point pairs
p <- cbind(1:nrow(mm), i)
To get the distances, you can do:
mm[p]
Or do this:
apply(mm, 1, min, na.rm=TRUE)
Note that rgeos::gDistance is for planar data, not for longitude/latitude data.
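To illustrate why planar and longitude/latitude distances differ, here is a minimal base-R sketch of a great-circle (haversine) distance. The function name `haversine_m` and the spherical Earth radius of 6371000 m are assumptions made for this illustration; `pointDistance` and `distm` use their own, more careful ellipsoidal methods.

```r
# Great-circle distance in metres on a sphere (haversine formula).
# Assumption: spherical Earth with radius r = 6371000 m.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# One degree of longitude covers less ground away from the equator,
# which a planar distance computed on raw degrees would ignore
d_equator <- haversine_m(0, 0, 1, 0)
d_lat39   <- haversine_m(-77, 39, -76, 39)
```

A planar function applied to raw degrees would report both pairs as one unit apart, while `d_lat39` is noticeably shorter than `d_equator`.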
Here is a similar question/answer with some illustration.
Your data set is too large to make a single distance matrix. You can process your data in chunks to deal with that. Here I am showing that with a rather small chunk size of 4 rows. Make this number much bigger to speed up processing time.
library(geosphere)
chunk <- 4 # rows
start <- seq(1, nrow(d), chunk)
# each chunk ends one row before the next chunk starts, so chunks do not overlap
end <- c(start[-1] - 1, nrow(d))
x <- d[, c("Longitude", "Latitude")]
r <- list()
for (i in 1:length(start)) {
y <- x[start[i]:end[i], , drop=FALSE]
m <- distm(y, x)
# mask the distance of each point to itself
m[cbind(1:nrow(m), start[i]:end[i])] <- NA
r[[i]] <- apply(m, 1, which.min)
}
r <- unlist(r)
r
# [1] 2 1 1 5 6 5 5 9 8 8
So for your data:
d <- read.csv("ReadyData.csv")
chunk <- 100 # rows
# etc
This will take a long time.
An alternative approach:
library(spdep)
x <- as.matrix(d[, c("Longitude", "Latitude")])
k <- as.vector(knearneigh(x, k=1, longlat=TRUE)$nn)
Assuming you have p1 as spatialpoints of x and p2 as spatialpoints of y, to get the index of the nearest other point:
ReadyData$cloDist <- apply(gDistance(p1, p2, byid=TRUE), 1, which.min)
If the same points appear in both sets, you will get the index of the point itself, since the closest place to a point is the point itself. An easy trick to avoid that is to use the second-smallest distance as reference, with a quick function:
f_which.min <- function(vec, idx) sort(vec, index.return = TRUE)$ix[idx]
ReadyData$cloDist2 <- apply(gDistance(p1, p2, byid=TRUE), 1, f_which.min,
idx = 2)
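To see what the helper does, here is a small base-R sketch on made-up planar points, using `dist()` in place of `gDistance()`; the matrix `xy_demo` and the result name `nearest_other` are hypothetical.

```r
# Hypothetical planar points along a line
xy_demo <- rbind(c(0, 0), c(1, 0), c(5, 0), c(5.5, 0))

# Full pairwise distance matrix; the diagonal (self-distance) is 0
dm <- as.matrix(dist(xy_demo))

# Return the index of the idx-th smallest value in a vector
f_which.min <- function(vec, idx) sort(vec, index.return = TRUE)$ix[idx]

# idx = 1 would return each point itself; idx = 2 skips to the nearest other point
nearest_other <- apply(dm, 1, f_which.min, idx = 2)
```

If the data contain exact duplicate coordinates, several distances are zero and the second position may point to a duplicate (or, depending on tie order, to the point itself), so duplicates are worth checking separately.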

Get all (multiple) values of a data.frame falling into every raster cell

I have spatial data with lat/long (x/y) and want to overlay a raster on it. For every raster cell, I want to get all the values of the points that fall inside it. The points are not evenly distributed, so one raster cell does not contain the same number of points as its neighbour. I know that the function rasterize can use the mean to aggregate all values inside a cell into one new value, but I don't want to average within the cell; I want to extract all values (here, the values of the points inside that cell).
How can I do this in an effective way?
consider I have:
library(raster)
library(sp)
my data:
n <- 1000 # define n before it is used
x <- runif(n) * 360 - 180
y <- runif(n) * 180 - 90
values=runif(n)
xy <- cbind(x,y)
my raster
r <- raster(ncols=10, nrows=10)
Now I don't want to average all values as rasterize does, but extract all values (e.g. into a list) that fall into each cell.
Many thanks for ideas and help! Is there any function for this?
First, you need values in the raster in order to sample it. In your example you are trying to sample an empty raster. (In the original edit I mistook this for your sample size; the issue is with your example, not the question.)
To answer your question...
extract() is the function you are looking for:
library(raster)
library(sp)
r <- raster(ncols=10, nrows=10)
n <- 1000
x <- runif(n) * 360 - 180
y <- runif(n) * 180 - 90
values=runif(n)
r[]<-values
xy <- SpatialPointsDataFrame(data=data.frame(cbind(x,y)),coords=cbind(x,y))
r0 <- extract(r, xy)
plot(r0)
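Note that extract() returns the raster value at each point. If the goal is instead to collect all point values per cell, grouping the values by cell number is one option. The sketch below computes cell numbers in base R for a 10 x 10 global grid, under the assumption that this mirrors the default extent (-180 to 180, -90 to 90) and row-wise, top-left cell numbering of `raster(ncols=10, nrows=10)`; with the raster package loaded, `cellFromXY(r, cbind(x, y))` should give the same numbers. The names `cell` and `values_by_cell` are made up for this example.

```r
set.seed(42)
n <- 1000
x <- runif(n) * 360 - 180
y <- runif(n) * 180 - 90
values <- runif(n)

ncols <- 10
nrows <- 10

# Column counted from the left edge, row counted from the top edge
col <- pmin(floor((x + 180) / (360 / ncols)) + 1, ncols)
row <- pmin(floor((90 - y) / (180 / nrows)) + 1, nrows)

# Row-wise cell number, starting at the top-left cell
cell <- (row - 1) * ncols + col

# One list element per cell, holding every point value that falls in it
values_by_cell <- split(values, factor(cell, levels = seq_len(ncols * nrows)))
```

Cells without any points appear as empty numeric vectors, so the list always has one element per raster cell.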

Detect if raster is within, without, or intersecting a SpatialPolygons object

I have many rasters for which I'd like to check if they are contained fully within a spatial polygon, fully without the spatial polygon, or intersecting with the spatial polygon (this could mean the polygon is fully within the raster, or the polygon and raster overlap). I'm doing this check to save me from time-intensive masking when possible.
Here is an example:
library(raster) # also loads sp, which provides Polygon(), Polygons(), SpatialPolygons()
# create 3 example rasters
r <- raster()
r[] <- rnorm(n = ncell(r))
e1 <- extent(c(45,55,45,50))
r1 <- crop(r,e1)
e2 <- extent(c(20,25,25,30))
r2 <- crop(r,e2)
e3 <- extent(c(38,55,57,65))
r3 <- crop(r,e3)
#create SpatialPolygons
x <- c(40,60)
y <- c(40,60)
m <- expand.grid(x,y)
m <- m[c(1,2,4,3),]
p1 <- Polygon(m)
p1 <- Polygons(list(p1),1)
x <- c(10,15)
y <- c(10,15)
m <- expand.grid(x,y)
m <- m[c(1,2,4,3),]
p2 <- Polygon(m)
p2 <- Polygons(list(p2),2)
x <- c(30,45)
y <- c(70,80)
m <- expand.grid(x,y)
m <- m[c(1,2,4,3),]
p3 <- Polygon(m)
p3 <- Polygons(list(p3),3)
poly <- SpatialPolygons(list(p1,p2,p3))
plotting these:
I will read in each raster separately and check if it is within, without, or intersecting the SpatialPolygons.
What do you think will be the most efficient way to do this in R? I have thousands of 4mb rasters that I'm planning to mask in parallel and would like this check to speed things up a bit.
Note, there's also this question: https://gis.stackexchange.com/questions/34535/detect-whether-there-is-a-spatial-polygon-in-a-spatial-extent
However, I don't think it gives the detail I'm looking for. For example, all of the rasters are within the extent of the spatial polygons, but not all are within the spatial polygons.
Functions like those in rgeos (gIntersects, gContains) would probably be handy. I'm not sure if these are the most efficient, or how I should convert a raster (or its extent) to an sp object.
thanks!
You can also use gRelate for this. It returns a DE-9IM code that describes the relationships between interior, boundary and exterior components of the two geometries.
library(rgeos)
x <- sapply(rlist, function(x)
gsub('[^F]', 'T', gRelate(as(extent(x), 'SpatialPolygons'), poly)))
You could then compare the strings to relationships of interest. For example, we might define within, disjoint, and overlaps as follows (but note that some other intersections are optional for given relationships: "within" is defined by GEOS as T*F**F***, "disjoint" as FF*FF****, and "overlaps" as T*T***T**):
pat <- c(TFFTFFTTT='within', FFTFFTTTT='disjoint', TTTTTTTTT='overlaps')
pat[x]
## TFFTFFTTT FFTFFTTTT TTTTTTTTT
## "within" "disjoint" "overlaps"
It seems marginally faster than the gContainsProperly/gIntersects approach, but @Tedward's post is more comprehensible, and more consistent with GEOS definitions (though the power to create specific relationship definitions might be desirable).
The elements of the DE-9IM strings represent, in order:
Does the interior of geometry A intersect the interior of geometry B?
Does the boundary of A intersect the interior of B?
Does the exterior of A intersect the interior of B?
Does the interior of geometry A intersect the boundary of geometry B?
Does the boundary of A intersect the boundary of B?
Does the exterior of A intersect the boundary of B?
Does the interior of geometry A intersect the exterior of geometry B?
Does the boundary of A intersect the exterior of B?
Does the exterior of A intersect the exterior of B?
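The GEOS-style masks quoted above (such as T*F**F***) can be checked against a nine-character code position by position. This is a minimal base-R sketch; the function name `matches_de9im` is hypothetical and not part of rgeos. It treats `*` as "anything", `T` as "any non-F entry" and `F` as "must be F".

```r
# Check a 9-character DE-9IM code against a mask such as "T*F**F***".
# Hypothetical helper, not an rgeos function.
matches_de9im <- function(code, mask) {
  code_chars <- strsplit(code, "")[[1]]
  mask_chars <- strsplit(mask, "")[[1]]
  stopifnot(length(code_chars) == 9, length(mask_chars) == 9)
  ok <- mapply(function(ch, m) {
    switch(m,
           "*" = TRUE,        # wildcard: any value is accepted
           "T" = ch != "F",   # some intersection (T, 0, 1 or 2)
           "F" = ch == "F",   # no intersection
           ch == m)           # exact dimension requested by the mask
  }, code_chars, mask_chars)
  all(ok)
}

within_mask   <- "T*F**F***"
disjoint_mask <- "FF*FF****"

matches_de9im("TFFTFFTTT", within_mask)    # TRUE
matches_de9im("FFTFFTTTT", disjoint_mask)  # TRUE
matches_de9im("TTTTTTTTT", within_mask)    # FALSE
```

Matching against masks instead of fixed strings keeps the optional positions flexible, which is the difference from the exact lookup table used in the answer above.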
This is what I did to solve the problem:
library(rgeos)
rlist <- list(r1,r2,r3)
lapply(rlist, function(raster) {
ei <- as(extent(raster), "SpatialPolygons")
if (gContainsProperly(poly, ei)) {
print ("fully within")
} else if (gIntersects(poly, ei)) {
print ("intersects")
} else {
print ("fully without")
}
})
Please let me know if you know of a more efficient solution.

Row ordering for polygons

My question is simple. Is there an automatic way to order your data so that it makes "clean" polygons? I have functions that are generating rings (specifically the ahull function), and I would like a way to cleanly produce polygons using such functions. Here is an example.
x <- c(1:3, 3:1, 1)
y <- c(1,1,1,3,3,2, 1)
xy <- cbind(x,y)
Sr1 <- Polygon(xy)
Srs1 = Polygons(list(Sr1), "s1")
SpP = SpatialPolygons(list(Srs1))
plot(SpP)
z <- runif(7)
xyz <- cbind(x,y,z)
xyz <- xyz[order(z),]
xy <- xyz[,-3]
xy <- rbind(xy, xy[1,])
Sr1 <- Polygon(xy)
Srs1 = Polygons(list(Sr1), "s1")
SpP = SpatialPolygons(list(Srs1))
plot(SpP)
Here is my real data: https://drive.google.com/file/d/0B8QG4cbDqH0UOUlobnlWaDgwOWs/edit?usp=sharing
In a sense, you have answered your own question.
Assuming you have a set of points, and you use ahull(...) in the alphahull package to generate the alpha-convex hull, you can extract the points on the boundary, in the correct order, directly from the ahull object. Here is an example:
library(sp)
library(alphahull)
set.seed(1) # for reproducible example
X <- rnorm(100)
Y <- rnorm(100)
plot(X,Y)
XY <- cbind(X,Y)
hull <- ahull(XY,alpha=1)
plot(hull)
# extract the row numbers of the boundary points, in convex order.
indx=hull$arcs[,"end1"]
points <- XY[indx,] # extract the boundary points from XY
points <- rbind(points,points[1,]) # add the closing point
# create the SpatialPolygons object
SpP = SpatialPolygons(list(Polygons(list(Polygon(points)),ID="s1")))
plot(SpP)
points(XY)
EDIT Response to OP's providing their dataset.
ahull(...) seems to fail, without warning, with your dataset: it does not produce any hulls. After a bit of experimentation, it looks like the problem has to do with the magnitude of the x,y values. If I divide everything by 1000, it works. No idea what's going on with that (perhaps someone else will provide an insight?). Anyway, here's the code and the result:
library(sp)
library(alphahull)
df <- read.csv("ahull problem.csv")
hull <- ahull(df[2:3]/1000,alpha=2)
plot(hull)
# extract the row numbers of the boundary points, in convex order.
indx=hull$arcs[,"end1"]
points <- df[indx,2:3] # extract the boundary points from df
points <- rbind(points,points[1,]) # add the closing point
# create the SpatialPolygons object
SpP = SpatialPolygons(list(Polygons(list(Polygon(points)),ID="s1")))
plot(SpP)
points(df[2:3])
Note also that alpha=2. Setting alpha=1 with this dataset actually generates 2 hulls, one with 1 point and one with all the other points. Setting alpha=2 creates 1 hull.
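If a plain convex hull is acceptable instead of an alpha-shape (a different and coarser technique than ahull), base R's `chull()` returns the hull vertices already in ring order, which avoids the ordering problem altogether. A minimal sketch on simulated points; the names `XY_demo`, `idx`, and `ring` are made up for this example.

```r
set.seed(1) # for a reproducible example
XY_demo <- cbind(X = rnorm(100), Y = rnorm(100))

# Row indices of the convex hull vertices, in clockwise order
idx <- chull(XY_demo)

# Repeat the first vertex at the end to close the ring
ring <- XY_demo[c(idx, idx[1]), ]
```

The closed `ring` matrix can then be passed to `Polygon()` in the same way as the `points` matrix above.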
