QGIS: Select polygons which intersect points with Python

I'm very new to using QGIS, what I have is a points shapefile and a polygon shapefile. I would like to select all the polygons which have at least one point in them. The problem I'm running into is how long this takes. I have 1 million points and about 320,000 polygons, so using spatial query takes far too long. I've heard that I'd need to write a python script with spatial indexing to get a feasibly quick result, but I have no idea how to approach this. Any help would be greatly appreciated.
What I've tried to cobble together from other Stack Overflow questions is:
pointProvider = self.pointLayer.dataProvider()
all_point = pointProvider.getFeatures()
delta = 0.1
for point in all_point:
    searchRectangle = QgsRectangle(point.x() - delta, point.y() - delta, point.x() + delta, point.y() + delta)
    candidateIDs = line_index.intersects(searchRectangle)
    for candidateID in candidateIDs:
        candFeature = rotateProvider.getFeatures(QgsFeatureRequest(candidateID)).next()
        if candFeature.geometry().contains(point):
            break
This throws up a NameError: name 'self' is not defined

I found an answer over on GIS Stack Exchange, which you can find here
The code I used was:
from qgis.core import *
import processing

layer1 = processing.getObject('MyPointsLayer')
layer2 = processing.getObject('MyPolygonsLayer')

index = QgsSpatialIndex()  # Spatial index over the points
for ft in layer1.getFeatures():
    index.insertFeature(ft)

selection = []  # Polygon features whose bounding box holds at least one point
for feat in layer2.getFeatures():
    inGeom = feat.geometry()
    # Note: this compares bounding boxes only, so a point that falls inside a
    # polygon's bounding box but outside the polygon itself still counts.
    idsList = index.intersects(inGeom.boundingBox())
    if idsList:
        selection.append(feat)

# Select all the polygon features found above
layer2.setSelectedFeatures([k.id() for k in selection])
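To make the indexing idea concrete outside QGIS, here is a minimal plain-Python sketch. It is an illustration under assumptions, not the QGIS API: a simple grid of buckets stands in for QgsSpatialIndex, polygons are single rings given as lists of (x, y) tuples, and all function names are made up for this example. Unlike a bounding-box check alone, it follows the candidate lookup with an exact point-in-polygon test.

```python
from collections import defaultdict


def build_grid_index(points, cell):
    """Bucket point indices by the grid cell that contains them."""
    index = defaultdict(list)
    for i, (x, y) in enumerate(points):
        index[(int(x // cell), int(y // cell))].append(i)
    return index


def candidates_in_bbox(index, cell, bbox):
    """Yield indices of points stored in grid cells that overlap the bounding box."""
    xmin, ymin, xmax, ymax = bbox
    for cx in range(int(xmin // cell), int(xmax // cell) + 1):
        for cy in range(int(ymin // cell), int(ymax // cell) + 1):
            yield from index.get((cx, cy), ())


def point_in_polygon(pt, ring):
    """Ray-casting test; ring is a list of (x, y) vertices."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through pt
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def polygons_with_points(polygons, points, cell=1.0):
    """Return the positions of polygons that contain at least one point."""
    index = build_grid_index(points, cell)
    selected = []
    for pid, ring in enumerate(polygons):
        xs = [v[0] for v in ring]
        ys = [v[1] for v in ring]
        bbox = (min(xs), min(ys), max(xs), max(ys))
        # Cheap candidate lookup first, exact test only on the candidates.
        if any(point_in_polygon(points[i], ring)
               for i in candidates_in_bbox(index, cell, bbox)):
            selected.append(pid)
    return selected
```

The cell size is a tuning knob: cells roughly the size of a typical polygon keep the candidate lists short.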


Intersection of polygons in R using sf

I want to assess the degree of spatial proximity of each point to other equivalent points by looking at the number of others within 400m (5 minute walk).
I have some points on a map.
I can draw a simple 400 m buffer around them.
I want to determine which buffers overlap and then count the number of overlaps.
This number of overlaps should relate back to the original point so I can see which point has the highest number of overlaps and therefore if I were to walk 400 m from that point I could determine how many other points I could get to.
I've asked this question in GIS overflow, but I'm not sure it's going to get answered for ArcGIS and I think I'd prefer to do the work in R.
This is what I'm aiming for
https://www.newham.gov.uk/Documents/Environment%20and%20planning/EB01.%20Evidence%20Base%20-%20Cumulative%20Impact%20V2.pdf
To simplify here's some code
# load packages
library(easypackages)
needed<-c("sf","raster","dplyr","spData","rgdal",
"tmap","leaflet","mapview","tmaptools","wesanderson","DataExplorer","readxl",
"sp" ,"rgisws","viridis","ggthemes","scales","tidyverse","lubridate","phecharts","stringr")
easypackages::libraries(needed)
## read in csv data; first column is assumed to be Easting and second Northing
polls<-st_as_sf(read.csv(url("https://www.caerphilly.gov.uk/CaerphillyDocs/FOI/Datasets_polling_stations_csv.aspx")),
coords = c("Easting","Northing"),crs = 27700)
polls_buffer_400<-st_buffer(polls,400)
polls_intersection<-st_intersection(x=polls_buffer_400,y=polls_buffer_400)
plot(polls_intersection$geometry)
That should show the overlapping buffers around the polling stations.
What I'd like to do is count the number of overlaps which is done here:
polls_intersection_grouped<-polls_intersection%>%group_by(Ballot.Box.Polling.Station)%>%count()
And this is the bit I'm not sure about: to get to the output I want (which will show "hotspots" of polling stations in this case), how do I colour things? How can I
assess the degree of spatial proximity of each point to other equivalent points by looking at the number of others within 400m (5 minute walk)?
It's probably terribly bad form but here's my original GIS question
https://gis.stackexchange.com/questions/328577/buffer-analysis-of-points-counting-intersects-of-resulting-polygons
Edit:
this gives the intersections different colours which is great.
plot(polls_intersection$geometry,col = sf.colors(categorical = TRUE, alpha = .5))
summary(lengths(st_intersects(polls_intersection)))
What am I colouring here? I mean it looks nice but I really don't know what I'm doing.
How can I assess the degree of spatial proximity of each point to other equivalent points by looking at the number of others within 400m (5 minute walk)?
Here is how to add a column to your initial sf object of polling stations that tells you how many polling stations are within 400m of each feature in that object.
Note that the minimum value is 1 because a polling station is always within 400m of itself.
# n_neighbors shows how many polling stations are within 400m
polls %>%
mutate(n_neighbors = lengths(st_is_within_distance(polls, dist = 400)))
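For intuition, the same count can be sketched in plain Python. This is only an illustration under assumptions: coordinates are projected metres (as with EPSG:27700), the function name count_within is made up here, and the brute-force loop says nothing about how st_is_within_distance works internally.

```python
import math


def count_within(points, dist):
    """For each point, count the points (itself included) within dist."""
    return [
        sum(1 for q in points if math.hypot(p[0] - q[0], p[1] - q[1]) <= dist)
        for p in points
    ]


# Three hypothetical stations on a line, 300 m and 1000 m from the first one.
stations = [(0.0, 0.0), (300.0, 0.0), (1000.0, 0.0)]
counts = count_within(stations, 400.0)
```

As in the sf version, the minimum possible count is 1, because every point is within the distance of itself.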
Similarly, for your sfc collection of intersecting polygons, you could add a column that counts the number of buffer polygons that contain each intersection polygon:
polls_intersection %>%
mutate(n_overlaps = lengths(st_within(geometry, polls_buffer_400)))
And this is the bit I'm not sure about, to get to the output I want (which will show "Hotspots" of polling stations in this case) how do I colour things?
If you want to plot these things I highly recommend using ggplot2. It makes it very clear how you associate an attribute like colour with a specific variable.
For example, here is a plot that maps the alpha (transparency) of each polygon to a scaled version of the n_overlaps column:
library(ggplot2)
polls_intersection %>%
mutate(n_overlaps = lengths(st_covered_by(geometry, polls_buffer_400))) %>%
ggplot() +
geom_sf(aes(alpha = 0.2*n_overlaps), fill = "red")
Lastly, there should be a better way to generate your intersecting polygons that already counts overlaps. This is built in to the st_intersection function for finding intersections of sfc objects with themselves.
However, your data in particular generates an error when you try to do this:
st_intersection(polls_buffer_400)
#> Error in CPL_nary_intersection(x) :
#>   Evaluation error: TopologyException: side location conflict at 315321.69159061194 199694.6971799387.
I don't know what a "side location conflict" is. Maybe @edzer could help with that. However, most subsets of your data do not contain that conflict. For example:
# this version adds an n.overlaps column automatically:
st_intersection(polls_buffer_400[1:10,]) %>%
ggplot() + geom_sf(aes(alpha = 0.2*n.overlaps), fill = "red")

Analyzing octopus catches with LinearK function in R [closed]

I hope you can help me with a problem I can't find a way to overcome. Sorry if I make some mistakes while writing this post; my English is a bit rusty right now.
Here is the question. I have .shp data that I want to analyze in R. The .shp files are either lines, representing the lines of traps we set to catch octopuses, or points located directly over those lines, representing where we captured one.
The question I'm trying to answer is: are octopuses statistically grouped or not?
After a bit of investigation, it seems to me that I need to use R and its linearK function to answer that question, using the libraries maptools, spatstat and sp.
Here is the code I'm using in RStudio:
Loading the libraries
library(spatstat)
library(maptools)
library(sp)
Creating a linnet object with the track
t1<- as.linnet(readShapeSpatial("./20170518/t1.shp"))
I get the following warning but it seems to work
Warning messages:
1: use rgdal::readOGR or sf::st_read
2: use rgdal::readOGR or sf::st_read
Plotting it to be sure everything is ok
plot(t1)
Creating a ppp object with the points
p1<- as.ppp(readShapeSpatial("./20170518/p1.shp"))
I get the same warning here, but the real problems start when I try to plot it:
> plot(p1)
Error in if (!is.vector(xrange) || length(xrange) != 2 || xrange[2L] < :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: Interpretation of arguments maxsize and markscale has changed (in spatstat version 1.37-0 and later). Size of a circle is now measured by its diameter.
2: In plot.ppp(x, ..., multiplot = FALSE, do.plot = FALSE) :
All mark values are NA; plotting locations only.
3: In plot.ppp(x, ..., multiplot = FALSE, do.plot = FALSE) :
All mark values are NA; plotting locations only.
4: In plot.ppp(x, ..., multiplot = FALSE, do.plot = FALSE) :
All mark values are NA; plotting locations only.
5: In plot.ppp(x, ..., multiplot = FALSE, do.plot = FALSE) :
All mark values are NA; plotting locations only.
6: In plot.ppp(x, ..., multiplot = FALSE, do.plot = FALSE) :
All mark values are NA; plotting locations only.
7: In plot.ppp(x, ..., multiplot = FALSE, do.plot = FALSE) :
All mark values are NA; plotting locations only.
Now what is left is to join the objects in a lpp object and to analyze it with the linearK function
> pt1 <- lpp(p1,t1)
> linearK(pt1)
Function value object (class ‘fv’)
for the function r -> K[L](r)
......................................
     Math.label         Description
r    r                  distance argument r
est  {hat(K)[L]}(r)     estimated K[L](r)
......................................
Default plot formula: .~r
where “.” stands for ‘est’
Recommended range of argument r: [0, 815.64]
Available range of argument r: [0, 815.64]
This is my situation right now. What I don't know is why the plot function is not working with my ppp object, and how to understand the output of the linearK function. help(linearK) didn't provide any clue. Since I have a lot of tracks, each with its own set of points, my desired outcome would be some kind of summary like: x tracks analyzed, a grouped, b dispersed and c unknown.
Thank you for your time; I'd greatly appreciate it if you can help me solve this problem.
Edit: Here is a link to a zip file containing all the shp files of one day, both tracks and points, and a txt file with my code. https://drive.google.com/open?id=0B0uvwT-2l4A5ODJpOTdCekIxWUU
First two pieces of general advice: (1) each time you create a complicated object, print it at the terminal, to see if it is what you expected. (2) When you get an error, immediately type traceback() and copy the output. This will reveal exactly where the error is detected.
A ppp object must include a specification of the study region (window). In your code, the object p1 is created by converting data of class SpatialPointsDataFrame, which do not include a specification of the study region, converted via the function as.ppp.SpatialPointsDataFrame, into an object of class ppp in which the window is guessed by taking the bounding box of the coordinates. Unfortunately, in your example, there is only one data point in p1, so the default bounding box is a rectangle of width 0 and height 0. [This would have been revealed by printing p1.] Such objects can usually be handled by spatstat, but this particular object triggers a bug in the function plot.solist which expects windows to have non-zero size. I will fix the bug, but...
In your case, I suggest you do
Window(p1) <- Window(t1)
immediately after creating p1. This will ensure that p1 has the window that you probably intended.
If all else fails, read the spatstat vignette on shapefiles...
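The zero-size window described above is easy to reproduce in a few lines of plain Python. This is a sketch with made-up names and made-up coordinates, not spatstat code: it only shows why a bounding box guessed from a single point has no extent, and why borrowing the window from the track avoids that.

```python
def bounding_box(points):
    """Return ((xmin, xmax), (ymin, ymax)) for a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), max(xs)), (min(ys), max(ys))


def has_zero_size(box):
    """True when the box has no width or no height."""
    (x0, x1), (y0, y1) = box
    return x1 == x0 or y1 == y0


# Hypothetical data: one catch point on a track with two vertices.
catch_points = [(710250.0, 4150300.0)]
track_vertices = [(710000.0, 4150000.0), (710800.0, 4150600.0)]

guessed = bounding_box(catch_points)    # what a single point gives you
borrowed = bounding_box(track_vertices) # window taken from the track instead
```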
I have managed to find a solution. As Adrian Baddeley noticed, there was a problem with the owin object. That problem seems to be bypassed (not really solved) if I create the ppp object manually instead of converting my set of points.
I have also replaced the readShapeSpatial function with rgdal::readOGR, since the former is deprecated, which was the reason for the warnings I was getting.
This is the R script I'm using right now, commented for clarity:
#first install spatstat, maptools and sp
#load them
library(spatstat)
library(maptools)
library(sp)
#create an array of folders, will add more when everything works fine
folders=c("20170518")
for(f in folders){
  #read all shp from that folder, both points and tracks
  pointfiles <- list.files(paste("./",f,"/points", sep=""), pattern="*.shp$")
  trackfiles <- list.files(paste("./",f,"/tracks", sep=""), pattern="*.shp$")
  #for each point and track couple
  for(i in 1:length(pointfiles)){
    #create a linnet object with the track
    t<- as.linnet(rgdal::readOGR(paste("./",f,"/tracks/",trackfiles[i], sep="")))
    #plot(t)
    #create a ppp object for each set of points
    pre_p<-rgdal::readOGR(paste("./",f,"/points/",pointfiles[i], sep=""))
    #plot(p)
    #obtain the coordinates of the current set of points
    c<-coordinates(pre_p)
    #create vector of x coords
    xc=c()
    #create vector of y coords
    yc=c()
    #not a very good way to fill my vectors but it works for my study area
    for(v in c){
      if(v>4000000){yc<-c(yc,v)}
      else {if(v<4000000 && v>700000){xc<-c(xc,v)}}
    }
    print(xc)
    print(yc)
    #create a ppp object using the vectors of x and y coords, and a window object
    #extracted from my set of points
    p=ppp(xc,yc,Window(as.ppp(pre_p)))
    #join them into an lpp object
    pt <- lpp(p,t)
    #plot(pt)
    #analyze it with the linearK function, nsim=9 for testing purposes
    #envelope.lpp is the method for analyzing linear point patterns
    assign(paste("results",f,i,sep="_"),envelope.lpp(pt, nsim=9, fun=linearK))
  }#end for each points & track set
}#end for each day of study
So as you can see, this script tests each pair of points and track for CSR, for each day, and it is working fine right now. Unfortunately I have not yet managed to create a report (or anything report-like) from the results, or even to fully understand them; I'll keep working on that. Of course I welcome any advice you have, since this is my first try with R and many newbie mistakes will happen.
The script and the shp files with the updated folder structure can be found here (113 KB).
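One possible shape for the summary asked for in the question ("x tracks analyzed, a grouped, b dispersed and c unknown") is sketched below in plain Python. Everything here is an assumption for illustration: the observed K values and the lower and upper envelope bounds are taken to have been exported per track as three equal-length sequences, and the names classify, summarise, obs, lo and hi are made up for this sketch. Reading "observed above the upper bound" as grouping and "below the lower bound" as dispersion is the usual informal interpretation; with pointwise envelopes and only a few simulations it is a rough reading, not a formal test.

```python
from collections import Counter


def classify(obs, lo, hi):
    """Label one track from observed K values and envelope bounds.

    obs, lo and hi are equal-length sequences evaluated at the same r values.
    """
    above = any(o > h for o, h in zip(obs, hi))
    below = any(o < l for o, l in zip(obs, lo))
    if above and not below:
        return "grouped"
    if below and not above:
        return "dispersed"
    return "unknown"  # inside the envelope everywhere, or contradictory


def summarise(tracks):
    """tracks maps a track name to an (obs, lo, hi) triple."""
    counts = Counter(classify(*t) for t in tracks.values())
    return {"analyzed": len(tracks),
            "grouped": counts["grouped"],
            "dispersed": counts["dispersed"],
            "unknown": counts["unknown"]}


# Made-up numbers for three hypothetical tracks.
example = {
    "t1": ([1, 5, 3], [0, 1, 2], [2, 3, 4]),
    "t2": ([1, 0, 3], [0, 1, 2], [2, 3, 4]),
    "t3": ([1, 2, 3], [0, 1, 2], [2, 3, 4]),
}
report = summarise(example)
```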

Shortest grid connection wire length

I am constructing a microgrid, so I have to connect 12 or so houses to a central solar power source. Wiring is a major cost here, so I'm trying to come up with a configuration that minimizes wire length. It's similar but not exactly like a traveling salesman problem, because multiple wires can come out of the same source -- if it were a single wire/path, it would be exactly like the TSP.
So my question is:
Does anyone know of an algorithm to determine the shortest way to connect all points, where there is a central point that can connect an indeterminate number of the surrounding points? The final solution should resemble a graph in which n-1 nodes have maximum two edges connecting them and one may have up to n-1 edges? Specifically, is there a way to do this in R?
EDIT TO SHOW CODE/EFFORT
I've solved it in a relatively simple way assuming a single path. And I've solved it assuming every house is connected directly to the power source. Here is that code:
############ interested users Wire CalculationBommekalla Analysis
require(png)
require(grid)
require(arm)
require(DCluster)
require(ggplot2)
setwd("C:/Users/Lucas/Documents/India2014_2015/ADATS Docs/BoomekallaAnalysis")
data= read.csv("BommekallInterestedUsers.csv")
summary(data)
names(data)
data$ind = c(1:nrow(data))
##### Analysis
# Shortest Path
distFrame = data[,c("Lon.Deg", "Lat.Deg")]
dists= as.matrix(dist(distFrame, upper=TRUE))
diag(dists)=1000
current= which(data.WO$ind==11) # sushilemma
ord = rep(current,length=nrow(data.WO))
dists[,current]=1000
for (j in c(1:(nrow(data.WO)-1))){
  current= which(dists[current,]==min(dists[current,]))
  dists[,current]=1000
  ord[j+1] = current
}
# line calculation
firstHouses= data.WO[ord,]
secondHouses= data.WO[c(ord[-1],NA),]
lines = data.frame(lonA = firstHouses$Lon.Deg,
                   latA = firstHouses$Lat.Deg,
                   lonB = secondHouses$Lon.Deg,
                   latB = secondHouses$Lat.Deg)
lines= na.omit(lines)
# Spider web -- completely connected to source
ccLines = data.frame(latA = data$Lat.Deg[data$Name=="Sushilemma"],
                     latB = data$Lat.Deg,
                     lonA = data$Lon.Deg[data$Name=="Sushilemma"],
                     lonB = data$Lon.Deg)
# Haversine Distance -- atanh_trans() is arctan
linesRads=lines*pi/180
a= with(linesRads, sin((latB-latA)/2)^2+
cos(latB)*cos(latA)*sin((lonB-lonA)/2)^2)
c= 2*asin(pmin(1,sqrt(a)))
lines$distance=6371*c*1000
totalDistance = sum(lines$distance)
totalCost = totalDistance*15
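The two layouts tried above (a single nearest-neighbour path, and a "spider web" with every house wired straight to the source) can be compared with a short plain-Python sketch. It is an illustration under assumptions: points are (lon, lat) pairs in degrees, the function names are made up, and, unlike the R code, the nearest neighbour is chosen by haversine distance rather than by Euclidean distance in degrees. Neither layout is claimed to be the shortest possible wiring.

```python
import math

EARTH_RADIUS_M = 6371000.0


def haversine_m(a, b):
    """Great-circle distance in metres between two (lon, lat) pairs in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(h)))


def greedy_path_length(points, start):
    """Total length of a path that always walks to the nearest unvisited point."""
    unvisited = set(range(len(points))) - {start}
    current, total = start, 0.0
    while unvisited:
        nxt = min(unvisited, key=lambda j: haversine_m(points[current], points[j]))
        total += haversine_m(points[current], points[nxt])
        unvisited.remove(nxt)
        current = nxt
    return total


def star_length(points, source):
    """Total length when every point is wired directly to the source."""
    return sum(haversine_m(points[source], p)
               for i, p in enumerate(points) if i != source)


# Three hypothetical houses on the equator, 0.001 degrees apart.
houses = [(0.0, 0.0), (0.001, 0.0), (0.002, 0.0)]
path_m = greedy_path_length(houses, start=0)
star_m = star_length(houses, source=0)
```

Multiplying either total by the per-metre cost gives the figure that totalCost computes in the R code.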

Get summary vectors of raster cell centers in R

I want to extract summary vectors that contain the coordinates for the centers of the different cells in a raster. The following code works but I believe involves an n-squared comparison operation. Is there a more efficient method? Not seeing anything obvious in {raster}'s guidance.
require(raster)
r = raster(volcano)
pts = rasterToPoints(r)
x_centroids = unique(pts[,1])
y_centroids = unique(pts[,2])
To get the centers of the raster cells, you should use the functions xFromCol, yFromRow and friends (see also the help pages)
In this case, you get exactly the same result as follows:
require(raster)
r <- raster(volcano)
x_centers <- xFromCol(r)
y_centers <- yFromRow(r)
Note that these functions don't do much more than read the extent and the resolution of the raster. From these values, they calculate the sequence of centers as follows (row 1 is the top row, so the y centers count down from ymax):
xmin(r) + (seq_len(ncol(r)) - 0.5) * xres(r)
ymax(r) - (seq_len(nrow(r)) - 0.5) * yres(r)
But you had better use the functions mentioned above, as they do a few more safety checks.
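The arithmetic behind the cell centers can be checked with a small plain-Python sketch. The function name and its arguments are made up for illustration; it assumes a regular grid whose first row is the top row, so x centers count up from the left edge and y centers count down from the top edge.

```python
def cell_centers(xmin, ymax, ncol, nrow, xres, yres):
    """Return (x centers, y centers) for a regular grid.

    Columns run left to right from xmin; rows run top to bottom from ymax.
    """
    xs = [xmin + (c + 0.5) * xres for c in range(ncol)]
    ys = [ymax - (r + 0.5) * yres for r in range(nrow)]
    return xs, ys


# A hypothetical 2-row, 4-column grid covering the unit square.
xs, ys = cell_centers(xmin=0.0, ymax=1.0, ncol=4, nrow=2, xres=0.25, yres=0.5)
```

This costs one pass over the columns and one over the rows, instead of materialising every cell as a point and deduplicating.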

Use Rcartogram on a SpatialPolygonsDataFrame object

I'm trying to do the same thing asked in this question, Cartogram + choropleth map in R, but starting from a SpatialPolygonsDataFrame and hoping to end up with the same type of object.
I could save the object as a shapefile, use scapetoad, reopen it and convert back, but I'd rather have it all within R so that the procedure is fully reproducible, and so that I can code dozens of variations automatically.
I've forked the Rcartogram code on github and added my efforts so far here.
Essentially what this demo does is create a SpatialGrid over the map, look up the population density at each point of the grid and convert this to a density matrix in the format required for cartogram() to work on. So far so good.
But, how to interpolate the original map points based on the output of cartogram()?
There are two problems here. The first is to get the map and grid into the same units to allow interpolation. The second is to access every point of every polygon, interpolate it, and keep them all in right order.
The grid is in grid units and the map is in projected units (in the case of the example longlat). Either the grid must be projected into longlat, or the map into grid units. My thought is to make a fake CRS and use this along with the spTransform() function in package(rgdal), since this handles every point in the object with minimal fuss.
Accessing every point is difficult because they are several layers down into the SpPDF object: object>polygons>Polygons>lines>coords I think. Any ideas how to access these while keeping the structure of the overall map intact?
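The "visit every point but keep the structure" part of the problem can be sketched generically in plain Python. This is not sp code: nested lists stand in for the layers of the object, each coordinate is an (x, y) tuple, and map_coords and the shift function are hypothetical names for this illustration. The recursion rebuilds each level in order, so polygons and rings stay where they were.

```python
def map_coords(geometry, fn):
    """Apply fn(x, y) to every coordinate tuple, preserving nesting and order."""
    if isinstance(geometry, tuple):
        return fn(*geometry)
    return [map_coords(part, fn) for part in geometry]


# Two hypothetical features: the first has one ring, the second has two.
features = [
    [[(0, 0), (1, 0), (1, 1)]],
    [[(2, 2), (3, 2), (3, 3)], [(5, 5), (6, 5), (6, 6)]],
]


def shift(x, y):
    """Stand-in for the real interpolation: double x and move y up by one."""
    return (x * 2, y + 1)


moved = map_coords(features, shift)
```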
This problem can be solved with the getcartr package, available on Chris Brunsdon's GitHub, as beautifully explicated in this blog post.
The quick.carto function does exactly what you want -- takes a SpatialPolygonsDataFrame as input and has a SpatialPolygonsDataFrame as output.
Reproducing the essence of the example in the blog post here in case the link goes dead, with my own style mixed in & typos fixed:
(Shapefile; World Bank population data)
library(getcartr)
library(maptools)
library(data.table)
world <- readShapePoly("TM_WORLD_BORDERS-0.3.shp")
#I use data.table, see blog post if you want a base approach;
#  data.table wonks may be struck by the following step as seeming odd;
#  see here: http://stackoverflow.com/questions/32380338
#  and here: https://github.com/Rdatatable/data.table/issues/1310
#  for some background on what's going on.
world@data <- setDT(world@data)
world.pop <- fread("sp.pop.totl_Indicator_en_csv_v2.csv",
                   select = c("Country Code", "2013"),
                   col.names = c("ISO3", "pop"))
world@data[world.pop, Population := as.numeric(i.pop), on = "ISO3"]
#calling quick.carto has internal calls to the
#  necessary functions from Rcartogram
world.carto <- quick.carto(world, world$Population, blur = 0)
#plotting with a color scale
x <- world@data[!is.na(Population), log10(Population)]
ramp <- colorRampPalette(c("navy", "deepskyblue"))(21L)
xseq <- seq(from = min(x), to = max(x), length.out = 21L)
#annoying to deal with NAs...
cols <- ramp[sapply(x, function(y)
  if (length(z <- which.min(abs(xseq - y)))) z else NA)]
plot(world.carto, col = cols,
     main = paste0("Cartogram of the World's",
                   " Population by Country (2013)"))
