I want to assess the degree of spatial proximity of each point to other equivalent points by looking at the number of others within 400m (5 minute walk).
I have some points on a map.
I can draw a simple 400 m buffer around them.
I want to determine which buffers overlap and then count the number of overlaps.
This number of overlaps should relate back to the original point so I can see which point has the highest number of overlaps and therefore if I were to walk 400 m from that point I could determine how many other points I could get to.
I've asked this question in GIS overflow, but I'm not sure it's going to get answered for ArcGIS and I think I'd prefer to do the work in R.
This is what I'm aiming for
To simplify here's some code
# load packages
"sp" ,"rgisws","viridis","ggthemes","scales","tidyverse","lubridate","phecharts","stringr")
## read in csv data; first column is assumed to be Easting and second Northing
coords = c("Easting","Northing"),crs = 27700)
That should show the overlapping buffers around the polling stations.
What I'd like to do is count the number of overlaps which is done here:
And this is the bit I'm not sure about, to get to the output I want (which will show "Hotspots" of polling stations in this case) how do I colour things? How can I :
assess the degree of spatial proximity of each point to other equivalent points by looking at the number of others within 400m (5 minute walk).
It's probably terribly bad form but here's my original GIS question
this gives the intersections different colours which is great.
plot(polls_intersection$geometry,col = sf.colors(categorical = TRUE, alpha = .5))
What am I colouring here? I mean it looks nice but I really don't know what I'm doing.

How can I: assess the degree of spatial proximity of each point to other equivalent points by looking at the number of others within 400m (5 minute walk).
Here is how to add a column to your initial sf object of polling stations that tells you how many polling stations are within 400m of each feature in that object.
Note that the minimum value is 1 because a polling station is always within 400m of itself.
# n_neighbors shows how many polling stations are within 400m
polls %>%
mutate(n_neighbors = lengths(st_is_within_distance(polls, dist = 400)))
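As a cross-check on what n_neighbors means, the same kind of count can be reproduced in base R from a plain Euclidean distance matrix. This is only a sketch: the coordinates are made up, and it assumes projected coordinates in metres (as with EPSG:27700), so it is not a replacement for the sf call above.

```r
# Toy projected coordinates in metres (hypothetical, not the polling data)
xy <- cbind(
  Easting  = c(0, 300, 600, 2000),
  Northing = c(0,   0,   0,    0)
)

# Full pairwise Euclidean distance matrix
d <- as.matrix(dist(xy))

# Count the points within 400 m of each point. The diagonal is 0, so every
# point counts itself, which is why the minimum value is 1.
n_neighbors <- rowSums(d <= 400)
n_neighbors
```

A full distance matrix grows with the square of the number of points, so this is only practical for modest datasets.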
Similarly, for your sfc collection of intersecting polygons, you could add a column that counts the number of buffer polygons that contain each intersection polygon:
polls_intersection %>%
mutate(n_overlaps = lengths(st_within(geometry, polls_buffer_400)))
And this is the bit I'm not sure about, to get to the output I want (which will show "Hotspots" of polling stations in this case) how do I colour things?
If you want to plot these things I highly recommend using ggplot2. It makes it very clear how you associate an attribute like colour with a specific variable.
For example, the following maps the alpha (transparency) of each polygon to a scaled version of the n_overlaps column:
polls_intersection %>%
mutate(n_overlaps = lengths(st_covered_by(geometry, polls_buffer_400))) %>%
ggplot() +
geom_sf(aes(alpha = 0.2*n_overlaps), fill = "red")
Lastly, there should be a better way to generate your intersecting polygons that already counts overlaps. This is built in to the st_intersection function for finding intersections of sfc objects with themselves.
However, your data in particular generates an error when you try to do this:
#> Error in CPL_nary_intersection(x) :
#> Evaluation error: TopologyException: side location conflict at 315321.69159061194 199694.6971799387.
I don't know what a "side location conflict" is. Maybe @edzer could help with that. However, most subsets of your data do not contain that conflict. For example:
# this version adds an n.overlaps column automatically:
st_intersection(polls_buffer_400[1:10,]) %>%
ggplot() + geom_sf(aes(alpha = 0.2*n.overlaps), fill = "red")


Finding nearest neighbours in time and space in R using sf

I have a made up dataset of polling stations in Wales and I've attached a date column to it. We can imagine this date is the date this polling station was visited to check the facilities (for example).
What I'd like to work out is:
whether geographic points are within a certain distance of each other. This I've managed by self-joining and using st_buffer and st_within to find points within 1000 m, and then counting the number of neighbours.
the interval between the sample dates. This I'm having a bit of a problem with.
What I'd like to do, I think, is
for each polling station
calculate the number of neighbours (so far so easy)
for each neighbour determine the interval between the sampling dates
return a spatial object (for plotting in tmaps probably)
Here's some test code that I've got that generates the sf dataset, calculates the number of neighbours and returns that.
It's really the date interval that's stumping me. It's not so much the calculation of the date interval but it's the way to generate these clusters of polling stations with date intervals.
Is it better to generate the (in this case) 108 polling station clusters?
What I'm trying to do in my larger dataset is calculate clusters of points over time.
I have ~2000 records with a date. I'd like to say :
for each of these 2000 records calculate the number of neighbours within a distance and within a timeframe.
I think it's probably better to
calculate each cluster of neighbouring points and visualise
remove neighbours from the cluster that are outside of the time frame and visualise that
Although, on typing this, I wonder if excluding points that didn't fall within a timeframe first and then calculating neighbours would be more efficient?
coords = c("Easting","Northing"),crs = 27700)%>%
mutate(date = sample(seq(as.Date('2020/01/01'), as.Date('2020/05/31'), by="day"), 147))
rename(number_of_neighbours = n)%>%
mutate(interval_date = date.x-date.y)%>%
subset(select = -c(6:8,10,11,13:18))## removing this comment will summarise the data so that only number of neighbours is returned %>%
filter(number_of_neighbours >=2)
I think it might be as simple as
tm_shape(test_stack)+tm_dots(col = "number_of_neighbours", clustering =T, size = 0.5)
I'm not sure how clustering works in leaflet, but that works quite nicely on this test data.
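One way to think about "neighbours within a distance and within a timeframe" is as two pairwise matrices combined with a logical AND. The sketch below uses base R only, with made-up coordinates and dates; the thresholds max_dist and max_days are hypothetical names, and projected coordinates in metres are assumed.

```r
# Hypothetical projected coordinates (metres) and visit dates
pts <- data.frame(
  x    = c(0, 500, 900, 5000, 5200, 9000),
  y    = c(0,   0,   0,    0,    0,    0),
  date = as.Date("2020-01-01") + c(0, 3, 40, 10, 12, 5)
)

max_dist <- 1000  # metres
max_days <- 14    # days

# Pairwise separation in space and absolute gap in time (in days)
d_space <- as.matrix(dist(pts[, c("x", "y")]))
d_time  <- abs(outer(as.numeric(pts$date), as.numeric(pts$date), "-"))

# A pair counts as space-time neighbours only if it is close in both senses
is_nb <- d_space <= max_dist & d_time <= max_days
diag(is_nb) <- FALSE  # a point is not its own neighbour here

pts$n_space      <- rowSums(d_space <= max_dist) - 1
pts$n_space_time <- rowSums(is_nb)
pts
```

Because the two conditions are simply ANDed, filtering by time first or by distance first gives the same neighbour sets; the order only affects how much work is done.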

how to plot specific segment from density.lpp

I use density.lpp for kernel density estimation. I want to pick a specific segment and plot the estimate along it. As an example, I have a road made up of two segments. The segments have different lengths, so I don't know how many pieces each of them is divided into.
here is the locations of vertices and road segment ids.
here is the code I used to create spatial lines data frame and random points on the network and get density estimation.
Is there a way to know how many pieces each segment is divided into? Or, if I want to plot location against estimate for a chosen segment, how can I do that? Using dimyx=100 created 199 estimation points, but I don't know how many of them belong to Swid=1 and how many to Swid=2.
One approach I tried was gDistance. It works fine in this problem because the segments connect in one direction. However, at a four-way junction, some of the lambda values get attached to a segment they do not belong to. I provided a picture and circled 2 points; when I used gDistance, those points were attached to other segments. Any ideas?
R2.1=dplyr::select(R, X01,Y01,Swid)
coordinates(R2.1) = c("X01", "Y01")
proj4string(R2.1)=CRS("+proj=utm +zone=17 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0")
plot(R2.1,main="nodes on the road")
LineXX <- lapply(split(R2.1, R2.1$Swid), function(x) Lines(list(Line(coordinates(x))), x$Swid[1L]))
linesXY <- SpatialLines(LineXX)
data <- data.frame(Swid = unique(R2.1$Swid))
rownames(data) <- data$Swid
lxy <- SpatialLinesDataFrame(linesXY, data)
Rand1=runiflpp(250, W.1)
W2=owin(xrange=c(142751.98, 214311.26), yrange=c(3353111, 3399329))
Trpp=ppp(x=Rand1XY$x, y=Rand1XY$y, window=W2) ### planar point object
L.orig=lpp(Trpp,W.1) # discrete
plot(L.orig,main="Original with accidents")
S1=bw.scott(L.orig)[1] # in case to change bandwitdh
[1] 199
This is a question about the spatstat package.
The result of density.lpp is an object of class linim. For any such object, you can use as.data.frame to extract the data. This yields a data frame with one row for each sample point on the network. For each sample point, the data are xc, yc (coordinates of nearest pixel centre), x, y (exact coordinates of sample point on network), seg (identifier of segment), tp (relative position along segment) and values (the density value). If you split the data frame by the seg column, you will get the data for individual segments of the network.
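The splitting step can be sketched with a stand-in data frame that has the seg, tp and values columns listed above. The numbers are invented; in practice the data frame would be extracted from the linim object rather than typed in.

```r
# Stand-in for the extracted data frame (toy values only)
df <- data.frame(
  seg    = c(1, 1, 1, 2, 2),
  tp     = c(0.50, 0.10, 0.90, 0.75, 0.25),
  values = c(0.020, 0.010, 0.015, 0.030, 0.040)
)

# One data frame per network segment
by_seg <- split(df, df$seg)

# Number of sample points that fall on each segment
n_per_seg <- vapply(by_seg, nrow, integer(1))
n_per_seg

# Density along segment 1, ordered by relative position on the segment
seg1 <- by_seg[["1"]]
seg1 <- seg1[order(seg1$tp), ]
plot(seg1$tp, seg1$values, type = "b",
     xlab = "relative position along segment (tp)",
     ylab = "estimated density")
```

Multiplying tp by the segment length would turn the horizontal axis into a distance along the segment, if the lengths are known.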
However, it seems that you may want information about the internal workings of density.lpp. In order to achieve adequate accuracy during the computation phase, density.lpp subdivides each network segment into many short segments (using a complex set of rules). This information is lost when the final results are discretised into a linim object and returned. The attribute "dx" reports the length of the short segments that were used in the computation phase, but that's all.
If you email me directly I can show you how to extract the internal information.

Highest concentration area

I am trying to find the circle of radius 300 metres that gathers the highest total amount, and I am looking for the coordinates of its centre. Note that the centre of that circle need not be one of the points in the observations data frame.
I have the following data:
observations <- spatialrisk::insurance %>%
dplyr::select(amount, lon, lat)
The function spatialrisk::concentration determines the concentration for all target points (i.e. sub):
spatialrisk::concentration(sub = observations,
full = observations,
value = amount, radius = 300)
The function is written in C++ (Rcpp), and is therefore fast. However, the approach is not 'smart'.
Any ideas for a faster solution with the raster (or velox) package? Or with a kernel density approach.
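For reference, a brute-force baseline is to evaluate the concentration on a regular grid of candidate centres, so the best centre is not restricted to observed points. The sketch below is base R with invented observations; the helper haversine_m and the grid spacing are assumptions for illustration, not part of spatialrisk, and the answer is only as fine as the grid.

```r
# Great-circle distance in metres (haversine formula, spherical Earth)
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Toy observations (hypothetical values)
obs <- data.frame(
  lon    = c(4.9000, 4.9020, 4.9040, 4.9500),
  lat    = c(52.3700, 52.3700, 52.3700, 52.3700),
  amount = c(100, 200, 150, 400)
)

# Candidate centres: a regular grid over the (slightly padded) bounding box
grid <- expand.grid(
  lon = seq(min(obs$lon), max(obs$lon), by = 0.0005),
  lat = seq(min(obs$lat) - 0.002, max(obs$lat) + 0.002, by = 0.0005)
)

# Sum of amounts within 300 m of each candidate centre
grid$concentration <- vapply(seq_len(nrow(grid)), function(i) {
  d <- haversine_m(grid$lon[i], grid$lat[i], obs$lon, obs$lat)
  sum(obs$amount[d <= 300])
}, numeric(1))

# First grid cell that attains the highest concentration
best <- grid[which.max(grid$concentration), ]
best
```

In this toy data the three nearby points together outweigh the single largest point, so the best centre lies between them rather than on the 400 observation.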

R: How do I loop through spatial points with a specific buffer?

So my problem is quite difficult to describe so I hope I can make my question as clear as possible.
I use the rLiDAR package to load a .las file into R and afterwards convert it into a SpatialPointsDataFrame using the sp package.
So my SpatialPointsDataFrame is quite dense.
Now I want to define a buffer of 0.5 meters and iterate with it through the points, always choosing the point with the highest Z value within the buffer as the next point to jump to. This should be repeated until there isn't any point within the buffer with a higher Z value than the current one. All values (or perhaps just the X and Y values) of this "found" point should then be written into a list/dataframe, and the process should be repeated until all such highest points are found.
That's the code I've got so far:
PointCloud <- data.frame(rLAS)
coordinates(PointCloud) <- c("X", "Y")
Well I googled extensively but I could not find any clues how to proceed further...
I don't even know which packages could be of help. I guess perhaps spatstat, as my question would probably fall under spatial point pattern analysis.
Does anyone have some ideas how to achieve something like that in R? Or is something like that not possible? (Do I perhaps have to switch to Python to make something like this work?)
Help would gladly be appreciated.
If you want to get the set of points which are the local maxima within a 0.5m radius circle around each point, this should work. The gist of it is:
Convert the LAS points to a SpatialPointsDataFrame
Create a buffered polygon set with overlapping polygons
Loop through all buffered polygons and find the desired element within the buffer -- in your case, it's the one with the maximum height.
Code below:
rLAS <- readLAS("Test.las",short=FALSE)
PointCloud <- data.frame(rLAS)
coordinates(PointCloud) <- c("X", "Y")
Finish creating the SpatialPointsDataFrame from the LAS source. I'm assuming the field with the point height is PointCloud$value
# coordinates<- has already promoted PointCloud to a SpatialPointsDataFrame
pointCloudSpdf <- PointCloud
Use rgeos library for intersection. It's important to have byid=TRUE or the polygons will get merged where they intersect
bufferedPoints <- gBuffer(pointCloudSpdf,width=0.5,byid=TRUE)
# Save our local maxima state (this will be updated)
localMaxes <- rep(FALSE,nrow(PointCloud))
for (i in seq_len(nrow(bufferedPoints@data))) {
  # Wrap the i-th buffer as a stand-alone SpatialPolygons object
  bufPolygons <- bufferedPoints@polygons[[i]]
  bufSpPolygons <- SpatialPolygons(list(bufPolygons),
                                   proj4string = pointCloudSpdf@proj4string)
  # Indices of the points that fall inside this buffer
  ptsInBuffer <- which(!is.na(over(pointCloudSpdf, bufSpPolygons)))
  # I'm assuming `value` is the field name containing the point height
  localMax <- ptsInBuffer[which.max(pointCloudSpdf@data$value[ptsInBuffer])]
  localMaxes[localMax] <- TRUE
}
localMaxPointCloudDf <- pointCloudSpdf@data[localMaxes, ]
Now localMaxPointCloudDf should contain the data from the original points if they are a local maximum. Just a warning: this isn't going to be super fast if you have a lot of points. If that ends up being a concern, you can be smarter by pre-filtering your points using a small grid, with rasterize and extract from the raster package.
That would look something like this:
Make the cell size small enough so that each 0.5m buffer will intersect at least 4 raster cells -- err on smaller since we are comparing circles to squares.
ext <- extent(pointCloudSpdf)
# Parentheses matter: divide the full span, not just the minimum, by the cell size
numRows <- ceiling((ext@ymax - ext@ymin) / 0.2)
numCols <- ceiling((ext@xmax - ext@xmin) / 0.2)
emptyRaster <- raster(ext, nrows = numRows, ncols = numCols)
rasterize will create a grid with the maximum value of the given field within a cell. Because of the square/circle mismatch this is only a starting point to filter out obvious non-maxima. After this we will have a raster in which all the local maxima are represented by cells. However, we won't know which cells are maxima in the 0.5m radius and we don't know which point in the original feature layer they came from.
r <- rasterize(pointCloudSpdf,emptyRaster,"value",fun="max")
extract will give us raster values (i.e., the highest value for each cell) that each point intersects. Recall from above that all the local maxima will be in this set, although some values will not be 0.5m radius local maxima.
rasterMaxes <- extract(r,pointCloudSpdf)
To match up the original points with the raster maxes, just subtract the raster value at each point from that point's value. If the value is 0, then the values are the same and we have a point with a potential maximum. Note that at this point we are only merging the points back to the raster -- we will have to throw some of these out because they are "under" a 0.5m radius with a higher local max even though they are the max in their 0.2m x 0.2m cell.
potentialMaxima <- which(pointCloudSpdf@data$value - rasterMaxes == 0)
Next, just subset the original SpatialPointsDataFrame and we'll do the more exhaustive and accurate iteration over this subset of points since we should have thrown out a bunch of points which could not have been maxima.
potentialMaximaCoords <- coordinates(pointCloudSpdf)[potentialMaxima, , drop = FALSE]
# drop = FALSE keeps a data.frame even when there is only one column
potentialMaximaDf <- pointCloudSpdf@data[potentialMaxima, , drop = FALSE]
potentialMaximaSpdf <- SpatialPointsDataFrame(potentialMaximaCoords, potentialMaximaDf, match.ID = FALSE)
The rest of the algorithm is the same but we are buffering the smaller dataset and iterating over it:
bufferedPoints <- gBuffer(potentialMaximaSpdf, width=0.5, byid=TRUE)
# Save our local maxima state (this will be updated)
localMaxes <- rep(FALSE, nrow(PointCloud))
for (i in seq_len(nrow(bufferedPoints@data))) {
  bufPolygons <- bufferedPoints@polygons[[i]]
  bufSpPolygons <- SpatialPolygons(list(bufPolygons),
                                   proj4string = pointCloudSpdf@proj4string)
  # Each candidate's buffer is still checked against all of the original points
  ptsInBuffer <- which(!is.na(over(pointCloudSpdf, bufSpPolygons)))
  localMax <- ptsInBuffer[which.max(pointCloudSpdf@data$value[ptsInBuffer])]
  localMaxes[localMax] <- TRUE
}
localMaxPointCloudDf <- pointCloudSpdf@data[localMaxes, ]
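For a small point set, the result can be sanity-checked against a brute-force version in base R. This sketch uses invented coordinates and heights, and it assumes one particular definition: a point is a local maximum if nothing within 0.5 m of it (itself included) is higher.

```r
# Toy point cloud (hypothetical coordinates in metres and heights)
pc <- data.frame(
  X = c(0.0, 0.3, 0.4, 2.0, 2.2),
  Y = c(0.0, 0.0, 0.2, 0.0, 0.0),
  Z = c(5.0, 7.0, 6.0, 3.0, 4.0)
)

radius <- 0.5
d <- as.matrix(dist(pc[, c("X", "Y")]))

# For each point, compare its height with the highest point in its buffer
is_local_max <- vapply(seq_len(nrow(pc)), function(i) {
  nb <- which(d[i, ] <= radius)
  pc$Z[i] >= max(pc$Z[nb])
}, logical(1))

local_maxima <- pc[is_local_max, ]
local_maxima
```

The distance matrix needs memory proportional to the square of the number of points, so this is a check for small samples, not something to run on a dense LiDAR cloud.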

options to allow heavily-weighted points on a map to overwhelm other points with low weights

what are some good kriging/interpolation idea/options that will allow heavily-weighted points to bleed over lightly-weighted points on a plotted R map?
the state of connecticut has eight counties. i found the centroid and want to plot poverty rates of each of these eight counties. three of the counties are very populated (about 1 million people) and the other five counties are sparsely populated (about 100,000 people). since the three densely-populated counties have more than 90% of the total state population, i would like those the three densely-populated counties to completely "overwhelm" the map and impact other points across the county borders.
the Krig function in the R fields package has a lot of parameters and also covariance functions that can be called, but i'm not sure where to start?
here is reproducible code to quickly produce a hard-bordered map and then three differently-weighted maps. hopefully i can just make changes to this code, but perhaps it requires something more complex like the geoRglm package? two of the three weighted maps look almost identical, despite one being 10x as weighted as the other..
edit: here's a picture example of the behavior i want-
Disclaimer: I am not an expert on kriging. Kriging is complex and takes a good understanding of the underlying data, the method and the purpose to achieve the correct result. You may wish to try to get input from @whuber on the GIS Stack Exchange (or contact him through his website), or from another expert you know.
That said, if you just want to achieve the visual effect you requested and are not using this for some sort of statistical analysis, I think there are some relatively simple solutions.
As you commented, though the suggestions below to use theta and smoothness arguments do even out the prediction surface, they apply equally to all measurements and thus do not extend the "sphere of influence" of more densely populated counties relative to less densely populated ones. After further consideration, I think there are two ways to achieve this: by altering the covariance function to depend on population density or by using weights, as you have. Your weighting approach, as I wrote below, alters the error term of the kriging function. That is, it inversely scales the nugget variance.
As you can see in the semivariogram image, the nugget is essentially the y-intercept, or the error between measurements at the same location. Weights affect the nugget variance (sigma2) as sigma2/weight. Thus, greater weights mean less error at small-scale distances. This does not, however, change the shape of the semivariance function or have much effect on the range or sill.
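The sigma2/weight relationship can be made concrete with a line of arithmetic. The numbers below are hypothetical and only illustrate the scaling; they are not taken from the poverty data or from a fitted Krig object.

```r
sigma2  <- 4               # hypothetical nugget (measurement error) variance
weights <- c(1, 10, 100)   # hypothetical observation weights

# Effective error variance attached to each observation
effective_nugget <- sigma2 / weights
effective_nugget
```

Going from a weight of 10 to 100 only moves an already small error variance closer to zero, which is consistent with two differently weighted maps looking almost identical.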
I think that the best solution would be to have your covariance function depend on population. However, I'm not sure how to accomplish that and I don't see any arguments to Krig to do so. I tried playing with defining my own covariance function as in the Krig example, but only got errors.
Sorry I couldn't help more!
Another great resource to help understand kriging is:
As I said in my comment, the sill and nugget values as well as the range of the semivariogram are things you can alter to affect the smoothing. By specifying weights in the call to Krig, you are altering the variance of the measurement errors. That is, in a normal use, weights are expected to be proportional to the accuracy of the measurement value so that higher weights represent more accurate measurements, essentially. This isn't actually true with your data, but it may be giving you the effect you desire.
To alter the way your data is interpolated, you can adjust two (and many more) parameters in the simple Krig call you are using: theta and smoothness. theta adjusts the semivariance range, meaning that measured points farther away contribute more to the estimates as you increase theta. Your data range is
range <- data.frame(lon=range($lon),lat=range($lat))
lon lat
2 1.383717 0.6300484
so, your measurement points vary by ~1.4 degrees lon and ~0.6 degrees lat. Thus, you can play with specifying your theta value in that range to see how that affects your result. In general, a larger theta leads to more smoothing since you are drawing from more values for each prediction.
Krig.output.wt <- Krig( cbind($lon,$lat) ,$county.poverty.rate ,
weights=c( size , 1 , 1 , 1 , 1 , size , size , 1 ),Covariance="Matern", theta=.8)
r <- interpolate(ras, Krig.output.wt)
r <- mask(r,
plot(r, col=colRamp(100) ,axes=FALSE,legend=FALSE)
title(main="Theta = 0.8", outer = FALSE)
text($lon,$lat-0.05,$NAME, cex=0.5)
Krig.output.wt <- Krig( cbind($lon,$lat) ,$county.poverty.rate ,
weights=c( size , 1 , 1 , 1 , 1 , size , size , 1 ),Covariance="Matern", theta=1.6)
r <- interpolate(ras, Krig.output.wt)
r <- mask(r,
plot(r, col=colRamp(100) ,axes=FALSE,legend=FALSE)
title(main="Theta = 1.6", outer = FALSE)
text($lon,$lat-0.05,$NAME, cex=0.5)
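To see why a larger theta smooths more, it helps to look at how quickly covariance decays with distance. The sketch below writes out the exponential covariance by hand (the Matern family reduces to it at smoothness 0.5) rather than calling fields; the separations in d are hypothetical.

```r
# Exponential covariance: d and theta share the same units (degrees here)
exp_cov <- function(d, theta) exp(-d / theta)

d <- c(0.1, 0.5, 1.0)  # hypothetical separations in degrees

cov_small_theta <- exp_cov(d, theta = 0.8)
cov_large_theta <- exp_cov(d, theta = 1.6)

round(rbind(theta_0.8 = cov_small_theta,
            theta_1.6 = cov_large_theta), 3)
```

At every separation the larger theta keeps more covariance, so distant counties contribute more to each prediction.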
The smoothness argument sets the smoothness parameter of the Matern covariance, which controls how smooth the predicted surface is. The default is 0.5, which corresponds to the exponential covariance.
Krig.output.wt <- Krig( cbind($lon,$lat) ,$county.poverty.rate ,
weights=c( size , 1 , 1 , 1 , 1 , size , size , 1 ),
Covariance="Matern", smoothness = 0.6)
r <- interpolate(ras, Krig.output.wt)
r <- mask(r,
plot(r, col=colRamp(100) ,axes=FALSE,legend=FALSE)
title(main="Theta unspecified; Smoothness = 0.6", outer = FALSE)
text($lon,$lat-0.05,$NAME, cex=0.5)
This should give you a start and some options, but you should look at the manual for fields. It is pretty well-written and explains the arguments well.
Also, if this is in any way quantitative, I would highly recommend talking to someone with significant spatial statistics know how!
Kriging is not what you want. (It is a statistical method for accurate--not distorted!--interpolation of data. It requires preliminary analysis of the data--of which you do not have anywhere near enough for this purpose--and cannot accomplish the desired map distortion.)
The example and the references to "bleed over" suggest considering an anamorph or area cartogram. This is a map which will expand and shrink the areas of the county polygons so that they reflect their relative population while retaining their shapes. The link (to the SE GIS site) explains and illustrates this idea. Although its answers are less than satisfying, a search of that site will reveal some effective solutions.
Lots of interesting comments and leads above.
I took a look at the Harvard dialect survey to get a sense for what you are trying to do first. I must say really cool maps. And before I start in on what I came up with...I've looked at your work on survey analysis before and have learned quite a few tricks. Thanks.
So my first take pretty quickly was that if you wanted to do spatial smoothing by way of kernel density estimation then you need to be thinking in terms of point process models. I'm sure there are other ways, but that's where I went.
So what I do below is grab a very generic US map and convert it into something I can use as a sampling window. Then I create random samples of points within that region, just pretend those are your centroids. After I attach random values to those points and plot it up.
I just wanted to test this conceptually, which is why I didn't go through the extra steps to grab CBSAs. Sorry also for not projecting, but I think these are the fundamentals. Oh, and the smoothing in the dialect study is being done over the whole country, I think. That is, the author is not stratifying his smoothing procedure within any sub-region. I just added states at the end.
# grab us map from R maps package
usMap <- map("usa")
usIds <- usMap$names
# convert to spatial polygons so this can be used as a window below
usMapPoly <- map2SpatialPolygons(usMap,IDs=usIds)
# just select us with no islands
usMapPoly <- usMapPoly[names(usMapPoly)=="main",]
# create a random sample of points on which to smooth over within the map
pts <- spsample(usMapPoly, n=250, type='random')
# just for a quick check of the map and sampling locations
# create values associated with points, be sure to play around with
# these after you get the map it's fun
vals <- rnorm(250, 100, 25)
valWeights <- vals / sum(vals)
ptsCords <- data.frame(pts@coords)
# create window for the point pattern object (ppp) created below
usWindow <- as.owin(usMapPoly)
# create spatial point pattern object
usPPP <- ppp(ptsCords$x,ptsCords$y,marks=vals,window=usWindow)
# create colour ramp
col <- colorRampPalette(brewer.pal(9,"Reds"))(20)
# the plots, here is where the Gaussian kernel density estimation magic happens
# if you want a continuous legend on one of the sides get rid of ribbon=FALSE
# and be sure to play around with sigma
example no weights:
example with my trivial weights
There is obviously a lot of work in between this and your goal of making this type of map reproducible at various levels of spatial aggregation and sample data, but good luck it seems like a cool project.
p.s. initially I did not use any weighting, but I suppose you could provide weights directly to the Smooth function. Two example maps above.
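To show what weights do in this kind of smoothing, here is a hand-written weighted Gaussian kernel average (a Nadaraya-Watson smoother) in base R. It is a sketch of the idea only, not spatstat's implementation; the function name nw_smooth and all the numbers are made up.

```r
# Weighted Gaussian kernel average of `vals` at location (x0, y0)
nw_smooth <- function(x0, y0, x, y, vals, w, sigma) {
  d2 <- (x - x0)^2 + (y - y0)^2
  k  <- w * exp(-d2 / (2 * sigma^2))  # weight times Gaussian kernel
  sum(k * vals) / sum(k)
}

# Two hypothetical centroids with different values
x    <- c(0, 1)
y    <- c(0, 0)
vals <- c(10, 20)

# Estimate halfway between them, first unweighted, then with the second
# point carrying nine times the weight of the first
equal_w    <- nw_smooth(0.5, 0, x, y, vals, w = c(1, 1), sigma = 0.5)
weighted_w <- nw_smooth(0.5, 0, x, y, vals, w = c(1, 9), sigma = 0.5)
c(equal = equal_w, weighted = weighted_w)
```

At the midpoint both kernels are equal, so the estimate is just the weighted mean of the two values; the heavier point pulls the surface toward itself, which is the "bleed over" effect the question asks about.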
