Create neighborhood list for a large dataset / speed it up - R

I want to create a weight matrix based on distance. My code currently looks as follows and works for a smaller sample of the data. However, with the large dataset (569,424 individuals in 24,077 locations) it doesn't go through. The problem arises at the nb2blocknb function. So my question is: how can I optimize my code for large datasets?
# load required package and all survey data
library(spdep)
DHS <- read.csv("Daten/final.csv")
attach(DHS)
# define coordinates matrix
coormat <- cbind(DHS$location, DHS$lon_s, DHS$lat_s)
coorm <- cbind(DHS$lon_s, DHS$lat_s)
colnames(coormat) <- c("location", "lon_s", "lat_s")
coo <- cbind(unique(coormat))
c <- as.data.frame(coo)
coor <- cbind(c$lon_s, c$lat_s)
# get a list of neighboring locations that are within 50 km of each other
neighbor <- dnearneigh(coor, d1 = 0, d2 = 50, row.names=c$location, longlat=TRUE, bound=c("GE", "LE"))
# get neighborhood list on individual level
nb <- nb2blocknb(neighbor, as.character(DHS$location))
# weight matrix in list format
nbweights.lw <- nb2listw(nb, style="B", zero.policy=TRUE)
Thanks a lot for your help!

You're trying to make about 1.3e10 distance calculations; the resulting matrix would run to gigabytes.
I think you'd want to limit either the maximum distance or the number of nearest neighbors you're looking for. Try nn2 from the RANN package:
library('RANN')
nearest_neighbours_w_distance <- nn2(coordinatesA, coordinatesB, 10)
note that this operation is not symmetric (Switching coordinatesA and coordinatesB gives different results).
Also, you would first have to convert your GPS coordinates to a coordinate reference system in which you can calculate Euclidean distances, for example UTM (code not tested):
library("sp")
gps2utm<-function(gps_coordinates_matrix,utmzone){
spdf<-SpatialPointsDataFrame(gps_coordinates_matrix[,1],gps_coordinates_matrix[,2])
proj4string(spdf) <- CRS("+proj=longlat +datum=WGS84")
return(spTransform(spdf, CRS(paste0("+proj=utm +zone=",utmzone," ellps=WGS84"))))
}
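For example, here is a minimal sketch of how the two pieces might fit together, assuming coor is the lon/lat matrix of unique locations from the question and that UTM zone 37 (a hypothetical choice) covers the study area:
library('RANN')
# project the unique location coordinates to UTM (zone 37 is a placeholder)
coor_utm <- coordinates(gps2utm(coor, 37))
# for each location, find the 10 nearest locations; querying a set against
# itself returns each point as its own first neighbor, so ask for k = 11
# and drop the first column
nn <- nn2(coor_utm, coor_utm, k = 11)
neighbor_idx  <- nn$nn.idx[, -1]   # indices of the 10 nearest locations
neighbor_dist <- nn$nn.dists[, -1] # distances in meters (UTM units)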


How to construct a spatial adjacency matrix in R

I am new to using shapefiles in R and I was wondering if you can help me get a better understanding.
I need to create a spatial adjacency matrix W so that I can build a spatial model. W is an n x n matrix where n is the number of area polygons. The diagonal entries are w_ii = 0 and the off-diagonal entries are w_ij = 1 if areas i and j share a common boundary and w_ij = 0 otherwise.
I know I would probably need to construct a contiguity matrix (I chose to use a queen neighborhood). But I am not sure how to further derive my spatial adjacency matrix from this.
#load relevant packages
library(sf)
library(tmap)
library(tmaptools)
library(dplyr)
library(spdep) # provides poly2nb
#import data
mydata <- read.csv("tobago_communities.csv")
#import shapefile
mymap <-st_read("C:/Users/ndook/OneDrive/Desktop/Tobago/2011_parish_data.shp", stringsAsFactors = FALSE)
#join data and shapefile into one dataframe
map_and_data <- inner_join(mymap, mydata, by = "TGOLOC_ID")
#generate map
tm_shape(map_and_data) + tm_polygons("Unemployment")
#specify queen neighborhood
queen_tobago.nb <- poly2nb(mymap)
So I'm assuming the queen neighborhood would somehow be relevant to getting the spatial adjacency matrix but I am stuck at this point. Any further suggestions would be greatly appreciated.
The poly2nb function does generate a neighborhood list. Note that poly2nb uses queen contiguity by default (queen=TRUE), so your call already gives you a queen neighborhood.
Some R packages expect a list representation of the spatial weights, others want a matrix form. The nb2listw function turns the neighborhood list into a list of spatial weights.
With the nb2mat function, you get the matrix representation you are probably looking for (https://rdrr.io/rforge/spdep/man/nb2mat.html).
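As a minimal sketch of that last step, assuming the queen_tobago.nb object from the question (binary style "B" gives the 0/1 entries described above):
library(spdep)
# binary (0/1) adjacency matrix from the neighborhood list
W <- nb2mat(queen_tobago.nb, style = "B", zero.policy = TRUE)
diag(W) # the diagonal is zero: a polygon is not its own neighbor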

Calculating the distance between points in R

I looked through the questions that have been asked about coordinates but couldn't find anything that helps with my problem.
I have a dataset that contains ID, Speed, Time, and a list of Latitude & Longitude values (the dataset can be found at the link below).
https://drive.google.com/file/d/1MJUvM5WEhua7Rt0lufCyugBdGSKaHMGZ/view?usp=sharing
I want to measure the distance between each point of Latitude & Longitude.
For example;
Latitude has: x1, x2, x3, ..., x1000
Longitude has: y1, y2, y3, ..., y1000
I want to measure the distance from (x1, y1) to all the other points, from (x2, y2) to all the other points, and so on.
The reason I'm doing this is to know which point is close to which, and to assign an index to each location based on distance.
For example, if (x1, y1) is close to (x4, y4), then (x1, y1) will get the index A and (x4, y4) will be labeled B, sorting the points in order based on distance.
I tried the gDistance function, but it showed the error message: "package ‘gDistance’ is not available (for R version 3.4.3)",
and if I change the version to 3.3, library(rgeos) won't work!
Any suggestions?
Here's what I tried:
#requiring necessary packages:
library(sp) # vector data
library(rgeos) # geometry ops
#Read the data and transform them to spatial objects
d <- read.csv("ReadyData.csv")
sp.ReadyData <- d
coordinates(sp.ReadyData) <- ~Longitude + Latitude
d <- gDistance(sp.ReadyData, byid = TRUE)
Here's an update to my solution. I created a spatial object and made a spatial data frame as follows:
#Create spatial object:
lonlat <- cbind(spatial$Longitude, spatial$Latitude)
#Create a SpatialPoints object:
library(sp)
pts <- SpatialPoints(lonlat)
crdref <- CRS('+proj=longlat +datum=WGS84')
pts <- SpatialPoints(lonlat, proj4string=crdref)
# make spatial data frame
ptsdf <- SpatialPointsDataFrame(pts, data=spatial)
Now I'm trying to measure the distance for longitude/latitude coordinates. I tried the dist method, but it doesn't seem to work for me, so I tried the pointDistance method:
gdis <- pointDistance(pts, lonlat=TRUE)
It's still not clear to me how this function measures the distance. I need the distances so I can locate the middle point and assign a number to each point based on its distance from that middle point.
You can use raster::pointDistance or geosphere::distm, among other functions.
Part of your example data (please avoid linking to data files in your questions):
d <- read.table(sep=",", text='
"OBU ID","Time Received","Speed","Latitude","Longitude"
"1",20,1479171686325,0,38.929596,-77.2478813
"2",20,1479171686341,0,38.929596,-77.2478813
"3",20,1479171698485,1.5,38.9295887,-77.2478945
"4",20,1479171704373,1,38.9295048,-77.247922
"5",20,1479171710373,0,38.9294865,-77.2479055
"6",20,1479171710373,0,38.9294865,-77.2479055
"7",20,1479171710373,0,38.9294865,-77.2479055
"8",20,1479171716373,2,38.9294773,-77.2478712
"9",20,1479171716374,2,38.9294773,-77.2478712
"10",20,1479171722373,1.32,38.9294773,-77.2477417')
Solution:
library(raster)
m <- pointDistance(d[, c("Longitude", "Latitude")], lonlat=TRUE)
To get the nearest point to each point, you can do
mm <- as.matrix(as.dist(m))
diag(mm) <- NA
i <- apply(mm, 1, which.min)
The point pairs
p <- cbind(1:nrow(mm), i)
To get the distances, you can do:
mm[p]
Or do this:
apply(mm, 1, min, na.rm=TRUE)
Note that rgeos::gDistance is for planar data, not for longitude/latitude data.
Here is a similar question/answer with some illustration.
Your data set is too large to make a single distance matrix. You can process your data in chunks to deal with that. Here I am showing that with a rather small chunk size of 4 rows; make this number much bigger to speed up processing time.
library(geosphere)
chunk <- 4 # rows per chunk
start <- seq(1, nrow(d), chunk)
end <- c(start[-1] - 1, nrow(d))
x <- d[, c("Longitude", "Latitude")]
r <- list()
for (i in 1:length(start)) {
  y <- x[start[i]:end[i], , drop=FALSE]
  m <- distm(y, x)
  # mask each point's distance to itself so which.min returns another point
  m[cbind(1:nrow(m), start[i]:end[i])] <- NA
  r[[i]] <- apply(m, 1, which.min)
}
r <- unlist(r)
r # index of the nearest other point, one entry per row of d
So for your data:
d <- read.csv("ReadyData.csv")
chunk <- 100 # rows
# etc
This will take a long time.
An alternative approach:
library(spdep)
x <- as.matrix(d[, c("Longitude", "Latitude")])
k <- as.vector(knearneigh(x, k=1, longlat=TRUE)$nn)
Assuming you have p1 as SpatialPoints of x and p2 as SpatialPoints of y, to get the index of the nearest point:
ReadyData$cloDist <- apply(gDistance(p1, p2, byid=TRUE), 1, which.min)
If you have the same coordinate twice in the list, you will get the index of the point itself, since the closest place to a point is itself. An easy trick to avoid that is to take the second smallest distance instead, with a quick helper function:
f_which.min <- function(vec, idx) sort(vec, index.return = TRUE)$ix[idx]
ReadyData$cloDist2 <- apply(gDistance(p1, p2, byid = TRUE), 1, f_which.min, idx = 2)

Create buffer around spatial point data in R and count how many points are in the buffer

I am trying to analyze the spatial density of gas station points using R. I need to create a buffer (circle) around the gas stations and count the number of gas stations within the buffer. I'll then need to play around with buffer distances to find a reasonable buffer that shows something interesting. These are the files I am working with: https://dl.dropboxusercontent.com/u/45095175/sbc_gas.shp; https://dl.dropboxusercontent.com/u/45095175/sbc_gas.shx; https://dl.dropboxusercontent.com/u/45095175/sbc_gas.dbf
# Load packages
x <- c("ggmap", "rgdal", "rgeos", "maptools", "ks")
lapply(x, library, character.only = TRUE)
all <- readShapePoints("sbc_gas.shp")
all.df <- as(all, "data.frame")
locs <- subset(all.df, select = c("OBJECTID", "Latitude", "Longitude"))
head(locs) # a simple data frame with coordinates
coordinates(locs) <- c("Longitude", "Latitude") # set spatial coordinates
plot(locs)
Any help greatly appreciated!!
We cannot use your data as provided because the .shp file alone is not enough. At minimum, you must also provide the .shx and the .dbf file to be able to load this data.
However, something that should work is to get the package geosphere. It contains a function called distGeo. You can use it to get the distance from each gas station to all other stations. From the distance matrix, you should be able to select all stations within a specified distance.
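As a minimal sketch of that idea (untested against your data), assuming locs is the SpatialPoints object built in the question and a hypothetical 5 km buffer radius:
library(geosphere)
# pairwise great-circle distances in meters between all stations
m <- distm(coordinates(locs), fun = distGeo)
# count stations within the buffer of each station,
# subtracting 1 so a station does not count itself
buffer_m <- 5000
locs$n_within <- rowSums(m <= buffer_m) - 1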
I found an answer to my question:
library(geosphere)
# number of points within 5 km
fivekm <- cbind(coordinates(locs), X = rowSums(distm(coordinates(locs)[, 1:2], fun = distHaversine) / 1000 <= 5))

In R, how to average spatial points data over spatial grid squares

Managed to solve the problem now.
I have a set of around 50 thousand points that have coordinates and one value associated with them. I would like to place the points into a grid, averaging the associated value of all points that fall into each grid square. So I want to end up with an object that identifies each grid square and gives the average inside it.
I have the data in a spatial points data frame and a spatial grid object, if that helps.
Improving the question: I have definitely done some searching; sorry about the initial state of the question. I had only managed to frame the question inside my own head and hadn't had to communicate it to anyone else before...
Here is example data that hopefully illustrates the problem more clearly
##make some data
library(sp)
longi <- runif(100, 0, 10)
lati <- runif(100, 0, 10)
value <- runif(100, 20, 30)
##put in data frame then change to spatial data frame
df <- data.frame("lon"=longi,"lat"=lati,"val"=value)
coordinates(df) <- c("lon","lat")
proj4string(df) <- CRS("+proj=longlat")
##create a grid that bounds the data
grd <- GridTopology(cellcentre.offset=bbox(df)[,1],
cellsize=c(1,1),cells.dim=c(11,11))
sg <- SpatialGrid(grd)
Then I hope to get an object, albeit a vector/data frame/list, that gives me the average of value in each grid cell/square and some way of identifying which cell it is.
Solution
##convert the grid into a polygon##
polys <- as.SpatialPolygons.GridTopology(grd)
proj4string(polys) <- CRS("+proj=longlat")
##can now use the function over to select the correct points and average them
results <- rep(0, length(polys))
for (i in 1:length(polys)) {
  results[i] <- mean(df$val[which(!is.na(over(x = df, y = polys[i])))])
}
My question now is whether this is the best way to do it, or is there a more efficient way?
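(As a more vectorized sketch of the same aggregation, assuming the df and polys objects built above: sp::over can apply an aggregation function directly, which avoids the explicit loop.)
# average val over the points falling in each grid polygon, in one call
results2 <- over(polys, df["val"], fn = mean)$val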
Your description is vague at best. Please try to ask a more specific question, preferably with code illustrating what you have already tried. Averaging a single value in your point data or a single raster cell makes absolutely no sense.
The best guess at an answer I can provide is to use raster extract() to assign the raster values to an sp point object and then use tapply() to aggregate the values to your grouping values in the points. You can use the coordinates of the points to identify cell location or, alternatively, the cell numbers returned from extract (per the example below).
require(raster)
require(sp)
# Create example data
r <- raster(ncol=500, nrow=500)
r[] <- runif(ncell(r))
pts <- sampleRandom(r, 100, sp=TRUE)
# Add a grouping value to points
pts@data <- data.frame(ID=rownames(pts@data), group=c(rep(1,25), rep(2,25),
                       rep(3,25), rep(4,25)))
# Extract raster values and add to @data slot dataframe. Note, the "cells"
# attribute indicates the cell index in the raster.
pts@data <- data.frame(pts@data, extract(r, pts, cellnumbers=TRUE))
head(pts@data)
# Use tapply to calculate group means
tapply(pts@data$layer, pts@data$group, FUN=mean)

Export R plot to shapefile

I am fairly new to R, but not to ArcView. I am plotting some two-mode data, and want to convert the plot to a shapefile. Specifically, I would like to convert the vertices and the edges, if possible, so that I can get the same plot to display in ArcView, along with the attributes.
I've installed the package "shapefiles", and I see the convert.to.shapefile command, but the help doesn't explain how to assign XY coordinates to the vertices.
Thank you,
Tim
OK, I'm making a couple of assumptions here, but I read the question as: you're looking to assign spatial coordinates to a bipartite graph and export the vertices as a point shapefile and the edges as a polyline shapefile for use in ArcGIS.
This solution is a little kludgey, but it will make shapefiles with coordinate limits xmin, ymin and xmax, ymax of -0.5 and +0.5. It will be up to you to decide on the graph layout algorithm (e.g. Kamada-Kawai), and to project the shapefiles into the desired coordinate system once they are in ArcGIS, as per @gsk3's suggestion. Additional attributes for the vertices and edges can be added where the points.data and edge.data data frames are created.
library(igraph)
library(shapefiles)
# Create dummy incidence matrix
inc <- matrix(sample(0:1, 15, repl=TRUE), 3, 5)
colnames(inc) <- c(1:5) # Person ID
rownames(inc) <- letters[1:3] # Event
# Create bipartite graph
g.bipartite <- graph.incidence(inc, mode="in", add.names=TRUE)
# Plot figure to get xy coordinates for vertices
tk <- tkplot(g.bipartite, canvas.width=500, canvas.height=500)
tkcoords <- tkplot.getcoords(tk, norm=TRUE) # Get coordinates of nodes centered on 0 with +/-0.5 for max and min values
# Create point shapefile for nodes
n.points <- nrow(tkcoords)
points.attr <- data.frame(Id=1:n.points, X=tkcoords[,1], Y=tkcoords[,2])
points.data <- data.frame(Id=points.attr$Id, Name=paste("Vertex", 1:n.points, sep=""))
points.shp <- convert.to.shapefile(points.attr, points.data, "Id", 1)
write.shapefile(points.shp, "~/Desktop/points", arcgis=TRUE)
# Create polylines for edges in this example from incidence matrix
n.edges <- sum(inc) # number of edges based on incidence matrix
Id <- rep(1:n.edges,each=2) # Generate Id number for edges.
From.nodes <- g.bipartite[[4]]+1 # Get position of "From" vertices in incidence matrix
To.nodes <- g.bipartite[[3]]-max(From.nodes)+1 # Get position of "To" vertices in incidence matrix
# Generate index where position alternates between "From.node" to "To.node"
node.index <- matrix(t(matrix(c(From.nodes, To.nodes), ncol=2)))
edge.attr <- data.frame(Id, X=tkcoords[node.index, 1], Y=tkcoords[node.index, 2])
edge.data <- data.frame(Id=1:n.edges, Name=paste("Edge", 1:n.edges, sep=""))
edge.shp <- convert.to.shapefile(edge.attr, edge.data, "Id", 3)
write.shapefile(edge.shp, "~/Desktop/edges", arcgis=TRUE)
Hope this helps.
I'm going to take a stab at this based on a wild guess as to what your data looks like.
Basically you'll want to coerce the data into a data.frame with two columns containing the x and y coordinates (or lat/long, or whatever).
library(sp)
data(meuse.grid)
class(meuse.grid)
coordinates(meuse.grid) <- ~x+y
class(meuse.grid)
Once you have it as a SpatialPointsDataFrame, sp and friends provide some decent functionality, including exporting shapefiles (writePointsShape is in the maptools package):
library(maptools)
writePointsShape(meuse.grid, "/home/myfiles/wherever/myshape.shp")
Relevant help files the examples are drawn from:
coordinates
SpatialPointsDataFrame
readShapePoints
At least a few years ago when I last used sp, it was great about projections but very bad about writing projection information to the shapefile. So it's best to leave the coordinates untransformed and manually tell Arc what projection they are in. Or use writeOGR rather than writePointsShape.
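For instance, a minimal sketch of the writeOGR route (from the rgdal package; unlike writePointsShape it also writes the projection information to a .prj file when the object has a CRS set), reusing the meuse.grid object from above:
library(rgdal)
# dsn is the output directory, layer is the shapefile name without extension
writeOGR(meuse.grid, dsn = "/home/myfiles/wherever", layer = "myshape",
         driver = "ESRI Shapefile")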
