R - Best Way to Perform Geospatial Calculations - r

I am working on a project where I pull crime data from an API and calculate the density of crime per predefined grid unit. I currently do this by putting lat and lon into a data.frame and then, for every grid point, counting the crime points that fall within a radius of it. This is computationally expensive, as there are thousands of points in the predefined grid and thousands of crime points.
I'm wondering if there is a better way to calculate crime density; I've heard that raster may be valuable?
Some sample data:
# Create a predefined grid of coordinates
predef.grid <- expand.grid(lat = seq(from = 2.0, to = 4.0, by = 0.1),
                           lon = seq(from = 19.0, to = 21.0, by = 0.1))
# Create random sample of crime incidents
crime.incidents <- data.frame(lat = rnorm(10, 4), lon = rnorm(10, 20))
# Need to count number of crimes within radius of every point in predef.grid
Thanks!

library(raster)
library(sp)
# predefined raster
predef.grid <- raster(xmn = 2,   # xmin
                      ymn = 4,   # ymin
                      xmx = 19,  # xmax
                      ymx = 21,  # ymax
                      res = 1,   # spatial resolution
                      vals = 1)  # cell value
plot(predef.grid)
# Create random sample of crime incidents
# points should be a Spatial object of some form, point, etc.
crime.incidents <- spsample(x = as(extent(predef.grid), 'SpatialPolygons'),
                            n = 100,
                            type = 'random')
# plot points over grid
points(crime.incidents, pch = 20)
# count points per cell
density <- rasterize(crime.incidents, predef.grid, fun='count')
# plot the density
plot(density)
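If you really need counts within a fixed radius of each grid point (rather than counts per cell), here is a minimal sketch with the sf package, assuming sf >= 1.0 with s2 enabled so that dist is interpreted in metres for lon/lat data; the 5 km radius is just an example value:
library(sf)
# grid points and crime points as sf objects (same made-up extent as the question)
grid_sf  <- st_as_sf(expand.grid(lon = seq(19, 21, by = 0.1),
                                 lat = seq(2, 4, by = 0.1)),
                     coords = c("lon", "lat"), crs = 4326)
crime_sf <- st_as_sf(data.frame(lon = rnorm(10, 20), lat = rnorm(10, 4)),
                     coords = c("lon", "lat"), crs = 4326)
# count crimes within 5 km of every grid point
grid_sf$crime_count <- lengths(st_is_within_distance(grid_sf, crime_sf, dist = 5000))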

Related

Generating GPS Coordinates / creating a raster of evenly distributed points 1km apart

I would like to cover an area defined by bounding-box lat/long coordinates with a raster of GPS points 1 km apart. Currently I generate 2000 points for the bbox (bbox=8.9771580802,47.2703623267,13.8350427083,50.5644529365) the following way:
as.data.frame(cbind(runif(2000,8.9771580802 ,13.8350427083),runif(2000,47.2703623267,50.5644529365)))
Since runif draws from a uniform distribution, I think I just have to increase the number of points to cover the whole area the way I need it.
Is there a more clever way to do it?
How many points would i need?
UPDATE
I thought I might be able to use the sp package to do the job, but I'm still not really familiar with the settings:
longitudes <- c(8.9771580802, 13.8350427083)
latitudes <- c(47.2703623267, 50.5644529365)
bounding_box <- matrix(c(longitudes, latitudes), nrow = 2, byrow = TRUE, dimnames = list(NULL, c("min", "max")))
projection <- "+proj=longlat"
sp_box<-Spatial(bbox = bounding_box, proj4string = CRS(projection))
p_sample<-spsample(sp_box, 10000, type="regular")
If I understand correctly, this will give me a number of points evenly distributed within my coordinates. spsample has an option for cell size, but I don't grasp it yet.
BR
Andreas
I'm not too much into spatial data and analysis, but maybe this helps as a first step (I took different coordinates to get a reproducible example that fits the dimensions of Germany, where I have some feeling for the distances). I'm sure there is a more elegant way, but it should give you what you need. geosphere::destPoint() is used to compute the points for a given distance and direction; geosphere::distGeo() computes the north-south/west-east extent of the given box, so we know how many points we need in each direction. expand_grid() is then used to compute every combination of the computed border points.
Also be aware that I changed the distance between the points to 10,000 metres (10 km) to get fewer points and a nicer plot. You would have to change the numbers accordingly.
nw <- c(5.8, 55)
se <- c(15.1, 47)
lon1 <- nw[1]
lat1 <- nw[2]
lon2 <- se[1]
lat2 <- se[2]
#(1) compute the border points in y direction, going south from the nw-point
# while keeping lon constant
lat <- geosphere::destPoint(nw, 180,
                            1:floor(geosphere::distGeo(c(lon1, lat1),
                                                       c(lon1, lat2)) / 10000) * 10000)
lat <- tibble::as_tibble(lat)
#(2) compute the border points in x direction (analogous to above)
lon <- geosphere::destPoint(nw, 90,
                            1:floor(geosphere::distGeo(c(lon1, lat1),
                                                       c(lon2, lat1)) / 10000) * 10000)
lon <- tibble::as_tibble(lon)
# use expand_grid() to compute all combinations
grid <- tidyr::expand_grid(lat$lat, lon$lon)
names(grid) <- c("lat", "lon") #nicer names
### for visualizing what we've done, map germany with a grid overlay
germany <- rnaturalearth::ne_countries(type = "countries",
                                       country = "germany", returnclass = "sf")
ggplot2::ggplot(data = germany) +
  ggplot2::geom_sf() +
  ggplot2::geom_point(data = grid, mapping = ggplot2::aes(x = lon, y = lat), size = 0.01)
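For what it's worth, a shorter route is to project the bounding box to a metric CRS and let sf build the grid; a minimal sketch, assuming the sf package and UTM zone 32N (EPSG:32632) as a reasonable metric CRS for this area:
library(sf)
bb <- st_bbox(c(xmin = 8.9771580802, ymin = 47.2703623267,
                xmax = 13.8350427083, ymax = 50.5644529365), crs = st_crs(4326))
bb_m   <- st_transform(st_as_sfc(bb), 32632)                     # degrees -> metres (UTM 32N)
pts_m  <- st_make_grid(bb_m, cellsize = 1000, what = "centers")  # points 1 km apart
pts_ll <- st_transform(pts_m, 4326)                              # back to lon/lat if needed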

How to efficiently count the number of spatial points within a certain distance around raster cells in R?

I would like to count the number of spatial points (of a SpatialPointsDataFrame object) within a certain distance to every cell of a RasterLayer in R. The resulting value should replace the original value of that particular raster cell.
Here is a reproducible example:
# load library
library(raster)
# generate raster
ras <- raster(nrow=18, ncol=36)
values(ras) <- NA
# create SpatialPointsDataFrame
x <- c(-160,-155,-153,-150, 30, -45, -44, -42, -40, 100, 110, 130)
y <- c(-75,-73,-71,-60, 0, 30, 35, 40, 41, 10, -10, 60)
z <- c(seq(1, 12, 1))
df <- data.frame(x,y,z)
spdf <- SpatialPointsDataFrame(coords = df[, c(1, 2)],
                               data = as.data.frame(df[, 3]),
                               proj4string = CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))
# visualize
plot(ras)
plot(spdf, add=T)
# loop over all raster cells
for (r in 1:nrow(ras)) {
  for (c in 1:ncol(ras)) {
    # duplicate raster for subsequent modification
    ras_x <- ras
    # define cell for which to count the number of surrounding points
    ras_x[r, c] <- nrow(spdf) # some value that cannot occur otherwise, only a temporary placeholder
    ras_x[ras_x != nrow(spdf)] <- NA
    # convert raster cell to spatial point
    spatial_point <- rasterToPoints(ras_x, spatial = TRUE)
    # calculate distance around raster cell
    ras_dist <- distanceFromPoints(ras_x, spatial_point)
    ras_dist <- ras_dist / 1000000 # scale values
    # define circular zone by setting distance threshold (raster only with values 1 or NA)
    ras_dist[ras_dist > 2] <- NA
    ras_dist[ras_dist <= 2] <- 1
    # create empty vector to count number of spatial points located within zone around the particular raster cell
    empty_vec <- c()
    # loop to check which value every point of SpatialPointsDataFrame corresponds to
    for (i in 1:nrow(spdf)) {
      point <- extract(ras_dist, spdf[i, ])
      empty_vec[i] <- point
    }
    # sum of resulting vector is the number of points within surrounding zone around predefined raster cell
    val <- sum(na.omit(empty_vec))
    ras[r, c] <- val
    # print for progress monitoring
    print(paste0("sum of points within radius around cell row ", r, " and column ", c, " is ", val))
    print(paste0("finished ", r, " out of ", nrow(ras)))
    print(paste0("finished ", c, " out of ", ncol(ras)))
    # both plots are just for visualization and progress monitoring
    plot(ras)
    plot(spdf, add = TRUE)
  }
}
plot(ras)
plot(spdf, add = TRUE)
The resulting raster is exactly what I want but my way of checking the underlying raster values for each point of the SpatialPointsDataFrame seems inefficient. My real data consists of a RasterLayer with 2160, 4320, 9331200 (nrow, ncol, ncell) and a SpatialPointsDataFrame with 2664 features.
Is there a more efficient way to generate a raster that simply counts how many points are located within a certain distance of every raster cell?
If you can work with projected coordinates this can be done fairly easily with the spatstat package.
This requires you to project your points (and grid) with e.g. sf::st_transform() and will not work
on a global scale.
Load spatstat and make 2000 random points to test against:
library(spatstat)
W <- square(1)
set.seed(42)
Y <- runifpoint(2000) # Random points in the unit square
plot(Y, main = "Random points in unit square")
Make 3000x3000 grid of points (9 million points):
xy <- gridcenters(W, 3000, 3000) # Grid of points in the unit square
X <- ppp(xy$x, xy$y, window = W, check = FALSE, checkdup = FALSE)
For each of the 9 million grid points count the number of other points within
radius 0.01 (timed on my reasonably fast laptop with 16GB RAM):
system.time(counts <- crosspaircounts(X, Y, r = .01))
#>    user  system elapsed
#>   1.700   0.228   1.928
Convert to spatstat’s im-format (raster type format – can be converted with maptools) and plot:
rslt <- as.im(data.frame(x = xy$x, y = xy$y, counts))
plot(rslt, main = "Point counts in raster cells")
The points overlaid on the counts show that we have done the right thing:
plot(rslt, main = "Point counts in raster cells")
plot(Y, add = TRUE, col = rgb(1,1,1,.7), pch = 3)
I’m sure you can also do something elegant and fast with raster, but I’m not the right one to ask there.

Minimum distance between two sets of co-ordinates

I have two data frames of different sizes containing geocodes. The first (df) has 12,000 observations and the second (schools) 3,000.
The first contains geocodes for properties in a country and the second for schools in the country.
I want to find the distance of the nearest school for each property. Using the geosphere package I'm currently working with the following:
library(geosphere)
for (i in 1:length(df$longitude)) {
  df$dist2[i] <- distm(c(schools[1, 3], schools[1, 2]),
                       c(df$longitude[i], df$latitude[i]),
                       fun = distHaversine) * 0.001
}
where schools[, 3] and schools[, 2] are the longitude and latitude columns of that data frame respectively.
The above calculates the distance (in km) between all observations in df and the first school in schools.
I want to calculate the distance between each observation and all schools, saving only the smallest distance as that value for df$dist2[i].
In the following example, I make up the longitude/latitude data on the points and the school.
library(tidyverse)
library(geosphere)
df_points  <- data.frame(lon = rnorm(10, mean = 4, sd = 0.5), lat = rnorm(10, mean = 50, sd = 0.1))
df_schools <- data.frame(lon = rnorm(3,  mean = 4, sd = 0.5), lat = rnorm(3,  mean = 50, sd = 0.1))
distm(df_points, df_schools, fun = distHaversine) %>%
  as.data.frame() %>%
  rownames_to_column(var = "point_id") %>%
  mutate(point_id = as.numeric(point_id)) %>%
  gather(key = school, value = distance, -point_id) %>%
  group_by(point_id) %>%
  summarise(smallest_distance = min(distance))
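For what it's worth, the reshaping isn't strictly required; with the same made-up data, the per-point minimum distance (in km) can be pulled straight from the distance matrix:
df_points$dist_km <- apply(distm(df_points, df_schools, fun = distHaversine), 1, min) / 1000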
Here is an approach using sp class objects. You can coerce data.frame objects to SpatialPointsDataFrame objects with something along the lines of coordinates(x) <- ~lon+lat. The idea is to derive a distance matrix between the two point feature classes and then pull the distance and ID based on the column names (assigned from the school data). This returns not only the distance but also a unique identifier for each school, making it easy to query the actual closest school for any given property feature.
First, add the required libraries and create some example data.
library(sp)
library(raster)
e <- as(raster::extent(-180, 180, -90, 90), "SpatialPolygons")
properties <- spsample(e, 1000, type="random")
proj4string(properties) <- "+proj=longlat +ellps=WGS84"
schools <- spsample(e, 100, type="random")
proj4string(schools) <- "+proj=longlat +ellps=WGS84"
schools$ids <- paste0("school", 1:length(schools))
Now, we can create the distance matrix and add a unique identifier from schools to the column names of the matrix (the matrix is properties by schools, so there is no self-distance diagonal to remove).
d <- spDists(x = properties, y = schools, longlat = TRUE)
colnames(d) <- schools$ids
There are certainly more elegant ways to do this but, for simplicity's sake, we will use a for loop to populate two vectors holding the distance and the ID. We use which.min to pull the index of the minimum distance in row i. The iterator runs over the matrix rows because they represent the property features.
sdist <- rep(NA, nrow(d))
sid <- rep(NA, nrow(d))
for (i in 1:nrow(d)) {
  srow <- d[i, ]
  sdist[i] <- srow[which.min(srow)]
  sid[i]   <- names(srow)[which.min(srow)]
}
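If you prefer to avoid the explicit loop, a vectorized sketch of the same logic:
nearest <- apply(d, 1, which.min)          # column index of the closest school per property
sdist   <- d[cbind(1:nrow(d), nearest)]    # its distance in km
sid     <- colnames(d)[nearest]            # its school ID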
We can then assign the resulting vectors to the properties SpatialPointsDataFrame. Now we have columns in the @data slot data.frame that represent the distance to the nearest school as well as the school ID.
properties$school <- sid
properties$dist <- sdist
Here we can plot the results.
par(mfrow=c(2,1))
plot(properties, pch=19, cex=0.5)
plot(schools, pch=19, col="red", add=TRUE)
plot(e, add=TRUE)
title("random properties (black) and schools (red)", cex=0.5)
plot(properties, col="white")
plot(properties[1,], pch=19, cex=2, add=TRUE)
plot(schools[which(schools$ids %in% properties[1,]$school),],
pch=19, cex=2, col="red", add=TRUE)
plot(e, add=TRUE)
title("Property 1 (black) and closest school (red)", cex=0.5)
sidx <- which(schools$ids %in% properties[1,]$school)
text(coordinates(schools[sidx,]),
label = schools[sidx,]$ids, col="blue", cex=1)

Proximity Analysis in R

I have a set of GPS coordinates in R that I want to treat as an "exposure" for another set of GPS coordinates corresponding to patients of interest. How do I do proximity analysis in R to separate patients into two groups: those within x meters of an exposure coordinate and those further away?
I think something like this should work:
library(sp)
exp <- data.frame(lat = 40.741895, long = -73.989308)
patients <- data.frame(lat = rnorm(10, exp$lat, 0.1), long = rnorm(10, exp$long, 0.1))
coordinates(patients) <- ~ long + lat
coordinates(exp) <- ~ long + lat
d <- spDistsN1(coordinates(patients),coordinates(exp),longlat = TRUE)
Strongly inspired from: Calculate distance from GPS data
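To split the patients into the two groups the question asks about, you can then threshold those distances; a small sketch, where x_m is whatever exposure distance you choose (note that spDistsN1(..., longlat = TRUE) returns kilometres):
x_m <- 500                          # hypothetical exposure threshold in metres
exposed <- d <= x_m / 1000          # d is in km
near_group <- patients[exposed, ]   # within x_m of an exposure coordinate
far_group  <- patients[!exposed, ]  # further away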

Latitude and Longitude, x and y, and differences when plotting them: geosphere and ggmap

I'm using latitude and longitude data and trying to graphically display metrics at each point of a map of Stockholm (based on proximity to that point). I am more interested in the points being equally spaced on the image than equally spaced in actual distance: in this sense, I understand that distances between points at fixed degree spacing are longer at the equator than along the polar circles, and this might be crucial to the question.
My goal was to divide up the map into a grid of approximately 1 km increments in both the x and y direction. As such, I took the minimum and maximum latitude and longitude, calculated their x and y distances from the centre of Stockholm, and then divided the span of the latitudes and longitudes by the span of the x and y coordinates (using geosphere). I did this because I wanted the points to be equally spaced when plotting them (otherwise, there would be less distance between x points at the top compared to the bottom of the map due to proximity to the equator).
I then plotted these points on the map (using ggmap), and observed that there is more distance between points in the y direction than the x direction. I suppose the map could be simply drawn in a distorted manner, but it seems a little bit too distorted to believe. I suspect I might be doing something wrong, but can't find what it might be.
Code example below:
library("ggmap")
library("RgoogleMaps")
library("geosphere")
stockholm <- get_map("stockholm", zoom=11)
ggmap(stockholm)
places <- c('Tensta', 'Hanviken')
pos <- data.frame(Places = places, lat = NA, lon = NA, x = NA, y = NA)
reflatlon = getGeoCode('Stockholm, Sweden')
for (i in 1:length(places)) {
  latlon <- getGeoCode(paste0(places[i], ', Stockholm'))
  pos$lat[i] <- as.numeric(latlon[1])
  pos$lon[i] <- as.numeric(latlon[2])
  dist_y <- distGeo(c(latlon[1], reflatlon[2]), reflatlon) * sign(latlon[1] - reflatlon[1]) # same longitude
  dist_x <- distGeo(c(reflatlon[1], latlon[2]), reflatlon) * sign(latlon[2] - reflatlon[2]) # same latitude
  pos$x[i] <- dist_x
  pos$y[i] <- dist_y
}
deglatperm <- ( max(pos$lat) - min(pos$lat) ) / ( max(pos$y) - min(pos$y) ) # degrees latitude per metre
deglonperm <- ( max(pos$lon) - min(pos$lon) ) / ( max(pos$x) - min(pos$x) ) # degrees longitude per metre
seqlat <- seq(min(pos$lat), max(pos$lat), by = deglatperm*1000) # sequence with a point every ~1km
seqlon <- seq(min(pos$lon), max(pos$lon), by = deglonperm*1000) # sequence with a point every ~1km
seqlatlon <- expand.grid(seqlat, seqlon)
names(seqlatlon) <- c('lat', 'lon')
ggmap(stockholm) + geom_point(aes(x = lon, y = lat), data=seqlatlon)
output plot
As you can see from the output plot, there is at least twice as much distance between points in the y direction as compared to the x direction.
To summarise: the x and y coordinates are obtained using geosphere. The map is plotted using ggmap.
Am I doing something wrong with geosphere? Or are maps of latitude and longitude SO distorted? When I open Google Maps, and use the "measure distance" tool approximately between top and bottom, and left and right points, I get estimates of 16.3 and 16.9 km, whereas the values I get with geosphere are 17 and 32km (x and y) respectively.
If someone could tell me what's going on here, I'd be extremely thankful!
I try to avoid working in any coordinate system that uses degrees instead of distances. In the US I use our State Plane system constantly. It appears that Sweden uses the RT system. Once you get your coordinates out of degrees and into distances from a datum, you can build your grid using more conventional distances. From there you can put your coordinates back into degrees if you like.
I use the spTransform function for coordinate conversions, and I use the Spatial Reference guide to get the reference codes for the coordinate systems.
library("ggmap")
library("RgoogleMaps")
library("geosphere")
library("sp")
stockholm <- get_map("stockholm", zoom=11)
ggmap(stockholm)
places <- c('Tensta', 'Hanviken')
pos <- data.frame(Places = places, lat = NA, lon = NA)
reflatlon <- getGeoCode('Stockholm, Sweden')
for (i in 1:length(places)) {
  latlon <- getGeoCode(paste0(places[i], ', Stockholm'))
  pos$lat[i] <- as.numeric(latlon[1])
  pos$lon[i] <- as.numeric(latlon[2])
}
p <- SpatialPoints(data.frame(pos$lon, pos$lat), proj4string = CRS("+init=epsg:4326"))
p <- spTransform(p, CRS("+init=epsg:3022"))
seqx <- seq(min(p@coords[,1]), max(p@coords[,1]), by = 1000)
seqy <- seq(min(p@coords[,2]), max(p@coords[,2]), by = 1000)
pgrid <- expand.grid(seqx, seqy)
pgrid <- SpatialPoints(pgrid, proj4string = CRS("+init=epsg:3022"))
pgrid <- spTransform(pgrid, CRS("+init=epsg:4326"))
pgrid <- data.frame(pgrid@coords)
names(pgrid) <- c('lon', 'lat')
ggmap(stockholm) + geom_point(aes(x = lon, y = lat), data=pgrid)
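For what it's worth, the same idea can be sketched with sf instead of sp/spTransform (my assumption here is SWEREF99 TM, EPSG:3006, as a modern metric CRS covering Sweden):
library(sf)
p_sf    <- st_as_sf(pos, coords = c("lon", "lat"), crs = 4326)
p_m     <- st_transform(p_sf, 3006)                               # degrees -> metres
grid_m  <- st_make_grid(p_m, cellsize = 1000, what = "centers")   # 1 km spacing
grid_ll <- st_transform(grid_m, 4326)
pgrid <- setNames(as.data.frame(st_coordinates(grid_ll)), c("lon", "lat"))
ggmap(stockholm) + geom_point(aes(x = lon, y = lat), data = pgrid)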
