Join SpatialPointsDataFrame and SpatialLinesDataFrame using over() in R

I am stuck with something that seems like it should be quite straightforward. Apologies, I am new to using spatial data in R.
I am trying to map city data onto a map of the world's coastlines. I have taken the coastlines from the Natural Earth dataset (https://www.naturalearthdata.com/downloads/), 1:110m data, and generated the SpatialLinesDataFrame:
coast_rough_sldf
class : SpatialLinesDataFrame
features : 134
extent : -180, 180, -85.60904, 83.64513 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
variables : 3
names : scalerank, featurecla, min_zoom
min values : 0, Coastline, 0.0
max values : 1, Country, 1.5
I further have a dataset of cities, a sample of which looks as follows:
city_coast <- data.frame(
  Latitude = c(-34.60842, -34.47083, -34.55848, -34.76200, -34.79658, -34.66850),
  Longitude = c(-58.37316, -58.52861, -58.73540, -58.21130, -58.27601, -58.72825),
  Name1 = c("Buenos Aires", "San Isidro", "San Miguel", "Berazategui", "Florencio Varela", "Merlo"),
  distance = c(7970.091, 5313.518, 26156.700, 11670.274, 18409.738, 33880.259)
)
city_coast
Latitude Longitude Name1 distance
1 -34.60842 -58.37316 Buenos Aires 7970.091
2 -34.47083 -58.52861 San Isidro 5313.518
3 -34.55848 -58.73540 San Miguel 26156.700
4 -34.76200 -58.21130 Berazategui 11670.274
5 -34.79658 -58.27601 Florencio Varela 18409.738
6 -34.66850 -58.72825 Merlo 33880.259
I then successfully create the spatial points dataframe:
library(sp)     # SpatialPointsDataFrame(), CRS()
library(dplyr)  # select()
city_spdf <- SpatialPointsDataFrame(
  coords = select(city_coast, c("Longitude", "Latitude")),
  proj4string = CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84"),
  data = select(city_coast, c("Name1", "distance"))
)
city_spdf
class : SpatialPointsDataFrame
features : 6
extent : -58.7354, -58.2113, -34.79658, -34.47083 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
variables : 2
names : Name1, distance
min values : Berazategui, 5313.518
max values : San Miguel, 33880.259
Now I want to join the city_spdf with the coast_rough_sldf, so that I can plot them using tmap. Looking at tutorials, it seems that I should use over():
city_coast_shp <- over(coast_rough_sldf, city_spdf)
city_coast_shp
Name1 distance
1 <NA> NA
Which is clearly wrong. Switching the order of the objects changes things, but still doesn't give me what I need.
Can anyone tell me what I am not getting right with this over() function? Every example I have seen simply has people joining the two spatial objects. Apologies if I am missing something extremely simple.

As @elmuertefurioso pointed out in the comments, I think one reason this isn't working the way you expect is a confusion between types of geometries.
Since the coastline data is lines, not polygons like data(World) from tmap, you are somewhat restricted in the calculations and comparisons you can make against cities, which are points.
Reading in the data the sf way:
library(sf)
# downloaded from https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/110m/physical/ne_110m_coastline.zip
coastline <- read_sf("~/Downloads/ne_110m_coastline/ne_110m_coastline.shp")
cities <- data.frame(
  Latitude = c(-34.60842, -34.47083, -34.55848, -34.76200, -34.79658, -34.66850),
  Longitude = c(-58.37316, -58.52861, -58.73540, -58.21130, -58.27601, -58.72825),
  Name1 = c("Buenos Aires", "San Isidro", "San Miguel", "Berazategui", "Florencio Varela", "Merlo"),
  distance = c(7970.091, 5313.518, 26156.700, 11670.274, 18409.738, 33880.259)
)
In order to do any comparisons between sf objects, they must have the same coordinate reference system (CRS). So as we convert cities to an sf object, we will set its CRS to that of coastline.
cities <- st_as_sf(
  cities,
  coords = c("Longitude", "Latitude"),  # must be in x, y order
  crs = st_crs(coastline)               # CRS must be equivalent between objects
)
Now you can make comparisons using the st_{comparison}() family of functions.
The function over() and its sf counterpart st_intersects() would work on a set of points and polygons, but we don't have that here. We can use distance functions like st_nearest_feature() with points and lines, to get the closest geometry from coastline for each city.
st_nearest_feature(cities, coastline)
It returns the row index of the nearest geometry in coastline for each city, which happens to be the same for all the cities here because they are all in Argentina. The order of the arguments matters because it defines the question being asked: if we flipped it to st_nearest_feature(coastline, cities), it would return the closest city for each geometry in coastline, so the result would have 134 elements.
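If you did want to attach the attributes of the nearest coastline segment to each city, a minimal sketch (using the cities and coastline objects from above; nearest_idx and cities_joined are just names I made up) would be:
# one coastline row index per city, usable for manual indexing
nearest_idx <- st_nearest_feature(cities, coastline)
# or let st_join do the lookup and attach the coastline attributes directly
cities_joined <- st_join(cities, coastline, join = st_nearest_feature)
Each row of cities_joined then carries the scalerank / featurecla / min_zoom values of its closest coastline feature.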
All that to say you don't actually have to do any joining or comparisons to plot your points together on the same tmap.
library(tmap)
tmap_mode("view")
tm_shape(coastline) +
  tm_lines() +
  tm_shape(cities) +
  tm_bubbles("distance")
I'm not a tmap user, but I zoomed in and snapped this screenshot to show it's working.

How to calculate area of polygon that does not overlap with other polygons in R?

I have a SpatialPoints object with the coordinates for several points of interest.
I also have several shapefiles (polygons) with information about the presence of slums. The polygons with info about slums in each of these shapefiles can overlap (they provide somewhat the same information about the presence of slums, but come from different sources).
For each of the points in my SpatialPoints object, I have used the function spCircle to create a circular polygon around each point. What I need to do next is to check what percentage of the area of the circular polygon contains slums. If any of the shapefiles indicates that the slum is present, I will consider that there is a slum in the area.
I have created the following image to help explain my issue. The circles represent the polygon around a single point. For this single point, each of the four shapefiles indicates that the slum is present in somewhat different areas (sometimes they overlap, and sometimes they do not). I want to be able to find the red area (where none of the shapefiles indicates the presence of a slum), and then calculate the percentage of the circle that has slums.
The following code is an attempt to do that:
# Packages used below: sp (spatial classes), sampSurf (spCircle),
# rgeos (gUnion) and raster (intersect)
library(sp)
library(sampSurf)
library(rgeos)
library(raster)
# Create data with coordinates
lat = c(-22.868879628748203, -22.88511, -22.82166, -22.89692, -22.67945)
long = c(-43.237195000177564, -43.34278, -43.04717, -43.35168, -43.59667)
data_points = cbind.data.frame(lat, long)
coordinates(data_points) = c("lat", "long")
proj4string(data_points) = CRS("+init=epsg:4326")
# Transform projection of points to UTM
utmStr <- "+proj=utm +zone=%d +datum=NAD83 +units=m +no_defs +ellps=GRS80"
crs <- CRS(sprintf(utmStr, 23))
data_points = spTransform(data_points, crs)
# Create a list with circular polygons around each point (radius = 2000 meters)
circular_grid = list()
for (i in 1:length(data_points)) {
  spc = spCircle(radius = 2000,
                 centerPoint = c(x = as.numeric(data_points@coords[i, 1]),
                                 y = as.numeric(data_points@coords[i, 2])),
                 spID = i,
                 spUnits = CRS("+proj=utm +zone=23 +datum=NAD83 +units=m +no_defs"))
  circular_grid[[i]] = spc
}
# For each circle, check the percentage that overlaps with several different shapefiles:
# I first use gUnion to merge all the shapefiles with info about slums together
allShapes = gUnion(shape1,shape2)
allShapes = gUnion(allShapes, shape3)
allShapes = gUnion(allShapes, shape4)
allShapes = gUnion(allShapes, shape5)
allShapes = gUnion(allShapes, shape6)
allShapes = as(allShapes, "SpatialPolygonsDataFrame")
allShapes = spTransform(allShapes, CRS("+proj=utm +zone=23 +datum=NAD83 +units=m +no_defs"))
# I am unable to reproduce the object "allShapes" (I do not know how),
# but this is its information
# class : SpatialPolygonsDataFrame
# features : 1
# extent : 633347.1, 724692.1, -2547787, -2513212 (xmin, xmax, ymin, ymax)
# crs : +proj=utm +zone=23 +datum=NAD83 +units=m +no_defs
# variables : 1
# names : dummy
# value : 0
# Next, to get the intersection, I tried the following:
intersection_circle_shape = list()
for (i in 1:length(circular_grid)) {
  circle = circular_grid[[i]][["spCircle"]]
  inter = intersect(circle, allShapes)
  intersection_circle_shape[[i]] = inter
}
# The list "intersection_circle_shape" is empty because the command
# "intersect" says that there is no intersection, but I know there is.
Any ideas?

Point pattern analysis in spatstat

I am having some trouble setting up my data for some point pattern analysis.
What I want to do: conduct a point pattern analysis on NYC arrest data and see if there exists a spatial dependence between arrests and Covid-19 cases.
What I've done so far: downloaded data in the form of shapefiles
https://data.cityofnewyork.us/City-Government/Borough-Boundaries/tqmj-j8zm (the ZIP code boundaries)
https://www1.nyc.gov/site/nypd/stats/crime-statistics/citywide-crime-stats.page (year to date data for arrests in NYC by zip code)
Code:
library(readxl)
library(rgdal) #Brings Spatial Data in R
library(spatstat) # Spatial Statistics
library(lattice) #Graphing
library(maptools)
library(raster)
library(ggplot2)
library(RColorBrewer)
library(broom)
# Load nyc zip code boundary polygon shapefile
s <- readOGR("/Users/my_name/Documents/fproject/zip", "zip")
### OGR data source with driver: ESRI Shapefile
### Source: "/Users/my_name/Documents/project/zip", layer: "zip"
### with 263 features
nyc <- as(s, "owin")
# Load nyc arrests point feature shapefile
s <- readOGR("/Users/my_name/Documents/project/nycarrests/", "geo1")
### OGR data source with driver: ESRI Shapefile
### Source: "/Users/my_name/Documents/project/nycarrests", layer: "geo1"
### with 103376 features
### It has 19 fields
# Converting the dataset into a point pattern
arrests <- as(s, "ppp")
### Error in as.ppp.SpatialPointsDataFrame(from) :
###   Only projected coordinates may be converted to spatstat class objects
This gave me the error above.
I know the error has to do with the coordinates not being Cartesian (projected) coordinates. So my question is:
How can I convert my sp object to have (projected) Cartesian coordinates in order to convert it to a point pattern (Poisson point process)?
You are looking for spTransform.
Here is some example data
library(raster)
filename <- system.file("external/lux.shp", package="raster")
p <- shapefile(filename)
Solution
utm <- "+proj=utm +zone=32 +datum=WGS84"
x <- spTransform(p, utm)
x
#class : SpatialPolygonsDataFrame
#features : 12
#extent : 266045.9, 322163.8, 5481445, 5563062 (xmin, xmax, ymin, ymax)
#crs : +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs
#variables : 5
#names : ID_1, NAME_1, ID_2, NAME_2, AREA
#min values : 1, Diekirch, 1, Capellen, 76
#max values : 3, Luxembourg, 12, Wiltz, 312
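Applied to the arrests shapefile from the question, a sketch along the same lines might look like this (UTM zone 18N covers New York City; I haven't run this against the actual data):
library(rgdal)
library(maptools)   # provides the as(., "ppp") coercion
library(spatstat)
s <- readOGR("/Users/my_name/Documents/project/nycarrests/", "geo1")
s_utm <- spTransform(s, CRS("+proj=utm +zone=18 +datum=WGS84 +units=m"))  # project to metres
arrests <- as(s_utm, "ppp")   # coordinates are now Cartesian, so the coercion works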

Plotting lat/long coordinates into a Formal Class Raster Layer (factor) map with aea projection in R?

I am quite new to working with spatial dataframes, and have what I thought was a relatively simple task: take a dataframe of 6 points, with x and y columns representing the lat/long positions of those points, and project them so that they can be used in a spatial data frame that I have made.
Here is the way I coded in the 6 points:
d1 <- structure(list(latitude = c(37.427733, 37.565759, 37.580956, 37.429285, 37.424270, 37.502496), longitude = c(-108.011061, -107.814039, -107.676662, -107.677166, -108.898826, -108.586042)))
d2 <- as.data.frame(d1)
d3 <- SpatialPointsDataFrame(c(d2[,c('longitude','latitude')]), data = d2)
And I tried changing/assigning a projection for these (these lat/long data were taken from Google Maps), but I can't seem to make it work. The projection for the data I want to overlay these points on is the following:
+proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs
So basically my question is, how can I convert these lat/long into the format for x/y that this projection uses? Here is the extent of the dataset I want to overlay it on for reference, showing that it is clearly not in simple lat/long:
class : Extent
xmin : -1145835
xmax : -1011345
ymin : 1613205
ymax : 1704855
Thank you all so much in advance!
You need to (re)project your spatial points into the same projection as your other data source. I'm more familiar with the sf package for working with spatial vector information in R, but it looks like you are using the sp package.
The first step is to assign your data frame to the correct projection. Latitude and longitude are generally in WGS84, or epsg:4326, so:
library(sp)
d1 <- structure(list(latitude = c(37.427733, 37.565759, 37.580956, 37.429285, 37.424270, 37.502496), longitude = c(-108.011061, -107.814039, -107.676662, -107.677166, -108.898826, -108.586042)))
d2 <- as.data.frame(d1)
d3 <- SpatialPointsDataFrame(c(d2[,c('longitude','latitude')]), data = d2)
proj4string(d3) <- CRS("+init=epsg:4326")
sf::st_bbox(d3)
# xmin ymin xmax ymax
# -108.89883 37.42427 -107.67666 37.58096
Looking at the summary, or this extent, you can see that the latitude and longitude are unchanged from the original. Now we can reproject using the proj4string you supplied:
target_crs = CRS("+proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs") # This is your string assigned to an object
d4 = spTransform(d3, target_crs) ## object used to transform your data frame
sf::st_bbox(d4)
# xmin ymin xmax ymax
# -1127246 1661667 -1018854 1679942
Your coordinates are now represented in the target space, and should be able to be plotted on top of your background data.
If you are willing to use an alternative package, sf::st_transform() is a little easier to use, and the sf package is generally more user friendly.
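For completeness, a sketch of the same workflow with sf (equivalent to the sp code above; the object names are mine) might be:
library(sf)
d3_sf <- st_as_sf(d2, coords = c("longitude", "latitude"), crs = 4326)  # lat/long as WGS84
target_crs <- "+proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"
d4_sf <- st_transform(d3_sf, target_crs)
st_bbox(d4_sf)   # extent is now in metres, directly comparable to your raster's extent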

I downloaded a shapefile but merging with an internal dataset based on geometry fails

Apologies as I'm a little new to GIS features within R, so any help and explanation would be very helpful!
I have downloaded this shapefile (Simple Feature Polygon) from the source like so:
fire <- tempfile()
download.file("http://frap.fire.ca.gov/webdata/data/statewide/fhszs.sn.zip", destfile = fire)
unzip(fire, exdir = ".")
fire_map <- read_shape("fhszs06_3.shp")  # read_shape() comes from the tmaptools package
The map has small polygons based on a hazard code (i.e. 1, 2, 3).
I also have an internal dataframe of about 15 variables and 3584 rows, with lat/lon for all points (commercial properties in California), that I'm trying to convert to either a SpatialPointsDataFrame or a simple feature in order to figure out which properties lie within each hazard zone.
Example of property file:
ln_bal  <- c(500000, 200000, 6000000, 12000, 130000)
ln_city <- c('Ventura', 'Torrance', 'Buena Park', 'Concord', 'Lake View Terrace')
lon     <- c(-119.213504, -118.311072, -117.985452, -122.057139, -116.893845)
lat     <- c(34.278122, 33.844817, 33.846594, 37.979995, 32.844287)
cmbs3   <- data.frame(ln_bal, ln_city, lon, lat)
I think my problem is getting the correct CRS and then matching with the shape file.
The CA Fire map has the following:
epsg (SRID): NA
proj4string: +proj=aea +lat_1=34 +lat_2=40.5 +lat_0=0 +lon_0=-120 +x_0=0 +y_0=-4000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
I've tried st_intersection() by creating an sf points data frame:
fire_map <- st_read("fhszs06_3.shp") %>%
  st_transform(4326)  # need to set CRS the same as your dataframe below
# transforms coordinates to ellips code and creates matching values with fire_map:
proj4string(cmbs3) <- CRS("+proj=longlat +datum=WGS84")
cmbs3 <- spTransform(cmbs3, CRS("+proj=utm +zone=51 ellps=WGS84"))
# fire_map is a simple feature data frame; need to convert our data to this, and then match
cmbs3 <- st_as_sf(cmbs3, precision = 0)
cmbs3 <- st_set_crs(cmbs3, 4326)
inters <- st_intersection(cmbs3, fire_map)
Expected (potential) Results:
ln_bal ln_city lon lat HAZ_CODE HAZ_CLASS
12000 Concord -122.057139 37.97 1 Moderate
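In sf terms, I believe the operation I am after is roughly the following (an untested sketch; I haven't confirmed which columns carry over from fire_map):
library(sf)
cmbs3_sf <- st_as_sf(cmbs3, coords = c("lon", "lat"), crs = 4326)   # properties as WGS84 points
cmbs3_sf <- st_transform(cmbs3_sf, st_crs(fire_map))                # match the fire map's CRS
inters   <- st_join(cmbs3_sf, fire_map)                             # attach HAZ_CODE / HAZ_CLASS by location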

Map raw data and mean data based on the shapefile

I have the dataset (pts) like this:
x <- seq(-124.25,length=115,by=0.5)
y <- seq(26.25,length=46,by=0.5)
z = 1:5290
longlat <- expand.grid(x = x, y = y) # Create an X,Y grid
pts=data.frame(longlat,z)
names(pts) <- c( "x","y","data")
I know that I can map the dataframe (pts) by doing:
library(sp)
library(rgdal)
library(raster)
library(maps)
coordinates(pts)=~x+y
proj4string(pts)=CRS("+init=epsg:4326") # set it to long, lat
pts = spTransform(pts,CRS(" +init=epsg:4326 +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"))
pts <- as(pts, "SpatialPixelsDataFrame")
r = raster(pts)
projection(r) = CRS(" +init=epsg:4326 +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0")
plot(r)
map("usa",add=T)
Now I would like to create a separate map which shows the means of pts across different regions. The shapefile I want to use is from ftp://ftp.epa.gov/wed/ecoregions/cec_na/NA_CEC_Eco_Level2.zip; however, this is a North America map. How can I create the map showing only the US based on this North America map? Or is there a better way to do this? Thanks so much.
I think that cutting out the non-US data based on the data in the shapefile alone would be hard, since the regions do not correspond to political boundaries, though that could be done with rgeos.
Assuming that "eco" is a SpatialPolygonsDataFrame read in by rgdal::readOGR or maptools::readShapeSpatial, see the available key data for indexing:
sapply(as.data.frame(eco), function(x) if(!is.numeric(x)) unique(x) else NULL)
If you just want to plot it, set up a map with only the US region to start with and then overplot.
library(maps)
map("usa", col = "transparent")
We see that the data is in Lambert Azimuthal Equal Area:
proj4string(eco)
[1] " +proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs"
So
require(rgdal)
eco.laea <- spTransform(eco, CRS("+proj=longlat +ellps=WGS84"))
plot(eco.laea, add = TRUE)
If you want to plot in the original Lambert Azimuthal Equal Area, you'll need to get the bounding box in that projection and start the plot based on that; I just used existing data to make an easy example. I'm pretty sure the data could also be cropped with rgeos against another boundary, but it depends what you actually want.
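For the means-by-region part of the question, a possible sketch (assuming the raster r built in the question and the reprojected eco.laea from above; mean_value is just a name I chose) is:
library(raster)
# mean of the gridded values inside each ecoregion polygon
region_means <- extract(r, eco.laea, fun = mean, na.rm = TRUE)
eco.laea$mean_value <- as.numeric(region_means)
spplot(eco.laea, "mean_value")   # quick choropleth of the per-region means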
