Maps doesn't register weird shapes - r

I'm working with one of my professors on some research aimed toward bettering the current methods of carbon accounting. We noticed that many of the locations for point sources were defaulted to the centroid of the county it was in (this is specific to the US at the moment, though it will be applied globally) if there was no data on the location.
So I'm using R to address the uncertainty associated with these locations. My code takes the range of longitude and latitude for a county and plots 10,000 points. It then weeds out the points that are not in the county and takes the average of the leftover points to locate the centroid. My goal is to ultimately take the difference between these points and the centroid to find the spatial uncertainty for point sources that were placed in the centroid.
However, I'm running into problems with coastal regions. My first problem is that the maps package ignores islands (the barrier islands for example) as well as other disjointed county shapes, so the centroid is not accurately reproduced when the points are averaged. My second problem is found specifically with Currituck county (North Carolina). Maps seems to recognize parts of the barrier islands contained in this county, but since it is not continuous, the entire function goes all wonky and produces a bunch of "NAs" and "Falses" that don't correspond with the actual border of the county at all.
(The data for the centroid is going to be used in other areas of the research which is why it's important we can accurately access all counties.)
Is there any way around the errors I'm running into? A different data set that could be read in, or anything of the sort? Your help would be greatly appreciated. Let me know if there are any questions about what I'm asking, and I'll be happy to clarify.
CODE:
# ggplot2 helps
# SOME TROUBLE COUNTIES: north carolina,currituck & massachusetts,dukes
library(ggplot2)
library(maps) # package has maps
library(mapproj) # projections
library(sp)
WC <- map_data('county','north carolina,currituck') #calling on county
p <- ggplot(data = WC, aes(x = long, y = lat)) #calling on latitude and longitude
p1 <- p + geom_polygon(fill = "lightgreen") + theme_bw() +
coord_map("polyconic") + coord_fixed() #+ labs(title = "Watauga County")
p1
### range for the long and lat
RLong <- range(WC$long)
RLong
RLat <- range(WC$lat)
RLat
### Add some random points
n <- 10000
RpointsLong <- sample(seq(RLong[1], RLong[2], length = 100), n, replace = TRUE)
RpointsLat <- sample(seq(RLat[1], RLat[2], length = 100), n, replace = TRUE)
DF <- data.frame(RpointsLong, RpointsLat)
head(DF)
p2<-p1 + geom_point(data = DF, aes(x = RpointsLong, y = RpointsLat))
p2
# Source:
# http://www.nceas.ucsb.edu/scicomp/usecases/GenerateConvexHullAndROIForPoints
inside <- map.where('county',RpointsLong,RpointsLat)=="north carolina,currituck"
inside[which(nchar(inside)==2)] <- FALSE
inside
g<-inside*DF
g1<-subset(g,g$RpointsLong!=0)
g1
CentLong<-mean(g1$RpointsLong)
CentLat<-mean(g1$RpointsLat)
Centroid<-data.frame(CentLong,CentLat)
Centroid
p1+geom_point(data=g1, aes(x=RpointsLong,y=RpointsLat)) #this maps all the points inside county
p1+geom_point(data=Centroid, aes(x=CentLong,y=CentLat))

First, given your description of the problem, I would probably invest a lot of effort to avoid this business of locations defaulting to the county centroids - that's the right way to solve this problem.
Second, if this is a research project, I would not use the built in maps in R. The USGS National Atlas website has excellent county maps of the US. Below is an example using Currituck County in NC.
library(ggplot2)
library(rgdal) # for readOGR(...)
library(rgeos) # for gIntersection(...)
setwd("< directory containing shapefiles >")
map <- readOGR(dsn=".",layer="countyp010")
NC <- map[map$COUNTY=="Currituck County" & !is.na(map$COUNTY),]
NC.df <- fortify(NC)
bbox <- bbox(NC)
x <- seq(bbox[1,1],bbox[1,2],length=50) # longitude
y <- seq(bbox[2,1],bbox[2,2],length=50) # latitude
all <- SpatialPoints(expand.grid(x,y),proj4string=CRS(proj4string(NC)))
pts <- gIntersection(NC,all) # points inside the polygons
pts <- data.frame(pts@coords) # ggplot wants a data.frame
centroid <- data.frame(x=mean(pts$x),y=mean(pts$y))
ggplot(NC.df)+
geom_path(aes(x=long,y=lat, group=group), colour="grey50")+
geom_polygon(aes(x=long,y=lat, group=group), fill="lightgreen")+
geom_point(data=pts, aes(x,y), colour="blue")+
geom_point(data=centroid, aes(x,y), colour="red", size=5)+
coord_fixed()
Finally, another way to do this (which I'd recommend, actually), is to just calculate the area weighted centroid. This is equivalent to what you are approximating, is more accurate, and much faster.
polys <- do.call(rbind,lapply(NC@polygons[[1]]@Polygons,
function(x)c(x@labpt,x@area)))
polys <- data.frame(polys)
colnames(polys) <- c("long","lat","area")
polys$area <- with(polys,area/sum(area))
centr <- with(polys,c(x=sum(long*area),y=sum(lat*area)))
centr # area weighted centroid
# x y
# -76.01378 36.40105
centroid # point weighted centroid (start= 50 X 50 points)
# x y
# 1 -76.01056 36.39671
You'll find that as you increase the number of points in the points-weighted centroid the result gets closer to the area-weighted centroid.
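To make the area-weighted idea concrete without any spatial packages, here is a minimal base-R sketch using the shoelace formula. The function names `ring_centroid` and `multi_centroid` and the toy "mainland plus island" shapes are made up for illustration. The sketch assumes the coordinates can be treated as planar and that there are no holes, so it is an approximation for lon/lat data, not a replacement for the shapefile-based approach above.

```r
# Signed area and centroid of one closed ring (shoelace formula).
# `xy` is a two-column matrix of vertices; the ring is closed if needed.
ring_centroid <- function(xy) {
  x <- xy[, 1]; y <- xy[, 2]
  if (x[1] != x[length(x)] || y[1] != y[length(y)]) {
    x <- c(x, x[1]); y <- c(y, y[1])
  }
  n <- length(x)
  i <- 1:(n - 1); j <- 2:n
  cross <- x[i] * y[j] - x[j] * y[i]
  a <- sum(cross) / 2                       # signed area
  cx <- sum((x[i] + x[j]) * cross) / (6 * a)
  cy <- sum((y[i] + y[j]) * cross) / (6 * a)
  c(x = cx, y = cy, area = abs(a))
}

# Area-weighted centroid of several disjoint rings (mainland, islands, ...)
multi_centroid <- function(rings) {
  parts <- t(sapply(rings, ring_centroid))
  w <- parts[, "area"] / sum(parts[, "area"])
  c(x = sum(parts[, "x"] * w), y = sum(parts[, "y"] * w))
}

# Hypothetical county: a 2 x 2 mainland square plus a 1 x 1 island
mainland <- cbind(c(0, 2, 2, 0), c(0, 0, 2, 2))
island   <- cbind(c(4, 5, 5, 4), c(0, 0, 1, 1))
centr <- multi_centroid(list(mainland, island))
centr
```

Because every disjoint piece contributes in proportion to its area, islands pull the centroid toward themselves instead of being ignored.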

Related

Generating GPS Coordinates / creating a raster of evenly distributed points 1km apart

I would like to cover an area defined by bbox lat/long coordinates with a raster of GPS points 1 km apart. Currently I generate 2000 points for the bbox bbox=8.9771580802,47.2703623267,13.8350427083,50.5644529365 the following way:
as.data.frame(cbind(runif(2000,8.9771580802 ,13.8350427083),runif(2000,47.2703623267,50.5644529365)))
Since runif draws from a uniform distribution, I think I just have to increase the number of points to cover the whole area the way I need it.
Is there a more clever way to do it?
How many points would i need?
UPDATE
I thought I could maybe use the sp package to do the job, but I'm still not really familiar with the settings:
longitudes <- c(8.9771580802, 13.8350427083)
latitudes <- c(47.2703623267, 50.5644529365)
bounding_box <- matrix(c(longitudes, latitudes), nrow = 2, byrow = TRUE, dimnames = list(NULL, c("min", "max")))
projection <- "+proj=longlat"
sp_box<-Spatial(bbox = bounding_box, proj4string = CRS(projection))
p_sample<-spsample(sp_box, 10000, type="regular")
If I understand correctly, this will give me a number of points evenly distributed within my coordinates. spsample has an option for cell size, but I don't grasp it yet.
BR
Andreas
I'm not too much into spatial data and analysis, but maybe this helps as a first step (I took different coordinates to get a reproducible example; they roughly fit Germany, whose dimensions I have a feel for). I'm sure that there is a more elegant way, but it should give you what you need. geosphere::destPoint() is used to compute the points for a given distance and direction; geosphere::distGeo() computes the north-south and west-east extent of the given box, which determines how many points we need in each direction. expand_grid() is then used to compute every combination of the computed border points.
Be aware also that I changed the distance between the points to 10,000 meters (10 km) to get fewer points and a nicer plot. You would have to change the numbers accordingly.
nw <- c(5.8, 55)
se <- c(15.1, 47)
lon1 <- nw[1]
lat1 <- nw[2]
lon2 <- se[1]
lat2 <- se[2]
#(1) compute the border points in y direction, going south from the nw-point
# while keeping lon constant
lat <- geosphere::destPoint(nw, 180, 1:floor(geosphere::distGeo(c(lon1,lat1),
c(lon1,lat2))/10000)*10000)
lat <- tibble::as_tibble(lat)
#(2) compute the border point in x direction (analog to above)
lon <- geosphere::destPoint(nw, 90, 1:floor(geosphere::distGeo(c(lon1,lat1),
c(lon2,lat1))/10000)*10000)
lon <- tibble::as_tibble(lon)
# use expand_grid() to compute all combinations
grid <- tidyr::expand_grid(lat$lat, lon$lon)
names(grid) <- c("lat", "lon") #nicer names
### for visualizing what we've done, map germany with a grid overlay
germany <- rnaturalearth::ne_countries(type = "countries",
country = "germany", returnclass = "sf")
ggplot2::ggplot(data = germany)+
ggplot2::geom_sf()+
ggplot2::geom_point(data = grid, mapping = ggplot2::aes(x = lon, y = lat), size = 0.01)
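For the original "how many points would I need?" question, a rough base-R sketch is shown below. It assumes a spherical approximation of about 111.32 km per degree of latitude and scales longitude by the cosine of the box's mid-latitude; those constants and the variable names are assumptions for illustration, and the east-west spacing will drift by a few percent between the southern and northern edge of the box.

```r
# Bounding box from the question (lon min, lat min, lon max, lat max)
bbox <- c(8.9771580802, 47.2703623267, 13.8350427083, 50.5644529365)
spacing_km <- 1

# Assumed spherical approximation: ~111.32 km per degree of latitude;
# a degree of longitude shrinks by cos(latitude).
km_per_deg_lat <- 111.32
mid_lat <- (bbox[2] + bbox[4]) / 2
km_per_deg_lon <- km_per_deg_lat * cos(mid_lat * pi / 180)

# Regular sequences with a step of roughly 1 km in each direction
lat_seq <- seq(bbox[2], bbox[4], by = spacing_km / km_per_deg_lat)
lon_seq <- seq(bbox[1], bbox[3], by = spacing_km / km_per_deg_lon)

grid <- expand.grid(lon = lon_seq, lat = lat_seq)
nrow(grid)  # total number of grid points
```

For this box the count comes out on the order of 130,000 points, far more than the 2000 random points in the question.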

Filter shapefile polygons by area

I have the following boundary dataset for the United Kingdom, which shows all the counties:
library(raster)
library(sp)
library(ggplot2)
# Download the data
GB <- getData('GADM', country="gbr", level=2)
Using the subset function it is really easy to filter the shapefile polygons by an attribute in the data. For example, if I want to exclude Northern Ireland:
GB_sub <- subset(GB, NAME_1 != "Northern Ireland")
However, there are lots of small islands which distort the scale data range, as shown in the maps below:
Any thoughts on how to elegantly subset the dataset on a minimum size? It would be ideal to have something in the format consistent with the subset argument. For example:
GB_sub <- subset(GB, Area > 20) # specify minimum area in km^2
Here is another potential solution. Because your data is in a lat-long projection, calculating the area directly from latitude and longitude would introduce bias; it is better to calculate the area with functions from the geosphere package.
install.packages("geosphere")
library(geosphere)
# Calculate the area
GB$poly_area <- areaPolygon(GB) / 10^6
# Filter GB based on area > 20 km2
GB_filter <- subset(GB, poly_area > 20)
poly_area contains the area in km2 for all polygons. We can filter the polygon by a threshold, such as 20 in your example. GB_filter is the final output.
This is one potential solution:
GB_sub = GB[sapply(GB@polygons, function(x) x@area>0.04),] # select min size
map.df <- fortify(GB_sub)
ggplot(map.df, aes(x=long, y=lat, group=group)) + geom_polygon()
Check this link for specifics on the actual interpretation of km2 size: Getting a slot's value of S4 objects?
I compared both as well but they don't seem to differ:
out1 = sapply(GB@polygons, function(x) x@area)
out2 = rgeos::gArea(GB, byid=TRUE)
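On the interpretation of the threshold: as far as I understand, the area slot (and gArea on unprojected data) is a planar area in the units of the coordinates, so for lon/lat data it is in square degrees rather than km2. The sketch below gives a rough conversion under a spherical approximation; the helper name `sqdeg_to_km2`, the 111.32 km-per-degree constant and the choice of 54 degrees north as a representative latitude for Great Britain are all assumptions for illustration.

```r
# Rough conversion of a planar area in square degrees to km^2 at a given
# latitude (spherical approximation, not an exact geodesic area).
sqdeg_to_km2 <- function(area_sqdeg, lat_deg, km_per_deg = 111.32) {
  area_sqdeg * km_per_deg^2 * cos(lat_deg * pi / 180)
}

# The 0.04 threshold, evaluated near the middle of Great Britain
km2_for_004 <- sqdeg_to_km2(0.04, lat_deg = 54)
km2_for_004

# The square-degree threshold that corresponds to roughly 20 km^2 there
sqdeg_for_20km2 <- 20 / sqdeg_to_km2(1, lat_deg = 54)
sqdeg_for_20km2
```

Under these assumptions a 0.04 cut-off removes polygons up to a few hundred km2, which is much coarser than a 20 km2 minimum.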

Latitude and Longitude, x and y, and differences when plotting them: geosphere and ggmap

I'm using latitude and longitude data and trying to graphically display metrics at each point of a map of Stockholm (based on proximity to that point). I am more interested in the points being equally spaced on the image, rather than equally spaced in actual distance: in this sense, I understand that distances between latitude points at the equator are longer than they are along the polar circles, and this might be crucial to the question.
My goal was to divide up the map into a grid of approximately 1 km increments in both the x and y direction. As such, I took the minimum and maximum latitude and longitude, calculated their x and y distances from the centre of Stockholm, and then divided the span of the latitudes and longitudes by the span of the x and y coordinates (using geosphere). I did this because I wanted the points to be equally spaced when plotting them (otherwise, there would be less distance between x points at the top compared to the bottom of the map due to proximity to the equator).
I then plotted these points on the map (using ggmap), and observed that there is more distance between points in the y direction than the x direction. I suppose the map could be simply drawn in a distorted manner, but it seems a little bit too distorted to believe. I suspect I might be doing something wrong, but can't find what it might be.
Code example below:
library("ggmap")
library("RgoogleMaps")
library("geosphere")
stockholm <- get_map("stockholm", zoom=11)
ggmap(stockholm)
places <- c('Tensta', 'Hanviken')
pos <- data.frame(Places = places, lat = NA, lon = NA, x = NA, y = NA)
reflatlon = getGeoCode('Stockholm, Sweden')
for(i in 1:length(places)) {
latlon <- getGeoCode(paste0(places[i], ', Stockholm'))
pos$lat[i] <- as.numeric(latlon[1])
pos$lon[i] <- as.numeric(latlon[2])
dist_y <- distGeo(c(latlon[1], reflatlon[2]), reflatlon) * sign(latlon[1] - reflatlon[1]) # same longitude
dist_x <- distGeo(c(reflatlon[1], latlon[2]), reflatlon) * sign(latlon[2] - reflatlon[2]) # same latitude
pos$x[i] <- dist_x
pos$y[i] <- dist_y
}
deglatperm <- ( max(pos$lat) - min(pos$lat) ) / ( max(pos$y) - min(pos$y) ) # degrees latitude per metre
deglonperm <- ( max(pos$lon) - min(pos$lon) ) / ( max(pos$x) - min(pos$x) ) # degrees longitude per metre
seqlat <- seq(min(pos$lat), max(pos$lat), by = deglatperm*1000) # sequence with a point every ~1km
seqlon <- seq(min(pos$lon), max(pos$lon), by = deglonperm*1000) # sequence with a point every ~1km
seqlatlon <- expand.grid(seqlat, seqlon)
names(seqlatlon) <- c('lat', 'lon')
ggmap(stockholm) + geom_point(aes(x = lon, y = lat), data=seqlatlon)
output plot
As you can see from the output plot, there is at least twice as much distance between points in the y direction as compared to the x direction.
To summarise: the x and y coordinates are obtained using geosphere. The map is plotted using ggmap.
Am I doing something wrong with geosphere? Or are maps of latitude and longitude SO distorted? When I open Google Maps, and use the "measure distance" tool approximately between top and bottom, and left and right points, I get estimates of 16.3 and 16.9 km, whereas the values I get with geosphere are 17 and 32km (x and y) respectively.
If someone could tell me what's going on here, I'd be extremely thankful!
I try to avoid working in any coordinate system that uses degrees instead of distances. In the US I use our State Plane system constantly. It appears that Sweden uses the RT system. Once you get your coordinates out of degrees and into distances from a datum, you can build your grid using more conventional distances. From there you can put your coordinates back into degrees if you like.
I use the spTransform function for coordinate conversions, and I use the Spatial Reference guide to get the reference codes for the coordinate systems.
library("ggmap")
library("RgoogleMaps")
library("geosphere")
library("sp")
stockholm <- get_map("stockholm", zoom=11)
ggmap(stockholm)
places <- c('Tensta', 'Hanviken')
pos <- data.frame(Places = places, lat = NA, lon = NA)
reflatlon <- getGeoCode('Stockholm, Sweden')
for(i in 1:length(places)) {
latlon <- getGeoCode(paste0(places[i], ', Stockholm'))
pos$lat[i] <- as.numeric(latlon[1])
pos$lon[i] <- as.numeric(latlon[2])
}
p <- SpatialPoints(data.frame(pos$lon, pos$lat), proj4string = CRS("+init=epsg:4326"))
p <- spTransform(p, CRS("+init=epsg:3022"))
seqx <- seq(min(p@coords[,1]), max(p@coords[,1]), by = 1000)
seqy <- seq(min(p@coords[,2]), max(p@coords[,2]), by = 1000)
pgrid <- expand.grid(seqx, seqy)
pgrid <- SpatialPoints(pgrid, proj4string = CRS("+init=epsg:3022"))
pgrid <- spTransform(pgrid, CRS("+init=epsg:4326"))
pgrid <- data.frame(pgrid@coords)
names(pgrid) <- c('lon', 'lat')
ggmap(stockholm) + geom_point(aes(x = lon, y = lat), data=pgrid)
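Two things may explain the numbers in the question. First, at Stockholm's latitude a degree of longitude covers only about half the ground distance of a degree of latitude. Second, geosphere functions expect points as (longitude, latitude), while the question's distGeo calls appear to pass latitude first. The base-R sketch below illustrates both effects with a plain haversine formula; the function name `haversine_km`, the Earth radius of 6371 km and the approximate Stockholm coordinates are assumptions for illustration, not part of geosphere.

```r
# Haversine great-circle distance in km; points are given as c(lon, lat),
# the same order geosphere uses.
haversine_km <- function(p1, p2, r = 6371) {
  rad <- pi / 180
  dlat <- (p2[2] - p1[2]) * rad
  dlon <- (p2[1] - p1[1]) * rad
  a <- sin(dlat / 2)^2 +
    cos(p1[2] * rad) * cos(p2[2] * rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(a))
}

# Approximate coordinates for central Stockholm (assumed for illustration)
lon0 <- 18.07; lat0 <- 59.33

# Moving 0.1 degree east versus 0.1 degree north
d_east  <- haversine_km(c(lon0, lat0), c(lon0 + 0.1, lat0))
d_north <- haversine_km(c(lon0, lat0), c(lon0, lat0 + 0.1))
d_north / d_east   # close to 1 / cos(59.33 degrees), i.e. about 2

# Passing the same "eastward" step as c(lat, lon) silently measures a
# northward step at a different place instead
d_east_swapped <- haversine_km(c(lat0, lon0), c(lat0, lon0 + 0.1))
```

With the arguments swapped, the supposed east-west distance comes out the same as the north-south one, which is the kind of inflated figure reported in the question.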

Extract routes between stations from a rail network shapefile

Problem: I want to extract a route along the rail network between two stations from a shapefile and only plot this particular route, rather than the entire network.
This is what I have so far:
I have a shapefile with the entire rail network of the UK, plotted it looks like this:
library(maptools)
rail <- readShapeSpatial("railnetworkLine.shp")
I also have a list of stations with Eastings and Northings, for example:
1) ABDARE 300400 202800
2) DEIGHTN 416490 419140
I can add them to the map and it looks like this:
plot(rail)
plot(spdf.station, add=TRUE, col="red", pch=20)
So what I don't know, is how I can extract the route between them and just plot that route - the information is obviously in the shapefile and I have the coordinates of the station, but I don't understand how to extract it.
I managed to calculate the distance between them with this code:
SpacingInMetres <- 10000
require(secrlinear)
network <- read.linearmask(data=rail, spacing=SpacingInMetres)
distance <- (networkdistance (stations[1,], stations[2,], network))/1000
# Confirm distance:
distance
>311.7893
And I found that you can get the routes along roads with Google Maps using ggmap (see here). But how can you do it when you have a shapefile as the network input rather than Google Maps?
I think maybe the packages 'shp2graph' + 'igraph' are useful, but I just can't figure it out. Any thoughts?
Shortest paths on route networks can be calculated using the stplanr package. I used a shapefile with the entire rail network for the Netherlands. This shapefile is available from:
https://mapcruzin.com/free-netherlands-arcgis-maps-shapefiles.htm
library(sf)
library(ggplot2)
library(stplanr)
# Read shapefile
nl_rails_sf <- sf::st_read("~/netherlands-railways-shape/railways.shp")
# Data frame with station locations
stations_df <- data.frame(station = c("Den Haag", "Den Bosch"),
lat = c(52.080276, 51.690556),
lon = c(4.325, 5.293611))
# Create sf object
stations_sf <- sf::st_as_sf(stations_df, coords = c("lon", "lat"), crs = 4326)
# Find shortest route
slnetwork <- SpatialLinesNetwork(nl_rails_sf)
find_nodes <- find_network_nodes(sln = slnetwork,
x = stations_df$lon,
y = stations_df$lat,
maxdist = 2e5)
route_dhdb_df <- data.frame(start = find_nodes[1], end = find_nodes[2])
route_dhdb_sf <- sum_network_links(sln = slnetwork, routedata = route_dhdb_df)
# Distance route in meters
distance_m <- sum(route_dhdb_sf$length) # 112189.5 [m]
# Plot results
ggplot(nl_rails_sf) +
geom_sf() +
theme_void() +
geom_sf(data = stations_sf, color = "red") +
geom_sf(data = route_dhdb_sf, color = "red")
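The idea underneath any of these packages is a shortest-path search over a graph whose edges are track segments weighted by length. The base-R sketch below shows that idea with Dijkstra's algorithm on a made-up four-node network; the `edges` table, the node names and the `shortest_route` function are hypothetical, and this is not a description of how stplanr works internally. For a real shapefile you would first have to build the edge list from the line segments, which is what stplanr or shp2graph plus igraph do for you.

```r
# Hypothetical toy network: each row is a track segment with a length in km
edges <- data.frame(
  from = c("A", "A", "B", "C", "B"),
  to   = c("B", "C", "C", "D", "D"),
  km   = c(4, 2, 1, 7, 5),
  stringsAsFactors = FALSE
)

shortest_route <- function(edges, start, end) {
  # treat every segment as usable in both directions
  e <- rbind(edges,
             data.frame(from = edges$to, to = edges$from, km = edges$km,
                        stringsAsFactors = FALSE))
  nodes <- unique(c(e$from, e$to))
  dist <- setNames(rep(Inf, length(nodes)), nodes)
  prev <- setNames(rep(NA_character_, length(nodes)), nodes)
  dist[start] <- 0
  todo <- nodes
  while (length(todo) > 0) {
    u <- todo[which.min(dist[todo])]      # closest unvisited node
    todo <- setdiff(todo, u)
    if (u == end || is.infinite(dist[u])) break
    out <- e[e$from == u, ]
    for (k in seq_len(nrow(out))) {
      v <- out$to[k]
      alt <- dist[u] + out$km[k]
      if (alt < dist[v]) { dist[v] <- alt; prev[v] <- u }
    }
  }
  # walk back from the end node to recover the route
  path <- end
  while (!is.na(prev[path[1]])) path <- c(prev[path[1]], path)
  list(path = unname(path), km = unname(dist[end]))
}

route <- shortest_route(edges, "A", "D")
route$path
route$km
```

The returned node sequence is what you would then map back to line segments in order to plot only that route.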

Calculating distance to nearest shore from multiple GPS coordinates

I have tried using the response to this question to solve this problem but I cannot apply it in my case since I have many coordinates distributed at a global scale.
Does anyone have a way to calculate the minimum distance in km from a series of points to the nearest shore using a loop? This is a subset of the points I am using (DATA HERE)
#setwd and load directories----
setwd("your_wd")
require (ggplot2)
require (ggmap)
#build a map to plot distribution of sample sites ----
sites<-read.csv("sites.csv", header=T)
#Using GGPLOT, plot the Base World Map
mp <- NULL
mapWorld <- borders("world", colour="gray50", fill="gray50") # create a layer of borders
mp <- ggplot() + mapWorld
#Now Layer the sites on top
Lon<-sites$x
Lat<-sites$y
mp <- mp+ geom_point(aes(x=Lon, y=Lat),color="blue", size=3)
mp
Have a look at the rgeos package
library(rgeos)
gDistance(spPoints, spPolygon, byid = TRUE)
spPoints will be a SpatialPoints object holding the coordinates. spPolygon will be a SpatialPolygons object with the landmasses. See the sp package. Make sure that both objects have the same projection and that it is a sensible one.
