How to get coordinates of cities from OpenStreetMap in R?

I have a vector of German city names, like
cities <- c("Munich", "Hamburg", "Gelsenkirchen", "Berlin", "Berlin", "Berlin", "Hamburg")
I would like to draw a map with red dots on any city in that list.
Specifically, I would like to:
create an openstreetmap section, which is just wide enough to include all cities in the list
draw red points on the map for every city in the list, making the points bigger by the count of city names
(optimally) not care about the actual coordinates of the cities, as openstreetmap "should know" them
So, how would I do this in R with ggplot?
This is how far I have got:
library(OpenStreetMap)
library(tibble)   # tribble()
library(ggplot2)  # geom_point(), geom_text()
# List of cities
staedte <- tribble(
~Stadt, ~lon, ~lat,
"HSNR", 6.56964 , 51.31655,
"Mönchengladbach", 6.4519, 51.2007
)
# get the map (this might take some time)
# and add some extra margin of 0.015
mymap <-openmap(c(min(staedte$lat)-0.015, min(staedte$lon)-0.015),
c(max(staedte$lat+0.015), max(staedte$lon)+0.015),
type = "stamen-watercolor", mergeTiles = TRUE)
# project openstreetmap to alternate coordinate system (might also take some time)
mymap_coord <- openproj(mymap)
# create a ggplot2-Object of the map
mymap_plt <- OpenStreetMap::autoplot.OpenStreetMap(mymap_coord)
# add the city points
mymap_plt +
geom_point(data=staedte, aes(x=lon, y=lat, size=10), color="blue") +
geom_text(data=staedte, aes(x=lon, y=lat+0.01, size=10, label=Stadt), color="blue")
This works quite well. My problems are:
Right now I have to enter the lon and lat coordinates of each city by hand. Is there a way to get them just from the city name?
How could I set the size of the points in relation to how often each city occurs in the list?
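For the second point, a minimal sketch of one possible approach is to count the city names first and then map the count to the point size. The coordinates below are approximate values typed in by hand purely for illustration (an assumption, not part of the question); a geocoding package could supply them instead. The names `counts`, `coords`, and `margin` are likewise made up for this sketch.

```r
cities <- c("Munich", "Hamburg", "Gelsenkirchen", "Berlin", "Berlin", "Berlin", "Hamburg")

# One row per city with the number of times it occurs in the vector
counts <- as.data.frame(table(Stadt = cities),
                        responseName = "n",
                        stringsAsFactors = FALSE)

# Approximate coordinates, entered by hand for illustration only
coords <- data.frame(
  Stadt = c("Berlin", "Gelsenkirchen", "Hamburg", "Munich"),
  lon   = c(13.40, 7.10, 9.99, 11.58),
  lat   = c(52.52, 51.52, 53.55, 48.14)
)

# Combine counts and coordinates into one table
staedte <- merge(counts, coords, by = "Stadt")

# Corners of a map section just wide enough for all cities, plus a margin
margin      <- 0.5
upper_left  <- c(max(staedte$lat) + margin, min(staedte$lon) - margin)
lower_right <- c(min(staedte$lat) - margin, max(staedte$lon) + margin)

print(staedte)
```

The two corner vectors can then be passed to `openmap()`, and a layer such as `geom_point(data = staedte, aes(x = lon, y = lat, size = n), color = "red")` would scale each dot by its count.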

Related

Adding specific locations to shp map

I am creating a map on R with the following commands:
map_and_data <- inner_join(mapshape, mapdata) # mapshape is the shp file and mapdata contains the data
ggplot(map_and_data) +
geom_sf(aes(fill=covidcases))+
scale_fill_gradient(low = "white", high = "red4", name = "COVID-19 cases")+
theme_classic()
This produces a normal map by region showing the number of COVID-19 cases. I also have an Excel file with a list of the 20 largest cities and their coordinates, and I want to add a point to the map for each of these cities. How can I overlay points for the locations of these cities on the original map?
My dataset (previously shp) called map_data looks as follows:
region_code  geometry
080018       (((410315.5 4598889, 410888.8 4598870~
080039       (((384322.7 4625199, 384343.6 4625174~
Thank you in advance!
Your city coordinates are in a different coordinate system than the polygons in your shapefile. You can use sf::st_transform() to convert them into the target CRS after reading them.
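A minimal sketch of that conversion, under stated assumptions: the city table, its column names, and the coordinates are made up for illustration, the cities are assumed to be given in WGS84 longitude/latitude, and EPSG:25831 only stands in for the polygons' CRS (in practice you would pass `st_crs(map_and_data)`).

```r
library(sf)

# Hypothetical city table, as it might look after reading the Excel file
cities <- data.frame(
  name = c("city_a", "city_b"),
  lon  = c(2.17, 1.90),
  lat  = c(41.39, 41.60)
)

# Declare the points as WGS84 longitude/latitude (assumption)
cities_sf <- st_as_sf(cities, coords = c("lon", "lat"), crs = 4326)

# Reproject into the polygons' CRS; EPSG:25831 is a stand-in here,
# with the real data use st_crs(map_and_data) instead
cities_proj <- st_transform(cities_sf, 25831)

print(st_coordinates(cities_proj))
```

The reprojected points can then be added to the existing plot with an extra layer such as `geom_sf(data = cities_proj, colour = "black")`.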

st_transform appears to change the area of the geometry of a shapefile

Shouldn't the area of polygons remain the same when you transform the shapefile projection? I am currently working with the shapefile here (https://data.humdata.org/dataset/f814a950-4d4e-4f46-a880-4da5522f14c4/resource/731e11cb-be02-46cf-8347-db0a07abff4e/download/gin_admbnda_adm4_2021_ocha.zip) for the country GIN which I call gin_shp after I read it in with the st_read function in R as follows:
gin_shp <- st_read(dsn = "INPUT FOLDER", layer = "sous_prefectures")
gin_shp ##this shows the crs is WGS84 obviously in degrees
##compute the area of the multipolygon geometries
gin_shp$area <- sf::st_area(gin_shp) ##computes areas in m^2
gin_shp$area <- units::set_units(gin_shp$area, "km^2") ##converting the area to square kms
sum(gin_shp$area) ##compute total area of the entire country (google seems to agree!)
245084.2 [km^2]
I need to create 1sqkm grids (using sf::st_make_grid) so I tried to transform crs projection from degrees to UTM by calling the st_transform function as follows:
crs_dt <- rgdal::make_EPSG() ##first load the dataframe of CRS projections available in R
gin_shp <- sf::st_transform(gin_shp, crs = crs_dt$prj4[crs_dt$code == 4328]) ##select one and assign it to gin_shp
#now I try to compute total area under new projection regime by running the same code as before
gin_shp$area <- sf::st_area(gin_shp) ##computes areas in m^2
gin_shp$area <- units::set_units(gin_shp$area, "km^2") #converting the area to square kms
#compute total area of the entire country
sum(gin_shp$area)
44363.83 [km^2] (way way off)
Do you know why this might be happening? Any ideas how to fix it?
No, the area of polygons is not guaranteed to remain constant when transforming between different CRSes. The Earth is round, maps & computer screens are flat - something has to give; either area or shape has to be distorted somewhat.
There are some area-preserving projections, such as the Mollweide, but these are the exception rather than the rule.
For an exaggerated example consider the world dataset from the {giscoR} package. Greenland (close to the pole) and Congo (on the equator) have roughly the same area on the sphere (calculated on WGS84 using spherical geometry tools) but wildly different areas when projected, especially with Mercator and its derivatives (e.g. Web Mercator as used by Google Maps).
library(sf)
library(dplyr)
library(giscoR)
library(ggplot2)
# the world at 1:20M resolution
svet <- gisco_get_countries(resolution = "20")
# Greenland & Congo - to be drawn in red
glmd <- svet %>%
filter(CNTR_ID %in% c('GL', 'CD'))
# web mercator = default for google maps and other web based tools
ggplot() +
geom_sf(data = glmd, fill = "red", color = NA) +
geom_sf(data = svet, fill = NA, color = "gray45") +
coord_sf(crs = st_crs("EPSG:3857"),
ylim = c(-20e6, 20e6))
# Mollweide - equal area projection
ggplot() +
geom_sf(data = glmd, fill = "red", color = NA) +
geom_sf(data = svet, fill = NA, color = "gray45") +
coord_sf(crs = st_crs("ESRI:53009"))
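The same effect can be checked numerically without downloading any data. The sketch below is an illustration rather than part of the original answer: it builds two 1° x 1° cells, one at the equator and one starting at 70° N, and compares their areas on the unprojected WGS84 geometry, in Web Mercator, and in an equal-area projection (EPSG:6933 is chosen here as an assumption; any equal-area CRS would do).

```r
library(sf)

# Build a 1 x 1 degree cell whose lower-left corner is at (0, lat)
cell <- function(lat) {
  st_polygon(list(rbind(c(0, lat), c(1, lat), c(1, lat + 1),
                        c(0, lat + 1), c(0, lat))))
}
cells <- st_sfc(cell(0), cell(70), crs = 4326)

# st_area() returns square metres; convert to square kilometres
area_km2 <- function(x) as.numeric(st_area(x)) / 1e6

areas <- data.frame(
  lat_start    = c(0, 70),
  geodesic     = area_km2(cells),                      # computed on WGS84
  web_mercator = area_km2(st_transform(cells, 3857)),  # distorts area
  equal_area   = area_km2(st_transform(cells, 6933))   # preserves area
)
areas$mercator_ratio <- areas$web_mercator / areas$geodesic

print(areas)
```

The equal-area column should stay close to the geodesic one for both cells, while the Web Mercator area of the northern cell is inflated several times over.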

map correct id to unique identifier in shp file coordinates

I have a shapefile, http://census.cso.ie/censusasp/saps/boundaries/Census2011_Small_Areas_generalised20m.zip
and want to extract the long/lat, but I am not sure how to map the correct coordinate to the correct small area.
My code is:
require(ggplot2)
require(proj4)
require(rgdal)
a=readOGR(....shp)
dublin = a[a$NUTS3NAME=='Dublin',]
dublin=spTransform(dublin,CRS('+proj=longlat +ellps=WGS84 +datum=WGS84'))
b=data.frame(dublin)
sa=fortify(dublin,SA='SMALL_AREA')
pj=project(sa[,1:2],proj4string(dublin),inverse=TRUE)
latlon=data.frame(latdeg=pj$y,londeg=pj$x)
sa=data.frame(cbind(latlon,sa))
The number of unique sa$id values (4500) is the same as the number of unique b$SMALL_AREA values (4500 rows). How is, for example, an id of 22 mapped from sa to the correct small area in b?
there are 56k rows in sa and 4500 rows in b
Any suggestions are appreciated
I am working in R
Shapefiles are much easier to work with and understand using the sf package in R. It keeps things tidy and rectangular, with the added $geometry list-column.
For your example, getting the lat & lon for the Dublin area:
library(sf)
library(tidyverse)
a <- read_sf('Census2011_Small_Areas_generalised20m/Census2011_Small_Areas_generalised20m.shp')
# dplyr filter() works for sf objects
dublin <- a %>% filter(NUTS3NAME == 'Dublin')
# Transform to WGS84 coordinates
dublin <- dublin %>% st_transform(st_crs(4326))
# Proof CRS has changed
st_crs(dublin)
# lat/lon coords
st_coordinates(dublin) %>% head()
In this case, the sf geometry is of MULTIPOLYGON type. Each observation has between 4 and 168 connected lat/lon points associated with it. If you are interested in a single point for each observation, the centroid might be a good approximation.
Using dublin %>% st_centroid() will return all the data, but with the $geometry column consisting of a single point. Getting just the centroid points (as a matrix) can be achieved using dublin %>% st_centroid() %>% st_coordinates().
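Because every row of an sf object carries its attributes together with its geometry, the identifier never gets separated from the coordinates. The sketch below illustrates this with two made-up squares; the `SMALL_AREA` values, the coordinates, and the use of EPSG:2157 (Irish Transverse Mercator) are assumptions for illustration, not values from the census shapefile.

```r
library(sf)

# Helper that builds a square polygon with the given lower-left corner
square <- function(x0, y0, side = 100) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + side, y0), c(x0 + side, y0 + side),
                        c(x0, y0 + side), c(x0, y0))))
}

# Toy stand-in for the Dublin subset: one row per small area
areas <- st_sf(
  SMALL_AREA = c("SA_001", "SA_002"),
  geometry   = st_sfc(square(715000, 734000), square(716000, 735000),
                      crs = 2157)
)

# Centroids are returned in row order, so they line up with SMALL_AREA
centroids <- st_centroid(st_geometry(areas))
lonlat    <- st_coordinates(st_transform(centroids, 4326))

lookup <- data.frame(
  SMALL_AREA = areas$SMALL_AREA,
  londeg     = lonlat[, "X"],
  latdeg     = lonlat[, "Y"]
)

print(lookup)
```

With the real data, replacing `areas` by the `dublin` object would give one longitude/latitude pair per small area, keyed by its `SMALL_AREA` code.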
Finally, a plot of the Dublin subset of the shapefile & the respective centroid points. There are quite a few shapes in a small area, making things hard to see. In the outskirts with larger polygons the centroids should be more visible.
dublin %>%
st_centroid() %>%
ggplot() +
geom_sf( size = .4, color = '#FF7900') +
geom_sf(data = dublin,
color = '#009A49',
fill = NA,
size = .2) +
theme(panel.background = element_rect(fill = "black")) +
coord_sf(datum = NA)

How do I select the subset of my lat/lon data that is highway travel using R?

I have some data which consists of time-stamped lat/lon pairs, a subset of which I've mapped out below using ggmap. If I wanted to select only the data consisting of travel along the highways, which you can kinda see on the map - the 280 running NW-SE between the green mountains and the grey flat area & the 101 cutting through the middle of the grey flat area (where the red is dense) - how would I select only that data?
What I'd ultimately like to achieve is a dataframe which contains only the highway/interstate travel. I've seen this question, which is a brief sketch of a possible solution in javascript, & suggests to use the Directions API to return the nearest road for any given point. I could then filter those results, but I'm wondering if anyone has found a cleaner solution.
Here's some sample data (CSV)
Here's the code to plot the above:
require(ggmap)
map<-get_googlemap(center="Palo Alto", zoom = 10)
ggmap(map) + geom_point(data = sample, aes(x = lon, y = lat),size = 3, color = "red")
You don't need an API key to run the above.
I just found this post and thought this is an interesting question. I wanted to download your sample data file. Unfortunately, the link was not working any more. Therefore, I could not try the whole process I had in my mind. However, I believe the following will let you move forward if you still try to do this task.
I recently noticed that Natural Earth offers road data. That is, you can get long and lat for the roads in the US, for example. If you can compare lon/lat in your data set and lon/lat of the road data, and identify matches in data points, you can get the data you want. My concern is to what extent your data points are accurate. If lon/lat perfectly stays on the road you are interested in, you will be OK. But if there are some margins, you may have to think how you can filter your data.
What I would like to leave here is evidence that the road data and the Google map match pretty well. As far as I can tell from the output, the road data is reliable, so you can subset your data using it. Here is my code.
library(maptools)  # readShapeLines()
library(dplyr)     # %>%
library(ggplot2)   # fortify()
library(ggmap)     # get_map(), ggmap()
### Step 1: shapefile becomes SpatialLinesDataFrame.
foo <- readShapeLines("ne_10m_roads_north_america.shp")
### Step 2: If necessary, subset data before I use fortify().
### dplyr does not work with SpatialLinesDataFrame at this point.
CA <- foo %>%
subset(.,country == "United States" & state == "California")
### Step 3: I need to convert foo to data frame so that I can use the data
### with ggplot2.
ana <- fortify(CA)
### Step 4: Get a map using ggmap package
longitude <- c(-122.50, -121.85)
latitude <- c(37.15, 37.70)
map <- get_map(location = c(lon = mean(longitude), lat = mean(latitude)),
zoom = 12, source = "google",
maptype = "satellite")
ggmap(map) +
geom_path(aes(x = long, y = lat, group = group), data = ana)
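One way to handle the margin issue mentioned above is to keep only the points that lie within some tolerance of a road line. The following is a sketch using the sf package rather than the sp-based workflow of the answer; the road, the track points, and the 50 m tolerance are all made up for illustration, and the coordinates are assumed to be in a projected CRS with metre units (UTM zone 10N, EPSG:32610).

```r
library(sf)

# Hypothetical straight road segment, 1 km long, in UTM zone 10N
road <- st_sfc(
  st_linestring(rbind(c(570000, 4140000), c(571000, 4140000))),
  crs = 32610
)

# Hypothetical GPS points: 10 m, 200 m and 30 m away from the road
track <- st_as_sf(
  data.frame(id = 1:3,
             x  = c(570100, 570500, 570900),
             y  = c(4140010, 4140200, 4139970)),
  coords = c("x", "y"), crs = 32610
)

# TRUE for every point that has at least one road within 50 m
near_road <- lengths(st_is_within_distance(track, road, dist = 50)) > 0

highway_pts <- track[near_road, ]
print(highway_pts)
```

With real data, the lon/lat points would first be converted with `st_as_sf(..., crs = 4326)` and reprojected with `st_transform()` so that the distance is measured in metres, and the tolerance would need tuning to the GPS accuracy.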

Maps doesn't register weird shapes

I'm working with one of my professors on some research aimed toward bettering the current methods of carbon accounting. We noticed that many of the locations for point sources were defaulted to the centroid of the county it was in (this is specific to the US at the moment, though it will be applied globally) if there was no data on the location.
So I'm using R to address the uncertainty associated with these locations. My code takes the range of longitude and latitude for a county and plots 10,000 points. It then weeds out the points that are not in the county and takes the average of the leftover points to locate the centroid. My goal is ultimately to take the difference between these points and the centroid to find the spatial uncertainty for point sources that were placed at the centroid.
However, I'm running into problems with coastal regions. My first problem is that the maps package ignores islands (the barrier islands for example) as well as other disjointed county shapes, so the centroid is not accurately reproduced when the points are averaged. My second problem is found specifically with Currituck county (North Carolina). Maps seems to recognize parts of the barrier islands contained in this county, but since it is not continuous, the entire function goes all wonky and produces a bunch of "NAs" and "Falses" that don't correspond with the actual border of the county at all.
(The data for the centroid is going to be used in other areas of the research which is why it's important we can accurately access all counties.)
Is there any way around the errors I'm running into? A different data set that could be read in, or anything of the sort? Your help would be greatly appreciated. Let me know if there are any questions about what I'm asking, and I'll be happy to clarify.
# CODE:
# ggplot2 helps
# SOME TROUBLE COUNTIES: north carolina, currituck & massachusetts,dukes
library(ggplot2)
library(maps) # package has maps
library(mapproj) # projections
library(sp)
WC <- map_data('county','north carolina,currituck') #calling on county
p <- ggplot(data = WC, aes(x = long, y = lat)) #calling on latitude and longitude
p1 <- p + geom_polygon(fill = "lightgreen") + theme_bw() +
coord_map("polyconic") + coord_fixed() #+ labs(title = "Watauga County")
p1
### range for the long and lat
RLong <- range(WC$long)
RLong
RLat <- range(WC$lat)
RLat
### Add some random points
n <- 10000
RpointsLong <- sample(seq(RLong[1], RLong[2], length = 100), n, replace = TRUE)
RpointsLat <- sample(seq(RLat[1], RLat[2], length = 100), n, replace = TRUE)
DF <- data.frame(RpointsLong, RpointsLat)
head(DF)
p2<-p1 + geom_point(data = DF, aes(x = RpointsLong, y = RpointsLat))
p2
# Source:
# http://www.nceas.ucsb.edu/scicomp/usecases/GenerateConvexHullAndROIForPoints
inside <- map.where('county',RpointsLong,RpointsLat)=="north carolina,currituck"
inside[which(nchar(inside)==2)] <- FALSE
inside
g<-inside*DF
g1<-subset(g,g$RpointsLong!=0)
g1
CentLong<-mean(g1$RpointsLong)
CentLat<-mean(g1$RpointsLat)
Centroid<-data.frame(CentLong,CentLat)
Centroid
p1+geom_point(data=g1, aes(x=RpointsLong,y=RpointsLat)) #this maps all the points inside county
p1+geom_point(data=Centroid, aes(x=CentLong,y=CentLat))
First, given your description of the problem, I would probably invest a lot of effort to avoid this business of locations defaulting to the county centroids - that's the right way to solve this problem.
Second, if this is a research project, I would not use the built in maps in R. The USGS National Atlas website has excellent county maps of the US. Below is an example using Currituck County in NC.
library(ggplot2)
library(rgdal) # for readOGR(...)
library(rgeos) # for gIntersection(...)
setwd("< directory containing shapefiles >")
map <- readOGR(dsn=".",layer="countyp010")
NC <- map[map$COUNTY=="Currituck County" & !is.na(map$COUNTY),]
NC.df <- fortify(NC)
bbox <- bbox(NC)
x <- seq(bbox[1,1],bbox[1,2],length=50) # longitude
y <- seq(bbox[2,1],bbox[2,2],length=50) # latitude
all <- SpatialPoints(expand.grid(x,y),proj4string=CRS(proj4string(NC)))
pts <- gIntersection(NC,all) # points inside the polygons
pts <- data.frame(pts@coords) # ggplot wants a data.frame
centroid <- data.frame(x=mean(pts$x),y=mean(pts$y))
ggplot(NC.df)+
geom_path(aes(x=long,y=lat, group=group), colour="grey50")+
geom_polygon(aes(x=long,y=lat, group=group), fill="lightgreen")+
geom_point(data=pts, aes(x,y), colour="blue")+
geom_point(data=centroid, aes(x,y), colour="red", size=5)+
coord_fixed()
Finally, another way to do this (which I'd recommend, actually), is to just calculate the area weighted centroid. This is equivalent to what you are approximating, is more accurate, and much faster.
polys <- do.call(rbind,lapply(NC@polygons[[1]]@Polygons,
function(x)c(x@labpt,x@area)))
polys <- data.frame(polys)
colnames(polys) <- c("long","lat","area")
polys$area <- with(polys,area/sum(area))
centr <- with(polys,c(x=sum(long*area),y=sum(lat*area)))
centr # area weighted centroid
# x y
# -76.01378 36.40105
centroid # point weighted centroid (start= 50 X 50 points)
# x y
# 1 -76.01056 36.39671
You'll find that as you increase the number of points in the points-weighted centroid the result gets closer to the area-weighted centroid.
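To make the area-weighting step concrete, here is a small base-R sketch that does not depend on the shapefile. It computes the centroid and area of each polygon ring with the standard shoelace formulas and then combines the parts, weighting each centroid by its share of the total area. The two squares and the helper name `ring_centroid` are made up for illustration.

```r
# Centroid and area of a closed ring (first row equals last row)
ring_centroid <- function(xy) {
  x <- xy[, 1]
  y <- xy[, 2]
  n <- nrow(xy)
  cross <- x[-n] * y[-1] - x[-1] * y[-n]   # shoelace terms
  a  <- sum(cross) / 2                     # signed area
  cx <- sum((x[-n] + x[-1]) * cross) / (6 * a)
  cy <- sum((y[-n] + y[-1]) * cross) / (6 * a)
  c(x = cx, y = cy, area = abs(a))
}

# Two disjoint parts, e.g. a mainland and a small island (toy data)
mainland <- rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0))
island   <- rbind(c(4, 0), c(5, 0), c(5, 1), c(4, 1), c(4, 0))

parts <- rbind(ring_centroid(mainland), ring_centroid(island))

# Weight each part's centroid by its share of the total area
w     <- parts[, "area"] / sum(parts[, "area"])
centr <- c(x = sum(parts[, "x"] * w), y = sum(parts[, "y"] * w))

print(centr)
```

The larger part dominates the result, which is the behaviour the random-point average approximates when every part of the county is covered by the sampling.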

Resources