Extract from OSM Planet File is incomplete

I'm fairly new to the OSM world and I need to extract all water related polygons from the OSM planet file, except for ocean polygons. I know there is some product from a university in Tokyo, but it's from 2016 and I need it as up to date as possible.
I already extracted a good part of it with the code below. However, comparing the resulting layers with the OSM basemap in QGIS, I noticed that some parts are missing, even though they have the same flags and relations as other parts that were extracted. I know that some parts of rivers are digitized as lines and not polygons, so it's okay that those are missing. The missing parts are definitely polygons, since I could extract one of them with the same flags through the QuickOSM plug-in in QGIS. The OSM basemap also shows clearly that those areas must be polygons (see screenshot).
Is there a mistake in my code, or did I make a mistake with the flags? My code throws no errors, and everything seems to be working except for the missing parts.
Thanks in advance!
Here is the code so far:
library(gdalUtils)
library(rgdal)
library(sf)
# extracting all layers with flag "natural = water"
path_pbf <- "path/to/planet_file.osm.pbf"
ogr2ogr(src_datasource_name = path_pbf,
        dst_datasource_name = "OSM_Waterbodies.gpkg",
        f = "GPKG",
        sql = "SELECT * FROM multipolygons WHERE natural = 'water'",
        progress = TRUE)
# extracting all layers with flag "other_tags LIKE waterway"
ogr2ogr(src_datasource_name = path_pbf,
        dst_datasource_name = "OSM_Waterways.gpkg",
        f = "GPKG",
        sql = "SELECT * FROM multipolygons WHERE other_tags LIKE '%waterway%'",
        progress = TRUE)
waterways <- st_read("OSM_Waterways.gpkg")
# keep only certain polygons, since waterways also includes dams etc.
# grepl() expects a regular expression, not a glob, so no "*" wildcards are needed
check <- "riverbank|river|stream|tidal[ _]channel|canal|drain|ditch|lake"
# TRUE for polygons that are part of the desired selection
keep <- grepl(check, waterways$other_tags)
# drop all other rows; logical indexing also works when nothing has to be dropped,
# whereas waterways[-index, ] with an empty index would return zero rows
waterways <- waterways[keep, ]
st_write(waterways, "OSM_Waterways_clean.gpkg", driver = "GPKG")
P.S.: The code is probably not the most efficient one, but it's not about efficiency, since I will probably run it once or twice.

It looks like you're only extracting multipolygons, which OSM uses when a shape isn't a simple polygon. This means that river sections with islands in them will be extracted, but many simple river sections will not, as they are just mapped as closed ways (as in one example from your screenshot). I don't have the OSM planet file on hand to check, but I would imagine that it's as simple as running the ogr2ogr functions again with ways instead of multipolygons in the SQL, and then checking that the ways are closed. The likely check is that the first and last nodes are identical, as a quick search suggests that ogr2ogr doesn't provide a way to test for closed ways explicitly.
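The closed-way check mentioned above can be sketched in base R. This is only an illustration under stated assumptions: the coordinates of each way are assumed to be available as a two-column matrix (for example from sf::st_coordinates()), the helper name is_closed_way is hypothetical and not part of any package, and the coordinates are made up.

```r
# Hypothetical helper: a way counts as closed when it has at least four
# vertices and its first and last vertex coincide.
is_closed_way <- function(coords) {
  coords <- as.matrix(coords)
  n <- nrow(coords)
  n >= 4 && all(coords[1, ] == coords[n, ])
}

# Toy coordinates (made up for illustration)
closed_way <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 0))
open_way   <- rbind(c(0, 0), c(1, 0), c(1, 1))

is_closed_way(closed_way)  # TRUE
is_closed_way(open_way)    # FALSE
```

Ways that pass this check could then be turned into polygons, while open ways stay lines.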

Related

Unable to create a multilinestring using st_intersection that is good enough to use border_segment from the SpatialRDD package

I’ve been following this tutorial for SpatialRDD, https://axlehner.github.io/SpatialRDD/articles/spatialrdd_vignette.html#assign-treatment, using my own dataset and maps, and I’ve run into a problem with border_segment. In the tutorial the author uses a cut-off line that describes a spatial discontinuity. In my own project I am trying to create such a cut-off line from historical boundaries: I take two neighboring polygons and compute the intersection of their boundaries with st_intersection. However, once I reach the point where I need border_segment, it does not work as intended.
If for instance I want 2 border segments to be created using my region, points in that region, and my cut off line, it appears border_segment creates 2 segments for each line segment of my cut off line (the cut off line I created using the intersection of boundaries). Furthermore, comparing the cut off line from the tutorial (can be loaded into R using SpatialRDD::data("cut_off.sf")) to the one I create, the one in the tutorial has one big matrix describing the geometry, while the one I create is described by many 2x2 matrices. I think this is where the problem might be stemming from, but I have no idea how to fix it. I think by finding a solution it would help to understand how to create better geographical objects using just R.
I’ve created a reproducible example to illustrate the issue:
library(SpatialRDD)
library(sf)
library(tmap)
library(spData) #loading various maps
#getting worldmap
data("world")
#changing to a localised projection system
world <- st_transform(world, "EPSG:3035")
#getting just Germany
germany <- world[world$name_long == "Germany",]
#getting just Poland
poland <- world[world$name_long == "Poland",]
#taking the union of Germany and Poland to later distribute random points in them
union <- st_union(germany, poland)
#taking the intersection of their borders, which will act as a cut off line
cut_off <- st_intersection(germany, poland)
#creating random points for exercise to work (similar code to what is done in tutorial)
set.seed(1088) # set a seed
points_samp <- sf::st_sample(union, 1000)
points_samp <- sf::st_sf(points_samp) # make it an sf object
#using border_segment to sort points into 2 border regions
points_samp$segment2 <- border_segment(points_samp, cut_off, 2)
#mapping the points in their respective border segments
tm_shape(points_samp) + tm_dots("segment2", size = 0.1) + tm_shape(cut_off) + tm_lines()
As you will see when plotting the points after creating the border segments, there are many border segments even though I told border_segment to create only two.
I’m fairly new to R and geospatial manipulation, so maybe there is a simple solution, but I’ve searched everywhere and haven’t been able to find one. This is the first time after using Stack Overflow for years that I’ve actually had to post. Please let me know if you need any other information or if I was not clear enough at some point!
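The difference between "one big matrix" and "many 2x2 matrices" can be reproduced on toy data. The following is a minimal sketch, assuming the cut-off line consists of connected two-point segments; st_combine() and st_line_merge() are sf functions, the coordinates are made up, and whether border_segment then behaves as intended on the real border has not been verified here.

```r
library(sf)

# Three connected two-point segments, each stored as its own 2x2 matrix
segs <- st_sfc(
  st_linestring(rbind(c(0, 0), c(1, 0))),
  st_linestring(rbind(c(1, 0), c(2, 1))),
  st_linestring(rbind(c(2, 1), c(3, 1)))
)

# Collect the segments into one MULTILINESTRING, then merge connected parts
multi  <- st_combine(segs)
merged <- st_line_merge(multi)

st_geometry_type(merged)      # geometry type after merging
nrow(st_coordinates(merged))  # number of vertices in the merged line
```

For the cut_off object in the question, the analogous call would presumably be st_line_merge() on its geometry, provided that geometry is a MULTILINESTRING.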

Why is R only extracting shapefile field values for certain points?

Apologies if this belongs in GIS stackexchange, I'm happy to be redirected there, but I have a feeling it's something in my code that is the problem.
I have a shapefile of the Terrestrial Ecoregions of the World that contains fields such as Ecoregion Name, Biome, and Realm, which I want to extract for a range of ~50,000 lat/long points. An example of the data would be
ID      ddlat    ddlong
221784    6.133   37.700
221814   26.450   74.700
221826  -17.716  -63.633
221827   47.933    8.083
221830  -24.283  131.600
My code (sample from within a much larger script) currently looks like the following:
teow <- shapefile("~/wwf_terr_ecos.shp")
teowlltest <- read.csv(file="~/teow_csv.csv",stringsAsFactors=FALSE)
teowlltest <- na.omit(teowlltest)
teowlltestSPT <- SpatialPoints(teowlltest, proj4string = CRS(proj4string(teow)))
overteow <- over(teowlltestSPT, teow)
bindtest <- cbind(teowlltest,overteow)
My problem is that the code works ... but only for 15 of the latlong points. I'm not expecting a 100% success rate, as I know the shapefile isn't perfect in coastline areas etc, but when mapped, far more than 15 points lie within the TEOW shapefile. The 15 that are being extracted are also incorrect when compared to maps of the biomes/realms etc. How do I get it to extract the field data for all the points and do it correctly? I've tried renaming/reorganising the csv (for example Y,X or Latitude, Longitude instead of ddlat, ddlong, and placing longitude as the first field), and subsetting the data, but only the same 15 keep extracting. I can't separate the Lat/Long information from the ID either, so I'm hoping that's not the problem here, but I can't see why it would be, if some of the points extract fine. Any help will be much appreciated, and I'm happy to answer any questions/supply more information if needed. Thanks!
Example output currently:
Showing that some have extracted (but not correctly) and all the others have just refused. Duplicate lat/longs removed for brevity.
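One possible cause, stated as an assumption because the CSV itself is not shown: SpatialPoints() reads the columns of its input as coordinates in x, y order, so passing the whole data frame would make ID the x coordinate and ddlat the y coordinate. A minimal sketch of building the points with longitude first and ID kept as an attribute follows; it uses sample rows from the table above, and the CRS string is an assumption standing in for CRS(proj4string(teow)).

```r
library(sp)

# Sample rows from the question
teowlltest <- data.frame(
  ID     = c(221784, 221814, 221826),
  ddlat  = c(6.133, 26.450, -17.716),
  ddlong = c(37.700, 74.700, -63.633)
)

# Coordinates must be x (longitude) first, then y (latitude);
# ID is carried along as an attribute instead of a coordinate
pts <- SpatialPointsDataFrame(
  coords      = teowlltest[, c("ddlong", "ddlat")],
  data        = teowlltest["ID"],
  proj4string = CRS("+proj=longlat +datum=WGS84")  # assumed CRS
)

coordinates(pts)
# In the real script, over(pts, teow) would then be called on these points
```

Because ID stays in the attribute table, the result of over() can still be bound back to the original rows with cbind().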

How to calculate area of polygons from a large shapefile

Summary:
I'm trying to calculate the area of a large number of polygons in R. I've read a few posts about how I might do this (Example #1 & Example #2), but my shapefile is too large (1.7 GB) to import, and without importing the file I can't calculate the area of the polygons.
Extended Explanation:
I'm actually trying to calculate the area of properties in Victoria, Australia. The polygons represent these properties. I downloaded the simplified models 1 and 2 of VicMaps from Spatial Datamart for all of Victoria.
However, given the size of the shapefiles, I had to narrow my search to just one local government area (LGA) and calculated the polygon areas (just for testing). The shapefile was 15.5MB.
library(raster)
x <- shapefile("D:/Downloads/SDM616230/ll_gda94/shape/lga_polygon/ballarat/VMPROP/PROPERTY_PRIMARY_APPROVED.shp")
crs(x)
x$area_sqkm <- area(x) / 1000000
This worked, but it's not a practical solution to my problem, given that there are many LGAs in Victoria and I plan to eventually follow the same process for Queensland and NSW.
However, trying to load a larger shapefile doesn't work and results in the error "Error: memory exhausted (limit reached?)".
I've tried using readShapePoly, readOGR, st_read and read_sf to get the large shapefile into R, but none of them work. I think the file is just too large. I tried using a select query within read_sf to reduce the amount of data being read, but that didn't work either. I've read online that I should split the shapefile into just the data I need to reduce the size, but I have no idea how to do that.
Hope you can help.
Obviously the file is too big for a single box. I think the options then are either
1) split the files into smaller ones, process one by one. See
https://gis.stackexchange.com/questions/195508/split-a-shapefile-into-smaller-files-on-linux-command-line
2) use some dbms or data warehouse to do it, they do such batching automatically.
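Option 1 can be sketched in R with sf, assuming the data is already available as several smaller files (for example one per LGA, as in the Ballarat test). The helper name area_per_file and the directory in the usage comment are hypothetical; only one file is held in memory at a time and only the computed areas are kept.

```r
library(sf)

# Hypothetical helper: read one file at a time and keep only the areas,
# so the geometries of earlier files can be garbage-collected.
area_per_file <- function(paths) {
  per_file <- lapply(paths, function(p) {
    x <- st_read(p, quiet = TRUE)
    data.frame(
      file      = rep(basename(p), nrow(x)),
      # st_area() returns m^2 for lon/lat data and for CRSs with metre units
      area_sqkm = as.numeric(st_area(x)) / 1e6
    )
  })
  do.call(rbind, per_file)
}

# Usage (the directory name is a placeholder):
# shp_files <- list.files("path/to/lga_polygon", pattern = "\\.shp$",
#                         recursive = TRUE, full.names = TRUE)
# areas <- area_per_file(shp_files)
```

Returning plain data frames instead of spatial objects keeps the combined result small, at the cost of dropping the geometries.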

Convert Spatial Lines to Spatial Polygons

Is there an easy way to convert a Spatial Lines into a Spatial Polygon object within R?
Reproducible Example
I have put together a reusable dataset here, which is downloaded from OpenStreetMap through the overpass package. This extracts the locations of a few airports in southern England:
devtools::install_github("hrbrmstr/overpass")
library(overpass)
library(raster)
library(sp)
# Write Query
query_airport <- '
(node["aeroway"="aerodrome"](50.8, -1.6,51.1, -1.1);
way["aeroway"="aerodrome"](50.8, -1.6,51.1, -1.1);
relation["aeroway"="aerodrome"](50.8, -1.6,51.1, -1.1);
);
out body;
>;
out skel qt;
'
# Run query
shp_airports <- overpass::overpass_query(query_airport, quiet = TRUE)
crs(shp_airports) <- CRS("+init=epsg:4326") # Assign the coordinate reference system
shp_airports <- shp_airports[,1]
# Plot Results
plot(shp_airports, axes = TRUE)
However, the data is of the class "SpatialLinesDataFrame". This really messes things up if you want to do any form of spatial joins or intersections, as it only acknowledges the edge of the region.
Potential Leads
I was exploring the use of SpatialLines2PolySet within the maptools package, but in my time exploring I produced nothing but error codes, so I didn't think there would be any worth including these within the question. There is some guidance about these functions here: https://rdrr.io/rforge/maptools/man/SpatialLines2PolySet.html
Notes
I have searched the web and SO for similar questions and struggled to find any that directly address this. A lot seem to cover converting SpatialPoints -> SpatialLinesDataFrame, but not SpatialLinesDataFrame -> SpatialPolygonsDataFrame. This question is similar but lacks any answers (or a reproducible dataset): Close a spatial line into a polygon using a shapefile
In addition, it seems strange that this would be difficult as it is something which can be done so easily in ArcGIS using the "Feature to Polygon" tool. This function requires no additional arguments specified and it works perfectly.
One way to solve the problem is to use the sf library. After your query, run:
library(sp)
library(raster)
library(sf)
sf_airports <- st_as_sf(shp_airports)
sf_airports_polygons <- st_polygonize(sf_airports)
shp_airports <- as(sf_airports_polygons, "Spatial") # If you want sp
class(shp_airports)
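A toy version of the same idea, with made-up coordinates instead of the airport data: st_polygonize() typically wraps its result in a GEOMETRYCOLLECTION, so st_collection_extract() is used here to pull out plain polygons. This is a sketch of the technique, not a tested replacement for the code above.

```r
library(sf)

# A closed line (first and last vertex identical), made-up coordinates
ring <- st_sfc(st_linestring(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))

# Build polygons from the closed line
collected <- st_polygonize(ring)

# Pull the polygons out of the resulting collection
polys <- st_collection_extract(collected, "POLYGON")

st_geometry_type(polys)
st_area(polys)  # planar area of the unit square
```

Lines that do not form a closed ring produce no polygon, so it is worth checking for empty geometries after this step.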

map.where identifying the wrong county

When I run:
maps::map.where('county', y = 40.49069996, x = -94.40780005)
# prints "missouri,webster"
even though these GPS coordinates are for a point in Worth County, Missouri, and not in Webster County.
I can't say precisely why, but if you subset the counties to Missouri first and add the fill and plot parameters, Worth is returned as it should be. I just followed the last example in the help for map.where in the maps package:
p <- map('county','Missouri',fill = TRUE, plot = FALSE)
map.where(p, -94.40780005, 40.49069996)
[1] "missouri, worth"
This is a bug in the maps package. It will be fixed in the forthcoming version 3.2. There is a quick way to avoid the bug, as mweber explains. You can even shorten it to
map.where(map("county",fill=TRUE,plot=FALSE), -94.40780005, 40.49069996)
The bug has probably been around for ages, and only shows for the "counties" database, for Missouri counties that follow alphabetically after St Louis.
For the technically minded:
In fact, you can manually fix the installed package (or rather: stop the bug from affecting counties data) by editing maps/mapdata/counties.N
In that file, you will see that around lines 1545-1564, the line numbers do not correspond to the polygon numbers because someone moved a line. If you move that single line back, everything works again. The actual bug in the R code is that for map databases map.where() assumes the polygon number is equal to the line number in the .N file, which is almost always OK, but fails in this case.
