How do I divide a very large OpenStreetMap file into smaller files in R without running out of memory?

I am currently looking to have map files that are no larger than the sizes of municipalities in Mexico (at largest, about 3 degrees longitude/latitude across). However, I have been running into memory issues (at the very least) when trying to do so. The file size of the OSM XML object is 1.9 GB, for reference.
library(osmar)
get.map.for.municipality <- function(province, municipality) {
  base.map.filename = 'OpenStreetMap/mexico-latest.osm'
  # bounds.list is a list that contains the boundaries
  bounds = bounds.list[[paste0(province, '*', municipality)]]
  my.bbox = corner_bbox(bounds[1], bounds[2], bounds[3], bounds[4])
  my.map.source = osmsource_file(base.map.filename)
  my.map = get_osm(my.bbox, my.map.source)
  return(my.map)
}
I am running this inside a loop, but it can't even get past the first iteration. When I tried running it, my computer froze, and I was only able to capture a screenshot with my phone. The memory usage climbed steadily over the course of a few minutes, then shot up very quickly, and I was unable to react before the machine froze.
What is a better way of doing this? I expect to have to run this loop about 100-150 times, so any way that is more efficient in terms of memory would help. I would prefer not to download smaller files from an API service.
If necessary, I would be willing to use another programming language (preferably Python or C++), but I prefer to keep this in R.

I'd suggest not using R for that.
There are better tools for this job: you can split and filter OSM data from the command line or with a DBMS.
Here are some alternatives extracted from the OSM Wiki http://wiki.openstreetmap.org:
Filter your OSM files using osmfilter: "osmfilter is used to filter OpenStreetMap data files for specific tags. You can define different kinds of filters to get OSM objects (i.e. nodes, ways, relations), including their dependent objects, e.g. nodes of ways, ways of relations, relations of other relations."
Clipping based on Polygons or borders using osmconvert: http://wiki.openstreetmap.org/wiki/Osmconvert#Applying_Geographical_Borders
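For example, a single osmconvert call can cut one municipality's bounding box out of the large extract. Below is a rough sketch of driving it from R with system2(); the coordinates and file names are placeholders, osmconvert must already be on the PATH, and the exact flag syntax should be checked against the wiki page above:
# Sketch: clip one bounding box out of the large extract with osmconvert.
# bounds is (min lon, min lat, max lon, max lat); file names are placeholders.
clip.municipality <- function(bounds, outfile,
                              infile = 'OpenStreetMap/mexico-latest.osm') {
  bbox = paste(bounds, collapse = ',')
  system2('osmconvert',
          args = c(infile, paste0('-b=', bbox), paste0('-o=', outfile)))
}

clip.municipality(c(-99.36, 19.05, -98.94, 19.59), 'municipio.osm')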
You can write bash scripts for both osmfilter and osmconvert, but I'd recommend using a DBMS. Just import the data into PostGIS using osm2pgsql and connect your R code with any PostgreSQL driver. This will optimize your read/write operations.
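On the R side, reading from PostGIS is straightforward. A minimal sketch, assuming a local database named "osm" loaded with osm2pgsql (the planet_osm_* table names and the 3857 projection of the way column are the osm2pgsql defaults, so adjust to your import settings; the bounds are placeholders):
# Sketch: let PostGIS do the spatial filtering and pull only one municipality's
# features into R.
library(DBI)
library(RPostgres)
library(sf)

con <- dbConnect(Postgres(), dbname = 'osm', host = 'localhost')

bbox.sql <- "ST_Transform(ST_MakeEnvelope(-99.36, 19.05, -98.94, 19.59, 4326), 3857)"
roads <- st_read(con, query = paste0(
  "SELECT osm_id, highway, way FROM planet_osm_line WHERE way && ", bbox.sql))

dbDisconnect(con)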

Related

How to properly load these adf files?

I am trying to use the LandScan population dataset, which comes as a series of .adf files. I thought that the largest file would be the one with the data, and this does seem to be the case, but when loaded as a raster, it doesn't seem quite right, and if I try to plot it, it is empty.
I see that the data is contained in @data@attributes, although I seem unable to access individual columns. Do I need to load multiple files together? How can I actually use this?
I tried to include images, but apparently my reputation is too low. The files are dblbnd.adf, hdr.adf, metadata.xml, prj.adf, sta.adf, vat.adf, w001001.adf, and w001001x.adf. w001001.adf is 158000 KB, while the second-largest file, w001001x.adf, is only 7000 KB.
The adf files are part of a single ESRI GRID raster. It should not matter which one you take, and you can probably also use the folder name (that is really the intention).
If the display does not work, perhaps first multiply the object by 1. You can also try the terra package instead, using rast("filename").
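A minimal sketch of both suggestions (the folder and file paths are placeholders for wherever the GRID lives):
# Point terra at the main .adf file (or at the GRID folder) -- the .adf files
# together form a single ESRI GRID raster.
library(terra)
r <- rast('LandScan/w001001.adf')
plot(r)

# With the older raster package, multiplying by 1 forces the values to be read,
# which can fix an apparently empty plot.
library(raster)
r2 <- raster('LandScan/w001001.adf') * 1
plot(r2)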

Intersection and difference of PostGIS data using R

I am an absolute beginner in PostgreSQL and PostGIS (databases in general) but have fairly good working experience in R. I have two multi-polygon data sets of vulnerable areas of India from two different sources: one is around 12 GB in .gdb format (let's call it mygdb) and the other is a shapefile of around 2 GB (let's call it myshp). I want to compare the two sets of vulnerability maps and generate some state-wise measures of fit using intersection (I), difference (D), and union (U) between the maps.
I would like to make use of PostGIS functionality (via R), as neither R (crashes!) nor QGIS (too slow) is efficient for this. To start with, I have uploaded both data sets into my PostGIS database; I used ogr2ogr in R to upload mygdb, but I am kind of stuck at this point. My idea is to split both polygon files by state and then apply other functions to get I, U and D. From my search, I think I can use sf functions like st_split, st_intersection, st_difference, and st_union. However, even after splitting, I imagine the file sizes will still be too large for R to process, so my questions are:
Is my approach the best way forward?
How can I use sf::st_* functions (e.g. st_split, st_intersection) without importing the data from the database into R?
There are some useful answers to previous relevant questions, like this one for example. But I find it hard to put the steps together from different links and any help with a dummy example would be great. Many thanks in advance.
Maybe you could try loading it as a stars proxy. A proxy object doesn't load the file into memory; operations stay lazy and are applied against the file on disk.
https://r-spatial.github.io/stars/articles/stars2.html
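A minimal sketch of the proxy pattern from that vignette (the file name is a placeholder):
# With proxy = TRUE nothing is read into RAM up front; pixels are only fetched
# (and downsampled) when the object is actually plotted or computed on.
library(stars)
s <- read_stars('large_raster.tif', proxy = TRUE)
plot(s)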
Not an answer to the question sensu stricto, but in response to the request in a comment, here is an example of a PostgreSQL/PostGIS query for ST_Intersection, based on OSM data imported into a PostgreSQL database with osm2pgsql:
WITH
  highway AS (
    SELECT osm_id, way FROM planet_osm_line WHERE osm_id = 332054927),
  dln AS (
    SELECT osm_id, way FROM planet_osm_polygon
    WHERE "boundary" = 'administrative' AND "admin_level" = '4' AND "ref" = 'DS')
SELECT ST_Intersection(dln.way, highway.way) FROM highway, dln;
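The same query can be pushed down from R so that the intersection is computed inside PostGIS and only the result comes back. A sketch (the database name and connection details are assumptions):
# Sketch: run the CTE query above through sf; aliasing the geometry as geom
# lets st_read build an sf object from the result.
library(DBI)
library(RPostgres)
library(sf)

con <- dbConnect(Postgres(), dbname = 'gisdb')
query <- "
  WITH
    highway AS (
      SELECT osm_id, way FROM planet_osm_line WHERE osm_id = 332054927),
    dln AS (
      SELECT osm_id, way FROM planet_osm_polygon
      WHERE boundary = 'administrative' AND admin_level = '4' AND ref = 'DS')
  SELECT ST_Intersection(dln.way, highway.way) AS geom FROM highway, dln"
result <- st_read(con, query = query)
dbDisconnect(con)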

Using OpenStreetMapX to create a powergrid graph network

I want to find out a few things about OpenStreetMapX, which from what I understand works well with transportation-based networks. I am wondering if it's also possible to use this package along with LightGraphs.jl to create a power grid network. In my case, I have filtered some power grid data using Osmosis (a piece of software that allows filtering OpenStreetMap data based on a tag).
I want to know whether it is relevant to use OpenStreetMapX for this kind of data (a power grid).
using OpenStreetMapX
# Load power data for Germany
deData = get_map_data("D:/PowerGridNetwork/data/germany/de_power_160718.osm")
# Get roadways (which I believe has the meta data for edges)
deData.roadways
I ended up with metadata for power as well as roads, and I am wondering how that came about in the first place, since I filtered only the power data.
The next question I have is: does deData.e return an adjacency list? What I am really after is creating a MetaGraph with nodes and edges carrying their respective properties.
Any ideas?
Thanks in advance

ArcMap Network Analyst iteration over multiple files using ModelBuilder

I have 10+ files that I want to add to ArcMap and then run some spatial analysis on in an automated fashion. The files are in CSV format, located in one folder, and named in order from "TTS11_path_points_1" to "TTS11_path_points_13". The steps are as follows:
1. Make an XY event layer
2. Export the XY table to a point shapefile using the Feature Class To Feature Class tool
3. Project the shapefiles
4. Snap the points to another line shapefile
5. Make a Route layer (Network Analyst)
6. Add locations to Stops using the output of step 4
7. Solve to get routes between points based on a RouteName field
I tried to attach a snapshot of the model builder to show the steps visually but I don't have enough points to do so.
I have two problems:
How do I iterate this procedure over the number of files that I have?
How do I make sure that the output has a different name every time, so that it doesn't overwrite the one from the previous iteration?
Your help is much appreciated.
Once you're satisfied with the way the model works on a single input CSV, you can batch the operation 10+ times, manually adjusting the input/output files. This easily addresses your second problem, since you're controlling the output name.
You can use an iterator in your ModelBuilder model -- specifically, Iterate Files. The iterator would be the first input to the model, and has two outputs: File (which you link to other tools), and Name. The latter is a variable which you can use in other tools to control their output -- for example, you can set the final output to C:\temp\out%Name% instead of just C:\temp\output. This can be a little trickier, but once it's in place it tends to work well.
For future reference, gis.stackexchange.com is likely to get you a faster response.

R using waaay more memory than expected

I have an R script being called from a Java program. The purpose of the script is to automatically generate a bunch of graphs in ggplot and then splat them onto a PDF. It has grown somewhat large, with maybe 30 graphs, each of which is called from its own script.
The input is a tab-delimited file of 5-20 MB, but the R session sometimes goes up to 12 GB of RAM usage (on a Mac running OS X 10.6.8, by the way, but this will be run on all platforms).
I have read about how to look at the memory size of objects, and nothing is ever over 25 MB; even if R deep-copied everything for every function and every filter step, it shouldn't get close to this level.
I have also tried gc(), to no avail. If I do gcinfo(TRUE) and then gc(), it tells me that it is using something like 38 MB of RAM. But Activity Monitor goes up to 12 GB and things slow down, presumably due to paging to disk.
I tried calling it via a bash script in which I did ulimit -v 800000 but no good.
What else can I do?
In the process of making assignments, R will always make temporary copies, sometimes more than one or even two. Each temporary assignment requires contiguous memory for the full size of the allocated object. So the usual advice is to plan on having at least three times that amount of contiguous memory available. This also means you need to be concerned about how many other non-R programs are competing for system resources, as well as being aware of how your memory is being used by R. You should try restarting your computer, running only R, and seeing whether that succeeds.
An input file of 20 MB might expand quite a bit (8 bytes per double, and perhaps more per character element in your vectors), depending on the structure of the file. The PDF file object will also take quite a bit of space if you are plotting every point of a large file.
My experience is not the same as that of the others who have commented; I do issue gc() before memory-intensive operations. You should post code and describe what you mean by "no good". Are you getting errors, or observing the use of virtual memory ... or what?
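To see where the copies come from, tracemem() is handy (it needs R built with memory profiling, which the standard binaries usually are). A small illustration with a vector of about 8 MB, so each duplication is visible:
x <- rnorm(1e6)                       # ~8 MB of doubles
tracemem(x)                           # report every time R duplicates x
y <- x                                # no copy yet (copy-on-modify)
y[1] <- 0                             # now the full 8 MB copy is made
print(object.size(x), units = 'MB')   # size of one object
gcinfo(TRUE)                          # make gc() report what R itself is holding
invisible(gc())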
I apologize for not posting a more comprehensive description with code; it was fairly long, as was the input. But the responses I got here were still quite helpful. Here is how I mostly fixed my problem.
I had a variable number of columns which, with some outliers, got very numerous. But I didn't need the extreme outliers, so I just excluded them and cut off those extra columns. This alone decreased the memory usage greatly. I hadn't looked at the virtual memory usage before, but sometimes it was as high as 200 GB. This brought it down to at most 2 GB.
Each graph was created in its own function. So I rearranged the code such that every graph was first generated, then printed to the PDF, then removed with rm(graphname).
Further, I had many loops in which I was creating new columns in data frames. Instead of doing this, I just created vectors not attached to data frames in those calculations. This actually had the benefit of greatly simplifying some of the code.
After no longer adding columns to the existing data frames and instead making standalone column vectors, memory use dropped to 400 MB. While this is still more than I would expect it to use, it is well within my restrictions. My users are all in my company, so I have some control over which computers it gets run on.
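Roughly, the pattern that worked looks like this (the plot-building functions and the data frame dat are hypothetical stand-ins):
# Build each graph, print it to the open PDF device, then drop it before the
# next one so that only one ggplot object is alive at a time.
library(ggplot2)

pdf('report.pdf')
for (make.plot in list(make.histogram, make.scatter)) {  # one function per graph
  g <- make.plot(dat)
  print(g)
  rm(g)
  gc()
}
dev.off()

# Intermediate results live in plain vectors rather than as new data frame columns.
ratio <- dat$numerator / dat$denominator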
