Nearest line to points using R

I'm trying to do some GIS work using R. Specifically, I have a SpatialPointsDataFrame (called 'points') and a SpatialLinesDataFrame (called 'lines'). I want to know the closest line to each point. I do this:
library(rgeos)  # provides gDistance
# make a new field to hold the line ID
points@data$nearest_line <- as.character('')
# Loop through data. For each point, get ID of nearest line and store it
for (i in 1:nrow(points)){
  points@data[i,"nearest_line"] <-
    lines[which.min(gDistance(points[i,], lines, byid= TRUE)),]@data$line_id
}
This works fine. My issue is the size of my data: I have 4.5 million points and about 100,000 lines. It has been running for about a day so far and has only processed 200,000 of the 4.5 million points (despite a fairly powerful computer).
Is there something I can do to speed this up? For example if I was doing this in PostGIS I would add a spatial index, but this doesn't seem to be an option in R.
Or maybe I'm approaching this totally wrong?
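One possible way to avoid the point-by-point loop is to hand the whole search to a single vectorised call. The sketch below assumes the sf package is available and uses its st_nearest_feature(), which returns, for each feature in the first argument, the row index of the nearest feature in the second; as far as I know it builds a spatial index internally. The toy coordinates and the line_id values are made up for illustration. With the real data, the two sp objects would first be converted with st_as_sf().

```r
library(sf)

# Toy data standing in for the real layers (hypothetical coordinates and IDs).
lns <- st_sf(
  line_id = c("a", "b"),
  geometry = st_sfc(
    st_linestring(rbind(c(0, 0), c(10, 0))),  # horizontal line at y = 0
    st_linestring(rbind(c(0, 5), c(10, 5)))   # horizontal line at y = 5
  )
)
pts <- st_sf(
  geometry = st_sfc(st_point(c(1, 1)), st_point(c(2, 4)), st_point(c(9, 6)))
)

# One call for all points: index of the nearest line for each point.
idx <- st_nearest_feature(pts, lns)

# Look up the line ID by index and store it on the points.
pts$nearest_line <- as.character(lns$line_id[idx])
```

Whether this is fast enough for 4.5 million points and 100,000 lines would need to be measured on the real data.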

Related

Unable to create a multilinestring using st_intersection that is good enough to use border_segment from the SpatialRDD package

I’ve been following this tutorial for SpatialRDD, https://axlehner.github.io/SpatialRDD/articles/spatialrdd_vignette.html#assign-treatment, using my own dataset and maps, and I’ve run into a problem when attempting to use border_segment. In the tutorial the author uses a cut-off line that describes a spatial discontinuity. In my own project I am trying to create such a cut-off line from historical boundaries: I take two neighboring polygons and compute the intersection of their boundaries with st_intersection. However, once I reach the point where I need to use border_segment, it does not work as intended.
If, for instance, I want 2 border segments created from my region, the points in that region, and my cut-off line, border_segment appears to create 2 segments for each line segment of my cut-off line (the one I created from the intersection of boundaries). Furthermore, comparing the cut-off line from the tutorial (can be loaded into R using SpatialRDD::data("cut_off.sf")) to the one I create, the one in the tutorial has one big matrix describing the geometry, while mine is described by many 2x2 matrices. I think this is where the problem stems from, but I have no idea how to fix it. A solution would also help me understand how to create better geographical objects using just R.
I’ve created a reproducible example to illustrate the issue:
library(SpatialRDD)
library(sf)
library(tmap)
library(spData) #loading various maps
#getting worldmap
data("world")
#changing to a localised projection system
world <- st_transform(world, "EPSG:3035")
#getting just Germany
germany <- world[world$name_long == "Germany",]
#getting just Poland
poland <- world[world$name_long == "Poland",]
#taking the union of Germany and Poland to later distribute random points in them
union <- st_union(germany, poland)
#taking the intersection of their borders, which will act as a cut off line
cut_off <- st_intersection(germany, poland)
#creating random points for exercise to work (similar code to what is done in tutorial)
set.seed(1088) # set a seed
points_samp <- sf::st_sample(union, 1000)
points_samp <- sf::st_sf(points_samp) # make it an sf object
#using border_segment to sort points into 2 border regions
points_samp$segment2 <- border_segment(points_samp, cut_off, 2)
#mapping the points in their respective border segments
tm_shape(points_samp) + tm_dots("segment2", size = 0.1) + tm_shape(cut_off) + tm_lines()
As you will see when plotting the points after creating the border segments, there are many border segments even though I told border_segment to create only two.
I’m fairly new to R and geospatial manipulation, so maybe there is a simple solution, but I’ve searched everywhere and haven’t been able to find one. This is the first time after using stack overflow for years that I’ve had to actually post. Please let me know if you need any other information or if I was not clear enough at some point!
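The "many 2x2 matrices" description suggests the cut-off line is stored as a collection of separate two-point pieces rather than one continuous line. A minimal sketch of one way to stitch such pieces together, assuming only sf, is to union them and then call st_line_merge(). The three toy pieces below are hypothetical, and whether a merged line makes border_segment behave as intended is not confirmed here.

```r
library(sf)

# Three two-point pieces that join end to end (hypothetical coordinates).
pieces <- st_sfc(
  st_linestring(rbind(c(0, 0), c(1, 1))),
  st_linestring(rbind(c(1, 1), c(2, 1))),
  st_linestring(rbind(c(2, 1), c(3, 2)))
)

# Combine the pieces into one geometry, then merge touching pieces
# into a single continuous line described by one coordinate matrix.
merged <- st_line_merge(st_union(pieces))

st_geometry_type(merged)
st_coordinates(merged)
```

On the real object the same idea would be applied to st_geometry(cut_off). If the intersection result turns out to be a geometry collection that also contains points, extracting the line parts first with st_collection_extract(cut_off, "LINESTRING") may be necessary.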

Why is R only extracting a shapefile field values for certain points?

Apologies if this belongs in GIS stackexchange, I'm happy to be redirected there, but I have a feeling it's something in my code that is the problem.
I have a shapefile of the Terrestrial Ecoregions of the World that contains fields such as Ecoregion Name, Biome, and Realm, which I want to extract for ~50,000 lat/long points. An example of the data would be:
ID       ddlat     ddlong
221784     6.133    37.700
221814    26.450    74.700
221826   -17.716   -63.633
221827    47.933     8.083
221830   -24.283   131.600
My code (sample from within a much larger script) currently looks like the following:
teow <- shapefile("~/wwf_terr_ecos.shp")
teowlltest <- read.csv(file="~/teow_csv.csv",stringsAsFactors=FALSE)
teowlltest <- na.omit(teowlltest)
teowlltestSPT <- SpatialPoints(teowlltest, proj4string = CRS(proj4string(teow)))
overteow <- over(teowlltestSPT, teow)
bindtest <- cbind(teowlltest,overteow)
My problem is that the code works ... but only for 15 of the latlong points. I'm not expecting a 100% success rate, as I know the shapefile isn't perfect in coastline areas etc, but when mapped, far more than 15 points lie within the TEOW shapefile. The 15 that are being extracted are also incorrect when compared to maps of the biomes/realms etc. How do I get it to extract the field data for all the points and do it correctly? I've tried renaming/reorganising the csv (for example Y,X or Latitude, Longitude instead of ddlat, ddlong, and placing longitude as the first field), and subsetting the data, but only the same 15 keep extracting. I can't separate the Lat/Long information from the ID either, so I'm hoping that's not the problem here, but I can't see why it would be, if some of the points extract fine. Any help will be much appreciated, and I'm happy to answer any questions/supply more information if needed. Thanks!
Example output currently:
Showing that some have extracted (but not correctly) and all the others have just refused. Duplicate lat/longs removed for brevity.
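A likely cause, though the post does not confirm it, is that SpatialPoints() is given the whole data frame, so the columns are read as coordinates in order: ID becomes x and ddlat becomes y. The sketch below shows one way to rule that out by naming the coordinate columns explicitly, longitude first. It swaps in the sf package rather than sp, and the toy polygon and its eco_name field are hypothetical stand-ins for the ecoregion shapefile.

```r
library(sf)

# The sample rows from the question.
teowlltest <- data.frame(
  ID     = c(221784, 221814, 221826, 221827, 221830),
  ddlat  = c(6.133, 26.450, -17.716, 47.933, -24.283),
  ddlong = c(37.700, 74.700, -63.633, 8.083, 131.600)
)

# Name the coordinate columns explicitly: x = longitude, y = latitude.
# ID stays behind as an ordinary attribute column.
pts <- st_as_sf(teowlltest, coords = c("ddlong", "ddlat"), crs = 4326)

# Hypothetical polygon layer standing in for the ecoregions:
# a box from 30 to 80 degrees east and 0 to 30 degrees north.
box <- st_polygon(list(rbind(
  c(30, 0), c(80, 0), c(80, 30), c(30, 30), c(30, 0)
)))
eco <- st_sf(eco_name = "toy region", geometry = st_sfc(box, crs = 4326))

# Attach polygon attributes to each point; points outside get NA.
joined <- st_join(pts, eco)
```

With the real data, the polygon layer would come from st_read() on the shapefile, and the points would need the same coordinate reference system as that layer.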

What is the R script for the "isectpolyrst" from GME?

I'm trying to summarize raster cell values in overlapping polygons in ArcMap. This can be done in Geospatial Modelling Environment (GME), an extension for ArcMap. They have a command called isectpolyrst that calculates values for overlapping polygons. My problem is that my version of ArcGIS (10.6.2) doesn't support GME, so I can't use this function. I've heard that isectpolyrst can still be accomplished in RStudio with an R script, but I haven't found one anywhere.
I have a number of GPS points with 10 km buffers around them (these buffers overlap a lot). I'm trying to calculate proportions of different vegetation types within these buffer zones. I'm using ArcMap 10.6.2, and zonal statistics can't handle overlapping polygons.
You can use raster::extract for that. As you seem entirely new to R, you will need to study it a bit first. You can start here: https://rspatial.org/

How could I keep the data frame when working with shape files in R?

I'm working with two shapes and I want to remove from the first the area that overlaps the second. To do this, I take the difference with rgeos::gDifference() and it works correctly. The first shape is a SpatialPolygonsDataFrame and the second is a SpatialPolygons. As a result I need a SpatialPolygonsDataFrame, but I get a SpatialPolygons.
I have not yet found a way to correct this. Does anyone know a solution?
My script is:
library(rgeos)  # provides gDifference
ID <- as.character(shape_1@data$ID)
shape.gDiff <- gDifference(shape_1, shape_2, byid = TRUE, id = ID)
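One way to keep the attribute table through the difference, sketched here with the sf package rather than rgeos, is st_difference(): when its first argument is an sf object, the result keeps that object's columns. The square helper and the toy shapes below are hypothetical; real sp objects could be converted with st_as_sf() first.

```r
library(sf)

# Helper for illustration only: an axis-aligned square polygon.
sq <- function(x0, y0, s) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + s, y0), c(x0 + s, y0 + s), c(x0, y0 + s), c(x0, y0)
  )))
}

# shape_1 carries attributes; shape_2 is geometry only.
shape_1 <- st_sf(ID = c("p1", "p2"),
                 geometry = st_sfc(sq(0, 0, 2), sq(5, 0, 2)))
shape_2 <- st_sfc(sq(1, 0, 2))

# Subtract the union of shape_2 from every feature of shape_1.
# The ID column of shape_1 is carried over to the result.
shape_diff <- st_difference(shape_1, st_union(shape_2))
```

sf may warn that attributes are assumed to be spatially constant. That is expected here, since ID identifies the feature rather than measuring anything about its area.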

Sort Extracted Data Based On Image Region

I have analysed tree core images through the raster package in an attempt to perform image analysis. In the image:
http://dx.doi.org/10.6084/m9.figshare.1555854
You can see the measured "vessels" (black and numbered) and also annual lines (red) which have been drawn using the locator function and represent each year of growth of the tree core.
By generating a list of the maximum y coordinates of each annual line I have been able to sort the vessels into years for this image, which is what I am looking for. However, it has occurred to me that in reality things can get a little more difficult, as seen in the next image:
http://figshare.com/articles/Complicated/1555855
The approach above will not work on this image because vessels from each year overrun, so using the maximum y coordinates will not return the correct result.
Can anyone suggest another approach that may overcome this limitation? I have thought about using SpatialPolygons but am not sure this will achieve what I am looking for.
If you are creating the lines by clicking on the plot, you can use raster function drawLine or, for polygons, drawPoly. You could rasterize the polygons and mask that with the original image to get the vessels grouped by polygon (year).
