Dividing Individual Spatial Polygons Equally in R

I have a shapefile of polygons that are the townships in the state of Iowa. I'd like to divide each element (i.e. each township) into 9 equal parts (i.e. a 3 x 3 grid for each township). I've figured out how to do this, but am having trouble forming a new dataframe out of the new polygons. My code is below. The data can be downloaded here: https://ufile.io/wi6tt
library(sf)
library(tidyverse)
setwd("~/Desktop")
iowa <- st_read(dsn = "Townships/iowa", layer = "PLSS_Township_Boundaries",
                stringsAsFactors = FALSE) # import data

## Make division
r <- NULL
for (row in 1:nrow(iowa)) {
  r[[row]] <- st_make_grid(iowa[row, ], n = c(3, 3))
}

# Combine together
region <- NULL
for (row in 1:nrow(iowa)) {
  region <- rbind(region, r[[row]])
}
region <- st_sfc(region, crs = 4326) # convert to sfc
reg_id <- data.frame(reg_id = 1:length(region)) # make ID for dataframe

# Make SF
region_df <- st_sf(reg_id, region)
The last line gives the following error:
Error in `[[<-.data.frame`(`*tmp*`, all_sfc_names[i], value = list(list( : replacement has 1644 rows, data has 14796
1644 is the number of rows in the initial Iowa dataframe (and 1644 × 9 = 14796).
Clearly the number of rows does not match the number of elements.
This might be a general R issue rather than a spatial one, but I figured I'd post the whole thing in case someone has an idea on how to do all of this a little more cleanly.
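One possible cleaner way to do the combining step (a sketch only, not tested against the linked data, assuming the iowa object read in above) is to concatenate the per-township grids with c() instead of rbind(), so the result stays a single flat sfc with 9 cells per township:
# Build one 3 x 3 grid per township, then flatten into a single sfc
grids <- lapply(seq_len(nrow(iowa)), function(i) st_make_grid(iowa[i, ], n = c(3, 3)))
region <- do.call(c, grids)                     # sfc of length 9 * nrow(iowa)
region_df <- st_sf(reg_id = seq_along(region),  # one ID per grid cell
                   geometry = region)
st_make_grid() inherits the CRS from iowa, so no extra st_sfc() call should be needed.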

Related

aggregate statistics on each cell of grid and plot heatmap in R

Below is the cycle hire data of London. Each point represents one cycle hire point.
I have created a grid using st_make_grid(). Now I wish to:
plot a heatmap of the number of cycle hire points in each cell of the grid
plot a heatmap of the total nbikes in each cell of the grid
(nbikes - the number of bikes currently parked)
library(spData)
library(sf)
# cycle hire data of london
# Each observation represents a cycle hire point in London.
hire_sf <- spData::cycle_hire
head(hire_sf)
# create grid
grid_area <- st_make_grid(hire_sf)
# 1. plot heatmap of number of cycle hire point in each cell
# 2. plot heatmap of total nbikes in each cell
# (nbikes - The number of bikes currently parked)
This is indeed a duplicate, but we may as well offer a possible solution; consider the code below.
It is built on sf::st_join(), which spatially joins two sf objects (in this case the grid and the points) while preserving the data attributes.
Note that the join is by default left (in SQL speak), so all rows (grid cells) of the first object are maintained. There will be NAs for cells with no hires and duplicate rows for cells with multiple points (so be sure to assign each cell a unique ID in advance, to make aggregation easier).
The first object in the join drives the geometry type of the result, so be sure to start with the grid if you want a polygon result; starting with the points you would get a point result.
Once the points are assigned to grid cells it is an exercise in aggregation - I suggest {dplyr} techniques, but base R would do as well.
For the final heatmap you will likely want ggplot2 for polished results, but base plot will do for a proof of concept.
library(spData)
library(sf)
library(dplyr)

# cycle hire data of London
# Each observation represents a cycle hire point in London.
hire_sf <- spData::cycle_hire
head(hire_sf)

# create grid and give each cell a unique ID
grid_area <- st_make_grid(hire_sf) %>%
  st_as_sf() %>%
  mutate(grid_id = 1:n())

# join data to grid; make sure to join points to grid
# (the first object drives the output geometry)
result <- grid_area %>%
  st_join(hire_sf) %>%
  group_by(grid_id) %>%
  summarise(point_count = sum(!is.na(id)),              # cells with no hires count as 0
            total_bikes = sum(nbikes, na.rm = TRUE))

# draw heatmap
plot(result["point_count"])
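For a more polished result, a minimal ggplot2 sketch (assuming the result object built above) could look like this:
library(ggplot2)

ggplot(result) +
  geom_sf(aes(fill = point_count), color = NA) +
  scale_fill_viridis_c(name = "Hire points") +
  theme_minimal()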

How do I remove a subset of polygons from a Large SpatialPolygonsDataFrame using a string search, in R?

I have a spatial file in R that contains all the area units for New Zealand. I have downloaded it in NZGD2000 format. This file contains irrelevant geographic details, such as the oceanic regions. I have managed to remove those from my data by simply removing the polygons whose area unit value is higher than a certain threshold.
library("dplyr")
library("rgdal")
library("rgeos")
NZAreas <- readOGR("[FILEPATH]/area-unit-2013.shp")
#remove the areas that are offshore
NZAreas@data$AU2013_V1_ <- as.numeric(as.character(NZAreas@data$AU2013_V1_))
NZAreas <- NZAreas[NZAreas@data$AU2013_V1_ < 614000,]
I have the problem that the area units include inlets and inland water. I can't remove those in the same way as I removed the coastal units, as the area unit values are not contiguous. The @data$AU2013_V_1 column contains the labels for the area units. All the area units I wish to remove have a label starting with "Inlet" or "Inland Water".
I can't work out how to remove these polygons from the data.
First I tried without the dataframe name in front of the @data:
NZAreas <- NZAreas[!grepl("Inlet", @data$AU2013_V_1),]
Error: unexpected '@' in "NZAreas <- NZAreas[!grepl("Inlet", @"
and then I tried:
NZAreas <- NZAreas[!grepl("Inlet", NZAreas#data$AU2013_V_1),]
That second code runs but does not remove the polygons; it does not seem to do anything to the Large SpatialPolygonsDataFrame. I checked the dataframe I constructed from NZAreas and there are Inlet and Inland Water rows. How do I remove these polygons?
This should work. It removed 49 areas containing "Inlet" in the label and 15 areas containing "Inland Water" in the label.
> dim(NZAreas)
[1] 2004 5
> NZAreas=NZAreas[!grepl("Inlet", NZAreas$AU2013_V_1),]
> dim(NZAreas)
[1] 1955 5
> NZAreas=NZAreas[!grepl("Inland Water", NZAreas$AU2013_V_1),]
> dim(NZAreas)
[1] 1940 5
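As a side note, both label types could be dropped in one pass with an anchored regular expression (a sketch assuming the label column is AU2013_V_1, as in the question):
NZAreas <- NZAreas[!grepl("^(Inlet|Inland Water)", NZAreas$AU2013_V_1), ]
dim(NZAreas)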

Distance between points in GPX file becomes too large

I want to analyze the distance traveled based on GPS tracks, but when I calculate the distance it always comes out too large.
I use Python to make a CSV file with the latitude and longitude for all points in a track, which I then analyze with R. The data frame looks like this:
| lat| lon| lat.p1| lon.p1| dist_to_prev|
|--------:|--------:|--------:|--------:|------------:|
| 60.62061| 15.66640| 60.62045| 15.66660| 28.103099|
| 60.62045| 15.66660| 60.62037| 15.66662| 8.859034|
| 60.62037| 15.66662| 60.62026| 15.66636| 31.252373|
| 60.62026| 15.66636| 60.62018| 15.66636| 8.574722|
| 60.62018| 15.66636| 60.62010| 15.66650| 17.787905|
| 60.62001| 15.66672| 60.61996| 15.66684| 14.393267|
| 60.61996| 15.66684| 60.61989| 15.66685| 7.584996|
...
I could post the whole data frame here for reproducibility, it's only 59 rows, but I'm not sure of the etiquette for posting big chunks of data here. Let me know how I can best share it.
lat.p1 and lon.p1 are just the lat and lon from the row below. dist_to_prev is calculated with distm() from geosphere:
library(geosphere)
library(dplyr)
df$dist_to_prev <- apply(df, 1, FUN = function(row) {
  distm(c(as.numeric(row["lat"]), as.numeric(row["lon"])),
        c(as.numeric(row["lat.p1"]), as.numeric(row["lon.p1"])),
        fun = distHaversine)
})
df %>% filter(dist_to_prev != "NA") %>% summarise(sum(dist_to_prev))
# A tibble: 1 x 1
`sum(dist_to_prev)`
<dbl>
1 1266.
I took this track as an example from Trailforks, and if you look at their track description it should be 787 m, not the 1266 m I got. This is not unique to this track; all the tracks I've looked at come out 30-50% too long.
One thing that might be the cause is that there are only 5 decimal places for the lats/lons. There are 6 decimal places in the CSV, but I can only see 5 when I open it in RStudio. I was thinking it was just formatting to make it easier to read and that the "whole" number was there, but maybe not? The lat/lons are of type double.
Why are my distances much larger than the ones displayed on the website I got the GPX file from?
There are a couple of problems in the code above. The function distHaversine is vectorized, so you can avoid the loop / apply statement; this will significantly improve the performance.
Most importantly, with the geosphere package the first coordinate is longitude, not latitude.
df<- read.table(header =TRUE, text=" lat lon lat.p1 lon.p1
60.62061 15.66640 60.62045 15.66660
60.62045 15.66660 60.62037 15.66662
60.62037 15.66662 60.62026 15.66636
60.62026 15.66636 60.62018 15.66636
60.62018 15.66636 60.62010 15.66650
60.62001 15.66672 60.61996 15.66684
60.61996 15.66684 60.61989 15.66685")
library(geosphere)
#Lat is first column (incorrect)
distHaversine(df[,c("lat", "lon")], df[,c("lat.p1", "lon.p1")])
#incorrect
#[1] 28.103099 8.859034 31.252373 8.574722 17.787905 14.393267 7.584996
#Longitude is first (correct)
distHaversine(df[,c("lon", "lat")], df[,c("lon.p1", "lat.p1")])
#correct result.
#[1] 20.893456 8.972291 18.750046 8.905559 11.737448 8.598240 7.811479
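To put the corrected, vectorized call back into the data frame (a sketch assuming the df defined above), something like this should work:
library(dplyr)
library(geosphere)

df <- df %>%
  mutate(dist_to_prev = distHaversine(cbind(lon, lat),        # longitude first
                                      cbind(lon.p1, lat.p1)))
sum(df$dist_to_prev, na.rm = TRUE)                            # total track length in metres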

R - Efficiently create dataframe from large raster excluding NA values

Apologies for cross-posting something similar on the GIS Stack Exchange.
I am looking for a more efficient way to create a frequency table based on a large raster in R.
Currently, I have a few dozen rasters, ~150 million cells in each, and I need to create a frequency table for each. These rasters are derived from masking a base raster with a few hundred small sampling locations*. Therefore the rasters I am creating the tables from contain ~99% NA values.
My current working approach is this:
sampling_site_raster <- raster("FILE")
base_raster <- raster("FILE")
sample_raster <- mask(base_raster, sampling_site_raster)
DF <- as.data.frame(freq(sample_raster, useNA='no', progress='text'))
### run time for the freq() process ###
user system elapsed
162.60 4.85 168.40
This uses the freq() function from the R raster package. The useNA='no' argument drops the NA values.
My questions are:
1) Is there a more efficient way to create a frequency table from a large raster that is 99% NA values?
or
2) Is there a more efficient way to derive the values from the base raster than by using mask()? (Using the Mask geoprocessing tool in ArcGIS is very fast, but it still keeps the NA values and is an extra step.)
*additional info: The sample areas represented by sampling_site_raster are irregular shapes of various sizes spread randomly across the study area. In the sampling_site_raster the sampling sites are encoded as 1 and non-sampling areas as NA.
Thank you!
If you mask a raster by another raster, you will always get another huge raster; I don't think that is the way to make things faster.
What I would do is try to mask by a polygon layer using extract():
res <- extract(raster, polygons)
Then you will have all the cell values for each polygon and can run freq on them.
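A minimal sketch of that extract-then-tabulate idea, assuming the sampling sites are available as a polygon layer (here called sampling_polys, which is not an object from the original question):
library(raster)

vals <- extract(base_raster, sampling_polys)      # list of cell values, one element per polygon
freq_table <- table(unlist(vals), useNA = "no")   # frequencies over sampled cells only
DF <- as.data.frame(freq_table)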

Counting polygons in shapefile

So I have multiple species ranges which look like the following (colored blue, for example); this one runs east to west across Africa:
I can get the total area by using gArea() in the rgeos package. What I want to know is how many individual polygons make up this file - i.e. how many distinct regions there are in the total range (these could be islands, or just separated populations) - and what the areas of those polygons are. I have been using the following code:
#Load example shapefile
library(maptools)
shp <- readShapeSpatial("species1.shp")
#How many polygon slots are there?
length(shp@polygons)
>2
#How many polygons are in each slot?
length(shp@polygons[[1]]@Polygons)
length(shp@polygons[[2]]@Polygons)
and to get the area of a particular one:
shp@polygons[[1]]@Polygons[[1]]@area
Is this correct? I'm worried that a lake in the middle of the range might constitute a polygon on its own. I want to end up with a list that is roughly like:
          Species A   Species B
Polygon 1        12          11
Polygon 2        13          10
Polygon 3        14          NA
If the above code is correct, compiling a list for every species of how many polygons there are and their individual areas would be pretty straightforward to do in a loop.
Thanks
This is a very un-glamorous solution, but it gets the job done at the moment.
for(i in 1:length(shpfiles)){
  shp <- shpfiles[[i]]

  #1) Create master list of all polygon files within a shapefile
  #How many lists of polygons are there within the shpfile
  num.polygon.lists <- length(shp@polygons)

  #Get all polygon files
  master <- vector(mode="list")
  for(j in 1:num.polygon.lists){
    m <- shp@polygons[[j]]@Polygons
    master[[j]] <- m
  }

  #Combine polygon files into a single list
  len <- length(master)
  if(len > 1) {
    root <- master[[1]]
    for(j in 2:length(master)){
      root <- c(master[[j]], root)
    }
  } else {
    root <- master[[1]]
  }

  #Rename
  polygon.files <- root

  #2) Count number of polygon files that are not holes
  #Create a matrix for the total number of polygon slots that are going to be counted
  res <- matrix(NA, ncol=1, nrow=length(polygon.files))

  #Loop through polygons, recording whether the "hole" slot is TRUE/FALSE
  for(j in 1:length(polygon.files)){
    r <- isTRUE(polygon.files[[j]]@hole)
    res[[j,1]] <- r
  }

  #Count the number of times "FALSE" appears - meaning the polygon is not a hole
  p.count <- table(res)["FALSE"]
  p.count <- as.numeric(p.count)
  print(p.count)
}
It's a start
I used the following code to find out how many polygon parts there were in each "row" (feature) of a shapefile:
sapply(shapefile@polygons, function(p) length(p@Polygons))
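A possible extension of that idea (a sketch only, assuming shp is the SpatialPolygonsDataFrame from the question) is to flatten all the parts, drop the holes, and read off each remaining polygon's area:
# Flatten every feature's parts into one list of Polygon objects
rings <- do.call(c, lapply(shp@polygons, slot, "Polygons"))
rings <- rings[!sapply(rings, slot, "hole")]   # drop holes (e.g. lakes inside a range)
length(rings)                                  # number of distinct polygons
sapply(rings, slot, "area")                    # area of each polygon (in map units)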
