Plotting a large sf data frame in R

I need to work with a 10-minute (1/12 degree) global grid of all land areas. The grid is generated in R using the sf package and is limited to the land areas of the world. The grid ID is needed for further downstream analysis. The code generating the grid is below:
library(tidyverse)
library(sf)
sf_use_s2(FALSE)
birds <- st_read('BOTW_breeding_valid_union.gpkg') # an example layer, used only to set a bbox
bbox <- st_bbox(birds)
bbox[1] <- -180
bbox[2] <- -90
bbox[3] <- 180
bbox[4] <- 90
bbox <- bbox %>% st_as_sfc()
grid <- st_make_grid(bbox, cellsize = 1/12) %>%
  st_as_sf() %>%
  mutate(grid_ID = row_number())
land <- st_read('ne_10m_land.shp')
land_grids <- st_intersects(grid, land) %>%
  as.data.frame() %>%
  rename(grid_ID = row.id)
grid <- grid %>%
  left_join(land_grids, by = "grid_ID") %>%
  filter(col.id == 1) %>% # assumes land is a single (dissolved) feature
  select(grid_ID) %>%
  st_write('global_10m_grid.gpkg')
Now I need to plot it, both to inspect it and for further data mapping (the grids will have values). I use the tmap package:
grid <- st_read('global_10m_grid.gpkg')
bitmap('test_grid.png')
tm_shape(grid) + tm_fill(col = 'red')
dev.off()
However, I am struggling due to the size: on a personal machine it takes an incredibly long time to load (or so I hope, as it simply hasn't finished yet), and on a cluster with an interactive shell, dev.off() produced an empty file.
Is there a way to plot this more efficiently?

Yes, rasterising was indeed the solution. Maintaining the resolution of the vector grid in the raster results in files just under 11 MB, which open without trouble in a normal RStudio session on my desktop.
For future reference, the code looks like this:
library(stars)

g <- st_read('yourfile.gpkg')
g %>%
  left_join(df, by = 'grp') %>% # join the values to be mapped
  select(value) %>%
  st_rasterize(n = 2773927) %>% # n: total number of cells, matching the vector grid
  write_stars('filename.tif')
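If the goal is just to inspect the result, here is a minimal plotting sketch (assuming tmap 3.x, which accepts stars objects in tm_shape(); the file name is the one written above). One detail worth knowing: a tmap object only auto-prints at the top level, so if the map is drawn inside a script or function, wrap it in print(), or the device can be closed empty.
library(stars)
library(tmap)

r <- read_stars('filename.tif')
png('test_grid.png', width = 2000, height = 1000)
print(tm_shape(r) + tm_raster()) # print explicitly so the device is not left empty
dev.off()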

Two questions about R circlepack plots
1. The result is never truly circular; it always shows as an ellipse, and I have to adjust the plot window size in RStudio. Is there any way to handle it?
2. The positions of the circles and sub-circles change when I rerun the code. How can I fix them?
library(ggraph)
library(igraph)
library(tidyverse)

md <- data.frame(category = c('FDM', 'FDM', 'FDM', 'LCD', 'LCD', 'LCD'),
                 item = c('A1', 'B1', 'C1', 'A', 'B', 'C'),
                 amount = c(1, 2, 3, 4, 3, 1))
md_sum <- md %>%
  group_by(category) %>%
  summarise(amount = sum(amount)) %>%
  rename(item = category)
md_v <- rbind(md[, c(2:3)], md_sum)
pt <- igraph::graph_from_data_frame(md, vertices = md_v)
ggraph(pt, layout = 'circlepack', weight = amount) +
  geom_node_circle(aes(fill = depth)) +
  geom_node_label(aes(label = paste0(name, '\n', amount))) +
  theme_void()
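A possible approach to both issues, as a sketch rather than a confirmed answer: coord_fixed() from ggplot2 forces a 1:1 aspect ratio so the circles are drawn round regardless of the plot window, and calling set.seed() before building the plot should pin the layout between reruns, assuming the circlepack placement draws on R's random number generator.
set.seed(42) # illustrative seed; pins a randomised layout across reruns
ggraph(pt, layout = 'circlepack', weight = amount) +
  geom_node_circle(aes(fill = depth)) +
  geom_node_label(aes(label = paste0(name, '\n', amount))) +
  coord_fixed() + # 1:1 aspect ratio so circles are not squashed into ellipses
  theme_void()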

no applicable method for 'summarise' applied to an object of class "c('double', 'numeric')"

Currently, I have a .csv file with a bunch of lat/lon points that have fire radiance (FRP) values associated with each geographic point. What I would like to do is rasterize this CSV, then overlay that with a multipolygon vector layer and extract the average for each individual polygon in that layer. Here is the code I am using to accomplish this task:
library(choroplethr)
library(choroplethrMaps)
library(ggmap)
library(exactextractr) # for exact_extract
library(matrixStats)   # for colWeightedMeans
library(raster)        # for brick and raster manipulation
library(sf)            # for st_read
library(stringr)       # for str_sub
library(tidyr)
library(dplyr)
library(RColorBrewer)
library(ncdf4)         # netcdf manipulation
library(rasterVis)
library(rgdal)         # geospatial analysis
library(ggplot2)       # plotting
library(maptools)
library(rgeos)
library(maps)
library(data.table)
library(reshape)
library(doBy)
# Toy .csv file; in the real code I load a much larger file
viirs_2020 <- read.csv(text = "frp,lat,long
1.43,46.73021,-92.0754
1.27,46.72907,-92.0775
1.8,40.72152,-84.1262
1.23,41.63972,-87.1309
0.82,41.61905,-87.3285
1.61,41.64,-87.4095
1.56,41.64461,-87.141", header = TRUE)
# setting extent + projection for raster data
viirs2020_r <- raster(xmn = -125, xmx = -65, ymn = 15, ymx = 55, res = 0.75, crs = "+proj=longlat +datum=WGS84")
# converting .csv file into a raster
viirs2020_raster <- rasterize(viirs_2020[, c('long', 'lat')], viirs2020_r, viirs_2020[, 'frp'], fun=mean)
plot(viirs2020_raster)
# Load in shapefile which is a grid over a wildfire
cp_grid <- st_read('Drop Effectiveness/data/map data/cameronpeak_grid.shp')
cp_grid <- st_transform(cp_grid, 4269)
cp_grid$id <- factor(cp_grid$id)
# Function for taking the mean of values within shapes in coverage
get_frp = function(values, coverage){
  values %>% summarize(across(everything(), ~ weighted.mean(.x, coverage, na.rm = TRUE)))
}
### extract the averages
out = exact_extract(viirs2020_raster, cp_grid, fun = get_frp, stack_apply = FALSE)
I've managed to do this successfully using the exact_extract function with NetCDF files in the past on different projects. But when I attempt to perform this task with the rasterized .csv file, I get the following error on the final line of the code pasted above:
Error in UseMethod("summarise") :
  no applicable method for 'summarise' applied to an object of class "c('double', 'numeric')"
Does anyone have an idea what could be causing this error?
I was unable to figure out how to solve this by changing the custom function I wrote. However, I was able to replicate the process by doing a simple exact_extract first and then calculating the weighted mean afterward.
out <- exact_extract(viirs2020_raster, cp_grid)
# Create data frame from extraction
frp_combined <- bind_rows(out, .id = "id") %>%
  as_tibble()
# Compute a weighted mean based on values and coverage of the raster
frp_by_id <- frp_combined %>%
  #--- convert from character to numeric ---#
  mutate(id = as.numeric(id)) %>%
  #--- group summary ---#
  group_by(id) %>%
  summarise(frp_aw = sum(value * coverage_fraction) / sum(coverage_fraction))
# Merge extracted values to each polygon
cameronpeak.frp <- cp_grid %>%
  mutate(id = seq_len(nrow(.))) %>%
  left_join(frp_by_id, by = "id") %>%
  dplyr::select(id, frp_aw)
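To sanity-check the join, a quick mapping sketch (object and column names as above; the viridis palette is an arbitrary choice):
ggplot(cameronpeak.frp) +
  geom_sf(aes(fill = frp_aw)) + # polygons shaded by area-weighted FRP
  scale_fill_viridis_c(na.value = "grey90")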
I think the problem comes from the lack of {{ }} to wrap the variable name coverage.
Try the following:
# Function for taking the mean of values within shapes in coverage
get_frp = function(values, coverage){
  values %>% summarize(across(everything(), ~ weighted.mean(.x, {{coverage}}, na.rm = TRUE)))
}
More info on this: https://rlang.r-lib.org/reference/topic-metaprogramming.html
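For what it's worth, the error message itself suggests another reading (my inference, not from the thread): with a single-layer raster, exact_extract() hands values to the summary function as a plain numeric vector rather than a data frame, so dplyr's summarize() has no applicable method. A vector-based function sidesteps the issue:
# values arrives as a numeric vector for a single-layer raster, hence the
# "c('double', 'numeric')" in the error; work on it directly
get_frp <- function(values, coverage) {
  weighted.mean(values, coverage, na.rm = TRUE)
}
out <- exact_extract(viirs2020_raster, cp_grid, fun = get_frp)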

How to construct/plot convex hulls of polygons from points by factor using sf?

I've got a dataset of species occurrences which I'm trying to convert into areas of occurrence by making convex hulls. I'm able to do this manually (i.e. one species at a time), but I'd really love to have it handled automatically by species name.
A pared-down example dataset can be found here: https://pastebin.com/dWxEvyUB
Here's how I'm currently doing it manually:
library(tidyverse)
library(sf)
library(rgeos)
library(maps)
library(mapview)
library(mapdata)
library(ggplot2)
fd <- read_csv("occurrence.csv")
spA.dist <- fd %>%
  filter(species == "sp.A") %>%
  dplyr::select(lon, lat) %>%
  as.matrix() %>%
  coords2Polygons(ID = "distribution") %>%
  gConvexHull() %>%
  gBuffer()
spB.dist <- fd %>%
  filter(species == "sp.B") %>%
  dplyr::select(lon, lat) %>%
  as.matrix() %>%
  coords2Polygons(ID = "distribution") %>%
  gConvexHull() %>%
  gBuffer()
wrld2 <- st_as_sf(map('world2', plot = FALSE, fill = TRUE))
ggplot() +
  geom_sf(data = wrld2, fill = 'gray20', color = "lightgrey", size = 0.07) +
  geom_polygon(aes(x = long, y = lat, group = group), color = "red", data = spA.dist, fill = NA) +
  geom_polygon(aes(x = long, y = lat, group = group), color = "blue", data = spB.dist, fill = NA) +
  coord_sf(xlim = c(100, 300), ylim = c(-60, 60))
That displays a map with the two species' occurrence areas based on the convex hulls of their observations. I realize I'm mixing different spatial libraries here, so it would be nice to do it all in sf if possible. In my real data I have more than two species, and while I can copy and paste the code for each one, it seems like it should be possible to construct the polygons (and subsequent convex hulls) by factor level automatically. Something more like this:
polys <- st_as_sf(fd) %>%
  group_by(species) %>%
  magically_make_polygons(lon, lat) %>%
  st_convex_hull() %>%
  st_buffer()
I've been searching for days as well as digging through reams of documentation. A lot of this spatial stuff is non-intuitive to me so I expect there's a lot of basic understanding I'm missing. Can this be done?
Here is a possible solution using the tidyverse (in fact only dplyr) and the sf package (plus the mapview package for some quick viewing).
You were very close with your own attempt (kudos). The trick is to summarise the grouped data, and then create the hulls.
library(tidyverse)
library(sf)

# create a simple feature from the occurrence data (fd in the question)
df.sf <- df %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326)
# what are we working with?
# perform a fast visual check using the mapview package
mapview::mapview(df.sf)
# group and summarise by species, and draw the hulls
hulls <- df.sf %>%
  group_by(species) %>%
  summarise(geometry = st_combine(geometry)) %>%
  st_convex_hull()
# result
mapview::mapview(list(df.sf, hulls))
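If you also want the buffering step from the original gBuffer() workflow, a sketch: with s2 enabled (the default in sf >= 1.0), st_buffer() on geographic coordinates takes a distance in metres; the 10 km here is an illustrative value, not from the question.
hulls_buffered <- hulls %>%
  st_buffer(10000) # buffer distance in metres; adjust to taste
mapview::mapview(hulls_buffered)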

How to get clean polygons in R after st_combine/st_union with the sf package?

I have a tidy dataset of census sectors in sf format (setores_sp_ok.rda), which has polygons for two different territorial models, indicated by the variable modelo. I want to aggregate census sectors by modelo and cnes to create another dataset with new boundaries.
I can do this using the group_by() + summarise() technique, which automatically uses st_union() to aggregate polygons. But the result is poor, with many internal boundaries.
# load packages
library(dplyr)
library(ggplot2)
library(sf)
library(lwgeom)
# import data
load(url("https://github.com/bruno-pinheiro/app_acesso_saude/raw/master/data/setores_sp_ok.rda"))
# combine polygons
ubs_malhas <- setores_sp %>%
  st_make_valid() %>%
  group_by(cnes, modelo) %>%
  summarise(area = sum(area)) %>%
  ungroup()
# plot
ggplot(ubs_malhas[ubs_malhas$modelo == "vigente", ]) +
  geom_sf(lwd = .2)
I know it is possible to do this kind of operation by combining st_combine, st_union and st_intersection, but I can't figure out how to do it.
How do I combine polygons by modelo and cnes and get clean aggregated polygons, without internal boundaries?
Does anyone have a tip?
Many thanks!
Your data may be tidy in the tidyverse sense, but the geometries certainly aren't. The borders between the "vigente" modelo units don't quite line up in many cases, hence you get these little "leftovers" caused by gaps between units. I would snap those to a grid of, say, 1 cm, and then call st_union.
# load packages
library(dplyr)
library(ggplot2)
library(sf)
library(lwgeom)
# import data
load(url("https://github.com/bruno-pinheiro/app_acesso_saude/raw/master/data/setores_sp_ok.rda"))
# combine polygons
ubs_malhas <- setores_sp %>%
  st_snap_to_grid(size = 0.01) %>% # snap vertices to a 1 cm grid (assumes a metric CRS)
  st_make_valid() %>%
  group_by(cnes, modelo) %>%
  summarise(area = sum(area)) %>%
  ungroup()
# plot
ggplot(ubs_malhas[ubs_malhas$modelo == "vigente", ]) +
  geom_sf(lwd = .2)
In case you still have unwanted polygons left, you may have to increase the grid size or delete them manually, e.g. in QGIS or the like.
The function nngeo::st_remove_holes(your_sf_object) solves this problem.
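For completeness, a sketch of how that fits the pipeline above (st_remove_holes() is from the nngeo package; placing it after the union is my assumption):
library(nngeo)

ubs_malhas <- setores_sp %>%
  st_snap_to_grid(size = 0.01) %>%
  st_make_valid() %>%
  group_by(cnes, modelo) %>%
  summarise(area = sum(area)) %>%
  ungroup() %>%
  st_remove_holes() # drop interior rings left over after the union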

Leaflet package in R not plotting all coordinates

I'm trying to plot using leaflet with a somewhat sizable set of coordinates (~34K latitude/longitude pairs); however, using the code below, it seems that leaflet is only plotting a small portion of them:
data <- read.csv("Food_inspections.csv", header = TRUE)
names(data) <- tolower(names(data))
data1 <- filter(data, risk == c("Risk 1 (High)","Risk 2 (Medium)","Risk 3 (Low)"))
data1$risk <- droplevels(data1$risk)
leaflet(data1) %>%
addTiles() %>%
addMarkers(lat = ~latitude, lng = ~longitude)
What I get back is a map like this:
This clearly does not contain all ~34K coordinates. Even if I use addCircles I get the same thing. Other mapping packages (RgoogleMaps, for instance) seem to plot everything correctly. Does leaflet round the coordinates it takes as input before plotting? I could see that making several of the coordinates appear to overlap in the plot.
The points are there; you have to zoom in to see them. But, at least in my browser, anything more than 80 points or so takes a very long time to zoom.
url <- "http://data.cityofchicago.org/api/views/4ijn-s7e5/rows.csv?accessType=DOWNLOAD"
data <- read.csv(url, header = TRUE) # takes a minute...
names(data) <- tolower(names(data))
data1 <- subset(data, risk %in% c("Risk 1 (High)","Risk 2 (Medium)","Risk 3 (Low)"))
data1$risk <- droplevels(data1$risk)
data1 <- data1[1:50,]
library(leaflet)
leaflet(data1) %>%
addTiles() %>%
addMarkers(lat = ~latitude, lng = ~longitude)
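If all ~34K points need to stay on the map, a sketch using leaflet's built-in marker clustering, which keeps the browser responsive by collapsing nearby markers at low zoom levels (clusterOptions is part of the leaflet R API; data preparation as above):
leaflet(data1) %>%
  addTiles() %>%
  addMarkers(lat = ~latitude, lng = ~longitude,
             clusterOptions = markerClusterOptions())
Alternatively, addCircleMarkers() tends to render more cheaply than the default icon markers.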
