Converting SpatVector objects to data frames for use in ggplot2 - r

I would like to convert SpatVector objects to data frames for use in ggplot2.
pkgs <- c("geodata", "raster", "ggplot2", "tidy")
lapply(pkgs, require, character.only = TRUE)
boundary_GB <- geodata::gadm(country = "GB", path = tempdir(), resolution = 2, level = 1)
My current approach takes a long time:
boundary_GB_df <- broom::tidy(methods::as(boundary_GB, "Spatial"))
The plot:
ggplot(data = boundary_GB_df, mapping = aes(x = long, y = lat, group = group)) +
geom_polygon(fill = NA, colour = "black")
I am not experienced with SpatVector objects, is there a faster approach?
I am aware of tidyterra package (i.e., tidyterra::geom_spatvector()).
Thanks

sf objects are also data.frame and you can use a specific geom provided by ggplot2 (geom_sf()). Conversion between spatial vectors classes in R is as simple as:
# From SpatVector to sf
sf::st_as_sf(x_spatvector)
# From sf to SpatVector
terra::vect(x_sf)
# To sp, although for most uses is recommended to stick to sf
as(x_sf, "Spatial")
So if ypu only need to plot the spatial object, why not use ggplot2::geom_sf()/tidyterra::geom_spatvector()? Convert the object to data frame for plotting seems to be just going back and forth, unless you have a good reason for doing that.
See reprex:
library(geodata)
#> Loading required package: terra
#> terra 1.6.17
library(ggplot2)
boundary_GB <- geodata::gadm(country = "GB", path = tempdir(), resolution = 2, level = 1)
class(boundary_GB)
#> [1] "SpatVector"
#> attr(,"package")
#> [1] "terra"
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1; sf_use_s2() is TRUE
boundary_GB_sf <- st_as_sf(boundary_GB)
class(boundary_GB_sf)
#> [1] "sf" "data.frame"
# Is already a data.frame
# sf with geom_sf
ggplot(boundary_GB_sf) +
geom_sf(fill = NA, colour = "black")
# Spatvector with tidyterra
library(tidyterra)
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:terra':
#>
#> intersect, union
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#> Loading required package: tibble
#> Loading required package: tidyr
#>
#> Attaching package: 'tidyr'
#> The following object is masked from 'package:terra':
#>
#> extract
ggplot() +
geom_spatvector(data = boundary_GB, fill = NA, colour = "black")
Created on 2022-10-05 with reprex v2.0.2

Related

How I can Speed up code built using purrr- Why is Furrr slower than purrr

I'd like to know if there is a way to speed up a code built with purrr package. I tried to convert it into furr and use the multisession option, but it is even slower than the sequential version.
# rm(list = ls())
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
#> Error: RStudio not running
getwd()
#> [1] "C:/Users/Angela/AppData/Local/Temp/RtmpOqCRC2/reprex-44604912759f-full-husky"
#load required packages
library(mc2d)
#> Loading required package: mvtnorm
#>
#> Attaching package: 'mc2d'
#> The following objects are masked from 'package:base':
#>
#> pmax, pmin
library(gplots)
#>
#> Attaching package: 'gplots'
#> The following object is masked from 'package:stats':
#>
#> lowess
library(RColorBrewer)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyverse)
library(furrr)
#> Loading required package: future
library(future) #for parallel computation
#remotes::install_github("HenrikBengtsson/parallelly", ref="develop") #to use multisession
library(parallelly)
library(tictoc)
set.seed(99)
iters<-1000
df<-data.frame(id=c(1:30),cat=c(rep("a",12),rep("b",18)),month=c(1:6,1,6,4,1,5,2,3,2,5,4,6,3:6,4:6,1:5,5),n=rpois(30,5))
df$n[df$n == "0"] <- 3
se<-rbeta(iters,96,6)
epi.a<-rpert(iters,min=1.5, mode=2, max=3)
p=0.2
p2=epi.a*p
df<-as_tibble(df)
# this defined function ensures any `n` from `df` will be itered with 10000 s and a and generated 10000 results
plan(multisession)
tic()
iter_n <- function(n) future_map2_dbl(.x = se, .y = p2, ~ 1 - (1 - .x * .y) ^ n)
list_1 <- df %>% mutate(Result = future_map(n, ~iter_n(.x))) %>% unnest(Result)%>% group_split(month)
toc()
#> 2.22 sec elapsed
plan(sequential)
#the same without parallelization
tic()
iter_n <- function(n) map2_dbl(.x = se, .y = p2, ~ 1 - (1 - .x * .y) ^ n)
list_1 <- df %>% mutate(Result = map(n, ~iter_n(.x))) %>% unnest(Result)%>% group_split(month)
toc()
#> 0.08 sec elapsed
Created on 2022-05-08 by the reprex package (v2.0.1)
I have read about an issue of using furrr with R studio (https://github.com/DavisVaughan/furrr/issues/195), so I followed the advice and downloaded the parallely package, but it doesn't change anything. Furrr is still slower than purrr (which is actually odd)
If someone is wondering what system I am using I am working with a Windows system.
Do you have any suggestion on how to speed up a code using purr or fixing the problem of furrr? This is just an example, but I have a huge database to deal with. Any suggestion is appreciated.

Defining a column as y-axis with tidygraph and ggraph? foodweb and network plots

community!
I have a question regarding tidygraph:: and ggraph:: packages.
I am trying to create a food web network plot (code and graph below), however, I wanted to use the kk layout with a defined y-axis, in my case the TL column from nodes_df.
I have created mock data to show as an example. If anybody has an idea on how to improve this plot, that would be of great help.
library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 4.1.2
#> Warning: package 'tibble' was built under R version 4.1.2
#> Warning: package 'readr' was built under R version 4.1.2
#> Warning: package 'purrr' was built under R version 4.1.2
#> Warning: package 'dplyr' was built under R version 4.1.2
#> Warning: package 'stringr' was built under R version 4.1.2
library(tidygraph)
#> Warning: package 'tidygraph' was built under R version 4.1.2
#>
#> Attaching package: 'tidygraph'
#> The following object is masked from 'package:stats':
#>
#> filter
library(ggraph)
#> Warning: package 'ggraph' was built under R version 4.1.2
library(igraph)
#> Warning: package 'igraph' was built under R version 4.1.2
#>
#> Attaching package: 'igraph'
#> The following object is masked from 'package:tidygraph':
#>
#> groups
#> The following objects are masked from 'package:dplyr':
#>
#> as_data_frame, groups, union
#> The following objects are masked from 'package:purrr':
#>
#> compose, simplify
#> The following object is masked from 'package:tidyr':
#>
#> crossing
#> The following object is masked from 'package:tibble':
#>
#> as_data_frame
#> The following objects are masked from 'package:stats':
#>
#> decompose, spectrum
#> The following object is masked from 'package:base':
#>
#> union
library(ggpubr)
#> Warning: package 'ggpubr' was built under R version 4.1.2
set.seed(16)
label <- c("Transient_orca",
"Salmon_sharks",
"Resident_orca",
"Sleeper_sharks",
"Halibut",
"Pinnipeds",
"Porpoise",
"Lingcod",
"Arrowtooth_L",
"Salmon_L",
"Pacific_cod",
"Sablefish",
"Arrowtooth_S",
"Spiny_dogfish",
"Avian_raptors",
"Octopods",
"Seabirds",
"Deep_demersals",
"Pollock_L",
"Rockfish",
"Baleen_whales",
"Salmon_fry_S",
"Nshore_demersal",
"Squids",
"Eulachon",
"Sea_otters",
"Deep_epibenthos",
"Capelin",
"Herring_L",
"Pollock_S",
"Invert_eat_seaduck",
"Oystercatchers",
"Sandlance",
"Sunflower_stars",
"Pisaster_Evasterias",
"Leather_stars",
"Sea_cucumbers",
"Urchins",
"Helmet_crab",
"Herring_S",
"Jellies",
"Deep_infauna_S",
"Zoopl_near_onmiv",
"Zoop_omniv",
"Shallow_infauna_S",
"Meiofauna",
"Deep_infauna_L",
"Snail_crust_S",
"Mussels",
"Barnacles",
"Shallow_infauna_clams",
"Zoopl_near_herb",
"Zoopl_herb",
"Phyto_near",
"Phyto_off",
"Fucus",
"Subtidal_kelps",
"Macroalgae_other",
"Eelgrass",
"Nekton_falls",
"Inshore_detritus",
"Offshore_detritus",
"Salmon_seine","Salmon_gillnets",
"Herring_seine","Shrimp_pot_trawl","Pollock_trawl" ,"Demersal_longline" ,
"Other_commercial","Rec_anglers","Rec_charters", "Subsistence_take")
TL <- rnorm(72, 3.155, 1)
B <- abs(rnorm(72, 0.7789, 10))
nodes_df <-data.frame(label,TL, B)
from <- rep(c("Lingcod",
"Arrowtooth_L",
"Salmon_L",
"Pacific_cod",
"Sablefish",
"Arrowtooth_S",
"Spiny_dogfish",
"Avian_raptors",
"Octopods",
"Seabirds",
"Deep_demersals",
"Pollock_L",
"Rockfish",
"Baleen_whales",
"Salmon_fry_S",
"Nshore_demersal",
"Squids",
"Eulachon",
"Sea_otters",
"Deep_epibenthos",
"Capelin",
"Herring_L",
"Pollock_S",
"Invert_eat_seaduck",
"Oystercatchers",
"Sandlance",
"Sunflower_stars",
"Pisaster_Evasterias",
"Leather_stars",
"Sea_cucumbers",
"Urchins",
"Helmet_crab",
"Herring_S",
"Jellies",
"Deep_infauna_S",
"Zoopl_near_onmiv",
"Zoop_omniv",
"Shallow_infauna_S",
"Meiofauna",
"Deep_infauna_L",
"Snail_crust_S",
"Mussels",
"Barnacles",
"Shallow_infauna_clams",
"Zoopl_near_herb",
"Zoopl_herb",
"Phyto_near",
"Phyto_off",
"Fucus",
"Subtidal_kelps",
"Macroalgae_other",
"Eelgrass",
"Nekton_falls",
"Inshore_detritus",
"Offshore_detritus"), times= 3)
to <- sample(label, size = 165, replace = TRUE)
width <- abs(rnorm(165, 0.258370, 10))
edge_df <- as_tibble(data.frame(from, to,width))
netgraph_df <- graph_from_data_frame(d=edge_df, vertices = nodes_df, directed=T)
graph <- as_tbl_graph(netgraph_df)
# plot using ggraph
ggraph(graph, layout = 'kk') +
geom_edge_link(aes(edge_width = width, color=stat(index)), lineend="round")+
scale_edge_colour_gradient(high = "#e96d50", low = "#4b84a6") +
scale_edge_width(range = c(0.5, 5)) +
geom_edge_loop(aes(edge_width = width, color=stat(index)), lineend="round")+
geom_node_point(aes(col= nodes_df$TL, size = nodes_df$B)) +
geom_node_text(aes(label = label), size=3,color="black", repel = TRUE) +
theme_transparent()+
theme(legend.position = "none")
Created on 2022-02-03 by the reprex package (v2.0.1)
You can precalculate the layout and set the y column to some metric.
# Same prep as in question
graph <- as_tbl_graph(netgraph_df)
lay <- create_layout(graph, "kk")
lay$y <- lay$TL
# plot using ggraph
ggraph(graph = lay) +
geom_edge_link(aes(edge_width = width, color=stat(index)), lineend="round")+
scale_edge_colour_gradient(high = "#e96d50", low = "#4b84a6") +
scale_edge_width(range = c(0.5, 5)) +
geom_edge_loop(aes(edge_width = width, color=stat(index)), lineend="round")+
geom_node_point(aes(col= nodes_df$TL, size = nodes_df$B)) +
geom_node_text(aes(label = label), size=3,color="black", repel = TRUE) +
theme_transparent()+
theme(legend.position = "none")

download.file with wildcard matching in R

I'm trying to download all the files that match a pattern from a url directory in R using download.file, but I can't get it working for even a single file. The url is:
https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2019/
and the pattern match is all files like: AIS_2019_*_18.zip
Here is what I've tried for a single file case:
download.file('https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2019/AIS_2019_04_18.zip',
destfile = "AIS_2019_04_18.zip",
method = "wget", extra = c("-r", "-np", "-L", "--max-redirect=0"))
but I always get 'wget' call had nonzero exit status
I've also tried setting method = internal and mode = w, but get ```scheme not supported in url'
Here's a way to generate all the links that you can then loop through them with a for loop.
library(glue)
library(stringr)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
# Setup
month_dates <- glue("2019-{str_pad(1:12, width = 2, pad = '0')}-01")
days_in_months <- days_in_month(as.Date(month_dates))
# Get appropriate number of days and months combinations
months <- rep(1:12, days_in_months)
days <- unlist(mapply(function(x) str_pad(1:x, width = 2, pad = "0"),
days_in_months))
base_url <- "https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2019"
# Put everything together
all_files <- glue("{base_url}/AIS_2019_{months}_{days}.zip")
# See results
head(all_files)
#> https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2019/AIS_2019_1_01.zip
#> https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2019/AIS_2019_1_02.zip
#> https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2019/AIS_2019_1_03.zip
#> https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2019/AIS_2019_1_04.zip
#> https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2019/AIS_2019_1_05.zip
#> https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2019/AIS_2019_1_06.zip
# Check number of the days in a year is correct
length(all_files)
#> [1] 365
Created on 2021-08-04 by the reprex package (v2.0.0)
Once you have those created, you can do something like:
# Untested
for (file in all_files) {
download.file(file,
destfile = basename(file),
extra = c("-r", "-np", "-L", "--max-redirect=0"))
}

R: Unusual error plotting multipolygons with ggplot, geom_sf, and openstreetmap

I am trying to plot Lake Geneva/Lac Leman using {ggplot2}, {osmdata} and {sf}. I am getting an error whenever I try to plot the lake multipolygons (of which there are 5). I can't find any mention of this error elsewhere.
library(osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright
library(tidyverse)
lake_gva <-
getbb("Geneva") %>%
opq()%>%
add_osm_feature(key = "natural", value = "water") %>%
osmdata_sf()
lake_gva
#> Object of class 'osmdata' with:
#> $bbox : 46.1777724,6.1102411,46.231885,6.1758527
#> $overpass_call : The call submitted to the overpass API
#> $meta : metadata including timestamp and version numbers
#> $osm_points : 'sf' Simple Features Collection with 22394 points
#> $osm_lines : 'sf' Simple Features Collection with 320 linestrings
#> $osm_polygons : 'sf' Simple Features Collection with 120 polygons
#> $osm_multilines : NULL
#> $osm_multipolygons : 'sf' Simple Features Collection with 5 multipolygons
ggplot() +
geom_sf(
data = lake_gva$osm_multipolygons
)
#> Error in do.call(rbind, x): variable names are limited to 10000 bytes
Here is my session info:
sessionInfo()
#> R version 3.6.0 (2019-04-26)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS 10.15.1
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] compiler_3.6.0 magrittr_1.5 tools_3.6.0 htmltools_0.4.0
#> [5] yaml_2.2.0 Rcpp_1.0.3 stringi_1.4.3 rmarkdown_1.18
#> [9] highr_0.8 knitr_1.26 stringr_1.4.0 xfun_0.11
#> [13] digest_0.6.23 rlang_0.4.2 evaluate_0.14
Created on 2019-12-02 by the reprex package (v0.3.0)
There is an issue in osmdata that will prevent this in future if fixed: https://github.com/ropensci/osmdata/issues/188 Essentially it's because the polygons that are part of the multipolygon object have names, and those names can be unacceptably long for ggplot2.
library(osmdata)
lake_gva <-
getbb("Geneva") %>%
opq()%>%
add_osm_feature(key = "natural", value = "water") %>%
osmdata_sf()
names(lake_gva$osm_multipolygons)
name_polys = lapply(lake_gva$osm_multipolygons$geometry[[1]], names)
sapply(name_polys, nchar)
lake_poly = sf::st_cast(lake_gva$osm_multipolygons$geometry, "POLYGON")
lake_gva$osm_multipolygons$geometry = lake_poly
names(lake_gva$osm_multipolygons$geometry[[2]][[1]]) = NULL]
lake_gva$osm_multipolygons$geometry = unname(lake_gva$osm_multipolygons$geometry)
library(ggplot2)
ggplot() +
geom_sf(
data = lake_gva$osm_multipolygons
)
names(lake_gva$osm_multipolygons$geometry[[1]][[1]][[1]]) = NULL
names(lake_gva$osm_multipolygons$geometry[[1]][[1]][[2]]) = "lake"
That helps show what's going on but doesn't quite solve your problem. See here for a solution on a related issue: https://github.com/rstudio/leaflet/issues/631
This has been remedied in most recent update to the {osmdata} package. You can install it with:
devtools::install_github("ropensci/osmdata")
As #RobinLovelace mentioned, the problem comes from openstreetmap geometries including named features within their geometry specifications, something that {sf} and {tidyverse} cannot handle. The latest version includes a helper function to convert an osmdata-sf into a true {sf} object. This is osmdata::unname_osmdata_sf(). Here is your code, with that function added to allow for use with ggplot:
library(osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
library(tidyverse)
#> Warning: package 'tibble' was built under R version 3.6.2
#> Warning: package 'purrr' was built under R version 3.6.2
lake_gva <-
getbb("Geneva") %>%
opq()%>%
add_osm_feature(key = "natural", value = "water") %>%
osmdata_sf()
lake_gva_sf <- lake_gva %>%
unname_osmdata_sf()
#> Loading required namespace: sf
lake_gva_sf$osm_multipolygons %>%
ggplot() + geom_sf()
Created on 2020-05-26 by the reprex package (v0.3.0)
In the end I did a workaround by plotting using the sp method rather than the sf method, but hope that a solution is eventually implemented in {osmdata}/{sf} to allow the geom_sf approach:
library(osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright
library(tidyverse)
lake_gva <-
getbb("Geneva") %>%
opq()%>%
add_osm_feature(key = "natural", value = "water") %>%
osmdata_sp()
lake_gva$osm_multipolygons#data$id <- rownames(lake_gva$osm_multipolygons#data)
df_lake_gva <-
fortify(lake_gva$osm_multipolygons, region = "id") %>%
merge(lake_gva$osm_multipolygons#data, by = "id")
#> Warning in RGEOSUnaryPredFunc(spgeom, byid, "rgeos_isvalid"): Self-intersection
#> at or near point 6.2434241000000004 46.174487800000001
#> SpP is invalid
#> Warning in rgeos::gUnaryUnion(spgeom = SpP, id = IDs): Invalid objects found;
#> consider using set_RGEOS_CheckValidity(2L)
ggplot() +
geom_polygon(
data = df_lake_gva,
aes(x = long, y = lat, group = group)
)
Created on 2019-12-03 by the reprex package (v0.3.0)

How to make a dashed line and legend using ggplot

I want to know how to add a legend to my graph and also if its possible to make a line half solid half dashed.
I need the red line to become dashed at 28 and the green one at 20 I've been told to use geo_segment but I can't find a way to see the commands I need to input.
If anyone can help and suggest what codes should I use it would be great.
man<-dataset
ggplot(man,aes(Plot))+
geom_line(aes(y=N),color="forestgreen",lwd=0.5)+
geom_ribbon(aes(ymin=NLB,ymax=NUB),alpha=0.2,fill="green")+
geom_line(aes(y=M),color="navy",lwd=0.5)+
geom_ribbon(aes(ymin=MLB,ymax=MUB),alpha=0.2,fill="blue")+
geom_line(aes(y=S),color="brown1",lwd=0.5)+
geom_ribbon(aes(ymin=SLB,ymax=SUB),alpha=0.2,fill="red")+
xlab("Number of Samples")+
ylab("Number of Diametric-Species")
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(ggplot2)
df <- data.frame(
x = seq(1:200),
y = log(seq(1:200))
)
df <- df %>% mutate(should_dash = x >= 50)
ggplot(df, aes(x,y)) + geom_line(aes(linetype = should_dash))
Created on 2018-06-15 by the reprex package (v0.2.0).

Resources