Error when using curly-curly (`{{ }}`) operator with `if` clause

I'm struggling to understand how to use the {{ }} operator to pass bare variable names in custom functions. I get an error when I use the operator in conjunction with an if clause.
This function works:
f <- function(.data, .vars = NULL) {
  require(dplyr)
  df <- select(.data, {{ .vars }})
  print(head(df))
}
f(iris, c(Species, Sepal.Length))
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#> Species Sepal.Length
#> 1 setosa 5.1
#> 2 setosa 4.9
#> 3 setosa 4.7
#> 4 setosa 4.6
#> 5 setosa 5.0
#> 6 setosa 5.4
Created on 2021-12-20 by the reprex package (v2.0.1)
If I try to add an if clause, it throws an error:
f <- function(.data, .vars = NULL) {
  require(dplyr)
  if (!is.null(.vars)) df <- select(.data, {{ .vars }})
  else df <- .data
  print(head(df))
}
f(iris, c(Species, Sepal.Length))
#> Error in f(iris, c(Species, Sepal.Length)): object 'Species' not found
Created on 2021-12-20 by the reprex package (v2.0.1)
What am I missing?

I think the easiest explanation is that calling is.null(.vars) forces R to evaluate .vars in the calling environment, where the value (in your example: c(Species, Sepal.Length)) is treated as an ordinary expression. Since you don't have any variable called Species in that environment, it throws an error.
You can fix it like this:
library(dplyr)
f <- function(.data, .vars = NULL) {
  vars <- enquo(.vars)
  if (!rlang::quo_is_null(vars)) df <- select(.data, !!vars)
  else df <- .data
  print(head(df))
}
f(iris)
f(iris, c(Species, Sepal.Length))
Note that {{x}} is actually a shorthand for !!enquo(x).
Elaboration (update)
When you don't use if, the only place .vars is being used is inside dplyr::select(.data, {{.vars}}). In this context, the variable names in .vars are interpreted as being variables in the dataframe .data.
When you add the if statement, .vars is evaluated as referring to variables in your calling environment. Since they don't exist there, you get an error.
This is called data-masking.
Here is a nice article about it.
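The same contrast shows up in base R: subset() data-masks its condition, while the identical expression at top level does not. A minimal sketch:

```r
# subset() evaluates its condition inside the data frame (data-masking),
# so the bare name Species resolves to a column of iris.
nrow(subset(iris, Species == "setosa"))
#> [1] 50

# Outside a data-masking context the same symbol is looked up in the
# environment, where no Species exists, so it errors.
tryCatch(Species == "setosa", error = function(e) conditionMessage(e))
#> [1] "object 'Species' not found"
```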

jpiversen's answer and explanation are correct, but here's a simpler fix for your function. Instead of testing for the default value of NULL, just check whether .vars is missing:
library(dplyr)
f <- function(.data, .vars) {
  if (!missing(.vars)) df <- select(.data, {{ .vars }})
  else df <- .data
  print(head(df))
}
f(iris, c(Species, Sepal.Length))
By the way, I also removed require(dplyr) from your function. It's generally a bad idea to call it inside a function because of the side effect of changing the search list. Use requireNamespace("dplyr") and prefix functions with dplyr:: if you're not sure the package will be available.
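A sketch of that pattern (assuming only that dplyr may or may not be installed when the function runs):

```r
f <- function(.data, .vars) {
  if (!missing(.vars)) {
    # Fail fast with a clear message instead of altering the search path.
    if (!requireNamespace("dplyr", quietly = TRUE)) {
      stop("Package 'dplyr' is required to select columns")
    }
    df <- dplyr::select(.data, {{ .vars }})
  } else {
    df <- .data
  }
  print(head(df))
}

f(iris)                             # works without attaching dplyr
# f(iris, c(Species, Sepal.Length)) # needs dplyr installed, not attached
```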

Related

Converting SpatVector objects to data frames for use in ggplot2

I would like to convert SpatVector objects to data frames for use in ggplot2.
pkgs <- c("geodata", "raster", "ggplot2", "broom")
lapply(pkgs, require, character.only = TRUE)
boundary_GB <- geodata::gadm(country = "GB", path = tempdir(), resolution = 2, level = 1)
My current approach takes a long time:
boundary_GB_df <- broom::tidy(methods::as(boundary_GB, "Spatial"))
The plot:
ggplot(data = boundary_GB_df, mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = NA, colour = "black")
I am not experienced with SpatVector objects; is there a faster approach?
I am aware of the tidyterra package (i.e., tidyterra::geom_spatvector()).
Thanks
sf objects are also data.frames, and ggplot2 provides a dedicated geom for them (geom_sf()). Conversion between spatial vector classes in R is as simple as:
# From SpatVector to sf
sf::st_as_sf(x_spatvector)
# From sf to SpatVector
terra::vect(x_sf)
# To sp, although for most uses it is recommended to stick to sf
as(x_sf, "Spatial")
So if you only need to plot the spatial object, why not use ggplot2::geom_sf()/tidyterra::geom_spatvector()? Converting the object to a data frame just for plotting seems like going back and forth, unless you have a good reason for doing so.
See reprex:
library(geodata)
#> Loading required package: terra
#> terra 1.6.17
library(ggplot2)
boundary_GB <- geodata::gadm(country = "GB", path = tempdir(), resolution = 2, level = 1)
class(boundary_GB)
#> [1] "SpatVector"
#> attr(,"package")
#> [1] "terra"
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1; sf_use_s2() is TRUE
boundary_GB_sf <- st_as_sf(boundary_GB)
class(boundary_GB_sf)
#> [1] "sf" "data.frame"
# Is already a data.frame
# sf with geom_sf
ggplot(boundary_GB_sf) +
  geom_sf(fill = NA, colour = "black")
# Spatvector with tidyterra
library(tidyterra)
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:terra':
#>
#> intersect, union
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#> Loading required package: tibble
#> Loading required package: tidyr
#>
#> Attaching package: 'tidyr'
#> The following object is masked from 'package:terra':
#>
#> extract
ggplot() +
  geom_spatvector(data = boundary_GB, fill = NA, colour = "black")
Created on 2022-10-05 with reprex v2.0.2
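If a plain data frame of vertex coordinates is genuinely needed (e.g. for a custom geom), sf::st_coordinates() is typically much faster than going through broom::tidy() on a Spatial object. A minimal sketch on a toy polygon standing in for the GADM boundaries:

```r
library(sf)

# Toy polygon standing in for the GADM boundaries.
poly <- st_sfc(st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 0)))))

# st_coordinates() returns one row per vertex; the L1/L2 columns identify
# the ring and feature each vertex belongs to (analogous to `group`).
coords <- as.data.frame(st_coordinates(poly))
head(coords)
```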

How can I speed up code built using purrr? Why is furrr slower than purrr?

I'd like to know if there is a way to speed up code built with the purrr package. I tried to convert it to furrr and use the multisession option, but it is even slower than the sequential version.
# rm(list = ls())
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
#> Error: RStudio not running
getwd()
#> [1] "C:/Users/Angela/AppData/Local/Temp/RtmpOqCRC2/reprex-44604912759f-full-husky"
#load required packages
library(mc2d)
#> Loading required package: mvtnorm
#>
#> Attaching package: 'mc2d'
#> The following objects are masked from 'package:base':
#>
#> pmax, pmin
library(gplots)
#>
#> Attaching package: 'gplots'
#> The following object is masked from 'package:stats':
#>
#> lowess
library(RColorBrewer)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyverse)
library(furrr)
#> Loading required package: future
library(future) #for parallel computation
#remotes::install_github("HenrikBengtsson/parallelly", ref="develop") #to use multisession
library(parallelly)
library(tictoc)
set.seed(99)
iters <- 1000
df <- data.frame(
  id = 1:30,
  cat = c(rep("a", 12), rep("b", 18)),
  month = c(1:6, 1, 6, 4, 1, 5, 2, 3, 2, 5, 4, 6, 3:6, 4:6, 1:5, 5),
  n = rpois(30, 5)
)
df$n[df$n == 0] <- 3
se <- rbeta(iters, 96, 6)
epi.a <- rpert(iters, min = 1.5, mode = 2, max = 3)
p <- 0.2
p2 <- epi.a * p
df <- as_tibble(df)
# this function iterates each `n` from `df` over the `iters` draws of `se` and `p2`, generating `iters` results per row
plan(multisession)
tic()
iter_n <- function(n) future_map2_dbl(.x = se, .y = p2, ~ 1 - (1 - .x * .y)^n)
list_1 <- df %>%
  mutate(Result = future_map(n, ~ iter_n(.x))) %>%
  unnest(Result) %>%
  group_split(month)
toc()
#> 2.22 sec elapsed
plan(sequential)
#the same without parallelization
tic()
iter_n <- function(n) map2_dbl(.x = se, .y = p2, ~ 1 - (1 - .x * .y)^n)
list_1 <- df %>%
  mutate(Result = map(n, ~ iter_n(.x))) %>%
  unnest(Result) %>%
  group_split(month)
toc()
#> 0.08 sec elapsed
Created on 2022-05-08 by the reprex package (v2.0.1)
I have read about an issue when using furrr with RStudio (https://github.com/DavisVaughan/furrr/issues/195), so I followed the advice and installed the parallelly package, but it doesn't change anything. furrr is still slower than purrr (which is actually odd).
If someone is wondering what system I am using I am working with a Windows system.
Do you have any suggestions on how to speed up code using purrr, or how to fix the furrr problem? This is just an example, but I have a huge database to deal with. Any suggestion is appreciated.
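One thing worth noting about the example itself: the inner map2_dbl() body is element-wise arithmetic, which R already vectorizes, so the whole inner loop can be replaced by a single expression. For work this small, that usually dwarfs any gain from parallelization, whose per-call serialization and scheduling overhead is also why multisession comes out slower than sequential here. A sketch with toy inputs standing in for the question's rbeta()/rpert() draws:

```r
# The body 1 - (1 - .x * .y)^n is element-wise, so no map is needed at all.
iter_n_vec <- function(n, se, p2) 1 - (1 - se * p2)^n

# Toy inputs standing in for the question's rbeta()/rpert() draws.
se <- c(0.90, 0.95)
p2 <- c(0.40, 0.50)
iter_n_vec(2, se, p2)  # same values map2_dbl() would produce, in one call
```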

When performing multisession work, future_lapply says that a package doesn't exist, but it works fine when running plan(sequential)

When I try to use future_lapply with plan(multisession), it says that the package I'm trying to use doesn't exist. When I use plan(sequential) it works fine. I also get the same error when using plan(callr).
Here's the error:
Error in loadNamespace(name): there is no package called 'fuzzyjoin'
Can anyone help me figure out a solution or what's going wrong here?
I'm not sure if this is related to the future.apply package or future or globals packages as I know that they are also involved here.
Here's my code showing the issue:
library(fuzzyjoin)
library(future.apply)
#> Loading required package: future
library(dplyr)
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
iris_mod <- iris %>%
  mutate(examplefield = Sepal.Width + Petal.Length,
         Species = as.character(Species))
iristype <- iris_mod$Species %>% unique()
plan(sequential)
test_sequential <- future_lapply(
  iristype,
  FUN = function(x) {
    fuzzyjoin::fuzzy_left_join(
      iris_mod %>% filter(Species %in% x),
      iris_mod,
      by = c("Species" = "Species",
             "examplefield" = "Sepal.Length"),
      match_fun = list(`==`, `<`)
    )
  },
  future.chunk.size = 2
)
plan(multisession)
test_multisession <- future_lapply(
  iristype,
  FUN = function(x) {
    fuzzyjoin::fuzzy_left_join(
      iris_mod %>% filter(Species %in% x),
      iris_mod,
      by = c("Species" = "Species",
             "examplefield" = "Sepal.Length"),
      match_fun = list(`==`, `<`)
    )
  },
  future.chunk.size = 2
)
#> Error in loadNamespace(name): there is no package called 'fuzzyjoin'
Created on 2022-01-28 by the reprex package (v2.0.1)
I'm running R v4.0.3 if that's relevant.
I ran the following code and found that the library paths weren't being passed to the workers correctly for some reason. My dirty fix was just to make sure the packages were installed in the libPath where future was looking:
install.packages("fuzzyjoin", lib= "C:/Program Files/R/R-4.0.3/library" )
Here's the code that I ran to discover my normal session and future_lapply/future session were using different library paths:
.libPaths()
# [1] "\\\\networkfileservername/Userdata/myusername/Home/R/win-library/4.0" "C:/Program Files/R/R-4.0.3/library"
f_libs %<-% .libPaths()
print(f_libs)
# [1] "C:/Program Files/R/R-4.0.3/library"
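A less invasive alternative, sketched under the assumption that the workers simply don't inherit the network library path: forward the parent session's .libPaths() to each worker via parallelly's rscript_libs option, instead of installing packages into the system library.

```r
library(future)
library(parallelly)

# Build the PSOCK workers with the parent's library paths; rscript_libs
# is applied when each worker's R process starts.
cl <- makeClusterPSOCK(2, rscript_libs = .libPaths())
plan(cluster, workers = cl)

# ... run future_lapply() as before ...

plan(sequential)
parallel::stopCluster(cl)
```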

Error despite purrr's 'otherwise' - Why is purrr/possibly's 'otherwise' not triggered?

I am scraping content from websites, iterating over links. If an error occurs, purrr's possibly adverb should keep the process going and place "missing" (or NA_character_) as the result.
The code below works as intended when the linked site does not exist, i.e. the output is "missing".
However, if the linked site exists but the element I am trying to extract from it does not, the function throws an error despite my having defined a value for 'otherwise'.
To me this is surprising, since the documentation states that
' possibly : wrapped function uses a default value ( otherwise ) whenever an error occurs.'
Any idea why this is happening? I understand that I could modify the function accordingly (e.g. check the length of the returned object), but I do not understand why the 'otherwise' value was not used.
library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.0.4
#> Warning: package 'tidyr' was built under R version 4.0.4
#> Warning: package 'dplyr' was built under R version 4.0.4
library(rvest)
#> Warning: package 'rvest' was built under R version 4.0.4
#>
#> Attaching package: 'rvest'
#> The following object is masked from 'package:readr':
#>
#> guess_encoding
# possibly with wrong links when scraping site ----------------------------
#see https://github.com/tidyverse/purrr/issues/409
sample_data <- tibble::tibble(
  link = c(
    # link ok, selected item exists
    "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll",
    # link not ok
    "https://www.wrong-url.foobar",
    # link ok, selected item does not exist on site
    "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
  )
)
fn_get_link_to_records <- function(link_to_overview_sessions) {
  print(link_to_overview_sessions)
  link_to_overview_sessions %>%
    rvest::read_html() %>%
    rvest::html_elements("a") %>%
    rvest::html_attr("href") %>%
    enframe(name = NULL, value = "link_to_text") %>%
    filter(str_detect(link_to_text, regex("\\/NRSITZ_\\d+\\/fnameorig_\\d+\\.html$"))) %>%
    mutate(link_to_text = glue::glue("https://www.parlament.gv.at/{link_to_text}")) %>%
    pull()
}
sample_data %>%
  mutate(link_to_text = map_chr(link,
                                possibly(fn_get_link_to_records,
                                         otherwise = NA_character_)))
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll"
#> [1] "https://www.wrong-url.foobar"
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
#> Error: Problem with `mutate()` input `link_to_text`.
#> x Result 3 must be a single string, not a vector of class `glue/character` and of length 0
#> i Input `link_to_text` is `map_chr(link, possibly(fn_get_link_to_records, otherwise = NA_character_))`.
sample_data %>%
  mutate(link_to_text = map_chr(link,
                                possibly(fn_get_link_to_records,
                                         otherwise = "missing")))
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll"
#> [1] "https://www.wrong-url.foobar"
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
#> Error: Problem with `mutate()` input `link_to_text`.
#> x Result 3 must be a single string, not a vector of class `glue/character` and of length 0
#> i Input `link_to_text` is `map_chr(link, possibly(fn_get_link_to_records, otherwise = "missing"))`.
Created on 2021-03-28 by the reprex package (v1.0.0)
UPDATE: I added the output below to make the unexpected result (last chunk) clearer.
sample_data[1:2, ] %>%
  mutate(link_to_text = map_chr(link,
                                possibly(fn_get_link_to_records,
                                         otherwise = "missing")))
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll"
#> [1] "https://www.wrong-url.foobar"
#> # A tibble: 2 x 2
#> link link_to_text
#> <chr> <chr>
#> 1 https://www.parlament.gv.at/PAKT/VHG~ https://www.parlament.gv.at//PAKT/VHG/X~
#> 2 https://www.wrong-url.foobar missing
sample_data[3, ] %>%
  mutate(link_to_text = map_chr(link,
                                possibly(fn_get_link_to_records,
                                         otherwise = "missing")))
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
#> Error: Problem with `mutate()` input `link_to_text`.
#> x Result 1 must be a single string, not a vector of class `glue/character` and of length 0
#> i Input `link_to_text` is `map_chr(link, possibly(fn_get_link_to_records, otherwise = "missing"))`.
Created on 2021-03-29 by the reprex package (v1.0.0)
The error is coming from map_chr, but you have wrapped possibly around the fn_get_link_to_records function. If you run fn_get_link_to_records(sample_data$link[3]) you'll see the URL gets printed, nothing is returned, and no error is generated. However, map_chr cannot coerce this empty output to a single character value, hence the error. If you use map instead of map_chr you'll see it works:
sample_data[3, ] %>%
  mutate(link_to_text = map(link, fn_get_link_to_records))
#[1] #"https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
# A tibble: 1 x 2
# link link_to_text
# <chr> <list>
#1 https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Pro… <glue [0]>
but link_to_text is empty. The solution, as you already know, is to check the length of the output and return NA, or to generate an error inside fn_get_link_to_records for such cases, which possibly will then handle.
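A minimal sketch of that length check, as a hypothetical helper you could call at the end of fn_get_link_to_records so that map_chr() always receives a single string:

```r
# Coerce a zero-length (or multi-element) scrape result to a single string,
# so type-strict consumers like map_chr() never see an empty vector.
as_scalar_chr <- function(x) {
  if (length(x) == 1) as.character(x) else NA_character_
}

as_scalar_chr(character(0))            # the failing case: now NA, not an error
as_scalar_chr("https://example.com/a") # normal case passes through
```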

How to make a dashed line and legend using ggplot

I want to know how to add a legend to my graph, and also whether it's possible to make a line half solid, half dashed.
I need the red line to become dashed at 28 and the green one at 20. I've been told to use geom_segment, but I can't find the commands I need.
If anyone can help and suggest what code I should use, it would be great.
man <- dataset
ggplot(man, aes(Plot)) +
  geom_line(aes(y = N), color = "forestgreen", lwd = 0.5) +
  geom_ribbon(aes(ymin = NLB, ymax = NUB), alpha = 0.2, fill = "green") +
  geom_line(aes(y = M), color = "navy", lwd = 0.5) +
  geom_ribbon(aes(ymin = MLB, ymax = MUB), alpha = 0.2, fill = "blue") +
  geom_line(aes(y = S), color = "brown1", lwd = 0.5) +
  geom_ribbon(aes(ymin = SLB, ymax = SUB), alpha = 0.2, fill = "red") +
  xlab("Number of Samples") +
  ylab("Number of Diametric-Species")
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(ggplot2)
df <- data.frame(
  x = 1:200,
  y = log(1:200)
)
df <- df %>% mutate(should_dash = x >= 50)
ggplot(df, aes(x, y)) + geom_line(aes(linetype = should_dash))
Created on 2018-06-15 by the reprex package (v0.2.0).
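The same idea extends to the three series in the question, which also produces the requested legend: reshape to long format and map colour and linetype inside aes(). A sketch with a toy stand-in for the question's `man` dataset and hypothetical per-series cutoffs (S dashes from 28, N from 20, M stays solid):

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy stand-in for the question's `man` dataset (columns Plot, N, M, S).
man <- data.frame(Plot = 1:30, N = log(1:30), M = sqrt(1:30), S = (1:30) / 10)

# Hypothetical per-series cutoffs: S (red) dashes from 28, N (green) from 20,
# M stays solid throughout (Inf cutoff is never reached).
cutoffs <- c(N = 20, M = Inf, S = 28)

man_long <- man %>%
  pivot_longer(c(N, M, S), names_to = "series", values_to = "value") %>%
  mutate(dashed = Plot >= cutoffs[series])

# Mapping colour and linetype inside aes() is what produces the legend.
ggplot(man_long, aes(Plot, value, colour = series, linetype = dashed)) +
  geom_line()
```

Note that the solid and dashed halves of each line are separate groups, so there is a one-step gap at each cutoff; the linetype example above behaves the same way.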
