riskRegression() function: confusing examples in R package riskRegression vignette - r

I am having trouble interpreting the riskRegression() function in the R package riskRegression:
https://cran.r-project.org/web/packages/riskRegression/riskRegression.pdf
I looked over the examples under this function in the package vignette, but none of them actually use this function name. They use different function names such as LRR() or ARR(), so I am rather confused as to what the function actually does.

You can see from the source code below that ARR() and LRR() are just wrappers for riskRegression(link="relative") and riskRegression(link="logistic"), respectively.
library(riskRegression);ARR;LRR
#> riskRegression version 2022.11.21
#> function (formula, data, times, cause, cens.model, cens.formula,
#> ...)
#> {
#> fit <- riskRegression(formula = formula, data = data, times = times,
#> link = "relative", cause = cause, cens.model = cens.model,
#> cens.formula = cens.formula, ...)
#> fit$call <- match.call()
#> fit
#> }
#> <bytecode: 0x7f99ba1e1738>
#> <environment: namespace:riskRegression>
#> function (formula, data, times, cause, cens.model, cens.formula,
#> ...)
#> {
#> fit <- riskRegression(formula = formula, data = data, times = times,
#> link = "logistic", cause = cause, cens.model = cens.model,
#> cens.formula = cens.formula, ...)
#> fit$call <- match.call()
#> fit
#> }
#> <bytecode: 0x7f99ba2232e8>
#> <environment: namespace:riskRegression>
Created on 2022-11-26 by the reprex package (v2.0.1)

Related

download.file with wildcard matching in R

I'm trying to download all the files that match a pattern from a url directory in R using download.file, but I can't get it working for even a single file. The url is:
https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2019/
and the pattern match is all files like: AIS_2019_*_18.zip
Here is what I've tried for a single file case:
download.file('https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2019/AIS_2019_04_18.zip',
destfile = "AIS_2019_04_18.zip",
method = "wget", extra = c("-r", "-np", "-L", "--max-redirect=0"))
but I always get 'wget' call had nonzero exit status
I've also tried setting method = internal and mode = w, but get ```scheme not supported in url'
Here's a way to generate all the links that you can then loop through them with a for loop.
library(glue)
library(stringr)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
# Setup
month_dates <- glue("2019-{str_pad(1:12, width = 2, pad = '0')}-01")
days_in_months <- days_in_month(as.Date(month_dates))
# Get appropriate number of days and months combinations
months <- rep(1:12, days_in_months)
days <- unlist(mapply(function(x) str_pad(1:x, width = 2, pad = "0"),
days_in_months))
base_url <- "https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2019"
# Put everything together
all_files <- glue("{base_url}/AIS_2019_{months}_{days}.zip")
# See results
head(all_files)
#> https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2019/AIS_2019_1_01.zip
#> https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2019/AIS_2019_1_02.zip
#> https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2019/AIS_2019_1_03.zip
#> https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2019/AIS_2019_1_04.zip
#> https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2019/AIS_2019_1_05.zip
#> https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2019/AIS_2019_1_06.zip
# Check number of the days in a year is correct
length(all_files)
#> [1] 365
Created on 2021-08-04 by the reprex package (v2.0.0)
Once you have those created, you can do something like:
# Untested
for (file in all_files) {
download.file(file,
destfile = basename(file),
extra = c("-r", "-np", "-L", "--max-redirect=0"))
}

Error despite purrr's 'otherwise' - Why is purrr/possibly's 'otherwise' not triggered?

I am scraping content from websites. For this I iterate over links. If an error occurs, purrr's possibly adverb should keep the process going, and place a "missing" (or "NA_character") as a result.
The code below works as intended when the site linked to is not existing, i.e. the output is "missing";
However, if the site linked to exists, but the element which I am trying to extract from the site does not exist, the function throws an error despite having defined a value for 'otherwise'.
To me this is surprising, since the documentation states that
' possibly : wrapped function uses a default value ( otherwise ) whenever an error occurs.'
Any idea why this is happening? I understand that i could modify the function accordingly (e.g. check for the length of the returned object). But I do not understand why the 'otherwise' value was not used.
library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.0.4
#> Warning: package 'tidyr' was built under R version 4.0.4
#> Warning: package 'dplyr' was built under R version 4.0.4
library(rvest)
#> Warning: package 'rvest' was built under R version 4.0.4
#>
#> Attaching package: 'rvest'
#> The following object is masked from 'package:readr':
#>
#> guess_encoding
# possibly with wrong links when scraping site ----------------------------
#see https://github.com/tidyverse/purrr/issues/409
sample_data <- tibble::tibble(
link = c(
#link ok, selected item exists
"https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll",
#link not ok
"https://www.wrong-url.foobar",
#link ok, selected item does not exist on site
"https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
)
)
fn_get_link_to_records <- function(link_to_overview_sessions) {
print(link_to_overview_sessions)
link_to_overview_sessions %>%
rvest::read_html() %>%
rvest::html_elements("a") %>%
rvest::html_attr("href") %>%
enframe(name = NULL,
value = "link_to_text") %>%
filter(str_detect(link_to_text, regex("\\/NRSITZ_\\d+\\/fnameorig_\\d+\\.html$"))) %>%
mutate(link_to_text=glue::glue("https://www.parlament.gv.at/{link_to_text}")) %>%
pull()
}
sample_data %>%
mutate(link_to_text=map_chr(link,
possibly(fn_get_link_to_records,
otherwise=NA_character_)))
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll"
#> [1] "https://www.wrong-url.foobar"
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
#> Error: Problem with `mutate()` input `link_to_text`.
#> x Result 3 must be a single string, not a vector of class `glue/character` and of length 0
#> i Input `link_to_text` is `map_chr(link, possibly(fn_get_link_to_records, otherwise = NA_character_))`.
sample_data %>%
mutate(link_to_text=map_chr(link,
possibly(fn_get_link_to_records,
otherwise="missing")))
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll"
#> [1] "https://www.wrong-url.foobar"
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
#> Error: Problem with `mutate()` input `link_to_text`.
#> x Result 3 must be a single string, not a vector of class `glue/character` and of length 0
#> i Input `link_to_text` is `map_chr(link, possibly(fn_get_link_to_records, otherwise = "missing"))`.
Created on 2021-03-28 by the reprex package (v1.0.0)
UPDATE: I added the output below to make the unexpected result (last chunk) clearer.
sample_data[1:2,] %>%
mutate(link_to_text=map_chr(link,
possibly(fn_get_link_to_records,
otherwise="missing")))
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll"
#> [1] "https://www.wrong-url.foobar"
#> # A tibble: 2 x 2
#> link link_to_text
#> <chr> <chr>
#> 1 https://www.parlament.gv.at/PAKT/VHG~ https://www.parlament.gv.at//PAKT/VHG/X~
#> 2 https://www.wrong-url.foobar missing
sample_data[3, ] %>%
mutate(link_to_text=map_chr(link,
possibly(fn_get_link_to_records,
otherwise="missing")))
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
#> Error: Problem with `mutate()` input `link_to_text`.
#> x Result 1 must be a single string, not a vector of class `glue/character` and of length 0
#> i Input `link_to_text` is `map_chr(link, possibly(fn_get_link_to_records, otherwise = "missing"))`.
Created on 2021-03-29 by the reprex package (v1.0.0)
The error is coming from map_chr but you have possibly wrapped around fn_get_link_to_records function. If you run fn_get_link_to_records(sample_data$link[3]) you'll see the URL get's printed and nothing is returned and no error is generated. However, map_chr cannot change this empty output to character value hence you get the error. Instead of map_chr if you use map you'll see it works.
sample_data[3,] %>%
mutate(link_to_text= map(link, fn_get_link_to_records))
#[1] #"https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
# A tibble: 1 x 2
# link link_to_text
# <chr> <list>
#1 https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Pro… <glue [0]>
but link_to_text is empty. The solution as you already know is check length of output value and return NA or generate an error inside fn_get_link_to_records functions for such cases which will be handled using possibly.

Different print method dispatched when running in console vs R markdown cell?

When I run the following code in the console vs in the R markdown document, I see that the different methods are being dispatched. Is this a bug, or does the way that the code is run affect method dispatch?
library(sloop)
library(tidyverse)
df <- tibble(x=rnorm(5),
y=rnorm(5))
sloop::s3_dispatch(print(df))
Running the code in the R Markdown code cell (in RStudio)
Running the code in the console
RStudio is overriding print.tbl_df in the notebook. Code here.
In an R Notebook:
getAnywhere(print.tbl_df)
#> 2 differing objects matching ‘print.tbl_df’ were found
#> in the following places
#> registered S3 method for print
#> namespace:tibble
#> Use [] to view one of them
getAnywhere(print.tbl_df)[1]
#> function (x, ...)
#> {
#> o <- overrideMap(x, options)
#> if (!is.null(o)) {
#> overridePrint(o$x, o$options, o$className, o$nRow, o$nCol)
#> }
#> }
#> <bytecode: 0x7fb2bdf8fd48>
#> <environment: 0x7fb2bd9567e8>
#> attr(,".rs.S3Override")
#> [1] TRUE
In a normal R console:
getAnywhere(print.tbl_df)
#> A single object matching ‘print.tbl_df’ was found
#> It was found in the following places
#> namespace:tibble
#> with value
#>
#> function (x, ..., n = NULL, width = NULL, n_extra = NULL)
#> {
#> NextMethod()
#> }
#> <bytecode: 0x7fb2b77d7040>
#> <environment: namespace:tibble>

Function passes test inside the package and fails from vignette (due to cpp_object_initializer)

I am writing an R package that uses rstan for Bayesian
sampling. (Here
is the specific commit if you want to reproduce the issue.) I only succeed in
running a function that calls rstan from the vignette if I use
library(rstan) in the vignette, and not with a few workarounds.
The setup
A function in the package calls rstan (edited for clarity):
#' #importFrom Rcpp cpp_object_initializer
#' #export
run_variational_bayes <- function(x, y, output_samples, beta_sd, stan_file) {
n_input <- length(y)
p <- ncol(x)
train_dat <- list(n = n_input, p = p, x = x, y = y, beta_sd = beta_sd)
stan_model <- rstan::stan_model(file = stan_file)
stan_vb <- rstan::vb(object = stan_model, data = train_dat,
output_samples = output_samples)
return(rstan::extract(stan_vb)$beta)
}
I test this function in the package:
context("RStan variational Bayes model")
test_that("Rstan variational Bayes model runs", {
german <- PosteriorBootstrap::get_german_credit_dataset()
n_bootstrap <- 10
prior_variance <- 100
stan_vb_sample <- PosteriorBootstrap::run_variational_bayes(x = german$x,
y = german$y,
output_samples = n_bootstrap,
beta_sd = sqrt(prior_variance),
iter = 10)
expect_true(nrow(stan_vb_sample) == n_bootstrap)
expect_true(ncol(stan_vb_sample) == ncol(german$x))
})
The tests pass locally and on Travis, so the function works from inside the
package.
The problem
The vignette code works if I include library(rstan):
library(rstan)
prior_sd <- 10
n_bootstrap <- 1000
german <- PosteriorBootstrap::get_german_credit_dataset()
stan_vb_sample <- PosteriorBootstrap::run_variational_bayes(x = german$x,
y = german$y,
output_samples = n_bootstrap,
beta_sd = prior_sd)
dim(stan_vb_sample)
#> [1] 1000 25
but I see it as bad practice that the user needs to attach another package to
use my package. If I use requireNamespace(), building the vignette works but
the Stan model does not run:
requireNamespace("PosteriorBootstrap", quietly = TRUE)
# ...
stan_vb_sample <- PosteriorBootstrap::run_variational_bayes(x = german$x,
y = german$y,
output_samples = n_bootstrap,
beta_sd = prior_sd)
#> Error in cpp_object_initializer(.self, .refClassDef, ...) :
#> could not find function "cpp_object_initializer"
#> failed to create the model; variational Bayes not done
#> Stan model 'bayes_logit' does not contain samples.
dim(stan_vb_sample)
#> NULL
Note that I used #' #importFrom Rcpp cpp_object_initializer in the Roxygen
comment, which should import the function that rstan says is missing.
Comparison with another package that uses rstan
This package
has similar values in DESCRIPTION, yet I tested that it does not require library(rstan) to
run rstan. It uses #import Rcpp in one function, which I tested with my
package by replacing #importFrom Rcpp cpp_object_initializer in front of the
function and got the same error.
Failed workarounds
The difference between requireNamespace() and library() is that the latter
imports the namespace of the package into the current environment. But
rstan does import(Rcpp) so that object should be available.
(1) I tried library("PosteriorBootstrap") in the vignette, since the
package imports that object into its namespace: I got the same error (with
#import Rcpp or with #importFrom Rcpp cpp_object_initializer).
(2) I copied that object to the environment of the function:
requireNamespace("Rcpp", quietly = TRUE)
#' #import Rcpp
#' #export
run_variational_bayes <- function(x, y, output_samples, beta_sd,
stan_file = get_stan_file(),
iter = 10000, seed = 123, verbose = FALSE) {
cpp_object_initializer <- Rcpp:cpp_object_initializer
# ...
}
and I was surprised to get a vignette error:
E creating vignettes (1.8s)
Quitting from lines 151-157 (anpl.Rmd)
Error: processing vignette 'anpl.Rmd' failed with diagnostics:
object 'Rcpp' not found
Execution halted
Temporary solution
As a temporary solution, I moved the code in the function to the vignette
entirely. The vignette fails with requireNamespace():
requireNamespace("rstan")
#> Loading required namespace: rstan
prior_sd <- 10
n_bootstrap <- 1000
german <- PosteriorBootstrap::get_german_credit_dataset()
train_dat <- list(n = length(german$y), p = ncol(german$x), x = german$x, y = german$y, beta_sd = prior_sd)
stan_file <- PosteriorBootstrap::get_stan_file()
stan_model <- rstan::stan_model(file = stan_file)
stan_vb <- rstan::vb(object = stan_model, data = train_dat, seed = seed,
output_samples = n_bootstrap)
#> Error in cpp_object_initializer(.self, .refClassDef, ...) :
#> could not find function "cpp_object_initializer"
#> failed to create the model; variational Bayes not done
stan_vb_sample <- rstan::extract(stan_vb)$beta
#> Stan model 'bayes_logit' does not contain samples.
dim(stan_vb_sample)
#> NULL
and succeeds with library(rstan):
library("rstan")
#> Loading required package: ggplot2
# ...
stan_model <- rstan::stan_model(file = stan_file)
stan_vb <- rstan::vb(object = stan_model, data = train_dat, seed = seed,
output_samples = n_bootstrap)
#> Chain 1: ------------------------------------------------------------
# ...
#> Chain 1: COMPLETED.
stan_vb_sample <- rstan::extract(stan_vb)$beta
dim(stan_vb_sample)
#> [1] 1000 25
In moving the code out of the package, I realised that a test that uses
library("rstan") and calls the rstan package directly, e.g.
context("Adaptive non-parametric learning function")
library("rstan")
# ...
test_that("Adaptive non-parametric learning with posterior samples works", {
german <- get_german_credit_dataset()
n_bootstrap <- 100
# Get posterior samples
seed <- 123
prior_sd <- 10
train_dat <- list(n = length(german$y), p = ncol(german$x), x = german$x,
y = german$y, beta_sd = prior_sd)
stan_model <- rstan::stan_model(file = get_stan_file())
stan_vb <- rstan::vb(object = stan_model, data = train_dat, seed = seed,
output_samples = n_bootstrap)
stan_vb_sample <- rstan::extract(stan_vb)$beta
# ...
}
passes the tests inside the package:
✔ | 24 | Adaptive non-parametric learning function [53.1 s]
══ Results ═════════════════════════════════════════════════════════════════════
Duration: 53.2 s
OK: 24
Failed: 0
Warnings: 0
Skipped: 0
but the same test with requireNamespace("rstan") fails them:
⠋ | 21 | Adaptive non-parametric learning functionError in cpp_object_initializer(.self, .refClassDef, ...) :
could not find function "cpp_object_initializer"
Stan model 'bayes_logit' does not contain samples.
...
══ Results ═════════════════════════════════════════════════════════════════════
Duration: 51.7 s
OK: 22
Failed: 1
Warnings: 0
Skipped: 0
Conclusion
I wonder if rstan code is calling cpp_object_initializer without a
qualifier, and if it's doing that in a new environment that does not inherit the
objects from the calling environment.
I acknowledge that I did not use rstantools to start the package
(my employer decided to stick with the MIT license and chose not to restart the
package structure from scratch) and that I am compiling
the model at call time. I suppose that users providing their own model would
face the same errors when using requireNamespace() instead of library().
How can I allow users to run package functions that call rstan without
library(rstan), short of restarting the package from scratch with rstantools?

Lock environment but not .Random.seed

Is it possible to lock the global environment and still allow .Random.seed to be set or removed? The default behavior of lockEnvironment() is too aggressive
for my use case.
lockEnvironment(globalenv())
rnorm(10)
#> Error in rnorm(10) : cannot add bindings to a locked environment
rm(.Random.seed)
#> Error in rm(.Random.seed) :
#> cannot remove bindings from a locked environment
Background
drake version 7.0.0 will have a new safeguard to protect reproducibility.
plan <- drake_plan(
x = {
data(mtcars)
mtcars$mpg
},
y = mean(x)
)
plan
#> # A tibble: 2 x 2
#> target command
#> <chr> <expr>
#> 1 x { data(mtcars) mtcars$mpg }
#> 2 y mean(x)
make(plan)
#> target x
#> fail x
#> Error: Target `x` failed. Call `diagnose(x)` for details. Error message:
#> cannot add bindings to a locked environment.
#> One of your targets tried to modify your environment,
#> which could invalidate other targets
#> and undermine reproducibility (example:
#> https://github.com/ropensci/drake/issues/664#issuecomment-453163562).
#> Beware <<-, ->>, attach(), data(), and side effects in general.
#> Use make(lock_envir = FALSE) to avoid this error (not recommended).
The error comes from the call to data(mtcars). The very act of building x would have changed x's dependencies. Without guardrails, the workflow invalidates itself.
make(plan, lock_envir = FALSE)
#> target x
#> target y
make(plan, lock_envir = FALSE)
#> target x
But with guardrails, we run into edge cases like https://github.com/ropensci/drake/issues/749 and https://github.com/ropensci/drake/issues/675#issuecomment-458222414.

Resources