When I run the following code in the console and in an R Markdown document, I see that different methods are dispatched. Is this a bug, or does the way the code is run affect method dispatch?
library(sloop)
library(tidyverse)
df <- tibble(x = rnorm(5),
             y = rnorm(5))
sloop::s3_dispatch(print(df))
Running the code in the R Markdown code cell (in RStudio)
Running the code in the console
RStudio overrides print.tbl_df in the notebook.
In an R Notebook:
getAnywhere(print.tbl_df)
#> 2 differing objects matching ‘print.tbl_df’ were found
#> in the following places
#> registered S3 method for print
#> namespace:tibble
#> Use [] to view one of them
getAnywhere(print.tbl_df)[1]
#> function (x, ...)
#> {
#> o <- overrideMap(x, options)
#> if (!is.null(o)) {
#> overridePrint(o$x, o$options, o$className, o$nRow, o$nCol)
#> }
#> }
#> <bytecode: 0x7fb2bdf8fd48>
#> <environment: 0x7fb2bd9567e8>
#> attr(,".rs.S3Override")
#> [1] TRUE
In a normal R console:
getAnywhere(print.tbl_df)
#> A single object matching ‘print.tbl_df’ was found
#> It was found in the following places
#> namespace:tibble
#> with value
#>
#> function (x, ..., n = NULL, width = NULL, n_extra = NULL)
#> {
#> NextMethod()
#> }
#> <bytecode: 0x7fb2b77d7040>
#> <environment: namespace:tibble>
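To check which print method a session would pick up without reading the getAnywhere() output by eye, a small helper along these lines may be useful. It is a sketch that assumes the `.rs.S3Override` attribute shown in the notebook output above is how RStudio marks its overrides; `is_rstudio_override()` is a hypothetical name, not part of RStudio or base R.

```r
# Hypothetical helper: TRUE if the S3 method that `generic` finds for `class`
# carries the ".rs.S3Override" attribute seen in the notebook output above.
# Assumes RStudio keeps using that attribute name.
is_rstudio_override <- function(generic, class) {
  # optional = TRUE returns NULL instead of an error when no method exists
  method <- getS3method(generic, class, optional = TRUE)
  !is.null(method) && isTRUE(attr(method, ".rs.S3Override"))
}

# Compare the result in the console and in a notebook chunk
is_rstudio_override("print", "tbl_df")
```

getS3method() first looks for a function named `print.tbl_df` visible from the caller and then falls back to the registered S3 methods table, which is where the notebook override lives.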
I am having trouble interpreting the riskRegression() function in the R package riskRegression:
https://cran.r-project.org/web/packages/riskRegression/riskRegression.pdf
I looked over the examples under this function in the package vignette, but none of them actually use this function name. They use different function names such as LRR() or ARR(), so I am rather confused as to what the function actually does.
You can see from the source code below that ARR() and LRR() are just wrappers for riskRegression(link="relative") and riskRegression(link="logistic"), respectively.
library(riskRegression)
ARR
LRR
#> riskRegression version 2022.11.21
#> function (formula, data, times, cause, cens.model, cens.formula,
#> ...)
#> {
#> fit <- riskRegression(formula = formula, data = data, times = times,
#> link = "relative", cause = cause, cens.model = cens.model,
#> cens.formula = cens.formula, ...)
#> fit$call <- match.call()
#> fit
#> }
#> <bytecode: 0x7f99ba1e1738>
#> <environment: namespace:riskRegression>
#> function (formula, data, times, cause, cens.model, cens.formula,
#> ...)
#> {
#> fit <- riskRegression(formula = formula, data = data, times = times,
#> link = "logistic", cause = cause, cens.model = cens.model,
#> cens.formula = cens.formula, ...)
#> fit$call <- match.call()
#> fit
#> }
#> <bytecode: 0x7f99ba2232e8>
#> <environment: namespace:riskRegression>
Created on 2022-11-26 by the reprex package (v2.0.1)
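The wrapper pattern can be reproduced in isolation. The sketch below uses two hypothetical toy functions, `fit_model()` standing in for riskRegression() and `arr_like()` standing in for ARR(); it only mirrors the structure of the printed source, not what riskRegression actually computes.

```r
# Hypothetical stand-in for riskRegression(): records the link and the call
fit_model <- function(formula, data, link = "relative", ...) {
  list(link = link, call = match.call())
}

# Wrapper in the style of ARR(): fixes `link`, forwards everything else,
# then overwrites the stored call so printed output shows the wrapper's name
arr_like <- function(formula, data, ...) {
  fit <- fit_model(formula = formula, data = data, link = "relative", ...)
  fit$call <- match.call()
  fit
}

res <- arr_like(y ~ x, data = data.frame(x = 1, y = 2))
res$link  # "relative"
res$call  # arr_like(formula = y ~ x, data = data.frame(x = 1, y = 2))
```

Based on the source above, calling ARR(...) should therefore give the same fit as riskRegression(..., link = "relative"); only the stored call differs.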
Is it possible to read/convert PDF portfolios in R?
I usually use pdftools; however, I get an error:
library(pdftools)
#> Using poppler version 0.73.0
link <- c("http://www.accessdata.fda.gov/cdrh_docs/pdf19/K190072.pdf")
pdftools::pdf_convert(link, dpi = 600)
#> Converting page 1 to K190072_1.png...
#> PDF error: Non conformant codestream TPsot==TNsot.<0a>
#> PDF error: Non conformant codestream TPsot==TNsot.<0a>
#> PDF error: Non conformant codestream TPsot==TNsot.<0a>
#> PDF error: Non conformant codestream TPsot==TNsot.<0a>
#> done!
#> [1] "K190072_1.png"
Created on 2021-05-06 by the reprex package (v1.0.0)
The resulting K190072_1.png only shows the portfolio's cover page.
I am interested in the document K190072.510kSummary.Final_Sent001.pdf contained in this PDF portfolio.
I found a way for Python (Reading a PDF Portfolio in Python?) but I would really like to do that in R.
Thank you for your help.
There seems to be an issue with how pdf_convert() handles single-page raw PDF data: under these conditions it calls basename(pdf), which fails because pdf is a raw vector rather than a file path. I have therefore edited the function so that it also works with the second attached PDF file.
If you only need the first file, you can use the original pdf_convert() function, but it will give an error with the second file.
If you are interested in rendering raster graphics from the attached files, this worked for me:
library(pdftools)
#> Using poppler version 21.02.0
link <- c("http://www.accessdata.fda.gov/cdrh_docs/pdf19/K190072.pdf")
# Patched copy of pdftools::pdf_convert(); the functional change is the
# !is.raw(pdf) check, so single-page raw PDF data no longer reaches basename()
pdf_convert <- function(pdf, format = "png", pages = NULL, filenames = NULL,
                        dpi = 72, antialias = TRUE, opw = "", upw = "", verbose = TRUE) {
  config <- poppler_config()
  if (!config$can_render || !length(config$supported_image_formats))
    stop("Your version of libpoppler does not support rendering")
  format <- match.arg(format, config$supported_image_formats)
  if (is.null(pages))
    pages <- seq_len(pdf_info(pdf, opw = opw, upw = upw)$pages)
  if (!is.numeric(pages) || !length(pages))
    stop("Argument 'pages' must be a one-indexed vector of page numbers")
  if (length(filenames) < 2 && !is.raw(pdf)) { # added !is.raw(pdf)
    input <- sub(".pdf", "", basename(pdf), fixed = TRUE)
    filenames <- if (length(filenames)) {
      sprintf(filenames, pages, format)
    } else {
      sprintf("%s_%d.%s", input, pages, format)
    }
  }
  if (length(filenames) != length(pages))
    stop("Length of 'filenames' must be one or equal to 'pages'")
  antialiasing <- isTRUE(antialias) || isTRUE(antialias == "draw")
  text_antialiasing <- isTRUE(antialias) || isTRUE(antialias == "text")
  # poppler_convert() and loadfile() are internal (unexported) pdftools functions
  pdftools:::poppler_convert(pdftools:::loadfile(pdf), format, pages, filenames,
                             dpi, opw, upw, antialiasing, text_antialiasing, verbose)
}
lapply(pdf_attachments(link), function(x) pdf_convert(x$data,
filenames=paste0(tools::file_path_sans_ext(x$name), "-",
seq_along(pdf_data(x$data)), ".png")))
#> Converting page 1 to K190072.510kSummary.Final_Sent001-1.png... done!
#> Converting page 2 to K190072.510kSummary.Final_Sent001-2.png... done!
#> Converting page 3 to K190072.510kSummary.Final_Sent001-3.png... done!
#> Converting page 4 to K190072.510kSummary.Final_Sent001-4.png... done!
#> Converting page 5 to K190072.510kSummary.Final_Sent001-5.png... done!
#> Converting page 1 to K190072.IFU.FINAL_Sent001-1.png... done!
#> Converting page 1 to K190072.Letter.SE.FINAL_Sent001-1.png... done!
#> Converting page 2 to K190072.Letter.SE.FINAL_Sent001-2.png... done!
#> [[1]]
#> [1] "K190072.510kSummary.Final_Sent001-1.png"
#> [2] "K190072.510kSummary.Final_Sent001-2.png"
#> [3] "K190072.510kSummary.Final_Sent001-3.png"
#> [4] "K190072.510kSummary.Final_Sent001-4.png"
#> [5] "K190072.510kSummary.Final_Sent001-5.png"
#>
#> [[2]]
#> [1] "K190072.IFU.FINAL_Sent001-1.png"
#>
#> [[3]]
#> [1] "K190072.Letter.SE.FINAL_Sent001-1.png"
#> [2] "K190072.Letter.SE.FINAL_Sent001-2.png"
Created on 2021-05-05 by the reprex package (v2.0.0)
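If the goal is only to get at K190072.510kSummary.Final_Sent001.pdf itself rather than PNG renderings, it may be enough to write each attachment's raw bytes to disk. This sketch assumes the attachment elements have the `name` and `data` fields used above; `save_attachments()` is a hypothetical helper, not part of pdftools.

```r
# Hypothetical helper: write each embedded file of a PDF portfolio to `dir`.
# Expects the list structure used above for pdftools::pdf_attachments(),
# i.e. elements with a `name` (file name) and `data` (raw vector).
save_attachments <- function(attachments, dir = tempdir()) {
  # basename() guards against path components inside attachment names
  paths <- file.path(dir, vapply(attachments, function(a) basename(a$name), character(1)))
  for (i in seq_along(attachments)) {
    writeBin(attachments[[i]]$data, paths[i])
  }
  paths
}

# Usage (needs pdftools and network access):
# link  <- "http://www.accessdata.fda.gov/cdrh_docs/pdf19/K190072.pdf"
# files <- save_attachments(pdftools::pdf_attachments(link), dir = ".")
# pdftools::pdf_text(files[1])  # text of the first embedded PDF
```

The saved files are ordinary PDFs, so the usual pdftools functions can then be applied to them by path.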
I am scraping content from websites. For this, I iterate over links. If an error occurs, purrr's possibly() adverb should keep the process going and return "missing" (or NA_character_) as the result.
The code below works as intended when the linked site does not exist, i.e., the output is "missing".
However, if the site linked to exists, but the element which I am trying to extract from the site does not exist, the function throws an error despite having defined a value for 'otherwise'.
To me this is surprising, since the documentation states that
'possibly: wrapped function uses a default value (otherwise) whenever an error occurs.'
Any idea why this is happening? I understand that I could modify the function accordingly (e.g., check the length of the returned object), but I do not understand why the 'otherwise' value was not used.
library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.0.4
#> Warning: package 'tidyr' was built under R version 4.0.4
#> Warning: package 'dplyr' was built under R version 4.0.4
library(rvest)
#> Warning: package 'rvest' was built under R version 4.0.4
#>
#> Attaching package: 'rvest'
#> The following object is masked from 'package:readr':
#>
#> guess_encoding
# possibly with wrong links when scraping site ----------------------------
#see https://github.com/tidyverse/purrr/issues/409
sample_data <- tibble::tibble(
link = c(
#link ok, selected item exists
"https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll",
#link not ok
"https://www.wrong-url.foobar",
#link ok, selected item does not exist on site
"https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
)
)
fn_get_link_to_records <- function(link_to_overview_sessions) {
print(link_to_overview_sessions)
link_to_overview_sessions %>%
rvest::read_html() %>%
rvest::html_elements("a") %>%
rvest::html_attr("href") %>%
enframe(name = NULL,
value = "link_to_text") %>%
filter(str_detect(link_to_text, regex("\\/NRSITZ_\\d+\\/fnameorig_\\d+\\.html$"))) %>%
mutate(link_to_text=glue::glue("https://www.parlament.gv.at/{link_to_text}")) %>%
pull()
}
sample_data %>%
mutate(link_to_text=map_chr(link,
possibly(fn_get_link_to_records,
otherwise=NA_character_)))
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll"
#> [1] "https://www.wrong-url.foobar"
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
#> Error: Problem with `mutate()` input `link_to_text`.
#> x Result 3 must be a single string, not a vector of class `glue/character` and of length 0
#> i Input `link_to_text` is `map_chr(link, possibly(fn_get_link_to_records, otherwise = NA_character_))`.
sample_data %>%
mutate(link_to_text=map_chr(link,
possibly(fn_get_link_to_records,
otherwise="missing")))
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll"
#> [1] "https://www.wrong-url.foobar"
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
#> Error: Problem with `mutate()` input `link_to_text`.
#> x Result 3 must be a single string, not a vector of class `glue/character` and of length 0
#> i Input `link_to_text` is `map_chr(link, possibly(fn_get_link_to_records, otherwise = "missing"))`.
Created on 2021-03-28 by the reprex package (v1.0.0)
UPDATE: I added the output below to make the unexpected result (last chunk) clearer.
sample_data[1:2,] %>%
mutate(link_to_text=map_chr(link,
possibly(fn_get_link_to_records,
otherwise="missing")))
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll"
#> [1] "https://www.wrong-url.foobar"
#> # A tibble: 2 x 2
#> link link_to_text
#> <chr> <chr>
#> 1 https://www.parlament.gv.at/PAKT/VHG~ https://www.parlament.gv.at//PAKT/VHG/X~
#> 2 https://www.wrong-url.foobar missing
sample_data[3, ] %>%
mutate(link_to_text=map_chr(link,
possibly(fn_get_link_to_records,
otherwise="missing")))
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
#> Error: Problem with `mutate()` input `link_to_text`.
#> x Result 1 must be a single string, not a vector of class `glue/character` and of length 0
#> i Input `link_to_text` is `map_chr(link, possibly(fn_get_link_to_records, otherwise = "missing"))`.
Created on 2021-03-29 by the reprex package (v1.0.0)
The error comes from map_chr(), but possibly() only wraps fn_get_link_to_records(). If you run fn_get_link_to_records(sample_data$link[3]), you'll see that the URL gets printed, a zero-length vector is returned, and no error is raised. Because fn_get_link_to_records() itself does not fail, possibly() has nothing to catch; it is map_chr() that cannot turn the empty output into a single string, hence the error. If you use map() instead of map_chr(), you'll see that it works:
sample_data[3,] %>%
mutate(link_to_text= map(link, fn_get_link_to_records))
# [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
# A tibble: 1 x 2
# link link_to_text
# <chr> <list>
#1 https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Pro… <glue [0]>
However, link_to_text is empty. The solution, as you already know, is to check the length of the output and return NA, or to raise an error inside fn_get_link_to_records() for such cases, which possibly() will then handle.
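The second option can be sketched without network access. To keep it self-contained, `possibly_base()` below is a base-R stand-in for purrr::possibly() built on tryCatch(), and vapply() plays the role of map_chr(); `extract()` and `extract_strict()` are hypothetical toy functions, not the scraping code. With purrr, the equivalent would be `map_chr(link, possibly(fn_strict, otherwise = NA_character_))`, where `fn_strict` is a hypothetical variant of fn_get_link_to_records() that calls stop() when nothing is found.

```r
# Base-R stand-in for purrr::possibly(): return `otherwise` if `f` errors
possibly_base <- function(f, otherwise) {
  function(...) tryCatch(f(...), error = function(e) otherwise)
}

# Hypothetical extractor that, like fn_get_link_to_records(), returns a
# zero-length vector (not an error) when nothing matches
extract <- function(x) x[grepl("^keep", x)]

# Stricter version: turn "nothing found" into an error the wrapper can catch,
# and keep only the first match so the result is a single string (assumption)
extract_strict <- function(x) {
  out <- extract(x)
  if (length(out) == 0) stop("no matching element found")
  out[[1]]
}

inputs <- list(c("keep_a", "drop"), c("drop", "drop"))

# Fails: the wrapper never sees an error, and vapply() rejects the empty result
lenient <- tryCatch(
  vapply(inputs, possibly_base(extract, NA_character_), character(1)),
  error = function(e) conditionMessage(e)
)

# Works: the empty case now errors inside the wrapped function
strict <- vapply(inputs, possibly_base(extract_strict, NA_character_), character(1))
strict  # "keep_a" NA
```

The design point is where the error happens: the fallback only applies to errors raised inside the wrapped function, not to errors raised afterwards by the mapping function.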
I am currently writing swirl lessons in which I test whether a ggplot2 object created by the user is roughly equal (all.equal()) to an object I create in a custom AnswerTest. However, the plot object I receive from the swirl API via e$val often contains a flipped_aes = FALSE parameter that I cannot create in my own plots, so all.equal(e$val, someplot) fails although the plots look identical.
I would really appreciate ideas on how to work around this or control when it occurs!
If it occurs, all.equal() fails with the following message:
"Component “layers”: Component 1: Component 4: Length mismatch: comparison on first 2 components"
My current test looks like this:
calculates_same_graph <- function(expression){ #If ggplot expression must be written in curly brackets in Yaml file
e <- get("e", parent.frame())
eSnap <- cleanEnv(e$snapshot)
val <- expression
passed <- isTRUE(all.equal(val[-8], e$val[-8]))
assign("e", e$val, envir = globalenv()) #only for diagnostics, changes
#when new answer is put in
return(passed)
}
OK, I agree that this is a bit weird, but I found that the flipped_aes parameter only comes into existence after a plot is printed. The weird bit is that it appears to be an object-modifying side effect of printing the plot. This is most likely because ggplot2 layers are ggproto objects, which are environments, so changes made to them while the plot is built persist in the original object.
Suppose we have two plots that have opposite aesthetic flipping:
library(ggplot2)
# Should have flipped_aes = FALSE
plot1 <- ggplot(iris, aes(Species, Sepal.Width)) +
geom_col()
# Should have flipped_aes = TRUE
plot2 <- ggplot(iris, aes(Sepal.Width, Species)) +
geom_col()
We can see that these unprinted objects do not have flipped_aes in their geom parameters.
# Before printing plot
plot1$layers[[1]]$geom_params
#> $width
#> NULL
#>
#> $na.rm
#> [1] FALSE
plot2$layers[[1]]$geom_params
#> $width
#> NULL
#>
#> $na.rm
#> [1] FALSE
Now we can print these plots to a temporary file. Printing them in the console should work too; I just can't replicate that in a reprex.
# Printing as tempfile
tmp <- tempfile(fileext = ".png")
png(tmp)
plot1
plot2
dev.off()
#> png
#> 2
unlink(tmp)
Now after we've printed the plot, suddenly the plot objects have the flipped_aes parameter.
# After printing plot
plot1$layers[[1]]$geom_params
#> $width
#> NULL
#>
#> $na.rm
#> [1] FALSE
#>
#> $flipped_aes
#> [1] FALSE
plot2$layers[[1]]$geom_params
#> $width
#> NULL
#>
#> $na.rm
#> [1] FALSE
#>
#> $flipped_aes
#> [1] TRUE
Created on 2021-03-10 by the reprex package (v1.0.0)
I don't know what the best way is to deal with this weirdness in your swirl test, but it appears that the printing of the plot influences that parameter.
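The persistence is consistent with layers having reference semantics. The toy sketch below models a layer as a plain environment to show why a side effect during printing sticks to the original object; `make_toy_plot()` and `toy_render()` are hypothetical and do not call ggplot2. The length difference mirrors the "Length mismatch" reported by all.equal().

```r
# Toy model of a ggplot2 layer: like a ggproto object, it is an environment,
# so changes made to it inside a function are visible through every reference
make_toy_plot <- function() {
  layer <- new.env()
  layer$geom_params <- list(width = NULL, na.rm = FALSE)
  list(layers = list(layer))
}

# Hypothetical stand-in for the build step that print() triggers
toy_render <- function(p) {
  layer <- p$layers[[1]]                 # same environment, not a copy
  layer$geom_params$flipped_aes <- FALSE # modifies the shared environment
  invisible(p)
}

expected  <- make_toy_plot()  # created by the test, never printed
submitted <- make_toy_plot()  # printed by the user
toy_render(submitted)

length(expected$layers[[1]]$geom_params)   # 2
length(submitted$layers[[1]]$geom_params)  # 3
```

If this is indeed the cause, one workaround worth trying (untested here) is to call ggplot2::ggplot_build() on both the expected plot and e$val before comparing them, so that both carry the same build-time state.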
Is it possible to lock the global environment and still allow .Random.seed to be set or removed? The default behavior of lockEnvironment() is too aggressive for my use case.
lockEnvironment(globalenv())
rnorm(10)
#> Error in rnorm(10) : cannot add bindings to a locked environment
rm(.Random.seed)
#> Error in rm(.Random.seed) :
#> cannot remove bindings from a locked environment
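A partial workaround follows from how lockEnvironment() works with its default `bindings = FALSE`: the locked environment refuses new or removed bindings, but existing bindings can still change. So if .Random.seed already exists when the environment is locked (e.g. after calling set.seed()), RNG calls should be able to update it, while removing it will still fail. The sketch demonstrates the rule on a throwaway environment so that running it does not lock your session.

```r
# Throwaway environment instead of globalenv(), so the session stays unlocked
env <- new.env()
assign(".Random.seed", 1L, envir = env)  # create the binding *before* locking
lockEnvironment(env)                     # bindings = FALSE (the default)

# Updating an existing binding is still allowed
assign(".Random.seed", 2L, envir = env)

# Adding or removing bindings is refused
add_ok <- tryCatch({ assign("x", 1, envir = env); TRUE },
                   error = function(e) FALSE)
rm_ok  <- tryCatch({ rm(".Random.seed", envir = env); TRUE },
                   error = function(e) FALSE)

c(add_ok = add_ok, rm_ok = rm_ok)  # both FALSE
```

Applied to the global environment, that would mean calling set.seed() before lockEnvironment(globalenv()). It does not cover the second half of the question: rm(.Random.seed) remains impossible.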
Background
drake version 7.0.0 will have a new safeguard to protect reproducibility.
plan <- drake_plan(
x = {
data(mtcars)
mtcars$mpg
},
y = mean(x)
)
plan
#> # A tibble: 2 x 2
#> target command
#> <chr> <expr>
#> 1 x { data(mtcars) mtcars$mpg }
#> 2 y mean(x)
make(plan)
#> target x
#> fail x
#> Error: Target `x` failed. Call `diagnose(x)` for details. Error message:
#> cannot add bindings to a locked environment.
#> One of your targets tried to modify your environment,
#> which could invalidate other targets
#> and undermine reproducibility (example:
#> https://github.com/ropensci/drake/issues/664#issuecomment-453163562).
#> Beware <<-, ->>, attach(), data(), and side effects in general.
#> Use make(lock_envir = FALSE) to avoid this error (not recommended).
The error comes from the call to data(mtcars), which creates mtcars in the environment. The very act of building x would change x's dependencies, so without guardrails the workflow invalidates itself: a second make() rebuilds x.
make(plan, lock_envir = FALSE)
#> target x
#> target y
make(plan, lock_envir = FALSE)
#> target x
But with guardrails, we run into edge cases like https://github.com/ropensci/drake/issues/749 and https://github.com/ropensci/drake/issues/675#issuecomment-458222414.
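The data(mtcars) failure can be reproduced without drake by locking a throwaway environment and loading a dataset into it. As an assumption worth testing in the plan, reading the dataset as `datasets::mtcars` (e.g. `x = datasets::mtcars$mpg`) does not create a binding and so should not trip the lock.

```r
# Throwaway environment standing in for the environment drake locks
locked <- new.env()
lockEnvironment(locked)

# data() tries to create a `mtcars` binding in `envir`, which is refused
via_data <- tryCatch({
  data("mtcars", package = "datasets", envir = locked)
  "loaded"
}, error = function(e) conditionMessage(e))

# Reading through the datasets namespace creates no binding at all
mpg <- datasets::mtcars$mpg

via_data     # an error message rather than "loaded"
length(mpg)  # 32
```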