What determines the 'flipped_aes' occurence in ggplot2 objects? - r

I am currently writing swirl lessons where im trying to test if a ggplot2 object created by the user is somewhat equal (all.equal()) to an object i create in a custom AnswerTest. however the plot object which i receive from swirl api by accessing e$val often inherits an flipped_aes = FALSE attribute which i cannot create in my own plots and hence all.equal(e$val, someplot) fails allthough they look equal.
I would really appreciate some ideas how to work around it or control its occurence!
if it occurs all.equal() fails with the following message:
"Component “layers”: Component 1: Component 4: Length mismatch: comparison on first 2 components"
my current test looks like this:
calculates_same_graph <- function(expression){ #If ggplot expression must be written in curly brackets in Yaml file
e <- get("e", parent.frame())
eSnap <- cleanEnv(e$snapshot)
val <- expression
passed <- isTRUE(all.equal(val[-8], e$val[-8]))
assign("e", e$val, envir = globalenv()) #only for diagnostics, changes
#when new answer is put in
return(passed)
}

Ok, I agree that this is a bit weird, but I found out that the flipped_aes parameter only comes into existance after printing a plot. The weird bit is that is appears to be an object-modifying side-effect of printing the plot. This only makes sense if the paramter is being cached somehow.
Suppose we have two plots that have opposite aesthetic flipping:
library(ggplot2)
# Should have flipped_aes = FALSE
plot1 <- ggplot(iris, aes(Species, Sepal.Width)) +
geom_col()
# Should have flipped_aes = TRUE
plot2 <- ggplot(iris, aes(Sepal.Width, Species)) +
geom_col()
We can see that these unprinted objects do not have flipped.aes in their geom parameters.
# Before printing plot
plot1$layers[[1]]$geom_params
#> $width
#> NULL
#>
#> $na.rm
#> [1] FALSE
plot2$layers[[1]]$geom_params
#> $width
#> NULL
#>
#> $na.rm
#> [1] FALSE
Now we can print these plots to a temporary file. Just printing it in the console should work too, I just can't replicate that in a reprex.
# Printing as tempfile
tmp <- tempfile(fileext = ".png")
png(tmp)
plot1
plot2
dev.off()
#> png
#> 2
unlink(tmp)
Now after we've printed the plot, suddenly the plot objects have the flipped_aes parameter.
# After printing plot
plot1$layers[[1]]$geom_params
#> $width
#> NULL
#>
#> $na.rm
#> [1] FALSE
#>
#> $flipped_aes
#> [1] FALSE
plot2$layers[[1]]$geom_params
#> $width
#> NULL
#>
#> $na.rm
#> [1] FALSE
#>
#> $flipped_aes
#> [1] TRUE
Created on 2021-03-10 by the reprex package (v1.0.0)
I don't know what the best way is to deal with this weirdness in your swirl test, but it appears that the printing of the plot influences that parameter.

Related

Error despite purrr's 'otherwise' - Why is purrr/possibly's 'otherwise' not triggered?

I am scraping content from websites. For this I iterate over links. If an error occurs, purrr's possibly adverb should keep the process going, and place a "missing" (or "NA_character") as a result.
The code below works as intended when the site linked to is not existing, i.e. the output is "missing";
However, if the site linked to exists, but the element which I am trying to extract from the site does not exist, the function throws an error despite having defined a value for 'otherwise'.
To me this is surprising, since the documentation states that
' possibly : wrapped function uses a default value ( otherwise ) whenever an error occurs.'
Any idea why this is happening? I understand that i could modify the function accordingly (e.g. check for the length of the returned object). But I do not understand why the 'otherwise' value was not used.
library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.0.4
#> Warning: package 'tidyr' was built under R version 4.0.4
#> Warning: package 'dplyr' was built under R version 4.0.4
library(rvest)
#> Warning: package 'rvest' was built under R version 4.0.4
#>
#> Attaching package: 'rvest'
#> The following object is masked from 'package:readr':
#>
#> guess_encoding
# possibly with wrong links when scraping site ----------------------------
#see https://github.com/tidyverse/purrr/issues/409
sample_data <- tibble::tibble(
link = c(
#link ok, selected item exists
"https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll",
#link not ok
"https://www.wrong-url.foobar",
#link ok, selected item does not exist on site
"https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
)
)
fn_get_link_to_records <- function(link_to_overview_sessions) {
print(link_to_overview_sessions)
link_to_overview_sessions %>%
rvest::read_html() %>%
rvest::html_elements("a") %>%
rvest::html_attr("href") %>%
enframe(name = NULL,
value = "link_to_text") %>%
filter(str_detect(link_to_text, regex("\\/NRSITZ_\\d+\\/fnameorig_\\d+\\.html$"))) %>%
mutate(link_to_text=glue::glue("https://www.parlament.gv.at/{link_to_text}")) %>%
pull()
}
sample_data %>%
mutate(link_to_text=map_chr(link,
possibly(fn_get_link_to_records,
otherwise=NA_character_)))
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll"
#> [1] "https://www.wrong-url.foobar"
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
#> Error: Problem with `mutate()` input `link_to_text`.
#> x Result 3 must be a single string, not a vector of class `glue/character` and of length 0
#> i Input `link_to_text` is `map_chr(link, possibly(fn_get_link_to_records, otherwise = NA_character_))`.
sample_data %>%
mutate(link_to_text=map_chr(link,
possibly(fn_get_link_to_records,
otherwise="missing")))
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll"
#> [1] "https://www.wrong-url.foobar"
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
#> Error: Problem with `mutate()` input `link_to_text`.
#> x Result 3 must be a single string, not a vector of class `glue/character` and of length 0
#> i Input `link_to_text` is `map_chr(link, possibly(fn_get_link_to_records, otherwise = "missing"))`.
Created on 2021-03-28 by the reprex package (v1.0.0)
UPDATE: I added the output below to make the unexpected result (last chunk) clearer.
sample_data[1:2,] %>%
mutate(link_to_text=map_chr(link,
possibly(fn_get_link_to_records,
otherwise="missing")))
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00068/index.shtml#tab-Sten.Protokoll"
#> [1] "https://www.wrong-url.foobar"
#> # A tibble: 2 x 2
#> link link_to_text
#> <chr> <chr>
#> 1 https://www.parlament.gv.at/PAKT/VHG~ https://www.parlament.gv.at//PAKT/VHG/X~
#> 2 https://www.wrong-url.foobar missing
sample_data[3, ] %>%
mutate(link_to_text=map_chr(link,
possibly(fn_get_link_to_records,
otherwise="missing")))
#> [1] "https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
#> Error: Problem with `mutate()` input `link_to_text`.
#> x Result 1 must be a single string, not a vector of class `glue/character` and of length 0
#> i Input `link_to_text` is `map_chr(link, possibly(fn_get_link_to_records, otherwise = "missing"))`.
Created on 2021-03-29 by the reprex package (v1.0.0)
The error is coming from map_chr but you have possibly wrapped around fn_get_link_to_records function. If you run fn_get_link_to_records(sample_data$link[3]) you'll see the URL get's printed and nothing is returned and no error is generated. However, map_chr cannot change this empty output to character value hence you get the error. Instead of map_chr if you use map you'll see it works.
sample_data[3,] %>%
mutate(link_to_text= map(link, fn_get_link_to_records))
#[1] #"https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Protokoll"
# A tibble: 1 x 2
# link link_to_text
# <chr> <list>
#1 https://www.parlament.gv.at/PAKT/VHG/XXVII/NRSITZ/NRSITZ_00094/index.shtml#tab-Sten.Pro… <glue [0]>
but link_to_text is empty. The solution as you already know is check length of output value and return NA or generate an error inside fn_get_link_to_records functions for such cases which will be handled using possibly.

How to find heavy objects that are not stored in .GlobalEnv?

I am trying to find which objects are taking a lot of memory in my R session, but the problem is that the object might have been invisibly created with an unknown name in an unknown environment.
If the object is stored in .GlobalEnv or a known environment, I can easily use a strategy like ls(enviro)+get()+object.size() (see lsos on this post for example) to list all objects and their size, allowing me to identify the heavy objects.
However, the object in question might not be stored in .GlobalEnv, but might be in some obscure environment implicitly created by an external package. How can in that case identify which object is using a lot of RAM?
The best case study is ggplot2 creating .last_plot in a dedicated environment. Looking under the hood one can find that it is stored in environment(ggplot2:::.store$get), so one can find it and eventually remove it. But if I didn't know that location or name a priori, would there be a way to find that there is a heavy object called .last_plot somewhere in memory?
pryr::mem_used()
#> 34.7 MB
## example: implicit creation of heavy and hidden object by ggplot
path <- tempfile()
if(!file.exists(path)){
saveRDS(as.data.frame(matrix(rep(1,1e07), ncol=5)), path)
}
pryr::mem_used()
#> 34.9 MB
p1 <- ggplot2::ggplot(readr::read_rds(path), ggplot2::aes(V1))
rm(p1)
pryr::mem_used()
#> 127 MB
## Hidden object is not in .GlobalEnv
ls(.GlobalEnv, all.names = TRUE)
#> [1] "path"
## Here I know where to find it: environment(ggplot2:::.store$get)
ls(all.names = TRUE, envir = environment(ggplot2:::.store$get))
#> [1] ".last_plot"
pryr::object_size(get(".last_plot", environment(ggplot2:::.store$get))$data)
#> 80 MB
## But how could I have found this otherwise?
Created on 2020-11-03 by the reprex package (v0.3.0)
I don't think there's any existing way to do this. If you combine #AllanCameron's answer with my comment, where you'd also run ls(y) for y environments calculated as
ns <- loadedNamespaces()
for (x in ns) {
y <- loadNamespace(x)
# look at the size of everything in y
}
you still won't find all the environments. I think you could do it if you also examined every object that might contain a reference to an environment (e.g. every function, formula, list, and various exotic objects) but it would be tricky not to miss something or count things more than once.
Edited to add: Actually, pryr::object_size is pretty smart at reporting on the environments attached to objects, so we'd get close by searching namespaces. For example, to find the top 20 objects:
pryr::mem_used()
#> Registered S3 method overwritten by 'pryr':
#> method from
#> print.bytes Rcpp
#> 35 MB
path <- tempfile()
if(!file.exists(path)){
saveRDS(as.data.frame(matrix(rep(1,1e07), ncol=5)), path)
}
pryr::mem_used()
#> 35.2 MB
p1 <- ggplot2::ggplot(readr::read_rds(path), ggplot2::aes(V1))
rm(p1)
pryr::mem_used()
#> 127 MB
envs <- c(globalenv = globalenv(),
sapply(loadedNamespaces(), function(ns) loadNamespace(ns)))
sizes <- lapply(envs, function(e) {
objs <- ls(e, all = TRUE)
sapply(objs, function(obj) pryr::object_size(get(obj, envir = e)))
})
head(sort(unlist(sizes), decreasing = TRUE), 20)
#> base..__S3MethodsTable__. utils..__S3MethodsTable__.
#> 96216872 83443704
#> grid..__S3MethodsTable__. ggplot2..__S3MethodsTable__.
#> 80945520 80636768
#> ggplot2..store methods..classTable
#> 80418936 10101152
#> graphics..__S3MethodsTable__. tools..check_packages
#> 9325608 5185880
#> compiler.inlineHandlers methods..genericTable
#> 3444600 2808440
#> Rcpp..__T__show:methods colorspace..__T__show:methods
#> 2474672 2447880
#> Rcpp..RcppClass Rcpp..__C__C++OverloadedMethods
#> 2127584 1990504
#> Rcpp..__C__RcppClass Rcpp..__C__C++Field
#> 1982576 1980176
#> Rcpp..__C__C++Constructor Rcpp..__T__$:base
#> 1979992 1939616
#> tools..install_packages Rcpp..__C__Module
#> 1904032 1899872
Created on 2020-11-03 by the reprex package (v0.3.0)
I don't know why those methods tables come out so large (I suspect it's because ggplot2 adds methods to those tables, so its environment gets captured); but somehow they are finding your object, because they aren't so big if I don't create it.
A hint about the issue is in the 5th object, listed as ggplot2..store (i.e. the object named .store in the ggplot2 namespace). Doesn't tell you to look in the environments of the functions in .store, but at least it gets you started.
Second edit:
Here are some tweaks to make the output a bit more readable.
# Unlist first, so we can clean up the names
sizes <- unlist(sizes)
# Replace the first dot with :::
names(sizes) <- sub(".", ":::", names(sizes), fixed = TRUE)
# Remove internal R objects
keep <- !grepl(".__", names(sizes), fixed = TRUE)
sizes <- sizes[keep]
With these changes, the output from sort(sizes[keep], decreasing = TRUE) starts out as
ggplot2:::.store
80418936
base:::.userHooksEnv
47855920
base:::.Options
45016888
utils:::Rprof
44958416
If you do
unlist(lapply(search(), function(y) sapply(ls(y), function(x) object.size(get(x)))))
You will get a complete list of all the objects in all the environments on your search path, including their sizes. You can then sort these and find the offending objects.

Different print method dispatched when running in console vs R markdown cell?

When I run the following code in the console vs in the R markdown document, I see that the different methods are being dispatched. Is this a bug, or does the way that the code is run affect method dispatch?
library(sloop)
library(tidyverse)
df <- tibble(x=rnorm(5),
y=rnorm(5))
sloop::s3_dispatch(print(df))
Running the code in the R Markdown code cell (in RStudio)
Running the code in the console
RStudio is overriding print.tbl_df in the notebook. Code here.
In an R Notebook:
getAnywhere(print.tbl_df)
#> 2 differing objects matching ‘print.tbl_df’ were found
#> in the following places
#> registered S3 method for print
#> namespace:tibble
#> Use [] to view one of them
getAnywhere(print.tbl_df)[1]
#> function (x, ...)
#> {
#> o <- overrideMap(x, options)
#> if (!is.null(o)) {
#> overridePrint(o$x, o$options, o$className, o$nRow, o$nCol)
#> }
#> }
#> <bytecode: 0x7fb2bdf8fd48>
#> <environment: 0x7fb2bd9567e8>
#> attr(,".rs.S3Override")
#> [1] TRUE
In a normal R console:
getAnywhere(print.tbl_df)
#> A single object matching ‘print.tbl_df’ was found
#> It was found in the following places
#> namespace:tibble
#> with value
#>
#> function (x, ..., n = NULL, width = NULL, n_extra = NULL)
#> {
#> NextMethod()
#> }
#> <bytecode: 0x7fb2b77d7040>
#> <environment: namespace:tibble>

Lock environment but not .Random.seed

Is it possible to lock the global environment and still allow .Random.seed to be set or removed? The default behavior of lockEnvironment() is too aggressive
for my use case.
lockEnvironment(globalenv())
rnorm(10)
#> Error in rnorm(10) : cannot add bindings to a locked environment
rm(.Random.seed)
#> Error in rm(.Random.seed) :
#> cannot remove bindings from a locked environment
Background
drake version 7.0.0 will have a new safeguard to protect reproducibility.
plan <- drake_plan(
x = {
data(mtcars)
mtcars$mpg
},
y = mean(x)
)
plan
#> # A tibble: 2 x 2
#> target command
#> <chr> <expr>
#> 1 x { data(mtcars) mtcars$mpg }
#> 2 y mean(x)
make(plan)
#> target x
#> fail x
#> Error: Target `x` failed. Call `diagnose(x)` for details. Error message:
#> cannot add bindings to a locked environment.
#> One of your targets tried to modify your environment,
#> which could invalidate other targets
#> and undermine reproducibility (example:
#> https://github.com/ropensci/drake/issues/664#issuecomment-453163562).
#> Beware <<-, ->>, attach(), data(), and side effects in general.
#> Use make(lock_envir = FALSE) to avoid this error (not recommended).
The error comes from the call to data(mtcars). The very act of building x would have changed x's dependencies. Without guardrails, the workflow invalidates itself.
make(plan, lock_envir = FALSE)
#> target x
#> target y
make(plan, lock_envir = FALSE)
#> target x
But with guardrails, we run into edge cases like https://github.com/ropensci/drake/issues/749 and https://github.com/ropensci/drake/issues/675#issuecomment-458222414.

`testthat::expect_silent()` does not seem to notice ggplot2 errors

I'm having trouble understanding the following behaviour of the expect_silent() function from testthat.
expect_silent() is supposed to fail when the test code returns any output, for example an error or warning:
library(testthat)
test_that("expect_silent works as expected", {
expect_silent( {
stop()
} )
} )
#> Error: Test failed: 'expect_silent works as expected'
#> *
#> 1: expect_silent({
#> stop()
#> }) at <text>:5
#> 2: quasi_capture(enquo(object), evaluate_promise)
#> 3: capture(act$val <- eval_bare(get_expr(quo), get_env(quo)))
#> 4: withr::with_output_sink(temp, withCallingHandlers(withVisible(code), warning = handle_warning,
#> message = handle_message))
#> 5: force(code)
#> 6: withCallingHandlers(withVisible(code), warning = handle_warning, message = handle_message)
#> 7: withVisible(code)
#> 8: eval_bare(get_expr(quo), get_env(quo))
#> 9: stop() at <text>:6
(The above is the expected behaviour: expect_silent() detects the error produced by stop(), and the test fails.)
However, for some reason it doesn't seem to detect errors that occur in ggplot2 expressions. For example, the following ggplot2 code produces an error due to a misspelling:
library(ggplot2)
ggplot(diamonds, aes(x = carrot, y = price)) +
geom_point()
#> Error in FUN(X[[i]], ...): object 'carrot' not found
But expect_silent() doesn't seem to detect the error:
test_that("expect_silent fails when ggplot2 throws an error", {
expect_silent( {
ggplot(diamonds, aes(x = carrot, y = price)) +
geom_point()
} )
} )
(No output is produced.)
Am I misunderstanding the purpose of expect_silent()? This is causing me a real headache as I'm trying to use it to test a ggplot2 extension.
Try capturing the output from ggplot and then testing if it can be printed:
library(ggplot2)
library(testthat)
# First test should succeed (no output)
test_that("silent when ggplot2 succeeds", {
working.plot <- ggplot(diamonds, aes(x = carat, y = price)) + geom_point()
expect_silent(print(working.plot))
} )
# Second test should fail
test_that("fails when ggplot2 throws an error", {
broken.plot <- ggplot(diamonds, aes(x = carrot, y = price)) + geom_point()
expect_silent(print(broken.plot))
} )
The second test fails with copious output which I've curtailed below:
Error: Test failed: 'expect_silent fails when ggplot2 throws an error'
* object 'carrot' not found
Update - 15th Dec 2018
Regarding your comment about why print() is necessary:
The ggplot() function returns an object of class ggplot. The ggplot2 package overloads the print() function, so instead of printing the object to STDOUT in the R session terminal, it prints the chart. The interactive mode in the R session terminal assumes that most of the commands are run through the print() function.
The testthat tests are evaluated in their own environments. The testthat environments are non-interactive, so the running through the print() function assumption no longer holds. You can test this with the interactive() function that comes with base R. It should report TRUE in the R session terminal and FALSE within a test_that() call.

Resources