I have an R object lf which is an instance of the class tbl_lazy:
library(dbplyr)
lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = simulate_hana())
>class(lf)
[1] "tbl_HDB" "tbl_lazy" "tbl"
With the help of the sloop package, I can see that the method print.tbl_lazy is listed with visible = FALSE. This seems to be the reason why typing print.tbl_lazy at the console returns Error: object 'print.tbl_lazy' not found.
generic class visible source
<chr> <chr> <lgl> <chr>
11 print tbl_lazy FALSE registered S3method
When I debug print, I see the call to print.tbl_lazy and can then see the body of the method:
debugging in: function (x, ...)
UseMethod("print")(x)
debug: UseMethod("print")
Browse[2]> n
debugging in: print.tbl_lazy(x)
debug: {
show_query(x)
}
My question is: why are all the methods of the class tbl_lazy set to visible = FALSE, and what are the consequences of this? It seems to me that, whatever advantages this may have, it makes the code of the method more difficult to access, which in a language like R, used by so many non-technical users, seems to be a big disadvantage.
I wasn't able to find any documentation on this.
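For what it's worth, a method that is registered but not exported can still be retrieved from the S3 registry. Here is a minimal sketch that uses print.lm from stats as a stand-in (it is registered in the same way, so no database packages are needed). With dbplyr loaded, getS3method("print", "tbl_lazy") or dbplyr:::print.tbl_lazy should work analogously, though that is an assumption and is not run here.

```r
# print.lm is registered as an S3 method in stats but is not exported,
# so it cannot be found by name on the search path.
exported <- "print.lm" %in% getNamespaceExports("stats")
exported

# The method can still be fetched from the S3 registry ...
m <- getS3method("print", "lm")
is.function(m)

# ... or directly from the namespace with the ::: operator.
same <- identical(m, stats:::print.lm)
same
```

Either route gives the function object itself, so its body can be printed or passed to debug().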
I am trying to find which objects are taking a lot of memory in my R session, but the problem is that the object might have been invisibly created with an unknown name in an unknown environment.
If the object is stored in .GlobalEnv or a known environment, I can easily use a strategy like ls(enviro)+get()+object.size() (see lsos on this post for example) to list all objects and their size, allowing me to identify the heavy objects.
However, the object in question might not be stored in .GlobalEnv, but might be in some obscure environment implicitly created by an external package. How can I, in that case, identify which object is using a lot of RAM?
The best case study is ggplot2 creating .last_plot in a dedicated environment. Looking under the hood one can find that it is stored in environment(ggplot2:::.store$get), so one can find it and eventually remove it. But if I didn't know that location or name a priori, would there be a way to find that there is a heavy object called .last_plot somewhere in memory?
pryr::mem_used()
#> 34.7 MB
## example: implicit creation of heavy and hidden object by ggplot
path <- tempfile()
if(!file.exists(path)){
saveRDS(as.data.frame(matrix(rep(1,1e07), ncol=5)), path)
}
pryr::mem_used()
#> 34.9 MB
p1 <- ggplot2::ggplot(readr::read_rds(path), ggplot2::aes(V1))
rm(p1)
pryr::mem_used()
#> 127 MB
## Hidden object is not in .GlobalEnv
ls(.GlobalEnv, all.names = TRUE)
#> [1] "path"
## Here I know where to find it: environment(ggplot2:::.store$get)
ls(all.names = TRUE, envir = environment(ggplot2:::.store$get))
#> [1] ".last_plot"
pryr::object_size(get(".last_plot", environment(ggplot2:::.store$get))$data)
#> 80 MB
## But how could I have found this otherwise?
Created on 2020-11-03 by the reprex package (v0.3.0)
I don't think there's any existing way to do this. If you combine @AllanCameron's answer with my comment, where you'd also run ls(y) for y environments calculated as
ns <- loadedNamespaces()
for (x in ns) {
y <- loadNamespace(x)
# look at the size of everything in y
}
you still won't find all the environments. I think you could do it if you also examined every object that might contain a reference to an environment (e.g. every function, formula, list, and various exotic objects) but it would be tricky not to miss something or count things more than once.
Edited to add: Actually, pryr::object_size is pretty smart at reporting on the environments attached to objects, so we'd get close by searching namespaces. For example, to find the top 20 objects:
pryr::mem_used()
#> Registered S3 method overwritten by 'pryr':
#> method from
#> print.bytes Rcpp
#> 35 MB
path <- tempfile()
if(!file.exists(path)){
saveRDS(as.data.frame(matrix(rep(1,1e07), ncol=5)), path)
}
pryr::mem_used()
#> 35.2 MB
p1 <- ggplot2::ggplot(readr::read_rds(path), ggplot2::aes(V1))
rm(p1)
pryr::mem_used()
#> 127 MB
envs <- c(globalenv = globalenv(),
sapply(loadedNamespaces(), function(ns) loadNamespace(ns)))
sizes <- lapply(envs, function(e) {
objs <- ls(e, all = TRUE)
sapply(objs, function(obj) pryr::object_size(get(obj, envir = e)))
})
head(sort(unlist(sizes), decreasing = TRUE), 20)
#> base..__S3MethodsTable__. utils..__S3MethodsTable__.
#> 96216872 83443704
#> grid..__S3MethodsTable__. ggplot2..__S3MethodsTable__.
#> 80945520 80636768
#> ggplot2..store methods..classTable
#> 80418936 10101152
#> graphics..__S3MethodsTable__. tools..check_packages
#> 9325608 5185880
#> compiler.inlineHandlers methods..genericTable
#> 3444600 2808440
#> Rcpp..__T__show:methods colorspace..__T__show:methods
#> 2474672 2447880
#> Rcpp..RcppClass Rcpp..__C__C++OverloadedMethods
#> 2127584 1990504
#> Rcpp..__C__RcppClass Rcpp..__C__C++Field
#> 1982576 1980176
#> Rcpp..__C__C++Constructor Rcpp..__T__$:base
#> 1979992 1939616
#> tools..install_packages Rcpp..__C__Module
#> 1904032 1899872
Created on 2020-11-03 by the reprex package (v0.3.0)
I don't know why those methods tables come out so large (I suspect it's because ggplot2 adds methods to those tables, so its environment gets captured); but somehow they are finding your object, because they aren't so big if I don't create it.
A hint about the issue is in the 5th object, listed as ggplot2..store (i.e. the object named .store in the ggplot2 namespace). It doesn't tell you to look in the environments of the functions in .store, but at least it gets you started.
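Once a suspicious function has been found, its enclosing environment can be listed directly. The following is only a sketch: closure_env_sizes() and make_store() are hypothetical names, and make_store() is a toy stand-in for the way ggplot2:::.store keeps an object inside a closure, not ggplot2's actual code. It uses utils::object.size, which reports the size of each object on its own.

```r
# Hypothetical helper: sizes (in bytes) of the objects stored in the
# enclosing environment of a function.
closure_env_sizes <- function(fun) {
  e <- environment(fun)
  if (is.null(e)) return(numeric(0))  # primitives have no environment
  nms <- ls(e, all.names = TRUE)
  vapply(nms,
         function(n) as.numeric(utils::object.size(get(n, envir = e))),
         numeric(1))
}

# Toy stand-in: a closure that keeps a large object in a hidden variable.
make_store <- function() {
  .last <- NULL
  list(get = function() .last,
       set = function(value) .last <<- value)
}

store <- make_store()
store$set(numeric(1e6))          # about 8 MB of doubles
sizes_hidden <- closure_env_sizes(store$get)
sizes_hidden
```

Applied to the real case, closure_env_sizes(ggplot2:::.store$get) would be the analogous call, assuming .store is still implemented as a list of closures.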
Second edit:
Here are some tweaks to make the output a bit more readable.
# Unlist first, so we can clean up the names
sizes <- unlist(sizes)
# Replace the first dot with :::
names(sizes) <- sub(".", ":::", names(sizes), fixed = TRUE)
# Remove internal R objects
keep <- !grepl(".__", names(sizes), fixed = TRUE)
sizes <- sizes[keep]
With these changes, the output from sort(sizes, decreasing = TRUE) starts out as
ggplot2:::.store
80418936
base:::.userHooksEnv
47855920
base:::.Options
45016888
utils:::Rprof
44958416
If you do
unlist(lapply(search(), function(y) sapply(ls(y), function(x) object.size(get(x, pos = y)))))
you will get a list of all the objects in all the environments on your search path, including their sizes. You can then sort these and find the offending objects. Note that this only covers the search path, so objects hidden inside namespaces or closure environments will not show up.
Say I have the following tag:
library(htmltools)
t = div(name = 'oldname')
I can overwrite the 'name' attribute of this tag using t$attribs$name = 'newname', but I would prefer to use htmltools getters/setters. Does the package have a function that facilitates this?
Looking through the package manual, the only function that allows for the manipulation of tag attributes is tagAppendAttributes, which only appends the new attribute value to the original:
t = tagAppendAttributes(t, name = 'newname')
t
#<div name="oldname newname"></div>
Does the absence of a helper function that overwrites the value of an attribute mean that tag attributes are not meant to be overwritten?
You're probably overthinking this. Look at the code for tagAppendAttributes:
tagAppendAttributes
#> function (tag, ...)
#> {
#> tag$attribs <- c(tag$attribs, list(...))
#> tag
#> }
All it does is take whatever you pass and write directly to tag$attribs. If you unclass your object you'll see it's just a list really:
unclass(t)
#> $name
#> [1] "div"
#>
#> $attribs
#> $attribs$name
#> [1] "oldname"
#>
#>
#> $children
#> list()
I can see why writing directly to an object's data member rather than using a setter might not feel right if you come from an object-oriented programming background, but this is clearly a "public" data member in an informal S3 class. Setting it directly is no more likely to break it than any other implementation.
If you really want to I suppose you could define a setter:
tagSetAttributes <- function(tag, ...) {tag$attribs <- list(...); tag}
tagSetAttributes(t, name = "new name")
#> <div name="new name"></div>
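Note that the setter above replaces the whole attribute list. If you would rather overwrite only the attributes you name and keep the rest, something along these lines might do. This is a sketch, not part of htmltools: tagOverwriteAttributes is a hypothetical name, and the example runs on a plain list with the same fields as the unclass(t) output (the id attribute is made up for illustration), so it does not need the package.

```r
# Hypothetical helper (not part of htmltools): overwrite only the named
# attributes and keep all the others.
tagOverwriteAttributes <- function(tag, ...) {
  tag$attribs <- utils::modifyList(as.list(tag$attribs), list(...))
  tag
}

# Plain-list stand-in with the same fields as a tag object
t2 <- list(name = "div",
           attribs = list(name = "oldname", id = "main"),
           children = list())

t2 <- tagOverwriteAttributes(t2, name = "newname")
str(t2$attribs)
```

Since a tag is just a list underneath, the same function should work on real tags as well, but I have only reasoned about the list structure shown above.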
Is it possible to change default argument(s) of S3 Methods in R?
It's easy enough to change arguments using formals ...
# return default arguments of table
> args(table)
function (..., exclude = if (useNA == "no") c(NA, NaN), useNA = c("no",
"ifany", "always"), dnn = list.names(...), deparse.level = 1)
# Update an argument
> formals(table)$useNA <- "always"
# Check change
> args(table)
function (..., exclude = if (useNA == "no") c(NA, NaN), useNA = "always",
dnn = list.names(...), deparse.level = 1)
But not S3 methods ...
# View default argument of S3 method
> formals(utils:::str.default)$list.len
[1] 99
# Attempt to change
> formals(utils:::str.default)$list.len <- 99
Error in formals(utils:::str.default)$list.len <- 99 :
object 'utils' not found
At @nicola's generous prompting, here is an answer-version of the comments:
You can edit S3 methods and other non-exported functions using assignInNamespace(). This lets you replace a function in a given namespace with a new user-defined function (fixInNamespace() will open the target function in an editor to let you make a change).
# Take a look at what we are going to change
formals(utils:::str.default)$list.len
#> [1] 99
# extract the whole function from utils namespace
f_to_edit <- utils:::str.default
# make the necessary alterations
formals(f_to_edit)$list.len <- 900
# Now we substitute our new improved version of str.default inside
# the utils namespace
assignInNamespace("str.default", f_to_edit, ns = "utils")
# and check the result
formals(utils:::str.default)$list.len
#> [1] 900
If you restart your R session you'll recover the defaults (or you can put them back manually in the current session).
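If you would rather not modify the utils namespace at all, a thin wrapper with a different default is a side-effect-free alternative. A minimal sketch, where str_long is a hypothetical name and 900 is just the value used above:

```r
# Hypothetical wrapper: behaves like str(), but with a larger list.len default.
str_long <- function(object, ..., list.len = 900) {
  utils::str(object, ..., list.len = list.len)
}

x <- as.list(1:150)
out_default <- capture.output(str(x))       # truncated by the default list.len
out_long    <- capture.output(str_long(x))  # shows all 150 components
length(out_default) < length(out_long)
```

The trade-off is that only calls made through the wrapper see the new default, whereas assignInNamespace() changes the method for every caller in the session.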
I'm attempting to exclude some words when running hunspell_check on a text block in RStudio.
ignore_me <- c("Daniel")
hunspell_check(unlist(some_text), ignore = ignore_me, dict = dictionary("en_GB"))
However, whenever I run I get the following error:
Error in hunspell_check(unlist(some_text, dict = dictionary("en_GB"), :
unused argument (ignore = ignore_me))
I've had a look around SO and trawled the documentation but am struggling to figure out what's gone wrong.
It looks like you’ve missed a closing bracket after some_text, so it’s passing ignore as an argument to unlist() rather than hunspell_check().
UPDATE: Ok, I think you were looking at an old version of the documentation. At least that's what I did at first (https://www.rdocumentation.org/packages/hunspell/versions/1.1/topics/hunspell_check). In the current version, 2.9, ignore is no longer an argument for hunspell_check(). Instead, use add_words in the call to dictionary():
library(hunspell)
some_text <- list("hello", "there", "Daniell")
hunspell_check(unlist(some_text), dict = dictionary("en_GB"))
# [1] TRUE TRUE FALSE
ignore_me <- "Daniell"
hunspell_check(unlist(some_text), dict = dictionary("en_GB", add_words = ignore_me))
# [1] TRUE TRUE TRUE
I understand that .Fortran in the following code invokes a Fortran subroutine, but why is the subroutine name prefixed with C_ here? The few other examples of calling subroutines that I found online simply use "stl". Can someone please explain why it is C_stl instead of stl?
z <- .Fortran(C_stl, x, n,
as.integer(period),
as.integer(s.window),
as.integer(t.window),
as.integer(l.window),
s.degree, t.degree, l.degree,
nsjump = as.integer(s.jump),
ntjump = as.integer(t.jump),
nljump = as.integer(l.jump),
ni = as.integer(inner),
no = as.integer(outer),
weights = double(n),
seasonal = double(n),
trend = double(n),
double((n+2*period)*5))
C_stl is an object in the stats package containing auxiliary information about the Fortran subroutine. It's not exported, so to see it you'll have to type stats:::C_stl.
> stats:::C_stl
$name
[1] "stl"
$address
<pointer: 0x000000000f87b950>
attr(,"class")
[1] "RegisteredNativeSymbol"
$dll
DLL name: stats
Filename: E:/apps/R/R-3.1.1/library/stats/libs/x64/stats.dll
Dynamic lookup: FALSE
$numParameters
[1] 18
attr(,"class")
[1] "FortranRoutine" "NativeSymbolInfo"
After a lot of searching I believe I found the answer. Look in the NAMESPACE file in the directory <path to R sources>/src/library/stats.
You'll see that all C/Fortran routines are referred to with names prefixed with C_. This appears to be done by useDynLib.
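To illustrate the difference between the R object and the routine it points to, here is a small check. As far as I can tell the prefix comes from the .fixes argument of useDynLib() in that NAMESPACE file, but that is my reading and worth confirming against the file itself.

```r
# C_stl is an R object in the stats namespace; the Fortran routine it
# refers to is still called "stl".
sym <- stats:::C_stl
sym$name
class(sym)
```

So .Fortran(C_stl, ...) passes the pre-resolved symbol object, while .Fortran("stl", ...) would look the routine up by name at call time.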