I've been exploring engine.R to improve my understanding of how the sql engine for Knitr works. I noticed a call to setNames in the definition for interpolate_from_env that looks like it may be redundant (at least to my relatively inexperienced eyes!)
I am unsure if it is appropriate to reproduce the entirety of the function definition below, so I'm just including the lines in question:
args = if (length(names) > 0) setNames(
mget(names, envir = env), names)
setNames appears to be called on the result of mget, which already returns a named list of objects. Similarly, the call to identical below returns TRUE:
identical(
mget(names, envir = env),
setNames(mget(names, envir = env), names)
)
Have I overlooked something here or is the call to setNames in fact redundant?
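For anyone who wants to run the comparison, here is a minimal self-contained version of the check. The environment env and the two objects in it are made up for illustration; they are not taken from engine.R.

```r
# Hypothetical environment holding two objects to look up.
env <- new.env()
assign("x", 1, envir = env)
assign("y", "two", envir = env)
names <- c("x", "y")

# mget() already names its result after the requested objects.
plain <- mget(names, envir = env)
renamed <- setNames(mget(names, envir = env), names)

same <- identical(plain, renamed)
print(same)
```

This only covers a plain, unnamed character vector of object names; it does not show whether other inputs could behave differently.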
TIA!
I'm posting this in hopes someone could explain the behavior here. And perhaps this may save others some time in tracking down how to fix a similar error.
The answer is likely somewhere here in this vignette by Hadley Wickham and Lionel Henry. Yet it will take someone like me weeks of study to connect the dots.
I am running a number of queries against a remote database and then combining the results into a single data.table. I add the "part_" prefix to the name of each individual query result and use ls() and mget() with data.table's rbindlist() to combine them.
This works:
results_all <- rbindlist(mget(ls(pattern = "part_", )))
I learned that approach, probably from "list data.tables in memory and combine by row (rbind)", and it is certainly a helpful thing to know how to do.
For readability, I often prefer using the magrittr pipe (or chaining with data.table) and especially so with projects like this because I use dplyr to query the database. Yet this code results in an error:
results_all <- ls(pattern = "part_", ) %>%
mget() %>%
rbindlist()
The error reads Error: value for ‘part_a’ not found where part_a is the first object name in the character vector returned by ls().
Searching that error message, I came across the discussion in this data.table Github issue. Reading through that, I tried setting "inherits = TRUE" within mget() like so:
results_all <- ls(pattern = "part_", ) %>%
mget(inherits = TRUE) %>%
rbindlist()
And that works. So the error is happening when piping the result of ls() to mget(). And given that nesting ls() within mget() works, my guess is that it is something to do with the pipe and "the enclosing frames of the environment".
In writing this up, I came across Unexpected error message while joining data.table with rbindlist() using mget(). From the discussion there I found out that this also works.
results_all <- ls(pattern = "part_", ) %>%
mget(envir = .GlobalEnv) %>%
rbindlist()
Again, I am hoping someone can explain what is going on for folks looking to learn more about how environments work in R.
Edit: Adding reproducible example
Per the request for a reproducible example, running the code above using these three data.tables (data.frames or tibbles will behave the same) should do it.
part_a <- data.table(col1 = 1:10, col2 = sample(letters, 10))
part_b <- data.table(col1 = 11:20, col2 = sample(letters, 10))
part_c <- data.table(col1 = 21:30, col2 = sample(letters, 10))
The rhs argument to a pipe operator (in your example, the expression mget()) is never evaluated as a function call by the interpreter. The pipe operator is an infix function that performs non-standard evaluation of its second argument (rhs). The pipe function composes and performs a new function call using the RHS expression as a sort of "template".
The calling environment of this new function call is the function environment of %>%, not the calling environment of the lhs function or the global environment. .GlobalEnv and the calling environment of the lhs function happen to be the same environment in your example, and that environment is a parent to the function environment of %>%, which is why inherits = TRUE or setting the environment to .GlobalEnv works for you.
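The lookup behaviour described above can be reproduced without any pipe package. The sketch below is only an illustration under stated assumptions: run_inside is a hypothetical stand-in for the pipe (it simply calls a function from inside its own frame), here stands in for .GlobalEnv, and plain data.frames replace the data.tables.

```r
part_a <- data.frame(col1 = 1:3)
part_b <- data.frame(col1 = 4:6)
nms <- c("part_a", "part_b")
here <- environment()  # the environment this code runs in (.GlobalEnv at the console)

# Hypothetical stand-in for a pipe: it calls `f` from inside its own frame.
run_inside <- function(x, f, ...) f(x, ...)

# Called directly, mget() searches the environment it was called from.
direct <- mget(nms)

# Called from inside run_inside(), mget() searches that function's frame only.
default_try <- tryCatch(run_inside(nms, mget), error = function(e) e)

# inherits = TRUE lets the search continue into the enclosing environments.
with_inherits <- run_inside(nms, mget, inherits = TRUE)

# Naming the environment explicitly sidesteps the question altogether.
with_envir <- run_inside(nms, mget, envir = here)
```

Of the two fixes, passing envir explicitly is arguably the more robust choice, because it does not depend on what happens to sit between the calling frame and the objects.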
I am a beginner so I'd appreciate any thoughts, and I understand that this question might be too basic for some of you.
Also, this question is not about the difference between <- and =, but about the way they get evaluated when they are part of the function argument. I read this thread, Assignment operators in R: '=' and '<-' and several others, but I couldn't understand the difference.
Here's the first line of code:
My objective is to get rid of variables in the environment. From reading the above thread, I would believe that <- would exist in the user workspace, so there shouldn't be any issue with deleting all variables.
Here is my code and two questions:
Question 1
First off, this code doesn't work.
rm(ls()) #throws an error
I believe this happens because ls() returns a character vector, and rm() expects an object name. Am I correct? If so, I would appreciate if someone could guide me how to get object names from character array.
Question 2
I googled this topic and found that this code below deletes all variables.
rm(list = ls())
While this does help me, I am unsure why = is used instead of <-. If I run the following code, I get an error Error in rm(list <- ls()) : ... must contain names or character strings
rm(list <- ls())
Why is this? Can someone please guide me? I'd appreciate any help/guidance.
"I read this thread, Assignment operators in R: '=' and '<-' and several others, but I couldn't understand the difference."
No wonder, since the answers there are actually quite confusing, and some are outright wrong. Since that’s the case, let’s first establish the difference between them before diving into your actual question (which, it turns out, is mostly unrelated):
<- is an assignment operator
In R, <- is an operator that performs assignment from right to left, in the current scope. That’s it.
= is either an assignment operator or a distinct syntactic token
=, by contrast, has several meanings: its semantics change depending on the syntactic context it is used in:
If = is used inside a parameter list, immediately to the right of a parameter name, then its meaning is: “associate the value on the right with the parameter name on the left”.
Otherwise (i.e. in all other situations), = is also an operator, and by default has the same meaning as <-: i.e. it performs assignment in the current scope.
As a consequence of this, the operators <- and = can be used interchangeably¹. However, = has an additional syntactic role in the argument list of a function definition or a function call. In this context it’s not an operator and cannot be replaced by <-.
So the two statements in each of the following pairs are equivalent:
x <- 1
x = 1
x[5] <- 1
x[5] = 1
(x <- 1)
(x = 1)
f((x <- 5))
f((x = 5))
Note the extra parentheses in the last example: if we omitted these, then f(x = 5) would be interpreted as a parameter association rather than an assignment.
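To make the distinction concrete, here is a small sketch. The functions f and demo are hypothetical and exist only for this illustration; the checks run inside demo so that they do not depend on what is already in your workspace.

```r
f <- function(x) x * 2

demo <- function() {
  by_name <- f(x = 5)                              # parameter association only
  x_after_equals <- exists("x", inherits = FALSE)  # no local x has been created
  by_arrow <- f(x <- 6)                            # assigns x here, then passes its value
  x_after_arrow <- exists("x", inherits = FALSE)   # now a local x exists
  list(by_name = by_name, x_after_equals = x_after_equals,
       by_arrow = by_arrow, x_after_arrow = x_after_arrow, x = x)
}

res <- demo()
str(res)
```

Note that the assignment in f(x <- 6) only happens because f actually uses its argument; R evaluates arguments lazily.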
With that out of the way, let’s turn to your first question:
When calling rm(ls()), you are passing ls() to rm as the ... parameter. Ronak’s answer explains this in more detail.
Your second question should be answered by my explanation above: <- and = behave differently in this context because the syntactic usage dictates that rm(list = ls()) associates ls() with the named parameter list, whereas <- is (as always) an assignment operator. The whole expression list <- ls() is then once again passed as the ... parameter, and rm rejects it because it is neither a name nor a character string.
¹ Unless somebody changed their meaning: operators, like all other functions in R, can be overwritten with new definitions.
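The three calls can be compared side by side. This is a sketch that works on a scratch environment (the name e and its contents are made up) so that nothing in the real workspace is deleted.

```r
# Scratch environment so nothing in the real workspace is touched.
e <- new.env()
assign("a", 5, envir = e)
assign("b", 10, envir = e)

# Named argument: the character vector goes to rm()'s `list` parameter.
rm(list = ls(e), envir = e)
remaining <- ls(e)

# Without the name, the call ls(e) lands in `...`, which rm() rejects.
msg_call <- tryCatch(rm(ls(e), envir = e),
                     error = function(err) conditionMessage(err))

# With `<-`, the whole assignment expression lands in `...` and is rejected too.
msg_arrow <- tryCatch(rm(list <- ls(e), envir = e),
                      error = function(err) conditionMessage(err))

print(remaining)
print(msg_call)
print(msg_arrow)
```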
To expand on my comment slightly, consider this example:
> foo <- function(a,b) b+1
> foo(1,b <- 2) # Works
[1] 3
> ls()
[1] "b" "foo"
> foo(b <- 3) # Doesn't work
Error in foo(b <- 3) : argument "b" is missing, with no default
The ... argument has some special stuff going on that restricts things a little further in the OP's case, but this illustrates the issue with how R is parsing the function arguments.
Specifically, when R looks for named arguments, it looks for arg = val, with an equals sign. Otherwise, it matches the arguments positionally. So when you omit the first argument, a, and just do b <- 3, it thinks the expression b <- 3 is what you are passing for the argument a.
If you check ?rm
rm(..., list = character(), pos = -1, envir = as.environment(pos), inherits = FALSE)
where
... - the objects to be removed, as names (unquoted) or character strings (quoted).
and
list - a character vector naming objects to be removed.
So, if you do
a <- 5
and then
rm(a)
it will remove the a from the global environment.
Further, if there are multiple objects you want to remove,
a <- 5
b <- 10
rm(a, b)
This can also be written as
rm(... = a, b)
where we are specifying that the ... part in syntax takes the arguments a and b
Similarly, when we want to specify the list part of the syntax, it has to be given by
rm(list = ls())
Doing list <- ls() instead will store the names of all the variables returned by ls() in a variable named list:
list <- ls()
list
#[1] "a" "b" "list"
I hope this is helpful.
Normally I wonder where mysterious errors come from but now my question is where a mysterious lack of error comes from.
Let
numbers <- c(1, 2, 3)
frame <- as.data.frame(numbers)
If I type
subset(numbers, )
(so I want to take some subset but forget to specify the subset-argument of the subset function) then R reminds me (as it should):
Error in subset.default(numbers, ) :
argument "subset" is missing, with no default
However when I type
subset(frame,)
(so the same thing with a data.frame instead of a vector), it doesn't give an error but instead just returns the (full) dataframe.
What is going on here? Why don't I get my well deserved error message?
tl;dr: The subset function calls different functions (has different methods) depending on the type of object it is fed. In the example above, subset(numbers, ) uses subset.default while subset(frame, ) uses subset.data.frame.
R has a couple of object-oriented systems built-in. The simplest and most common is called S3. This OO programming style implements what Wickham calls a "generic-function OO." Under this style of OO, an object called a generic function looks at the class of an object and then applies the proper method to the object. If no direct method exists, then there is always a default method available.
To get a better idea of how S3 and the other OO systems work, you might check out the relevant portion of the Advanced R site. The procedure of finding the proper method for an object is referred to as method dispatch. You can read more about this in the help file ?UseMethod.
As noted in the Details section of ?subset, the subset function "is a generic function." This means that subset examines the class of the object in the first argument and then uses method dispatch to apply the appropriate method to the object.
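As a minimal sketch of method dispatch, the toy generic below (describe is a hypothetical name, not part of base R) has one method for data.frames and a default method for everything else.

```r
# Hypothetical generic, defined only for this illustration.
describe <- function(x, ...) UseMethod("describe")

# Method used when the first argument has class "data.frame".
describe.data.frame <- function(x, ...) paste("data.frame with", nrow(x), "rows")

# Fallback used when no class-specific method exists.
describe.default <- function(x, ...) paste("object of class", class(x)[1])

from_frame <- describe(data.frame(n = 1:3))  # dispatches to describe.data.frame
from_vector <- describe(c(1, 2, 3))          # falls back to describe.default
print(from_frame)
print(from_vector)
```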
The methods of a generic function are encoded as
< generic function name >.< class name >
and can be found using methods(<generic function name>). For subset, we get
methods(subset)
[1] subset.data.frame subset.default subset.matrix
see '?methods' for accessing help and source code
which indicates that if the object has the data.frame class, then subset calls the subset.data.frame method (function). It is defined as below:
subset.data.frame
function (x, subset, select, drop = FALSE, ...)
{
r <- if (missing(subset))
rep_len(TRUE, nrow(x))
else {
e <- substitute(subset)
r <- eval(e, x, parent.frame())
if (!is.logical(r))
stop("'subset' must be logical")
r & !is.na(r)
}
vars <- if (missing(select))
TRUE
else {
nl <- as.list(seq_along(x))
names(nl) <- names(x)
eval(substitute(select), nl, parent.frame())
}
x[r, vars, drop = drop]
}
Note that if the subset argument is missing, the first lines
r <- if (missing(subset))
rep_len(TRUE, nrow(x))
produce a vector of TRUEs with one element per row of the data.frame, and the last line
x[r, vars, drop = drop]
feeds this vector into the row argument which means that if you did not include a subset argument, then the subset function will return all of the rows of the data.frame.
As we can see from the output of the methods call, subset does not have methods for atomic vectors. This means, as your error message
Error in subset.default(numbers, )
indicates, that when you apply subset to a vector, R calls the subset.default method, which is defined as
subset.default
function (x, subset, ...)
{
if (!is.logical(subset))
stop("'subset' must be logical")
x[subset & !is.na(subset)]
}
The subset.default function has no missing(subset) check, so R raises the error as soon as is.logical(subset) tries to evaluate the missing argument.
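The contrast between the two methods can be reproduced directly, and the missing() guard can be imitated in a few lines. In this sketch keep_all_if_missing is a hypothetical helper written for illustration; it is not how base R handles vectors.

```r
numbers <- c(1, 2, 3)
frame <- as.data.frame(numbers)

# subset.default evaluates the missing argument and fails.
err <- tryCatch(subset(numbers, ), error = function(e) conditionMessage(e))

# subset.data.frame checks missing(subset) first and keeps every row.
full <- subset(frame, )

# Hypothetical vector version that guards with missing(), like the data.frame method.
keep_all_if_missing <- function(x, subset) {
  r <- if (missing(subset)) rep_len(TRUE, length(x)) else subset & !is.na(subset)
  x[r]
}

all_kept <- keep_all_if_missing(numbers)
some_kept <- keep_all_if_missing(numbers, numbers > 1)
print(err)
print(full)
```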
My question is whether this is possible. Given a list
lst <- list(a = 1, 2)
One can put with(lst, a) to return the first element. But can you return the second element using with without first naming it?
Doesn't work:
with(lst, [[2]])
with(lst, `2`)
with(lst, ..2)
I suspect that this is not possible because with(lst, ls(all.names = TRUE)) gives just "a". But does anyone know different?
I realise why with(lst, 2) could never work. And of course [[ is a function, so it is clear that my first attempt would confuse R. However, it would be feasible that with would give special names to unnamed arguments so that they were accessible without having to re-access the list separately. For example, the second element could be called ..2 in the environment set up by with. This is not the case, though.
In this example one would simply use lst[[2]]. But I am thinking in terms of a complex expression for a large multi-levelled list, for which some elements are named and others not. The code would be much more readable using a with statement to start at a convenient level of subsetting. But having some needed elements unnamed is a barrier for this.
No, this is not possible. You can't reference an unnamed object from within a list.
lst <- list(a = 1, 2)
This is what with does:
eval(substitute(a), lst, enclos = parent.frame())
#[1] 1
The only object accessible from within lst is a:
eval(substitute(ls()), lst, enclos = parent.frame())
#[1] "a"
I'd suggest naming all components of the list. (And to be honest, I don't really see a common use case.)
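If naming every component by hand is impractical, the blanks can be filled in programmatically first. This is only a sketch: the "el" prefix and the resulting name el2 are arbitrary choices made for this example.

```r
lst <- list(a = 1, 2)

# names() returns "" for unnamed elements, or NULL if nothing is named at all.
nms <- names(lst)
if (is.null(nms)) nms <- rep("", length(lst))

# Give each unnamed element a positional name such as "el2".
blank <- nms == ""
nms[blank] <- paste0("el", which(blank))
names(lst) <- nms

total <- with(lst, a + el2)
print(names(lst))
print(total)
```

For a multi-level list the same step would have to be applied at each level that with() is going to be used on.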
I'm creating an R package, and I have a function that creates an object whose name is constructed from the arguments passed.
I use the function assign() to do this as in the code below and it works fine.
df <- data.frame(A = 1:10, B = 2:11, C = 5:14)
ot_test <- function(df, min){
tmp <- colSums(df)
tmp2 <- df[, tmp >= min]
assign(paste0(deparse(substitute(df)), "_min_", min), tmp2, envir= .GlobalEnv)
}
ot_test(df,60)
ls()
[1] "df" "df_min_60" "ot_test"
But when I check the package with devtools::check, I get the following message:
Found the following assignments to the global environment:
File 'test/R/ottest.R':
assign(paste0(deparse(substitute(df)), "_min_", min), tmp2, envir = .GlobalEnv)
Is there a way to do the same without passing .GlobalEnv as an argument, or without using assign() at all?
It's just an ugly, bad thing to do in a functional programming environment.
What's wrong with:
df_min_60 = ot_test(df,60)
Your argument will be that your method saves a bit of typing, but it opens you up to all sorts of bugs and obscurities.
Suppose I want to call ot_test in a function, in a loop maybe. Now it has stomped on the df_min_60 in my global workspace, with no warning or obvious clue that it was going to. Gee, thanks for that. So what do I have to do?
ot_test(df, 60)
# now rename so I don't stomp on it
df_min_60.1 = df_min_60
results = domyloop(d1,d2,d3)
Which has meant more typing.
Now another idea. Suppose I want to call ot_test on a list of data frames and make a list of the results. Normally I'd do something like:
for(i in 1:10){res[[i]] = ot_test(data[[i]], 60)}
but with your code I can't. I have to do:
for(i in 1:10){d=data[[i]]; ot_test(d,60); res[[i]] = d_min_60}
which is WAY more typing.
Be thankful that devtools::check only gives a message and doesn't set your computer on fire for doing this. Seriously, don't create things in the global environment, return them as return values.
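A sketch of the return-value version follows. It keeps the original function name and column-sum filter; the drop = FALSE argument and the names given to the list elements are additions made for this example, not part of the original code.

```r
# Return the filtered data frame instead of assigning it globally.
ot_test <- function(df, min) {
  keep <- colSums(df) >= min
  df[, keep, drop = FALSE]  # drop = FALSE (an addition) keeps a data frame even for one column
}

df <- data.frame(A = 1:10, B = 2:11, C = 5:14)

# The caller decides what the result is called.
df_min_60 <- ot_test(df, 60)

# Several thresholds at once: collect the results in a named list.
mins <- c(60, 90)
results <- lapply(mins, function(m) ot_test(df, m))
names(results) <- paste0("df_min_", mins)

print(names(df_min_60))
print(names(results))
```

The constructed names still exist, but as names of list elements that the caller controls rather than as objects silently created in the global environment.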