How does R reference unassigned values?

I'm familiar with tracemem() showing the hex memory address of an assigned variable, e.g.
x <- 2
tracemem(x)
#> [1] "<0x876df68>"
but what does this involve (under the hood) when the value is literally just an unassigned value? e.g.
tracemem(4)
#> [1] "<0x9bd93b8>"
The same question applies to just evaluating an expression without assignment
4
#> [1] 4
It seems that if I evaluate this several times in the console, I get ever-increasing hex addresses
tracemem(4)
#> [1] "<0x8779968>"
tracemem(4)
#> [1] "<0x87799c8>"
tracemem(4)
#> [1] "<0x8779a28>"
but if I either explicitly loop this operation
for ( i in 1:3 ) { print(tracemem(4)) }
#> [1] "<0x28bda48>"
#> [1] "<0x28bda48>"
#> [1] "<0x28bda48>"
or with sapply via replicate
replicate(3, tracemem(4))
#> [1] "<0xba88208>" "<0xba88208>" "<0xba88208>"
I get repeats of the same address, even if I explicitly delay the printing between iterations
for ( i in 1:3 ) { print(tracemem(4)); Sys.sleep(1) }
#> [1] "<0xa3c4058>"
#> [1] "<0xa3c4058>"
#> [1] "<0xa3c4058>"
My best guess is that the call refers to a value already temporarily assigned in the parent.frame, given the eval.parent(substitute(...)) in replicate's source, but I don't know enough about the underlying .Primitive code of for to know whether it is doing the same there.
I have some confidence that R is creating temporary variables given that I can do
list(x = 1)
#> $x
#> [1] 1
so R must be processing the data even though it never assigns it to anything. I'm aware of the strict formality summarised by @hadleywickham's tweet, but I'm not sure how it works here. Is it just that the temporary name isn't preserved? Does the for loop always use that name/object? Does evaluating lots of code, regardless of whether or not it's assigned, still use up memory (up until gc() is called, whenever that is)?
tl;dr -- how does R "store" unassigned values for printing?

Ok, so I will do what I can here.
First off, tracemem is a primitive. This means it is not a closure like the vast majority of R-level functions you can call from R code. More specifically, it is a BUILTINSXP primitive:
> .Internal(inspect(tracemem))
@62f548 08 BUILTINSXP g0c0 [MARK,NAM(1)]
This means that when it is called, a closure is NOT applied (because it is a primitive) and its argument IS evaluated, because it is a BUILTINSXP (see this section of the R internals manual).
Closure application is when R objects passed as arguments in function calls are assigned to the appropriate variables within the call frame. This doesn't happen for tracemem. Instead, its argument is evaluated at the C level into a SEXP that is never bound to any symbol in any environment, and is passed directly to the C-level do_tracemem function (see this line within the C-level eval function).
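The contrast is easy to see from R code. A small sketch (the closure name f is made up for illustration): a closure gets a call frame with its argument bound to a name, while a primitive like tracemem never gets an R-level frame at all.

```r
# A closure creates a call frame in which its argument is bound
# to a name; a primitive does not.
f <- function(x) environment()  # a closure returning its own call frame
ls(f(4))                        # the constant 4 was bound to the name "x"
#> [1] "x"

is.primitive(tracemem)          # TRUE: no closure application happens
#> [1] TRUE
is.primitive(f)
#> [1] FALSE
```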
This means that when a numeric constant is passed to tracemem (a valid call, though one you would generally have no reason to make), you get the actual SEXP for the constant, not one representing an R-level variable with the value 4, passed down to do_tracemem.
As far as I can tell, within any evaluation frame (I may not be using this term precisely, but call frames and steps within a for loop qualify, as do individual top-level expressions), every evaluation of, e.g., 4L gets an entirely new SEXP (an INTSXP, specifically) with NAMED set immediately to 4. Between these frames, addresses appear to be shared, though I pretty strongly suspect that is an artifact of memory reuse rather than of actually shared SEXPs.
The output below appears to corroborate the memory-reuse theory but I don't have the cycles free to confirm it beyond that at the moment.
> for(i in 1:3) {print(tracemem(4L)); print(tracemem(4L))}
[1] "<0x1c3f3b8>"
[1] "<0x1c3f328>"
[1] "<0x1c3f3b8>"
[1] "<0x1c3f328>"
[1] "<0x1c3f3b8>"
[1] "<0x1c3f328>"
Hope that helps.

Related

Any logical test to distinguish between make-up of numerical objects

I was wondering if there is a way for R to detect the presence or absence of the sign * as used in the following objects.
In other words, can R understand that a has a * sign but b doesn't?
a = 3*4
b = 12
If you keep the expressions unevaluated, R can understand their internal complexity. Under normal circumstances, though, R evaluates expressions immediately, so there is no way to tell the difference between a <- 3*4 and b <- 12 once the assignments have been made. That means that the answer to your specific question is No.
Dealing with unevaluated expressions can get a bit complex, but quote() is one simple way to keep e.g. 3*4 from being evaluated:
> length(quote(3*4))
[1] 3
> length(quote(12))
[1] 1
If you're working inside a function, you can use substitute to retrieve the unevaluated form of the function arguments:
> f <- function(a) {
+ length(substitute(a))
+ }
> f(12)
[1] 1
> f(3*4)
[1] 3
In case you're pursuing this further, you should be aware that counting complexity might not be as easy as you think:
> f(sqrt(2*3+(7*19)^2))
[1] 2
What's going on is that R stores expressions as a tree; the top level here is made up of sqrt and <the rest of the expression>, which has length 2. If you want to measure complexity you'll need to do some kind of collapsing or counting down the branches of the tree ...
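As a sketch of such counting down the branches (the helper name count_leaves is made up for illustration), one can recurse over each call and count the symbols and constants at the leaves:

```r
# Count the leaves (symbols and constants) of an expression tree.
# Each call contributes its function name plus the leaves of its arguments.
count_leaves <- function(e) {
  if (is.call(e)) sum(vapply(as.list(e), count_leaves, integer(1)))
  else 1L
}
count_leaves(quote(3*4))                    # `*`, 3 and 4
#> [1] 3
count_leaves(quote(sqrt(2*3 + (7*19)^2)))   # note: `(` counts as a call too
#> [1] 11
```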
Furthermore, if you first assign a <- 3*4 and then call f(a), you get 1, not 3, because substitute() gives you back just the symbol a, which has length 1. The information about the difference between "12" and "3*4" is lost as soon as the expression is evaluated, which happens when the value is assigned to the symbol a. The bottom line is that you have to be very careful about controlling when expressions get evaluated, and it's not easy.
Hadley Wickham's chapter on expressions might be a good place to read more.

What's the real meaning about 'Everything that exists is an object' in R?

I saw:
“To understand computations in R, two slogans are helpful:
• Everything that exists is an object.
• Everything that happens is a function call."
— John Chambers
But I just found:
a <- 2
is.object(a)
# FALSE
Actually, if a variable is a pure base type, the result of is.object() on it will be FALSE. So it should not be an object.
So what's the real meaning about 'Everything that exists is an object' in R?
The function is.object only checks whether the object has a "class" attribute, so it does not capture the same meaning as the slogan.
For instance:
x <- 1
attributes(x) # it does not have a class attribute
NULL
is.object(x)
[1] FALSE
class(x) <- "my_class"
attributes(x) # now it has a class attribute
$class
[1] "my_class"
is.object(x)
[1] TRUE
Now, trying to answer your real question about the slogan, this is how I would put it: everything that exists in R is an object in the sense that it is a kind of data structure that can be manipulated. I think this is better understood with functions and expressions, which are not usually thought of as data.
Taking a quote from Chambers (2008):
The central computation in R is a function call, defined by the function object itself and the objects that are supplied as the arguments. In the functional programming model, the result is defined by another object, the value of the call. Hence the traditional motto of the S language: everything is an object—the arguments, the value, and in fact the function and the call itself: all of these are defined as objects. Think of objects as collections of data of all kinds. The data contained and the way the data is organized depend on the class from which the object was generated.
Take this expression for example: mean(rnorm(100), trim = 0.9). Until it is evaluated, it is an object much like any other, so you can change its elements just as you would with a list. For instance:
call <- substitute(mean(rnorm(100), trim = 0.9))
call[[2]] <- substitute(rt(100,2 ))
call
mean(rt(100, 2), trim = 0.9)
Or take a function, like rnorm:
rnorm
function (n, mean = 0, sd = 1)
.Call(C_rnorm, n, mean, sd)
<environment: namespace:stats>
You can change its default arguments just like a simple object, like a list, too:
formals(rnorm)[2] <- 100
rnorm
function (n, mean = 100, sd = 1)
.Call(C_rnorm, n, mean, sd)
<environment: namespace:stats>
Taking one more time from Chambers (2008):
The key concept is that expressions for evaluation are themselves objects; in the traditional motto of the S language, everything is an object. Evaluation consists of taking the object representing an expression and returning the object that is the value of that expression.
So going back to our call example, the call is an object which represents another object. When evaluated, it becomes that other object, which in this case is the numeric vector with one number: -0.008138572.
set.seed(1)
eval(call)
[1] -0.008138572
And that would take us to the second slogan, which you did not mention, but usually comes together with the first one: "Everything that happens is a function call".
Taking again from Chambers (2008), he actually qualifies this statement a little bit:
Nearly everything that happens in R results from a function call. Therefore, basic programming centers on creating and refining functions.
So what that means is that almost every transformation of data that happens in R is a function call. Even a simple thing, like a parenthesis, is a function in R.
So taking the parenthesis like an example, you can actually redefine it to do things like this:
`(` <- function(x) x + 1
(1)
[1] 2
Which is not a good idea but illustrates the point. So I guess this is how I would sum it up: Everything that exists in R is an object because they are data which can be manipulated. And (almost) everything that happens is a function call, which is an evaluation of this object which gives you another object.
I love that quote.
In another (as of now unpublished) write-up, the author continues with
R has a uniform internal structure for representing all objects. The evaluation process keys off that structure, in a simple form that is essentially composed of function calls, with objects as arguments and an object as the value. Understanding the central role of objects and functions in R makes use of the software more effective for any challenging application, even those where extending R is not the goal.
but then spends several hundred pages expanding on it. It will be a great read once finished.
Objects: For x to be an object means that it has a class, and thus class(x) returns a class for every object. Even functions have a class, as do environments and other objects one might not expect:
class(sin)
## [1] "function"
class(.GlobalEnv)
## [1] "environment"
I would not pay too much attention to is.object. is.object(x) has a slightly different meaning than what we are using here -- it returns TRUE if x has a class name internally stored along with its value. If the class is stored then class(x) returns the stored value and if not then class(x) will compute it from the type. From a conceptual perspective it matters not how the class is stored internally (stored or computed) -- what matters is that in both cases x is still an object and still has a class.
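The stored-versus-computed distinction can also be seen with oldClass(), which returns only the stored class attribute:

```r
# oldClass() reports only a stored "class" attribute;
# class() falls back to the implicit class computed from the type.
oldClass(1L)   # NULL: nothing is stored
#> NULL
class(1L)      # computed from the internal type
#> [1] "integer"
```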
Functions: That all computation occurs through functions refers to the fact that even things you might not expect to be functions actually are. For example, when we write:
{ 1; 2 }
## [1] 2
if (pi > 0) 2 else 3
## [1] 2
1+2
## [1] 3
we are actually making invocations of the {, if and + functions:
`{`(1, 2)
## [1] 2
`if`(pi > 0, 2, 3)
## [1] 2
`+`(1, 2)
## [1] 3
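Assignment and subsetting work the same way: `<-` and `[` are themselves functions and can be called in prefix form.

```r
# Even assignment and subsetting are function calls:
`<-`(y, 1:5)   # equivalent to y <- 1:5
`[`(y, 2)      # equivalent to y[2]
#> [1] 2
```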

expression vs call

What is the difference between an expression and a call?
For instance:
func <- expression(2*x*y + x^2)
funcDx <- D(func, 'x')
Then:
> class(func)
[1] "expression"
> class(funcDx)
[1] "call"
Calling eval with an envir list works on both of them, but I'm curious what the difference is between the two classes, and under what circumstances I should use expression or call.
You should use expression when you want its capacity to hold more than one expression or call. It really returns an "expression list". The usual situation for the casual user of R is in forming arguments to plotting functions, where the task is forming symbolic expressions for labels. R expression-lists are lists with potentially many items, while calls never are. It's interesting that @hadley's Advanced R Programming suggests "you'll never need to use [the expression function]": http://adv-r.had.co.nz/Expressions.html. Parenthetically, the bquote function is highly useful, but has the limitation that it does not act on more than one expression at a time. I recently hacked together a response to such a problem about parsing expressions and got the check, but I thought @mnel's answer was better: R selectively style plot axis labels
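The "expression list" point is easy to demonstrate: parse() can return several complete calls in one expression vector, something a single call can never hold.

```r
# An expression vector is a list of calls (and symbols, constants, ...).
exprs <- parse(text = "mean(1:10); sd(1:10)", keep.source = FALSE)
length(exprs)       # two top-level calls in one expression vector
#> [1] 2
class(exprs[[1]])
#> [1] "call"
```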
The strategy of passing an expression to the evaluator with eval(expr, envir = <a named environment or list>) is essentially another route to what a function call does. A big difference between expression and call (the functions) is that the latter expects a character object and will evaluate it by looking for a named function in the symbol table.
When you say that processing both with the eval "works", you are not saying it produces the same results, right? The D function (call) has additional arguments that get substituted and restrict and modify the result. On the other hand evaluation of the expression-object substitutes the values into the symbols.
There seem to be "levels of evaluation":
expression(mean(1:10))
# expression(mean(1:10))
call("mean" , (1:10))
# mean(1:10)
eval(expression(mean(1:10)))
# [1] 5.5
eval(call("mean" , (1:10)))
# [1] 5.5
One might have expected eval(expression(mean(1:10))) to go down just one level and return a call object, but it continues through the expression tree and evaluates the result. In order to get just the unevaluated function call to mean, I needed to insert a quote:
eval(expression(quote(mean(1:10))))
# mean(1:10)
From the documentation (?expression):
...an R expression vector is a list of calls, symbols etc, for example as returned by parse.
Notice:
R> class(func[[1]])
[1] "call"
When given an expression, D acts on the first call. If func were simply a call, D would work the same.
R> func2 <- substitute(2 * x * y + x^2)
R> class(func2)
[1] "call"
R> D(func2, 'x')
2 * y + 2 * x
Sometimes, for the sake of consistency, you might need to treat both as expressions; in this case as.expression comes in handy:
func <- expression(2*x*y + x^2)
funcDx <- as.expression(D(func, 'x'))
> class(func)
[1] "expression"
> class(funcDx)
[1] "expression"

What does the function ls() do for environment in R

I run the following code
sapply( 0:3, function(x){ls(envir = sys.frame(x))} )
And get the following result
[[1]]
[1] "mat" "mat_inverse"
[[2]]
[1] "FUN" "simplify" "USE.NAMES" "X"
[[3]]
[1] "FUN" "X"
[[4]]
[1] "x"
It seems like it lists all the objects in the current enclosing environment; I do have mat and mat_inverse as two variables. But I am not sure what it returns for [[2]], [[3]], [[4]]. Is there a way to debug this code to track what this code does? Especially the following part:
envir = sys.frame(x)
is very confusing to me.
sys.frame allows you to go back through the calling stack. sys.frame(0) is the beginning of the stack (your initial workspace, so to speak). sys.frame(1) is nested one level deep, sys.frame(2) is nested two levels deep etc.
This code is a good demonstration of what happens when you call sapply. It goes through four environments (numbered 0-3) and prints the objects in each. sapply is in fact a wrapper around lapply. What environments do you get when you actually call this code?
Environment 0 is the beginning, i.e., your entire workspace.
Environment 1 is sapply. Type sapply to see its code. You'll see that the function header has simplify, one of the variables you see in [[2]].
Environment 2 is lapply. Once again, type lapply to see its code; the function header contains FUN and X.
Environment 3 is the function you defined for sapply to run. It only has one variable, x.
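The same nesting can be reproduced with a minimal sketch (the names inner_fn and outer_fn are made up for illustration); sys.nframe() reports the depth of the current frame in the stack:

```r
# Each nested call pushes one more frame onto the calling stack.
inner_fn <- function() sys.nframe()  # depth of the current frame
outer_fn <- function() inner_fn()
inner_fn()   # called straight from the top level
#> [1] 1
outer_fn()   # one extra level of nesting
#> [1] 2
```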
As an experiment, run
sapply(0:3, function(x) { howdy = 5; ls(envir = sys.frame(x)) } )
The last line will change to [1] "howdy" "x", because you defined a new variable within that final environment (the function inside lapply inside sapply).

R object identity

Is there a way to test whether two objects are identical in the R language?
For clarity: I do not mean identical in the sense of the identical function,
which compares objects based on certain properties like numerical values or logical values etc.
I am really interested in object identity, which for example could be tested using the is operator in the Python language.
UPDATE: A more robust and faster implementation of address(x) (not using .Internal(inspect(x))) was added to data.table v1.8.9. From NEWS:
New function address() returns the address in RAM of its argument. Sometimes useful in determining whether a value has been copied or not by R, programatically.
There's probably a neater way but this seems to work.
address = function(x) substring(capture.output(.Internal(inspect(x)))[1],2,17)
x = 1
y = 1
z = x
identical(x,y)
# [1] TRUE
identical(x,z)
# [1] TRUE
address(x)==address(y)
# [1] FALSE
address(x)==address(z)
# [1] TRUE
You could modify it to work on 32-bit R by changing 17 to 9.
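A width-agnostic variant is possible by extracting the hex digits with a regex instead of fixed substring positions (a sketch assuming, as above, that the first line of inspect() output begins with the @-prefixed address):

```r
# Extract the address from the first line of .Internal(inspect(x)),
# which looks like "@55d4c1a2b3c8 14 REALSXP g0c1 ...".
address2 <- function(x) {
  out <- capture.output(.Internal(inspect(x)))[1]
  sub("^@([0-9a-fA-F]+).*$", "\\1", out)
}
x <- 1
z <- x   # plain assignment does not copy, so the SEXP is shared
address2(x) == address2(z)
#> [1] TRUE
```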
You can use the pryr package.
For example, return the memory location of the mtcars object:
pryr::address(mtcars)
Then, for variables a and b, you can check:
address(a) == address(b)
