I can't step into function calls or into the for loop in Rstudio.
for (i in seq_len(max(last))) {
r = normdata$.return[smpls[[i]]]
m = normmat(i)
for (j in which(i <= last)) {
x = lm.fit(cbind(intercept = 1, m[, avail[i, ] & include[last[j], ], drop = F]), r)
temp[[j]][[i]] = if (!calc.tstat)
cbind(Estimate = coef(x))
else {
terms = terms(as.formula(paste(fieldmap["return"], paste(names(coef(x)), collapse
=" + "), sep = " ~ ")))
coef(summary.lm(structure(modifyList(x, list(terms = terms)), class = "lm")))[,
1:2, drop = F]
}
}
}
f = function(l, i, flip) t(sapply(l[!sapply(l, is.null)],
function(x) setNames(x[, i][fields], fields) * (if (flip) sign else 1)))
When I put my cursur on the first for loop line and hit run, it ignores my breakpoints inside both for loops and exits to the f= function line. On the Console, I can see:
Browse[9]> max(last) [1] 131
Browse[9]> j
[1] 1
Browse[9]> i
[1] 1
Browse[9]> temp[[j]][[i]]
NULL
Browse[9]> coef(x)
intercept BY_FY0 CY_FY0 CY_NTM DY_NTM EBITDA_EV_FY0 EBITDA_EV_NTM EY_FY0 EY_NTM FUND_PB
-7.236841e-10 1.121348e-01 -6.650536e-02 1.198634e-02 -5.849855e-02 8.291955e-02 -3.586112e-02 -6.218132e-02 1.936980e-01 -1.064521e-01
IRR SAL_YIELD_NTM
-3.522072e-02 6.294885e-02
Browse[9]> calc.tstat
[1] FALSE
I'm expecting temp[[1]][[1]] is set to coef(x), as calc.tstat is FALSE. But it remains NULL.
Could someone shed me some light on how to wake up my Rstudio please? Or I need a wakeup call?
Without access to your normdata object we can't investigate your exact problem. Two general suggestions:
Before running your code, run this at the console:
compiler::enableJIT(0)
This occasionally makes RStudio's IL-to-source matching more accurate.
Instead of setting rstudio breakpoints, add them in code: add browser() calls where you wish to break.
Related
I'm having trouble understanding how/why parentheses work where they otherwise should not work®.
f = function(...) substitute(...()); f(a, b)
[[1]]
a
[[2]]
b
# but, substitute returns ..1
f2 = function(...) substitute(...); f2(a, b)
a
Normally an error is thrown, could not find function "..." or '...' used in an incorrect context, for example when calling (\(...) ...())(5).
What I've tried
I have looked at the source code of substitute to find out why this doesn't happen here. R Internals 1.1.1 and 1.5.2 says ... is of SEXPTYPE DOTSXP, a pairlist of promises. These promises are what is extracted by substitute.
# \-substitute #R
# \-do_substitute #C
# \-substituteList #C recursive
# \-substitute #C
Going line-by-line, I am stuck at substituteList, in which h is the current element of ... being processed. This happens recursively at line 2832 if (TYPEOF(h) == DOTSXP) h = substituteList(h, R_NilValue);. I haven't found exception handling of a ...() case in the source code, so I suspect something before this has happened.
In ?substitute we find substitute works on a purely lexical basis. Does it mean ...() is a parser trick?
parse(text = "(\\(...) substitute(...()))(a, b)") |> getParseData() |> subset(text == "...", select = c(7, 9))
#> token text
#> 4 SYMBOL_FORMALS ...
#> 10 SYMBOL_FUNCTION_CALL ...
The second ellipsis is recognized during lexical analysis as the name of a function call. It doesn't have its own token like |> does. The output is a pairlist ( typeof(f(a, b)) ), which in this case is the same as a regular list (?). I guess it is not a parser trick. But whatever it is, it has been around for a while!
Question:
How does ...() work?
Note: When referring to documentation and source code, I provide links to an unofficial GitHub mirror of R's official Subversion repository. The links are bound to commit 97b6424 in the GitHub repo, which maps to revision 81461 in the Subversion repo (the latest at the time of this edit).
substitute is a "special" whose arguments are not evaluated (doc).
typeof(substitute)
[1] "special"
That means that the return value of substitute may not agree with parser logic, depending on how the unevaluated arguments are processed internally.
In general, substitute receives the call ...(<exprs>) as a LANGSXP of the form (pseudocode) pairlist(R_DotsSymbol, <exprs>) (doc). The context of the substitute call determines how the SYMSXP R_DotsSymbol is processed. Specifically, if substitute was called inside of a function with ... as a formal argument and rho as its execution environment, then the result of
findVarInFrame3(rho, R_DotsSymbol, TRUE)
in the body of C utility substituteList (source) is either a DOTSXP or R_MissingArg—the latter if and only if f was called without arguments (doc). In other contexts, the result is R_UnboundValue or (exceptionally) some other SEXP—the latter if and only if a value is bound to the name ... in rho. Each of these cases is handled specially by substituteList.
The multiplicity in the processing of R_DotsSymbol is the reason why these R statements give different results:
f0 <- function() substitute(...(n = 1)); f0()
## ...(n = 1)
f1 <- function(...) substitute(...(n = 1)); f1()
## $n
## [1] 1
g0 <- function() {... <- quote(x); substitute(...(n = 1))}; g0()
## Error in g0() : '...' used in an incorrect context
g1 <- function(...) {... <- quote(x); substitute(...(n = 1))}; g1()
## Error in g1() : '...' used in an incorrect context
h0 <- function() {... <- NULL; substitute(...(n = 1))}; h0()
## $n
## [1] 1
h1 <- function(...) {... <- NULL; substitute(...(n = 1))}; h1()
## $n
## [1] 1
Given how ...(n = 1) is parsed, you might have expected f1 to return call("...", n = 1), both g0 and g1 to return call("x", n = 1), and both h0 and h1 to throw an error, but that is not the case for the above, mostly undocumented reasons.
Internals
When called inside of the R function f,
f <- function(...) substitute(...(<exprs>))
substitute evaluates a call to the C utility do_substitute—you can learn this by looking here—in which argList gets a LISTSXP of the form pairlist(x, R_MissingArg), where x is a LANGSXP of the form pairlist(R_DotsSymbol, <exprs>) (source).
If you follow the body of do_substitute, then you will find that the value of t passed to substituteList from do_substitute is a LISTSXP of the form pairlist(copy_of_x) (source).
It follows that the while loop inside of the substituteList call (source) has exactly one iteration and that the statement CAR(el) == R_DotsSymbol in the body of the loop (source) is false in that iteration.
In the false branch of the conditional (source), h gets the value
pairlist(substituteList(copy_of_x, env)). The loop exits and substituteList returns h to do_substitute, which in turn returns CAR(h) to R (source 1, 2, 3).
Hence the return value of substitute is substituteList(copy_of_x, env), and it remains to deduce the identity of this SEXP. Inside of this call to substituteList, the while loop has 1+m iterations, where m is the number of <exprs>. In the first iteration, the statement CAR(el) == R_DotsSymbol in the body of the loop is true.
In the true branch of the conditional (source), h is either a DOTSXP or R_MissingArg, because f has ... as a formal argument (doc). Continuing, you will find that substituteList returns:
R_NilValue if h was R_MissingArg in the first while iteration and m = 0,
or, otherwise,
a LISTSXP listing the expressions in h (if h was a DOTSXP in the first while iteration) followed by <exprs> (if m > 1), all unevaluated and without substitutions, because the execution environment of f is empty at the time of the substitute call.
Indeed:
f <- function(...) substitute(...())
is.null(f())
## [1] TRUE
f <- function(...) substitute(...(n = 1))
identical(f(a = sin(x), b = zzz), pairlist(a = quote(sin(x)), b = quote(zzz), n = 1))
## [1] TRUE
Misc
FWIW, it helped me to recompile R after adding some print statements to coerce.c. For example, I added the following before UNPROTECT(3); in the body of do_substitute (source):
Rprintf("CAR(t) == R_DotsSymbol? %d\n",
CAR(t) == R_DotsSymbol);
if (TYPEOF(CAR(t)) == LISTSXP || TYPEOF(CAR(t)) == LANGSXP) {
Rprintf("TYPEOF(CAR(t)) = %s, length(CAR(t)) = %d\n",
type2char(TYPEOF(CAR(t))), length(CAR(t)));
Rprintf("CAR(CAR(t)) = R_DotsSymbol? %d\n",
CAR(CAR(t)) == R_DotsSymbol);
Rprintf("TYPEOF(CDR(CAR(t))) = %s, length(CDR(CAR(t))) = %d\n",
type2char(TYPEOF(CDR(CAR(t)))), length(CDR(CAR(t))));
}
if (TYPEOF(s) == LISTSXP || TYPEOF(s) == LANGSXP) {
Rprintf("TYPEOF(s) = %s, length(s) = %d\n",
type2char(TYPEOF(s)), length(s));
Rprintf("TYPEOF(CAR(s)) = %s, length(CAR(s)) = %d\n",
type2char(TYPEOF(CAR(s))), length(CAR(s)));
}
which helped me confirm what was going into and coming out of the substituteList call on the previous line:
f <- function(...) substitute(...(n = 1))
invisible(f(hello, world, hello(world)))
CAR(t) == R_DotsSymbol? 0
TYPEOF(CAR(t)) = language, length(CAR(t)) = 2
CAR(CAR(t)) = R_DotsSymbol? 1
TYPEOF(CDR(CAR(t))) = pairlist, length(CDR(CAR(t))) = 1
TYPEOF(s) = pairlist, length(s) = 1
TYPEOF(CAR(s)) = pairlist, length(CAR(s)) = 4
invisible(substitute(...()))
CAR(t) == R_DotsSymbol? 0
TYPEOF(CAR(t)) = language, length(CAR(t)) = 1
CAR(CAR(t)) = R_DotsSymbol? 1
TYPEOF(CDR(CAR(t))) = NULL, length(CDR(CAR(t))) = 0
TYPEOF(s) = pairlist, length(s) = 1
TYPEOF(CAR(s)) = language, length(CAR(s)) = 1
Obviously, compiling R with debugging symbols and running R under a debugger helps, too.
Another puzzle
Just noticed this oddity:
g <- function(...) substitute(...(n = 1), new.env())
gab <- g(a = sin(x), b = zzz)
typeof(gab)
## [1] "language"
gab
## ...(n = 1)
Someone here can do another deep dive to find out why the result is a LANGSXP rather than a LISTSXP when you supply env different from environment() (including env = NULL).
I'm working on an R package where the same input-checking functions are called by multiple "actual" functions that are exported to users. If I use a simple stop() call, the error message is going to say that an error occurred in the input-checking function, which is not that useful...
I thought I'd get around this by wrapping the call to the input-checking function inside a tryCatch(), and then handling the error in the outer function. This does mostly what I want, but doesn't quite give the output that I'd like. The closest I've come is the following:
f <- function(i) {
tryCatch({
check_input(i)
}, error = function(e) stop("in f: ", e$message, call. = FALSE)
)
}
check_input <- function(i) {
if(i < 0)
stop("i is negative, value given was ", i)
}
f(-1)
# Error: in f: i is negative, value given was -1
Ideally, I'd like the error message to be
Error in f: i is negative, value given was -1
, which would be the case if stop were called within f() instead of check_input().
Here's how you can grab the name of the function from the call stack and paste it in to the error message
f <- function(i) {
check_input(i)
}
g <- function(i) {
check_input(i)
}
check_input <- function(i, from=deparse(sys.calls()[[sys.nframe()-1]][[1]])) {
getmsg <- function(m) ifelse(!is.null(from), paste0("in ", from, ": ", m), m)
if(i < 0)
stop(getmsg(paste0("i is negative, value given was ", i)), call. = FALSE)
}
f(-1)
# Error: in f: i is negative, value given was -1
g(-1)
# Error: in g: i is negative, value given was -1
You could also call check_input(i, from="otherfunction") to show whatever function name you want or check_input(i, from=NULL) to suppress the function name.
I am experimenting with the functional programming paradigm in R. I have defined a function that sums a sequence of integers from n to m. When I use sum() the function returns the expected result:
sumRange <- function(n, m) {
if (n <= m) {
return(sum(n, sumRange((n + 1), m)))
}
}
sumRange(1, 10)
# [1] 55
However, when I use the + operator the function returns numeric(0):
sumRange <- function(n, m) {
if (n <= m) {
return(n + sumRange((n + 1), m))
}
}
sumRange(1, 10)
# numeric(0)
Why does the operator + not work in this recursive function? Is there a way to rewrite the function so that it does?
The issue is that you never specify an else condition, hence at the end of the recursion it appears that R is returning NULL when the if condition fails. Returning 0 as the else condition fixes your problem:
sumRange <- function(n, m) return(ifelse (n <= m, (n + sumRange((n+1), m)), 0))
sumRange(1, 10)
[1] 55
Note that this is essentially defining a base case for your recursion. A base case, when hit, ends the recursion and causes the calls on the stack to be unwound.
To see the issue with the way you phrased your code, try writing out your function explicitly:
sumRange <- function(n, m) {
if (n <= m) {
return(n + sumRange((n+1), m))
}
// but what gets returned if n > m ?
// this is undefined behavior
}
I'm not an R guru, but my understanding is that R was written in C, and C might allow a recursion like this with no else condition. But the behavior is not well defined and you should not be relying on it.
Demo
If there is no return (using a explicit or implicit return statement) is executed, then R functions seems to return a NULL object.
If you add numerical value to a this object, it will simply return numeric(0).
So, what happens in the second case is that when n reaches 11, it returns a NULL object, and goes back adding values to it. But NULL + 10 + 9 .. = numeric(0).
Check this with
no_ret <- function ()
{
# just return nothing
}
obj <- no_ret()
obj
# NULL
class(obj)
# "NULL
new_obj <- obj + 10
new_obj
# numeric(0)
When the first function is executed, the what the sum statement get is
a vector with a NULL in it. For example,
vec <- c(NULL, 10, 9,...) which is actually vec <- c(10, 9, ...), so you get the expected outcome.
> c(NULL, 10:1)
[1] 10 9 8 7 6 5 4 3 2 1
> sum(NULL, 10:1)
[1] 55
> NULL + 10:1
integer(0)
Begin a new R session with an empty environment. Write a series of functions with a parameter that is to be used as the value of the times parameter in a call to rep().
f <- function(n) {
rep("hello", times = n)
}
f(x)
One expect this to fail, and indeed one gets:
# Error in f(x) : object 'x' not found
Modify the function a bit:
f2 <- function(n) {
ls.str()
rep("hello", times = n)
}
f2(x)
As expected, it still fails:
# Error in f2(x) : object 'x' not found
Modify a bit more (to see the environment in the console):
f3 <- function(n) {
print(ls.str())
rep("hello", times = n)
}
f3(x)
I still expect failure, but instead get:
## n : <missing>
## [1] "hello"
It is as if the call to print() makes rep work as though times were set to 1.
This is not an answer, but too long to post as a comment. A minimal reproducible example is:
f3 <- function(n) {
try(get("n", environment(), inherits=FALSE))
rep("hello", times = n)
}
f3(x)
## Error in get("n", environment(), inherits = FALSE) : object 'x' not found
## [1] "hello"
The following is speculative and based on loosely examining the source for do_rep. get starts the promise evaluation, but upon not finding the "missing" symbol appears to leave the promise partially unevaluated. rep, being a primitive, then attempts to operate on n without realizing that it is a partially evaluated promise and basically that leads implicitly to the assumption that 'n == 1'.
Also, this shows that the promise is in a weird state (have to use browser/debug to see it):
f3a <- function(n) {
try(get("n", environment(), inherits=FALSE))
browser()
rep("hello", times = n)
}
f3a(x)
## Error in get("n", environment(), inherits = FALSE) : object 'x' not found
## Called from: f3a(x)
# Browse[1]> (n)
## Error: object 'x' not found
## In addition: Warning message:
## restarting interrupted promise evaluation
## Browse[1]> c
## [1] "hello"
I received earlier today a report that the bug has been fixed in R-devel and R-patched.
The issue was that the test for missingness in the R sources did not consider the case of an interrupted promise evaluation. A fix has been committed by Luke Tierney and can be seen on GitHub.
f4 <- function(n) {
print('test')
print(ls.str())
print('end test')
rep("hello", times = n)
}
f4(x)
## [1] "test"
## n : <missing>
## [1] "end test"
## [1] "hello"
There's something within print.ls_str, from Frank's test on chat the follwing code exhibit the same problem:
f6 <- function(n) {
z = tryCatch(get("n", new.env(), mode = "any"), error = function(e) e)
rep("A", n)
}
Digging a little inside R source I found the following code
# define GET_VALUE(rval) \
/* We need to evaluate if it is a promise */ \
if (TYPEOF(rval) == PROMSXP) { \
PROTECT(rval); \
rval = eval(rval, genv); \
UNPROTECT(1); \
} \
\
if (!ISNULL(rval) && NAMED(rval) == 0) \
SET_NAMED(rval, 1)
GET_VALUE(rval);
break;
case 2: // get0(.)
if (rval == R_UnboundValue)
return CAD4R(args);// i.e. value_if_not_exists
GET_VALUE(rval);
break;
}
return rval;
}
#undef GET_VALUE
I'm quite surprised this compile properly, as far as I remember (my C is quite far behind) #define doesn't allow spaces between the # and define.
After digging for that, I'm wrong, from gcc doc:
Whitespace is also allowed before and after the `#'.
So there's probably something around this part of code, but that's above my head to pinpoint what exactly.
I'm new to R and have some trouble of understanding so called "envirionments" and way to use them properly. What I miss a lot in R language are static variables (like in Java).
I'm writing a program with couple of functions that will need to initialize during first run. To achieve this for each function I've created new environment which will be only accessed by this particular function (for example "f1" will be only accessed from inside "myfunction1").
What I don't like about my solution is that there is some additional code outside of function body and it's not too readable. Is there any simpler way to achieve the same? And if yes then it would be nice if you could provide me with modified example to show me how it works. Thank you.
f1 <- new.env()
f1$initialized <- FALSE
f1$o <- NULL
f1$length <- NULL
f1$compute
myfunction1 <- function(x) {
if(f1$initialized == FALSE){
f1$initialized <- TRUE
f1$compute <- 2*pi^2+3
}
if(is.null(f1$length) || f1$length!=length(x)){
f1$length <- length(x)
if(f1$length==2) {f1$o<-read.table("data_1.txt")}
else {f1$o<-read.table("data_2.txt")}
}
print("Lets print something!")
return(f1$o * f1$compute * x + 1000)
}
If you are familiar with Java then maybe using RefrenceClasses would be a good way to go. This seems to do what you are looking for:
myclass <- setRefClass('myclass', fields = list(initilized = 'logical',
o = 'data.frame',
len = 'numeric',
compute = 'numeric'))
#constructor
myclass$methods(initialize = function(initialized, len){
initilized <<- initialized
len <<- len
})
#method
myclass$methods(myfunction1 = function(x){
if(initilized == FALSE){
initilized <<- TRUE
compute <<- 2*pi^2+3
}
if(is.null(len) || len != length(x)){
len <<- length(x)
if(len==2) {o <<- read.table("data_1.txt")}
else {o <<- read.table("data_2.txt")}
}
print("Lets print something!")
return(o * compute * x + 1000)
})
obj <- myclass$new(FALSE, 0)
obj$myfunction1(2)
Check out ?ReferenceClasses for information on what's going on here (much more OOP styled and has some support for class inheritance, which sounds like what you want anyway).