Parsing of anonymous function evaluation in R

This may be related to lazy evaluation, but it was definitely not what I was expecting. It may also be related to the behavior that makes an implicit semi-colon, where possible, at the end of each input line.
> (function(x) x * 10)(10)
[1] 100
expected.
> function(x) x * 10
function(x) x * 10
expected.
> (function(x) x * 10)
function(x) x * 10
here R strips off the outer () and evaluates, assuming that there are no actuals to follow
> (function(x) x * 10
+ )(10)
[1] 100
omit the closing ) and R waits for it and the actuals.
> (function(x) x * 10)
function(x) x * 10
> (10)
[1] 10
but not if the ) is on the same line.
> (function(x) x * 10)(
+ 10)
[1] 100
but the ( for the actuals can go on the first line and the behavior is as expected.
I would have expected the parser to recognize an anonymous function call (f(x)) in progress and hold off evaluation until the arguments have been specified. It will do this if the last thing on the line is an (, but not if the ( is the first thing in the following line.
So basically the )( have to be together, either on the first line or on the second line, for R to recognize an anonymous call in progress. This sort of argues for explicit ; termination à la Perl, etc.
This example is somewhat trivial for exposition. One could put it all on one line and have done with it. The problem is not as trivial when trying to maintain a functional programming style where the argument is not 10 but the output of another anonymous function (or several).

Nothing to do with lazy evaluation. That's just how the interactive read process is set up. It waits, not until functions have their arguments specified, but rather until there is a complete expression that will return a value. Since a function is a legitimate value, it returns that. The action is different when source-ing from a file. That action has no read-eval-print loop; it's more a parse-eval-act loop. (I thought this was in the R-FAQ, but I've failed to find it so far. Until I can find a better reference, I'd refer you to ?source, where the differences between file handling and command-line handling are discussed.) If you want to establish a style that avoids this ambiguity, then use "{" right after the argument list specification. (I think it gives you more informative error messages when you screw up.)
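One way to see the "complete expression" rule at work is to hand the same text to parse(), which splits its input into complete expressions. This is only a sketch: the variable names are made up for illustration, and parse(text = ...) is used here as a stand-in for typing the lines at the prompt.

```r
# The closing ")" completes an expression, so "(10)" on the next line
# is parsed as a second, separate expression.
split_call <- parse(text = "(function(x) x * 10)\n(10)")
length(split_call)        # 2

# A trailing "(" leaves the expression incomplete, so the parser keeps
# reading and the two lines form a single call.
joined_call <- parse(text = "(function(x) x * 10)(\n10)")
length(joined_call)       # 1
eval(joined_call[[1]])    # 100

# Opening the body with "{" keeps the definition open across lines
# until the matching "}" arrives.
braced <- parse(text = "f <- function(x) {\n  x * 10\n}")
length(braced)            # 1
```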

Related

Finding limits at infinity in R

I am writing R code to evaluate limits.
I am not sure if my code even works. I just run it and it doesn't return anything; R gets stuck and nothing works afterwards, not even print statements. fn is supposed to be any function and tol is the error tolerance; I want to stop the program when the difference between consecutive terms is less than 1e-6. I have to restart R, and I always get the message "R session is currently busy" when I try to close RStudio.
lim<-function(funx,tol=1e-6){
n<-1
while(TRUE){
n<-n+1
term<-funx
next_term<-term+funx
if(abs(term-next_term)<tol){
break
}
}
return(term)
}
n<-1
fn<-(1/5)**n
lim(fn)
You made some mistakes in your program. For one, you always add the same number (funx), so the difference between term and next_term is always 0.20 and never smaller than the tolerance, and you get an endless loop.
If you want to call a function each time, you have to define this function and pass it to the lim() function. Otherwise, you just define fn as 0.20 and pass it as a double value to the function. It will never change.
If you want to find the limit of (1/5)^n, you can do it like this:
lim = function(f,x=1,tol=0.0001){
next.diff=tol
while(next.diff>=tol){
next.diff = abs(f(x)-f(x+1))
x = x + 1
}
return(list("Iterations"=x,"Limit"=f(x),"Next Value"=f(x+1)))
}
my.fun = function(x){(1/5)^x}
lim(my.fun,1,1e-6)
It will then call the function for increasing values of x and abort the loop as soon as the tolerance is reached. In this example:
> lim(my.fun,1,1e-6)
$Iterations
[1] 10
$Limit
[1] 1.024e-07
$`Next Value`
[1] 2.048e-08
So, at (1/5)^10 you already reach a value where the next iteration is closer than your tolerance. It's safe to say that it would converge to 0.
You can define any function of a value x and pass it to this lim function with a starting value for x and a tolerance level.
EDIT: For the limit of sqrt(x+1)-sqrt(x), you would just have to define a new function of x (or of n, if you wish) and pass it to lim():
> fun2 = function(x){sqrt(x+1)-sqrt(x)}
> lim(fun2,1,1e-6)
$Iterations
[1] 3969
$Limit
[1] 0.007936008
$`Next Value`
[1] 0.007935009
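Since the original problem was a loop that never ends, a guarded variant of the same idea may be useful. The following is a minimal sketch, not part of the answer above: the name lim_capped, the max_iter argument, and the shape of the returned list are all assumptions made for illustration.

```r
# Stop when two consecutive values differ by less than tol,
# or give up after max_iter steps instead of looping forever.
lim_capped <- function(f, x = 1, tol = 1e-6, max_iter = 1e5) {
  for (i in seq_len(max_iter)) {
    if (abs(f(x) - f(x + 1)) < tol) {
      return(list(converged = TRUE, x = x, value = f(x + 1)))
    }
    x <- x + 1
  }
  # Tolerance never reached: report that rather than hanging the session.
  list(converged = FALSE, x = x, value = f(x))
}

res   <- lim_capped(function(x) (1/5)^x)
stuck <- lim_capped(function(x) (-1)^x, max_iter = 100)
res$converged     # TRUE
stuck$converged   # FALSE
```

The second call oscillates between -1 and 1, so consecutive differences stay at 2 and the cap is what ends the loop.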
It's unclear as to what you really want to find out here, but as far as I understood, you want to see where the sequence (not a function) (and of the type a^n) converges. Well, if that is the case, then you need to change your code to something like this:
lim<-function(a,tol=1e-6)
{
n<-1
repeat
{
term<-a^n;next_term<-a^(n+1)
if(abs(term-next_term)<tol) break
n<-n+1
}
return(term)
}
Ok so here's what I did:
I assumed that the sequence you input is of the form a^n, where a is a constant term and n increases over the set of natural numbers
I defined the initial value of n inside the function, just before the loop (why? because I want to iterate over all the possible values of n, one by one)
Then I defined the first term of the sequence (named as term). As assumed, it's a^n initially. So the next term (a.k.a. next_term in my code) should be a^(n+1).
Now take their absolute difference. If it satisfies the condition, break out from the loop. Else, increase the value of n by 1 and let the loop run once again.
Then finally, return the value of the term. That's all...
I hope you will now be able to understand where you went wrong. Your approach was similar, but your code computed something else.
Remember, in this code, you don't need to enter the value of n separately while calling the function.
Here's what it returned:
> lim(1/5)
[1] 5.12e-07
> fn<-1/12
> lim(fn)
[1] 3.34898e-07

return value of if statement in r

So, I'm brushing up on how to work with data frames in R and I came across this little bit of code from https://cloud.r-project.org/web/packages/data.table/vignettes/datatable-intro.html:
input <- if (file.exists("flights14.csv")) {
"flights14.csv"
} else {
"https://raw.githubusercontent.com/Rdatatable/data.table/master/vignettes/flights14.csv"
}
Apparently, this assigns the strings (character vectors?) in the if and else statements to input based on the conditional. How is this working? It seems like magic. I am hoping to find somewhere in the official R documentation that explains this.
From other languages I would have just done:
if (file.exists("flights14.csv")) {
input <- "flights14.csv"
} else {
input <- "https://raw.githubusercontent.com/Rdatatable/data.table/master/vignettes/flights14.csv"
}
or in R there is ifelse which also seems designed to do exactly this, but somehow that first example also works. I can memorize that this works but I'm wondering if I'm missing the opportunity to understand the bigger picture about how R works.
From the documentation on the ?Control help page under "Value"
if returns the value of the expression evaluated, or NULL invisibly if none was (which may happen if there is no else).
So the if statement is kind of like a function that returns a value. The value that's returned is the result of evaluating either the if block or the else block. When you have a block in R (code between {}), the braces are also like a function that just returns the value of the last expression evaluated in the block. And a string literal is a valid expression that returns itself.
So these are the same
x <- "hello"
x <- {"hello"}
x <- {"dropped"; "hello"}
x <- if(TRUE) {"hello"}
x <- if(TRUE) {"dropped"; "hello"}
x <- if(TRUE) {"hello"} else {"dropped"}
And you only really need blocks {} with if/else statements when you have more than one expression to run or when spanning multiple lines. So you could also do
x <- if(TRUE) "hello" else "dropped"
x <- if(FALSE) "dropped" else "hello"
These all store "hello" in x
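Two further cases may help tie this to the ?Control quote above. This is a small sketch with made-up variable names: the first shows the NULL result when there is no else and the condition is false, the second calls if through backticks like any other function.

```r
# No else branch and a false condition: if yields NULL.
y <- if (FALSE) "hello"
is.null(y)    # TRUE

# if can be called like an ordinary function by quoting its name
# with backticks: condition, then-value, else-value.
z <- `if`(TRUE, "hello", "dropped")
z             # "hello"
```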
You are not really missing anything about the "big picture" in R. The R if function is atypical compared both to other languages and to R's typical behavior. Unlike most functions in R, which do require assignment of their output to a "symbol", i.e. a proper R name, if lets assignments made within its consequent or alternative code blocks take effect in the calling environment (the global environment when run at top level). Most functions would return only the final evaluation, while anything else that occurred inside the function body would be garbage collected.
The other common atypical function is for. R for-loops only retain these interior assignments and always return NULL. The R Language Definition calls these atypical R functions "control structures"; see section 3.3. On my machine (and I suspect most Linux boxes) that document is installed at: http://127.0.0.1:10731/help/doc/manual/R-lang.html#Control-structures. If you are on another OS then there is probably a pulldown Help menu in your IDE that will have a pointer to it. The help document calls them "control flow constructs" and the help page is at ?Control. Note that it is necessary to quote these terms when you want to access that help page using one of those names, since they are "reserved words". So you would need ?'if' rather than typing ?if. The other reserved words are described in the ?Reserved page.
?Control
?'if' ; ?'for'
?Reserved
# When you just type:
?if # and hit <return>
# you will see a "+"-sign which indicates an incomplete expression.
# you then need to hit <escape> to get back to a regular R interaction.
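The statement that a for-loop keeps its interior assignments but itself evaluates to NULL can be checked directly. A minimal sketch, with variable names chosen only for illustration:

```r
total <- 0
# Capture whatever the loop itself evaluates to.
res <- for (i in 1:3) {
  total <- total + i
}
is.null(res)   # TRUE: the loop's own value is NULL
total          # 6: the assignment inside the loop persisted
i              # 3: so did the loop variable
```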
In R, functions don't need an explicit return. If none is specified, the value of the last expression evaluated is returned automatically. Consider this example:
a <- 5
b <- 1
result <- if(a == 5) {
a <- a + 1
b <- b + 1
a
} else {b}
result
#[1] 6
The last line in if block was saved in result. Similarly, in your case the string values are "returned" implicitly.

Where are function constants stored if a function is created inside another function?

I am using a parent function to generate a child function by returning the function in the parent function call. The purpose of the parent function is to set a constant (y) in the child function. Below is a MWE. When I try to debug the child function I cannot figure out which environment the variable is stored in.
power=function(y){
return(function(x){return(x^y)})
}
square=power(2)
debug(square)
square(3)
debugging in: square(3)
debug at #2: {
return(x^y)
}
Browse[2]> x
[1] 3
Browse[2]> y
[1] 2
Browse[2]> ls()
[1] "x"
Browse[2]> find('y')
character(0)
If you inspect the type of an R function, you’ll observe the following:
> typeof(square)
[1] "closure"
And that is, in fact, exactly the answer to your question: a closure is a function that carries an environment around.
R also tells you which environment this is (albeit not in a terribly useful way):
> square
function(x){return(x^y)}
<environment: 0x7ffd9218e578>
(The exact number will differ with each run — it’s just a memory address.)
Now, which environment does this correspond to? It corresponds to a local environment that was created when we executed power(2) (a “stack frame”). As the other answer says, it’s now the parent environment of the square function (in fact, in R every function, except for certain builtins, is associated with a parent environment):
> ls(environment(square))
[1] "y"
> environment(square)$y
[1] 2
You can read more about environments in the chapter in Hadley’s Advanced R book.
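To make the "carries an environment around" point concrete, here is a short sketch built on the power() function from the question; cube is an extra name introduced only for this illustration. Each call to power() creates a fresh environment holding its own y.

```r
power <- function(y) {
  function(x) x^y
}
square <- power(2)
cube   <- power(3)

# Two calls to power() give two distinct enclosing environments.
identical(environment(square), environment(cube))   # FALSE

# Each environment holds the y that its call received.
get("y", envir = environment(square))               # 2
get("y", envir = environment(cube))                 # 3

square(3)   # 9
cube(2)     # 8
```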
Incidentally, closures are a core feature of functional programming languages. Another core feature of functional languages is that every expression is a value — and, by implication, a function’s (return) value is the value of its last expression. This means that using the return function in R is both unnecessary and misleading!1 You should therefore leave it out: this results in shorter, more readable code:
power = function (y) {
function (x) x ^ y
}
There’s another R specific subtlety here: since arguments are evaluated lazily, your function definition is error-prone:
> two = 2
> square = power(two)
> two = 10
> square(5)
[1] 9765625
Oops! Subsequent modifications of the variable two are reflected inside square (but only the first time! Further redefinitions won’t change anything). To guard against this, use the force function:
power = function (y) {
force(y)
function (x) x ^ y
}
force simply forces the evaluation of an argument name, nothing more.
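Re-running the earlier experiment with the forced version shows the difference. This is a sketch; power_forced is a name made up here to keep it apart from the definitions above.

```r
power_forced <- function(y) {
  force(y)              # evaluate y now, while two is still 2
  function(x) x^y
}

two <- 2
square <- power_forced(two)
two <- 10               # too late to affect square
square(5)               # 25
```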
1 Misleading, because return is a function in R and carries a slightly different meaning compared to procedural languages: it aborts the current function execution.
The variable y is stored in the parent environment of the function. The environment() function returns the current environment, and we use parent.env() to get the parent environment of a particular environment.
ls(envir=parent.env(environment())) #when using the browser
The find() function doesn't seem helpful in this case because it seems to only search objects that have been attached to the global search path (search()). It doesn't try to resolve variable names in the current scope.

Confused by ...()?

In another question, sapply(substitute(...()), as.character) was used inside a function to obtain the names passed to the function. The as.character part sounds fine, but what on earth does ...() do?
It's not valid code outside of substitute:
> test <- function(...) ...()
> test(T,F)
Error in test(T, F) : could not find function "..."
Some more test cases:
> test <- function(...) substitute(...())
> test(T,F)
[[1]]
T
[[2]]
F
> test <- function(...) substitute(...)
> test(T,F)
T
Here's a sketch of why ...() works the way it does. I'll fill in with more details and references later, but this touches on the key points.
Before performing substitution on any of its components, substitute() first parses an R statement.
...() parses to a call object, whereas ... parses to a name object.
... is a special object, intended only to be used in function calls. As a consequence, the C code that implements substitution takes special measures to handle ... when it is found in a call object. Similar precautions are not taken when ... occurs as a symbol. (The relevant code is in the functions do_substitute, substitute, and substituteList (especially the latter two) in R_SRCDIR/src/main/coerce.c.)
So, the role of the () in ...() is to cause the statement to be parsed as a call (aka language) object, so that substitution will return the fully expanded value of the dots. It may seem surprising that ... gets substituted for even when it's on the outside of the (), but: (a) calls are stored internally as list-like objects and (b) the relevant C code seems to make no distinction between the first element of that list and the subsequent ones.
Just a side note: for examining behavior of substitute or the classes of various objects, I find it useful to set up a little sandbox, like this:
f <- function(...) browser()
f(a = 4, 77, B = "char")
## Then play around within the browser
class(quote(...)) ## quote() parses without substituting
class(quote(...()))
substitute({...})
substitute(...(..., X, ...))
substitute(2 <- (makes * list(no - sense))(...))
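For comparison, the ...() idiom from the question can be placed next to a more conventional spelling that wraps the dots in a list() call and drops the function name. This is only a sketch: dots_call and dots_list are hypothetical helper names, and deparse is used in place of as.character so that compound arguments stay in one piece.

```r
# The idiom from the question: ...() is parsed as a call, so substitute()
# expands the dots into the unevaluated arguments.
dots_call <- function(...) sapply(substitute(...()), deparse)

# A conventional alternative: substitute inside list(...), then drop
# the first element, which is the symbol `list`.
dots_list <- function(...) sapply(as.list(substitute(list(...)))[-1], deparse)

dots_call(alpha, beta + 1)
dots_list(alpha, beta + 1)
```

Neither alpha nor beta needs to exist, because the arguments are captured without being evaluated.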

Why the "=" R operator should not be used in functions?

The manual states:
The operator ‘<-’ can be used anywhere,
whereas the operator ‘=’ is only allowed at the top level (e.g.,
in the complete expression typed at the command prompt) or as one
of the subexpressions in a braced list of expressions.
The question here mentions the difference when used in a function call. But in a function definition, it seems to work normally:
a = function ()
{
b = 2
x <- 3
y <<- 4
}
a()
# (b and x are undefined here)
So why does the manual say that the operator ‘=’ is only allowed at the top level?
There is nothing about it in the language definition (there is no = operator listed, what a shame!)
The text you quote says at the top level OR in a braced list of subexpressions. You are using it in a braced list of subexpressions. Which is allowed.
You have to go to great lengths to find an expression which is neither toplevel nor within braces. Here is one. You sometimes want to wrap an assignment inside a try block: try( x <- f() ) is fine, but try( x = f() ) is not, because x = f() is then read as a named argument to try. You need to either change the assignment operator or add braces.
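The two working forms can be sketched as follows. The function f and the variable names are placeholders chosen for illustration.

```r
f <- function() 42

# Arrow assignment works directly as the expression passed to try().
try(x_arrow <- f())

# With braces, "=" sits inside a braced list of expressions, so it is
# treated as an assignment again.
try({ x_braced = f() })

x_arrow    # 42
x_braced   # 42

# try(x_named = f()) is not an assignment: "x_named" is read as the
# name of an argument to try().
```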
Expressions not at the top level include usage in control structures like if. For example, the following programming error is rejected by the parser.
> if(x = 0) 1 else x
Error: syntax error
As mentioned here: https://stackoverflow.com/a/4831793/210673
Also see http://developer.r-project.org/equalAssign.html
Other than some examples, such as system.time as others have shown, where <- and = have different results, the main difference is more philosophical. Larry Wall, the creator of Perl, said something along the lines of "similar things should look similar, different things should look different". I have found it interesting in different languages to see which things are considered "similar" and which are considered "different". Now for R assignment let's compare 2 commands:
myfun( a <- 1:10 )
myfun( a = 1:10 )
Some would argue that in both cases we are assigning 1:10 to a so what we are doing is similar.
The other argument is that in the first call we are assigning to a variable a in the environment from which myfun is being called, whereas in the second call we are assigning to a variable a in the environment that is created when the function is called and is local to the function; those two a variables are different.
So which to use depends on whether you consider the assignments "similar" or "different".
Personally, I prefer <-, but I don't think it is worth fighting a holy war over.
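The "different" reading can be made visible by checking whether a exists in the caller after each form of the call. This is a sketch under assumptions: myfun is given a trivial body here, and demo is a wrapper added only so the check runs in a clean environment.

```r
demo <- function() {
  myfun <- function(a) sum(a)

  # Named argument: a exists only inside myfun's own environment.
  r1 <- myfun(a = 1:10)
  after_equals <- exists("a", envir = environment(), inherits = FALSE)

  # Arrow inside the call: a is created in the caller (here, demo)
  # when myfun evaluates its argument.
  r2 <- myfun(a <- 1:10)
  after_arrow <- exists("a", envir = environment(), inherits = FALSE)

  list(r1 = r1, r2 = r2,
       after_equals = after_equals, after_arrow = after_arrow)
}

res <- demo()
res$after_equals   # FALSE
res$after_arrow    # TRUE
```

Both calls return the same sum; only the arrow form leaves a variable behind in the caller.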
