What is the most R-ly way to do lazy conditional evaluation?

The case. I have a part of code like this: if (exists("mybooleanvar") & mybooleanvar) {statement1} else {statement2}. I expected that, if the conditions were lazily (short-circuit) evaluated, statement1 would run if mybooleanvar exists and is TRUE, and statement2 would run if mybooleanvar does not exist or equals FALSE.
But in practice I get a runtime error showing that the value of mybooleanvar is accessed and compared to TRUE even when exists("mybooleanvar") == FALSE. So the complete boolean evaluation takes place.
Of course the issue can be solved with nested if statements, the outer one evaluating exists() and the inner one the boolean. But I wonder what the most R-ly way is to avoid evaluating the n-th operand of a conditional statement once the result is already known, regardless of the values of the remaining operands.
For example, statement1 & statement2 will be FALSE if statement1 == FALSE, and statement1 | statement2 is TRUE if statement1 == TRUE, so statement2 need not be checked (or at least this check can be switched off by something like the compiler directive {$B-} in Delphi).

Here I would use && instead of &. They differ in two ways (cf. ?"&&"):
The shorter form performs elementwise comparisons ...
and:
The longer form evaluates left to right examining only the first element of each vector. Evaluation proceeds only until the result is determined. The longer form is appropriate for programming control-flow and typically preferred in if clauses.
Example:
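foo <- function() {
  # && only evaluates x when exists("x") is TRUE
  if (exists("x") && x) cat("is TRUE\n") else cat("not existing or FALSE\n")
}
x <- TRUE
foo()  # is TRUE
x <- FALSE
foo()  # not existing or FALSE
rm(x)
foo()  # not existing or FALSE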
More can be found in this post.
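A minimal sketch of the difference, runnable in a fresh session. It assumes no object named mybooleanvar exists (as in the question); stop() is used only as a visible marker for "this operand was evaluated".

```r
# `&&` and `||` stop as soon as the result is known, so the stop()
# on the right-hand side is never reached here.
short_and <- FALSE && stop("right-hand side evaluated")
short_or  <- TRUE  || stop("right-hand side evaluated")

# `&` evaluates both operands, so the stop() call is reached; the error
# is caught and replaced by a marker string.
long_and <- tryCatch(FALSE & stop("right-hand side evaluated"),
                     error = function(e) "error raised")

# The question's pattern, assuming `mybooleanvar` is not defined:
# with `&&`, exists() returns FALSE and the variable is never looked up.
branch <- if (exists("mybooleanvar") && mybooleanvar) "statement1" else "statement2"

print(short_and)  # [1] FALSE
print(short_or)   # [1] TRUE
print(long_and)   # [1] "error raised"
print(branch)     # [1] "statement2"
```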

Related

why does as.vector deep copy a matrix?

Using top, I manually measured the following memory usages at the specific points designated in the comments of the following code block:
x <- matrix(rnorm(1e9),nrow=1e4)
#~15gb
gc()
# ~7gb after gc()
y <- as.vector(x)
gc()
#~15gb after gc()
It's pretty clear that rnorm(1e9) is a ~7gb vector that's then copied to create the matrix. gc() removes the original vector since it's not assigned to anything. as.vector(x) then coerces and copies the data into a new vector.
My question is, why can't these three objects all point to the same memory block (at least until one is modified)? Isn't a matrix really just a vector with some additional metadata?
This is in R version 3.6.2
edit: also tested in 4.0.3, same results.
The question you're asking is about the reasoning. That seems more suited for R-devel, and I assume the answer in return would be "no one knows". The relevant function in the R source is do_asvector.
Following the source code for a call to as.vector(matrix(...)), it is important to note that the default argument for mode is "any". This translates to ANYSXP (see R Internals). This lets us find the evil culprit (line 1524) of the copy behaviour.
// source reference: do_asvector
...
if(type == ANYSXP || TYPEOF(x) == type) {
    switch(TYPEOF(x)) {
    case LGLSXP:
    case INTSXP:
    case REALSXP:
    case CPLXSXP:
    case STRSXP:
    case RAWSXP:
        if(ATTRIB(x) == R_NilValue) return x;
        ans = MAYBE_REFERENCED(x) ? duplicate(x) : x; // <== evil culprit
        CLEAR_ATTRIB(ans);
        return ans;
    case EXPRSXP:
    case VECSXP:
        return x;
    default:
        ;
    }
...
Going one step further, we can find the definition for MAYBE_REFERENCED in src/include/Rinternals.h, and by digging a bit we can find that it checks whether sxpinfo.named is equal to 0 (false) or not (true). What I am guessing here is that the assignment operator <- increments the sxpinfo.named counter and thus MAYBE_REFERENCED(x) returns TRUE and we get a duplicate (deep copy).
However, is this behaviour necessary?
That is a great question. If we had given mode an argument other than "any" or typeof(x) (the input's own storage type), we would skip the duplicate line and continue down the function until we reach the call to ascommon. Looking at the source code for ascommon, we can see that if we were to convert to a list manually (setting mode = "list"), ascommon only calls shallow_duplicate.
// Source reference: ascommon
...
    if ((type == LISTSXP) &&
        !(TYPEOF(u) == LANGSXP || TYPEOF(u) == LISTSXP ||
          TYPEOF(u) == EXPRSXP || TYPEOF(u) == VECSXP)) {
        if (MAYBE_REFERENCED(v)) v = shallow_duplicate(v); // <=== ascommon duplication behaviour
        CLEAR_ATTRIB(v);
    }
    return v;
}
So one could imagine that the call to duplicate in do_asvector could be replaced by a call to shallow_duplicate. Perhaps a "better safe than sorry" strategy was chosen when the code was originally implemented (prior to R-2.13.0 according to a comment in the source code), or perhaps there is a scenario in one of the types not handled by ascommon that requires a deep-copy.
For now I would test whether the function does a deep copy if we set mode = "list", or if we pass the matrix directly without assigning it first. In either case, it might not be a bad idea to send a follow-up question to the R-devel mailing list.
Edit: <- behaviour
I took the liberty of confirming my suspicion by looking at the source code for <-. I previously stated that I assumed <- increments sxpinfo.named, and we can confirm this by looking at do_set (the C source code for <-). When assigning as x <- ..., x is a SYMSXP, and we can see that the source code calls INCREMENT_NAMED, which in turn calls SET_NAMED(x, NAMED(x) + 1). So, all else being equal, we should see copying behaviour for x <- matrix(...); y <- as.vector(x), while we shouldn't for y <- as.vector(matrix(...)).
At the final gc(), you have x pointing to a vector with a dim attribute, and y pointing to a vector without any dim attribute. The data is an intrinsic part of the object, it's not an attribute, so those two vectors have to be different.
If matrices had been implemented as lists, e.g.
x <- list(data = rnorm(1e9), dim = c(1e4, 1e5))
then a shallow copy would be possible, but that's not how it was done. You can read the details of the internal structure of objects in the R Internals manual. For the current release, that's here: https://cloud.r-project.org/doc/manuals/r-release/R-ints.html#SEXPs .
You may wonder why things were implemented this way. I suspect it's intended to be efficient for the common use cases. Converting a matrix to a vector isn't generally necessary (you can treat x as a vector already, e.g. x[100000] and y[100000] will give the same value), so there's no need for "convert to vector" to be efficient. On the other hand, extracting elements is very common, so you don't want to have an extra pointer dereference slowing that down.
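A small-scale sketch of the points above, using a 2 x 3 matrix as a hypothetical stand-in for the 1e9-element one. It illustrates the value semantics (dim attribute, shared linear indexing, independent objects), not memory usage.

```r
# Hypothetical small stand-in for the large matrix in the question.
x <- matrix(as.numeric(1:6), nrow = 2)
y <- as.vector(x)

# The dim attribute belongs to x; as.vector() returns an object without it.
x_has_dim <- !is.null(dim(x))  # TRUE
y_has_dim <- !is.null(dim(y))  # FALSE

# Linear indexing already treats the matrix as a vector (column-major order).
same_value <- x[5] == y[5]     # TRUE: both are 5

# x and y are separate objects: changing y leaves x untouched.
y[1] <- 100
x_unchanged <- x[1] == 1       # TRUE
```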

declaration of variables in R

I have a problem using a variable in RStudio. My code is as follows. "child_birth" is a vector of 49703 strings that indicate some information about the birth of children. What I want is to tell whether the last 7 characters of each element of the vector are "at home". So I used a for loop and an if statement: if it is "at home", then the corresponding element in the vector "GetValue" will be TRUE.
forloop <- (1:49703)
for (i in forloop){
  temp <- child_birth[i]
  if (substr(temp, nchar(temp)-6, nchar(temp)) == "at home" ) {
    GetValue[i] = TRUE
  }
  else{ GetValue[i] = FALSE }
}
I googled it to make sure that in R I don't need to declare a variable before using it. But when I ran the code above, I got the error: "Error: object 'GetValue' not found". So what's the problem with it?
Thank you!
GetValue[i] only makes sense if GetValue (and i) exist. Compare: x+i only makes sense if x and i exist, which has nothing to do with whether or not x and i must be declared before being used.
In this case, you need to define GetValue before the loop. I recommend
GetValue <- logical(length(child_birth))
so as to allocate enough space. In this case, you could drop the else clause completely since the default logical value is FALSE.
I also recommend dropping the variable forloop and using
for(i in seq_along(child_birth))
Why hard-wire in the magic number 49703? Such numbers are subject to change. If you put them explicitly in the code, you are setting yourself up for future bugs.
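Putting these suggestions together: a sketch assuming child_birth is a character vector. The four sample strings below are made up for illustration. The last assignment shows that the same substr() test also works on the whole vector at once, since substr() and nchar() are vectorised.

```r
# Hypothetical sample data standing in for the real 49703-element vector.
child_birth <- c("born at home", "born in hospital", "at home", "home")

# Preallocate: logical() fills with FALSE, so no else branch is needed.
GetValue <- logical(length(child_birth))
for (i in seq_along(child_birth)) {
  temp <- child_birth[i]
  if (substr(temp, nchar(temp) - 6, nchar(temp)) == "at home") {
    GetValue[i] <- TRUE
  }
}

# Vectorised version of the same test, without a loop.
GetValue2 <- substr(child_birth, nchar(child_birth) - 6, nchar(child_birth)) == "at home"

# Both are c(TRUE, FALSE, TRUE, FALSE) for the sample data above.
print(GetValue)
print(GetValue2)
```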

return value of if statement in r

So, I'm brushing up on how to work with data frames in R and I came across this little bit of code from https://cloud.r-project.org/web/packages/data.table/vignettes/datatable-intro.html:
input <- if (file.exists("flights14.csv")) {
"flights14.csv"
} else {
"https://raw.githubusercontent.com/Rdatatable/data.table/master/vignettes/flights14.csv"
}
Apparently, this assigns the strings (character vectors?) in the if and else statements to input based on the conditional. How is this working? It seems like magic. I am hoping to find somewhere in the official R documentation that explains this.
From other languages I would have just done:
if (file.exists("flights14.csv")) {
input <- "flights14.csv"
} else {
input <- "https://raw.githubusercontent.com/Rdatatable/data.table/master/vignettes/flights14.csv"
}
or in R there is ifelse which also seems designed to do exactly this, but somehow that first example also works. I can memorize that this works but I'm wondering if I'm missing the opportunity to understand the bigger picture about how R works.
From the documentation on the ?Control help page under "Value"
if returns the value of the expression evaluated, or NULL invisibly if none was (which may happen if there is no else).
So the if statement is kind of like a function that returns a value. The value that's returned is the result of evaluating either the if block or the else block. When you have a block in R (code between {}), the braces also act like a function that just returns the value of the last expression evaluated in the block. And a string literal is a valid expression that returns itself.
So these are the same
x <- "hello"
x <- {"hello"}
x <- {"dropped"; "hello"}
x <- if(TRUE) {"hello"}
x <- if(TRUE) {"dropped"; "hello"}
x <- if(TRUE) {"hello"} else {"dropped"}
And you only really need blocks {} with if/else statements when you have more than one expression to run or when spanning multiple lines. So you could also do
x <- if(TRUE) "hello" else "dropped"
x <- if(FALSE) "dropped" else "hello"
These all store "hello" in x
You are not really missing anything about the "big picture" in R. The R if function is atypical compared both to other languages and to R's typical behavior. Unlike most functions in R, whose output must be assigned to a "symbol" (i.e. a proper R name) to be kept, if lets assignments made within its consequent or alternative code blocks take effect in the environment where the if is evaluated (the global environment, for code typed at the top level). Most functions would return only the final evaluation, while anything else that happened inside the function body would be discarded and eventually garbage collected.
The other common atypical function is for. R for-loops only retain these interior assignments and always return NULL (invisibly). The R Language Definition calls these atypical R functions "control structures"; see section 3.3. On my machine (and I suspect most Linux boxes) that document is installed at: http://127.0.0.1:10731/help/doc/manual/R-lang.html#Control-structures. If you are on another OS, there is probably a pulldown Help menu in your IDE that will have a pointer to it. The help document calls them "control flow constructs", and the help page is at ?Control. Note that it is necessary to quote these terms when you want to access that help page using one of those names, since they are "reserved words". So you would need ?'if' rather than ?if. The other reserved words are described on the ?Reserved page.
?Control
?'if' ; ?'for'
?Reserved
# When you just type:
?if # and hit <return>
# you will see a "+" sign, which indicates an incomplete expression.
# You then need to hit <escape> to get back to a regular R interaction.
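A sketch of the two behaviours described above; the function f and the names inner and last_i are hypothetical. An assignment inside an if block lands in the environment where the if is evaluated (here f's local environment, not the global one), and a for loop's value is NULL while its interior assignments persist.

```r
f <- function() {
  # The if block both assigns `inner` and returns its last value.
  result <- if (TRUE) {
    inner <- "assigned inside if"
    "value of the if block"
  }
  # `inner` now exists in f's environment, next to `result`.
  c(result = result, inner = inner)
}
out <- f()

# `inner` was created in f's environment, not in the global one.
inner_is_global <- exists("inner", envir = globalenv(), inherits = FALSE)

# A for loop's value is NULL (returned invisibly) ...
loop_value <- for (i in 1:3) last_i <- i
# ... but the assignments made inside it remain afterwards.
print(is.null(loop_value))  # [1] TRUE
print(last_i)               # [1] 3
```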
In R, functions don't need an explicit return. If none is specified, the value of the last evaluated expression is returned automatically. Consider this example:
a <- 5
b <- 1
result <- if(a == 5) {
a <- a + 1
b <- b + 1
a
} else {b}
result
#[1] 6
The value of the last expression in the if block was saved in result. Similarly, in your case the string values are "returned" implicitly.

Prolog - Recursive function always returning false value

I am trying to implement a recursive predicate that checks whether a given list L is in reverse (descending) order:
orderIsReverse(L):-
    [X|Q]=L,
    [XP|_]=Q,
    (X<XP -> false; orderIsReverse(Q)),
    true.
However, after compiling the code and querying orderIsReverse([3,2,1]) in SWI-Prolog, I get false.
What's wrong with the code?
You need to handle the case when the input list is empty (and also when it contains one single element, as you need two for a comparison).
orderIsReverse([X1,X2|L]):-
    X1 > X2,
    orderIsReverse([X2|L]).
orderIsReverse([_]).
orderIsReverse([]).
Update: fixed the logic.

Why the "=" R operator should not be used in functions?

The manual states:
The operator ‘<-’ can be used anywhere,
whereas the operator ‘=’ is only allowed at the top level (e.g.,
in the complete expression typed at the command prompt) or as one
of the subexpressions in a braced list of expressions.
The question here mentions the difference when = is used in a function call. But in a function definition, it seems to work normally:
a = function ()
{
b = 2
x <- 3
y <<- 4
}
a()
# (b and x are undefined here)
So why does the manual say that the operator ‘=’ is only allowed at the top level?
There is nothing about it in the language definition (there is no = operator listed, what a shame!)
The text you quote says at the top level OR in a braced list of subexpressions. You are using it in a braced list of subexpressions. Which is allowed.
You have to go to great lengths to find an expression which is neither toplevel nor within braces. Here is one. You sometimes want to wrap an assignment inside a try block: try( x <- f() ) is fine, but try( x = f(x) ) is not -- you need to either change the assignment operator or add braces.
Expressions not at the top level include usage in control structures like if. For example, the following programming error is illegal.
> if(x = 0) 1 else x
Error: syntax error
As mentioned here: https://stackoverflow.com/a/4831793/210673
Also see http://developer.r-project.org/equalAssign.html
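One way to check these rules without running anything is to ask the parser directly. A sketch; parses() is a hypothetical helper, and parse(text = ...) only checks syntax, so x and f need not exist.

```r
# Returns TRUE if the code is syntactically valid R, FALSE otherwise.
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

parses("x = 0")                 # TRUE: top level
parses("{ x = 0 }")             # TRUE: inside a braced list
parses("if (x <- 0) 1 else x")  # TRUE: <- is allowed anywhere
parses("if (x = 0) 1 else x")   # FALSE: = inside the if condition
parses("try(x = f(x))")         # TRUE, but `x =` is read as an argument name
```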
Other than some examples, such as system.time, where <- and = give different results (as others have shown), the main difference is more philosophical. Larry Wall, the creator of Perl, said something along the lines of "similar things should look similar, different things should look different". I have found it interesting to see which things different languages consider "similar" and which they consider "different". Now for R assignment, let's compare two commands:
myfun( a <- 1:10 )
myfun( a = 1:10 )
Some would argue that in both cases we are assigning 1:10 to a so what we are doing is similar.
The other argument is that in the first call we are assigning to a variable a in the environment from which myfun is called, while in the second call we are binding a variable a in the environment created when the function is called. That a is local to the function, and the two a variables are different.
So which to use depends on whether you consider the assignments "similar" or "different".
Personally, I prefer <-, but I don't think it is worth fighting a holy war over.
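A sketch of the two calls compared above, with a hypothetical myfun that just sums its argument. It is wrapped in local() so the check for a newly created a is not affected by anything already in the global environment. Note that arguments are lazy: the assignment in myfun(a <- 1:10) only happens because myfun actually uses its argument.

```r
demo <- local({
  myfun <- function(a) sum(a)

  # `=` in a call is argument matching: it binds myfun's formal `a`
  # and creates nothing in the calling environment.
  r_named <- myfun(a = 1:10)
  a_after_named <- exists("a", envir = environment(), inherits = FALSE)

  # `<-` in a call is an assignment, evaluated in the calling environment
  # when myfun forces the argument; its value is then passed along.
  r_assign <- myfun(a <- 1:10)
  a_after_assign <- exists("a", envir = environment(), inherits = FALSE)

  list(r_named = r_named, r_assign = r_assign,
       a_after_named = a_after_named, a_after_assign = a_after_assign)
})
str(demo)
```

Both calls return the same sum; the difference is only whether a variable a is left behind in the caller's environment.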

Resources